The missing third node in LangGraph Plan → Execute → Validate (PEV) workflows — a structured Validator with confidence scoring, per-step retry, and automatic replanning.
I built and deployed multi-agent platforms in production inside a regulated life sciences environment — systems that automate scientific workflows, search tens of millions of research records, and route LLM decisions through quality gates before acting on them.
The standard LangGraph plan-and-execute pattern has two nodes: Plan and Execute. That's fine for demos. In production, we learned quickly that execution quality is not binary — an agent can technically complete a step while producing output that is incomplete, hallucinated, or missing a critical detail. Without a quality gate, those failures propagate silently to the next step.
We added a third node: a Validator that scores every step output (0.0–1.0) against the original intent. Steps below the threshold are retried with the validator's feedback injected into the next attempt. If retries are exhausted, the entire plan is regenerated with the failure context. Nothing moves forward until it passes.
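The validator returns structured output rather than prose to parse. A minimal sketch of the shape, assuming Pydantic field names (the template's real models live in src/pev/state.py; `score` and `feedback` here are assumptions):

```python
from pydantic import BaseModel, Field

class Validation(BaseModel):
    """Hypothetical validator output -- illustrative, not the template's exact model."""
    score: float = Field(ge=0.0, le=1.0,
                         description="Confidence that the step output satisfies the original intent")
    feedback: str = Field(description="One-sentence critique; injected into the next attempt on retry")
```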
This template packages that pattern. It is extracted from real production agent infrastructure, not assembled from documentation.
| Feature | Standard plan-execute | This template |
|---|---|---|
| Planning node | ✓ | ✓ |
| Execution node with tool calls | ✓ | ✓ |
| Validation + confidence score | ✗ | ✓ 0.0 – 1.0 |
| Per-step retry with feedback injection | ✗ | ✓ configurable |
| Automatic replanning on exhausted retries | ✗ | ✓ with failure context |
| Multi-model cost optimisation | ✗ | ✓ haiku/sonnet split |
| Full audit trail (every attempt) | ✗ | ✓ operator.add accumulator |
| Structured outputs (no string parsing) | ✗ | ✓ Pydantic models |
This project is built for AI Architects and Senior Engineers moving beyond demo-grade agents into reliable, production-ready autonomous systems.
Use this as a foundation for new agents. Define your domain-specific tools in PEVConfig, tune the prompts in prompts.py for your use case, and deploy as a microservice or MCP server.
If you have an existing LangGraph project, use this as a reference implementation to add a Validator + Router quality gate to your current Plan-Execute flows (a minimal wiring sketch follows below).
Use the benchmark_reliability.py pattern to test how different validator_model choices (e.g., Haiku vs. Sonnet) affect your "Correction Rate" and total token spend.
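For that retrofit case, the change is one new node and one conditional edge. Below is a minimal wiring sketch against the LangGraph API; the stub nodes, state shape, and 0.85 threshold are illustrative stand-ins, not the template's internals:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict, total=False):
    plan: list
    pending_result: str
    score: float
    feedback: str

# Stub nodes for illustration; in a real graph these call your LLMs.
def planner_node(state: State) -> dict: return {"plan": ["step 1"]}
def executor_node(state: State) -> dict: return {"pending_result": "..."}
def validator_node(state: State) -> dict: return {"score": 0.9, "feedback": "ok"}

def route(state: State) -> str:
    # Simplified gate; the full retry/replan decision tree is diagrammed below.
    return "end" if state["score"] >= 0.85 else "executor"

builder = StateGraph(State)
builder.add_node("planner", planner_node)
builder.add_node("executor", executor_node)
builder.add_node("validator", validator_node)  # the new quality gate
builder.set_entry_point("planner")
builder.add_edge("planner", "executor")
builder.add_edge("executor", "validator")      # every step output gets scored
builder.add_conditional_edges("validator", route, {"executor": "executor", "end": END})
graph = builder.compile()
```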
```bash
git clone https://github.com/ManjunathGovindaraju/langgraph-plan-execute-validate.git
cd langgraph-plan-execute-validate
uv sync
cp .env.example .env   # add your ANTHROPIC_API_KEY
```

Five lines to run:

```python
from pev import create_pev_graph, initial_state, PEVConfig
graph = create_pev_graph(PEVConfig(pass_threshold=0.85))
result = graph.invoke(initial_state("Research the top 3 vector databases"))
print(result["status"]) # "complete"
print(result["step_results"])   # scored audit trail for every step
```

Run an example:

```bash
python examples/research_agent.py # web search + validate
python examples/code_review_agent.py # strict threshold, shows retries
python examples/data_analysis_agent.py   # no tools, LLM reasoning only
```

How the nodes are wired:

```mermaid
graph TD
User(["👤 Caller"])
subgraph PEVGraph["PEV Graph — LangGraph StateGraph"]
direction TB
Planner["🧠 Planner\nclaude-haiku\nStructured JSON output"]
Executor["⚙️ Executor\nclaude-sonnet\nTool-call loop"]
Validator["🔍 Validator\nclaude-haiku\nConfidence score 0–1"]
Router["🔀 Router\nPure Python — no LLM"]
Planner --> Executor
Executor --> Validator
Validator --> Router
end
Tools["🛠️ Tools\n(Tavily, custom, etc.)"]
LangSmith["📊 LangSmith\nObservability"]
User -->|"initial_state(task)"| Planner
    Router -->|"score ≥ threshold · next step"| Executor
    Router -->|"score < threshold · retry_count < max"| Executor
Router -->|"retries exhausted"| Planner
Router -->|"complete / failed"| User
Executor <-->|"tool calls"| Tools
PEVGraph -.->|"traces"| LangSmith
```

The step lifecycle as a state machine:

```mermaid
stateDiagram-v2
[*] --> Planning : graph.invoke()
Planning --> Executing : plan generated
Executing --> Validating : step complete
Validating --> Executing : score ≥ threshold · more steps remain
Validating --> Complete : score ≥ threshold · final step
Validating --> Executing : score < threshold · retry_count < max_retries
Validating --> Planning : score < threshold · retry_count ≥ max_retries · replan_count < max_replans
Validating --> Failed : all limits exhausted
Complete --> [*]
Failed --> [*]
```

The router's decision tree (pure Python, no LLM call):

```mermaid
flowchart TD
A["Router reads score, retry_count,\nreplan_count, current_step_idx"]
B{"score >= pass_threshold?"}
C{"Last step in plan?"}
D["✅ complete\nroute → END"]
E["➡️ advance idx\nreset retry_count\nroute → executor"]
F{"retry_count < max_retries?"}
G["🔁 retry\nincrement retry_count\nroute → executor"]
H{"replan_count < max_replans?"}
I["🔄 replan\nroute → planner"]
J["❌ failed\nset error message\nroute → END"]
A --> B
B -->|"yes"| C
C -->|"yes"| D
C -->|"no"| E
B -->|"no"| F
F -->|"yes"| G
F -->|"no"| H
H -->|"yes"| I
H -->|"no"| J
```
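In code, that tree collapses to one small pure function. An illustrative reconstruction from the diagram (the real implementation lives in src/pev/graph.py; the state keys come from the flowchart, everything else is an assumption):

```python
def make_router(cfg: "PEVConfig"):
    """Build the routing function. Illustrative sketch, not the shipped code."""
    def route(state: dict) -> str:
        if state["score"] >= cfg.pass_threshold:
            if state["current_step_idx"] >= len(state["plan"]) - 1:
                return "end"        # complete: final step passed the gate
            return "executor"       # advance idx, reset retry_count
        if state["retry_count"] < cfg.max_retries:
            return "executor"       # retry the same step with feedback injected
        if state["replan_count"] < cfg.max_replans:
            return "planner"        # regenerate the plan with failure context
        return "end"                # failed: all limits exhausted
    return route
```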
A typical run, end to end:

```mermaid
sequenceDiagram
actor U as Caller
participant P as Planner (haiku)
participant E as Executor (sonnet)
participant T as Tools
participant V as Validator (haiku)
participant R as Router
U->>P: task
P-->>E: plan [step1, step2, step3]
loop Each step
E->>T: tool_call(args)
T-->>E: result
E-->>V: pending_result
V-->>R: score + feedback
alt score ≥ threshold
R-->>E: next step
else retry available
R-->>E: retry + feedback injected
else retries exhausted
R-->>P: replan + failure context
else all limits hit
R-->>U: status=failed
end
end
R-->>U: status=complete · step_results[]
```
This template is designed to work seamlessly with the FastMCP Production Template.
- The Brain (PEV): This repository. It handles high-level reasoning, multi-step planning, and quality-controlled execution.
- The Hands (MCP): Standardized tools, resources, and prompts exposed via FastMCP.
By separating Orchestration (PEV) from Capabilities (MCP), you can update your business logic and data integrations without redeploying your core agent logic.
```python
from langchain_mcp_adapters.tools import load_mcp_tools
from pev import create_pev_graph, PEVConfig
# Load tools from your FastMCP production server
tools = load_mcp_tools("uv", ["run", "path/to/server.py"])
# Orchestrate them with PEV's quality gates
graph = create_pev_graph(PEVConfig(tools=tools, pass_threshold=0.85))
```

Configuration:

```python
from pev import PEVConfig
cfg = PEVConfig(
# Model routing — cheap models for bookkeeping, capable for execution
planner_model = "claude-haiku-4-5-20251001", # structured JSON only
executor_model = "claude-sonnet-4-6", # tool calls + reasoning
validator_model = "claude-haiku-4-5-20251001", # score + one sentence
# Quality gate
pass_threshold = 0.80, # score ≥ this → step passes
# Loop guards
max_retries = 2, # retries per step before escalating to replan
max_replans = 1, # full replanning cycles before marking failed
# Tools available to the executor (planner and validator never see tools)
tools = [TavilySearchResults(max_results=3)],
)
```

| Parameter | Default | Description |
|---|---|---|
| `planner_model` | `claude-haiku-4-5-20251001` | Model for planning — structured output only |
| `executor_model` | `claude-sonnet-4-6` | Model for execution — tool calls + reasoning |
| `validator_model` | `claude-haiku-4-5-20251001` | Model for validation — scoring only |
| `pass_threshold` | `0.80` | Minimum score [0.0–1.0] for a step to pass |
| `max_retries` | `2` | Retries per step before triggering replan |
| `max_replans` | `1` | Full replanning cycles before marking failed |
| `tools` | `[]` | LangChain tools available to the executor |
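As a tuning illustration, a stricter gate for high-stakes work might raise the threshold and allow an extra retry (the values here are made up; the shipped code_review_agent.py example has its own settings):

```python
from pev import PEVConfig

# Illustrative only: tighter pass bar, one extra retry before escalating to replan.
strict_cfg = PEVConfig(pass_threshold=0.90, max_retries=3)
```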
Every attempt — including retries — is preserved in `step_results` via `operator.add`. Nothing is ever overwritten.

```python
result["step_results"]
# [
# StepResult(step="Search for X", score=0.55, attempts=1, feedback="Missing Y"),
# StepResult(step="Search for X", score=0.88, attempts=2, feedback="Good."),
# StepResult(step="Summarise X", score=0.92, attempts=1, feedback="Complete."),
# ]
```

This is the operational signal that matters: you can see exactly where the agent struggled, what feedback it received, and how many attempts each step took.
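The accumulation comes from LangGraph's reducer annotation on the state key. A minimal sketch of the mechanism (the template's actual PEVState in src/pev/state.py carries more fields):

```python
import operator
from typing import Annotated, TypedDict

class PEVState(TypedDict, total=False):
    # Lists returned by nodes are merged with operator.add, so every retry
    # appends a new StepResult instead of overwriting the previous attempt.
    step_results: Annotated[list, operator.add]
```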
The split-model design (cheap models for planning and validation, a capable model only for execution) cuts per-run cost by ~60–70% versus routing everything through the capable model.
```mermaid
graph LR
subgraph Cheap["claude-haiku ~$0.25 / 1M tokens"]
P["Planner\nJSON output only"]
V["Validator\nScore + one sentence"]
end
subgraph Capable["claude-sonnet ~$3 / 1M tokens"]
E["Executor\nTool calls + reasoning"]
end
```
Typical cost for a 3-step task: ~$0.01 vs ~$0.027 if using sonnet throughout.
```bash
# Unit tests — no API calls, runs in ~5 seconds
uv run pytest tests/ -m "not slow" -v
# Integration tests — requires ANTHROPIC_API_KEY
uv run pytest tests/ -m slow -v
```

Test coverage:
| File | What it tests |
|---|---|
| `test_planner.py` | First-plan vs replan, state resets, step injection |
| `test_executor.py` | Context injection, retry feedback, tool-call loop, unknown tools |
| `test_validator.py` | Score/feedback writing, score clamping, StepResult audit trail |
| `test_retry_replan.py` | Every router branch — 12 routing scenarios |
| `test_graph.py` | Config validation, graph compilation, initial_state, dispatch edge |
```text
langgraph-plan-execute-validate/
├── src/pev/
│ ├── __init__.py # Public API: create_pev_graph, initial_state, PEVConfig, PEVState
│ ├── graph.py # StateGraph wiring + router node + _dispatch edge
│ ├── state.py # PEVState TypedDict, StepResult, Status
│ ├── config.py # PEVConfig dataclass with validation
│ ├── prompts.py # All prompt templates (one place, easy to tune)
│ └── nodes/
│ ├── planner.py # Planner node — structured output, replan-aware
│ ├── executor.py # Executor node — tool-call loop, context injection
│ └── validator.py # Validator node — confidence scoring, audit append
├── examples/
│ ├── research_agent.py # Tavily web search
│ ├── code_review_agent.py # Strict threshold, shows retry flow
│ └── data_analysis_agent.py # No tools, LLM reasoning only
├── tests/
│ ├── conftest.py # Mock LLM fixtures, state builders
│ ├── test_planner.py
│ ├── test_executor.py
│ ├── test_validator.py
│ ├── test_retry_replan.py # Router decision tree — most critical tests
│ └── test_graph.py # Compilation, config validation, dispatch
├── docs/
│ └── architecture.md # Full architecture with 10 Mermaid diagrams
└── .github/workflows/
    └── ci.yml                 # Ruff + pytest (unit only) + Codecov
```
Custom validator — override the pass logic entirely:
```python
from pev.nodes.validator import make_validator_node
from pev.config import PEVConfig
# The validator prompt is in pev/prompts.py — edit VALIDATOR_SYSTEM to
# change scoring criteria without touching node logic.
```

Custom tools — any LangChain `BaseTool` or MCP tools:

```python
from langchain_mcp_adapters.tools import load_mcp_tools
# Connect to an MCP server (e.g., your FastMCP production server)
mcp_tools = load_mcp_tools("uv", ["run", "server.py"])
cfg = PEVConfig(tools=mcp_tools)
```

Async execution:

```python
result = await graph.ainvoke(initial_state("Your task"))
```

Manjunath Govindaraju — Principal Software Engineer with 23 years building production systems. Currently focused on AI platform engineering: multi-agent orchestration, MCP servers at scale, async data pipelines, and enterprise Kubernetes deployments.
LinkedIn · GitHub · fastmcp-production-template
MIT — use freely in production, no attribution required.