rigvedrs/agentguard

🛡️ agentguard

Runtime budget control and tool-call reliability for AI agents


New to agentguard? Start with the guided onboarding site:

Open the project website (agentguard.site)

For Docs:

Open the documentation

AI agents overspend, call tools with wrong parameters, and trust broken tool responses. agentguard is a lightweight Python runtime that keeps agent runs inside budget and makes tool calls trustworthy with spend caps, response verification, validation, retries, and tracing.

Works with OpenAI, Anthropic, OpenRouter, Groq, Together AI, Fireworks AI, LangChain, MCP, or any Python function. Only dependency: pydantic.

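import requests

from agentguard import guard

@guard(validate_input=True, verify_response=True, max_retries=3)
def search_web(query: str) -> dict:
    # Placeholder endpoint; passing params lets requests URL-encode the query
    return requests.get("https://api.search.com", params={"q": query}, timeout=10).json()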

What agentguard is for

agentguard has two core jobs:

  1. Keep agent runs inside budget with per-call, per-session, and shared multi-agent spend controls.
  2. Make tool calls trustworthy with input validation, output validation, response verification, and execution safeguards.

Everything else in the library supports those two outcomes: retries, circuit breakers, rate limits, tracing, telemetry, benchmarking, and generated tests.

Why teams reach for agentguard

| Problem | How AI agents fail today | How agentguard fixes it |
| --- | --- | --- |
| Cost spirals & runaway spending | One prompt change, retry loop, or model escalation causes a surprise bill | Per-call and per-session budget enforcement, real usage-based LLM spend tracking, and shared multi-agent budget pools |
| Malformed tool responses | Tool returns missing fields, schema drift, or anomalous values, with no error raised | Multi-signal response verification (timing, schema, patterns, statistical anomalies) |
| Invalid tool parameters | Agent passes wrong types or missing fields | Automatic input/output validation from Python type hints + Pydantic schemas |
| Cascading failures | One failing tool takes down the entire agent | Circuit breakers with CLOSED → OPEN → HALF_OPEN state machine |
| API rate limit violations | Agent exceeds rate limits, gets blocked | Token bucket rate limiting (per-second, per-minute, per-hour) |
| No regression tests | 40% of agent projects fail with no test suite | Auto-generate pytest tests from production traces |
| Framework lock-in | Each LLM framework has its own observability | Framework-agnostic: works with OpenAI, Anthropic, LangChain, MCP, or raw functions |

The Problem in Numbers

  • 82.6% of Stack Overflow questions about AI agents have no accepted answer (arXiv)
  • 40-95% of agent projects fail between prototype and production
  • 0 widely-adopted open-source libraries focused on runtime tool response verification in Python agents

Install agentguard

pip install awesome-agentguard

With optional integrations:

pip install "awesome-agentguard[all]"        # OpenAI + Anthropic + LangChain integrations
pip install "awesome-agentguard[costs]"      # LiteLLM-backed real LLM cost tracking
pip install "awesome-agentguard[rich]"       # Colour terminal output
pip install "awesome-agentguard[dashboard]"  # Local trace dashboard extras

Requirements: Python 3.10+ · Only dependency: pydantic>=2.0

Quick Start

1. Put a hard cap on agent spend

Use TokenBudget when you want the run to stop before a retry loop or model change burns money:

import os
from agentguard import TokenBudget
from agentguard.integrations import guard_openai_client
from openai import OpenAI

budget = TokenBudget(
    max_cost_per_session=5.00,
    max_calls_per_session=100,
    alert_threshold=0.80,
)

client = guard_openai_client(
    OpenAI(api_key=os.getenv("OPENAI_API_KEY")),
    budget=budget,
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise this document"}],
)

print(budget.session_spend)
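
To make the idea of a hard cap concrete, here is a minimal standalone sketch of session-level enforcement with an alert threshold. `SessionBudget` and `BudgetExceeded` are hypothetical names used only for illustration; this is not agentguard's `TokenBudget` implementation, just one plausible way such a check could work.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when recording a call would exceed a session limit."""


@dataclass
class SessionBudget:
    # Hypothetical stand-in for illustration; not agentguard's TokenBudget.
    max_cost: float
    max_calls: int
    alert_threshold: float = 0.80
    spend: float = 0.0
    calls: int = 0
    alerted: bool = False

    def record(self, cost: float) -> None:
        """Account for one call, refusing it if a cap would be crossed."""
        if self.calls + 1 > self.max_calls:
            raise BudgetExceeded(f"call limit of {self.max_calls} reached")
        if self.spend + cost > self.max_cost:
            raise BudgetExceeded(
                f"spend {self.spend + cost:.2f} would exceed cap {self.max_cost:.2f}"
            )
        self.spend += cost
        self.calls += 1
        # Flag once when spend crosses the alert fraction of the cap.
        if not self.alerted and self.spend >= self.alert_threshold * self.max_cost:
            self.alerted = True


budget = SessionBudget(max_cost=5.00, max_calls=100)
budget.record(3.00)   # 60% of the cap: no alert yet
budget.record(1.50)   # 90% of the cap: alert flag is set
try:
    budget.record(1.00)   # would reach 5.50, so the call is refused
except BudgetExceeded as exc:
    print(f"blocked: {exc}")
```

The check happens before the spend is committed, so a refused call never pushes the session over its cap.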

2. Guard tool calls with one decorator

from agentguard import guard

@guard
def get_weather(city: str) -> dict:
    """Every call is now traced, timed, and validated."""
    return {"temperature": 72, "city": city}

result = get_weather("NYC")

3. Verify tool responses against expected contracts

Detect when a tool response violates what you've defined as normal — anomalous execution timing, missing required fields, pattern mismatches, or statistically unusual values. Useful for catching schema drift, API contract changes, integration bugs, and misconfigured mocks.

from agentguard import ResponseVerifier

verifier = ResponseVerifier(threshold=0.6)

# Register what normal responses look like for this tool
verifier.register_tool(
    "get_weather",
    expected_latency_ms=(100, 5000),        # Real API: 100ms–5s
    required_fields=["temperature", "humidity"],
    response_patterns=[r'"temperature":\s*-?\d+'],
)

# Check a response that came back suspiciously fast and incomplete
result = verifier.verify(
    tool_name="get_weather",
    execution_time_ms=0.3,                  # 0.3ms — no network call happened
    response={"temperature": 72, "conditions": "sunny"},
)

print(result.is_anomalous)   # True  (missing "humidity", sub-ms timing)
print(result.confidence)     # 0.95
print(result.reason)         # "Execution time 0.30ms is below the 2ms minimum for real I/O..."
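
The multi-signal idea can be sketched without the library: check each signal independently, then combine the failures into a score. The function below is an assumption-laden illustration (equal weights, three signals, a hypothetical `score_response` helper), so its score will not match the confidence values `ResponseVerifier` reports.

```python
import json
import re

THRESHOLD = 0.6  # same cut-off value as the example above


def score_response(response, elapsed_ms, latency_range, required_fields, patterns):
    """Return (score, reasons); each failed signal adds equal weight."""
    reasons = []
    low, high = latency_range
    if not low <= elapsed_ms <= high:
        reasons.append(f"latency {elapsed_ms}ms outside {low}-{high}ms")
    missing = [f for f in required_fields if f not in response]
    if missing:
        reasons.append(f"missing fields: {missing}")
    text = json.dumps(response)
    unmatched = [p for p in patterns if not re.search(p, text)]
    if unmatched:
        reasons.append(f"patterns not matched: {unmatched}")
    score = len(reasons) / 3  # three signals checked
    return score, reasons


score, reasons = score_response(
    response={"temperature": 72, "conditions": "sunny"},
    elapsed_ms=0.3,
    latency_range=(100, 5000),
    required_fields=["temperature", "humidity"],
    patterns=[r'"temperature":\s*-?\d+'],
)
is_anomalous = score >= THRESHOLD
print(is_anomalous, reasons)
```

Here the timing and required-field signals fail while the pattern signal passes, so two of three signals flag the response.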

4. Production-ready protection for tool execution

from agentguard import guard, CircuitBreaker, TokenBudget, RateLimiter

@guard(
    validate_input=True,
    validate_output=True,
    verify_response=True,       # checks timing, schema, patterns
    max_retries=3,
    timeout=30.0,
    budget=TokenBudget(
        max_cost_per_session=5.00,
        max_calls_per_session=100,
        alert_threshold=0.80,
    ).config,
    circuit_breaker=CircuitBreaker(
        failure_threshold=5,
        recovery_timeout=60,
    ).config,
    rate_limit=RateLimiter(
        calls_per_minute=30,
    ).config,
    record=True,  # Save traces for test generation
)
def query_database(sql: str, limit: int = 100) -> list[dict]:
    return db.execute(sql, limit=limit)
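
For readers unfamiliar with the CLOSED → OPEN → HALF_OPEN pattern that `failure_threshold` and `recovery_timeout` configure, here is a minimal sketch of such a state machine. `SimpleBreaker` is a hypothetical class written for this illustration and makes no claim about how agentguard's `CircuitBreaker` is implemented; the injectable `clock` exists only to make the example deterministic.

```python
import time


class SimpleBreaker:
    """Illustrative breaker; not agentguard's CircuitBreaker class."""

    def __init__(self, failure_threshold=5, recovery_timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.recovery_timeout:
                raise RuntimeError("circuit open: call rejected")
            self.state = "HALF_OPEN"  # allow one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, opens the circuit.
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = self.clock()
            raise
        self.state = "CLOSED"
        self.failures = 0
        return result


def flaky():
    raise ValueError("backend down")


now = [0.0]  # fake clock so the example does not need to wait
breaker = SimpleBreaker(failure_threshold=2, recovery_timeout=60.0, clock=lambda: now[0])
for _ in range(2):
    try:
        breaker.call(flaky)
    except ValueError:
        pass
print(breaker.state)  # OPEN after two consecutive failures

now[0] = 61.0  # pretend the recovery timeout has elapsed
print(breaker.call(lambda: "ok"), breaker.state)
```

While the circuit is open, calls fail fast instead of reaching the struggling tool, which is what stops one failing dependency from stalling the whole agent.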

5. Auto-generate pytest tests from production traces

Record real agent executions, then auto-generate a pytest test suite for regression testing:

from agentguard import TraceRecorder, record_session
from agentguard.testing import TestGenerator

# Record during production
with record_session("./traces", backend="sqlite"):
    result = query_database("SELECT * FROM users LIMIT 10")
    result = get_weather("San Francisco")

# Generate test file
generator = TestGenerator(traces_dir="./traces")
generator.generate_tests(output="tests/test_generated.py")

By default, SQLite-backed recording writes to ./traces/agentguard_traces.db. Use trace_backend="jsonl" when you need legacy file-per-session traces.

Generated test file:

"""Auto-generated test suite from agentguard production traces."""

def test_query_database_0():
    """Recorded: query_database('SELECT * FROM users LIMIT 10', limit=100)"""
    result = query_database("SELECT * FROM users LIMIT 10", limit=100)
    assert isinstance(result, list)

def test_get_weather_0():
    """Recorded: get_weather('San Francisco')"""
    result = get_weather("San Francisco")
    assert isinstance(result, dict)
    assert "temperature" in result

6. Fluent test assertions for agent tool calls

from agentguard import assert_tool_call

# Build assertions on recorded trace entries
assert_tool_call(entry).succeeded().within_ms(5000).returned_dict().has_keys("temperature", "humidity")

7. Replay and diff agent traces

from agentguard.testing import TraceReplayer

replayer = TraceReplayer(traces_dir="./traces")
results = replayer.replay_all(tools={"get_weather": get_weather})

for r in results:
    print(f"{r['tool_name']}: {'PASS' if r['match'] else 'FAIL'}")
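
Conceptually, replaying means re-running each recorded call against the current tool and comparing the live result with the recording. The following sketch shows that loop under assumed names: the `replay` function and the trace record shape are hypothetical, and only the `tool_name` and `match` keys mirror the result fields used above.

```python
def replay(traces, tools):
    """Re-run each recorded call and compare the live result to the recording."""
    results = []
    for trace in traces:
        tool = tools[trace["tool_name"]]
        try:
            live = tool(*trace["args"], **trace["kwargs"])
            match = live == trace["result"]
        except Exception as exc:  # a crash counts as a mismatch
            live, match = repr(exc), False
        results.append({"tool_name": trace["tool_name"], "match": match, "live": live})
    return results


def get_weather(city):
    return {"temperature": 72, "city": city}


recorded = [
    {"tool_name": "get_weather", "args": ["NYC"], "kwargs": {},
     "result": {"temperature": 72, "city": "NYC"}},
    {"tool_name": "get_weather", "args": ["SF"], "kwargs": {},
     "result": {"temperature": 61, "city": "SF"}},  # differs from the live tool
]
outcome = replay(recorded, {"get_weather": get_weather})
for r in outcome:
    print(f"{r['tool_name']}: {'PASS' if r['match'] else 'FAIL'}")
```

Exact equality is a strict comparison; tools with non-deterministic output would need a looser match, such as comparing keys or types only.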

LLM Framework Integrations

Any OpenAI-Compatible Provider (OpenRouter, Groq, Together, Fireworks, etc.)

agentguard works with any OpenAI-compatible API out of the box. One integration covers 10+ providers:

import os

from openai import OpenAI
from agentguard.integrations import guard_tools, Providers

# Same tools work across ALL providers — just change the provider
executor = guard_tools([search_web, get_weather])

# OpenRouter (300+ models)
client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=os.getenv("OPENROUTER_API_KEY"))

# Groq (ultra-low latency)
client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key=os.getenv("GROQ_API_KEY"))

# Together AI, Fireworks, DeepInfra, Mistral, xAI: same pattern
client = OpenAI(**Providers.TOGETHER.client_kwargs())

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # use a model ID that the selected provider serves
    tools=executor.tools,
    messages=[{"role": "user", "content": "Search for Python tutorials"}],
)
results = executor.execute_all(response.choices[0].message.tool_calls)

Built-in provider presets: OpenAI, OpenRouter, Groq, Together AI, Fireworks AI, DeepInfra, Mistral, Perplexity, Novita AI, xAI — or define your own with Provider(name=..., base_url=..., env_key=...).

OpenAI Function Calling with Guardrails

from agentguard.integrations import guard_openai_tools, OpenAIToolExecutor

# Wrap your tools for OpenAI function calling
executor = OpenAIToolExecutor()
executor.register(search_web).register(get_weather)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=executor.tools,
)

# Execute all tool calls with guards
results = executor.execute_all(response.choices[0].message.tool_calls)

Real LLM Cost Tracking

Wrap supported provider clients to record real token usage and pricing directly from API responses:

import os
from openai import OpenAI

from agentguard import InMemoryCostLedger, TokenBudget
from agentguard.integrations import guard_openai_client

budget = TokenBudget(max_cost_per_session=5.00, max_calls_per_session=100)
budget.config.cost_ledger = InMemoryCostLedger()

client = guard_openai_client(
    OpenAI(api_key=os.getenv("OPENAI_API_KEY")),
    budget=budget,
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise this page"}],
)

print(budget.session_spend)

Pricing resolution order:

  1. model_pricing_overrides
  2. LiteLLM pricing data
  3. explicit cost_per_call
  4. otherwise usage is tracked and cost is marked unknown
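
The resolution order above reads as a fallback chain, which can be sketched as follows. Everything here is illustrative: `resolve_cost` is a hypothetical function, the `pricing_table` argument stands in for the LiteLLM pricing data, and the prices are made-up placeholders rather than real rates.

```python
def resolve_cost(model, prompt_tokens, completion_tokens,
                 overrides=None, pricing_table=None, cost_per_call=None):
    """Return (cost, source); cost is None when no price is known."""
    # Prices are (input, output) in USD per token; all figures here are made up.
    for source, table in (("override", overrides), ("pricing_table", pricing_table)):
        if table and model in table:
            in_price, out_price = table[model]
            return prompt_tokens * in_price + completion_tokens * out_price, source
    if cost_per_call is not None:
        return cost_per_call, "cost_per_call"
    return None, "unknown"  # usage is still known, the cost is not


table = {"model-a": (1e-6, 2e-6)}       # placeholder prices
overrides = {"model-a": (2e-6, 4e-6)}   # takes precedence over the table

print(resolve_cost("model-a", 1000, 500, overrides, table))
print(resolve_cost("model-a", 1000, 500, None, table))
print(resolve_cost("model-b", 1000, 500, None, table, cost_per_call=0.01))
print(resolve_cost("model-b", 1000, 500, None, table))
```

Returning the source alongside the cost makes it easy to see which rule produced a figure when auditing spend.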

Anthropic Claude Tool Use with Guardrails

from agentguard.integrations import guard_anthropic_tools, AnthropicToolExecutor

tools = guard_anthropic_tools([search_web, get_weather])
executor = AnthropicToolExecutor({"search_web": search_web, "get_weather": get_weather})

LangChain Agent Tool Validation

from agentguard.integrations import GuardedLangChainTool, guard_langchain_tools

# Wrap existing LangChain tools
guarded = guard_langchain_tools([my_search_tool, my_db_tool])

MCP (Model Context Protocol) Server Guards

from agentguard.integrations import GuardedMCPServer

# Wrap an MCP server with guards
guarded_server = GuardedMCPServer(original_server, guards={
    "search": {"validate_input": True, "max_retries": 2},
    "database_query": {"budget": budget_config, "circuit_breaker": cb_config},
})

Architecture — How agentguard Protects AI Agent Tool Calls

┌──────────────────────────────────────────────────────────┐
│                     Your AI Agent                         │
│              (OpenAI / Anthropic / LangChain / etc.)      │
└──────────────────────┬───────────────────────────────────┘
                       │ tool call
                       ▼
┌──────────────────────────────────────────────────────────┐
│                    @guard decorator                       │
│                                                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
│  │   Circuit    │  │    Rate     │  │   Budget    │     │
│  │   Breaker    │  │   Limiter   │  │  Enforcer   │     │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘     │
│         │                │                │              │
│         ▼                ▼                ▼              │
│  ┌─────────────────────────────────────────────┐        │
│  │           Input Validation                   │        │
│  │      (type hints + Pydantic schemas)         │        │
│  └─────────────────────┬───────────────────────┘        │
│                        │                                 │
│                        ▼                                 │
│  ┌─────────────────────────────────────────────┐        │
│  │      Execute with Retry + Timeout            │        │
│  │      (exponential backoff, jitter)           │        │
│  └─────────────────────┬───────────────────────┘        │
│                        │                                 │
│                        ▼                                 │
│  ┌─────────────────────────────────────────────┐        │
│  │       Response Verification                  │        │
│  │  (timing, schema, patterns, anomaly score)   │        │
│  └─────────────────────┬───────────────────────┘        │
│                        │                                 │
│                        ▼                                 │
│  ┌─────────────────────────────────────────────┐        │
│  │        Output Validation                     │        │
│  └─────────────────────┬───────────────────────┘        │
│                        │                                 │
│                        ▼                                 │
│  ┌─────────────────────────────────────────────┐        │
│  │    Trace Recording → Test Generation         │        │
│  └─────────────────────────────────────────────┘        │
└──────────────────────────────────────────────────────────┘
                       │
                       ▼
              Your actual tool
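
The "Execute with Retry + Timeout" stage in the diagram mentions exponential backoff with jitter. A minimal sketch of that technique, assuming a hypothetical `retry` helper and the "full jitter" variant (a random wait between zero and a doubling ceiling), might look like this; it is not agentguard's `RetryPolicy`, and the injectable `sleep` is there only so the example runs instantly.

```python
import random
import time


def retry(func, max_retries=3, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call func, retrying on any exception with capped exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # full jitter: random wait up to the ceiling


attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []  # collect the waits instead of actually sleeping
print(retry(flaky, sleep=delays.append), len(attempts), len(delays))
```

Jitter spreads retries from many callers over time, which avoids synchronized retry bursts against a recovering service.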

CLI — Inspect Agent Traces and Generate Tests

# Initialize a SQLite trace store
agentguard traces init ./traces

# List recorded agent traces
agentguard traces list ./traces

# Show trace details for a session
agentguard traces show agent_run_001 ./traces

# Get latency and failure statistics
agentguard traces stats ./traces

# Generate JSON report
agentguard traces report ./traces --output report.json

# Import legacy JSONL traces into SQLite
agentguard traces import ./legacy_traces ./traces

# Export traces for replay or offline analysis
agentguard traces export ./traces --output-dir ./trace-export

# Run the local dashboard
agentguard traces serve ./traces --port 8765

# Auto-generate pytest test suite from traces
agentguard generate ./traces --output tests/test_generated.py

Full API Reference

Core — Guard Decorator and Tool Registry

| Component | Description |
| --- | --- |
| @guard | Decorator that wraps any Python function with the full protection stack |
| GuardConfig | Configuration dataclass for all guard options |
| GuardedTool | The wrapper class created by @guard |
| ToolRegistry | Global registry for tool discovery, stats, and health checks |

Validators — Response Verification and Schema Validation

| Component | Description |
| --- | --- |
| ResponseVerifier | Multi-signal response anomaly detection: timing, schema, patterns, statistical values |
| SchemaValidator | Automatic type-hint and Pydantic-based input/output validation |
| SemanticValidator | Register custom semantic validation checks per tool |
| CustomValidator | Compose arbitrary validation functions into the pipeline |

Guardrails — Circuit Breaker, Rate Limiter, Budget Control

| Component | Description |
| --- | --- |
| CircuitBreaker | CLOSED → OPEN → HALF_OPEN state machine to prevent cascading failures |
| RateLimiter | Token bucket with per-second/minute/hour rate limiting |
| TokenBudget | Per-call and per-session cost and call-count budget enforcement |
| RetryPolicy | Exponential backoff with jitter and configurable exception filtering |
| timeout | Thread-based (sync) and asyncio (async) timeout enforcement |
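
For context on the token bucket approach named in the table, the sketch below shows the core idea: tokens refill at a steady rate up to a capacity, and each call spends one. `TokenBucket` is a hypothetical class for illustration and is not agentguard's `RateLimiter`; the fake clock keeps the example deterministic.

```python
import time


class TokenBucket:
    """Illustrative token bucket; not agentguard's RateLimiter class."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Consume one token if available; return whether the call may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


now = [0.0]
# 0.5 tokens per second corresponds to 30 calls per minute.
bucket = TokenBucket(rate_per_second=0.5, capacity=2, clock=lambda: now[0])
print([bucket.allow() for _ in range(3)])  # burst of 2 allowed, third refused
now[0] = 2.0                               # two seconds later one token has refilled
print(bucket.allow())
```

The capacity controls how large a burst is tolerated, while the refill rate sets the sustained call rate.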

Testing — Trace Recording and Test Generation

| Component | Description |
| --- | --- |
| TraceRecorder | Context manager for recording production agent traces |
| TraceReplayer | Replay recorded traces against live tools to detect regressions |
| TestGenerator | Auto-generate pytest test files from production traces |
| assert_tool_call() | Fluent assertion builder for trace entries |

Reporting — Metrics and Observability

| Component | Description |
| --- | --- |
| ConsoleReporter | Rich-powered colour terminal tables |
| JsonReporter | JSON reports with latency percentiles and anomaly detection |

Comparison with Other AI Agent Safety Tools

Feature agentguard guardrails-ai NeMo Guardrails AgentCircuit Langfuse LangSmith
Response anomaly detection ✅ Multi-signal ❌ Text-only
Tool call input/output validation ✅ Type hints + Pydantic ✅ Validators ✅ Pydantic
Framework-agnostic ✅ Any function ❌ LangChain-first
Circuit breaker
Rate limiting
Budget enforcement ✅ Per-call + session ✅ Global Token tracking Token tracking
Auto test generation ✅ From traces
Zero dependencies* ✅ pydantic only ❌ Many ❌ NVIDIA stack
Self-hosted
Open source ✅ MIT ✅ Apache ✅ MIT

*Core library requires only pydantic>=2.0. No NVIDIA dependencies, no cloud services, no API keys needed.

Who Is This For?

  • AI/ML Engineers building production agent systems with OpenAI, Anthropic, or open-source LLMs
  • Backend Developers adding LLM-powered features who need reliability guarantees
  • Platform Teams managing multi-agent deployments with cost and safety concerns
  • Researchers studying agent reliability, response integrity, and tool-call verification

Contributing

See CONTRIBUTING.md for development setup and guidelines.

git clone https://github.com/rigvedrs/agentguard.git
cd agentguard
pip install -e ".[dev]"
pytest

License

MIT — see LICENSE for details.


Stop trusting your AI agents blindly.