Stratix Python SDK

Ship AI that actually works.

Evaluate 200+ models across 100+ benchmarks, trace agent behavior, build custom judges, and gate CI/CD on eval results.

Install · Quick Start · Compare · Docs · Examples · Discord

Why Stratix?

Stratix is built differently. It gives you production-grade evaluation infrastructure out of the box: rich public benchmarks, powerful custom judges, full agent trace analysis, playback, bulk evaluation, and CI/CD gates.

What makes it click:

200+ models and 100+ benchmarks, ready to query. No scraping leaderboards, no CSV wrangling. Call pc.models.get() and you're looking at real evaluation data.
Prompt-level comparisons. Not just "Model A scores 82%." You get the exact prompts where Model A passes and Model B fails, with outcome filters to find the interesting divergences.
A 4-generation eval ladder. Start with heuristic checks, graduate to model-graded scoring, add deliberation panels, then build auto-optimized GEPA judges. One SDK covers the full spectrum.
Agent trace evaluation. Upload a multi-step agent trace, replay it, and judge every step. Built for the world where agents do real work.
CI/CD eval gates. Add layerlens ci run --threshold 0.8 to your pipeline and it exits non-zero on regression. No custom scripts needed.
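
To make the prompt-level comparison idea concrete, here is a minimal sketch in plain Python. It does not call the SDK: the per-prompt pass/fail dictionaries and the `divergences` helper are hypothetical stand-ins, and the real response shape may differ.

```python
from typing import Dict, List


def divergences(results_a: Dict[str, bool], results_b: Dict[str, bool]) -> List[str]:
    """Return prompt IDs where model A passes and model B fails.

    Both arguments map a prompt ID to a pass/fail outcome. This data shape is
    an assumption for illustration, not the SDK's response format.
    """
    shared = results_a.keys() & results_b.keys()  # only prompts both models ran
    return sorted(p for p in shared if results_a[p] and not results_b[p])


# Hypothetical outcomes for two models on four prompts.
model_a = {"p1": True, "p2": True, "p3": False, "p4": True}
model_b = {"p1": True, "p2": False, "p3": False, "p4": False}

print(divergences(model_a, model_b))  # ['p2', 'p4']
```

An aggregate score would only report 75% against 25% here; the divergence list names the two prompts worth reading.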

How Stratix Compares

Capability	Stratix	LangSmith	Langfuse	DeepEval	Phoenix (Arize)
Pre-built benchmarks	100+ benchmarks, 200+ models	No public benchmarks	No public benchmarks	50+ metrics	Bring your own
Prompt-level comparison	Native head-to-head with outcome filters	Side-by-side runs (manual)	Side-by-side runs + Playground/Experiments (UI Supported)	Manual setup	Not built-in
Custom judge builder	Auto-optimized GEPA judges with budget control	LLM-as-judge (manual)	LLM-as-judge (manual)	Basic LLM judges	LLM-as-judge templates
Agent trace evaluation	Upload, replay, judge every step	Trace logging + annotation	Trace logging + scoring	Trace logging only	Trace visualization
Eval generation ladder	Heuristic > model-graded > deliberation > GEPA	Single generation	Single generation	Single generation	Single generation
CI/CD eval gate	`layerlens ci run` with threshold	Custom integration	Custom integration	`deepeval test`	Manual integration
Evaluation Spaces	Collaborative eval environments	Hub (paid)	Not available	Not available	Not available
Dataset versioning	Pin evals to versions, diff between runs	Dataset management	Not built-in	Basic support	Dataset management
OpenTelemetry export	Native OTLP exporter	Not built-in	Native OTLP	Not built-in	Native (OpenInference)
Pricing model	Free public data; premium for org features	Per-trace pricing	Per-event pricing	Open source + cloud	Open source + cloud

Pricing

Free to start. PublicClient is free with an API key: query 200+ models and 100+ benchmarks, and run head-to-head comparisons. Advanced features (traces, custom judges, scorers, CI gates) require Stratix Premium. Sign up and purchase credits at app.layerlens.ai.

Installation

Note

layerlens is hosted on a private index during early access. Use the command below; a plain pip install "layerlens[cli]" will not work yet.

pip install --extra-index-url https://sdk.layerlens.ai/package "layerlens[cli]"

Quick Start

Note

Two clients, one SDK. Use PublicClient for models, benchmarks, and comparisons. Use Stratix for traces, custom judges, scorers, and CI gates. Both take the same API key.

1. Install

pip install --extra-index-url https://sdk.layerlens.ai/package "layerlens[cli]"

2. Set your API key

Get a key from app.layerlens.ai → Settings → API Keys.

export LAYERLENS_STRATIX_API_KEY="your-api-key"
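
If you want a script to fail early with a readable message when the key is missing, a small check along these lines may help. This is a sketch only: `require_api_key` is a hypothetical helper, not part of the SDK, and it simply reads the environment variable named above.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key from LAYERLENS_STRATIX_API_KEY or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("LAYERLENS_STRATIX_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "LAYERLENS_STRATIX_API_KEY is not set; export it before creating a client."
        )
    return key


# Example with an explicit mapping instead of the real environment.
print(require_api_key({"LAYERLENS_STRATIX_API_KEY": "your-api-key"}))
```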

3. Run your first comparison

from layerlens import PublicClient

pc = PublicClient()

# List available models
models = pc.models.get(page_size=10)
print(f"{models.total_count} models available")

# Compare two models head-to-head on a benchmark
comparison = pc.comparisons.compare_models(
    benchmark_id="aime2024",
    model_id_1="openai/gpt-4o",
    model_id_2="anthropic/claude-opus-4",
    outcome_filter="comparison_fails",  # prompts where model 2 fails
)

print(comparison)

That's it! You're comparing frontier models on real benchmark data. See full results in the dashboard →
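
To repeat the same comparison over several benchmarks, one option is a small loop around the call shown above. The sketch below assumes only the compare_models keyword arguments from the quick start. `compare_on_benchmarks` is a hypothetical helper, and the stand-in client exists purely so the example runs without network access; in real use you would pass a PublicClient instance.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def compare_on_benchmarks(client: Any, benchmark_ids: Iterable[str],
                          model_id_1: str, model_id_2: str) -> Dict[str, Any]:
    """Run the head-to-head comparison once per benchmark and collect the results."""
    results = {}
    for benchmark_id in benchmark_ids:
        results[benchmark_id] = client.comparisons.compare_models(
            benchmark_id=benchmark_id,
            model_id_1=model_id_1,
            model_id_2=model_id_2,
            outcome_filter="comparison_fails",
        )
    return results


def _fake_compare(**kwargs: str) -> str:
    # Stand-in for the real API call; returns a summary string instead of data.
    return f"{kwargs['model_id_1']} vs {kwargs['model_id_2']} on {kwargs['benchmark_id']}"


fake_client = SimpleNamespace(comparisons=SimpleNamespace(compare_models=_fake_compare))

report = compare_on_benchmarks(
    fake_client, ["aime2024"], "openai/gpt-4o", "anthropic/claude-opus-4"
)
print(report["aime2024"])
```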

Next steps

Run a custom evaluation ➡️ score your own model on any benchmark
Gate CI/CD on eval results ➡️ layerlens ci run --threshold 0.8 in your pipeline
Upload and evaluate agent traces ➡️ multi-step trace analysis

CLI

The SDK ships with a full CLI for managing evaluations from your terminal or CI pipeline:

# Set your API key
export LAYERLENS_STRATIX_API_KEY="your-api-key"

# List traces
layerlens trace list

# Run a judge evaluation
layerlens judge run --judge-id <id> --trace-id <id>

# Evaluate in CI mode (exits non-zero on failure)
layerlens ci run --judge-id <id> --trace-id <id> --threshold 0.8
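
If your pipeline is driven from Python rather than shell, the CI command above can be wrapped with the standard library. This is a sketch under assumptions: `build_ci_command` and `run_gate` are hypothetical helper names, and the only flags used are the ones shown in the CLI example.

```python
import subprocess
from typing import List


def build_ci_command(judge_id: str, trace_id: str, threshold: float = 0.8) -> List[str]:
    """Assemble the `layerlens ci run` invocation shown above as an argv list."""
    return [
        "layerlens", "ci", "run",
        "--judge-id", judge_id,
        "--trace-id", trace_id,
        "--threshold", str(threshold),
    ]


def run_gate(judge_id: str, trace_id: str, threshold: float = 0.8) -> int:
    """Run the gate and return its exit code (non-zero means the gate failed)."""
    completed = subprocess.run(build_ci_command(judge_id, trace_id, threshold))
    return completed.returncode


# Print the command without executing it.
print(" ".join(build_ci_command("<id>", "<id>")))
```

Passing an argv list instead of a shell string avoids quoting problems with IDs, and returning the exit code lets the caller decide whether to stop the pipeline.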

Architecture

layerlens/
  _client.py          # Stratix (premium) client
  _public_client.py   # PublicClient (open data)
  cli/                # Click-based CLI with rich output
    commands/         # trace, judge, evaluate, scorer, space, bulk, ci
  models/             # Pydantic response models
  resources/          # API resource implementations
  contrib/
    rich_output.py    # Rich terminal tables & progress bars
    otel.py           # OpenTelemetry integration
    tracing.py        # @stratix.trace decorator
    datasets.py       # Dataset versioning & diffs
    error_suggestions.py  # Context-aware error messages
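
For readers new to trace decorators, the general pattern behind something like @stratix.trace can be sketched in a few lines. This is a generic illustration, not the SDK's implementation: the `trace` decorator, `TRACE_LOG`, and the two example steps are all hypothetical, and steps are kept in memory rather than uploaded.

```python
import functools
import time
from typing import Any, Callable, Dict, List

TRACE_LOG: List[Dict[str, Any]] = []  # in-memory stand-in for an uploaded trace


def trace(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record the name, duration, and output of each successful call as one step."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        output = func(*args, **kwargs)
        TRACE_LOG.append({
            "step": func.__name__,
            "duration_s": time.perf_counter() - start,
            "output": output,
        })
        return output
    return wrapper


@trace
def retrieve(query: str) -> str:
    return f"docs for {query}"


@trace
def answer(context: str) -> str:
    return f"answer based on {context}"


answer(retrieve("refund policy"))
print([step["step"] for step in TRACE_LOG])  # ['retrieve', 'answer']
```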

Samples

The samples/ directory contains 70+ production-ready samples organized by use case. See samples/README.md for the full index.

Category	Description
Core samples	Quickstart, traces, evaluations, judges, async workflows
Industry solutions	Healthcare, financial, legal, government, retail, insurance
CI/CD integration	Quality gates, pre-commit hooks, GitHub Actions workflow
Multi-agent (Cowork)	Generator-Evaluator, Code Review, RAG, Incident Response patterns
Content-type evaluations	Text, brand, and document quality scoring
LLM provider integrations	OpenAI, Anthropic, LangChain tracing and instrumentation
MCP server	Expose LayerLens as tools for Claude, Cursor, and any MCP-compatible assistant
CopilotKit CoAgents	Full-stack LangGraph + generative UI components
Claude Code skills	Slash commands for managing LayerLens from the Claude Code CLI
OpenClaw agent evaluation	Trace, evaluate, and monitor OpenClaw autonomous agents
Sample data	Pre-built traces, test datasets, and industry evaluation data

Used By

Stratix powers evaluation workflows at LayerLens and across teams building production AI systems. The public benchmark data is queried thousands of times per week via the SDK and stratix.layerlens.ai.

If your team uses Stratix, open a PR to add your logo here.

Join the Community

The LayerLens Discord is the best place to:

Get help with the SDK and trace evaluations
Share your custom judges and agent workflows
Access free Stratix Premium Credits for active contributors
Join weekly Eval Office Hours & model comparison discussions
Influence the roadmap

Join the LayerLens Discord!

Documentation

Full documentation is available at layerlens.gitbook.io/stratix-python-sdk.

To build docs locally:

pip install --extra-index-url https://sdk.layerlens.ai/package "layerlens[docs]"
mkdocs serve

Contributing

Contributions are welcome. See CONTRIBUTING.md for guidelines.

Security

To report a vulnerability, see SECURITY.md.

License

Apache 2.0. See LICENSE.

Next Steps

Get started in under 2 minutes:

pip install --extra-index-url https://sdk.layerlens.ai/package "layerlens[cli]"
export LAYERLENS_STRATIX_API_KEY="your-api-key"
python3 -c "from layerlens import PublicClient; pc = PublicClient(); print(pc.models.get(page_size=5))"

Then explore the Quick Start guide, try a cookbook recipe, or join the Discord to ask questions and share what you're building.

⭐ Star us if you found this useful! ⭐
It helps more developers discover Stratix.

Built by LayerLens · Discord · Twitter
