Stratix Python SDK

Ship AI that actually works. Evaluate 200+ models across 100+ benchmarks, trace agent behavior, build custom judges, and gate CI/CD on eval results.

Install · Quick Start · Compare · Docs · Examples · Discord

Why Stratix?

Stratix gives you production-grade evaluation infrastructure out of the box: public benchmarks, custom judges, agent trace analysis and replay, bulk evaluation, and CI/CD gates.

What makes it click:

200+ models and 100+ benchmarks, ready to query. No scraping leaderboards, no CSV wrangling. Call pc.models.get() and you're looking at real evaluation data.
Prompt-level comparisons. Not just "Model A scores 82%." You get the exact prompts where Model A passes and Model B fails, with outcome filters to find the interesting divergences.
A 4-generation eval ladder. Start with heuristic checks, graduate to model-graded scoring, add deliberation panels, then build auto-optimized GEPA judges. One SDK covers the full spectrum.
Agent trace evaluation. Upload a multi-step agent trace, replay it, and judge every step. Built for the world where agents do real work.
CI/CD eval gates. Run layerlens ci run --threshold 0.8 in your pipeline and it exits non-zero on regression. No custom scripts needed.
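
To make the prompt-level comparison idea concrete, here is a minimal sketch in plain Python that does not call the SDK. The PromptResult shape and the divergences helper are hypothetical names invented for illustration; the real response models may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptResult:
    """One model's outcome on one benchmark prompt (hypothetical shape)."""
    prompt_id: str
    passed: bool


def divergences(results_a, results_b):
    """Return prompt IDs where model A passes and model B fails."""
    passed_b = {r.prompt_id: r.passed for r in results_b}
    return [
        r.prompt_id
        for r in results_a
        # Only compare prompts that both models were evaluated on.
        if r.passed and r.prompt_id in passed_b and not passed_b[r.prompt_id]
    ]


model_a = [PromptResult("p1", True), PromptResult("p2", True), PromptResult("p3", False)]
model_b = [PromptResult("p1", True), PromptResult("p2", False), PromptResult("p3", False)]

print(divergences(model_a, model_b))  # ['p2']
```

An aggregate score would report both models as close; the list of diverging prompts is what tells you where to look.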

How Stratix Compares

Capability	Stratix	LangSmith	Langfuse	DeepEval	Phoenix (Arize)
Pre-built benchmarks	100+ benchmarks, 200+ models	No public benchmarks	No public benchmarks	~14 metrics	Bring your own
Prompt-level comparison	Native head-to-head with outcome filters	Side-by-side runs (manual)	Not built-in	Manual setup	Not built-in
Custom judge builder	Auto-optimized GEPA judges with budget control	LLM-as-judge (manual)	LLM-as-judge (manual)	Basic LLM judges	LLM-as-judge templates
Agent trace evaluation	Upload, replay, judge every step	Trace logging + annotation	Trace logging + scoring	Trace logging only	Trace visualization
Eval generation ladder	Heuristic > model-graded > deliberation > GEPA	Single generation	Single generation	Single generation	Single generation
CI/CD eval gate	`layerlens ci run` with threshold	Custom integration	Custom integration	`deepeval test`	Manual integration
Evaluation Spaces	Collaborative eval environments	Hub (paid)	Not available	Not available	Not available
Dataset versioning	Pin evals to versions, diff between runs	Dataset management	Not built-in	Basic support	Dataset management
OpenTelemetry export	Native OTLP exporter	Not built-in	Native OTLP	Not built-in	Native (OpenInference)
Pricing model	Free public data; premium for org features	Per-trace pricing	Per-event pricing	Open source + cloud	Open source + cloud

Installation

# Recommended (includes CLI, rich output, and examples)
pip install "layerlens[cli]"  # quotes keep shells such as zsh from expanding the brackets

Note: During early access the package is hosted on a private index. Use:
pip install --extra-index-url https://sdk.layerlens.ai/package "layerlens[cli]"

Quick Start

The easiest way is the one-command template:

stratix init my-first-eval
cd my-first-eval
python main.py

Or wire it up yourself in Python:

from layerlens import PublicClient, Stratix

# Public data (models, benchmarks, evaluations)
pc = PublicClient(api_key="your-api-key")

models = pc.models.get(page_size=200)
print(f"{models.total_count} models available")

# Compare two models head-to-head at prompt level
comparison = pc.comparisons.compare_models(
    benchmark_id="benchmark-id",
    model_id_1="model-a",
    model_id_2="model-b",
    outcome_filter="comparison_fails",  # where model B fails
)

# Premium features (traces, judges, scorers)
client = Stratix(api_key="your-api-key")

# Upload and evaluate an agent trace
client.traces.upload("trace.json")
eval_result = client.trace_evaluations.create(
    trace_id="trace-id",
    judge_id="judge-id",
)
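
The snippet above hard-codes "your-api-key" for brevity. A minimal sketch of reading the key from the environment instead is shown here. It reuses the LAYERLENS_STRATIX_API_KEY variable name from the CLI section; load_api_key is a hypothetical helper, not part of the SDK, and whether the clients pick up this variable on their own is not assumed.

```python
import os

ENV_VAR = "LAYERLENS_STRATIX_API_KEY"  # variable name used by the CLI section


def load_api_key(env=None):
    """Return the API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} before creating a client")
    return key


# Intended use (not executed here; requires the SDK and a real key):
# client = Stratix(api_key=load_api_key())
```

Failing early with a named variable tends to be easier to debug in CI than an authentication error from the first API call.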

CLI

The SDK ships with a full CLI for managing evaluations from your terminal or CI pipeline:

# Set your API key
export LAYERLENS_STRATIX_API_KEY="your-api-key"

# List traces
layerlens trace list

# Run a judge evaluation
layerlens judge run --judge-id <id> --trace-id <id>

# Evaluate in CI mode (exits non-zero on failure)
layerlens ci run --judge-id <id> --trace-id <id> --threshold 0.8
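
For intuition, the gating logic behind a threshold check can be sketched in a few lines of plain Python. This is an illustration only: the gate function is hypothetical, and averaging the scores is an assumption, since how layerlens ci run aggregates results is not described here.

```python
def gate(scores, threshold=0.8):
    """Return a process exit code: 0 if the mean score meets the threshold, else 1."""
    if not scores:
        return 1  # treat "no results" as a failure rather than a silent pass
    mean = sum(scores) / len(scores)
    return 0 if mean >= threshold else 1


# Hypothetical scores; in practice these would come from evaluation results.
exit_code = gate([0.9, 0.85, 0.7], threshold=0.8)
print(exit_code)
# In a pipeline script, pass this value to sys.exit() so the job fails on 1.
```

Returning the exit code instead of exiting inside the function keeps the check easy to unit test.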

Architecture

layerlens/
  _client.py          # Stratix (premium) client
  _public_client.py   # PublicClient (open data)
  cli/                # Click-based CLI with rich output
    commands/         # trace, judge, evaluate, scorer, space, bulk, ci
  models/             # Pydantic response models
  resources/          # API resource implementations
  contrib/
    rich_output.py    # Rich terminal tables & progress bars
    otel.py           # OpenTelemetry integration
    tracing.py        # @stratix.trace decorator
    datasets.py       # Dataset versioning & diffs
    error_suggestions.py  # Context-aware error messages
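
The tree lists a tracing decorator in contrib/tracing.py. As a rough mental model of what a decorator of this kind does, here is a self-contained sketch. It is not the SDK's implementation: trace and TRACE_LOG are hypothetical names, and the log is an in-memory list rather than a real exporter.

```python
import functools
import time

TRACE_LOG = []  # in-memory stand-in for a trace exporter (hypothetical)


def trace(func):
    """Record the name, duration, and outcome of each call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "error"  # overwritten only if the call returns normally
        try:
            result = func(*args, **kwargs)
            status = "ok"
            return result
        finally:
            TRACE_LOG.append({
                "step": func.__name__,
                "status": status,
                "seconds": time.perf_counter() - start,
            })
    return wrapper


@trace
def lookup(query):
    return query.upper()


lookup("hello")
print(TRACE_LOG[0]["step"], TRACE_LOG[0]["status"])  # lookup ok
```

Recording in a finally block means a step that raises still appears in the trace, marked as an error.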

Examples

See the examples/ directory for integration patterns:

Example	Description
LangGraph	Trace and evaluate a LangGraph agent
CrewAI	Evaluate CrewAI multi-agent workflows
AutoGen	Instrument AutoGen conversations
CI/CD Gate	Block deploys on eval regression
Custom Judge	Build and optimize a domain judge
Prompt Playground	Compare prompt variations side-by-side

Used By

Stratix powers evaluation workflows at LayerLens and across teams building production AI systems. The public benchmark data is queried thousands of times per week via the SDK and stratix.layerlens.ai.

If your team uses Stratix, open a PR to add your logo here.

Join the Community

The LayerLens Discord is the best place to:

Get help with the SDK and trace evaluations
Share your custom judges and agent workflows
Access free Stratix Premium Credits for active contributors
Join weekly Eval Office Hours & model comparison discussions
Influence the roadmap

Join the LayerLens Discord!

Documentation

Full documentation is available at layerlens.gitbook.io/stratix-python-sdk.

To build docs locally:

pip install "layerlens[docs]"
mkdocs serve

Contributing

Contributions are welcome. See CONTRIBUTING.md for guidelines.

Security

To report a vulnerability, see SECURITY.md.

License

Apache 2.0. See LICENSE.

Next Steps

Get started in under 2 minutes:

pip install --extra-index-url https://sdk.layerlens.ai/package "layerlens[cli]"
stratix init my-first-eval
cd my-first-eval && python main.py

Then explore the Quick Start guide, try a cookbook recipe, or join the Discord to ask questions and share what you're building.

⭐ Star us if you found this useful! ⭐
It helps more developers discover Stratix.

Built by LayerLens · Discord · Twitter
