Stratix Python SDK

Ship AI that actually works. Evaluate 200+ models across 100+ benchmarks, trace agent behavior, build custom judges, and gate CI/CD on eval results.

Install · Quick Start · Compare · Docs · Examples · Discord


Why Stratix?

Stratix gives you production-grade evaluation infrastructure out of the box: public benchmarks, custom judges, agent trace analysis and playback, bulk evaluation, and CI/CD gates.

What makes it click:

  • 200+ models and 100+ benchmarks, ready to query. No scraping leaderboards, no CSV wrangling. Call pc.models.get() on a PublicClient and you're looking at real evaluation data.
  • Prompt-level comparisons. Not just "Model A scores 82%." You get the exact prompts where Model A passes and Model B fails, with outcome filters to find the interesting divergences.
  • A 4-generation eval ladder. Start with heuristic checks, graduate to model-graded scoring, add deliberation panels, then build auto-optimized GEPA judges. One SDK covers the full spectrum.
  • Agent trace evaluation. Upload a multi-step agent trace, replay it, and judge every step. Built for the world where agents do real work.
  • CI/CD eval gates. Run layerlens ci run --threshold 0.8 in your pipeline; it exits non-zero on regression, so no custom scripts are needed.
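
To make the prompt-level comparison idea concrete, here is a minimal sketch in plain Python. It does not use the SDK: `PromptResult` and `divergences` are hypothetical names introduced for illustration, and the per-prompt pass/fail data is made up. It shows the kind of filtering an outcome filter performs, namely picking the prompts where Model A passes and Model B fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptResult:
    """Hypothetical per-prompt outcome; not an SDK type."""
    prompt_id: str
    passed: bool


def divergences(results_a, results_b):
    """Return prompt IDs where model A passes and model B fails."""
    b_by_id = {r.prompt_id: r.passed for r in results_b}
    # `is False` skips prompts that model B never ran.
    return [
        r.prompt_id
        for r in results_a
        if r.passed and b_by_id.get(r.prompt_id) is False
    ]


# Illustrative data only.
model_a = [PromptResult("p1", True), PromptResult("p2", True), PromptResult("p3", False)]
model_b = [PromptResult("p1", True), PromptResult("p2", False), PromptResult("p3", False)]

print(divergences(model_a, model_b))  # ['p2']
```

Only `p2` is reported: both models agree on `p1` and `p3`, so those prompts carry no signal about the difference between the two.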

How Stratix Compares

| Capability | Stratix | LangSmith | Langfuse | DeepEval | Phoenix (Arize) |
|---|---|---|---|---|---|
| Pre-built benchmarks | 100+ benchmarks, 200+ models | No public benchmarks | No public benchmarks | ~14 metrics | Bring your own |
| Prompt-level comparison | Native head-to-head with outcome filters | Side-by-side runs (manual) | Not built-in | Manual setup | Not built-in |
| Custom judge builder | Auto-optimized GEPA judges with budget control | LLM-as-judge (manual) | LLM-as-judge (manual) | Basic LLM judges | LLM-as-judge templates |
| Agent trace evaluation | Upload, replay, judge every step | Trace logging + annotation | Trace logging + scoring | Trace logging only | Trace visualization |
| Eval generation ladder | Heuristic > model-graded > deliberation > GEPA | Single generation | Single generation | Single generation | Single generation |
| CI/CD eval gate | layerlens ci run with threshold | Custom integration | Custom integration | deepeval test | Manual integration |
| Evaluation Spaces | Collaborative eval environments | Hub (paid) | Not available | Not available | Not available |
| Dataset versioning | Pin evals to versions, diff between runs | Dataset management | Not built-in | Basic support | Dataset management |
| OpenTelemetry export | Native OTLP exporter | Not built-in | Native OTLP | Not built-in | Native (OpenInference) |
| Pricing model | Free public data; premium for org features | Per-trace pricing | Per-event pricing | Open source + cloud | Open source + cloud |

Installation

# Recommended (includes CLI, rich output, and examples)
# The quotes stop shells such as zsh from treating [cli] as a glob pattern
pip install "layerlens[cli]"

Note: During early access the package is hosted on a private index. Use:

pip install --extra-index-url https://sdk.layerlens.ai/package "layerlens[cli]"

Quick Start

Easiest way — use the one-command template:

stratix init my-first-eval
cd my-first-eval
python main.py

Or wire it up yourself in Python:

from layerlens import PublicClient, Stratix

# Public data (models, benchmarks, evaluations)
pc = PublicClient(api_key="your-api-key")

models = pc.models.get(page_size=200)
print(f"{models.total_count} models available")

# Compare two models head-to-head at prompt level
comparison = pc.comparisons.compare_models(
    benchmark_id="benchmark-id",
    model_id_1="model-a",
    model_id_2="model-b",
    outcome_filter="comparison_fails",  # where model B fails
)

# Premium features (traces, judges, scorers)
client = Stratix(api_key="your-api-key")

# Upload and evaluate an agent trace
client.traces.upload("trace.json")
eval_result = client.trace_evaluations.create(
    trace_id="trace-id",
    judge_id="judge-id",
)
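
The snippet above hardcodes the API key for brevity. A minimal sketch of reading it from the environment instead, assuming you reuse the `LAYERLENS_STRATIX_API_KEY` variable that the CLI section uses. `load_api_key` is a hypothetical helper, not part of the SDK, and whether the clients pick up this variable on their own is not stated here, so the key is still passed explicitly.

```python
import os

# Variable name taken from the CLI section of this README.
ENV_VAR = "LAYERLENS_STRATIX_API_KEY"


def load_api_key(env=None):
    """Return the API key from the environment, failing loudly if it is unset.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} before creating a client")
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(load_api_key({ENV_VAR: "  demo-key  "}))  # demo-key
```

With this helper, the client construction becomes `PublicClient(api_key=load_api_key())`, which keeps secrets out of source control.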

CLI

The SDK ships with a full CLI for managing evaluations from your terminal or CI pipeline:

# Set your API key
export LAYERLENS_STRATIX_API_KEY="your-api-key"

# List traces
layerlens trace list

# Run a judge evaluation
layerlens judge run --judge-id <id> --trace-id <id>

# Evaluate in CI mode (exits non-zero on failure)
layerlens ci run --judge-id <id> --trace-id <id> --threshold 0.8
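
For intuition, the gating behavior can be sketched in a few lines of Python. This is not the CLI's implementation: `gate_exit_code` is a hypothetical function, and both the mean aggregation and the `>=` comparison are assumptions made for illustration. It only shows how a score threshold maps to a process exit code that a CI system can act on.

```python
def gate_exit_code(scores, threshold=0.8):
    """Return 0 when the mean score meets the threshold, 1 otherwise.

    Assumptions (not confirmed by the SDK): scores are averaged, and a
    score exactly equal to the threshold passes.
    """
    if not scores:
        return 1  # treat "no results" as a failure rather than a silent pass
    mean_score = sum(scores) / len(scores)
    return 0 if mean_score >= threshold else 1


print(gate_exit_code([0.9, 0.85, 0.8]))  # 0
print(gate_exit_code([0.9, 0.5]))        # 1
```

A CI job would pass the returned value to `sys.exit`, so a non-zero result fails the pipeline step.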

Architecture

layerlens/
  _client.py          # Stratix (premium) client
  _public_client.py   # PublicClient (open data)
  cli/                # Click-based CLI with rich output
    commands/         # trace, judge, evaluate, scorer, space, bulk, ci
  models/             # Pydantic response models
  resources/          # API resource implementations
  contrib/
    rich_output.py    # Rich terminal tables & progress bars
    otel.py           # OpenTelemetry integration
    tracing.py        # @stratix.trace decorator
    datasets.py       # Dataset versioning & diffs
    error_suggestions.py  # Context-aware error messages
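
The layout above mentions a tracing decorator in contrib/tracing.py. The sketch below is a generic illustration of how a tracing decorator works and is not the SDK's code: the `trace` function, the `SPANS` list, and the span fields are all hypothetical. It records one span per call with the function name, status, and duration.

```python
import functools
import time

SPANS = []  # hypothetical in-memory sink; a real tracer would export spans


def trace(func):
    """Record a span (name, status, duration) for every call to `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "error"  # overwritten only if the call returns normally
        try:
            result = func(*args, **kwargs)
            status = "ok"
            return result
        finally:
            SPANS.append({
                "name": func.__name__,
                "status": status,
                "duration_s": time.perf_counter() - start,
            })
    return wrapper


@trace
def lookup(query):
    # Stand-in for an agent step such as a tool call.
    return query.upper()


lookup("weather")
print(SPANS[-1]["name"], SPANS[-1]["status"])  # lookup ok
```

Because the span is appended in a `finally` block, a step that raises is still recorded, with status `error`.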

Examples

See the examples/ directory for integration patterns:

| Example | Description |
|---|---|
| LangGraph | Trace and evaluate a LangGraph agent |
| CrewAI | Evaluate CrewAI multi-agent workflows |
| AutoGen | Instrument AutoGen conversations |
| CI/CD Gate | Block deploys on eval regression |
| Custom Judge | Build and optimize a domain judge |
| Prompt Playground | Compare prompt variations side-by-side |

Used By

Stratix powers evaluation workflows at LayerLens and across teams building production AI systems. The public benchmark data is queried thousands of times per week via the SDK and stratix.layerlens.ai.

If your team uses Stratix, open a PR to add your logo here.

Join the Community

The LayerLens Discord is the best place to:

  • Get help with the SDK and trace evaluations
  • Share your custom judges and agent workflows
  • Access free Stratix Premium Credits for active contributors
  • Join weekly Eval Office Hours & model comparison discussions
  • Influence the roadmap

Join the LayerLens Discord!

Documentation

Full documentation is available at layerlens.gitbook.io/stratix-python-sdk.

To build docs locally:

pip install "layerlens[docs]"
mkdocs serve

Contributing

Contributions are welcome. See CONTRIBUTING.md for guidelines.

Security

To report a vulnerability, see SECURITY.md.

License

Apache 2.0. See LICENSE.

Next Steps

Get started in under 2 minutes:

pip install --extra-index-url https://sdk.layerlens.ai/package "layerlens[cli]"
stratix init my-first-eval
cd my-first-eval && python main.py

Then explore the Quick Start guide, try a cookbook recipe, or join the Discord to ask questions and share what you're building.


Star us if you found this useful!
It helps more developers discover Stratix.

Built by LayerLens · Discord · Twitter
