diff --git a/README.md b/README.md
index a7c06f9..8bbd481 100644
--- a/README.md
+++ b/README.md
@@ -1,20 +1,21 @@
-
+
Stratix Python SDK
- Ship AI that actually works. Evaluate 200+ models across 100+ benchmarks, trace agent behavior, build custom judges, and gate CI/CD on eval results.
+ Ship AI that actually works.
+
+ Evaluate 200+ models across 100+ benchmarks, trace agent behavior, build custom judges, and gate CI/CD on eval results.
-
@@ -31,6 +32,9 @@
---
+
+
+
## Why Stratix?
@@ -48,8 +52,8 @@ Stratix is built differently. It gives you production-grade evaluation infrastru
| Capability | **Stratix** | LangSmith | Langfuse | DeepEval | Phoenix (Arize) |
| ----------------------- | ---------------------------------------------- | -------------------------- | ----------------------- | ------------------- | ---------------------- |
-| Pre-built benchmarks | 100+ benchmarks, 200+ models | No public benchmarks | No public benchmarks | ~14 metrics | Bring your own |
-| Prompt-level comparison | Native head-to-head with outcome filters | Side-by-side runs (manual) | Not built-in | Manual setup | Not built-in |
+| Pre-built benchmarks | 100+ benchmarks, 200+ models | No public benchmarks | No public benchmarks | 50+ metrics | Bring your own |
+| Prompt-level comparison | Native head-to-head with outcome filters | Side-by-side runs (manual) | Side-by-side runs + Playground/Experiments (UI-supported) | Manual setup | Not built-in |
| Custom judge builder | Auto-optimized GEPA judges with budget control | LLM-as-judge (manual) | LLM-as-judge (manual) | Basic LLM judges | LLM-as-judge templates |
| Agent trace evaluation | Upload, replay, judge every step | Trace logging + annotation | Trace logging + scoring | Trace logging only | Trace visualization |
| Eval generation ladder | Heuristic > model-graded > deliberation > GEPA | Single generation | Single generation | Single generation | Single generation |
@@ -59,59 +63,68 @@ Stratix is built differently. It gives you production-grade evaluation infrastru
| OpenTelemetry export | Native OTLP exporter | Not built-in | Native OTLP | Not built-in | Native (OpenInference) |
| Pricing model | Free public data; premium for org features | Per-trace pricing | Per-event pricing | Open source + cloud | Open source + cloud |
+## Pricing
+
+**Free to start.** `PublicClient` is free with an API key: query 200+ models and 100+ benchmarks, and run head-to-head comparisons. Advanced features (traces, custom judges, scorers, CI gates) require **Stratix Premium**. Sign up and purchase credits at [app.layerlens.ai](https://app.layerlens.ai).
+
## Installation
+> [!NOTE]
+> `layerlens` is hosted on a private index during early access. Use the command below; a plain `pip install layerlens[cli]` will not work yet.
+
```bash
-# Recommended (includes CLI, rich output, and examples)
-pip install layerlens[cli]
+pip install --extra-index-url https://sdk.layerlens.ai/package "layerlens[cli]"
```
-> **Note:** During early access the package is hosted on a private index. Use:
->
-> ```bash
-> pip install --extra-index-url https://sdk.layerlens.ai/package layerlens[cli]
-> ```
-
## Quick Start
-**Easiest way** — use the one-command template:
+> [!NOTE]
+> **Two clients, one SDK.** Use `PublicClient` for models, benchmarks, and comparisons. Use `Stratix` for traces, custom judges, scorers, and CI gates. Both take the same API key.
+
+### 1. Install
+
+```bash
+pip install --extra-index-url https://sdk.layerlens.ai/package "layerlens[cli]"
+```
+
+### 2. Set your API key
+
+Get a key from [app.layerlens.ai](https://app.layerlens.ai) → Settings → API Keys.
```bash
-stratix init my-first-eval
-cd my-first-eval
-python main.py
+export LAYERLENS_STRATIX_API_KEY="your-api-key"
```
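
A missing key is the most common first-run failure, so it can help to check for it before constructing a client. The sketch below is not part of the SDK: `require_api_key` is a hypothetical helper, and it assumes only that the client reads the `LAYERLENS_STRATIX_API_KEY` variable exported above.

```python
import os

# Variable name taken from the export step above.
ENV_VAR = "LAYERLENS_STRATIX_API_KEY"


def require_api_key(env=None):
    """Return the API key from the environment, or raise with a readable hint.

    Hypothetical helper for illustration; it is not provided by layerlens.
    """
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before creating a client")
    return key
```

Calling `require_api_key()` at the top of a script fails fast with a clear message instead of an authentication error later on.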
-Or wire it up yourself in Python:
+### 3. Run your first comparison
```python
-from layerlens import PublicClient, Stratix
+from layerlens import PublicClient
-# Public data (models, benchmarks, evaluations)
-pc = PublicClient(api_key="your-api-key")
+pc = PublicClient()
-models = pc.models.get(page_size=200)
+# List available models
+models = pc.models.get(page_size=10)
print(f"{models.total_count} models available")
-# Compare two models head-to-head at prompt level
+# Compare two models head-to-head on a benchmark
comparison = pc.comparisons.compare_models(
- benchmark_id="benchmark-id",
- model_id_1="model-a",
- model_id_2="model-b",
- outcome_filter="comparison_fails", # where model B fails
+ benchmark_id="aime2024",
+ model_id_1="openai/gpt-4o",
+ model_id_2="anthropic/claude-opus-4",
+ outcome_filter="comparison_fails", # prompts where model 2 fails
)
-# Premium features (traces, judges, scorers)
-client = Stratix(api_key="your-api-key")
-
-# Upload and evaluate an agent trace
-client.traces.upload("trace.json")
-eval_result = client.trace_evaluations.create(
- trace_id="trace-id",
- judge_id="judge-id",
-)
+print(comparison)
```
+That's it! You're comparing frontier models on real benchmark data. **[See full results in the dashboard →](https://stratix.layerlens.ai)**
+
+### Next steps
+
+- **[Run a custom evaluation](./samples)** ➡️ score your own model on any benchmark
+- **[Gate CI/CD on eval results](./samples/cicd)** ➡️ `layerlens ci run --threshold 0.8` in your pipeline
+- **[Upload and evaluate agent traces](./samples)** ➡️ multi-step trace analysis
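
The idea behind a CI gate can be sketched in a few lines: compare an evaluation score to a threshold and turn the result into a process exit code. This is only an illustration of the concept, not how `layerlens ci run` is implemented; `gate` is a hypothetical function, and the default of `0.8` simply mirrors the threshold in the command above.

```python
def gate(score: float, threshold: float = 0.8) -> int:
    """Map an eval score to an exit code: 0 passes the build, 1 fails it.

    Hypothetical helper for illustration; not part of the layerlens SDK or CLI.
    """
    return 0 if score >= threshold else 1


# In a pipeline script, the exit code would typically be handed to sys.exit,
# e.g. sys.exit(gate(score)), so that a low score blocks the deploy.
```

A score exactly at the threshold passes in this sketch; whether a real gate treats the boundary as pass or fail is a policy choice.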
+
## CLI
The SDK ships with a full CLI for managing evaluations from your terminal or CI pipeline:
@@ -148,18 +161,23 @@ layerlens/
error_suggestions.py # Context-aware error messages
```
-## Examples
+## Samples
-See the [`examples/`](./examples) directory for integration patterns:
+The [`samples/`](./samples) directory contains 70+ production-ready samples organized by use case. See [`samples/README.md`](./samples/README.md) for the full index.
-| Example | Description |
-| --------------------------------------------------------- | -------------------------------------- |
-| [LangGraph](./examples/integrations/langgraph_example.py) | Trace and evaluate a LangGraph agent |
-| [CrewAI](./examples/integrations/crewai_example.py) | Evaluate CrewAI multi-agent workflows |
-| [AutoGen](./examples/integrations/autogen_example.py) | Instrument AutoGen conversations |
-| [CI/CD Gate](./examples/cookbook/ci_eval_gate.py) | Block deploys on eval regression |
-| [Custom Judge](./examples/cookbook/custom_judge.py) | Build and optimize a domain judge |
-| [Prompt Playground](./examples/playground/) | Compare prompt variations side-by-side |
+| Category | Description |
+|---|---|
+| [Core samples](./samples) | Quickstart, traces, evaluations, judges, async workflows |
+| [Industry solutions](./samples/industry) | Healthcare, financial, legal, government, retail, insurance |
+| [CI/CD integration](./samples/cicd) | Quality gates, pre-commit hooks, GitHub Actions workflow |
+| [Multi-agent (Cowork)](./samples/cowork) | Generator-Evaluator, Code Review, RAG, Incident Response patterns |
+| [Content-type evaluations](./samples/modalities) | Text, brand, and document quality scoring |
+| [LLM provider integrations](./samples/integrations) | OpenAI, Anthropic, LangChain tracing and instrumentation |
+| [MCP server](./samples/mcp) | Expose LayerLens as tools for Claude, Cursor, and any MCP-compatible assistant |
+| [CopilotKit CoAgents](./samples/copilotkit) | Full-stack LangGraph + generative UI components |
+| [Claude Code skills](./samples/claude-code) | Slash commands for managing LayerLens from the Claude Code CLI |
+| [OpenClaw agent evaluation](./samples/openclaw) | Trace, evaluate, and monitor OpenClaw autonomous agents |
+| [Sample data](./samples/data) | Pre-built traces, test datasets, and industry evaluation data |
## Used By
@@ -208,12 +226,12 @@ Apache 2.0. See [LICENSE](./LICENSE).
**Get started in under 2 minutes:**
```bash
-pip install --extra-index-url https://sdk.layerlens.ai/package layerlens[cli]
-stratix init my-first-eval
-cd my-first-eval && python main.py
+pip install --extra-index-url https://sdk.layerlens.ai/package "layerlens[cli]"
+export LAYERLENS_STRATIX_API_KEY="your-api-key"
+python3 -c "from layerlens import PublicClient; pc = PublicClient(); print(pc.models.get(page_size=5))"
```
-Then explore the [Quick Start guide](https://layerlens.gitbook.io/stratix-python-sdk), try a [cookbook recipe](./examples/cookbook/), or [join the Discord](https://discord.gg/layerlens) to ask questions and share what you're building.
+Then explore the [Quick Start guide](https://layerlens.gitbook.io/stratix-python-sdk), try a [cookbook recipe](https://github.com/LayerLens/stratix-python/tree/main/samples), or [join the Discord](https://discord.gg/layerlens) to ask questions and share what you're building.
---
diff --git a/demo-stratix.gif b/demo-stratix.gif
new file mode 100644
index 0000000..07db7ee
Binary files /dev/null and b/demo-stratix.gif differ