104 changes: 104 additions & 0 deletions .github/workflows/auto-release.yaml
@@ -0,0 +1,104 @@
name: Auto Release

on:
  push:
    branches:
      - release

jobs:
  auto-tag-and-publish:
    runs-on: ubuntu-latest
    environment: production

    permissions:
      contents: write
      actions: write

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Read version from _version.py
        id: get_version
        run: |
          chmod +x ./scripts/get_version.sh
          VERSION=$(./scripts/get_version.sh)
          if [ -z "$VERSION" ]; then
            echo "Error: Could not extract version"
            exit 1
          fi
          TAG="v${VERSION}"
          echo "version=${VERSION}" >> "$GITHUB_OUTPUT"
          echo "tag=${TAG}" >> "$GITHUB_OUTPUT"
          echo "Resolved tag: ${TAG}"

      - name: Check if tag already exists
        id: check_tag
        run: |
          TAG="${{ steps.get_version.outputs.tag }}"
          git fetch --tags --prune --force
          if git rev-parse --verify "refs/tags/${TAG}" >/dev/null 2>&1; then
            echo "Tag ${TAG} already exists on origin — nothing to do."
            echo "exists=true" >> "$GITHUB_OUTPUT"
          else
            echo "Tag ${TAG} does not exist — will create."
            echo "exists=false" >> "$GITHUB_OUTPUT"
          fi

      - name: Configure git
        if: steps.check_tag.outputs.exists == 'false'
        run: |
          git config --global user.name "github-actions[bot]"
          git config --global user.email "github-actions[bot]@users.noreply.github.com"

      - name: Create and push tag
        if: steps.check_tag.outputs.exists == 'false'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          TAG="${{ steps.get_version.outputs.tag }}"
          git tag "${TAG}" "${GITHUB_SHA}"
          git push origin "${TAG}"
          echo "Pushed tag ${TAG} at ${GITHUB_SHA}"

      - name: Dispatch publish-to-aws workflow
        if: steps.check_tag.outputs.exists == 'false'
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          TAG="${{ steps.get_version.outputs.tag }}"
          gh workflow run publish-to-aws.yaml \
            --ref "${TAG}" \
            -f release_tag="${TAG}"
          echo "Dispatched publish-to-aws.yaml for ${TAG}"

      - name: Update latest version pointer in CHANGELOG.md on main
        if: steps.check_tag.outputs.exists == 'false'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          VERSION="${{ steps.get_version.outputs.version }}"
          TAG="${{ steps.get_version.outputs.tag }}"
          DATE=$(date -u +%Y-%m-%d)
          NEW_LINE="**Latest version:** [${VERSION}](https://github.com/LayerLens/stratix-python/releases/tag/${TAG}) — ${DATE}"

          git fetch origin main
          git checkout main
          git pull --ff-only origin main

          if ! grep -q '^\*\*Latest version:\*\*' CHANGELOG.md; then
            echo "Error: '**Latest version:**' line not found in CHANGELOG.md"
            exit 1
          fi

          sed -i "s|^\*\*Latest version:\*\*.*|${NEW_LINE}|" CHANGELOG.md

          if git diff --quiet CHANGELOG.md; then
            echo "CHANGELOG.md already up to date for ${TAG}"
            exit 0
          fi

          git add CHANGELOG.md
          git commit -m "chore: update latest version pointer to ${TAG}"
          git push origin main
184 changes: 184 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,184 @@
# Changelog

All notable changes to the Stratix Python SDK will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

**Latest version:** [1.7.0](https://github.com/LayerLens/stratix-python/releases/tag/v1.7.0) — 2026-05-20

## [Unreleased]

Things we're actively working on. Want to help? Check the [issues](https://github.com/LayerLens/stratix-python/issues) or [discussions](https://github.com/LayerLens/stratix-python/discussions).

### Added

### Changed

### Fixed

### Deprecated

### Removed

## [1.7.0] - 2026-05-20

### Added

- `extra_payload` parameter on `models.create_custom` and `models.update_custom` (sync + async). Optional JSON object merged into every outgoing chat-completions request body; customer values win on conflict with our hardcoded defaults. Lets customers add provider-specific fields (`top_p`, `max_completion_tokens`) or override values like `temperature` for providers that reject our defaults.

## [1.6.1] - 2026-05-15

### Added

- CLI authentication command (`layerlens auth`) (#72)
- `models.update_custom(model_id, *, api_url, api_key, max_tokens)` (sync + async) — repoint a custom model's mutable fields without recreating it (#169)
- `models.delete_custom(model_id)` (sync + async) — full teardown that disables the record, strips it from `Project.Models`, and releases the name for reuse (#169)
- 70+ production-ready SDK samples across 12 categories: core, industry, cowork, modalities, integrations, cicd, cli, openclaw, mcp, copilotkit, claude-code, data (#73)
- MCP server sample exposing LayerLens as tools
- CopilotKit sample with LangGraph CoAgents, React components, and hooks
- New trace samples (#144)

### Changed

- `models.add()` / `models.remove()` now operate on the full project model list (public + custom). The previous `type="public"` filter silently dropped custom-model IDs from `Project.Models` on every call (#169)
- Expanded SDK documentation and README (#139, #167)

### Fixed

- Trace evaluations bug (#74)
- CopilotKit evaluator graph now compiles with a checkpointer so `interrupt()` works over AG-UI. Includes a `RunIdPreservingAgent` workaround for the upstream `ag-ui-langgraph` runId-overwrite bug ([ag-ui-protocol/ag-ui#1582](https://github.com/ag-ui-protocol/ag-ui/issues/1582)) (#92)

## [1.6.0] - 2026-03-25

### Added

- Prompts exposed on the private client (#70)

## [1.5.0] - 2026-03-23

### Added

- Full-featured command-line interface via `layerlens` / `stratix`
- `client.scorers` resource with full CRUD: create, get, list, update, delete
- `client.evaluation_spaces` resource with get, list, create, update, delete
- `client.integrations` resource with get, list, create, update, delete, and test
- CLI getting started guide, command reference, and examples
- Scorers API reference documentation

### Changed

- Updated evaluations, models & benchmarks, and public client docs with new parameters

### Fixed

- `filter` by categories/languages/companies/regions/licenses now returns correct results

## [1.4.0] - 2026-03-17

### Added

- `unique` parameter on `evaluations.get_many()` and `public_evaluations.get_many()` that deduplicates results by model+dataset pair, keeping only the latest evaluation per pair

### Fixed

- Model comparison now passes `unique=True` when fetching evaluations, ensuring the correct (latest) evaluation is used for each model+benchmark pair instead of potentially picking up duplicates
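
The deduplication rule described for `unique` can be pictured with a small standalone sketch. This is not the SDK's or the API's implementation: the `EvaluationRecord` class, its field names, and the `latest_per_pair` helper are hypothetical and only illustrate "keep the latest evaluation per model+dataset pair".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class EvaluationRecord:
    """Hypothetical stand-in for an evaluation; field names are assumptions."""
    id: str
    model_id: str
    dataset_id: str
    submitted_at: datetime


def latest_per_pair(evaluations: Iterable[EvaluationRecord]) -> List[EvaluationRecord]:
    """Keep only the most recent evaluation for each (model, dataset) pair."""
    latest: Dict[Tuple[str, str], EvaluationRecord] = {}
    for ev in evaluations:
        key = (ev.model_id, ev.dataset_id)
        current = latest.get(key)
        # Replace the stored record only when this one is strictly newer.
        if current is None or ev.submitted_at > current.submitted_at:
            latest[key] = ev
    return list(latest.values())


records = [
    EvaluationRecord("e1", "m1", "d1", datetime(2026, 3, 1)),
    EvaluationRecord("e2", "m1", "d1", datetime(2026, 3, 10)),
    EvaluationRecord("e3", "m2", "d1", datetime(2026, 3, 5)),
]
deduplicated = latest_per_pair(records)
```

Here `e1` and `e2` share a pair, so only the newer `e2` survives alongside `e3`.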

## [1.3.3] - 2026-03-17

### Added

- Missing methods on `benchmarks` and `models` resources

### Fixed

- Inconsistent API naming across the SDK now follows a unified convention. Affected resources: comparisons, evaluations, judges, results, trace evaluations, traces, public benchmarks/evaluations/models (#61)
- `SUMMARY.md` structure and examples updated to match new naming

## [1.3.2] - 2026-03-13

### Added

- Documentation pages for GitBook: getting-started, troubleshooting, security

### Fixed

- `trace_evaluations.get_results()` no longer returns empty/None results. The API returns evaluation data (score, passed, reasoning, steps) directly, but the SDK was looking for a non-existent results array. `TraceEvaluationResultsResponse` now correctly maps to the API response shape and inherits from `TraceEvaluationResult`
- `TraceEvaluationStep` model now matches actual API fields (`tool`, `args`, `result`) instead of the incorrect (`step`, `reasoning`)

## [1.3.1] - 2026-03-13

### Added

- Automatic retry with exponential backoff for transient errors (HTTP 429, 500, 502, 503, 504) in both sync and async clients (up to 2 retries, respects `Retry-After` header, max 8s delay)
- Expanded documentation: updated README, examples for models/benchmarks, public API, and retrieving results
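
As a rough illustration of the retry policy listed above, the sketch below retries on the named status codes, allows up to 2 retries, prefers a `Retry-After` value when present, and caps every wait at 8 seconds. The helper names, the 0.5 s initial delay, the absence of jitter, and the choice to cap `Retry-After` at the same 8 s are assumptions; the SDK's actual internals are not shown in this changelog.

```python
import time
from typing import Callable, Optional, Tuple

RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})
MAX_RETRIES = 2
MAX_DELAY_SECONDS = 8.0
INITIAL_DELAY_SECONDS = 0.5  # assumption: the real starting delay is not documented here


def compute_delay(attempt: int, retry_after: Optional[float] = None) -> float:
    """Return the wait before the next retry (attempt is 0 for the first retry)."""
    if retry_after is not None and retry_after >= 0:
        # Honor the server's hint, but never wait longer than the cap.
        return min(retry_after, MAX_DELAY_SECONDS)
    return min(INITIAL_DELAY_SECONDS * (2 ** attempt), MAX_DELAY_SECONDS)


def call_with_retries(
    send: Callable[[], Tuple[int, Optional[float]]],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or retries run out.

    send() returns (status_code, retry_after_seconds_or_None).
    """
    attempt = 0
    while True:
        status, retry_after = send()
        if status not in RETRYABLE_STATUSES or attempt >= MAX_RETRIES:
            return status
        sleep(compute_delay(attempt, retry_after))
        attempt += 1
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting.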

## [1.3.0] - 2026-03-13

### Changed

- Expanded model and benchmark result models with additional fields

### Fixed

- CI/CD publish workflows

## [1.2.0] - 2026-03-13

### Added

- `Stratix` / `AsyncStratix` clients (rebrand from Atlas)
- Judges resource with full CRUD
- Trace upload (JSON/JSONL up to 50 MB via presigned S3) and `trace_evaluations` resource
- Judge optimizations resource for tuning judge configurations
- `PublicClient` — a dedicated client for public endpoints (models, benchmarks, evaluations, comparisons), also accessible via `client.public`
- `get_by_key`, `add`, `remove`, `create_custom`, `create_smart` methods on Model & Benchmark resources
- `comparisons` resource for comparing evaluation results
- Apache 2.0 license

### Changed

- Expanded benchmark and model models with additional fields

### Deprecated

- `Atlas` client name — use `Stratix` instead (legacy `Atlas` aliases kept for backward compatibility)

### Fixed

- Evaluation status enum values

## [1.0.2] - 2026-03-13

### Changed

- Updated publish-to-AWS packaging job

## [1.0.1] - 2026-03-13

### Fixed

- Version bump

## [1.0.0] - 2026-03-13

### Added

- Initial release of the LayerLens evaluation SDK
- Sync and async clients for the LayerLens evaluation API
- `evaluations`, `results`, `models`, and `benchmarks` resources
- Typed exception hierarchy for API errors

[Unreleased]: https://github.com/LayerLens/stratix-python/compare/v1.6.1...HEAD
[1.6.1]: https://github.com/LayerLens/stratix-python/compare/v1.6.0...v1.6.1
[1.6.0]: https://github.com/LayerLens/stratix-python/compare/v1.5.0...v1.6.0
[1.5.0]: https://github.com/LayerLens/stratix-python/compare/v1.4.0...v1.5.0
[1.4.0]: https://github.com/LayerLens/stratix-python/compare/v1.3.3...v1.4.0
[1.3.3]: https://github.com/LayerLens/stratix-python/compare/v1.3.2...v1.3.3
[1.3.2]: https://github.com/LayerLens/stratix-python/compare/v1.3.1...v1.3.2
[1.3.1]: https://github.com/LayerLens/stratix-python/compare/v1.3.0...v1.3.1
[1.3.0]: https://github.com/LayerLens/stratix-python/compare/v1.2.0...v1.3.0
[1.2.0]: https://github.com/LayerLens/stratix-python/compare/v1.0.2...v1.2.0
[1.0.2]: https://github.com/LayerLens/stratix-python/compare/v1.0.1...v1.0.2
[1.0.1]: https://github.com/LayerLens/stratix-python/compare/v1.0.0...v1.0.1
[1.0.0]: https://github.com/LayerLens/stratix-python/releases/tag/v1.0.0
5 changes: 4 additions & 1 deletion docs/README.md
@@ -176,9 +176,12 @@ response = client.models.create_custom(
     name="My Fine-tuned Model",
     key="my-org/custom-model-v1",
     description="Fine-tuned GPT for medical Q&A",
-    api_url="https://my-api.example.com/v1",
+    api_url="https://my-api.example.com/v1/chat/completions",
     max_tokens=4096,
     api_key=os.environ.get("MY_PROVIDER_API_KEY"),  # optional
+    # Optional — merged into every request body. Useful for provider-specific
+    # fields or for overriding our defaults (e.g. {"temperature": 1}).
+    extra_payload={"top_p": 0.9},
 )
 print(f"Created model: {response.model_id}")
 ```
58 changes: 37 additions & 21 deletions docs/api-reference/models-benchmarks.md
@@ -143,21 +143,28 @@ client = Stratix()
 success = client.models.remove("model-id-1", "model-id-2")
 ```
 
-### `create_custom(name, key, description, api_url, max_tokens, api_key=None, timeout=None)`
+### `create_custom(name, key, description, api_url, max_tokens, api_key=None, extra_payload=None, timeout=None)`
 
 Creates a custom model backed by an OpenAI-compatible API endpoint. This allows you to evaluate any model accessible via a chat completions endpoint.
 
 #### Parameters
 
-| Parameter | Type | Required | Description |
-| ------------- | -------------------------------- | -------- | --------------------------------------------------------------------------------- |
-| `name` | `str` | Yes | Model name (max 256 characters) |
-| `key` | `str` | Yes | Unique model key, lowercase alphanumeric with dots/hyphens/slashes (max 256 chars)|
-| `description` | `str` | Yes | Model description (max 500 characters) |
-| `api_url` | `str` | Yes | Base URL of the OpenAI-compatible API endpoint |
-| `max_tokens` | `int` | Yes | Maximum number of tokens the model supports |
-| `api_key` | `str \| None` | No | API key for the model provider |
-| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
+| Parameter | Type | Required | Description |
+| --------------- | --------------------------------- | -------- | --------------------------------------------------------------------------------- |
+| `name` | `str` | Yes | Model name (max 256 characters) |
+| `key` | `str` | Yes | Unique model key, lowercase alphanumeric with dots/hyphens/slashes (max 256 chars)|
+| `description` | `str` | Yes | Model description (max 500 characters) |
+| `api_url` | `str` | Yes | Full URL of the OpenAI-compatible chat completions endpoint |
+| `max_tokens` | `int` | Yes | Maximum number of tokens the model supports |
+| `api_key` | `str \| None` | No | API key for the model provider |
+| `extra_payload` | `Dict[str, Any] \| None` | No | JSON object merged into every outgoing chat-completions request body (see below) |
+| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
+
+#### `extra_payload` semantics
+
+When set, the keys/values in `extra_payload` are deep-merged into every outgoing request body. Customer values **win on conflict** with our hardcoded defaults — use it to override `temperature` (we send `0` for reproducible evaluations) or to add provider-specific fields like `top_p`, `presence_penalty`, or `max_completion_tokens` (required by some OpenAI reasoning models that reject `max_tokens`).
+
+The keys `messages`, `model`, and `stream` are reserved and will be rejected.
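
The merge rule described above can be sketched in a few lines of standalone Python. This is only an illustration of the documented semantics (deep merge, customer values win, reserved keys rejected), not the platform's actual code: the function names, the example defaults other than `temperature: 0`, and the choice to raise `ValueError` are assumptions.

```python
from copy import deepcopy
from typing import Any, Dict, Optional

# Reserved keys as listed in the documentation above.
RESERVED_KEYS = frozenset({"messages", "model", "stream"})


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where override's values win; nested dicts merge recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def build_request_body(
    defaults: Dict[str, Any],
    extra_payload: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Apply extra_payload on top of the defaults (hypothetical helper)."""
    if not extra_payload:
        return deepcopy(defaults)
    reserved = RESERVED_KEYS.intersection(extra_payload)
    if reserved:
        raise ValueError(f"reserved keys not allowed in extra_payload: {sorted(reserved)}")
    return deep_merge(defaults, extra_payload)


# Assumed defaults for illustration; only temperature=0 is stated in the docs.
defaults = {"temperature": 0, "max_tokens": 4096}
body = build_request_body(defaults, {"temperature": 1, "top_p": 0.9})
```

In this sketch `body` keeps `max_tokens` from the defaults, takes `temperature` from the customer payload, and gains `top_p`.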
 
 #### Returns
 
@@ -178,28 +185,31 @@ result = client.models.create_custom(
     name="My Custom Model",
     key="my-org/custom-model-v1",
     description="Custom fine-tuned model served via vLLM",
-    api_url="https://my-model-endpoint.example.com/v1",
+    api_url="https://my-model-endpoint.example.com/v1/chat/completions",
     api_key="my-provider-api-key",
     max_tokens=4096,
+    # Optional — provider-specific overrides merged into every request body.
+    extra_payload={"top_p": 0.9},
 )
 
 if result:
     print(f"Created model: {result.model_id}")
 ```
 
-### `update_custom(model_id, *, api_url=None, api_key=None, max_tokens=None, timeout=None)`
+### `update_custom(model_id, *, api_url=None, api_key=None, max_tokens=None, extra_payload=None, timeout=None)`
 
-Updates a custom model's mutable fields. At least one of `api_url`, `api_key`, or `max_tokens` must be provided. Primary use case: repointing `api_url` for ephemeral vLLM endpoints behind cloudflared tunnels whose URL changes between sessions.
+Updates a custom model's mutable fields. At least one of `api_url`, `api_key`, `max_tokens`, or `extra_payload` must be provided. Primary use case: repointing `api_url` for ephemeral vLLM endpoints behind cloudflared tunnels whose URL changes between sessions.
 
 #### Parameters
 
-| Parameter | Type | Required | Description |
-| ------------ | -------------------------------- | -------- | -------------------------------------------------------- |
-| `model_id` | `str` | Yes | ID of the custom model to update |
-| `api_url` | `str \| None` | No | New base URL for the OpenAI-compatible API endpoint |
-| `api_key` | `str \| None` | No | New API key for the model provider |
-| `max_tokens` | `int \| None` | No | New maximum tokens value |
-| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
+| Parameter | Type | Required | Description |
+| --------------- | --------------------------------- | -------- | -------------------------------------------------------------------------------- |
+| `model_id` | `str` | Yes | ID of the custom model to update |
+| `api_url` | `str \| None` | No | New full URL of the OpenAI-compatible chat completions endpoint |
+| `api_key` | `str \| None` | No | New API key for the model provider |
+| `max_tokens` | `int \| None` | No | New maximum tokens value |
+| `extra_payload` | `Dict[str, Any] \| None` | No | New JSON object merged into every outgoing request. Pass `{}` to clear it. |
+| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
 
 #### Returns
 
@@ -213,7 +223,13 @@ client = Stratix()
 # Repoint the api_url without re-creating the model
 client.models.update_custom(
     "model-id-from-create-custom",
-    api_url="https://my-api.example.com/v1",
+    api_url="https://my-new-endpoint.example.com/v1/chat/completions",
 )
+
+# Override request parameters for a model that doesn't accept temperature=0
+client.models.update_custom(
+    "model-id-from-create-custom",
+    extra_payload={"temperature": 1},
+)
 ```
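
One way to picture the `update_custom` rules ("at least one field must be provided", and `{}` clears `extra_payload`) is to treat `None` as "not provided" while letting an empty dict pass through. The sketch below is a hypothetical helper, not the SDK's code; the function name, the body layout, and the use of `ValueError` are assumptions.

```python
from typing import Any, Dict, Optional


def build_update_body(
    *,
    api_url: Optional[str] = None,
    api_key: Optional[str] = None,
    max_tokens: Optional[int] = None,
    extra_payload: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Collect only the fields the caller actually supplied (hypothetical helper)."""
    candidates = {
        "api_url": api_url,
        "api_key": api_key,
        "max_tokens": max_tokens,
        "extra_payload": extra_payload,
    }
    # Compare against None rather than truthiness so that {} is kept:
    # an empty dict means "clear the stored payload", not "leave unchanged".
    body = {name: value for name, value in candidates.items() if value is not None}
    if not body:
        raise ValueError(
            "at least one of api_url, api_key, max_tokens, or extra_payload must be provided"
        )
    return body


clear_request = build_update_body(extra_payload={})
```

Filtering on truthiness instead would silently drop `{}` and make it impossible to clear a previously stored payload.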
