Add benchmark default matrix reporting by ictechgy · Pull Request #206 · ictechgy/context-guard

ictechgy · 2026-06-15T04:47:34Z

Summary

- Add a report-level default_matrix to context-guard-bench with six token-reduction lanes: trimming, artifact escrow, tool pruning, cache advice, adaptive-k, and optional compression.
- Classify each lane as default-on, advisory, experimental, or reject/rework from existing matched-pair evidence, recording the lane match method, policy ceiling/clamp, reason codes, and report-only claim boundaries.
- Render the matrix in benchmark dashboards and update the benchmark docs, sample report, and tests.
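
As a rough illustration of the shape described above, the sketch below assembles a six-lane matrix and renders it as a dashboard section. Only the `default_matrix` name, the six lanes, the four classifications, and the `## Default matrix` header come from this PR. Every field name and function here (`lane`, `classification`, `reason_codes`, `public_claim`, `build_default_matrix`, `render_dashboard`) is a hypothetical stand-in and may not match the code in benchmark_runner.py.

```python
# Lane and classification names are taken from the PR summary.
# All field and function names below are hypothetical illustrations.
LANES = [
    "trimming",
    "artifact_escrow",
    "tool_pruning",
    "cache_advice",
    "adaptive_k",
    "optional_compression",
]
CLASSIFICATIONS = ("default-on", "advisory", "experimental", "reject/rework")


def build_default_matrix(lane_rows):
    """Assemble a report-level matrix from per-lane rows (assumed shape)."""
    for row in lane_rows:
        if row["classification"] not in CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {row['classification']!r}")
    # The matrix is report-only, so it never asserts a public claim.
    return {"public_claim": False, "lanes": list(lane_rows)}


def render_dashboard(matrix):
    """Render the matrix as a Markdown section for a benchmark dashboard."""
    lines = [
        "## Default matrix",
        "",
        "| Lane | Classification | Reason codes |",
        "| --- | --- | --- |",
    ]
    for row in matrix["lanes"]:
        codes = ", ".join(row["reason_codes"]) or "-"
        lines.append(f"| {row['lane']} | {row['classification']} | {codes} |")
    return "\n".join(lines)
```

Keeping the classification check inside the builder means a typo in a lane's classification fails at report time rather than surfacing later as an odd dashboard row.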

Ralplan evidence

Context snapshot: .omx/context/g005-paired-replay-default-matrix-20260615T042329Z.md
Plan: .omx/plans/ralplan-g005-paired-replay-default-matrix.md
Architect: APPROVE, blockers none — .omx/artifacts/ralplan-g005-architect-20260615T042403Z.md
Critic: APPROVE, blockers none — .omx/artifacts/ralplan-g005-critic-20260615T042527Z.md

Validation

- python3 scripts/sync_plugin_copies.py --check ✅
- python3 -m py_compile context-guard-kit/benchmark_runner.py plugins/context-guard/bin/context-guard-bench tests/test_context_guard_kit.py scripts/release_smoke.py ✅
- PYTHONDONTWRITEBYTECODE=1 python3 -m unittest -k benchmark tests.test_context_guard_kit.BenchmarkRunnerTests (35 tests) ✅
- evidence replay smoke for docs/benchmark-fixtures/token-savings-12task.evidence.example.jsonl emitted default_matrix JSON and ## Default matrix dashboard ✅
- python3 scripts/release_smoke.py --timeout 20 ✅
- PYTHONDONTWRITEBYTECODE=1 python3 scripts/prepublish_check.py (697 tests) ✅
- git diff --check ✅
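
The replay smoke step above could be approximated by a small check like the following. This is a sketch under stated assumptions: the report layout (`default_matrix`, `public_claim`, `lanes`, `classification`) and the helper name `check_report` are hypothetical, and the real smoke test may verify different or additional properties.

```python
import json

# Classification names come from the PR summary; field names are assumed.
EXPECTED_CLASSIFICATIONS = {"default-on", "advisory", "experimental", "reject/rework"}


def check_report(report_json, dashboard_md):
    """Return a list of problems found in a report and its dashboard text."""
    report = json.loads(report_json)
    matrix = report.get("default_matrix")
    if not isinstance(matrix, dict):
        return ["report has no default_matrix object"]

    problems = []
    if matrix.get("public_claim") is not False:
        problems.append("public_claim must be false")

    lanes = matrix.get("lanes", [])
    if len(lanes) != 6:
        problems.append(f"expected 6 lanes, found {len(lanes)}")
    for lane in lanes:
        if lane.get("classification") not in EXPECTED_CLASSIFICATIONS:
            problems.append(f"lane {lane.get('lane')!r} has an unknown classification")

    # The header must appear on its own line, not merely inside other text.
    if "## Default matrix" not in dashboard_md.splitlines():
        problems.append("dashboard is missing the '## Default matrix' header")
    return problems
```

An empty return value means the report passed every check, while a non-empty list names each failure so a smoke script can print them all before exiting.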

ictechgy · 2026-06-15T06:49:08Z

G005 quad-review + validation evidence before merge:

Local validation passed:
- python3 scripts/sync_plugin_copies.py --check
- python3 -m py_compile context-guard-kit/benchmark_runner.py plugins/context-guard/bin/context-guard-bench tests/test_context_guard_kit.py scripts/release_smoke.py
- PYTHONDONTWRITEBYTECODE=1 python3 -m unittest -k benchmark tests.test_context_guard_kit.BenchmarkRunnerTests (35 tests)
- 12-task replay smoke for docs/benchmark-fixtures/token-savings-12task.* verified the default_matrix schema, public-claim false, 6 lanes, lane classifications, and the dashboard header
- python3 scripts/release_smoke.py --timeout 20
- PYTHONDONTWRITEBYTECODE=1 python3 scripts/prepublish_check.py (697 tests)
- git diff --check
PR CI passed: test-and-prepublish on Python 3.11, Python 3.12, and macOS 3.12 (run 27524671842).
Quad review loop: APPROVE / no blockers from Codex, Claude, Agy, and Forge.
- Codex: .omx/artifacts/quad-review-pr206-codex-20260615T050757Z.md
- Claude: .omx/artifacts/quad-review-pr206-claude-20260615T050757Z.md
- Agy: .omx/artifacts/quad-review-pr206-agy-20260615T050757Z.md
- Forge fallback: .omx/artifacts/quad-review-pr206-forge-fallback-20260615T064824Z.md

Nonblocking follow-ups captured from review: lane attribution is still based on a key/name heuristic, which future fixtures may need to revisit; future policy ceilings that differ from the current one may need explicit clamp handling. Neither blocks the current G005 scope.
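
To make the two follow-ups concrete, here is a minimal sketch of what a key/name heuristic and an explicit clamp could look like. The keyword table, `attribute_lane`, and `clamp_to_ceiling` are hypothetical and are not taken from the merged code. They only illustrate why substring matching is fragile for new fixture keys and why a ceiling needs an explicit clamp step.

```python
# Hypothetical keyword table: maps each lane to substrings that might
# appear in a fixture's metric key or name.
LANE_KEYWORDS = {
    "trimming": ("trim",),
    "artifact_escrow": ("escrow", "artifact"),
    "tool_pruning": ("prune", "pruning"),
    "cache_advice": ("cache",),
    "adaptive_k": ("adaptive_k", "adaptive-k"),
    "optional_compression": ("compress",),
}


def attribute_lane(key):
    """Guess the lane for a metric key by substring match, or return None."""
    lowered = key.lower()
    for lane, keywords in LANE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return lane
    # Keys that match no keyword stay unattributed instead of being guessed.
    return None


def clamp_to_ceiling(reduction, ceiling):
    """Cap a measured reduction at the policy ceiling (both are fractions)."""
    return min(reduction, ceiling)
```

For example, `attribute_lane("tool_prune_savings")` resolves to `"tool_pruning"`, while a key with no recognizable substring returns `None`. That is the kind of gap a future fixture with differently named keys could expose.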

Add benchmark default matrix reporting

51b1ac1

ictechgy merged commit be54adc into main Jun 15, 2026
3 checks passed

ictechgy deleted the g005-paired-replay-default-matrix branch June 15, 2026 06:49
