chore(main): release 0.16.0 #154
Open
jack-arturo wants to merge 1 commit into
Pull request overview
Updates the project changelog for the 0.15.3 release generated by Release Please.
Changes:
- Add a new `0.15.3` section to `CHANGELOG.md`
- Document the included bug fix: `mcp-sse` health/readiness decoupling (PR #151)
f71263e to c02c93a
c02c93a to d535732
d535732 to 1959ac6
26053b3 to f1010db
jack-arturo added a commit that referenced this pull request Jun 12, 2026
…nce gate, date-aware ranking (#182, #193, #186, #187, #183, #184, #188) (#194)

## Release: ranking & recall series (develop → main)

⚠️ **Merge with a MERGE COMMIT; do not squash.** release-please needs the individual conventional commits below to compute the version and changelog for PR #154.

### What's in this release

| PR | Change | Default behavior |
|---|---|---|
| #182 | `feat(recall)`: configurable recency decay window/curve | unchanged (env-gated) |
| #193 (replaces #185) | `feat(recall)`: tag-score denominator cap fixes query-length bias | unchanged (`SEARCH_TAG_SCORE_TOKEN_CAP=0`) |
| #186 | `fix(recall)`: relevance gate: query-independent scoring gated on topical evidence (#130) | unchanged (gate off) |
| #187 | `feat(recall)`: date-aware ranking, `recency_bias=off\|on\|auto`, latest-fact selection (#158, #159) | `RECALL_RECENCY_BIAS=off`; adds deterministic timestamp tiebreak for near-ties |
| #183 | `feat(benchmarks)`: failure-mode diagnosis harness + judge quota preflight | tooling only |
| #184 | `fix(mcp)`: surface stored metadata + `updated_at` in detailed recall format (#111) | additive |
| #188 | `feat(enrichment)`: classification fallback-rate metrics in `/enrichment/status` | additive |

Plus: CI now runs on `develop` pushes/PRs; benchmark experiment log + README contribution-policy note.

### Verification evidence

- **Unit/lint/npm**: 625 pytest + 16 mcp-sse-server tests green on develop head; CI green.
- **Default-preserve**: recall-lab baseline on the 10k-memory production snapshot: develop defaults vs main pooled baseline give identical aggregates (R@5 0.655 / R@10 0.710 / MRR 0.434 / NDCG@10 0.501). Two-stack probe run (main vs develop, defaults): 11/12 preserve-exact, remaining diffs are near-tie reorders (top-1 score deltas ≤ 5.4e-5, the #187 timestamp tiebreak).
- **Full judged 500q LongMemEval** (ship config: `RECALL_RECENCY_BIAS=auto` + `temporal-answer` harness): recall@5 96.6% (483/500), accuracy 86.0% (430/500), `judge_errors=0`, `memory_ingest_failures=0`.
- **Churn attribution** (targeted re-runs of all 17 churned questions on current-main-at-defaults and develop-at-defaults): 15/17 moved with #191 (already on main), so the April canonical 97.2% floor is stale; current main measures ~97.0%. Develop-at-defaults differs from current main by **1 question in 500** (a near-tie rank-5/6 flip from #187's deterministic tiebreak). Accuracy is within answerer replicate noise (identical-config reference runs flip 28/500 answers).
- Full detail: `benchmarks/EXPERIMENT_LOG.md` (2026-06-11 entry) and `benchmarks/results/lme_churn17_*` + `analyze_churn17.py`.

### Opt-in features shipped OFF

`RECALL_RELEVANCE_GATE` (validated at 0.40 on lab corpus; improves negative-probe precision) and `RECALL_RECENCY_BIAS=auto` (current-state query re-ranking). Neither affects default behavior; see `docs/ENVIRONMENT_VARIABLES.md`.

### After merging

release-please will update PR #154 (v0.16.0); merging *that* cuts the tag and publishes the `:stable` image, the actual user-facing deploy event for Railway template users.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
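For readers unfamiliar with the retrieval metrics quoted in the verification evidence above (R@5, R@10, MRR), here is a minimal sketch of how such numbers are commonly computed per query and then averaged. The function names and the toy data are hypothetical, and the benchmark harness in this repository may define the metrics differently (for example, counting a query as a hit when any relevant item lands in the top k).

```python
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    # Use a set so a duplicated id in the ranking is not counted twice.
    found = set(ranked_ids[:k]) & relevant_ids
    return len(found) / len(relevant_ids)


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, item_id in enumerate(ranked_ids, start=1):
        if item_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean; 0.0 for an empty sequence."""
    return sum(values) / len(values) if values else 0.0


# Toy example (hypothetical ids): two queries, each with its ranked results
# and its set of relevant memories.
queries = [
    (["a", "b", "c", "d"], {"b", "d"}),
    (["x", "y", "z"], {"x"}),
]
recall_at_2 = mean([recall_at_k(r, rel, 2) for r, rel in queries])
mrr = mean([reciprocal_rank(r, rel) for r, rel in queries])
```

Averaging per-query values this way is what makes a single near-tie reorder visible as a small aggregate change, which is consistent with the 1-in-500 difference described above.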
f1010db to 349f8c6
a0643ef to 4186f2a
4186f2a to bf28679
🤖 I have created a release beep boop
0.16.0 (2026-06-17)
Features
Bug Fixes
`python scripts/backup_automem.py` (#175) (edd9742)
Documentation
This PR was generated with Release Please. See documentation.
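
The referenced commit message asks for a merge commit because the release version is derived from individual conventional-commit subjects. The following is a simplified sketch of that idea, not Release Please's actual implementation: `bump_kind` and `next_version` are hypothetical helpers, `BREAKING CHANGE` footers are ignored, and the starting version `0.15.3` is used only as an illustrative input.

```python
import re
from typing import Iterable

# Conventional-commit subject: type, optional (scope), optional "!", then ": ".
_HEADER = re.compile(r"^(?P<type>\w+)(\([^)]*\))?(?P<breaking>!)?: ")


def bump_kind(subjects: Iterable[str]) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for a set of commit subjects."""
    kind = "none"
    for subject in subjects:
        match = _HEADER.match(subject)
        if match is None:
            continue  # not a conventional commit; contributes nothing
        if match.group("breaking"):
            return "major"
        if match.group("type") == "feat":
            kind = "minor"
        elif match.group("type") == "fix" and kind == "none":
            kind = "patch"
    return kind


def next_version(current: str, kind: str) -> str:
    """Apply a semver bump to a 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(part) for part in current.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    if kind == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current


subjects = [
    "feat(recall): configurable recency decay window/curve",
    "fix(mcp): surface stored metadata in detailed recall format",
    "chore: run CI on develop",
]
print(next_version("0.15.3", bump_kind(subjects)))
```

Under these assumptions a single `feat` among the merged commits is enough to select a minor bump, whereas a squash merge would leave only one subject line for the parser to read.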