
chore(main): release 0.16.0#154

Open
jack-arturo wants to merge 1 commit into main from release-please--branches--main


@jack-arturo jack-arturo commented Apr 24, 2026

Member

🤖 I have created a release beep boop

0.16.0 (2026-06-17)

Features

  • api: add admin backup endpoint (#162) (8b1f264)
  • api: support bulk memory associations (1221e36)
  • api: support bulk memory associations (#198) (28eb916)
  • benchmarks: LongMemEval failure-mode diagnosis harness + judge quota preflight (#183) (f99bece)
  • consolidation: expose cluster threshold and min size as env vars (#163) (7e731f3)
  • enrichment: expose classification fallback-rate metrics in /enrichment/status (#188) (0b522a9)
  • entity: harden identity cleanup and repair tooling (#176) (827dfbc)
  • eval: recall-quality optimization harness — lab foundation + design (#197) (431433e)
  • graph: support unbounded visualizer snapshots (#141) (c730128)
  • lab: add aged labelled distractor injection (cc5d546)
  • lab: add config_complexity simplicity metric (dfb10d9)
  • lab: add distractor_rate_at_k precision guardrail metric (872eab2)
  • lab: add lab_corpus with parameterized recall (5e1e071)
  • lab: add pick_winner scorecard decision rule (3187eac)
  • lab: add real consolidation pass helper (48a7d4a)
  • lab: isolate production clone restores (#171) (aef90c0)
  • lab: wire scorecard, distractors, recall params, consolidation into runner (589ec30)
  • recall: add metadata sidecar search (#177) (4e7956e)
  • recall: add state_mode=current|history recall alias (#173) (b1df86c)
  • recall: cap tag-score denominator to fix query-length bias (#193) (cefa516)
  • recall: date-aware ranking + latest-fact selection (#158, #159) (#187) (a6ed945)
  • recall: make recency decay window and curve configurable (#182) (dbb933f)
  • recall: ranking release — recency config, tag-score cap, relevance gate, date-aware ranking (#182, #193, #186, #187, #183, #184, #188) (#194) (337fe98)
  • scripts: safer reclassify_with_llm.py with provider flags + tighter prompt (#164) (a742602)

Bug Fixes

  • api: address copilot review on PR #198 (0466a1e)
  • api: handle grouped association write failures (cd93df9)
  • backup: make backup_automem.py runnable as python scripts/backup_automem.py (#175) (edd9742)
  • benchmarks: add publication verification bundle (#166) (420d721)
  • consolidation: skip eager first tick at startup to avoid FalkorDB load race (#165) (1b812cf)
  • docs: keep dispatch payload arrays stable (df6e9e8)
  • embedding: fall back to per-item real embeddings before placeholders in batch path (#189) (6e9c62c)
  • entity: restore person-shape exemption on the slug validation path (#179) (5e29960)
  • entity: stop validator over-rejecting real people, code tools, and event categories (#178) (193b730)
  • lab: address copilot review on PR #197 (45f80d6)
  • lab: align scorecard key contract (build_scorecard -> pick_winner) (7d91530)
  • mcp-sse: decouple /health liveness from upstream readiness (#151) (5bcfb8b)
  • mcp: cap association failure summary (ea4e08f)
  • mcp: surface stored metadata and updated_at in detailed recall format (#184) (230416e)
  • recall: address copilot review on PR #194 (50b1647)
  • recall: gate query-independent scoring on topical evidence within tag scope (#130) (#186) (c11b594)
  • recall: hydrate semantic recall summaries (#192) (76e845d)
  • recall: normalize graph keyword scores into the 0-1 component range (#191) (3653ddf)
  • recall: respect current memory state (#170) (ed36b98), closes #169 #158 #159

Documentation

  • bench: log full judged 500q LongMemEval ship-config run with churn attribution (41bf8d0)
  • eval: Plan A — lab metric foundation (TDD, 9 tasks) (0087dda)
  • eval: Plan B — parallel matrix harness (TDD, 9 tasks) (c8ddfb2)
  • evals: mark Memora/FAMA/WRIT lifecycle diagnostics as diagnostic-only (#174) (e8a3285)
  • eval: spec for recall-quality optimization harness (b1a1995)
  • note develop-branch contribution policy in README (ccf02dd)
  • positioning: add scout reference (#168) (922d23b)
  • refresh README and benchmark guidance (#157) (bba31cc)
  • runtime: align Docker viewer paths and setup guidance (#155) (bbda79b)

This PR was generated with Release Please. See documentation.

Copilot AI review requested due to automatic review settings April 24, 2026 15:30

Copilot AI left a comment

Contributor


Pull request overview

Updates the project changelog for the 0.15.3 release generated by Release Please.

Changes:

  • Add a new 0.15.3 section to CHANGELOG.md
  • Document the included bug fix: mcp-sse health/readiness decoupling (PR #151)

jack-arturo force-pushed the release-please--branches--main branch 6 times, most recently from f71263e to c02c93a (May 1, 2026 16:23)
jack-arturo changed the title from "chore(main): release 0.15.3" to "chore(main): release 0.16.0" (May 14, 2026)
jack-arturo force-pushed the release-please--branches--main branch from c02c93a to d535732 (May 14, 2026 02:01)
jack-arturo force-pushed the release-please--branches--main branch from d535732 to 1959ac6 (May 22, 2026 07:39)
jack-arturo force-pushed the release-please--branches--main branch 14 times, most recently from 26053b3 to f1010db (June 11, 2026 18:24)
jack-arturo added a commit that referenced this pull request Jun 12, 2026
…nce gate, date-aware ranking (#182, #193, #186, #187, #183, #184, #188) (#194)

## Release: ranking & recall series (develop → main)

⚠️ **Merge with a MERGE COMMIT — do not squash.** release-please needs
the individual conventional commits below to compute the version and
changelog for PR #154.
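
Why the individual commits matter can be illustrated with a minimal sketch of a version bump derived from conventional-commit subjects. This is not release-please's implementation: `next_version` and `SUBJECT_RE` are hypothetical names, and the rule assumed here is plain semver (`feat` bumps minor, `fix` bumps patch, `!` bumps major), whereas the real tool's pre-1.0 behavior depends on its configuration.

```python
import re

# Conventional-commit subject: type, optional (scope), optional "!", then ": description".
SUBJECT_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<desc>.+)$"
)


def next_version(current: str, subjects: list[str]) -> str:
    """Return the next version implied by a list of commit subjects.

    Simplified assumption: breaking -> major, feat -> minor, fix -> patch.
    Subjects that are not conventional commits are ignored.
    """
    major, minor, patch = (int(part) for part in current.split("."))
    types = set()
    breaking = False
    for subject in subjects:
        match = SUBJECT_RE.match(subject)
        if match is None:
            continue  # e.g. a plain merge commit subject
        types.add(match["type"])
        breaking = breaking or bool(match["breaking"])
    if breaking:
        return f"{major + 1}.0.0"
    if "feat" in types:
        return f"{major}.{minor + 1}.0"
    if "fix" in types:
        return f"{major}.{minor}.{patch + 1}"
    return current  # chore/docs only: nothing to release under this rule


print(next_version("0.15.3", [
    "feat(recall): add metadata sidecar search (#177)",
    "fix(mcp): cap association failure summary",
]))
```

Under this sketch a squash merge would present a single subject to the parser, so the per-change `feat`/`fix` entries would no longer be available to build the changelog.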

### What's in this release

| PR | Change | Default behavior |
|---|---|---|
| #182 | `feat(recall)`: configurable recency decay window/curve | unchanged (env-gated) |
| #193 (replaces #185) | `feat(recall)`: tag-score denominator cap fixes query-length bias | unchanged (`SEARCH_TAG_SCORE_TOKEN_CAP=0`) |
| #186 | `fix(recall)`: relevance gate (query-independent scoring gated on topical evidence, #130) | unchanged (gate off) |
| #187 | `feat(recall)`: date-aware ranking, `recency_bias=off\|on\|auto`, latest-fact selection (#158, #159) | `RECALL_RECENCY_BIAS=off`; adds deterministic timestamp tiebreak for near-ties |
| #183 | `feat(benchmarks)`: failure-mode diagnosis harness + judge quota preflight | tooling only |
| #184 | `fix(mcp)`: surface stored metadata + `updated_at` in detailed recall format (#111) | additive |
| #188 | `feat(enrichment)`: classification fallback-rate metrics in `/enrichment/status` | additive |

Plus: CI now runs on `develop` pushes/PRs; benchmark experiment log +
README contribution-policy note.
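
To make the query-length bias addressed by #193 concrete, here is a rough sketch of a capped tag-score denominator. The scoring formula is an assumption made for illustration (matched tokens divided by query token count); the actual formula in the codebase is not shown in this PR. `tag_score` is a hypothetical function, and `token_cap=0` mirrors the `SEARCH_TAG_SCORE_TOKEN_CAP=0` default, meaning no cap.

```python
def tag_score(query_tokens, memory_tags, token_cap=0):
    """Fraction of query tokens that match a memory's tags.

    Assumed formula, for illustration only. With token_cap == 0 the
    denominator is the full query length, so long queries are penalized
    even when they match as many tags as a short query would.
    """
    tokens = set(query_tokens)
    if not tokens:
        return 0.0
    matched = len(tokens & set(memory_tags))
    denominator = len(tokens)
    if token_cap > 0:
        denominator = min(denominator, token_cap)  # cap the length penalty
    return min(1.0, matched / denominator)  # keep the component within 0-1


long_query = ["what", "is", "the", "current", "redis", "cache", "ttl", "setting"]
tags = {"redis", "cache"}

print(tag_score(long_query, tags))               # uncapped: 2 / 8
print(tag_score(long_query, tags, token_cap=4))  # capped:   2 / 4
```

With the cap at 0 the result is identical to the uncapped computation, which is consistent with the "unchanged" default listed in the table.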

### Verification evidence

- **Unit/lint/npm**: 625 pytest + 16 mcp-sse-server tests green on
develop head; CI green.
- **Default-preserve**: recall-lab baseline on the 10k-memory production
snapshot — develop defaults vs main pooled baseline identical aggregates
(R@5 0.655 / R@10 0.710 / MRR 0.434 / NDCG@10 0.501). Two-stack probe
run (main vs develop, defaults): 11/12 preserve-exact, remaining diffs
are near-tie reorders (top-1 score deltas ≤ 5.4e-5, the #187 timestamp
tiebreak).
- **Full judged 500q LongMemEval** (ship config:
`RECALL_RECENCY_BIAS=auto` + `temporal-answer` harness): recall@5 96.6%
(483/500), accuracy 86.0% (430/500), `judge_errors=0`,
`memory_ingest_failures=0`.
- **Churn attribution** (targeted re-runs of all 17 churned questions on
current-main-at-defaults and develop-at-defaults): 15/17 moved with #191
(already on main) — the April canonical 97.2% floor is stale; current
main measures ~97.0%. Develop-at-defaults differs from current main by
**1 question in 500** (a near-tie rank-5/6 flip from #187's
deterministic tiebreak). Accuracy is within answerer replicate noise
(identical-config reference runs flip 28/500 answers).
- Full detail: `benchmarks/EXPERIMENT_LOG.md` (2026-06-11 entry) and
`benchmarks/results/lme_churn17_*` + `analyze_churn17.py`.
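
For readers unfamiliar with the metrics quoted above, the following sketch shows the textbook definitions of recall@k, reciprocal rank (averaged over queries to get MRR), and NDCG@k with binary relevance. It is not the repository's benchmark code; the function names are illustrative, and whether the harness reports fractional recall or a per-question hit rate is not stated in this PR.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Share of relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """NDCG@k with binary gains: DCG of the ranking over DCG of an ideal one."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / ideal if ideal else 0.0


ranked = ["m3", "m1", "m7", "m2"]   # hypothetical retrieval order
relevant = {"m1", "m2"}             # hypothetical gold memories

print(recall_at_k(ranked, relevant, 2))   # 0.5
print(reciprocal_rank(ranked, relevant))  # 0.5
print(round(ndcg_at_k(ranked, relevant, 10), 4))

# The headline percentages are plain ratios of the reported counts.
print(483 / 500, 430 / 500)  # 0.966 0.86
```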

### Opt-in features shipped OFF

`RECALL_RELEVANCE_GATE` (validated at 0.40 on lab corpus; improves
negative-probe precision) and `RECALL_RECENCY_BIAS=auto` (current-state
query re-ranking). Neither affects default behavior; see
`docs/ENVIRONMENT_VARIABLES.md`.
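
As a sketch of what "shipped OFF" means in practice, the snippet below reads the three variables named in this PR and falls back to off-by-default values. The variable names come from the text above; everything else is an assumption: `load_recall_config` is a hypothetical helper, and treating `RECALL_RELEVANCE_GATE` as a float threshold (unset meaning off) is a guess based on the "validated at 0.40" remark. `docs/ENVIRONMENT_VARIABLES.md` remains the authoritative reference.

```python
import os

VALID_RECENCY_BIAS = {"off", "on", "auto"}


def load_recall_config(env=None):
    """Read opt-in recall settings, defaulting every feature to off."""
    env = os.environ if env is None else env

    bias = env.get("RECALL_RECENCY_BIAS", "off").strip().lower()
    if bias not in VALID_RECENCY_BIAS:
        raise ValueError(f"RECALL_RECENCY_BIAS must be one of {sorted(VALID_RECENCY_BIAS)}")

    # Assumption: the gate is a numeric threshold; empty or unset means "gate off".
    raw_gate = env.get("RECALL_RELEVANCE_GATE", "").strip()
    gate = float(raw_gate) if raw_gate else None

    # 0 means "no cap", matching the default listed for #193.
    token_cap = int(env.get("SEARCH_TAG_SCORE_TOKEN_CAP", "0"))

    return {
        "recency_bias": bias,
        "relevance_gate": gate,
        "tag_score_token_cap": token_cap,
    }


# Defaults: nothing set, so every opt-in feature stays off.
print(load_recall_config({}))

# Opting in to the settings described above.
print(load_recall_config({
    "RECALL_RECENCY_BIAS": "auto",
    "RECALL_RELEVANCE_GATE": "0.40",
}))
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the process environment.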

### After merging

release-please will update PR #154 (v0.16.0); merging *that* cuts the
tag and publishes the `:stable` image — the actual user-facing deploy
event for Railway template users.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
jack-arturo force-pushed the release-please--branches--main branch from f1010db to 349f8c6 (June 12, 2026 15:32)
jack-arturo force-pushed the release-please--branches--main branch 2 times, most recently from a0643ef to 4186f2a (June 17, 2026 03:16)
jack-arturo force-pushed the release-please--branches--main branch from 4186f2a to bf28679 (June 17, 2026 19:02)

Development

Successfully merging this pull request may close these issues.

recall: respect invalidation and current-state semantics
