Skip to content

feat!: v5 — elegance refactor, security hardening, and automated releases#2

Merged
lawren merged 15 commits into
mainfrom
refactor/elegance-roadmap
Jun 22, 2026
Merged

feat!: v5 — elegance refactor, security hardening, and automated releases#2
lawren merged 15 commits into
mainfrom
refactor/elegance-roadmap

Conversation

@lawren

@lawren lawren commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

  • Security

    • Hardened bundle deserialization with stricter validation for offsets and ranges, plus safer handling of prototype-related edge cases.
  • Changed

    • Bundle configs now use equal (replacing facets).
    • FieldDefinition is now a discriminated union for safer compile-time field definitions.
    • Date-typed facet summaries now return numeric epoch values for correct sorting.
    • Manifest validation is tightened with stricter invariants.
  • Documentation

    • Updated examples to use equal.
  • Chores / Tests

    • CI split into lint/typecheck/test, improved caching/concurrency, and benchmarks/release gating updated; release automation/versioning updated.

lawren added 11 commits June 17, 2026 23:39
Examples used the deprecated `facets:` config key and shipped bundle.json
artifacts stamped version 1.0.0, which the v3/v4 loader rejects outright.

- Swap `facets:` → `equal:` in both build scripts and the basic-usage README
- Regenerate bundle.json artifacts (now version 4.1.0, loadable)
- Verified: both query-bundle.ts and agent-tool.ts run and return results
Both arms of the string-column branch returned encodeDictColumn (the
comment said 'fall through to dict regardless'), so the distinct-count
probe and its ratio constants had no effect. Delete countDistinctStrings,
DICT_DISTINCT_RATIO, DICT_MAX_DISTINCT, and the unused FLAG_NULL.

Behavior unchanged; ~25 lines removed.
Two contained issues on the malformed-bundle surface (load/loadBinary):

- Block slots took their off/len straight from the untrusted header with
  only an upper-bound check. A negative or fractional off slipped past it and
  made subarray reinterpret the range relative to the buffer end, yielding a
  view/posting-list/range-column over the wrong bytes. Add checkSlot()
  (non-negative safe integers within bounds) and route every raw slice through
  sliceBlob; guard BinaryReader.readF64View the same way.
- Bundle-controlled keys (field names, facet values) populated plain {} maps,
  so a "__proto__" key set the map's prototype instead of an own key — which
  hid it from the Object.keys-based capability allow-list (binary path) and
  corrupted lookups. Build facetIndex/nullIndex/rangeColumns/out with
  Object.create(null) on both the binary and JSON load paths.

Contained to crafted bundles (subarray clamps, so no OOB beyond the buffer;
no global prototype pollution), but both close silent-wrong-result holes.
Adds tests/bundle.security.test.ts with header-surgery regressions.
The facet value->key rule (String(value)) was copy-pasted across
buildFacetIndex, computeFacetCounts, and the getEqualCandidates lookups,
while its inverse (parseFacetKey) lived alone in bundle.ts and wasn't a true
inverse: an epoch-number date facet stringified to a numeric key that
Date.parse rejected, so getFacetSummary reported the raw string and skipped
its numeric sort.

- New src/query/facet-key.ts owns encodeFacetKey + decodeFacetKey as an
  inverse pair; date decode is number-first (epoch) then Date.parse (ISO).
- Route all encode sites and getFacetSummary's decode through it; delete
  parseFacetKey from bundle.ts.

Fixes the date-facet getFacetSummary bug (flagged by boundaries + SSOT +
testability). Adds facet-key codec + date-facet summary tests.
…t invariants

Alias-target validation was written three times (buildManifest inline,
validateManifest, and fromSimpleConfig) with divergent messages that could
drift. And validateManifest only checked capabilities ⊆ fields, not the
reverse — a facet/range/alias field omitted from capabilities passed
validation yet was silently unqueryable.

- Delete the inline alias checks in buildManifest and the duplicate in
  fromSimpleConfig; validateManifest now owns all structural invariants.
- Add the kind<->capability bijection assertion so capabilities is a checked
  projection of field kinds in both directions.

Adds a hand-edited-manifest regression for the dropped-facet case.
Previously a flat product type, so invalid combos compiled and only failed
(or silently mis-built) at runtime: { kind: 'range', type: 'string' }, an
alias without a targetField, or a non-alias field carrying one.

Model it as a discriminated union: range requires number|date, alias requires
targetField, and only alias may carry one. Internal construction sites already
satisfy this (inferRangeType returns number|date; the one targetField reader is
guarded by kind === 'alias'). Source-breaking only for callers writing configs
that were already invalid at runtime.

Adds @ts-expect-error coverage for the three rejected forms.
…to query/

getEqualCandidates and computeFacetCounts were core query stages living inline
in the LyraBundle facade while every other stage (normalize, filters,
resolveAliases) was delegated to query/. Move them to query/candidates.ts as
pure functions taking the facet index, item store, and scratch buffers as a
workspace; query() stays the orchestrator.

- Dedup the single-vs-multi-value posting-list resolution into one
  resolveFieldPostings helper (was copy-pasted across the single-field and
  multi-field paths).
- Promote viewOf to array-operations.ts (shared by bundle + candidates).
- selectEqualCandidates is now a direct unit seam; add tests/query-candidates
  covering intersect/union/IN/fail-closed with hand-built posting lists.

Behavior unchanged — the property and query suites (incl. scratch-buffer
aliasing) pass as-is.
…r branches

- Extract LyraBundle.appliedView so the result's applied filter view is built
  in one place (was duplicated in query() and emptyResult()).
- Extract copyThrough in query/filters.ts for the identical empty-filter
  passthrough in all three filter stages.
- Add tests/codec.test.ts covering the truncated-varint, count-mismatch, and
  non-multiple-of-8 error branches (happy paths were already property-tested).

Deferred (noted for follow-up): generic LyraQuery<TItem> field-name typing,
the applied->requested rename/normalized-semantics, dropping the derived
LyraField.ops, and a shared isNullish predicate — each is either a larger API
or wire-format decision or risks mis-converting null-only checks.
Security hardening, manifest-validation consolidation, the FieldDefinition
discriminated union, and the query-core extraction. Bundle format unchanged
(BUNDLE_VERSION stays 4.1.0). Minor bump: source-breaking only for configs
that were already invalid at runtime.
…llution

Address CodeRabbit: the prior assertion checked a non-existent 'polluted'
property. Assert instead that the injected __proto__ facet value is queryable
as ordinary data (own key on the null-prototype map) and the global prototype
is untouched.
Regenerate baseline.json from this branch at full precision (5000 iters)
across the 19 gated query/micro scenarios — same scope as the prior baseline.
Query/alias/micro paths are flat vs the old reference; the dead-heuristic
removal also made serialize binary ~20% faster, though that scenario sits
outside the gated baseline set (informational in bench:diff).

Note: regenerated on a local machine — if bench gating runs in a canonical CI
environment, re-baseline there to avoid cross-machine skew.
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Lyra v4.1.0 hardens binary bundle deserialization against crafted inputs via checkSlot validation and null-prototype index maps, extracts equal-candidate selection and facet counting into a new src/query/candidates.ts module, converts FieldDefinition to a kind-discriminated union, centralizes manifest validation with a bijection check, introduces a facet-key codec, simplifies string column encoding, restructures CI with separate lint/typecheck/test jobs, and regenerates example bundle artifacts.

Changes

Lyra v4.1.0 release

Layer / File(s) Summary
FieldDefinition discriminated union and type tests
src/types.ts, tests/api-surface.types.test.ts
FieldDefinition is replaced by a kind-discriminated union that statically constrains targetField to alias, restricts range type to number|date, and excludes targetField from all other kinds. Type tests assert invalid shapes are rejected by the compiler and valid shapes still compile.
Facet-key codec and builder/bundle integration
src/query/facet-key.ts, src/utils/builders.ts, src/bundle.ts, tests/facet-key.test.ts
New encodeFacetKey/decodeFacetKey codec centralizes facet value stringification and typed decoding with numeric-epoch handling for dates. buildFacetIndex and getFacetSummary switch to the new codec; the old local parseFacetKey helper is removed. Tests cover round-trips, epoch regression, and getFacetSummary numeric output.
Query module extraction: candidates, filters, viewOf
src/query/candidates.ts, src/query/filters.ts, src/utils/array-operations.ts, src/bundle.ts, tests/query-candidates.test.ts
selectEqualCandidates (AND across fields, IN within field, smallest-first K-way intersection) and computeFacetCounts are extracted into src/query/candidates.ts. copyThrough is added to filters.ts to deduplicate inline passthrough copies. viewOf is promoted to array-operations.ts. LyraBundle.query() is rewired to use the extracted functions; inline getEqualCandidates, computeFacetCounts, and parseFacetKey methods are removed. Unit tests cover all candidate-selection behaviors.
Manifest validation centralization
src/utils/builders.ts, src/utils/type-inference.ts, tests/bundle.test.ts
validateManifest gains assertCapabilityCovers to enforce a fieldscapabilities bijection for facet, range, and alias kinds. buildManifest and fromSimpleConfig remove duplicate alias-target checks, deferring to validateManifest. A new test asserts load() rejects bundles with a missing facet in capabilities.facets.
Binary deserialization hardening
src/utils/binary-bundle.ts, src/utils/binary.ts, src/bundle.ts, tests/bundle.security.test.ts
checkSlot() validates off/len as non-negative integers within buffer bounds; sliceBlob now routes through it. decodeV4 builds facetIndex, nullIndex, and rangeColumns as null-prototype objects. BinaryReader.readF64View gains an early integer/non-negative guard. bundle.ts load() switches all index maps to Object.create(null). Security tests cover negative/fractional/large slot offsets, __proto__ field keys, and __proto__ value keys without prototype pollution.
Item-store string encoding simplification
src/utils/item-store.ts
encodeColumn unconditionally selects utf8-dict for pure-string columns, removing the distinct-count heuristic, DICT_DISTINCT_RATIO, DICT_MAX_DISTINCT, FLAG_NULL constants, and the countDistinctStrings helper.
CI restructuring, package version, examples, and release metadata
.github/workflows/ci.yml, package.json, CHANGELOG.md, examples/*, tests/bench/baseline.json, tests/codec.test.ts
CI gains workflow-level concurrency cancellation and splits into separate lint, typecheck, test jobs; bench and release now depend on all three with explicit build steps. package.json version set to 4.1.0 with new typecheck and test:ci scripts, and CHANGELOG.md added to release assets. Example configs rename facets to equal; bundle artifacts are regenerated with new index sections (nullIndex, rangeColumns, facetIndexBin, nullIndexBin). Codec error-branch tests and benchmark baseline are refreshed.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 Hop, hop! The bundles are armored now,
No crafty __proto__ shall slip through, I vow.
The codec encodes, the candidates sort,
Discriminated unions keep wild types in court.
Null-prototype maps stand guard at the gate—
v4.1.0 ships, and it feels really great! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 72.41% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title clearly captures the three main objectives of the PR: elegance refactor, security hardening, and automated releases. It accurately summarizes the key changes across the codebase including refactored query logic, hardened deserialization, updated CI/CD workflows, and new testing coverage.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/elegance-roadmap

Comment @coderabbitai help to get the list of available commands.

…s off the PR path

The single test job ran build -> test -> full benchmarks serially, so PR
feedback waited on the entire 5000-iteration bench suite (incl. ~70-90ms
serialize/load scenarios at no iteration cap) — slow and machine-flaky.

- Split into parallel lint, typecheck, and test jobs so PR status checks are
  granular and fail fast. lint now gates directly (build only runs lint:fix,
  which can't fail).
- Add 'typecheck' (tsc --noEmit) and 'test:ci' (vitest run) scripts.
- Run the full benchmark suite post-merge on main only (the canonical baseline
  env), not as a per-PR gate; release now needs [lint, typecheck, test].
- Add concurrency cancellation, --frozen-lockfile installs, and a bun.lock
  cache key.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/ci.yml (1)

19-22: ⚠️ Potential issue | 🔴 Critical | ⚖️ Poor tradeoff

Pin all GitHub Actions to commit SHAs for supply-chain security.

Actions referenced by major/minor version tags (e.g., @v4, @v1) can be force-pushed or retagged, allowing tag hijacking and supply-chain attacks. Pin each action to its full commit SHA.

Example for the lint job (apply to all other jobs):

       - uses: actions/checkout@v4
+        with:
+          ref: ${{ github.ref }}
+        # Pin to a specific commit SHA; e.g., actions/checkout@a81bbbf8298c0fa03ea29cdc473d45aca21b888bb
-      - uses: oven-sh/setup-bun@v1
+      - uses: oven-sh/setup-bun@1d4f4fba51b51821d65e85c5299425221f4ca37bd
       - name: Cache dependencies
         uses: actions/cache@v4
+        # Pin to SHA; e.g., actions/cache@704facf57c6b42eed0180f818edc32b025e91700

Also applies to: 36-39, 53-56, 78-81

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yml around lines 19 - 22, Pin all GitHub Actions to
specific commit SHAs instead of version tags for supply-chain security. Replace
the version-based references such as actions/checkout@v4, oven-sh/setup-bun@v1,
and actions/cache@v4 with their full commit SHAs (e.g.,
actions/checkout@abc123def456...). Apply this change to all instances across the
CI workflow file, including those referenced in the comment at lines 36-39,
53-56, and 78-81.

Source: Linters/SAST tools

🧹 Nitpick comments (1)
tests/bundle.security.test.ts (1)

49-83: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider adding a test case for invalid len values.

All slot validation tests tamper with off. Since checkSlot validates both off and len with the same logic, the code paths are exercised. However, adding one test for a negative or fractional len would make coverage explicit and guard against future regressions if the validation logic changes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/bundle.security.test.ts` around lines 49 - 83, The test suite currently
only validates invalid off (offset) values but lacks explicit test coverage for
invalid len (length) values. Add new test cases following the same pattern as
the existing tests for negative offset, fractional offset, and out-of-bounds
offset. Create analogous tests that tamper with the len property instead of off,
setting it to negative values, fractional values, and excessively large values,
ensuring each test expects the appropriate error message from
LyraBundle.loadBinary to maintain explicit coverage of the validation logic for
both off and len parameters.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/ci.yml:
- Line 19: Add the `persist-credentials: false` parameter to all four instances
of the `actions/checkout@v4` action throughout the workflow file (at lines 19,
36, 53, and 78). This prevents the GitHub Actions token from being stored in git
config and potentially captured in build artifacts or caches. For each checkout
step, add `with: persist-credentials: false` as a child property under the
`uses: actions/checkout@v4` line.
- Around line 32-47: The workflow jobs do not declare explicit permissions,
which means they inherit broad default GITHUB_TOKEN scope. Add a `permissions:
{}` declaration at the beginning of each job definition for typecheck, test,
benchmark, and lint jobs since they do not require token access. This restricts
their permissions to the minimum necessary and improves security. If the release
job requires write permissions, ensure it declares those explicitly with
appropriate `permissions:` instead of relying on defaults.

---

Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 19-22: Pin all GitHub Actions to specific commit SHAs instead of
version tags for supply-chain security. Replace the version-based references
such as actions/checkout@v4, oven-sh/setup-bun@v1, and actions/cache@v4 with
their full commit SHAs (e.g., actions/checkout@abc123def456...). Apply this
change to all instances across the CI workflow file, including those referenced
in the comment at lines 36-39, 53-56, and 78-81.

---

Nitpick comments:
In `@tests/bundle.security.test.ts`:
- Around line 49-83: The test suite currently only validates invalid off
(offset) values but lacks explicit test coverage for invalid len (length)
values. Add new test cases following the same pattern as the existing tests for
negative offset, fractional offset, and out-of-bounds offset. Create analogous
tests that tamper with the len property instead of off, setting it to negative
values, fractional values, and excessively large values, ensuring each test
expects the appropriate error message from LyraBundle.loadBinary to maintain
explicit coverage of the validation logic for both off and len parameters.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 8309d1f2-fafd-4eb8-b87f-9aedd5fd8633

📥 Commits

Reviewing files that changed from the base of the PR and between 9c82443 and 344e083.

📒 Files selected for processing (26)
  • .github/workflows/ci.yml
  • CHANGELOG.md
  • examples/agent-tool/build-bundle.ts
  • examples/agent-tool/bundle.json
  • examples/basic-usage/README.md
  • examples/basic-usage/build-bundle.simple-config.ts
  • examples/basic-usage/bundle.json
  • package.json
  • src/bundle.ts
  • src/query/candidates.ts
  • src/query/facet-key.ts
  • src/query/filters.ts
  • src/types.ts
  • src/utils/array-operations.ts
  • src/utils/binary-bundle.ts
  • src/utils/binary.ts
  • src/utils/builders.ts
  • src/utils/item-store.ts
  • src/utils/type-inference.ts
  • tests/api-surface.types.test.ts
  • tests/bench/baseline.json
  • tests/bundle.security.test.ts
  • tests/bundle.test.ts
  • tests/codec.test.ts
  • tests/facet-key.test.ts
  • tests/query-candidates.test.ts

Comment thread .github/workflows/ci.yml
Comment thread .github/workflows/ci.yml
lawren added 2 commits June 22, 2026 15:53
The flat eslint config imports globals, @typescript-eslint/parser, and
@typescript-eslint/eslint-plugin, but none were declared dependencies — they
only resolved via hoisting in a long-lived node_modules. A clean
'bun install --frozen-lockfile' (CI) leaves them unhoisted, so the config
fails to load with ERR_MODULE_NOT_FOUND ('Cannot find package globals') and
eslint exits 2 — failing the Lint job and the Test job's build step (lint:fix).

Declare all three as devDependencies. Verified by reproducing CI locally
(rm -rf node_modules && bun install --frozen-lockfile): lint, build, test, and
typecheck all pass.
Address CodeRabbit review on the CI workflow:
- Add a top-level 'permissions: contents: read' so every job defaults to a
  read-only token; the release job keeps its explicit write block.
- Add 'persist-credentials: false' to the checkout step in the lint, typecheck,
  test, and benchmark jobs (none push). Release is left as-is — semantic-release
  pushes commits/tags there.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
.github/workflows/ci.yml (1)

23-23: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin third-party GitHub Actions to immutable commit SHAs.

uses: ...@v* is still mutable in this workflow; pinning to full commit SHA is needed to prevent upstream tag-retarget supply-chain risk.

Also applies to: 26-26, 28-28, 42-42, 45-45, 47-47, 61-61, 64-64, 66-66, 88-88, 91-91, 93-93

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yml at line 23, Replace all mutable version tags (like
`@v4`) with immutable commit SHAs in the GitHub Actions workflow file. For the
actions/checkout action and all other third-party GitHub Actions referenced in
the workflow (including uses statements on lines 23, 26, 28, 42, 45, 47, 61, 64,
66, 88, 91, and 93), change the format from `uses: action-name@v*` to `uses:
action-name@<full-commit-sha>` where the full commit SHA is the immutable
reference for that action version. This prevents supply-chain attacks from
upstream tag retargeting.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In @.github/workflows/ci.yml:
- Line 23: Replace all mutable version tags (like `@v4`) with immutable commit
SHAs in the GitHub Actions workflow file. For the actions/checkout action and
all other third-party GitHub Actions referenced in the workflow (including uses
statements on lines 23, 26, 28, 42, 45, 47, 61, 64, 66, 88, 91, and 93), change
the format from `uses: action-name@v*` to `uses: action-name@<full-commit-sha>`
where the full commit SHA is the immutable reference for that action version.
This prevents supply-chain attacks from upstream tag retargeting.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5d32d005-b15f-4ade-bbda-dc33f347e707

📥 Commits

Reviewing files that changed from the base of the PR and between c070b64 and 2623bc7.

📒 Files selected for processing (1)
  • .github/workflows/ci.yml

Adopt semantic-release for versioning, changelog, and npm publish on merge to
main. The release config existed but was unused and incomplete:

- Add CHANGELOG.md to the @semantic-release/git assets so generated notes are
  committed back (was package.json + docs only).
- Build dist in the release job before publishing (package ships files:[dist]).
- Reset package.json to the last published version (4.1.0); semantic-release
  owns the version from here.
- Drop the never-released 5.0.0/5.1.0 CHANGELOG entries; semantic-release
  regenerates release notes from commits.

BREAKING CHANGE: the v5 line removes QuerySchemaOptions and the second argument
to buildQuerySchema, and makes FieldDefinition a discriminated union on kind
(range must be number|date; alias requires targetField; only alias may carry
one). The on-the-wire bundle format is unchanged; valid bundles load as before.
@lawren lawren changed the title Refactor/elegance roadmap feat!: v5 — elegance refactor, security hardening, and automated releases Jun 22, 2026
@lawren lawren merged commit e97c114 into main Jun 22, 2026
10 of 11 checks passed
@github-actions

Copy link
Copy Markdown

🎉 This PR is included in version 5.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant