feat!: v5 — elegance refactor, security hardening, and automated releases#2
Conversation
Examples used the deprecated `facets:` config key and shipped bundle.json artifacts stamped version 1.0.0, which the v3/v4 loader rejects outright. - Swap `facets:` → `equal:` in both build scripts and the basic-usage README - Regenerate bundle.json artifacts (now version 4.1.0, loadable) - Verified: both query-bundle.ts and agent-tool.ts run and return results
Both arms of the string-column branch returned encodeDictColumn (the comment said 'fall through to dict regardless'), so the distinct-count probe and its ratio constants had no effect. Delete countDistinctStrings, DICT_DISTINCT_RATIO, DICT_MAX_DISTINCT, and the unused FLAG_NULL. Behavior unchanged; ~25 lines removed.
Two contained issues on the malformed-bundle surface (load/loadBinary):
- Block slots took their off/len straight from the untrusted header with
only an upper-bound check. A negative or fractional off slipped past it and
made subarray reinterpret the range relative to the buffer end, yielding a
view/posting-list/range-column over the wrong bytes. Add checkSlot()
(non-negative safe integers within bounds) and route every raw slice through
sliceBlob; guard BinaryReader.readF64View the same way.
- Bundle-controlled keys (field names, facet values) populated plain {} maps,
so a "__proto__" key set the map's prototype instead of an own key — which
hid it from the Object.keys-based capability allow-list (binary path) and
corrupted lookups. Build facetIndex/nullIndex/rangeColumns/out with
Object.create(null) on both the binary and JSON load paths.
Contained to crafted bundles (subarray clamps, so no OOB beyond the buffer;
no global prototype pollution), but both close silent-wrong-result holes.
Adds tests/bundle.security.test.ts with header-surgery regressions.
The facet value->key rule (String(value)) was copy-pasted across buildFacetIndex, computeFacetCounts, and the getEqualCandidates lookups, while its inverse (parseFacetKey) lived alone in bundle.ts and wasn't a true inverse: an epoch-number date facet stringified to a numeric key that Date.parse rejected, so getFacetSummary reported the raw string and skipped its numeric sort. - New src/query/facet-key.ts owns encodeFacetKey + decodeFacetKey as an inverse pair; date decode is number-first (epoch) then Date.parse (ISO). - Route all encode sites and getFacetSummary's decode through it; delete parseFacetKey from bundle.ts. Fixes the date-facet getFacetSummary bug (flagged by boundaries + SSOT + testability). Adds facet-key codec + date-facet summary tests.
…t invariants Alias-target validation was written three times (buildManifest inline, validateManifest, and fromSimpleConfig) with divergent messages that could drift. And validateManifest only checked capabilities ⊆ fields, not the reverse — a facet/range/alias field omitted from capabilities passed validation yet was silently unqueryable. - Delete the inline alias checks in buildManifest and the duplicate in fromSimpleConfig; validateManifest now owns all structural invariants. - Add the kind<->capability bijection assertion so capabilities is a checked projection of field kinds in both directions. Adds a hand-edited-manifest regression for the dropped-facet case.
Previously a flat product type, so invalid combos compiled and only failed
(or silently mis-built) at runtime: { kind: 'range', type: 'string' }, an
alias without a targetField, or a non-alias field carrying one.
Model it as a discriminated union: range requires number|date, alias requires
targetField, and only alias may carry one. Internal construction sites already
satisfy this (inferRangeType returns number|date; the one targetField reader is
guarded by kind === 'alias'). Source-breaking only for callers writing configs
that were already invalid at runtime.
Adds @ts-expect-error coverage for the three rejected forms.
…to query/ getEqualCandidates and computeFacetCounts were core query stages living inline in the LyraBundle facade while every other stage (normalize, filters, resolveAliases) was delegated to query/. Move them to query/candidates.ts as pure functions taking the facet index, item store, and scratch buffers as a workspace; query() stays the orchestrator. - Dedup the single-vs-multi-value posting-list resolution into one resolveFieldPostings helper (was copy-pasted across the single-field and multi-field paths). - Promote viewOf to array-operations.ts (shared by bundle + candidates). - selectEqualCandidates is now a direct unit seam; add tests/query-candidates covering intersect/union/IN/fail-closed with hand-built posting lists. Behavior unchanged — the property and query suites (incl. scratch-buffer aliasing) pass as-is.
…r branches - Extract LyraBundle.appliedView so the result's applied filter view is built in one place (was duplicated in query() and emptyResult()). - Extract copyThrough in query/filters.ts for the identical empty-filter passthrough in all three filter stages. - Add tests/codec.test.ts covering the truncated-varint, count-mismatch, and non-multiple-of-8 error branches (happy paths were already property-tested). Deferred (noted for follow-up): generic LyraQuery<TItem> field-name typing, the applied->requested rename/normalized-semantics, dropping the derived LyraField.ops, and a shared isNullish predicate — each is either a larger API or wire-format decision or risks mis-converting null-only checks.
Security hardening, manifest-validation consolidation, the FieldDefinition discriminated union, and the query-core extraction. Bundle format unchanged (BUNDLE_VERSION stays 4.1.0). Minor bump: source-breaking only for configs that were already invalid at runtime.
…llution Address CodeRabbit: the prior assertion checked a non-existent 'polluted' property. Assert instead that the injected __proto__ facet value is queryable as ordinary data (own key on the null-prototype map) and the global prototype is untouched.
Regenerate baseline.json from this branch at full precision (5000 iters) across the 19 gated query/micro scenarios — same scope as the prior baseline. Query/alias/micro paths are flat vs the old reference; the dead-heuristic removal also made serialize binary ~20% faster, though that scenario sits outside the gated baseline set (informational in bench:diff). Note: regenerated on a local machine — if bench gating runs in a canonical CI environment, re-baseline there to avoid cross-machine skew.
📝 WalkthroughWalkthroughLyra v4.1.0 hardens binary bundle deserialization against crafted inputs via ChangesLyra v4.1.0 release
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
…s off the PR path The single test job ran build -> test -> full benchmarks serially, so PR feedback waited on the entire 5000-iteration bench suite (incl. ~70-90ms serialize/load scenarios at no iteration cap) — slow and machine-flaky. - Split into parallel lint, typecheck, and test jobs so PR status checks are granular and fail fast. lint now gates directly (build only runs lint:fix, which can't fail). - Add 'typecheck' (tsc --noEmit) and 'test:ci' (vitest run) scripts. - Run the full benchmark suite post-merge on main only (the canonical baseline env), not as a per-PR gate; release now needs [lint, typecheck, test]. - Add concurrency cancellation, --frozen-lockfile installs, and a bun.lock cache key.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/ci.yml (1)
19-22:⚠️ Potential issue | 🔴 Critical | ⚖️ Poor tradeoffPin all GitHub Actions to commit SHAs for supply-chain security.
Actions referenced by major/minor version tags (e.g.,
@v4,@v1) can be force-pushed or retagged, allowing tag hijacking and supply-chain attacks. Pin each action to its full commit SHA.Example for the
lintjob (apply to all other jobs):- uses: actions/checkout@v4 + with: + ref: ${{ github.ref }} + # Pin to a specific commit SHA; e.g., actions/checkout@a81bbbf8298c0fa03ea29cdc473d45aca21b888bb - - uses: oven-sh/setup-bun@v1 + - uses: oven-sh/setup-bun@1d4f4fba51b51821d65e85c5299425221f4ca37bd - name: Cache dependencies uses: actions/cache@v4 + # Pin to SHA; e.g., actions/cache@704facf57c6b42eed0180f818edc32b025e91700Also applies to: 36-39, 53-56, 78-81
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci.yml around lines 19 - 22, Pin all GitHub Actions to specific commit SHAs instead of version tags for supply-chain security. Replace the version-based references such as actions/checkout@v4, oven-sh/setup-bun@v1, and actions/cache@v4 with their full commit SHAs (e.g., actions/checkout@abc123def456...). Apply this change to all instances across the CI workflow file, including those referenced in the comment at lines 36-39, 53-56, and 78-81.Source: Linters/SAST tools
🧹 Nitpick comments (1)
tests/bundle.security.test.ts (1)
49-83: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueConsider adding a test case for invalid
lenvalues.All slot validation tests tamper with
off. SincecheckSlotvalidates bothoffandlenwith the same logic, the code paths are exercised. However, adding one test for a negative or fractionallenwould make coverage explicit and guard against future regressions if the validation logic changes.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/bundle.security.test.ts` around lines 49 - 83, The test suite currently only validates invalid off (offset) values but lacks explicit test coverage for invalid len (length) values. Add new test cases following the same pattern as the existing tests for negative offset, fractional offset, and out-of-bounds offset. Create analogous tests that tamper with the len property instead of off, setting it to negative values, fractional values, and excessively large values, ensuring each test expects the appropriate error message from LyraBundle.loadBinary to maintain explicit coverage of the validation logic for both off and len parameters.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/ci.yml:
- Line 19: Add the `persist-credentials: false` parameter to all four instances
of the `actions/checkout@v4` action throughout the workflow file (at lines 19,
36, 53, and 78). This prevents the GitHub Actions token from being stored in git
config and potentially captured in build artifacts or caches. For each checkout
step, add `with: persist-credentials: false` as a child property under the
`uses: actions/checkout@v4` line.
- Around line 32-47: The workflow jobs do not declare explicit permissions,
which means they inherit broad default GITHUB_TOKEN scope. Add a `permissions:
{}` declaration at the beginning of each job definition for typecheck, test,
benchmark, and lint jobs since they do not require token access. This restricts
their permissions to the minimum necessary and improves security. If the release
job requires write permissions, ensure it declares those explicitly with
appropriate `permissions:` instead of relying on defaults.
---
Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 19-22: Pin all GitHub Actions to specific commit SHAs instead of
version tags for supply-chain security. Replace the version-based references
such as actions/checkout@v4, oven-sh/setup-bun@v1, and actions/cache@v4 with
their full commit SHAs (e.g., actions/checkout@abc123def456...). Apply this
change to all instances across the CI workflow file, including those referenced
in the comment at lines 36-39, 53-56, and 78-81.
---
Nitpick comments:
In `@tests/bundle.security.test.ts`:
- Around line 49-83: The test suite currently only validates invalid off
(offset) values but lacks explicit test coverage for invalid len (length)
values. Add new test cases following the same pattern as the existing tests for
negative offset, fractional offset, and out-of-bounds offset. Create analogous
tests that tamper with the len property instead of off, setting it to negative
values, fractional values, and excessively large values, ensuring each test
expects the appropriate error message from LyraBundle.loadBinary to maintain
explicit coverage of the validation logic for both off and len parameters.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 8309d1f2-fafd-4eb8-b87f-9aedd5fd8633
📒 Files selected for processing (26)
.github/workflows/ci.ymlCHANGELOG.mdexamples/agent-tool/build-bundle.tsexamples/agent-tool/bundle.jsonexamples/basic-usage/README.mdexamples/basic-usage/build-bundle.simple-config.tsexamples/basic-usage/bundle.jsonpackage.jsonsrc/bundle.tssrc/query/candidates.tssrc/query/facet-key.tssrc/query/filters.tssrc/types.tssrc/utils/array-operations.tssrc/utils/binary-bundle.tssrc/utils/binary.tssrc/utils/builders.tssrc/utils/item-store.tssrc/utils/type-inference.tstests/api-surface.types.test.tstests/bench/baseline.jsontests/bundle.security.test.tstests/bundle.test.tstests/codec.test.tstests/facet-key.test.tstests/query-candidates.test.ts
The flat eslint config imports globals, @typescript-eslint/parser, and
@typescript-eslint/eslint-plugin, but none were declared dependencies — they
only resolved via hoisting in a long-lived node_modules. A clean
'bun install --frozen-lockfile' (CI) leaves them unhoisted, so the config
fails to load with ERR_MODULE_NOT_FOUND ('Cannot find package globals') and
eslint exits 2 — failing the Lint job and the Test job's build step (lint:fix).
Declare all three as devDependencies. Verified by reproducing CI locally
(rm -rf node_modules && bun install --frozen-lockfile): lint, build, test, and
typecheck all pass.
Address CodeRabbit review on the CI workflow: - Add a top-level 'permissions: contents: read' so every job defaults to a read-only token; the release job keeps its explicit write block. - Add 'persist-credentials: false' to the checkout step in the lint, typecheck, test, and benchmark jobs (none push). Release is left as-is — semantic-release pushes commits/tags there.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
.github/workflows/ci.yml (1)
23-23:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPin third-party GitHub Actions to immutable commit SHAs.
uses: ...@v*is still mutable in this workflow; pinning to full commit SHA is needed to prevent upstream tag-retarget supply-chain risk.Also applies to: 26-26, 28-28, 42-42, 45-45, 47-47, 61-61, 64-64, 66-66, 88-88, 91-91, 93-93
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci.yml at line 23, Replace all mutable version tags (like `@v4`) with immutable commit SHAs in the GitHub Actions workflow file. For the actions/checkout action and all other third-party GitHub Actions referenced in the workflow (including uses statements on lines 23, 26, 28, 42, 45, 47, 61, 64, 66, 88, 91, and 93), change the format from `uses: action-name@v*` to `uses: action-name@<full-commit-sha>` where the full commit SHA is the immutable reference for that action version. This prevents supply-chain attacks from upstream tag retargeting.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In @.github/workflows/ci.yml:
- Line 23: Replace all mutable version tags (like `@v4`) with immutable commit
SHAs in the GitHub Actions workflow file. For the actions/checkout action and
all other third-party GitHub Actions referenced in the workflow (including uses
statements on lines 23, 26, 28, 42, 45, 47, 61, 64, 66, 88, 91, and 93), change
the format from `uses: action-name@v*` to `uses: action-name@<full-commit-sha>`
where the full commit SHA is the immutable reference for that action version.
This prevents supply-chain attacks from upstream tag retargeting.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 5d32d005-b15f-4ade-bbda-dc33f347e707
📒 Files selected for processing (1)
.github/workflows/ci.yml
Adopt semantic-release for versioning, changelog, and npm publish on merge to main. The release config existed but was unused and incomplete: - Add CHANGELOG.md to the @semantic-release/git assets so generated notes are committed back (was package.json + docs only). - Build dist in the release job before publishing (package ships files:[dist]). - Reset package.json to the last published version (4.1.0); semantic-release owns the version from here. - Drop the never-released 5.0.0/5.1.0 CHANGELOG entries; semantic-release regenerates release notes from commits. BREAKING CHANGE: the v5 line removes QuerySchemaOptions and the second argument to buildQuerySchema, and makes FieldDefinition a discriminated union on kind (range must be number|date; alias requires targetField; only alias may carry one). The on-the-wire bundle format is unchanged; valid bundles load as before.
|
🎉 This PR is included in version 5.0.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Summary by CodeRabbit
Security
Changed
equal(replacingfacets).FieldDefinitionis now a discriminated union for safer compile-time field definitions.Documentation
equal.Chores / Tests