Refactor/elegance roadmap#1
Closed
lawren wants to merge 11 commits into
Closed
Conversation
Examples used the deprecated `facets:` config key and shipped bundle.json artifacts stamped version 1.0.0, which the v3/v4 loader rejects outright. - Swap `facets:` → `equal:` in both build scripts and the basic-usage README - Regenerate bundle.json artifacts (now version 4.1.0, loadable) - Verified: both query-bundle.ts and agent-tool.ts run and return results
Both arms of the string-column branch returned encodeDictColumn (the comment said 'fall through to dict regardless'), so the distinct-count probe and its ratio constants had no effect. Delete countDistinctStrings, DICT_DISTINCT_RATIO, DICT_MAX_DISTINCT, and the unused FLAG_NULL. Behavior unchanged; ~25 lines removed.
Two contained issues on the malformed-bundle surface (load/loadBinary):
- Block slots took their off/len straight from the untrusted header with
only an upper-bound check. A negative or fractional off slipped past it and
made subarray reinterpret the range relative to the buffer end, yielding a
view/posting-list/range-column over the wrong bytes. Add checkSlot()
(non-negative safe integers within bounds) and route every raw slice through
sliceBlob; guard BinaryReader.readF64View the same way.
- Bundle-controlled keys (field names, facet values) populated plain {} maps,
so a "__proto__" key set the map's prototype instead of an own key — which
hid it from the Object.keys-based capability allow-list (binary path) and
corrupted lookups. Build facetIndex/nullIndex/rangeColumns/out with
Object.create(null) on both the binary and JSON load paths.
Contained to crafted bundles (subarray clamps, so no OOB beyond the buffer;
no global prototype pollution), but both close silent-wrong-result holes.
Adds tests/bundle.security.test.ts with header-surgery regressions.
The facet value->key rule (String(value)) was copy-pasted across buildFacetIndex, computeFacetCounts, and the getEqualCandidates lookups, while its inverse (parseFacetKey) lived alone in bundle.ts and wasn't a true inverse: an epoch-number date facet stringified to a numeric key that Date.parse rejected, so getFacetSummary reported the raw string and skipped its numeric sort. - New src/query/facet-key.ts owns encodeFacetKey + decodeFacetKey as an inverse pair; date decode is number-first (epoch) then Date.parse (ISO). - Route all encode sites and getFacetSummary's decode through it; delete parseFacetKey from bundle.ts. Fixes the date-facet getFacetSummary bug (flagged by boundaries + SSOT + testability). Adds facet-key codec + date-facet summary tests.
…t invariants Alias-target validation was written three times (buildManifest inline, validateManifest, and fromSimpleConfig) with divergent messages that could drift. And validateManifest only checked capabilities ⊆ fields, not the reverse — a facet/range/alias field omitted from capabilities passed validation yet was silently unqueryable. - Delete the inline alias checks in buildManifest and the duplicate in fromSimpleConfig; validateManifest now owns all structural invariants. - Add the kind<->capability bijection assertion so capabilities is a checked projection of field kinds in both directions. Adds a hand-edited-manifest regression for the dropped-facet case.
Previously a flat product type, so invalid combos compiled and only failed
(or silently mis-built) at runtime: { kind: 'range', type: 'string' }, an
alias without a targetField, or a non-alias field carrying one.
Model it as a discriminated union: range requires number|date, alias requires
targetField, and only alias may carry one. Internal construction sites already
satisfy this (inferRangeType returns number|date; the one targetField reader is
guarded by kind === 'alias'). Source-breaking only for callers writing configs
that were already invalid at runtime.
Adds @ts-expect-error coverage for the three rejected forms.
…to query/ getEqualCandidates and computeFacetCounts were core query stages living inline in the LyraBundle facade while every other stage (normalize, filters, resolveAliases) was delegated to query/. Move them to query/candidates.ts as pure functions taking the facet index, item store, and scratch buffers as a workspace; query() stays the orchestrator. - Dedup the single-vs-multi-value posting-list resolution into one resolveFieldPostings helper (was copy-pasted across the single-field and multi-field paths). - Promote viewOf to array-operations.ts (shared by bundle + candidates). - selectEqualCandidates is now a direct unit seam; add tests/query-candidates covering intersect/union/IN/fail-closed with hand-built posting lists. Behavior unchanged — the property and query suites (incl. scratch-buffer aliasing) pass as-is.
…r branches - Extract LyraBundle.appliedView so the result's applied filter view is built in one place (was duplicated in query() and emptyResult()). - Extract copyThrough in query/filters.ts for the identical empty-filter passthrough in all three filter stages. - Add tests/codec.test.ts covering the truncated-varint, count-mismatch, and non-multiple-of-8 error branches (happy paths were already property-tested). Deferred (noted for follow-up): generic LyraQuery<TItem> field-name typing, the applied->requested rename/normalized-semantics, dropping the derived LyraField.ops, and a shared isNullish predicate — each is either a larger API or wire-format decision or risks mis-converting null-only checks.
Security hardening, manifest-validation consolidation, the FieldDefinition discriminated union, and the query-core extraction. Bundle format unchanged (BUNDLE_VERSION stays 4.1.0). Minor bump: source-breaking only for configs that were already invalid at runtime.
…llution Address CodeRabbit: the prior assertion checked a non-existent 'polluted' property. Assert instead that the injected __proto__ facet value is queryable as ordinary data (own key on the null-prototype map) and the global prototype is untouched.
Regenerate baseline.json from this branch at full precision (5000 iters) across the 19 gated query/micro scenarios — same scope as the prior baseline. Query/alias/micro paths are flat vs the old reference; the dead-heuristic removal also made serialize binary ~20% faster, though that scenario sits outside the gated baseline set (informational in bench:diff). Note: regenerated on a local machine — if bench gating runs in a canonical CI environment, re-baseline there to avoid cross-machine skew.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.