Skip to content

Refactor/elegance roadmap#1

Closed
lawren wants to merge 11 commits into
mainfrom
refactor/elegance-roadmap
Closed

Refactor/elegance roadmap#1
lawren wants to merge 11 commits into
mainfrom
refactor/elegance-roadmap

Conversation

@lawren

@lawren lawren commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

No description provided.

lawren added 11 commits June 17, 2026 23:39
Examples used the deprecated `facets:` config key and shipped bundle.json
artifacts stamped version 1.0.0, which the v3/v4 loader rejects outright.

- Swap `facets:` → `equal:` in both build scripts and the basic-usage README
- Regenerate bundle.json artifacts (now version 4.1.0, loadable)
- Verified: both query-bundle.ts and agent-tool.ts run and return results
Both arms of the string-column branch returned encodeDictColumn (the
comment said 'fall through to dict regardless'), so the distinct-count
probe and its ratio constants had no effect. Delete countDistinctStrings,
DICT_DISTINCT_RATIO, DICT_MAX_DISTINCT, and the unused FLAG_NULL.

Behavior unchanged; ~25 lines removed.
Two contained issues on the malformed-bundle surface (load/loadBinary):

- Block slots took their off/len straight from the untrusted header with
  only an upper-bound check. A negative or fractional off slipped past it and
  made subarray reinterpret the range relative to the buffer end, yielding a
  view/posting-list/range-column over the wrong bytes. Add checkSlot()
  (non-negative safe integers within bounds) and route every raw slice through
  sliceBlob; guard BinaryReader.readF64View the same way.
- Bundle-controlled keys (field names, facet values) populated plain {} maps,
  so a "__proto__" key set the map's prototype instead of an own key — which
  hid it from the Object.keys-based capability allow-list (binary path) and
  corrupted lookups. Build facetIndex/nullIndex/rangeColumns/out with
  Object.create(null) on both the binary and JSON load paths.

Contained to crafted bundles (subarray clamps, so no OOB beyond the buffer;
no global prototype pollution), but both close silent-wrong-result holes.
Adds tests/bundle.security.test.ts with header-surgery regressions.
The facet value->key rule (String(value)) was copy-pasted across
buildFacetIndex, computeFacetCounts, and the getEqualCandidates lookups,
while its inverse (parseFacetKey) lived alone in bundle.ts and wasn't a true
inverse: an epoch-number date facet stringified to a numeric key that
Date.parse rejected, so getFacetSummary reported the raw string and skipped
its numeric sort.

- New src/query/facet-key.ts owns encodeFacetKey + decodeFacetKey as an
  inverse pair; date decode is number-first (epoch) then Date.parse (ISO).
- Route all encode sites and getFacetSummary's decode through it; delete
  parseFacetKey from bundle.ts.

Fixes the date-facet getFacetSummary bug (flagged by boundaries + SSOT +
testability). Adds facet-key codec + date-facet summary tests.
…t invariants

Alias-target validation was written three times (buildManifest inline,
validateManifest, and fromSimpleConfig) with divergent messages that could
drift. And validateManifest only checked capabilities ⊆ fields, not the
reverse — a facet/range/alias field omitted from capabilities passed
validation yet was silently unqueryable.

- Delete the inline alias checks in buildManifest and the duplicate in
  fromSimpleConfig; validateManifest now owns all structural invariants.
- Add the kind<->capability bijection assertion so capabilities is a checked
  projection of field kinds in both directions.

Adds a hand-edited-manifest regression for the dropped-facet case.
Previously a flat product type, so invalid combos compiled and only failed
(or silently mis-built) at runtime: { kind: 'range', type: 'string' }, an
alias without a targetField, or a non-alias field carrying one.

Model it as a discriminated union: range requires number|date, alias requires
targetField, and only alias may carry one. Internal construction sites already
satisfy this (inferRangeType returns number|date; the one targetField reader is
guarded by kind === 'alias'). Source-breaking only for callers writing configs
that were already invalid at runtime.

Adds @ts-expect-error coverage for the three rejected forms.
…to query/

getEqualCandidates and computeFacetCounts were core query stages living inline
in the LyraBundle facade while every other stage (normalize, filters,
resolveAliases) was delegated to query/. Move them to query/candidates.ts as
pure functions taking the facet index, item store, and scratch buffers as a
workspace; query() stays the orchestrator.

- Dedup the single-vs-multi-value posting-list resolution into one
  resolveFieldPostings helper (was copy-pasted across the single-field and
  multi-field paths).
- Promote viewOf to array-operations.ts (shared by bundle + candidates).
- selectEqualCandidates is now a direct unit seam; add tests/query-candidates
  covering intersect/union/IN/fail-closed with hand-built posting lists.

Behavior unchanged — the property and query suites (incl. scratch-buffer
aliasing) pass as-is.
…r branches

- Extract LyraBundle.appliedView so the result's applied filter view is built
  in one place (was duplicated in query() and emptyResult()).
- Extract copyThrough in query/filters.ts for the identical empty-filter
  passthrough in all three filter stages.
- Add tests/codec.test.ts covering the truncated-varint, count-mismatch, and
  non-multiple-of-8 error branches (happy paths were already property-tested).

Deferred (noted for follow-up): generic LyraQuery<TItem> field-name typing,
the applied->requested rename/normalized-semantics, dropping the derived
LyraField.ops, and a shared isNullish predicate — each is either a larger API
or wire-format decision or risks mis-converting null-only checks.
Security hardening, manifest-validation consolidation, the FieldDefinition
discriminated union, and the query-core extraction. Bundle format unchanged
(BUNDLE_VERSION stays 4.1.0). Minor bump: source-breaking only for configs
that were already invalid at runtime.
…llution

Address CodeRabbit: the prior assertion checked a non-existent 'polluted'
property. Assert instead that the injected __proto__ facet value is queryable
as ordinary data (own key on the null-prototype map) and the global prototype
is untouched.
Regenerate baseline.json from this branch at full precision (5000 iters)
across the 19 gated query/micro scenarios — same scope as the prior baseline.
Query/alias/micro paths are flat vs the old reference; the dead-heuristic
removal also made serialize binary ~20% faster, though that scenario sits
outside the gated baseline set (informational in bench:diff).

Note: regenerated on a local machine — if bench gating runs in a canonical CI
environment, re-baseline there to avoid cross-machine skew.
@lawren lawren closed this Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant