Skip to content

Evaluate struct-of-arrays node storage for @pierre/path-store #776

@jamesarosen

Description

@jamesarosen

The following analysis was produced with AI assistance. If the team is interested in pursuing this further, I'm happy to do additional benchmarking with your guidance.

Background

The central node store in packages/path-store is an array of ~1M plain objects (nodes: PathStoreNode[]) plus per-directory childIds: number[] arrays and a directories: Map. Every field on PathStoreNode is already a numeric integer (internal-types.ts:45-51):

interface PathStoreNode {
  parentId; nameId; depthAndFlags; subtreeNodeCount; visibleSubtreeCount;
}

Hypothesis

Converting this to a Struct-of-Arrays (SoA) layout (five parallel Int32Arrays plus a flat CSR child
table) will meaningfully speed up the hot full-tree projection sweep (buildVisibleTreeProjectionDataDFS,
projection.ts:763) and the construction/count passes at ~1M nodes, by eliminating 1M heap objects, improving cache
locality, and removing GC pressure.

Findings (summary)

The SoA mechanism is confirmed, but the render-speedup framing is disproven:

  • ✅ ~3x faster on the isolated node DFS sweep (6.2ms → 2.0ms at 990K nodes), validated with an identical-work
    checksum.
  • ✅ ~1.85x less memory (35.7 MB → 19.3 MB for node objects alone; excludes the child arrays + Map that SoA also
    subsumes).
  • ✅ GC tail latency eliminated — AoS p95 is 15.6ms (7.5x its median); SoA p95 ≈ median (flat).
  • ❌ But only ~7% end-to-end on the render path. The real getVisibleSlice is ~58ms; node-field access is only ~10% of
    that. The other ~90% is path-string materialization + PathStoreVisibleRow object allocation, which SoA cannot touch.

Conclusion

Worth doing for memory footprint and frame-time consistency on huge trees, and it lands the full 3x on
the count-only sweeps. It is not the right lever to speed up getVisibleSlice (that path is string/allocation-bound —
optimize pathCacheByNodeId and per-row object allocation first).

Benchmark

packages/path-store/scripts/benchmarkNodeStorageSoA.ts (see below). Builds the real linux-10x snapshot via
PathStoreBuilder, mirrors it into SoA, and runs the identical preorder DFS over each representation, writing into
identical Int32Array output buffers and accumulating a checksum (checksumMatch=true confirms both sweeps do exactly
the same work; only storage layout differs).

Isolated node-sweep (all directories expanded, full tree):

Workload Nodes AoS media SoA media Speedup AoS p95 SoA p95
linux-1x 99K 0.55ms 0.20ms 2.77x 0.63 0.55
linux-5x 495K 2.80ms 1.00ms 2.81x 4.30 2.86
linux-10x 990K 6.24ms 1.97ms 3.12x 15.6 2.06

Memory (linux-10x, node objects only): AoS 35.7 MB vs SoA 19.3 MB (matches the 20-bytes/node analytical floor).

Real-projection context (linux-10x): getVisibleSlice(0, all) median 58ms; comparable isolated node-sweep 6.2ms → node
access ≈ 10% of the real path.

Where the full win lands (count-only sweeps, no string work):

  • initializeOpenVisibleCountsstore.ts:143
  • recomputeCountsRecursivecanonical.ts:342
  • StaticPathStore recompute on every expand/collapse — static-store.ts:361
  • builder subtree-count accumulation — builder.ts:1087

Where it does NOT help: the render projection DFS, because path strings (${parentPath}${segment}/) and
PathStoreVisibleRow allocation dominate.

Implementation sketch

Goal: replace the array-of-objects node store + per-directory number[]/Map with a typed-array SoA + flat CSR child
table, behind the existing internal-types.ts accessor helpers so call sites barely change.

  1. Node fields → parallel Int32Arrays (indexed by nodeId):
interface SoaNodeStore {
  parentId: Int32Array;
  nameId: Int32Array;
  depthAndFlags: Int32Array;   // keep the existing depth<<4 | kind<<3 | flags packing
  subtreeNodeCount: Int32Array;
  visibleSubtreeCount: Int32Array;
  length: number;              // logical node count (arrays may over-allocate)
}

Back all five with one ArrayBuffer (or keep separate — benchmark both). NodeId/SegmentId are already number and fit
Int32.

  1. Children → flat CSR table (replaces directories: Map<NodeId, {childIds: number[]}>):
childStart: Int32Array;   // childStart[id] = offset into childIdsFlat
childCount: Int32Array;   // childCount[id] = number of direct children
childIdsFlat: Int32Array; // all child ids, concatenated per-directory

This collapses 61K separate array allocations + a Map into 3 dense buffers. StaticPathStoreSnapshot.childIds
(static-store.ts:45) already proves this CSR layout works — it just uses number[] instead of Int32Array.

  1. Keep the accessor helpers (internal-types.ts:65-95) as the abstraction seam — change them from node.depthAndFlags
    to store.depthAndFlags[id]:
getNodeDepth(store, id)      => store.depthAndFlags[id] >>> 4
isDirectoryNode(store, id)   => (store.depthAndFlags[id] & (1<<3)) !== 0
getNodeFlags(store, id)      => store.depthAndFlags[id] & 0b111
addNodeFlag(store, id, flag) => store.depthAndFlags[id] |= flag

Most call sites already go through these helpers, so they migrate by passing (store, id) instead of (node).

  1. Mutation / growth = manual arena allocator. Replace Array.push of node objects with append-at-length into the
    typed arrays, doubling capacity (new Int32Array(cap*2); buf.set(old)) when full. For node removal, reuse the existing
    PATH_STORE_NODE_FLAG_REMOVED tombstone (internal-types.ts:19) and optionally a free-list of reclaimed ids; the CSR
    child table is rebuilt on structural mutation exactly as rebuildVisibleChildChunks already does for chunk sums
    (child-index.ts:273).

  2. Build path. PathStoreBuilder.finish (builder.ts:687) currently emits the AoS snapshot; add a finish variant that
    writes directly into the SoA buffers. The builder already assigns node IDs sequentially in DFS preorder for presorted
    ingest, so childStart/childCount fill in one forward pass.

  3. Scope / sequencing. This touches the core store, so gate it: land the benchmark first (done), then prototype SoA
    behind a flag and re-run benchmarkNodeStorageSoA.ts + benchmarkVisibleTreeProjection.ts to confirm the memory/GC win
    without regressing the string-bound render path. Cheap adjacent win to grab regardless: convert the existing
    StaticPathStoreSnapshot.childIds / childPositionByNodeId from number[] to Int32Array (static-store.ts:45-46) —
    near-zero risk, speeds the sibling-advance loop at static-store.ts:716.

Reproduce the benchmark:

cd packages/path-store
AGENT=1 bun run ./scripts/benchmarkNodeStorageSoA.ts --workload linux-10x --runs 30 --warmup-runs 8

Acceptance signal: SoA shows ≥1.7x lower retained memory and flat p95≈median on the node sweep, with no regression on
benchmarkVisibleTreeProjection.ts.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions