The following analysis was produced with AI assistance. If the team is interested in pursuing this further, I'm happy to do additional benchmarking with your guidance.
Background
The central node store in packages/path-store is an array of ~1M plain objects (nodes: PathStoreNode[]) plus per-directory childIds: number[] arrays and a directories: Map. Every field on PathStoreNode is already a numeric integer (internal-types.ts:45-51):
interface PathStoreNode {
parentId; nameId; depthAndFlags; subtreeNodeCount; visibleSubtreeCount;
}
Hypothesis
Converting this to a Struct-of-Arrays (SoA) layout (five parallel Int32Arrays plus a flat CSR child
table) will meaningfully speed up the hot full-tree projection sweep (buildVisibleTreeProjectionDataDFS,
projection.ts:763) and the construction/count passes at ~1M nodes, by eliminating 1M heap objects, improving cache
locality, and removing GC pressure.
Findings (summary)
The SoA mechanism is confirmed, but the render-speedup framing is disproven:
- ✅ ~3x faster on the isolated node DFS sweep (6.2ms → 2.0ms at 990K nodes), validated with an identical-work
checksum.
- ✅ ~1.85x less memory (35.7 MB → 19.3 MB for node objects alone; excludes the child arrays + Map that SoA also
subsumes).
- ✅ GC tail latency eliminated — AoS p95 is 15.6ms (7.5x its median); SoA p95 ≈ median (flat).
- ❌ But only ~7% end-to-end on the render path. The real getVisibleSlice is ~58ms; node-field access is only ~10% of
that. The other ~90% is path-string materialization + PathStoreVisibleRow object allocation, which SoA cannot touch.
Conclusion
Worth doing for memory footprint and frame-time consistency on huge trees, and it lands the full 3x on
the count-only sweeps. It is not the right lever to speed up getVisibleSlice (that path is string/allocation-bound —
optimize pathCacheByNodeId and per-row object allocation first).
Benchmark
packages/path-store/scripts/benchmarkNodeStorageSoA.ts (see below). Builds the real linux-10x snapshot via
PathStoreBuilder, mirrors it into SoA, and runs the identical preorder DFS over each representation, writing into
identical Int32Array output buffers and accumulating a checksum (checksumMatch=true confirms both sweeps do exactly
the same work; only storage layout differs).
Isolated node-sweep (all directories expanded, full tree):
| Workload |
Nodes |
AoS media |
SoA media |
Speedup |
AoS p95 |
SoA p95 |
| linux-1x |
99K |
0.55ms |
0.20ms |
2.77x |
0.63 |
0.55 |
| linux-5x |
495K |
2.80ms |
1.00ms |
2.81x |
4.30 |
2.86 |
| linux-10x |
990K |
6.24ms |
1.97ms |
3.12x |
15.6 |
2.06 |
Memory (linux-10x, node objects only): AoS 35.7 MB vs SoA 19.3 MB (matches the 20-bytes/node analytical floor).
Real-projection context (linux-10x): getVisibleSlice(0, all) median 58ms; comparable isolated node-sweep 6.2ms → node
access ≈ 10% of the real path.
Where the full win lands (count-only sweeps, no string work):
initializeOpenVisibleCounts — store.ts:143
recomputeCountsRecursive — canonical.ts:342
StaticPathStore recompute on every expand/collapse — static-store.ts:361
- builder subtree-count accumulation —
builder.ts:1087
Where it does NOT help: the render projection DFS, because path strings (${parentPath}${segment}/) and
PathStoreVisibleRow allocation dominate.
Implementation sketch
Goal: replace the array-of-objects node store + per-directory number[]/Map with a typed-array SoA + flat CSR child
table, behind the existing internal-types.ts accessor helpers so call sites barely change.
- Node fields → parallel
Int32Arrays (indexed by nodeId):
interface SoaNodeStore {
parentId: Int32Array;
nameId: Int32Array;
depthAndFlags: Int32Array; // keep the existing depth<<4 | kind<<3 | flags packing
subtreeNodeCount: Int32Array;
visibleSubtreeCount: Int32Array;
length: number; // logical node count (arrays may over-allocate)
}
Back all five with one ArrayBuffer (or keep separate — benchmark both). NodeId/SegmentId are already number and fit
Int32.
- Children → flat CSR table (replaces
directories: Map<NodeId, {childIds: number[]}>):
childStart: Int32Array; // childStart[id] = offset into childIdsFlat
childCount: Int32Array; // childCount[id] = number of direct children
childIdsFlat: Int32Array; // all child ids, concatenated per-directory
This collapses 61K separate array allocations + a Map into 3 dense buffers. StaticPathStoreSnapshot.childIds
(static-store.ts:45) already proves this CSR layout works — it just uses number[] instead of Int32Array.
- Keep the accessor helpers (
internal-types.ts:65-95) as the abstraction seam — change them from node.depthAndFlags
to store.depthAndFlags[id]:
getNodeDepth(store, id) => store.depthAndFlags[id] >>> 4
isDirectoryNode(store, id) => (store.depthAndFlags[id] & (1<<3)) !== 0
getNodeFlags(store, id) => store.depthAndFlags[id] & 0b111
addNodeFlag(store, id, flag) => store.depthAndFlags[id] |= flag
Most call sites already go through these helpers, so they migrate by passing (store, id) instead of (node).
-
Mutation / growth = manual arena allocator. Replace Array.push of node objects with append-at-length into the
typed arrays, doubling capacity (new Int32Array(cap*2); buf.set(old)) when full. For node removal, reuse the existing
PATH_STORE_NODE_FLAG_REMOVED tombstone (internal-types.ts:19) and optionally a free-list of reclaimed ids; the CSR
child table is rebuilt on structural mutation exactly as rebuildVisibleChildChunks already does for chunk sums
(child-index.ts:273).
-
Build path. PathStoreBuilder.finish (builder.ts:687) currently emits the AoS snapshot; add a finish variant that
writes directly into the SoA buffers. The builder already assigns node IDs sequentially in DFS preorder for presorted
ingest, so childStart/childCount fill in one forward pass.
-
Scope / sequencing. This touches the core store, so gate it: land the benchmark first (done), then prototype SoA
behind a flag and re-run benchmarkNodeStorageSoA.ts + benchmarkVisibleTreeProjection.ts to confirm the memory/GC win
without regressing the string-bound render path. Cheap adjacent win to grab regardless: convert the existing
StaticPathStoreSnapshot.childIds / childPositionByNodeId from number[] to Int32Array (static-store.ts:45-46) —
near-zero risk, speeds the sibling-advance loop at static-store.ts:716.
Reproduce the benchmark:
cd packages/path-store
AGENT=1 bun run ./scripts/benchmarkNodeStorageSoA.ts --workload linux-10x --runs 30 --warmup-runs 8
Acceptance signal: SoA shows ≥1.7x lower retained memory and flat p95≈median on the node sweep, with no regression on
benchmarkVisibleTreeProjection.ts.
The following analysis was produced with AI assistance. If the team is interested in pursuing this further, I'm happy to do additional benchmarking with your guidance.
Background
The central node store in
packages/path-storeis an array of ~1M plain objects (nodes: PathStoreNode[]) plus per-directorychildIds: number[]arrays and adirectories: Map. Every field onPathStoreNodeis already a numeric integer (internal-types.ts:45-51):Hypothesis
Converting this to a Struct-of-Arrays (SoA) layout (five parallel
Int32Arrays plus a flat CSR childtable) will meaningfully speed up the hot full-tree projection sweep (
buildVisibleTreeProjectionDataDFS,projection.ts:763) and the construction/count passes at ~1M nodes, by eliminating 1M heap objects, improving cachelocality, and removing GC pressure.
Findings (summary)
The SoA mechanism is confirmed, but the render-speedup framing is disproven:
checksum.
subsumes).
that. The other ~90% is path-string materialization + PathStoreVisibleRow object allocation, which SoA cannot touch.
Conclusion
Worth doing for memory footprint and frame-time consistency on huge trees, and it lands the full 3x on
the count-only sweeps. It is not the right lever to speed up
getVisibleSlice(that path is string/allocation-bound —optimize
pathCacheByNodeIdand per-row object allocation first).Benchmark
packages/path-store/scripts/benchmarkNodeStorageSoA.ts(see below). Builds the real linux-10x snapshot viaPathStoreBuilder, mirrors it into SoA, and runs the identical preorder DFS over each representation, writing intoidentical
Int32Arrayoutput buffers and accumulating a checksum (checksumMatch=trueconfirms both sweeps do exactlythe same work; only storage layout differs).
Isolated node-sweep (all directories expanded, full tree):
Memory (linux-10x, node objects only): AoS 35.7 MB vs SoA 19.3 MB (matches the 20-bytes/node analytical floor).
Real-projection context (linux-10x):
getVisibleSlice(0, all)median 58ms; comparable isolated node-sweep 6.2ms → nodeaccess ≈ 10% of the real path.
Where the full win lands (count-only sweeps, no string work):
initializeOpenVisibleCounts—store.ts:143recomputeCountsRecursive—canonical.ts:342StaticPathStorerecompute on every expand/collapse —static-store.ts:361builder.ts:1087Where it does NOT help: the render projection DFS, because path strings (
${parentPath}${segment}/) andPathStoreVisibleRowallocation dominate.Implementation sketch
Goal: replace the array-of-objects node store + per-directory
number[]/Mapwith a typed-array SoA + flat CSR childtable, behind the existing internal-types.ts accessor helpers so call sites barely change.
Int32Arrays (indexed bynodeId):Back all five with one
ArrayBuffer(or keep separate — benchmark both).NodeId/SegmentIdare already number and fitInt32.
directories: Map<NodeId, {childIds: number[]}>):This collapses 61K separate array allocations + a Map into 3 dense buffers. StaticPathStoreSnapshot.childIds
(static-store.ts:45) already proves this CSR layout works — it just uses number[] instead of Int32Array.
internal-types.ts:65-95) as the abstraction seam — change them fromnode.depthAndFlagsto
store.depthAndFlags[id]:Most call sites already go through these helpers, so they migrate by passing (store, id) instead of (node).
Mutation / growth = manual arena allocator. Replace Array.push of node objects with append-at-length into the
typed arrays, doubling capacity (new Int32Array(cap*2); buf.set(old)) when full. For node removal, reuse the existing
PATH_STORE_NODE_FLAG_REMOVED tombstone (internal-types.ts:19) and optionally a free-list of reclaimed ids; the CSR
child table is rebuilt on structural mutation exactly as rebuildVisibleChildChunks already does for chunk sums
(child-index.ts:273).
Build path. PathStoreBuilder.finish (builder.ts:687) currently emits the AoS snapshot; add a finish variant that
writes directly into the SoA buffers. The builder already assigns node IDs sequentially in DFS preorder for presorted
ingest, so childStart/childCount fill in one forward pass.
Scope / sequencing. This touches the core store, so gate it: land the benchmark first (done), then prototype SoA
behind a flag and re-run
benchmarkNodeStorageSoA.ts+benchmarkVisibleTreeProjection.tsto confirm the memory/GC winwithout regressing the string-bound render path. Cheap adjacent win to grab regardless: convert the existing
StaticPathStoreSnapshot.childIds/childPositionByNodeIdfromnumber[]toInt32Array(static-store.ts:45-46) —near-zero risk, speeds the sibling-advance loop at
static-store.ts:716.Reproduce the benchmark:
Acceptance signal: SoA shows ≥1.7x lower retained memory and flat p95≈median on the node sweep, with no regression on
benchmarkVisibleTreeProjection.ts.