Evaluate struct-of-arrays node storage for @pierre/path-store

The following analysis was produced with AI assistance. If the team is interested in pursuing this further, I'm happy to do additional benchmarking with your guidance.

# Background

The central node store in `packages/path-store` is an array of ~1M plain objects (`nodes: PathStoreNode[]`) plus per-directory `childIds: number[]` arrays and a `directories: Map`. Every field on `PathStoreNode` is already a numeric integer (internal-types.ts:45-51):

```
interface PathStoreNode {
  parentId; nameId; depthAndFlags; subtreeNodeCount; visibleSubtreeCount;
}
```

# Hypothesis

Converting this to a Struct-of-Arrays (SoA) layout (five parallel `Int32Array`s plus a flat CSR child
table) will meaningfully speed up the hot full-tree projection sweep (`buildVisibleTreeProjectionDataDFS`,
`projection.ts:763`) and the construction/count passes at ~1M nodes, by eliminating 1M heap objects, improving cache
locality, and removing GC pressure.

# Findings (summary)

The SoA mechanism is confirmed, but the render-speedup framing is disproven:

- ✅ ~3x faster on the isolated node DFS sweep (6.2ms → 2.0ms at 990K nodes), validated with an identical-work
checksum.
- ✅ ~1.85x less memory (35.7 MB → 19.3 MB for node objects alone; excludes the child arrays + Map that SoA also
subsumes).
- ✅ GC tail latency eliminated — AoS p95 is 15.6ms (7.5x its median); SoA p95 ≈ median (flat).
- ❌ But only ~7% end-to-end on the render path. The real getVisibleSlice is ~58ms; node-field access is only ~10% of
that. The other ~90% is path-string materialization + PathStoreVisibleRow object allocation, which SoA cannot touch.

## Conclusion

Worth doing for memory footprint and frame-time consistency on huge trees, and it lands the full 3x on
the count-only sweeps. It is not the right lever to speed up `getVisibleSlice` (that path is string/allocation-bound —
optimize `pathCacheByNodeId` and per-row object allocation first).

## Benchmark

`packages/path-store/scripts/benchmarkNodeStorageSoA.ts` (see below). Builds the real linux-10x snapshot via
`PathStoreBuilder`, mirrors it into SoA, and runs the identical preorder DFS over each representation, writing into
identical `Int32Array` output buffers and accumulating a checksum (`checksumMatch=true` confirms both sweeps do exactly
the same work; only storage layout differs).

Isolated node-sweep (all directories expanded, full tree):

| Workload | Nodes | AoS media | SoA media | Speedup | AoS p95 | SoA p95 |
|---------- | ------| -----------| -----------| --------| --------| --------|
| linux-1x  |   99K |     0.55ms |     0.20ms |   2.77x |    0.63 |    0.55 |
| linux-5x  |  495K |     2.80ms |     1.00ms |   2.81x |    4.30 |    2.86 |
| linux-10x |  990K |     6.24ms |     1.97ms |   3.12x |    15.6 |    2.06 |

Memory (linux-10x, node objects only): AoS 35.7 MB vs SoA 19.3 MB (matches the 20-bytes/node analytical floor).

Real-projection context (linux-10x): `getVisibleSlice(0, all)` median 58ms; comparable isolated node-sweep 6.2ms → node
access ≈ 10% of the real path.

Where the full win lands (count-only sweeps, no string work):
- `initializeOpenVisibleCounts` — `store.ts:143`
- `recomputeCountsRecursive` — `canonical.ts:342`
- `StaticPathStore` recompute on every expand/collapse — `static-store.ts:361`
- builder subtree-count accumulation — `builder.ts:1087`

Where it does NOT help: the render projection DFS, because path strings (`${parentPath}${segment}/`) and
`PathStoreVisibleRow` allocation dominate.

# Implementation sketch

**Goal**: replace the array-of-objects node store + per-directory `number[]/Map` with a typed-array SoA + flat CSR child
table, behind the existing internal-types.ts accessor helpers so call sites barely change.

1. Node fields → parallel `Int32Array`s (indexed by `nodeId`):

```
interface SoaNodeStore {
  parentId: Int32Array;
  nameId: Int32Array;
  depthAndFlags: Int32Array;   // keep the existing depth<<4 | kind<<3 | flags packing
  subtreeNodeCount: Int32Array;
  visibleSubtreeCount: Int32Array;
  length: number;              // logical node count (arrays may over-allocate)
}
```

Back all five with one `ArrayBuffer` (or keep separate — benchmark both). `NodeId/SegmentId` are already number and fit
Int32.

2. Children → flat CSR table (replaces `directories: Map<NodeId, {childIds: number[]}>`):

```
childStart: Int32Array;   // childStart[id] = offset into childIdsFlat
childCount: Int32Array;   // childCount[id] = number of direct children
childIdsFlat: Int32Array; // all child ids, concatenated per-directory
```

This collapses 61K separate array allocations + a Map into 3 dense buffers. StaticPathStoreSnapshot.childIds
(static-store.ts:45) already proves this CSR layout works — it just uses number[] instead of Int32Array.

3. Keep the accessor helpers (`internal-types.ts:65-95`) as the abstraction seam — change them from `node.depthAndFlags`
to `store.depthAndFlags[id]`:

```
getNodeDepth(store, id)      => store.depthAndFlags[id] >>> 4
isDirectoryNode(store, id)   => (store.depthAndFlags[id] & (1<<3)) !== 0
getNodeFlags(store, id)      => store.depthAndFlags[id] & 0b111
addNodeFlag(store, id, flag) => store.depthAndFlags[id] |= flag
```

Most call sites already go through these helpers, so they migrate by passing (store, id) instead of (node).

4. Mutation / growth = manual arena allocator. Replace Array.push of node objects with append-at-length into the
typed arrays, doubling capacity (new Int32Array(cap*2); buf.set(old)) when full. For node removal, reuse the existing
PATH_STORE_NODE_FLAG_REMOVED tombstone (internal-types.ts:19) and optionally a free-list of reclaimed ids; the CSR
child table is rebuilt on structural mutation exactly as rebuildVisibleChildChunks already does for chunk sums
(child-index.ts:273).

5. Build path. PathStoreBuilder.finish (builder.ts:687) currently emits the AoS snapshot; add a finish variant that
writes directly into the SoA buffers. The builder already assigns node IDs sequentially in DFS preorder for presorted
ingest, so childStart/childCount fill in one forward pass.

6. Scope / sequencing. This touches the core store, so gate it: land the benchmark first (done), then prototype SoA
behind a flag and re-run `benchmarkNodeStorageSoA.ts` + `benchmarkVisibleTreeProjection.ts` to confirm the memory/GC win
without regressing the string-bound render path. Cheap adjacent win to grab regardless: convert the existing
`StaticPathStoreSnapshot.childIds` / `childPositionByNodeId` from `number[]` to `Int32Array` (`static-store.ts:45-46`) —
near-zero risk, speeds the sibling-advance loop at `static-store.ts:716`.

Reproduce the benchmark:

```
cd packages/path-store
AGENT=1 bun run ./scripts/benchmarkNodeStorageSoA.ts --workload linux-10x --runs 30 --warmup-runs 8
```

Acceptance signal: SoA shows ≥1.7x lower retained memory and flat p95≈median on the node sweep, with no regression on
`benchmarkVisibleTreeProjection.ts`.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Evaluate struct-of-arrays node storage for @pierre/path-store #776

Background

Hypothesis

Findings (summary)

Conclusion

Benchmark

Implementation sketch

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Workload	Nodes	AoS media	SoA media	Speedup	AoS p95	SoA p95
linux-1x	99K	0.55ms	0.20ms	2.77x	0.63	0.55
linux-5x	495K	2.80ms	1.00ms	2.81x	4.30	2.86
linux-10x	990K	6.24ms	1.97ms	3.12x	15.6	2.06

Evaluate struct-of-arrays node storage for @pierre/path-store #776

Description

Background

Hypothesis

Findings (summary)

Conclusion

Benchmark

Implementation sketch

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions