Explore kernel durability hardening (WAL record headers, fsync-batching mode, expanded DST fault profiles)

## Summary

Exploratory issue to scope **kernel durability hardening** — improvements to the
write-ahead log, the durability contract, and the deterministic-simulation
testing (DST) harness — that go a step beyond what the distribution-channels work
(#6) deliberately left untouched.

Context: after #6 the kernel (`src/core.ts`) is small, **runtime-agnostic** (zero
`node:` imports), 100% line/function/statement covered, and its crash/recovery is
already exercised by the DST harness in `src/sim/`. The WAL *is* the database
(no separate data file; see `ARCHITECTURE.md`), with each committed transaction
written as a length-framed, CRC-32'd record and fsync'd before the commit is
exposed. The ideas below are about making that foundation **more robust and more
evolvable** without betraying the manifesto — `core.ts` stays small because it is
genuinely minimal, never because complexity was swept elsewhere.

This is exploratory and **not** a commitment to build all (or any) of it. Each
candidate must earn its place against the comprehension budget.

## Candidate directions

### 1. Versioned / magic record (or file) header for the WAL
Today a record is `[u32 payloadLength][u32 crc32(payload)][payload]` with no magic
number and no format version. A small magic + version header would let recovery:
- **Detect a wrong/foreign/corrupt file early** with a clear error, instead of
  misparsing arbitrary bytes as a length-framed record.
- **Evolve the on-disk format forward-compatibly** (new record kinds/fields,
  alternative codecs) by branching on a version, rather than being frozen.
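
To make the misparsing concern concrete, the sketch below decodes the layout described above the way a headerless recovery loop might. It is an illustration under assumptions, not the code in `src/core.ts`: little-endian byte order, the names `encodeRecord` and `recover`, and the policy of stopping at the first invalid record are all hypothetical.

```typescript
// CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320), table-driven.
const CRC_TABLE: Uint32Array = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    }
    table[n] = c >>> 0;
  }
  return table;
})();

function crc32(bytes: Uint8Array): number {
  let c = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    c = CRC_TABLE[(c ^ bytes[i]) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0;
}

// Frame one payload as [u32 payloadLength][u32 crc32(payload)][payload].
// ASSUMPTION: little-endian; the issue does not state the byte order.
function encodeRecord(payload: Uint8Array): Uint8Array {
  const out = new Uint8Array(8 + payload.length);
  const view = new DataView(out.buffer);
  view.setUint32(0, payload.length, true);
  view.setUint32(4, crc32(payload), true);
  out.set(payload, 8);
  return out;
}

// Hypothetical recovery loop: accept records until the first one that is
// incomplete or fails its CRC, and report how many bytes were valid.
function recover(log: Uint8Array): { payloads: Uint8Array[]; validBytes: number } {
  const view = new DataView(log.buffer, log.byteOffset, log.byteLength);
  const payloads: Uint8Array[] = [];
  let offset = 0;
  while (offset + 8 <= log.length) {
    const length = view.getUint32(offset, true);
    const expected = view.getUint32(offset + 4, true);
    const end = offset + 8 + length;
    if (end > log.length) break; // torn tail, or foreign bytes read as a length
    const payload = log.subarray(offset + 8, end);
    if (crc32(payload) !== expected) break;
    payloads.push(payload);
    offset = end;
  }
  return { payloads, validBytes: offset };
}
```

Under these assumptions, sixteen zero bytes decode as two valid empty records (the CRC-32 of an empty payload is 0), while the first bytes of a PNG file decode as zero records, which looks the same as an empty log or a torn first record. Neither case produces a clear "this is not a WAL" error.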

Open questions: per-file header vs per-record version; how recovery reacts to an
unknown/newer version (refuse vs best-effort); byte overhead vs the "WAL is the
database" minimalism; migration story for files written by today's headerless
format.
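
One possible shape for the per-file variant is sketched below, purely to anchor the open questions. The magic value `LDBW`, the 8-byte layout, the little-endian order, and the names `writeHeader` and `checkHeader` are all hypothetical; none of them is a decided format. The sketch takes the "refuse" side of the unknown-version question and leaves the migration question open: a headerless file written by today's format would be reported as `not-a-wal`.

```typescript
// Hypothetical per-file header: 4 magic bytes, u16 version, u16 reserved.
// ASSUMPTION: every value here is illustrative, including the byte order.
const MAGIC = [0x4c, 0x44, 0x42, 0x57]; // ASCII "LDBW"
const HEADER_SIZE = 8;
const SUPPORTED_VERSION = 1;

type HeaderCheck =
  | { kind: "ok"; version: number; bodyOffset: number }
  | { kind: "empty" } // brand-new file: the caller may write a header
  | { kind: "not-a-wal" } // wrong magic, or too short to hold a header
  | { kind: "unsupported-version"; version: number };

function writeHeader(version: number = SUPPORTED_VERSION): Uint8Array {
  const out = new Uint8Array(HEADER_SIZE);
  out.set(MAGIC, 0);
  new DataView(out.buffer).setUint16(4, version, true);
  return out;
}

function checkHeader(file: Uint8Array): HeaderCheck {
  if (file.length === 0) return { kind: "empty" };
  // A header torn by a crash during file creation also lands here; telling
  // that apart from a foreign file is part of the open design question.
  if (file.length < HEADER_SIZE) return { kind: "not-a-wal" };
  for (let i = 0; i < MAGIC.length; i++) {
    if (file[i] !== MAGIC[i]) return { kind: "not-a-wal" };
  }
  const view = new DataView(file.buffer, file.byteOffset, file.byteLength);
  const version = view.getUint16(4, true);
  if (version !== SUPPORTED_VERSION) {
    return { kind: "unsupported-version", version }; // refuse, do not guess
  }
  return { kind: "ok", version, bodyOffset: HEADER_SIZE };
}
```

Returning a tagged result instead of throwing keeps the policy decision (refuse, migrate, or best-effort) with the caller, which is one way to keep the header check itself small.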

### 2. A documented fsync-batching (group-commit) durability mode
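The kernel fsyncs on **every** commit: correct and simple, but per-commit fsync
caps write throughput. A standard database answer is **group commit**: let several
commits share one fsync. In a synchronous, single-process kernel commits arrive
one at a time, so batching here means returning from a commit before its fsync,
trading a *bounded, explicitly-documented* window of durability for throughput
(cf. Postgres `synchronous_commit = off` and `commit_delay`, SQLite WAL mode with
`PRAGMA synchronous=NORMAL`).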

The point of this issue is as much **documentation as implementation**: if such a
mode exists it must be **opt-in**, with the exact durability guarantee spelled out
(what a crash can lose, and when), and it must not quietly weaken the default. It
also has to stay honest with the synchronous, single-process model.

Open questions: API surface (an `open()` option? a fence/`flush()` call?); the
precise crash semantics; interaction with direction 1 (does reordering/batching change what
recovery must tolerate?); whether the default stays fsync-per-commit (it should).
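
As a starting point for the API and crash-semantics questions, here is a minimal sketch of an opt-in batching mode over an in-memory stand-in for a file. Everything in it is hypothetical: the `maxUnsyncedCommits` option, the `flush()` fence, and the `FakeFile` and `Log` classes are illustrative names, and the crash model (all un-synced bytes vanish) is a deliberate simplification.

```typescript
// In-memory stand-in that only tracks byte counts, not contents.
class FakeFile {
  written = 0; // bytes handed to write()
  durable = 0; // bytes covered by the most recent fsync()
  fsyncs = 0; // how many fsync() calls were issued

  write(byteCount: number): void {
    this.written += byteCount;
  }
  fsync(): void {
    this.durable = this.written;
    this.fsyncs++;
  }
  // ASSUMPTION: a crash loses exactly the bytes written since the last fsync.
  crash(): void {
    this.written = this.durable;
  }
}

interface LogOptions {
  // Hypothetical opt-in knob. The default of 1 keeps fsync-per-commit.
  maxUnsyncedCommits?: number;
}

class Log {
  private unsynced = 0;
  private readonly limit: number;
  private readonly file: FakeFile;

  constructor(file: FakeFile, options: LogOptions = {}) {
    const limit = options.maxUnsyncedCommits ?? 1;
    if (!Number.isInteger(limit) || limit < 1) {
      throw new RangeError("maxUnsyncedCommits must be an integer >= 1");
    }
    this.file = file;
    this.limit = limit;
  }

  // Append one record; fsync once the batch limit is reached.
  commit(recordBytes: number): void {
    this.file.write(recordBytes);
    this.unsynced++;
    if (this.unsynced >= this.limit) this.flush();
  }

  // Explicit durability fence: everything committed so far becomes durable.
  flush(): void {
    if (this.unsynced === 0) return;
    this.file.fsync();
    this.unsynced = 0;
  }
}
```

Under this model a limit of N means a crash can lose at most the N - 1 most recent commits, and a caller that needs a hard guarantee at a specific point calls `flush()`. With no option supplied, every commit is followed by an fsync, so the default guarantee is unchanged.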

### 3. Expanded DST fault profiles
The DST harness already tortures recovery under a simulated crashing filesystem.
Extend it with more realistic fault profiles so we *prove* (not assume) what the
format and recovery survive:
- **Partial writes** — `write` lands fewer bytes than asked; torn at an arbitrary
  offset, not just a clean tail.
- **Operation reordering** — writes/fsyncs reach durable storage out of order
  (especially relevant if direction 2 introduces batching).
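- Possibly: **bit-flip / corruption** (within the payload it covers, CRC-32 detects
  every single-bit flip and every error burst up to 32 bits long; quantify what
  slips through for wider corruption and for flips in the length field), delayed
  or dropped fsync, and faults injected mid-recovery.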
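
The partial-write and torn-tail profiles could be expressed as a seeded, in-memory file along the lines below. This is a sketch under assumptions and says nothing about how `src/sim/` is actually structured: `SimFile`, `FaultProfile`, `partialWriteChance`, and `seededRandom` are hypothetical names, and reordering and bit flips are left out to keep it short.

```typescript
// Mulberry32-style seeded generator: the same seed yields the same fault
// schedule, which keeps a failing run reproducible.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface FaultProfile {
  partialWriteChance: number; // probability in [0, 1] that a write() is short
}

class SimFile {
  private durable: number[] = []; // bytes that survived an fsync
  private pending: number[] = []; // bytes written but not yet fsync'd
  private readonly rand: () => number;
  private readonly profile: FaultProfile;

  constructor(seed: number, profile: FaultProfile) {
    this.rand = seededRandom(seed);
    this.profile = profile;
  }

  // Returns how many bytes were accepted, which may be fewer than requested.
  write(bytes: Uint8Array): number {
    let accepted = bytes.length;
    if (bytes.length > 0 && this.rand() < this.profile.partialWriteChance) {
      accepted = Math.floor(this.rand() * bytes.length); // 0 .. length - 1
    }
    for (let i = 0; i < accepted; i++) this.pending.push(bytes[i]);
    return accepted;
  }

  fsync(): void {
    this.durable.push(...this.pending);
    this.pending = [];
  }

  // Simulated power loss: synced bytes survive, plus a random prefix of the
  // un-synced bytes, so the tail can be torn at any offset.
  crash(): Uint8Array {
    const kept = Math.floor(this.rand() * (this.pending.length + 1));
    this.durable = this.durable.concat(this.pending.slice(0, kept));
    this.pending = [];
    return Uint8Array.from(this.durable);
  }
}
```

A test would drive the kernel against `SimFile` for many seeds, call `crash()` at chosen points, and assert that recovery on the returned image yields a prefix of the committed transactions. Because all randomness flows from the seed, any failing seed replays exactly.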

Open questions: which faults are realistic for the real backends we target
(Node/Bun `fs`, OPFS in a Worker, future adapters); which the current format
already survives vs which motivate directions 1 and 2; keeping the harness deterministic and
fast.

## Non-goals
- Growing `core.ts` for its own sake. Each change must reduce risk *and* keep the
  kernel comprehensible — the metric is comprehension time, not line count.
- Cryptographic tamper-resistance. CRC-32 is error detection, not an integrity
  guarantee against a writer who already has file access (out of the threat model).
- Anything that compromises the zero-runtime-dependency or
  embedded/single-process posture, or that weakens the **default** durability.

## Constraints (carried from the project)
- 100% line/function/statement coverage; deterministic tests (the gate stays green).
- Conventional commits; English-only; no emoji.
- Changesets for any user-facing change (the published package / public API / types).
- Decisions that touch `core.ts` are guarded-core changes: heavy review, and they
  must not contradict `docs/DESIGN.md` without explicitly reopening the decision.

## Required first step: detailed research before any implementation

Before starting any development, a thorough investigation is mandatory. Given that
LibreDB has an embedded, FoundationDB-style architecture (one small ordered
key-value core with thin model lenses on top), the research must establish, for
each candidate above, **which durability mechanisms genuinely belong in a database
like this** and which would add complexity the design refuses. The research should
map each candidate to how well it fits the embedded, single-process,
zero-dependency design — and what it costs in comprehension, the kernel's real
budget — before we commit to building anything. Study the prior art closely
(SQLite WAL, FoundationDB, Postgres group commit, libSQL/Turso's DST practice).
The deliverable is a design note (in the spirit of the #6 research doc under
`docs/`), reviewed and agreed, **before any code is written**.

## Suggested order
1. Research note covering all three directions (feasibility, fit, cost, prior art).
2. Direction 1 (record/file header): smallest, enables the rest and improves diagnostics.
3. Direction 3 (DST fault profiles): so any durability change is provable.
4. Direction 2 (fsync-batching mode): highest risk; only after directions 1 and 3 make it safe to reason about.

Related: builds on the WAL/recovery described in `ARCHITECTURE.md` and the locked
decisions in `docs/DESIGN.md`; sibling to the distribution-channels work in #6.

