Exploratory issue to scope kernel durability hardening — improvements to the
write-ahead log, the durability contract, and the deterministic-simulation
testing (DST) harness — that go a step beyond what the distribution-channels work
(#6) deliberately left untouched.
Context: after #6 the kernel (src/core.ts) is small, runtime-agnostic (zero node: imports), 100% line/function/statement covered, and its crash/recovery is
already exercised by the DST harness in src/sim/. The WAL is the database
(no separate data file; see ARCHITECTURE.md), with each committed transaction
written as a length-framed, CRC-32'd record and fsync'd before the commit is
exposed. The ideas below are about making that foundation more robust and more
evolvable without betraying the manifesto — core.ts stays small because it is
genuinely minimal, never because complexity was swept elsewhere.
This is exploratory and not a commitment to build all (or any) of it. Each
candidate must earn its place against the comprehension budget.
Candidate directions
1. Versioned / magic record (or file) header for the WAL
Today a record is [u32 payloadLength][u32 crc32(payload)][payload] with no magic
number and no format version. A small magic + version header would let recovery:
- Detect a wrong/foreign/corrupt file early with a clear error, instead of
  misparsing arbitrary bytes as a length-framed record.
- Evolve the on-disk format forward-compatibly (new record kinds/fields,
  alternative codecs) by branching on a version, rather than being frozen.
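For orientation, the current framing can be sketched as below. This is an illustration only, not the kernel's code: `encodeRecord` and `decodeRecord` are hypothetical names, little-endian byte order is an assumption (the issue does not state it), and the checksum is assumed to be the common IEEE CRC-32.

```typescript
// CRC-32 (IEEE, reflected polynomial 0xEDB88320), bitwise variant.
// Assumption: the kernel uses this common CRC-32 flavor.
function crc32(bytes: Uint8Array): number {
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    crc ^= bytes[i];
    for (let bit = 0; bit < 8; bit++) {
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

const HEADER_BYTES = 8; // u32 payloadLength + u32 crc32(payload)

// Frame a payload as [u32 payloadLength][u32 crc32(payload)][payload].
function encodeRecord(payload: Uint8Array): Uint8Array {
  const out = new Uint8Array(HEADER_BYTES + payload.length);
  const view = new DataView(out.buffer);
  view.setUint32(0, payload.length, true); // little-endian is an assumption
  view.setUint32(4, crc32(payload), true);
  out.set(payload, HEADER_BYTES);
  return out;
}

// Decode one record at `offset`; null means truncated or checksum mismatch.
function decodeRecord(
  log: Uint8Array,
  offset: number,
): { payload: Uint8Array; next: number } | null {
  if (offset + HEADER_BYTES > log.length) return null;
  const view = new DataView(log.buffer, log.byteOffset, log.byteLength);
  const length = view.getUint32(offset, true);
  const expected = view.getUint32(offset + 4, true);
  const start = offset + HEADER_BYTES;
  if (start + length > log.length) return null;
  const payload = log.subarray(start, start + length);
  if (crc32(payload) !== expected) return null;
  return { payload, next: start + length };
}
```

In this sketch any 8 bytes are accepted as a header, so a foreign file is rejected only once its claimed length overruns the file or its CRC fails to match. Eight zero bytes even decode as a valid empty record, which is the kind of misparse a magic number would rule out.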
Open questions: per-file header vs per-record version; how recovery reacts to an
unknown/newer version (refuse vs best-effort); byte overhead vs the "WAL is the
database" minimalism; migration story for files written by today's headerless
format.
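One possible answer to the first two open questions is a per-file header with a refuse-on-unknown policy. The sketch below is hypothetical throughout: the magic bytes, the 6-byte layout, the little-endian u16 version, and the function names are invented for illustration, and "refuse" is only one of the options listed above.

```typescript
// Hypothetical values: the issue proposes a magic + version but picks neither.
const MAGIC = [0x4c, 0x44, 0x42, 0x57]; // ASCII "LDBW", made up for this sketch
const SUPPORTED_VERSION = 1;
const FILE_HEADER_BYTES = 6; // 4 magic bytes + u16 version (little-endian, assumed)

// Build the header that would sit at byte 0 of a new WAL file.
function writeFileHeader(version: number): Uint8Array {
  const out = new Uint8Array(FILE_HEADER_BYTES);
  out.set(MAGIC, 0);
  new DataView(out.buffer).setUint16(4, version, true);
  return out;
}

// Validate the header before any record parsing; throws with a clear error
// instead of letting recovery misparse foreign bytes as a framed record.
function readFileHeader(file: Uint8Array): { version: number; bodyOffset: number } {
  if (file.length < FILE_HEADER_BYTES) {
    throw new Error("WAL too short to contain a header");
  }
  for (let i = 0; i < MAGIC.length; i++) {
    if (file[i] !== MAGIC[i]) throw new Error("not a WAL file: bad magic");
  }
  const view = new DataView(file.buffer, file.byteOffset, file.byteLength);
  const version = view.getUint16(4, true);
  if (version < 1 || version > SUPPORTED_VERSION) {
    // "Refuse" policy: never guess at a format this build does not know.
    throw new Error(`unsupported WAL version ${version}`);
  }
  return { version, bodyOffset: FILE_HEADER_BYTES };
}
```

A per-file header costs a fixed 6 bytes in this layout, whereas a per-record version would pay on every commit. The sketch does not address empty files or files written in today's headerless format, which are exactly the migration question above.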
2. A documented fsync-batching (group-commit) durability mode
The kernel fsyncs on every commit — correct and simple, but per-commit fsync
caps write throughput. A standard database answer is group commit: let several
commits share one fsync, trading a bounded, explicitly-documented window of
durability for throughput (cf. Postgres synchronous_commit, SQLite WAL).
The point of this issue is as much documentation as implementation: if such a
mode exists it must be opt-in, with the exact durability guarantee spelled out
(what a crash can lose, and when), and it must not quietly weaken the default. It
also has to stay honest with the synchronous, single-process model.
Open questions: API surface (an open() option? a fence/flush() call?); the
precise crash semantics; interaction with #1 (does reordering/batching change what
recovery must tolerate?); whether the default stays fsync-per-commit (it should).
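To make the trade-off concrete, here is a minimal sketch of a synchronous, single-process group-commit wrapper. Everything in it is an assumption for illustration: the `LogFile` interface, the `CountingFile` test double, the `maxBatch` option, and the explicit `flush()` fence are hypothetical and do not describe the kernel's API.

```typescript
// Hypothetical minimal file interface; not the kernel's real adapter API.
interface LogFile {
  append(bytes: Uint8Array): void;
  fsync(): void;
}

// Test double that tracks which bytes would survive a crash.
class CountingFile implements LogFile {
  appended = 0; // bytes handed to the OS, possibly still volatile
  durable = 0; // bytes covered by the most recent fsync
  fsyncCalls = 0;
  append(bytes: Uint8Array): void {
    this.appended += bytes.length;
  }
  fsync(): void {
    this.durable = this.appended;
    this.fsyncCalls++;
  }
}

class GroupCommitLog {
  private file: LogFile;
  private maxBatch: number;
  private pending = 0; // commits appended but not yet fsync'd

  constructor(file: LogFile, maxBatch: number) {
    this.file = file;
    this.maxBatch = maxBatch;
  }

  // Appends a record; returns true only if it is durable on return.
  commit(record: Uint8Array): boolean {
    this.file.append(record);
    this.pending++;
    if (this.pending >= this.maxBatch) {
      this.flush();
      return true;
    }
    return false;
  }

  // Explicit fence: everything committed so far becomes durable.
  flush(): void {
    if (this.pending === 0) return;
    this.file.fsync();
    this.pending = 0;
  }
}
```

With `maxBatch` set to 1 this degenerates to fsync-per-commit, which is how the default could stay unchanged. With a larger value, a crash can lose at most the `maxBatch - 1` most recent commits, namely those for which `commit` returned false and no `flush()` followed. That is the kind of bound the documentation would need to state.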
3. Expanded DST fault profiles
The DST harness already tortures recovery under a simulated crashing filesystem.
Extend it with more realistic fault profiles so we prove (not assume) what the
format and recovery survive:
- Partial writes: write lands fewer bytes than asked; torn at an arbitrary
  offset, not just a clean tail.
- Possibly: bit-flip / corruption (CRC-32 should catch these; quantify how
  reliably), delayed or dropped fsync, and faults injected mid-recovery.
Open questions: which faults are realistic for the real backends we target
(Node/Bun fs, OPFS in a Worker, future adapters); which the current format
already survives vs which motivate #1/#2; keeping the harness deterministic and
fast.
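A partial-write profile can stay deterministic by deriving every fault from a seed. The sketch below is not the harness in src/sim/; it is a self-contained illustration with hypothetical names (`mulberry32`, `frame`, `recover`, `tornAppend`, `runScenario`), an assumed little-endian record layout, and an assumed recovery rule of "stop at the first record that fails to decode".

```typescript
// Small seeded PRNG (mulberry32) so every run of a seed is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// CRC-32 (IEEE, reflected); assumed to match the kernel's checksum.
function crc32(bytes: Uint8Array): number {
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    crc ^= bytes[i];
    for (let bit = 0; bit < 8; bit++) crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// [u32 payloadLength][u32 crc32(payload)][payload], little-endian (assumed).
function frame(payload: Uint8Array): Uint8Array {
  const out = new Uint8Array(8 + payload.length);
  const view = new DataView(out.buffer);
  view.setUint32(0, payload.length, true);
  view.setUint32(4, crc32(payload), true);
  out.set(payload, 8);
  return out;
}

// Assumed recovery rule: count records until the first one that is
// truncated or fails its checksum, then stop.
function recover(log: Uint8Array): number {
  const view = new DataView(log.buffer, log.byteOffset, log.byteLength);
  let offset = 0;
  let count = 0;
  while (offset + 8 <= log.length) {
    const length = view.getUint32(offset, true);
    const end = offset + 8 + length;
    if (end > log.length) break;
    if (crc32(log.subarray(offset + 8, end)) !== view.getUint32(offset + 4, true)) break;
    offset = end;
    count++;
  }
  return count;
}

// Fault: the write lands only a seeded, arbitrary-length prefix of the record.
function tornAppend(log: Uint8Array, record: Uint8Array, rng: () => number): Uint8Array {
  const landed = Math.floor(rng() * record.length); // 0 .. length - 1
  const out = new Uint8Array(log.length + landed);
  out.set(log, 0);
  out.set(record.subarray(0, landed), log.length);
  return out;
}

// Three intact commits, then a fourth whose write is torn mid-record.
function runScenario(seed: number): { logLength: number; recovered: number } {
  const rng = mulberry32(seed);
  const randomRecord = (): Uint8Array => {
    const payload = new Uint8Array(1 + Math.floor(rng() * 16));
    for (let i = 0; i < payload.length; i++) payload[i] = Math.floor(rng() * 256);
    return frame(payload);
  };
  let log = new Uint8Array(0);
  for (let i = 0; i < 3; i++) {
    const record = randomRecord();
    const next = new Uint8Array(log.length + record.length);
    next.set(log, 0);
    next.set(record, log.length);
    log = next;
  }
  log = tornAppend(log, randomRecord(), rng);
  return { logLength: log.length, recovered: recover(log) };
}
```

The property under test is that recovery returns exactly the three intact commits for every seed, and that re-running a seed reproduces the same torn offset. Bit flips, dropped fsyncs, and mid-recovery faults would each need their own profile and are not covered here.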
Non-goals
- Growing core.ts for its own sake. Each change must reduce risk and keep the
  kernel comprehensible; the metric is comprehension time, not line count.
- Cryptographic tamper-resistance. CRC-32 is error detection, not an integrity
  guarantee against a writer who already has file access (out of the threat model).
- Anything that compromises the zero-runtime-dependency or
  embedded/single-process posture, or that weakens the default durability.
Constraints (carried from the project)
- 100% line/function/statement coverage; deterministic tests (the gate stays green).
- Conventional commits; English-only; no emoji.
- Changesets for any user-facing change (the published package / public API / types).
- Decisions that touch core.ts are guarded-core changes: heavy review, and they
  must not contradict docs/DESIGN.md without explicitly reopening the decision.
Required first step: detailed research before any implementation
Before starting any development, a thorough investigation is mandatory. Given that
LibreDB is an embedded, FoundationDB-style architecture (one small ordered
key-value core with thin model lenses on top), the research must establish, for
each candidate above, which durability mechanisms genuinely belong in a database
like this and which would add complexity the design refuses. The research should
map each candidate to how well it fits the embedded, single-process,
zero-dependency design — and what it costs in comprehension, the kernel's real
budget — before we commit to building anything. Study the prior art closely
(SQLite WAL, FoundationDB, Postgres group commit, libSQL/Turso's DST practice).
The deliverable is a design note (in the spirit of the #6 research doc under docs/), reviewed and agreed, before any code is written.
Suggested order
Research note covering all three directions (feasibility, fit, cost, prior art).
Related: builds on the WAL/recovery described in ARCHITECTURE.md and the locked
decisions in docs/DESIGN.md; sibling to the distribution-channels work in #6.