You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Add WAL compaction to the kernel so a long-lived, file-backed LibreDB
database can reclaim space and bound its open (recovery) time. Today the on-disk .libredb file is a pure append-only write-ahead log with no mechanism to ever
shrink it: every set/delete appends a new record, superseded records are
never reclaimed, and the file grows without bound for the life of the database.
This is a known limitation, already named in docs/DESIGN.md §7 ("Production
arc": "WAL compaction/checkpointing — both known limitations"). This issue
exists to scope it concretely. It is exploratory, not a commitment to a
specific design — each choice must earn its place against the comprehension
budget (CLAUDE.md: "core.ts stays small because it is genuinely minimal").
Background: the WAL is the database
The kernel (src/core.ts) keeps no separate data file. The WAL is the whole
durable representation:
Each committed transaction is appended as one length-framed, CRC-32'd record
(core.ts record format, ~L275-294) and fsync'd before the commit is exposed.
On open, recover() (core.ts ~L397) replays every intact record in order to
rebuild the committed state, then truncates any torn tail.
The "current state" only ever lives in memory as a sorted array
(committed, core.ts ~L458). Disk holds only the log.
Because the log is append-only and nothing rewrites it, superseded and deleted
data stay on disk forever:
Overwriting a key writes a new set record; the old one remains dead weight.
Deleting a key writes a delete tombstone (core.ts ~L239); both the original set and the tombstone remain on disk even though the key no longer exists.
Concrete evidence
Decoding the shipped sample libredb-studio/data/demo.libredb (719 bytes, 11
records) shows the effect in miniature — project:1 is written twice:
record op key value
8 SET project:1 libredb
9 SET project:1 libredb <- supersedes record 8; record 8 is now dead
Record 8 is pure dead weight: 25 bytes of file that recovery must still read and
replay on every open, only to be overwritten by record 9. At demo scale this is
noise; on hr.libredb after months of updates it is the dominant cost. A
workload that repeatedly updates the same rows (a counter, a status column, a session:* heartbeat) grows the file in proportion to the number of writes
ever made, not the amount of live data.
Impact
Two costs grow unbounded with write volume, independent of live data size:
Disk footprint. The file size tracks total historical writes, not the
current row count. A small database that is written often becomes a large
file.
Open / recovery time.recover() replays the entire log on every open.
More dead records means a slower cold start — and this is the only path to
rebuilding in-memory state, so it is on the critical path of every process
start.
This is the same class of problem operators know from other engines
(PostgreSQL dead tuples reclaimed by VACUUM; SQLite VACUUM; log-structured
stores' background compaction). LibreDB currently has the accumulation with none
of the reclamation.
Proposed approach: rewrite-and-rename compaction
Compaction is conceptually the inverse of recover(): instead of log -> memory,
it does memory -> a fresh, minimal log. The current in-memory committed state
is already the fully-collapsed truth (one entry per live key, no tombstones), so
compaction is essentially "serialize committed into a brand-new WAL and swap it
in."
Crash-safe sequence (the standard atomic-replace pattern):
Encode the current committed entries as a fresh log — one set record per
live key, zero tombstones, zero superseded versions. (Could be one record per
entry, or a single batched record; TBD.)
Write it to a temporary sibling file (e.g. <path>.compact), then fsync it.
Atomically rename() the temp file over the live path.
fsync the containing directory so the rename itself is durable.
Continue appending new commits to the now-compacted file.
Because rename() is atomic, a crash at any point leaves either the old
complete log or the new complete log intact — never a half-built file. The
existing recovery invariant (recovered state is always a valid committed prefix)
must continue to hold across a crash during compaction.
Design considerations & open questions
Trigger policy. Manual (compact() on Database, or a standalone
function) vs automatic (e.g. when dead-byte ratio or file/live size ratio
crosses a threshold) vs both. A manual API is the smaller, more honest first
step; automatic policy can layer on later. What is the API surface, and does it
belong on the Database interface or beside open?
FileSystem seam changes (guarded-core impact). The current FileSystem
interface exposes only open(path), and WalFile only size/read/append/ fsync/truncate/close (core.ts ~L109-134). Atomic rewrite-and-rename needs
new capabilities — create/write a temp file, rename, and ideally directory fsync. This widens the one guarded IO boundary, so it needs the same
heavy-review + DST treatment as any core.ts change, and matching adapters in
the Node entry (index.ts) and the browser/OPFS entry. Keep the additions the
minimal set compaction actually performs — nothing more.
In-memory databases are unaffected. A pathless open() has no log
(log === null); compaction must be a no-op (or unavailable) there.
Concurrency / re-entrancy. Compaction mutates the durable file while the
single-threaded transaction model owns committed. It must respect the same
serial discipline as transact (core.ts ~L490 inTransaction guard) and not
run interleaved with a live transaction.
DST coverage (mandatory). Crash-during-compaction must be added to the
deterministic-simulation harness (src/sim/), asserting the invariant that a
crash at any injected point recovers to exactly the committed state — from
either the pre- or post-rename file. Compaction without this coverage is not
done.
Honesty discipline. This must not smuggle complexity into core.ts beyond
what compaction genuinely requires. If it cannot be added while keeping the
kernel comprehensible, that tension is itself a finding worth recording.
Acceptance criteria (draft)
A file-backed database can be compacted, after which the file contains no
dead/superseded records and no tombstones for already-absent keys.
Post-compaction, recover() on the compacted file reproduces the exact
same committed state as before compaction.
A crash injected at any point during compaction recovers to the correct
committed state (verified in src/sim/), from either the old or new file.
In-memory (pathless) databases are handled explicitly (no-op / unsupported,
documented).
Coverage stays at 100% and bun run gate is green.
Non-goals (for this issue)
Online/background/incremental compaction concurrent with live writes (the first
cut can be a synchronous, stop-the-world rewrite).
Automatic scheduling heuristics, if the first cut ships a manual trigger.
A separate on-disk data file / checkpointing architecture — LibreDB
deliberately keeps "the WAL is the database"; this issue reclaims space within
that model, it does not replace it.
References
docs/DESIGN.md §7 "Production arc" — names WAL compaction/checkpointing as a
known limitation.
src/core.ts — record format (~L275-294), recover() (~L397), FileSystem/ WalFile seam (~L109-134), committed in-memory state (~L458), re-entrancy
guard (~L490).
ARCHITECTURE.md — "the WAL is the database" (no separate data file).
Summary
Add WAL compaction to the kernel so a long-lived, file-backed LibreDB
database can reclaim space and bound its open (recovery) time. Today the on-disk
.libredbfile is a pure append-only write-ahead log with no mechanism to evershrink it: every
set/deleteappends a new record, superseded records arenever reclaimed, and the file grows without bound for the life of the database.
This is a known limitation, already named in
docs/DESIGN.md§7 ("Productionarc": "WAL compaction/checkpointing — both known limitations"). This issue
exists to scope it concretely. It is exploratory, not a commitment to a
specific design — each choice must earn its place against the comprehension
budget (
CLAUDE.md: "core.ts stays small because it is genuinely minimal").Background: the WAL is the database
The kernel (
src/core.ts) keeps no separate data file. The WAL is the wholedurable representation:
(
core.tsrecord format, ~L275-294) and fsync'd before the commit is exposed.recover()(core.ts~L397) replays every intact record in order torebuild the committed state, then truncates any torn tail.
(
committed,core.ts~L458). Disk holds only the log.Because the log is append-only and nothing rewrites it, superseded and deleted
data stay on disk forever:
setrecord; the old one remains dead weight.deletetombstone (core.ts~L239); both the originalsetand the tombstone remain on disk even though the key no longer exists.Concrete evidence
Decoding the shipped sample
libredb-studio/data/demo.libredb(719 bytes, 11records) shows the effect in miniature —
project:1is written twice:Record 8 is pure dead weight: 25 bytes of file that recovery must still read and
replay on every open, only to be overwritten by record 9. At demo scale this is
noise; on
hr.libredbafter months of updates it is the dominant cost. Aworkload that repeatedly updates the same rows (a counter, a status column, a
session:*heartbeat) grows the file in proportion to the number of writesever made, not the amount of live data.
Impact
Two costs grow unbounded with write volume, independent of live data size:
current row count. A small database that is written often becomes a large
file.
recover()replays the entire log on every open.More dead records means a slower cold start — and this is the only path to
rebuilding in-memory state, so it is on the critical path of every process
start.
This is the same class of problem operators know from other engines
(PostgreSQL dead tuples reclaimed by
VACUUM; SQLiteVACUUM; log-structuredstores' background compaction). LibreDB currently has the accumulation with none
of the reclamation.
Proposed approach: rewrite-and-rename compaction
Compaction is conceptually the inverse of
recover(): instead of log -> memory,it does memory -> a fresh, minimal log. The current in-memory
committedstateis already the fully-collapsed truth (one entry per live key, no tombstones), so
compaction is essentially "serialize
committedinto a brand-new WAL and swap itin."
Crash-safe sequence (the standard atomic-replace pattern):
committedentries as a fresh log — onesetrecord perlive key, zero tombstones, zero superseded versions. (Could be one record per
entry, or a single batched record; TBD.)
<path>.compact), thenfsyncit.rename()the temp file over the live path.fsyncthe containing directory so the rename itself is durable.Because
rename()is atomic, a crash at any point leaves either the oldcomplete log or the new complete log intact — never a half-built file. The
existing recovery invariant (recovered state is always a valid committed prefix)
must continue to hold across a crash during compaction.
Design considerations & open questions
Trigger policy. Manual (
compact()onDatabase, or a standalonefunction) vs automatic (e.g. when dead-byte ratio or file/live size ratio
crosses a threshold) vs both. A manual API is the smaller, more honest first
step; automatic policy can layer on later. What is the API surface, and does it
belong on the
Databaseinterface or besideopen?FileSystem seam changes (guarded-core impact). The current
FileSysteminterface exposes only
open(path), andWalFileonlysize/read/append/ fsync/truncate/close(core.ts~L109-134). Atomic rewrite-and-rename needsnew capabilities — create/write a temp file,
rename, and ideally directoryfsync. This widens the one guarded IO boundary, so it needs the sameheavy-review + DST treatment as any
core.tschange, and matching adapters inthe Node entry (
index.ts) and the browser/OPFS entry. Keep the additions theminimal set compaction actually performs — nothing more.
Interaction with Explore kernel durability hardening (WAL record headers, fsync-batching mode, expanded DST fault profiles) #9 (durability hardening). Compaction writes a brand-new
file from scratch, which is the natural moment to also write a versioned/magic
file header if Explore kernel durability hardening (WAL record headers, fsync-batching mode, expanded DST fault profiles) #9's direction docs: add CI, quality gate, and coverage badges to README #1 lands. The two should be designed to compose
(a compacted file is a good place to start emitting the new format).
In-memory databases are unaffected. A pathless
open()has no log(
log === null); compaction must be a no-op (or unavailable) there.Concurrency / re-entrancy. Compaction mutates the durable file while the
single-threaded transaction model owns
committed. It must respect the sameserial discipline as
transact(core.ts~L490inTransactionguard) and notrun interleaved with a live transaction.
DST coverage (mandatory). Crash-during-compaction must be added to the
deterministic-simulation harness (
src/sim/), asserting the invariant that acrash at any injected point recovers to exactly the committed state — from
either the pre- or post-rename file. Compaction without this coverage is not
done.
Honesty discipline. This must not smuggle complexity into
core.tsbeyondwhat compaction genuinely requires. If it cannot be added while keeping the
kernel comprehensible, that tension is itself a finding worth recording.
Acceptance criteria (draft)
dead/superseded records and no tombstones for already-absent keys.
recover()on the compacted file reproduces the exactsame committed state as before compaction.
committed state (verified in
src/sim/), from either the old or new file.documented).
bun run gateis green.Non-goals (for this issue)
cut can be a synchronous, stop-the-world rewrite).
deliberately keeps "the WAL is the database"; this issue reclaims space within
that model, it does not replace it.
References
docs/DESIGN.md§7 "Production arc" — names WAL compaction/checkpointing as aknown limitation.
src/core.ts— record format (~L275-294),recover()(~L397),FileSystem/WalFileseam (~L109-134),committedin-memory state (~L458), re-entrancyguard (~L490).
ARCHITECTURE.md— "the WAL is the database" (no separate data file).file rewrite here).