Skip to content

Add WAL compaction to reclaim space and bound recovery time #12

Description

@cevheri

Summary

Add WAL compaction to the kernel so a long-lived, file-backed LibreDB
database can reclaim space and bound its open (recovery) time. Today the on-disk
.libredb file is a pure append-only write-ahead log with no mechanism to ever
shrink it
: every set/delete appends a new record, superseded records are
never reclaimed, and the file grows without bound for the life of the database.

This is a known limitation, already named in docs/DESIGN.md §7 ("Production
arc": "WAL compaction/checkpointing — both known limitations"). This issue
exists to scope it concretely. It is exploratory, not a commitment to a
specific design — each choice must earn its place against the comprehension
budget (CLAUDE.md: "core.ts stays small because it is genuinely minimal").

Background: the WAL is the database

The kernel (src/core.ts) keeps no separate data file. The WAL is the whole
durable representation:

  • Each committed transaction is appended as one length-framed, CRC-32'd record
    (core.ts record format, ~L275-294) and fsync'd before the commit is exposed.
  • On open, recover() (core.ts ~L397) replays every intact record in order to
    rebuild the committed state, then truncates any torn tail.
  • The "current state" only ever lives in memory as a sorted array
    (committed, core.ts ~L458). Disk holds only the log.

Because the log is append-only and nothing rewrites it, superseded and deleted
data stay on disk forever
:

  • Overwriting a key writes a new set record; the old one remains dead weight.
  • Deleting a key writes a delete tombstone (core.ts ~L239); both the original
    set and the tombstone remain on disk even though the key no longer exists.

Concrete evidence

Decoding the shipped sample libredb-studio/data/demo.libredb (719 bytes, 11
records) shows the effect in miniature — project:1 is written twice:

record  op    key                    value
   8    SET   project:1              libredb
   9    SET   project:1              libredb   <- supersedes record 8; record 8 is now dead

Record 8 is pure dead weight: 25 bytes of file that recovery must still read and
replay on every open, only to be overwritten by record 9. At demo scale this is
noise; on hr.libredb after months of updates it is the dominant cost. A
workload that repeatedly updates the same rows (a counter, a status column, a
session:* heartbeat) grows the file in proportion to the number of writes
ever made
, not the amount of live data.

Impact

Two costs grow unbounded with write volume, independent of live data size:

  1. Disk footprint. The file size tracks total historical writes, not the
    current row count. A small database that is written often becomes a large
    file.
  2. Open / recovery time. recover() replays the entire log on every open.
    More dead records means a slower cold start — and this is the only path to
    rebuilding in-memory state, so it is on the critical path of every process
    start.

This is the same class of problem operators know from other engines
(PostgreSQL dead tuples reclaimed by VACUUM; SQLite VACUUM; log-structured
stores' background compaction). LibreDB currently has the accumulation with none
of the reclamation.

Proposed approach: rewrite-and-rename compaction

Compaction is conceptually the inverse of recover(): instead of log -> memory,
it does memory -> a fresh, minimal log. The current in-memory committed state
is already the fully-collapsed truth (one entry per live key, no tombstones), so
compaction is essentially "serialize committed into a brand-new WAL and swap it
in."

Crash-safe sequence (the standard atomic-replace pattern):

  1. Encode the current committed entries as a fresh log — one set record per
    live key, zero tombstones, zero superseded versions. (Could be one record per
    entry, or a single batched record; TBD.)
  2. Write it to a temporary sibling file (e.g. <path>.compact), then fsync it.
  3. Atomically rename() the temp file over the live path.
  4. fsync the containing directory so the rename itself is durable.
  5. Continue appending new commits to the now-compacted file.

Because rename() is atomic, a crash at any point leaves either the old
complete log or the new complete log intact — never a half-built file. The
existing recovery invariant (recovered state is always a valid committed prefix)
must continue to hold across a crash during compaction.

Design considerations & open questions

  • Trigger policy. Manual (compact() on Database, or a standalone
    function) vs automatic (e.g. when dead-byte ratio or file/live size ratio
    crosses a threshold) vs both. A manual API is the smaller, more honest first
    step; automatic policy can layer on later. What is the API surface, and does it
    belong on the Database interface or beside open?

  • FileSystem seam changes (guarded-core impact). The current FileSystem
    interface exposes only open(path), and WalFile only size/read/append/ fsync/truncate/close (core.ts ~L109-134). Atomic rewrite-and-rename needs
    new capabilities — create/write a temp file, rename, and ideally directory
    fsync. This widens the one guarded IO boundary, so it needs the same
    heavy-review + DST treatment as any core.ts change, and matching adapters in
    the Node entry (index.ts) and the browser/OPFS entry. Keep the additions the
    minimal set compaction actually performs — nothing more.

  • Interaction with Explore kernel durability hardening (WAL record headers, fsync-batching mode, expanded DST fault profiles) #9 (durability hardening). Compaction writes a brand-new
    file from scratch, which is the natural moment to also write a versioned/magic
    file header if Explore kernel durability hardening (WAL record headers, fsync-batching mode, expanded DST fault profiles) #9's direction docs: add CI, quality gate, and coverage badges to README #1 lands. The two should be designed to compose
    (a compacted file is a good place to start emitting the new format).

  • In-memory databases are unaffected. A pathless open() has no log
    (log === null); compaction must be a no-op (or unavailable) there.

  • Concurrency / re-entrancy. Compaction mutates the durable file while the
    single-threaded transaction model owns committed. It must respect the same
    serial discipline as transact (core.ts ~L490 inTransaction guard) and not
    run interleaved with a live transaction.

  • DST coverage (mandatory). Crash-during-compaction must be added to the
    deterministic-simulation harness (src/sim/), asserting the invariant that a
    crash at any injected point recovers to exactly the committed state — from
    either the pre- or post-rename file. Compaction without this coverage is not
    done.

  • Honesty discipline. This must not smuggle complexity into core.ts beyond
    what compaction genuinely requires. If it cannot be added while keeping the
    kernel comprehensible, that tension is itself a finding worth recording.

Acceptance criteria (draft)

  • A file-backed database can be compacted, after which the file contains no
    dead/superseded records and no tombstones for already-absent keys.
  • Post-compaction, recover() on the compacted file reproduces the exact
    same committed state as before compaction.
  • A crash injected at any point during compaction recovers to the correct
    committed state (verified in src/sim/), from either the old or new file.
  • In-memory (pathless) databases are handled explicitly (no-op / unsupported,
    documented).
  • Coverage stays at 100% and bun run gate is green.

Non-goals (for this issue)

  • Online/background/incremental compaction concurrent with live writes (the first
    cut can be a synchronous, stop-the-world rewrite).
  • Automatic scheduling heuristics, if the first cut ships a manual trigger.
  • A separate on-disk data file / checkpointing architecture — LibreDB
    deliberately keeps "the WAL is the database"; this issue reclaims space within
    that model, it does not replace it.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request
    No fields configured for Feature.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions