Transcendental compaction.
PiLSMer is a data-free key-value store. Pi for storage. Regret for reads.
It writes your data normally, then uses planning or SlateDB compaction to replace stored values with instructions for finding equivalent byte sequences inside a deterministic stream. Reads still work. Everything else gets worse.
This is an MVP implementation of the core joke:
pilsmer-coreencodes raw values and reconstruction plans.pilsmer-coreincludes deterministicsha256-counter:v1andpi.hex-fraction-prefix:v1streams, packed k-gram indexes, dynamic-programming planning, reconstruction, andexplain.pilsmer-slatewraps SlateDB and can rewrite values through guarded app-levelplan-key,vacuum-meaning, or a SlateDB compaction filter.pilsmer-cliexposes local demo commands, scan-derived metrics, and the benchmark suite.
The custom scheduler is still future work. The standalone compactor, benchmark suite, and ceremonial codec are available.
Run commands from the repo root.
For the full local demo, including JSON, a tiny PNG, ceremonial planning,
VACUUM MEANING, metrics, and the benchmark matrix:
scripts/demo.shcargo run -q --bin pilsmer -- init /tmp/pilsmer-demo/db
printf '{"total":49.99,"status":"paid"}' > /tmp/pilsmer-demo/invoice.json
cargo run -q --bin pilsmer -- put /tmp/pilsmer-demo/db invoice:123 /tmp/pilsmer-demo/invoice.json
cargo run -q --bin pilsmer -- explain /tmp/pilsmer-demo/db invoice:123
cargo run -q --bin pilsmer -- plan-key /tmp/pilsmer-demo/db invoice:123
cargo run -q --bin pilsmer -- explain /tmp/pilsmer-demo/db invoice:123
cargo run -q --bin pilsmer -- get /tmp/pilsmer-demo/db invoice:123SQL-like demo input is available through shell:
cat <<'SQL' | cargo run -q --bin pilsmer -- shell /tmp/pilsmer-demo/db
PUT invoice:123 '{"total":49.99,"status":"paid"}';
GET invoice:123;
EXPLAIN GET invoice:123;
COMPACT INTO NONEXISTENCE;
EXPLAIN GET invoice:123;
VACUUM MEANING;
SQLplan-key is the deterministic single-key demo path. It rewrites the latest
value through the wrapper after checking that the source envelope has not
changed.
explain reports literal bytes, average chunk length, metadata amplification,
and philosophical compression status.
The default stream is sha256-counter:v1 because it can produce arbitrary
prefix sizes. The default SHA prefix is 16 MiB; pass --prefix-bytes for
faster toy runs. The pi, e, and sqrt(2) demo streams are available with
--stream pi-prefix, --stream e-prefix, and --stream sqrt2-prefix; each has
a 256-byte built-in prefix:
printf '\x24\x3f\x6a\x88' > /tmp/pilsmer-demo/pi-bytes.bin
cargo run -q --bin pilsmer -- --stream pi-prefix put /tmp/pilsmer-demo/db pi:first4 /tmp/pilsmer-demo/pi-bytes.bin
cargo run -q --bin pilsmer -- --stream pi-prefix plan-key /tmp/pilsmer-demo/db pi:first4
cargo run -q --bin pilsmer -- --stream pi-prefix explain /tmp/pilsmer-demo/db pi:first4The standalone compactor path is available too:
cargo run -q --bin pilsmer -- compact /tmp/pilsmer-demo/db --into-nonexistence --run-ms 1000compact runs SlateDB's foreground compactor for a bounded window with the
PiLSMer compaction filter installed. It still uses SlateDB's size-tiered
scheduler, so it is workload-dependent. The default --min-compaction-sources
is 4; use 2 for toy workloads with multiple flushed L0 files. plan-key is the
clearer single-key demo. After the bounded run, compact prints aggregate
compaction-filter stats from that process. Use --runs <n> to repeat bounded
compactor windows as a manual stand-in for periodic compaction.
For maximum embarrassment:
cargo run -q --bin pilsmer -- compact /tmp/pilsmer-demo/db --humiliation maximumThat forces pure raw-to-plan compaction, sets max_k = 1, and uses the
ceremonial plan codec; it fails if the value cannot be represented without
literal bytes.
pilsmer init <db>
pilsmer put <db> <key> <file>
pilsmer get <db> <key>
pilsmer delete <db> <key>
pilsmer explain <db> <key> [--philosophical]
pilsmer plan-key <db> <key>
pilsmer vacuum-meaning <db> <key>|--all [--budget 10s]
pilsmer metrics <db>
pilsmer shell <db>
pilsmer migrate <db> --constant pi --to e
pilsmer compact <db> [--mode normal|force-raw-to-plan|vacuum-meaning|disabled]
pilsmer bench <db> [--workload sha256-stream|tiny-json|json-4k|random-64k|repeated-64k|uuid-heavy|all-bytes|tiny-png|png-256k] [--suite|--against-common-sense]Global options:
--prefix-bytes <bytes>
--max-k <k>
--plan-codec compact-binary|ceremonial-cbor
--allow-literals
--stream sha256-counter|pi-prefix|e-prefix|sqrt2-prefix
--disable-embedded-compactor--allow-literals is for development and tests. Forced compaction into plans
rejects it because literal chunks are still user bytes.
Use --disable-embedded-compactor on writer commands when you want the
standalone compact command to be the only compactor in the demo.
migrate --constant <from> --to <to> replans existing values from one checked-in
constant prefix to another, for example pi to e.
bench writes into a new directory and refuses to reuse an existing path. It
compares plain SlateDB values, PiLSMer raw envelopes, compact binary plans,
ceremonial plans, and post-vacuum plans. --suite runs the named workload
matrix; --against-common-sense is the same suite with a more honest name.
Without either, --values and --size apply to the selected workload. Rows
include aggregate times, flush time, put p50/p95, read p50/p95/p99,
reconstruction hash failures, and object-store GET/PUT counts.
metrics combines a scan-derived storage snapshot with runtime counters for
planner, reconstruction, and app-level vacuum activity in the current process.
All reads and writes must go through the PiLSMer wrapper. Direct SlateDB reads return envelopes, not user values.
Normal put stores a raw envelope. put_with_options can create an immediate
plan only for development and tests.
vacuum_meaning accepts VacuumOptions for per-run planning and reconstruction
limits.
The reconstruction cache is off by default; set
PiLsmOptions::reconstruction_cache_bytes to enable it and report the resulting
philosophical purity violation.
After any successful put, plan-key, vacuum-meaning, or compaction-filter
rewrite, get should return the original logical bytes unless the key was
deleted. Planned values carry logical hashes and reconstruction verifies them.
The wrapper enforces max_reconstruct_bytes for planned reads and scans. Use
scan_with_options(..., reconstruct: false) or scan_envelopes for raw stored
envelopes without reconstructing planned values.
Because exact lookup is too honest.
What is stored?
Only metadata.
And logical hashes.
And chunk offsets.
And stream identifiers.
And enough regret to reconstruct the original value.