feat(openaudio): auto-tune Postgres memory and WAL defaults at container start #220
Open
RolfAris wants to merge 2 commits into OpenAudio:main
…ner start

The audiusd container ships stock Debian Postgres 15 defaults (shared_buffers=128MB, work_mem=4MB, effective_cache_size=4GB), which are sized for a tiny dev VM rather than a validator host. Adds an entrypoint shim that picks a memory and WAL tier from detected host RAM and writes a single drop-in conf at $POSTGRES_DATA_DIR/conf.d/.

Conservative-by-default: skips with stock defaults when postgresql.conf already has any of the tuned parameters set, when any include_dir directive is already present (active, commented, or pointing at a different dir), when running as a non-root non-postgres uid, when postgres -C preflight rejects the rendered conf, or on any I/O failure.

Disable with AUDIUSD_DISABLE_AUTO_TUNE=1 or override via conf.d/99-*.conf or ALTER SYSTEM.

Atomic writes (mktemp + rename) on both the tune file and postgresql.conf. Cgroup-aware memory detection (v2 then v1 then /proc/meminfo).

Tested: 161 assertions covering every tier midpoint, every boundary value (2048, 4095, 4096, 8191, 8192, 16383, 16384, 32767, 32768, 65535, 65536), sub-floor cases, disable variants, operator-tuning detection, foreign include_dir detection, atomic append well-formedness, orphan tmp cleanup, and the tier log line. shellcheck clean.
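The "atomic writes (mktemp + rename)" step in the commit message can be illustrated roughly as follows. This is a minimal sketch, not the shim's actual code: `atomic_write` is a hypothetical helper name and the `.tune.XXXXXX` temp-file prefix is an assumption. The temp file is created next to the target so that `mv` is a same-filesystem rename, and it is removed if anything fails.

```shell
# Hypothetical helper: write stdin to the file named by $1 atomically.
atomic_write() {
    local target="$1" tmp
    # Create the temp file in the target's directory so mv is a plain rename.
    tmp="$(mktemp "$(dirname "$target")/.tune.XXXXXX")" || return 1
    if cat > "$tmp" && mv -f "$tmp" "$target"; then
        return 0
    fi
    rm -f "$tmp"   # do not leave an orphan temp file behind on I/O failure
    return 1
}
```

A caller would pipe the rendered conf into it, for example `render_conf | atomic_write "$POSTGRES_DATA_DIR/conf.d/00-audiusd-defaults.conf"`, where `render_conf` is likewise a placeholder name. Note that `mktemp` creates the file with mode 0600, so a real shim may need to adjust ownership or permissions before the rename.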
…-auto-tune-defaults
Summary
The audiusd container ships stock Debian Postgres 15 defaults (`shared_buffers = 128MB`, `work_mem = 4MB`, `effective_cache_size = 4GB`), sized for a tiny dev VM rather than a validator host. This adds an entrypoint shim that picks a memory and WAL tier from detected host RAM and writes one drop-in conf file at container start.

Suggesting, not requiring. Happy to scope down or close. We're running an equivalent tuning on our 20-node fleet and it's helped meaningfully, so wanted to put it in front of the team.
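The "picks a tier from detected host RAM" step could look roughly like the sketch below. It follows the detection order named in the commit message (cgroup v2, then v1, then `/proc/meminfo`) and the boundary values listed there. Everything else is an assumption made for illustration: the function names, the tier labels, and the optional root-prefix argument, which exists only to make the function testable. The actual per-tier settings are not reproduced here.

```shell
# Print the effective memory limit in MB: cgroup v2, then cgroup v1, then /proc/meminfo.
# $1 is an optional filesystem root (hypothetical test hook; empty means the real "/").
detect_mem_mb() {
    local root="${1:-}" host_kb host_mb limit f
    host_kb="$(awk '/^MemTotal:/ {print $2}' "$root/proc/meminfo" 2>/dev/null)"
    host_mb=$(( ${host_kb:-0} / 1024 ))
    for f in "$root/sys/fs/cgroup/memory.max" \
             "$root/sys/fs/cgroup/memory/memory.limit_in_bytes"; do
        [ -r "$f" ] || continue
        limit="$(cat "$f")"
        case "$limit" in
            ''|*[!0-9]*) continue ;;   # "max" (v2, unlimited) or unparsable
        esac
        limit=$(( limit / 1048576 ))
        # cgroup v1 reports a huge sentinel when unlimited, so only trust
        # limits that are below the host total.
        if [ "$host_mb" -eq 0 ] || [ "$limit" -lt "$host_mb" ]; then
            echo "$limit"
            return 0
        fi
    done
    [ "$host_mb" -gt 0 ] && echo "$host_mb"
}

# Map MB of RAM to a tier label. A boundary value belongs to the upper tier.
# Labels are illustrative; "skip" means the sub-2GB floor applies.
pick_tier() {
    local mb="$1"
    if   [ "$mb" -lt 2048 ];  then echo "skip"
    elif [ "$mb" -lt 4096 ];  then echo "2-4GB"
    elif [ "$mb" -lt 8192 ];  then echo "4-8GB"
    elif [ "$mb" -lt 16384 ]; then echo "8-16GB"
    elif [ "$mb" -lt 32768 ]; then echo "16-32GB"
    elif [ "$mb" -lt 65536 ]; then echo "32-64GB"
    else echo "64GB+"
    fi
}
```

Under these assumptions, `pick_tier "$(detect_mem_mb)"` on an unconstrained 24 GB host lands in the 16-32 GB tier, and a container started with a 1 GB cgroup limit falls under the floor and is skipped.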
Tier table
Sizing rules:
- `shared_buffers` near 25% of RAM (capped to leave headroom for the audiusd Go process, observed at roughly 7 GB RSS on a busy validator).
- `effective_cache_size` near 50% (more conservative than pgtune's 75% since Postgres is not the only tenant in this container).
- `wal_buffers` capped at 16 MB per the Postgres docs.
- `work_mem` modest because audiusd's observed concurrency is roughly 8 connections, not 100.

Conservative-by-default behavior
The shim skips with stock defaults whenever it cannot prove safety:
- `postgresql.conf`. If `shared_buffers`, `work_mem`, `maintenance_work_mem`, `effective_cache_size`, `wal_buffers`, `max_wal_size`, or `min_wal_size` is set uncommented in `postgresql.conf`, the shim skips with a log line. Operator wins.
- `include_dir` directive. Any `include_dir` line (active, commented out, pointing at a different directory, or using different quoting) makes the shim skip rather than risk last-occurrence-wins overriding the operator's directory.
- `postgres -C shared_buffers --config-file=...` preflight runs after rendering. If Postgres refuses to parse the conf, the shim removes the rendered file and exits.

Override knobs
Precedence (later wins):
1. `postgresql.conf` top-of-file
2. `conf.d/00-audiusd-defaults.conf` (this shim, written conditional on the conservative checks above)
3. `conf.d/99-*.conf` (operator override slot)
4. `postgresql.auto.conf` (`ALTER SYSTEM`, processed last by Postgres regardless of position)

Evidence: controlled before/after on one of our nodes
Same node (24 GB host, 144 GB DB, 38 GB indexes), 20-min steady-state windows on each side, reset stats between. Only `shared_buffers`, `wal_buffers`, `max_wal_size`, and `min_wal_size` were changed (the restart-required group). The other tier values were already in place via `ALTER SYSTEM` on that node.

[Table: before/after comparison. Columns: `shared_buffers`, `shared_buffers` (16-32 GB tier). Rows: `pg_stat_bgwriter.buffers_alloc` rate, `pg_stat_bgwriter.buffers_backend` (window), `pg_stat_bgwriter.buffers_checkpoint` (window).]

The shape of buffer accounting changed. Before: backends doing emergency dirty-page writes (1.25M of them) because `shared_buffers` was exhausted. After: the checkpointer does planned batched writes on schedule (35k). The 99% drop in `buffers_backend` is the strongest signal that `shared_buffers` was undersized.

Representative heavy query, `SELECT count(*) FROM ops WHERE "table" = 'uploads'` (full table scan over a 50 GB table):

Light queries (e.g. `SELECT * FROM core_blocks ORDER BY created_at DESC LIMIT 100`) went from 9 to 14 disk reads down to 0. Index pages stay resident in the bigger pool. Sub-ms either way, but the disk-read count delta is the durable signal.

Restart cost: roughly 3 seconds Postgres unavailability. The audiusd Go process kept running and reconnected; no application errors observed.
Reproduce on any node:
Determinism and consensus safety
A tuning that changes plan choice (`work_mem`, `effective_cache_size`) could in principle affect consensus state if any state-applied query relied on default plan ordering. We audited the currently visible ORDER-sensitive paths:

- `:many ... LIMIT` queries in `pkg/core/db/sql/reads.sql` have explicit `ORDER BY`.
- `:one ... LIMIT 1` queries filter with `WHERE` on a unique column.
- `GetAllRegisteredNodes`, `GetAllEthAddressesOfRegisteredNodes`, and `GetActiveStorageNodeEndpoints` feed only `common.GetAttestorRendezvous`, which sorts internally by hash. The output is order-independent.
- `ops` sweep is `Order("ulid asc")` server-side (`pkg/mediorum/server/serve_crud.go:33`).

We did not find a path where a plan flip could change consensus state. This is an audit, not an executable guard. Adding a deterministic-order assertion test would harden this further; happy to do that if it would help review.
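Part of this audit could be made repeatable with a small lint over the query file. The sketch below is not the deterministic-order assertion test proposed above. It is a line-based heuristic that assumes sqlc-style `-- name: <Query> :many` annotations, and `find_unordered_limits` is a hypothetical name. It prints the name of every `:many` query that contains `LIMIT` but no `ORDER BY`, and it does not understand subqueries.

```shell
# Print names of ":many" queries in the file $1 that use LIMIT without ORDER BY.
find_unordered_limits() {
    awk '
        function report() {
            if (kind == ":many" && has_limit && !has_order) print name
        }
        /^-- name:/ {
            report()                      # close out the previous query
            name = $3; kind = $4
            has_limit = 0; has_order = 0
            next
        }
        /^[ \t]*--/ { next }              # ignore other SQL comments
        {
            # Pad with spaces so keywords at line edges still have delimiters.
            line = " " toupper($0) " "
            if (line ~ /[^A-Z_]LIMIT[^A-Z_]/)          has_limit = 1
            if (line ~ /[^A-Z_]ORDER[ \t]+BY[^A-Z_]/)  has_order = 1
        }
        END { report() }
    ' "$1"
}
```

Run against `pkg/core/db/sql/reads.sql`, empty output would be consistent with the first bullet of the audit; any printed name would be a query worth a second look.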
Caveats for operators
- `shared_buffers` is restart-required. On image upgrade, `docker compose up -d` recreates the container and Postgres starts with the new value. On hosts with very tight free RAM at upgrade time, the larger allocation may cause Postgres start to fail. Workaround: set `AUDIUSD_DISABLE_AUTO_TUNE=1` before upgrading, or override via `conf.d/99-*.conf`.
- `/dev/shm` size on big tiers. Postgres 15's parallel workers use `/dev/shm` for dynamic shared memory. Docker defaults `/dev/shm` to 64 MB. On the 64 GB and up tier (16 GB `shared_buffers`), parallel queries with many workers can hit `could not resize shared memory segment`. Operators on big hosts should pass `--shm-size=2g` or larger.
- `securityContext.runAsUser`, rootless docker, or podman with `--user` make the in-container `chown postgres:postgres` a no-op. The shim detects this and skips. Operators in those modes will see stock defaults, which is the existing behavior.

Out of scope
`random_page_cost`, `effective_io_concurrency` (assume SSD), `synchronous_commit`, `wal_compression`, `checkpoint_*`, `max_connections`. Memory and WAL sizing only.

Test plan
- `bash cmd/openaudio/postgres-auto-tune_test.sh`, 161 assertions covering: every tier at midpoint and at both boundary edges (one inside the tier, one just below it); sub-floor; idempotency across re-runs; `AUDIUSD_DISABLE_AUTO_TUNE=1` short-circuit; `=true` and `=0` correctly NOT honored (canonical form is `=1`); operator-tuned `postgresql.conf` skip; commented-tuning does NOT trigger skip; foreign `include_dir` skip (alternate dir, double-quoted, commented); existing `include_dir = 'conf.d'` recognized as ours; well-formed `postgresql.conf` after atomic append; orphan tmp file cleanup; tier log line. Lint-clean (`shellcheck`).
- `include_dir = 'conf.d'` appended once via atomic temp+rename, drop-in renders, restart picks up new `shared_buffers`
- `AUDIUSD_DISABLE_AUTO_TUNE=1`, no `conf.d` directory, stock defaults
- `docker run -m 1G` (cgroup-limited container), shim detects via cgroup, sub-2GB skip, stock defaults
- `postgresql.conf` with hand-tuned `shared_buffers`, shim detects and skips with log line
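For reviewers who want a feel for the skip cases exercised above, here is a minimal sketch of the three text-level checks (disable flag, operator tuning, foreign `include_dir`). It is an approximation under stated assumptions, not the shim's code: the function names and reason strings are hypothetical, and the regular expressions are one plausible reading of "set uncommented" and "recognized as ours".

```shell
# Parameters the shim manages, as listed in the description.
TUNED_PARAMS='shared_buffers|work_mem|maintenance_work_mem|effective_cache_size|wal_buffers|max_wal_size|min_wal_size'

# Only the canonical form "=1" disables; "true" and "0" are not honored.
auto_tune_disabled() {
    [ "${AUDIUSD_DISABLE_AUTO_TUNE:-}" = "1" ]
}

# Succeeds when a managed parameter is set on an uncommented line of $1.
# Postgres accepts both "name = value" and "name value".
has_operator_tuning() {
    grep -Eq "^[[:space:]]*($TUNED_PARAMS)([[:space:]]*=|[[:space:]]+)" "$1"
}

# Succeeds when $1 has any include_dir line, active or commented, other than
# the exact line this sketch treats as its own: include_dir = 'conf.d'
has_foreign_include_dir() {
    grep -E "^[[:space:]]*#?[[:space:]]*include_dir([[:space:]]|=)" "$1" \
        | grep -Evq "^include_dir = 'conf\.d'[[:space:]]*$"
}

# Print a reason and return 0 when tuning should be skipped; return 1 otherwise.
skip_reason() {
    local conf="$1"
    if auto_tune_disabled; then echo "disabled"
    elif has_operator_tuning "$conf"; then echo "operator-tuned"
    elif has_foreign_include_dir "$conf"; then echo "foreign-include-dir"
    else return 1
    fi
}
```

With these definitions, a `postgresql.conf` containing only `#shared_buffers = 128MB` does not trigger a skip, while `shared_buffers = 8GB` or `include_dir = "conf.d"` does. The uid check, the `postgres -C` preflight, and I/O failure handling are outside this sketch.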