fix: tighten state-sync defaults (keep=2, blockInterval=100000) #255

Open
raymondjacobson wants to merge 2 commits into main from fix/state-sync-keep-default-2

@raymondjacobson commented May 9, 2026

Summary

Two related default changes in pkg/core/config/config.go to bring the state-sync defaults in line with what production validators actually use:

  • stateSyncKeep: 6 → 2. Each snapshot is currently ~30–45 GB, so with Keep=6 a snapshot-serving node accumulates ~180 GB+ of snapshots on top of chain data and Postgres. This exhausted the 1 TB disk on creatornode2.audius.co today (the only prod node with stateSyncServeSnapshots=true), which put Postgres into a checkpoint PANIC / recovery loop until snapshots were manually deleted. Two snapshots are enough to serve an incoming state-syncer (the newest) plus one fallback (in case the newest is mid-creation).

  • stateSyncBlockInterval: 100 → 100000. A 100-block interval would create a new snapshot roughly every minute on mainnet, which is far too aggressive on both disk and CPU. Prod already overrides this to 100000 (the height boundaries seen in /data/bolt/snapshots_*/height_002420**** etc.); this change aligns the default with what validators actually use.

Operators who want different values can still override via the stateSyncKeep / stateSyncBlockInterval env vars.

Test plan

  • CI green
  • After rollout on creatornode2.audius.co, confirm only the most recent 2 snapshot directories exist under /data/bolt/snapshots_<chainID>/ after the next snapshot creation cycle.
  • Confirm new nodes coming up without env overrides take snapshots at 100k-block boundaries (heights ending in _00000).
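
The two manual checks above could be scripted along these lines. This is a sketch under assumptions: the "height_" directory prefix is taken from the paths quoted in this PR, the function names are hypothetical, and the sample directory names in main are illustrative rather than observed on any node.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSnapshotHeight extracts the block height from a directory name such as
// "height_0024200000". The naming scheme is assumed from paths in this PR.
func parseSnapshotHeight(name string) (int64, bool) {
	if !strings.HasPrefix(name, "height_") {
		return 0, false
	}
	h, err := strconv.ParseInt(strings.TrimPrefix(name, "height_"), 10, 64)
	if err != nil || h < 0 {
		return 0, false
	}
	return h, true
}

// checkSnapshotDirs returns a list of problems: too many retained snapshots,
// unrecognized directory names, or heights that are not multiples of interval.
func checkSnapshotDirs(names []string, keep int, interval int64) []string {
	var problems []string
	if interval <= 0 {
		return append(problems, "interval must be positive")
	}
	if len(names) > keep {
		problems = append(problems,
			fmt.Sprintf("found %d snapshot directories, want at most %d", len(names), keep))
	}
	for _, name := range names {
		h, ok := parseSnapshotHeight(name)
		if !ok {
			problems = append(problems, fmt.Sprintf("unrecognized directory name %q", name))
			continue
		}
		if h%interval != 0 {
			problems = append(problems,
				fmt.Sprintf("%s is not on a %d-block boundary", name, interval))
		}
	}
	return problems
}

func main() {
	// Illustrative names only; list the real snapshot directory in practice.
	names := []string{"height_0024100000", "height_0024200000"}
	problems := checkSnapshotDirs(names, 2, 100000)
	if len(problems) == 0 {
		fmt.Println("snapshot directories look as expected")
		return
	}
	for _, p := range problems {
		fmt.Println("problem:", p)
	}
}
```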

🤖 Generated with Claude Code

raymondjacobson and others added 2 commits May 8, 2026 23:09
Each state-sync snapshot is currently ~30-45GB. With Keep=6 a snapshot-serving
node accumulates ~180GB+ of snapshots on top of chain data and Postgres,
which exhausted disk on creatornode2.audius.co (1TB PVC, 100% full) and put
Postgres into a checkpoint PANIC / recovery loop.

Two snapshots is enough to serve incoming state-syncers (a fresh one plus
a fallback) while leaving headroom for chain growth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

A 100-block interval would create a new snapshot roughly every minute on
mainnet, which is far too aggressive — both for disk and CPU. Production
already overrides this to 100000 (the values seen in
/data/bolt/snapshots_*/height_0024200000 etc.); align the default with
the value validators are actually using.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@raymondjacobson changed the title from "fix: lower stateSyncKeep default from 6 to 2" to "fix: tighten state-sync defaults (keep=2, blockInterval=100000)" May 9, 2026