Counters + Nursery#1081

Open
0xdeafbeef wants to merge 14 commits into master from counters
@0xdeafbeef 0xdeafbeef commented May 11, 2026

Pull Request Checklist

NODE CONFIGURATION MODEL CHANGES

[Yes]

Added storage.cell_storage_threads.

Default: 4.

Rationale: cell store/remove now uses a dedicated worker pool for parallel traversal without occupying generic rayon/global worker capacity.

Changed defaults for storage.states_gc:

  • interval: 60s -> 1s
  • random_offset: true -> false

Rationale: under high load, delayed states GC lets many short-lived counters accumulate. Running GC ASAP keeps counter storage closer to the real live set.

Old configs remain valid: the new field has a serde default, and existing states_gc configs still deserialize unchanged.
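As a rough illustration of the defaults listed above, here is a minimal sketch using plain `Default` impls. The type names `StorageConfig` and `StatesGcConfig` and the helper `from_old_config` are hypothetical; the real config structs presumably rely on serde default attributes instead, which this sketch only imitates with struct update syntax.

```rust
use std::time::Duration;

/// Hypothetical mirror of the `storage.states_gc` section.
#[derive(Debug, Clone, PartialEq)]
struct StatesGcConfig {
    interval: Duration,
    random_offset: bool,
}

impl Default for StatesGcConfig {
    fn default() -> Self {
        // New defaults from this PR: 1s interval, no random offset.
        Self {
            interval: Duration::from_secs(1),
            random_offset: false,
        }
    }
}

/// Hypothetical mirror of the `storage` section (only the touched fields).
#[derive(Debug, Clone, PartialEq)]
struct StorageConfig {
    cell_storage_threads: usize,
    states_gc: StatesGcConfig,
}

impl Default for StorageConfig {
    fn default() -> Self {
        Self {
            cell_storage_threads: 4,
            states_gc: StatesGcConfig::default(),
        }
    }
}

/// Imitates loading an old config that sets `states_gc` explicitly but
/// predates `cell_storage_threads`: the missing field takes its default.
fn from_old_config(states_gc: StatesGcConfig) -> StorageConfig {
    StorageConfig {
        states_gc,
        ..StorageConfig::default()
    }
}
```

An operator who prefers the previous GC behaviour would keep `interval: 60s` and `random_offset: true` set explicitly; only configs that relied on the defaults see the new values.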

BLOCKCHAIN CONFIGURATION MODEL CHANGES

[None]


COMPATIBILITY

Affected features:

  • [State]

    • Compatibility status: [special logic applied]
    • Existing cells DB is migrated to indexed cells/counters. Migration handles resume after interruption, including deleted boundary keys.
    • Tested by cells_v3_migration_resumes_after_deleted_boundary_key.
  • [Storage. States]

    • Compatibility status: [special logic applied]
    • Cell storage now uses indexed ref counters plus cell nursery WAL/checkpoint sidecar. Existing node state opens through storage migrations; nursery state is initialized/replayed from its sidecar.
    • Tested by nursery replay tests and raw state store test.
  • [Persistent State]

    • Compatibility status: [fully compatible]
    • Persistent state BOC format is unchanged. Import/store path now writes into indexed cell storage and counters.
    • Tested by raw_state_store_allows_existing_snapshot.
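One plausible reading of "resume after interruption, including deleted boundary keys" is that the migration checkpoints the last processed key and resumes strictly after it, without requiring that key to still exist. The sketch below models this with `BTreeMap` standing in for the RocksDB column families; `migrate_batch` and the `u32` key type are illustrative assumptions, not the PR's code.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Copies up to `batch` entries from `old` into `new`, starting strictly
/// after `checkpoint`, and advances the checkpoint. Returns how many
/// entries were migrated in this call (0 means the migration is done).
fn migrate_batch(
    old: &BTreeMap<u32, u32>,
    new: &mut BTreeMap<u32, u32>,
    checkpoint: &mut Option<u32>,
    batch: usize,
) -> usize {
    // An exclusive lower bound is a seek, not a lookup: it works even if
    // the checkpointed key itself was deleted in the meantime.
    let lower = match *checkpoint {
        Some(key) => Bound::Excluded(key),
        None => Bound::Unbounded,
    };
    let bounds: (Bound<u32>, Bound<u32>) = (lower, Bound::Unbounded);

    let mut migrated = 0;
    for (&key, &value) in old.range(bounds).take(batch) {
        new.insert(key, value);
        *checkpoint = Some(key);
        migrated += 1;
    }
    migrated
}
```

If the checkpointed key disappears between runs, the next call still picks up at the first surviving key greater than the checkpoint, so no entry is skipped or processed twice.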

SPECIAL DEPLOYMENT ACTIONS

[Not Required]

A normal update is enough. A 100M state migrates in 12 minutes.

Note: downgrade after opening/migrating the cells DB is not supported.


PERFORMANCE IMPACT

[Expected impact]

Expected changes:

  • Lower RocksDB write amplification for recent cells via cell nursery WAL/checkpoint and delayed promotion.
  • Lower persisted counter churn because rc = 1 counters are implicit and states GC runs more frequently by default.
  • Additional CPU/memory overhead from nursery maps/filter/WAL handling and parallel cell traversal.
  • Cells RocksDB options are tuned for the nursery workload and direct IO usage.
  • Cell insert/remove paths now avoid most RocksDB reads: an RSQF-backed persisted-cell prefilter stops recursive descent at cells already known to be durable.
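To make two of the points above concrete (implicit rc = 1 counters, and stopping recursive descent at already-persisted cells), here is a toy in-memory model. Everything in it is an assumption for illustration: `ToyCellStore`, `Cell`, and the `u64` ids are hypothetical, and a `HashSet` stands in for the RSQF prefilter. A real quotient filter can report false positives, which would presumably need a confirming lookup; the exact set here sidesteps that.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical cell: an id plus child references.
struct Cell {
    id: u64,
    children: Vec<Cell>,
}

/// Toy store, not the PR's types.
#[derive(Default)]
struct ToyCellStore {
    /// Cells already stored; stands in for the persisted-cell prefilter.
    known: HashSet<u64>,
    /// Explicit counters, kept only for cells with rc >= 2.
    counters: HashMap<u64, u32>,
}

impl ToyCellStore {
    /// Reference count of a cell: 0 if unknown, an implicit 1 if the cell
    /// is known but has no counter entry, otherwise the stored counter.
    fn rc(&self, id: u64) -> u32 {
        if !self.known.contains(&id) {
            return 0;
        }
        self.counters.get(&id).copied().unwrap_or(1)
    }

    /// Stores a cell tree and returns the number of cells visited.
    fn store(&mut self, cell: &Cell) -> usize {
        if self.known.contains(&cell.id) {
            // Already stored: bump the counter (materializing it at 2 on
            // the first share) and do not descend into the children.
            *self.counters.entry(cell.id).or_insert(1) += 1;
            return 1;
        }
        // New cell: rc = 1 is implicit, so no counter entry is written.
        self.known.insert(cell.id);
        let mut visited = 1;
        for child in &cell.children {
            visited += self.store(child);
        }
        visited
    }
}
```

In this model a tree of unique cells writes no counter entries at all, and storing a second tree that shares a subtree touches only the root of the shared part. The recursion is for brevity only; the PR describes a parallel traversal on a dedicated worker pool.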

New metrics/dashboards include:

  • cell nursery entries/checkpoint/WAL traffic
  • nursery written/saved/admitted ratios
  • persisted filter capacity/error ratio
  • load-cell path ratios and timings

Manual devnet runs showed the nursery saving a large share of short-lived cells before promotion; the latest observed dashboard was roughly 30% written / 70% saved on the tested validator run. Assuming compactions rewrite each persisted byte at least 5 times, that avoids at least 5 × 0.7 = 3.5 bytes of write traffic per byte of cell data.

Write amplification is now close to 1 for workloads operating with temporary state.

  • master: 571,873,001,681 bytes, about 572 GB
  • counters: 238,587,994 bytes, about 239 MB

TPS for the 10M (10kk) deploy:

  • master: 18,383.9
  • counters: 25,705.0 (+39.8%)
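The figures above can be cross-checked with a few lines of arithmetic. The helper names below are hypothetical; the inputs are the byte totals and TPS values quoted in this description.

```rust
/// Ratio between two byte totals (how many times smaller `after` is).
fn reduction_factor(before_bytes: u64, after_bytes: u64) -> f64 {
    before_bytes as f64 / after_bytes as f64
}

/// Relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Byte totals reported for master and for the counters branch.
const MASTER_BYTES: u64 = 571_873_001_681;
const COUNTERS_BYTES: u64 = 238_587_994;

/// TPS reported for master and for the counters branch.
const MASTER_TPS: f64 = 18_383.9;
const COUNTERS_TPS: f64 = 25_705.0;
```

`reduction_factor(MASTER_BYTES, COUNTERS_BYTES)` comes out at roughly 2,400, and `percent_change(MASTER_TPS, COUNTERS_TPS)` reproduces the +39.8% quoted above.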

TESTS

Unit Tests

[Covered by:]

  • storage::db::migrations::tests::cells_v3_migration_resumes_after_deleted_boundary_key
  • storage::shard_state::store_state_raw::test::raw_state_store_allows_existing_snapshot
  • storage::shard_state::cell_storage::tests::nursery_replays_insert_after_reopen
  • storage::shard_state::cell_storage::tests::nursery_replays_remove_after_reopen
  • storage::shard_state::cell_storage::tests::nursery_replays_promotion_after_reopen
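The three nursery tests above check that inserts, removes, and promotions survive a reopen. A minimal sketch of the underlying idea, assuming the nursery state is rebuilt by replaying a write-ahead log in order: `WalRecord` and `ToyNursery` are hypothetical types, and checkpoints (which would let replay start from a snapshot instead of an empty state) are not modeled.

```rust
use std::collections::HashMap;

/// Hypothetical WAL record kinds, one per tested operation.
#[derive(Debug, Clone, PartialEq)]
enum WalRecord {
    Insert { id: u64, data: Vec<u8> },
    Remove { id: u64 },
    Promote { id: u64 },
}

/// Toy nursery: recent cells held in memory, plus the ids that were
/// promoted (handed off to durable storage).
#[derive(Debug, Default, PartialEq)]
struct ToyNursery {
    cells: HashMap<u64, Vec<u8>>,
    promoted: Vec<u64>,
}

impl ToyNursery {
    /// Applies one record. The same function serves live operation and
    /// replay, so both paths cannot drift apart.
    fn apply(&mut self, record: &WalRecord) {
        match record {
            WalRecord::Insert { id, data } => {
                self.cells.insert(*id, data.clone());
            }
            WalRecord::Remove { id } => {
                self.cells.remove(id);
            }
            WalRecord::Promote { id } => {
                // Only cells still in the nursery can be promoted.
                if self.cells.remove(id).is_some() {
                    self.promoted.push(*id);
                }
            }
        }
    }

    /// Rebuilds the nursery from an empty state by replaying the log.
    fn replay(wal: &[WalRecord]) -> Self {
        let mut nursery = Self::default();
        for record in wal {
            nursery.apply(record);
        }
        nursery
    }
}
```

A reopen test in this model appends records to a log while applying them, drops the in-memory state, and asserts that `ToyNursery::replay` yields an equal value.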

Network Tests

[Covered by:]

Devnet9 load testing.

Manual Tests

Deploying 100M accounts now completes in 3 hours. It could run even faster if we increase the size of the raw cache.
30k TPS on a 10M state.

Notes/Additional Comments:

This PR changes node-local storage internals only. It does not change blockchain data structures, block formats, proofs, queue format, mempool API, or consensus rules.

@0xdeafbeef 0xdeafbeef force-pushed the counters branch 5 times, most recently from c42b3e2 to 5632ced Compare May 11, 2026 20:59
@0xdeafbeef 0xdeafbeef force-pushed the counters branch 2 times, most recently from 86fd664 to 1caad61 Compare May 20, 2026 14:44
@0xdeafbeef 0xdeafbeef changed the title from "docs(storage): document indexed cell counters" to "Counters + Nursery" May 20, 2026
github-actions Bot commented May 20, 2026

🧪 Network Tests

To run network tests for this PR, use:

gh workflow run network-tests.yml -f pr_number=1081

Available test options:

  • Run all tests: gh workflow run network-tests.yml -f pr_number=1081
  • Run specific test: gh workflow run network-tests.yml -f pr_number=1081 -f test_selection=ping-pong

Test types: destroyable, ping-pong, one-to-many-internal-messages, fq-deploy, nft-index, persistent-sync

Results will be posted as workflow runs in the Actions tab.

@github-actions

❌ Python formatting check failed in CI.

Please run just fmt_py locally and push the updated files.

@0xdeafbeef 0xdeafbeef marked this pull request as ready for review May 20, 2026 14:59
codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 84.35336% with 565 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.13%. Comparing base (a87ef36) to head (e9cfea6).
⚠️ Report is 21 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/src/storage/shard_state/cell_storage.rs | 81.09% | 69 Missing and 83 partials ⚠️ |
| core/src/storage/shard_state/mod.rs | 59.86% | 53 Missing and 8 partials ⚠️ |
| ...src/storage/shard_state/nursery_persistence/mod.rs | 80.61% | 12 Missing and 45 partials ⚠️ |
| ...c/storage/shard_state/nursery_persistence/tests.rs | 85.48% | 8 Missing and 37 partials ⚠️ |
| core/src/storage/gc.rs | 6.66% | 33 Missing and 9 partials ⚠️ |
| core/src/storage/shard_state/db_state.rs | 72.84% | 30 Missing and 11 partials ⚠️ |
| core/src/storage/shard_state/counters.rs | 94.58% | 27 Missing and 4 partials ⚠️ |
| core/src/storage/db/migrations.rs | 90.41% | 20 Missing and 10 partials ⚠️ |
| core/src/storage/shard_state/store_state_raw.rs | 73.43% | 7 Missing and 10 partials ⚠️ |
| cli/src/cmd/tools/hardfork.rs | 0.00% | 16 Missing ⚠️ |

... and 10 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1081      +/-   ##
==========================================
+ Coverage   58.97%   60.13%   +1.16%     
==========================================
  Files         459      473      +14     
  Lines       76888    80100    +3212     
  Branches    76888    80100    +3212     
==========================================
+ Hits        45342    48166    +2824     
- Misses      29434    29607     +173     
- Partials     2112     2327     +215     

@0xdeafbeef 0xdeafbeef requested review from Mododo and Rexagon May 21, 2026 12:26
@0xdeafbeef 0xdeafbeef force-pushed the counters branch 5 times, most recently from 9954586 to 26f600d Compare May 21, 2026 14:29
@Rexagon Rexagon force-pushed the counters branch 2 times, most recently from 1f234ca to 65303c6 Compare May 22, 2026 15:23
@0xdeafbeef 0xdeafbeef force-pushed the counters branch 3 times, most recently from 2950b8b to 675bdcf Compare May 29, 2026 10:09