trace_api: history-solution upgrade - auto ABIs, query APIs, indexes#295

Open
heifner wants to merge 33 commits into master from feature/trace-api-history

Conversation


heifner commented Apr 14, 2026

Summary

Upgrades trace_api_plugin to serve as a complete history solution for exchanges and indexers. Five themes:

  1. Auto-captured ABIs. --trace-rpc-abi and --trace-no-abis are removed. The plugin captures ABIs directly from the chain (setabi traces in real time, lazy-fetch on first encounter), versioned per global_sequence so each action decodes with the exact ABI that was in effect when it ran. An append-only abi_log replaces the earlier O(N^2) abi_store rewrite.

  2. Continuity enforced at startup. Gaps between existing trace data and chain head shut the node down with operator-facing guidance (load a snapshot covering the gap, copy slices from another node, or delete the trace dir). Overlap from a snapshot restore is tolerated (re-applied blocks overwrite existing slice entries).

  3. O(1) lookups. Per-slice trace_trx_idx_<range>.log (open-addressing hash table) replaces the linear scan in get_transaction_trace; per-slice trace_blk_idx_<range>.log (sparse uint64_t array) replaces the linear scan in get_block. Both sidecars are built alongside the existing metadata log and fall back to the scan if a sidecar is missing or corrupt.

  4. New query endpoints over the existing trace data.

  5. Per-slice bloom for get_actions slice-skip. New trace_recv_bloom_<range>.log sidecar carries two boost::bloom filters (K=7, FPR=0.01): one over action receiver, one over packed (receiver, action) pairs. get_actions probes the bloom once per slice in the requested block range; on a negative probe the entire slice is skipped with no get_block call. Turns "receiver never appears in this slice" from a full-slice scan into a single file read. Missing or CRC-corrupt sidecar falls back to the scan (fail-safe - a false negative would silently drop matching actions from the response). Written live during extraction at slice roll-over; cleaned up alongside the other per-slice files during retention pruning.
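The slice-skip idea above can be sketched in miniature. The plugin uses boost::bloom filters serialized into a sidecar; the toy filter below (plain C++, splitmix64 mixer, illustrative sizes) only demonstrates the K=7 probe-and-skip behavior, not the real on-disk format:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy Bloom filter over 64-bit receiver names with K = 7 probes, standing in
// for the plugin's boost::bloom filters. Sizes and hashing are illustrative.
struct toy_bloom {
    std::vector<uint64_t> bits = std::vector<uint64_t>(1024);  // 64 Kibit of filter

    static uint64_t mix(uint64_t x) {  // splitmix64 finalizer
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    void add(uint64_t receiver) {
        const uint64_t h1 = mix(receiver), h2 = mix(h1) | 1;
        for (uint64_t k = 0; k < 7; ++k) {  // K = 7 double-hashed probes
            const uint64_t bit = (h1 + k * h2) % (bits.size() * 64);
            bits[bit / 64] |= 1ULL << (bit % 64);
        }
    }

    bool may_contain(uint64_t receiver) const {
        const uint64_t h1 = mix(receiver), h2 = mix(h1) | 1;
        for (uint64_t k = 0; k < 7; ++k) {
            const uint64_t bit = (h1 + k * h2) % (bits.size() * 64);
            if (!(bits[bit / 64] & (1ULL << (bit % 64))))
                return false;  // definite miss: skip the whole slice
        }
        return true;           // possible hit (or false positive): scan the slice
    }
};
```

A negative `may_contain` is authoritative (no false negatives), which is why the real reader fails safe in the other direction: a corrupt sidecar answers "maybe" so the scan still runs.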

New endpoints

POST /v1/trace_api/get_actions

Paginated search over action traces across a block range with optional filters. The handler clamps the block window to [block_num_start, block_num_start + trace-max-block-range - 1] and reports the actual window scanned in the response so clients can resume pagination by advancing block_num_start.

Request fields:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| block_num_start | uint32 | 0 | First block to scan (inclusive). |
| block_num_end | uint32 | UINT32_MAX | Last block to scan; clamped server-side. |
| receiver | string | any | Filter on act.receiver. |
| account | string | any | Filter on act.account (contract code). |
| action | string | any | Filter on action name. |
| include_notifications | bool | false | When false, specifying exactly one of receiver/account implicitly constrains the other to the same value (canonical execution only, no notification copies). When true, only the specified filters apply. |
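The include_notifications mirroring rule can be sketched as a small pre-filter step. The helper name and std::string filters below are illustrative, not the plugin's types:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Sketch of the include_notifications = false mirroring rule: with exactly one
// of receiver/account given, the other is constrained to the same value so
// only the canonical execution matches (no notification copies).
void mirror_filters(bool include_notifications,
                    std::optional<std::string>& receiver,
                    std::optional<std::string>& account) {
    if (include_notifications) return;             // explicit filters only
    if (receiver && !account) account = receiver;  // constrain to the canonical execution
    else if (account && !receiver) receiver = account;
}
```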

Response rows carry the full action-trace execution-tree context (action_ordinal, creator_action_ordinal, recv_sequence, auth_sequence, code_sequence, abi_sequence, account_ram_deltas, optional per-action cpu_usage_us / net_usage), the ABI-decoded params and return_data (or a decode_error field with the raw hex when the ABI is unavailable), and the enclosing block / transaction context (trx_id, block_num, block_time, producer_block_id, block_status, trx_cpu_usage_us, trx_net_usage_words).

block_status is "irreversible" or "pending", sourced from the same data-log tuple that powers get_block's status field; operators wanting only-irreversible responses can run nodeop with read-mode = irreversible. Within a transaction, actions are ordered by global_sequence (execution order).

trx_cpu_usage_us and trx_net_usage_words are the parent transaction's totals; they differ from the action-level cpu_usage_us / net_usage in scope (whole trx vs. single action) and units (action net_usage is bytes; trx net_usage_words is ceil(net_usage / 8)).
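The bytes-to-words relationship stated above reduces to a one-line ceiling division (the helper name is illustrative):

```cpp
#include <cstdint>

// trx net_usage_words is ceil(net_usage / 8) of the transaction's byte total,
// while action-level net_usage stays in raw bytes.
constexpr uint32_t to_net_words(uint32_t net_usage_bytes) {
    return (net_usage_bytes + 7) / 8;  // ceiling division by 8
}

static_assert(to_net_words(16) == 2);  // exact multiple of 8
static_assert(to_net_words(17) == 3);  // partial word rounds up
```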

POST /v1/trace_api/get_token_transfers

Convenience preset of get_actions with receiver = account = <token_contract> and action = transfer. Defaults token_contract to sysio.token. Because it constrains receiver == account, each transfer appears exactly once — the canonical execution, not the inline notifications to sender and recipient.

Returns a slim subset of get_actions fields optimized for exchange/indexer workflows (omits execution-tree ordinals, receipt sequences, RAM deltas, and CPU/NET usage); block_status is retained because exchanges crediting transfers need finality. Use get_actions directly if you need the full row shape.

Breaking changes

  • Removed CLI options: --trace-rpc-abi, --trace-no-abis. Operators no longer supply abi.json files.
  • action_trace_v0 gains action_ordinal, creator_action_ordinal, closest_unnotified_ancestor_action_ordinal, recv_sequence, auth_sequence, code_sequence, abi_sequence, account_ram_deltas, optional cpu_usage_us / net_usage. Persisted trace format is incompatible with earlier builds; operators must delete the trace directory on upgrade.
  • Failed actions are no longer persisted; the status field is removed from transaction_trace_v0.
  • JSON field renames: action -> name on actions, account -> actor on authorizations.

New config

  • --trace-max-block-range (default 1000, clamped [1, 10000]) — silently caps block_num_end - block_num_start + 1 for the range-scanning query endpoints.
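The clamp behaves as a simple window cap. A minimal sketch (helper name illustrative; 64-bit arithmetic avoids overflow when block_num_start is near UINT32_MAX):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Sketch of the server-side clamp: the scanned window never exceeds
// trace-max-block-range blocks from block_num_start.
uint32_t clamp_block_end(uint32_t block_num_start, uint32_t requested_end, uint32_t max_range) {
    const uint64_t cap = uint64_t(block_num_start) + max_range - 1;  // widest allowed end
    return uint32_t(std::min<uint64_t>(requested_end, cap));
}
```

Clients paginate by advancing block_num_start past the clamped end reported in the response.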

On-disk layout (additions)

| File | Scope | Purpose |
| --- | --- | --- |
| trace_trx_idx_<range>.log | per slice | mmap'd hash table: trx_id_prefix64 -> block_num |
| trace_blk_idx_<range>.log | per slice | sparse uint64_t[stride]: block_num -> trace offset |
| trace_recv_bloom_<range>.log | per slice | boost::bloom filters over action receiver and packed (receiver, action) pairs; probed by get_actions for O(1) slice-skip |
| abi_log.log | global | append-only records (account, global_seq, blob) + CRC32 |

Docs

Full endpoint reference, configuration table, on-disk layout, ABI versioning semantics, pagination guide, and exchange/indexer integration notes live in plugins/trace_api_plugin/trace_api_plugin.md.

heifner added 22 commits April 13, 2026 11:52
Add first_recorded_block() and last_recorded_block() to slice_directory
and store_provider by scanning the highest/lowest index slice files.

On the first block_start signal after startup, check_continuity() in
chain_extraction_impl_type validates the relationship between existing
trace data and the current chain head:

- No prior data: fresh start, log and proceed.
- Chain head within [first_recorded, last_recorded+1]: overlap or exact
  continuation; re-applied blocks overwrite existing slice entries, which
  allows recovery from disk corruption by replaying from a snapshot.
- Chain head < first_recorded: snapshot predates trace history start,
  error with operator guidance to delete the trace directory.
- Chain head > last_recorded+1: forward gap, error with guidance.

12 new unit tests cover all cases in test_continuity.cpp.
Replaces the O(n) linear scan in get_trx_block_number() with a compact
open-addressing hash table (load factor <=0.5, power-of-2 bucket count,
linear probing) written as a trace_trx_idx_<range>.log sidecar per
slice. The maintenance thread builds the index atomically (write .tmp,
rename) after each slice becomes irreversible. Lookups do a per-slice
fast-path through the reader; the linear scan fallback remains for
slices not yet indexed. Also fixes _max_filename_size to account for
the 14-char "trace_trx_idx_" prefix (was sized for the 12-char
"trace_index_" prefix).
…i/--trace-no-abis

Replace the file-based --trace-rpc-abi option with automatic ABI capture:

- abi_store: sorted on-disk index mapping (account, global_seq) -> ABI bytes,
  enabling O(log n) point-in-time lookups per action. Written atomically via
  temp-file rename; loaded at startup for continuity across restarts.

- chain_extraction: lazy-fetches each new account's ABI via find_account_metadata
  on first observation, and captures setabi transactions in real time to track
  ABI version changes at the exact global_sequence where they took effect.

- abi_data_handler: replaced static add_abi() map with an abi_lookup_fn callback
  that performs versioned (account, global_seq) lookups from the ABI store.
  Decoding failures are now soft (log at debug, fall back to raw hex) since ABIs
  are auto-captured rather than operator-provided.

- Removed --trace-rpc-abi and --trace-no-abis options from nodeop, all integration
  tests, TestHarness, config templates, tools, tutorials, and the examples/abis/
  directory of hand-maintained ABI files.
… endpoints

get_actions: paginated action search over a block range with optional
filters on receiver, account, and action name. Pagination uses an
after_global_seq cursor and returns more+last_global_seq for the next
page. ABI-decoded params are included when available.

get_token_transfers: convenience preset of get_actions with
receiver=account=token_contract (default sysio.token), action=transfer.
Using receiver=token_contract yields exactly one result per transfer --
the canonical execution, not the inline notification copies.

Also fixes a latent UB: fc::to_hex is now guarded against empty data
vectors (nullptr .data()).

Adds 12 unit tests covering filters, pagination, multi-block scan,
ABI decoding, and the token transfer deduplication behaviour.
…test assertions

- Add --trace-max-query-limit (default 1000, -1=unlimited) so operators
  running private nodes can remove per-request caps on get_actions /
  get_token_transfers queries
- Add comprehensive plugin documentation covering all endpoints,
  configuration options, on-disk layout, ABI decoding, pagination,
  and exchange/indexer integration guidance
- Extend nodeop_run_test.py with get_actions and get_token_transfers
  assertions (params decoded, trx_id matching, receiver filtering)
- Fix Cluster.py: add trailing space after producer_api_plugin arg to
  prevent concatenation with subsequent extra args when spacer was removed
Prevent integer overflow in the ABI blob offset accumulator when total
ABI data exceeds 4 GB. blob_offset and blob_size in abi_store_index_entry
are promoted from uint32_t to uint64_t (struct grows from 24 to 32 bytes);
the local running_offset and blob_offsets vector in abi_store_writer::write()
follow suit, removing the truncating static_cast<uint32_t> casts.
Implements spring#1438 by capturing and exposing the full set of action
receipt and execution-tree fields on every action trace:

- action_trace_v0 now carries action_ordinal, creator_action_ordinal,
  closest_unnotified_ancestor_action_ordinal, recv_sequence, auth_sequence
  (flat_map<name, uint64_t>), code_sequence, abi_sequence,
  account_ram_deltas, and optional cpu_usage_us / net_usage (populated for
  top-level input actions, where producers set deterministic budgets).
- authorization_trace_v0: rename account -> actor to match on-chain naming.
- account_delta_v0: new struct for account_ram_deltas.
- JSON output uses "name" (was "action") for the action name and "actor"
  (was "account") inside authorization entries.

Remove failed-action support:
- chain_extraction filters context-free AND failed action traces (at.except
  set) from stored block traces.
- transaction_trace_v0 no longer carries a status field; all persisted
  transactions are executed.
- get_transaction_trace / get_actions / get_token_transfers no longer
  accept or return a "failed" indicator.

Slim response preserved: get_token_transfers returns only transfer-relevant
fields (omits ordinals, receipt sequences, ram_deltas, cpu/net usage).
Callers needing those can call get_actions directly.

Tests and plugin docs updated accordingly.
Addresses AntelopeIO/leap#1219: /v1/trace_api/get_block response time
varies by block height, with a worst case at the end of each
trace-slice-stride window (~200ms for the last block of a 10k-block
slice).

Root cause: store_provider::get_block scanned trace_index_<range>.log
from offset 0, unpacking up to stride-many metadata_log_entry records
before finding the target block's offset.

Fix: add trace_blk_idx_<range>.log, a flat sparse array of
stride-many uint64_t slots indexed by (block_num - slice_base).
Each slot holds offset+1 into trace_<range>.log (0 reserved as empty).
The sidecar is written synchronously alongside the existing metadata
log, pre-allocated on creation, and updated in place on fork re-writes.

Read path (get_block): one seek + one read on the sidecar.  Falls back
to the existing scan when the sidecar is missing or the slot is empty,
preserving correctness for unusual states.  Uses the in-memory
best_known_lib for O(1) irreversibility, replacing the lib-entry scan.

Write path (append): adds one 8-byte pwrite per block alongside the
existing trace + metadata appends.  Trivial cost; file is sparse on
Linux (~80KB per slice at default stride).

Cleanup: include trace_blk_idx_* in slice rotation; doc updates.
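The sparse-array scheme above can be pictured with a short sketch: one uint64_t slot per block in the slice, holding offset+1 so that 0 can mean "empty". Names are illustrative, not the plugin's code:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// In-memory stand-in for the trace_blk_idx sidecar: stride slots, one per
// block in the slice, zero-initialized so unwritten slots read as empty.
struct blk_offset_index {
    uint32_t slice_base;          // first block number of the slice
    std::vector<uint64_t> slots;  // stride entries

    blk_offset_index(uint32_t base, uint32_t stride) : slice_base(base), slots(stride) {}

    void record(uint32_t block_num, uint64_t trace_offset) {
        slots[block_num - slice_base] = trace_offset + 1;  // +1 reserves 0 as "empty"
    }

    std::optional<uint64_t> find(uint32_t block_num) const {
        const uint64_t s = slots[block_num - slice_base];
        if (s == 0) return std::nullopt;  // empty slot: caller falls back to the scan
        return s - 1;
    }
};
```

On disk the lookup is the same arithmetic: one seek to `sizeof(header) + (block_num - slice_base) * 8`, one 8-byte read.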
…ge cap

Drops `limit`, `more`, `last_global_seq`, and `after_global_seq` from
action_query and actions_result.  Adds `--trace-max-block-range` (default
100) which silently clamps `block_num_end` to
`block_num_start + max - 1` in the HTTP handler.  Clients paginate by
advancing `block_num_start` by `max_block_range` between calls.

Removes the unbounded scan that an unauthenticated empty-body request to
`/v1/trace_api/get_token_transfers` could previously trigger against
block 0..UINT32_MAX.

Sort by `global_sequence` inside `get_actions_impl` is retained: chain
`action_traces` arrive in schedule order, which is NOT execution order
when an action queues both `require_recipient` and inline actions.
Iterating without the sort would mix a notification handler's inline
ahead of later notifications.  Behavior matches `chain_plugin`'s
`push_transaction` tree shape.

Doc tag-along: also updates the `abi_store.log` layout block to reflect
the previously-widened blob_offset/blob_size (uint64) — pre-existing
stale doc, fixed while updating other doc sections.
Bulk get_actions queries over a single contract were rebuilding the ABI
serializer per action: a 100-action response triggered 100 abi_def
unpacks and 100 abi_serializer constructions.  abi_serializer
construction walks all types in the ABI and is non-trivial.

Adds a 128-entry LRU keyed by (account.value, global_seq) protected by a
mutex.  Cache misses do the expensive abi_def unpack and
abi_serializer construction OUTSIDE the lock so concurrent cache users
do not block on a slow miss.  On a race where two threads miss for the
same key, the second observer of the inserted entry wins and the
duplicate construction is discarded.
The reader was re-opening the file on every lookup and trusting cached
_blob_area_offset.  Between the writer's atomic rename and the swap of
the shared_ptr<abi_store_reader>, an in-flight HTTP-thread reader
holding the OLD reader (OLD index, OLD blob_area_offset) could resolve
the path to the NEW inode and seek to an offset computed against the
old layout inside the new file, returning garbage bytes.

Holding the file via boost::iostreams::mapped_file_source pins the inode
through the rename, removes the per-lookup syscalls (open + seek + read
+ close), and lets us bounds-check the blob slice against the mapping.
The on-disk layout matches the in-memory struct on x86_64 (fc::raw::pack
is little-endian, structs are unpadded), so we memcpy the header and
index out of the mapping at construction.

Widens abi_store_header::entry_count to uint64_t and drops _reserved.
Header stays 16 bytes via 4+4+8 layout.  Format version intentionally
not bumped: chain has not launched.
Rewriting the entire abi_store.log on every captured ABI was O(N^2) during
fresh-start replay: each first-encountered contract triggered a full sort
and rewrite of the growing file (plus a reload of the mmap-backed reader).
With hundreds to thousands of distinct contracts touched in early blocks,
the extraction thread was spending seconds on file I/O per block.

Replaces it with an append-only log:

  Header (16 bytes): magic "ABIL" (u32), version (u32), reserved (u64)
  Records:           account(u64) + global_seq(u64) + blob_size(u64)
                     + blob_bytes + crc32

Appends stream to the tail under a mutex.  The lookup index lives in memory
as std::map<(account, global_seq), {offset, size}>, built by scanning the
file at startup and validating each record's CRC32.  Lookups take the index
mutex only long enough to copy (offset, size), then pread the blob from
the held cfile's fileno -- concurrent lookups never serialize on I/O.

Writes are not fsync'd.  Tail corruption from a kernel crash is handled by
the startup recovery scan: any CRC failure truncates the file at the first
bad record.  Lost entries are recaptured the next time their contract is
touched (setabi observation or lazy current-ABI fetch), so no replay is
required.

Separate mutexes for appending vs index so lookups do not block on a
concurrent append's file write.  CRC uses boost::crc_32_type, matching
the convention in fc::datastream_crc and finalizer.hpp.

Format version intentionally not bumped beyond 1: chain has not launched,
so any existing abi_store.log files from earlier builds of this branch
can be deleted before upgrading.  No auto-migration code.
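The record layout and the truncate-at-first-bad-record recovery scan can be sketched over an in-memory byte vector standing in for the file. The bitwise CRC32 below uses the standard reflected polynomial but stands in for the plugin's boost::crc_32_type; everything here is illustrative:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <utility>
#include <vector>

static uint32_t crc32(const unsigned char* p, std::size_t n) {
    uint32_t c = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < n; ++i) {
        c ^= p[i];
        for (int k = 0; k < 8; ++k)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
    }
    return ~c;
}

struct abi_record { uint64_t account; uint64_t global_seq; std::vector<unsigned char> blob; };

// Append one record: account(u64) + global_seq(u64) + blob_size(u64) + blob + crc32.
static void append(std::vector<unsigned char>& log, const abi_record& r) {
    const std::size_t start = log.size();
    auto put = [&](const void* src, std::size_t n) {
        const auto* b = static_cast<const unsigned char*>(src);
        log.insert(log.end(), b, b + n);
    };
    const uint64_t blob_size = r.blob.size();
    put(&r.account, 8); put(&r.global_seq, 8); put(&blob_size, 8);
    if (blob_size) put(r.blob.data(), blob_size);
    const uint32_t crc = crc32(log.data() + start, log.size() - start);
    put(&crc, 4);
}

// Recovery scan: walk records, verify each CRC, stop at the first bad one.
// Returns the truncation offset and fills the in-memory lookup index.
static std::size_t scan(const std::vector<unsigned char>& log,
                        std::map<std::pair<uint64_t, uint64_t>, std::size_t>& index) {
    std::size_t pos = 0;
    while (pos + 28 <= log.size()) {  // 24-byte fixed fields + 4-byte crc minimum
        uint64_t account, seq, blob_size;
        std::memcpy(&account, &log[pos], 8);
        std::memcpy(&seq, &log[pos + 8], 8);
        std::memcpy(&blob_size, &log[pos + 16], 8);
        if (blob_size > log.size()) break;                   // absurd size: corrupt header
        const std::size_t body = 24 + std::size_t(blob_size);
        if (pos + body + 4 > log.size()) break;              // truncated tail
        uint32_t stored;
        std::memcpy(&stored, &log[pos + body], 4);
        if (stored != crc32(log.data() + pos, body)) break;  // CRC failure: truncate here
        index[{account, seq}] = pos;
        pos += body + 4;
    }
    return pos;
}
```

Because lost tail records are recaptured on the contract's next touch, truncation is a complete recovery strategy on its own.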
on_applied_transaction runs AFTER all actions in a transaction have been
applied, so lazy-fetching the current chain-DB ABI on first encounter of
an account produces the POST-setabi ABI when that account's ABI is being
replaced in the same trx.  Recording it as account@global_seq=0 would
then be served by upper_bound step-back for any pre-setabi action on that
account, decoding those actions with the wrong schema.

Fix: scan action_traces once for setabi targets, then on the second pass
skip the lazy fetch for any account whose ABI is being replaced in this
trx.  For the narrow edge case (never-before-observed contract with a
setabi in its first-observed trx AND pre-setabi actions on it), those
pre-setabi actions now return raw hex -- strictly safer than wrong data,
and the correct behavior once any earlier trx has recorded an ABI.

Also documents the caveat in trace_api_plugin.md and adds three
extraction tests (skip-when-target, prior-ABI-survives, sibling-lazy-
fetch-still-fires).
…rst-encounter

The lazy-fetch path tracked an std::unordered_set<uint64_t> of every
account ever observed, growing without bound for the lifetime of the
node.  Naive LRU eviction is unsafe: re-fetching after eviction would
overwrite an existing X@0 record with the post-any-interim-setabi ABI
and poison pre-setabi lookups.

Replace with abi_log::has_entry(account) -- if the log already records
any (account, *) entry, lazy fetch is skipped.  Memory is now bounded
by the number of contracts that actually have an ABI captured (a few
to a few thousand on realistic chains), not by everything-ever-seen.

For the rare case of a contract account whose ABI is empty in
chainbase, has_entry stays false and we re-fetch on every action -- one
chainbase find_account_metadata call per action, microseconds.

Same change converts uint64_t-typed name fields/keys throughout the
plugin to chain::name (record_header.account, index_key, cache_key,
setabi_targets_this_trx).  chain::name is layout-compatible with
uint64_t so the on-disk format is unchanged; the in-memory code is
type-safe and drops .to_uint64_t() conversions.
The trx_id index reader trusted bucket_count from the file header, so a
corrupt or hostile index could:
  - allocate ~32 GB at startup (header.bucket_count = UINT32_MAX),
  - corrupt lookups (header.bucket_count not a power of two breaks the
    `& mask` math),
  - hang lookups forever (a fully-populated bucket array makes the
    probe-for-empty-slot loop never terminate).

Validate at construction:
  - bucket_count must be a power of two (std::has_single_bit),
  - bucket_count must be <= 2^28 (~268M buckets, ~4 GB; covers any
    realistic slice configuration),
  - file size must equal sizeof(header) + bucket_count * sizeof(bucket).
On any failure, mark the reader invalid; lookups return nullopt and
get_trx_block_number falls back to the linear scan over the trx_id log.

Bound the probe loop in lookup() at bucket_count iterations so even a
hand-crafted file with no empty buckets terminates without hanging.

While here, also bound --trace-slice-stride to [1, 1000000].  The
default is 10000 and the practical realistic range tops out around 100K.
The cap prevents an absurd configuration from making the per-slice
block-offset sidecar pre-allocate gigabytes and from pushing trx_id
bucket_count past the 2^28 cap added above.

Adds 4 reader tests (non-power-of-two, above-cap, file-size mismatch,
fully-populated table doesn't hang).
…rite

The trx_id index path had two related correctness gaps that diverged
from the linear-scan get_trx_block_number behavior:

1. trx_id_index_writer claimed "last write wins per prefix" but actually
   wrote the second entry to the next empty bucket, leaving both. The
   reader returned whichever the linear probe reached first -- usually
   the FIRST inserted entry, opposite of the documented intent. Fix the
   writer to also stop on prefix match and overwrite the existing slot.

2. build_trx_id_index iterated every block_trxs_entry in the trx_id log
   and added each (trx_id, block_num) pair, ignoring fork resolution.
   When a trx briefly appeared in a block that was later forked out
   (and possibly replaced by a different trx set at the same block_num,
   or the trx moved to a later block), the index would point at the
   forked-out block_num instead of the canonical one. Fix by first
   collapsing the log into a canonical map<block_num, ids> -- last
   block_trxs_entry per block_num wins, matching the linear scan's
   trx_block_nums.erase logic -- then feeding that to the writer.

Updates the existing duplicate_prefix64_last_write_wins test to actually
assert the lookup returns the latest value (was previously satisfied by
"doesn't crash"), and adds an integration test that exercises three
fork patterns: trx moved to a later block, trx removed entirely, and
trx canonically at its original block.
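The canonicalization step in fix 2 can be sketched as a single replay into a map, so the last block_trxs_entry per block_num wins before the index writer ever sees the data. Types are illustrative:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

struct block_trxs_entry { uint32_t block_num; std::vector<uint64_t> trx_prefixes; };

// Replay entries in log order keyed by block_num: a later (post-fork) entry
// overwrites the forked-out one, matching the linear scan's erase logic.
std::map<uint32_t, std::vector<uint64_t>> collapse(const std::vector<block_trxs_entry>& log) {
    std::map<uint32_t, std::vector<uint64_t>> canonical;
    for (const auto& e : log)
        canonical[e.block_num] = e.trx_prefixes;
    return canonical;
}
```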
The Startup continuity check section claimed gaps "do not prevent the
node from running" -- but check_continuity in chain_extraction.hpp
calls except_handler on a gap, which is wired to app().quit(), so the
node actually shuts down.

Update the doc to describe the real behavior: shut-down on gap, with
operator-facing recovery steps (delete trace dir, load a snapshot
covering the gap, or copy slices from another node).  Also drops the
stale "snapshot restore detected -> warning" row -- the code logs
nothing on overlap, it just resumes silently.

No code change.
…tpicks

Three small fixes bundled:

1. Promote the named "trace_api" logger to a shared header
   (include/sysio/trace_api/logging.hpp) so every translation unit in
   the plugin can route log output through it.  Previously only
   trace_api_plugin.cpp used the named logger; abi_log, trx_id_index,
   store_provider, and chain_extraction logged via the default logger,
   so operator filtering on the "trace_api" entry in logging.json only
   affected a fraction of plugin output.  All wlog/dlog/ilog/elog call
   sites in the plugin now go through fc_*log(_log, ...).  Also adds
   diagnostic logs to the previously silent setabi catch blocks in
   chain_extraction so a malformed setabi action no longer disappears
   without trace.

2. slice_number_from_path returns std::optional<uint32_t> on parse
   failure rather than letting std::stoul throw out of a public method.
   The caller in get_trx_block_number now falls back to its existing
   linear scan when the filename can't be parsed.  Named-local return
   type chosen for NRVO.

3. etc/config/nodeop/aio/config.template.ini: replace the orphaned
   Trace API Plugin comment block with a short note that ABIs are now
   captured automatically (the prior trace-no-abis line was removed
   earlier in this PR but the surrounding context left it unclear what
   operator action was needed, if any -- answer: none).
…naming

Six small cleanups bundled:

* trx_id_index reader/writer use bulk I/O (single read/write of the
  whole bucket array) instead of per-bucket fc::raw pack/unpack +
  datastream creation.  Layout-equivalent on x86_64 LE; static_assert
  on bucket size guards the assumption.
* test_trace_file fixture: replace IIFE-style imperative builders
  with C++20 designated-init aggregates (now that action_trace_v0 has
  17 fields, positional aggregate init was unwieldy and the IIFE was
  worse).
* request_handler: delete duplicated process_authorizations from .cpp;
  promote serialize_authorizations to an inline free function in the
  trace_api namespace; use it from both get_block (response_formatter)
  and the get_actions / get_token_transfers handlers.
* request_handler: drop redundant `data.empty() ? "" : fc::to_hex(...)`
  conditionals (4 sites) -- fc::to_hex(ptr, 0) returns "" without
  dereferencing the pointer.
* store_provider: compute _max_filename_size at compile time across
  every prefix and extension using std::max; adding a longer prefix
  later auto-grows the buffer instead of silently overflowing.
* All headers converted from `namespace sysio { namespace trace_api {`
  to `namespace sysio::trace_api {` (C++17 nested form, matching the
  rest of the plugin).
* Reserved padding fields renamed `_reserved` -> `reserved` (the
  leading-underscore convention is for private members; these are
  public on-disk struct fields).
* Comment on the trx_id writer's bucket vector now explains that the
  zero-init is load-bearing for the empty-slot sentinel that
  terminates the probe loop.
…fication opt-in

Applies 30 fixes from the pre-PR review of feature/trace-api-history:

Correctness
- trace-max-block-range clamped to [1, 10000]; -1 rejected (was unbounded)
- first_and_last_recorded_blocks() replaces two separate optionals so
  callers see a consistent view; NRVO in implementation
- abi_data_handler cache keyed by effective ABI global_seq (via new
  abi_log::lookup_result) so bulk queries hit the cache
- trx_id index hit confirmed against the block's block_trxs_entry before
  returning; collisions fall through to linear scan
- get_transaction_trace scans raw transaction_trace_v0[] and builds the
  variant for the matching trx only; no more JSON-string round-trip

API
- get_actions: include_notifications flag (default false); when off and
  exactly one of account/receiver is set, the other is mirrored
- Response envelope includes the actual block_num_start/end scanned
- decode_error field surfaces ABI decode failures; params keep their
  decoded value when only return_value decode fails
- On-disk magics reversed so a hex dump reads BLIX/TRIX/ABIL/WIRE

Hygiene
- File-scope constexpr _n literals for setabi / sysio.token / transfer
- std::vector -> std::deque for trx_id_index_writer entries (growth cost)
- abi_log: best-effort remove on open failure; blob_offset renamed to
  blob_file_offset; pread/stdio-flush note; append-side-is-single-threaded
  note; last-write-wins note at _index[...] sites
- trace-max-block-range default raised from 100 to 1000
- Continuity-check error text names specific recovery remedies
- Lazy ABI fetch exception logged at debug with account + message
- Removed empty trace_api_rpc_plugin_impl::set_program_options and call sites
- Log prefix "trace_api:" on plugin-init log lines
- Miscellaneous struct alignment, comment clarifications, sort-stability
  note, yield_exception catch-order note

Tests
- test_continuity: new single-invocation-guarantee case with a
  non-throwing except_handler
- Mocks updated for first_and_last_recorded_blocks / decode / lookup_result
- test_trx_id_index uses id.data_size() - 1 instead of hardcoded 31

Docs
- trace_api_plugin.md rewritten: user-facing sections first, on-disk
  layout moved to an "Implementation details" section at the end
- etc/config/nodeop/aio/config.template.ini: commented
  trace-max-block-range line added

Shared helper build_action_variant(action, decoded, variant_shape) moved
from three near-duplicate inline builders to request_handler.cpp so
get_actions, get_token_transfers, and process_block all agree.

Verified: trace_api_plugin builds clean; test_trace_api_plugin passes
all 97 test cases; nodeop links cleanly.
- chain_extraction: fmt::format for continuity error messages;
  drop _ prefix on abi_fetcher / startup_checked members for
  consistency with the rest of the file
- abi_data_handler: clarify serialize_to_variant comment -- it's
  the active path for get_block / get_transaction_trace, not legacy
- trace, request_handler, store_provider: whitespace / comment nits
- tests/sysio_util_snapshot_info_test: flush NamedTemporaryFile
  before sys-util reads it (v2 format reads the footer, so the
  file must be fully on disk); update head_block_id after regen
- Regenerate unittests/snapshots/* and consensus_blockchain/snapshot
  with the corrected WIRE magic number
Three tests read transaction_trace responses from the trace_api and
accessed the action name via transaction["actions"][0]["action"]. That
field was renamed to "name" in this PR; update the call sites.
heifner added 3 commits April 22, 2026 14:38
…t of hot loop

Filter actions before the global_sequence sort so transactions whose actions are all rejected by the receiver/account/action filter skip the sort entirely - the common case when a request scans thousands of blocks.  std::sort replaced with std::ranges::sort and a member-pointer projection on action_trace_v0::global_sequence.

Also in get_actions_impl:
- Reuse the matches vector across trxs/blocks; clear() keeps capacity so repeat scans avoid per-trx allocations.
- Hoist trx.id.str() and the other trx-level fields (block_num, block_time, producer_block_id) out of the match emit loop; a multi-match trx no longer repeats the checksum->hex conversion.
- Materialize has_receiver/has_account/has_action + unwrapped chain::name values once per call; inner predicate compares names directly instead of dereferencing std::optional each time.

All 11 get_actions_tests cases and the full 97-case trace_api suite pass.
Adds trace_recv_bloom_<range>.log per slice containing two boost::bloom filters (K=7, FPR=0.01): one over action_trace_v0::receiver and one over packed (receiver, action) pairs.  get_actions consults the bloom once per slice and advances block_num past the slice on a negative probe, turning the "receiver never appears in this slice" case from a 10,000-block scan into an O(1) file read.

Pieces:

- bloom_sidecar.hpp: header-only bloom_builder + bloom_reader over boost::bloom::filter<uint64_t, 7>.  On-disk header uses the same uint32 magic convention as blk_offset_index_header (0x42524957 -> "WIRB" on little-endian) with in-class initializers and a reserved pad word for natural alignment.  Body is recv bits + recv_action bits + trailing CRC32.  Reader rejects bad magic, version mismatches, truncation, and CRC mismatches; on rejection may_contain_* returns true so the caller falls back to a scan (fail-safe - a false negative would silently drop matching actions).
- store_provider::append() feeds every block's actions into a per-slice builder and flushes to disk at slice roll-over via temp + rename.  A crash between roll-overs leaves the in-progress slice without a bloom; the scan fall-back keeps query correctness at the cost of one scan-only slice until retention ages it out.
- request_handler::get_actions_impl opens the bloom lazily per slice, probes the receiver filter (when set) and the (receiver, action) composite (when both are set), and skips the remainder of the slice on a negative probe.  Unfiltered queries don't touch the bloom.
- slice_directory::run_maintenance_tasks removes trace_recv_bloom_<range>.log alongside the other per-slice files during retention pruning so the sidecar doesn't leak as data is aged out.
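The magic convention called out above can be demonstrated standalone: writing the uint32 in host (little-endian) order makes the first four file bytes spell the tag. The value is from the text; the helper is illustrative:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Magic value from the sidecar header described above; on a
// little-endian host its in-memory bytes read as the ASCII tag.
constexpr uint32_t bloom_magic = 0x42524957;   // -> "WIRB" on LE

// Hypothetical helper: the four bytes as they would land on disk
// when the uint32 is written in host order (LE assumed).
std::string magic_bytes_le(uint32_t magic) {
    char buf[4];
    std::memcpy(buf, &magic, sizeof(buf));
    return std::string(buf, 4);
}
```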

Uses boost::bloom (Boost 1.89, header-only via vcpkg) and boost::unordered_flat_set, both already in the installed header set.

Test coverage (10 new cases, all 107 trace_api_plugin cases pass):
- bloom_sidecar_tests: round-trip hits/misses, empty slice produces valid always-miss file, add_block walks every action, missing file / bad magic / CRC corruption / truncation / version mismatch all fail-safe to invalid reader.
- get_actions_tests: valid bloom with the queried receiver absent causes the entire slice to be skipped without any get_block call; no-filter query does not consult the bloom.
…fensive cap

Batch of mechanical + tightening changes from pre-PR review (items 2, 3, 4, 7, 8, 9, 10, 14):

- get_actions_impl: tighten skip_eligible to `has_receiver` since both bloom probes require a receiver.  Previously `has_account && has_action` could open the sidecar for no probe benefit on `include_notifications = true` queries.
- get_actions_impl: rename local filter values receiver_v/account_v/action_v to *_name for readability (matches chain::name type; no "_v suffix" convention elsewhere in the codebase).
- bloom_sidecar: rename struct field `k_hashes` to `k_hash_count` so it no longer shadows the namespace constant bloom::k_hashes; removes the disambiguating-qualification wart.
- bloom_sidecar: add `max_capacity_bits = 128 MiB` defensive bound in bloom_reader::load.  A corrupted or maliciously-crafted sidecar with an absurd capacity could previously trigger a huge std::vector allocation; this caps allocations at a realistic maximum (~500x a busy-mainnet slice bloom).
- bloom_sidecar: port bloom_builder::finalize_and_write and bloom_reader::load from std::ifstream/std::ofstream to fc::cfile for consistency with the rest of the plugin's sidecars.  Reader catches std::exception broadly so any fc::cfile failure (open, read) still falls back to the scan path via _valid == false.
- bloom_sidecar: add `noexcept` to may_contain_receiver and may_contain_recv_action.  Both are const, pure-compute, and the invalid-path is a plain return; annotating lets the compiler elide exception-handling metadata in the get_actions inner loop.
- bloom_sidecar: comment cleanup (`=>` -> `->`).
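The defensive bound might look like the following sketch, interpreting the cap as 128 MiB of filter data; the names and the reject-before-allocate shape are illustrative, not the plugin's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumed bound: 128 MiB of filter bytes, expressed in bits.
constexpr uint64_t max_capacity_bits = 128ull * 1024 * 1024 * 8;

// Reject an absurd on-disk capacity before std::vector ever sees it;
// a nullopt result means "corrupt header, fall back to the scan path".
std::optional<std::vector<uint8_t>> alloc_filter_bits(uint64_t capacity_bits) {
    if (capacity_bits == 0 || capacity_bits > max_capacity_bits)
        return std::nullopt;
    return std::vector<uint8_t>((capacity_bits + 7) / 8);   // bits -> bytes
}
```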

New test:
- bloom_sidecar_tests/filter_capacity_roundtrip_invariant pins the boost::bloom guarantee `filter{f.capacity()}.capacity() == f.capacity()` (documented in boost/bloom/detail/core.hpp:480) across item counts from 1 to 1000.  A future boost upgrade that quietly breaks this would fail the test rather than silently disabling the skip path in production.

All 108 trace_api_plugin tests pass.
heifner added 7 commits April 22, 2026 17:37

Moves the per-slice bloom sidecar build out of the synchronous append path and into slice_directory::run_maintenance_tasks, alongside build_trx_id_index.  The write is triggered by LIB crossing the slice rather than by a slice roll-over in append().

Why: under the earlier design a fork that crossed a slice boundary could overwrite an already-flushed bloom with an incomplete one built only from the replayed blocks.  Walk-through: slice K gets flushed when the first block of slice K+1 appends; a fork rolls back into slice K; the next append detects the backward roll-over, flushes the partial slice K+1 builder, then flushes a near-empty slice K on the subsequent forward roll-over - overwriting the correct bloom.  Queries for a receiver in slice K's pre-fork blocks would then get a bloom miss and silently drop the action from the response, defeating the fail-safe.

LIB crossing is the natural guard: a fork cannot reach back across LIB, so the slice's data log is final by the time the bloom is built.  Reading the data log in order still picks up any stale records left by earlier forks within the slice - but that's safe: bloom allows false positives (a forked-out receiver probes as present, the query scan visits the slice and finds no canonical match).  False negatives are the only fatal mode and are eliminated by construction.
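The guard reduces to a simple per-slice finality check; this sketch assumes the 10,000-block slice width from earlier and uses hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t slice_width = 10000;   // blocks per slice (assumed)

// A slice is eligible for bloom build only once its last block is at
// or below LIB -- after that no fork can rewrite the slice's data log.
bool slice_is_final(uint32_t slice_number, uint32_t lib_block) {
    const uint32_t last_block_in_slice = (slice_number + 1) * slice_width - 1;
    return last_block_in_slice <= lib_block;
}
```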

Changes:

- slice_directory::build_recv_bloom(slice_number, log): new method mirroring build_trx_id_index.  Opens the uncompressed trace log, streams through each block_trace_v0 record, inserts every action into a bloom_builder, finalize_and_writes the sidecar.  Skips if already built, if the slice has no uncompressed data (already compressed, or never written), if the trace file is empty (0 bytes), or if no record could be parsed (corrupted input).  All I/O errors are captured via FC_LOG_AND_DROP; a failed build leaves no file and the query path falls back to scanning that slice.
- slice_directory::_last_bloomed_slice: new tracker analogous to _last_indexed_slice.
- slice_directory::run_maintenance_tasks: add a second process_irreversible_slice_range pass for bloom building, scheduled after trx_id-index build and before retention pruning / compression so the uncompressed .log is still available.
- store_provider::append: remove the in-append rollover flush.  Bloom building is no longer coupled to slice rollover.
- store_provider: remove _current_bloom_slice and _current_bloom_builder members; bloom state now lives only on disk.

Tests (3 new cases in slice_tests):

- slice_dir_recv_bloom_build_on_lib: asserts the sidecar is absent immediately after append() and present after run_maintenance_tasks crosses LIB past the slice; verifies probes hit appended receivers and largely miss never-appended ones (<=1 false positive across 5 probes).
- slice_dir_recv_bloom_fork_in_slice: appends a block, forks, replays with a different receiver.  Verifies canonical receivers probe present, forked-out receiver also probes present (harmless false positive), and never-appended receivers largely miss.
- slice_dir_recv_bloom_cross_slice_fork: the exact scenario that motivated this fix.  Writes blocks spanning slice 0 and slice 1, forks the tail of slice 0 and head of slice 1, then advances LIB first past slice 0 only (slice 1 bloom absent) and then past slice 1 (both present).  Asserts every canonical receiver in slice 0 probes as present - this would have failed under the earlier design.

All 110 trace_api_plugin tests pass.  nodeop links cleanly; plugin_test sweep green.  Documentation updated to reflect the new build model.
The magic byteswap was extracted to PR #309 targeting
feature/kv-secondary-primary-id. This branch goes back to master's
reference snapshot files and the original 0x57495245 magic; reference
files will be regenerated on top of #309 when it lands.

Reverted:
- libraries/chain/include/sysio/chain/snapshot.hpp - magic restored to
  0x57495245
- unittests/snapshots/{blocks.log,snap_v1.bin.gz,snap_v1.bin.json.gz,
  snap_v1.json.gz} - reverted to master
- unittests/test-data/consensus_blockchain/snapshot - reverted to master
- tests/sysio_util_snapshot_info_test.py - head_block_id reverted; flush
  fix retained
…istory

# Conflicts:
#	tests/sysio_util_snapshot_info_test.py
#	unittests/snapshots/blocks.log
#	unittests/snapshots/snap_v1.bin.gz
#	unittests/snapshots/snap_v1.bin.json.gz
#	unittests/snapshots/snap_v1.json.gz
get_actions previously emitted only the action-level cpu_usage_us /
net_usage on each action variant.  These are per-action and in
different units from the transaction-level totals (action net_usage is
bytes, trx net_usage_words is ceil(net_usage / 8)), so callers that
needed per-transaction resource totals - e.g. PerformanceHarness's
post-test extraction - could not derive them: filtering by action
name drops sibling actions, and even with all actions the units
differ.
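The unit mismatch is just the 8-byte-word rounding; a one-line helper (name illustrative) shows why action-level bytes cannot be recovered from the word count:

```cpp
#include <cassert>
#include <cstdint>

// Transaction-level net_usage_words is the byte count rounded up to
// 8-byte words: ceil(bytes / 8).  The rounding is lossy, so per-action
// byte totals cannot be reconstructed from it.
uint32_t net_usage_to_words(uint32_t net_usage_bytes) {
    return (net_usage_bytes + 7) / 8;
}
```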

Add trx_cpu_usage_us (uint32_t) and trx_net_usage_words
(fc::unsigned_int) alongside the existing per-trx trx_id, block_num,
block_time, and producer_block_id fields on every emitted action.
The values are hoisted once per parent trx so a multi-match trx
doesn't repeat the field reads.
Follow-on to 61906f8.  Per the slim shape's existing intent
("omits the resource usage fields"), trx_cpu_usage_us and
trx_net_usage_words are now gated to full shape; get_token_transfers
no longer emits them.

Also document the two new fields in trace_api_plugin.md - example
response, action field table, and the slim-omits list - and add a
slim test asserting both the action-level and trx-level resource
fields are absent.
…ions

get_block already exposes per-block "status" (irreversible/pending). get_actions and get_token_transfers had no equivalent, so callers had to mix in chain/get_info LIB to know if an action's block was final -- that read is not correlated with trace_api's data log and can disagree with the trace data they just consumed.

Sourcing block_status from the same get_block tuple keeps trace_api as the single source of truth: every action emitted from a block carries that block's finality literal at the moment of read. The slim shape (get_token_transfers) emits it too -- exchanges crediting transfers need finality just as much as general consumers. Operators that want only-irreversible responses can still run nodeop with read-mode = irreversible; every block returned will then carry "irreversible".

The literal is hoisted once per block (shared by every trx and every action in the block) rather than recomputed per emission.
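The once-per-block hoist can be sketched as follows; the struct, the finality rule, and the function name are illustrative stand-ins for the plugin's real types:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct emitted_action { std::string block_status; };

// Compute the finality literal once per block and stamp it on every
// action emitted from that block, rather than recomputing per emission.
std::vector<emitted_action> emit_actions(uint32_t block_num, uint32_t lib_block,
                                         std::size_t action_count) {
    const std::string status =
        block_num <= lib_block ? "irreversible" : "pending";   // hoisted once
    std::vector<emitted_action> out(action_count);
    for (auto& a : out)
        a.block_status = status;                               // shared value
    return out;
}
```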

Test mock fixture gained a per-block pending override, with a new test covering both irreversible and pending blocks across full and slim shapes. Doc updated with the new field on both endpoints' examples and the get_actions response-field table, including the irreversible-mode note.
…ture

fee1815 added optional cpu_usage_us / net_usage to action_trace_v0 and
updated test_extraction.cpp's expected fixtures to set both to
fc::unsigned_int{0} on every action, but the make_action_trace helper
that drives the actual chain::action_trace inputs was not updated.
to_action_trace copied the (empty) optionals through, so the actual JSON
omitted both fields while expected JSON contained "cpu_usage_us":0 and
"net_usage":0 -- block_extraction/basic_single_transaction_block and
block_extraction/basic_multi_transaction_block both failed the
block_trace_v0 equality check.

Set both fields on the chain::action_trace returned by the helper so the
fixture is internally consistent.
Labels

documentation Improvements or additions to documentation
