feature: log per-op left/right/result memory and cache high-water mark#514
feature: log per-op left/right/result memory and cache high-water mark#514bimalgaudel wants to merge 4 commits into
Conversation
evaleev
left a comment
There was a problem hiding this comment.
Overview
Adds two new trace-mode log structures to the evaluator:
EvalDetailBinary— emits left/right/result bytes plus an estimated cache+result high-water-mark on every binary opCacheDetail— emits a per-event Store/Access/Release record with the key and current data size
Also tightens CacheManager's templated constructor with a requires(!same_as<…, CacheManager>) guard so it can't shadow the implicit copy/move ctors.
Body is empty on the PR — would be worth a sentence on the motivation (profiling memory peaks in eval traces?).
Correctness
The high-water-mark estimate may under-count when canon_phase() != 1 (eval.hpp:386–389):
auto hwmark = log::bytes(cache) + log::bytes(result).value;
if (!cache.alive(node.left())) hwmark += log::bytes(left).value;
if (!cache.alive(node.right())) hwmark += log::bytes(right).value;Rationale assumes that when a child is still alive() in the cache, left/right aliases the cached buffer and shouldn't be double-counted. That's true when phase == 1 (entry::access() returns the same shared pointer), but mult_by_phase allocates a fresh post-phase result while the cache still holds the pre-phase data — so both buffers exist live simultaneously and alive()==true would skip the new allocation. Minor and only affects the estimate, but worth a comment so future maintainers don't think the formula is exact.
The requires clause on the ctor is a real fix, but unrelated to the feature. It guards the templated ctor from out-competing the implicit copy/move when CacheManager is passed by lvalue. Worth either splitting into its own commit or calling out in the PR body, since it changes overload resolution behavior independently of logging.
Style / Project conventions
- Redundant logs at every cache event. Both
log::cache(node, cache)(existingCacheStat) and the newlog::cache(log::CacheDetail{...})are emitted side-by-side ateval.hpp:317–325andeval.hpp:332–338. Consumers will see two lines per event. Consider folding the new fields intoCacheStator makingCacheDetaila replacement rather than an addition. - Brace-init inconsistency at
eval.hpp:393:.mem_hwmark = {hwmark}uses extra braces while the otherBytesmembers are constructed vialog::bytes(...)(which already returnsBytes). Either.mem_hwmark = Bytes{hwmark}or refactor sohwmarkis built up asBytesfrom the start. - API asymmetry surfaced by this PR:
log::bytes(CacheManager)returnssize_twhilelog::bytes(ResultPtr...)returnsBytes, which forces.valueaccesses in the new code. Not new in this PR, but it's the first caller that mixes both — would be cleaner to harmonize the return type. - The new
EvalDetailBinary/CacheDetailstructs lack doc-comments while the existingEvalStat/CacheStatare similarly bare, so this matches local convention. A one-liner per struct would still help.
Tests
- No new tests. The new
CacheManager::alive(key)method is testable in isolation (tests/unit/test_cache_manager.cpp) — worth adding a small case that exercises pre-store / post-store / post-final-access transitions. - No coverage of the new log lines themselves; presumably they're consumed elsewhere — confirm there's at least one integration test path that exercises
EvalTrace=Onso a regression in the format would surface.
Performance / Risk
- All new work is gated by
if constexpr (trace(EvalTrace)), so zero impact when traces are off. Risk to existing flows is essentially nil. - When trace is on, each binary op adds one hash lookup (
cache.alive) per child and one extracache.size_in_bytes()traversal (already happens vialog::bytes(cm)). Negligible.
Security
No external input, no allocations of attacker-controllable size — nothing to flag.
Suggested follow-ups (non-blocking)
- PR body: a sentence on intended consumer (manual inspection? a downstream parser?) so log-format stability expectations are clear.
- Split the
requires-clause hardening into its own commit, or note it in the description. - Either replace
log::cache(node, cache)with the new detail variant, or document that both are emitted intentionally. - Brief unit test for
CacheManager::alive(key_type const&).
Overall: clean, low-risk, mostly mechanical addition. Ship-able once the redundant-log question is resolved and the ctor change is acknowledged.
Drop EvalDetailBinary; collapse to a single Eval line per op, with optional left/right populated only for Sum/Product. Add an alloc field that distinguishes fresh-allocation ops from SumInplace (alloc=0B), so summing the column gives total bytes allocated and `grep alloc=0B` isolates inplace ops. Fix hwmark undercounts in Permute and SumInplace for cached inputs whose canon_phase != 1 or that the permute wrapper turned into a fresh buffer — previously these silently biased hw down.
Replace the sequant.nu helper with sequant_trace.py — a stdlib-only parser exposing trace_eval/trace_cache/trace_term plus a one-pass parse() that returns NamedTuple rows (EvalRow/CacheRow/TermRow with an extras dict for unrecognized fields). Rewrite evaluation.rst with Python recipes and regenerated example outputs that reflect the new trace columns (alloc, hw, left/right) and the per-mode field-population rules. Add a note that the parser is iteration-blind: callers must slice rows by line range or term before aggregating across repeated contexts (the sample mpqc_out.txt is capped at one iteration for this reason).
Summary
Adds per-binary-op memory breakdown to the eval trace logger and a
cache+result high-water-mark estimate, intended for manual inspection
of eval traces while profiling memory peaks. Each cache event now
emits one self-contained line with both the per-entry and total cache
sizes; each binary-op event emits one
EvalDetailline withleft=,right=,result=, andhw=fields. Log columns arelabeled
key=valueto keep lines self-describing, but the format isnot yet a stable contract — downstream parsers should expect churn.
Changes
CacheManagergainsalive(key)andentry_size_in_bytes(key),used by the trace to count cached children at most once when
computing the high-water mark.
canon_phase() != 1, wheremult_by_phasematerializes a freshbuffer alongside the cached pre-phase data.
CacheStatabsorbs the per-entry memory field; the short-livedCacheDetailstruct from an earlier iteration is removed.log::bytes(...)is unified to one variadic that accepts bothResultPtrandCacheManagerarguments and returnsBytes.CacheManager's templated constructor with arequires(!same_as<…, CacheManager>)guard so it no longer shadowsthe implicit copy/move overloads. Unrelated to the logging feature,
kept in this PR for ergonomics.