feat(recheck): add LEP-6 storage recheck evidence runtime #290

Merged
j-rafique merged 1 commit into supernode/LEP-6-chain-client-extensions from supernode/LEP-6-recheck-evidence
May 4, 2026

@j-rafique

Implements the PR-5 Supernode side of LEP-6 storage-truth recheck evidence on top of the PR-4 heal-op dispatch branch.

Public surfaces added:

  • supernode/recheck: Candidate, RecheckResult, Finder, Attestor, Service, ReporterSource, SupernodeReporterSource, eligibility and outcome mapping helpers.
  • pkg/storage/queries: RecheckQueries plus SQLite-backed HasRecheckSubmission and RecordRecheckSubmission.
  • pkg/lumera/modules/audit: GetEpochReportsByReporter query wrapper for network-wide candidate discovery.
  • supernode/storage_challenge: LEP6Dispatcher.Recheck to execute RECHECK-bucket proofs without adding results to epoch reports.

Spec/chain alignment decisions:

  • Candidate discovery is network-wide: the service lists registered supernodes and scans EpochReportsByReporter over the configured lookback window, rather than only scanning this node's own report.
  • Recheck candidate eligibility mirrors chain storage transcript records: only HASH_MISMATCH, TIMEOUT_OR_NO_RESPONSE, OBSERVER_QUORUM_FAIL, and INVALID_TRANSCRIPT originals are eligible.
  • The service rejects self-target candidates and self-reported challenged results because chain SubmitStorageRecheckEvidence rejects creator == challenged_supernode_account and creator == challenged result reporter.
  • Recheck execution maps local PASS to PASS and confirmed hash mismatch to RECHECK_CONFIRMED_FAIL; timeout/quorum/invalid transcript classes remain explicit and are not collapsed.
  • Recheck execution reuses the PR-3 compound dispatcher in RECHECK bucket mode with an isolated temporary buffer so recheck results are submitted only through MsgSubmitStorageRecheckEvidence and are never included in host epoch reports.
  • Local dedup is submit-then-persist keyed by epoch_id + ticket_id (creator/self is implicit locally); tx hard-fail does not persist, while chain replay/already-submitted errors persist local dedup for idempotence.
  • Startup/config wiring is additive under storage_challenge.lep6.recheck and remains disabled unless explicitly enabled.
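
The eligibility, self-rejection, and outcome-mapping rules above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `ResultClass` type, the `Candidate` field names, and the `Eligible` and `MapOutcome` helpers are hypothetical, and passing the timeout/quorum/invalid-transcript classes through under their original names is an assumption about what "remain explicit" means.

```go
package main

import "fmt"

// ResultClass is a hypothetical stand-in for the chain's storage result enum.
type ResultClass string

const (
	ClassPass                ResultClass = "PASS"
	ClassHashMismatch        ResultClass = "HASH_MISMATCH"
	ClassTimeoutOrNoResponse ResultClass = "TIMEOUT_OR_NO_RESPONSE"
	ClassObserverQuorumFail  ResultClass = "OBSERVER_QUORUM_FAIL"
	ClassInvalidTranscript   ResultClass = "INVALID_TRANSCRIPT"
	ClassRecheckConfirmed    ResultClass = "RECHECK_CONFIRMED_FAIL"
)

// Candidate carries only the fields this sketch needs (names are assumed).
type Candidate struct {
	EpochID  uint64
	TicketID string
	Target   string // challenged supernode account
	Reporter string // account that reported the original result
	Original ResultClass
}

// Eligible reports whether the node identified by self may recheck c.
// Self-target and self-reported candidates are rejected up front because
// the chain would reject the resulting evidence anyway.
func Eligible(self string, c Candidate) bool {
	if c.Target == self || c.Reporter == self {
		return false
	}
	switch c.Original {
	case ClassHashMismatch, ClassTimeoutOrNoResponse,
		ClassObserverQuorumFail, ClassInvalidTranscript:
		return true
	default:
		return false
	}
}

// MapOutcome maps a local recheck result to the class to submit. Only a
// confirmed hash mismatch is renamed; every other class passes through.
func MapOutcome(local ResultClass) ResultClass {
	if local == ClassHashMismatch {
		return ClassRecheckConfirmed
	}
	return local
}

func main() {
	c := Candidate{EpochID: 12, TicketID: "ticket-1", Target: "sn-a",
		Reporter: "sn-b", Original: ClassHashMismatch}
	fmt.Println(Eligible("sn-self", c), MapOutcome(c.Original))
}
```

Filtering before execution keeps the node from spending a recheck proof on evidence the chain would refuse.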

Tests added/updated:

  • Eligibility matrix for all eligible and rejected result classes.
  • Outcome mapping for PASS, RECHECK_CONFIRMED_FAIL, timeout, quorum, and invalid transcript.
  • Finder lookback/order/limit/local-dedup behavior.
  • Network-wide reporter discovery regression so peer-reported failures are discovered and not self-report-only.
  • Self-target and self-reported candidate rejection pinned against chain validation.
  • Service mode gate and submit path.
  • Attestor submit-then-persist, tx hard-fail retry safety, idempotent already-submitted handling, and required-field rejection.
  • SQLite recheck submission idempotence/dedup preservation.
  • Dispatcher RECHECK execution path integration through focused package tests.
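
The attestor behavior these tests pin down (submit first, persist afterwards, never persist on a hard tx failure, treat an already-submitted error as success) might look like the sketch below. Everything here is illustrative: `memStore` is an in-memory stand-in for the SQLite-backed `HasRecheckSubmission` and `RecordRecheckSubmission` queries, and `submitFunc`, `Attest`, and the two sentinel errors are hypothetical names. Only the matched substring "recheck evidence already submitted" comes from the review thread.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// dedupKey mirrors the epoch_id + ticket_id local dedup key.
type dedupKey struct {
	EpochID  uint64
	TicketID string
}

// memStore is an in-memory stand-in for the SQLite dedup table.
type memStore struct{ seen map[dedupKey]bool }

func newMemStore() *memStore            { return &memStore{seen: map[dedupKey]bool{}} }
func (s *memStore) Has(k dedupKey) bool { return s.seen[k] }
func (s *memStore) Record(k dedupKey)   { s.seen[k] = true }

// submitFunc is a hypothetical stand-in for broadcasting the evidence tx.
type submitFunc func(epochID uint64, ticketID string) error

// Sentinel errors used only by this sketch.
var (
	errTxFailed = errors.New("tx broadcast failed")
	errReplay   = errors.New("rpc error: recheck evidence already submitted")
)

// isAlreadySubmitted matches only the specific replay message.
func isAlreadySubmitted(err error) bool {
	return err != nil &&
		strings.Contains(err.Error(), "recheck evidence already submitted")
}

// Attest submits evidence, then records the key locally. A hard failure
// leaves the store untouched so that a later tick can retry.
func Attest(store *memStore, submit submitFunc, epochID uint64, ticketID string) error {
	if ticketID == "" {
		return errors.New("recheck: ticket_id is required")
	}
	k := dedupKey{EpochID: epochID, TicketID: ticketID}
	if store.Has(k) {
		return nil // already handled locally; skip the tx
	}
	err := submit(epochID, ticketID)
	if err == nil || isAlreadySubmitted(err) {
		store.Record(k)
		return nil
	}
	return fmt.Errorf("recheck: submit evidence: %w", err)
}

func main() {
	store := newMemStore()
	key := dedupKey{EpochID: 7, TicketID: "ticket-1"}

	err := Attest(store, func(uint64, string) error { return errTxFailed }, 7, "ticket-1")
	fmt.Println(err != nil, store.Has(key))

	err = Attest(store, func(uint64, string) error { return errReplay }, 7, "ticket-1")
	fmt.Println(err == nil, store.Has(key))
}
```

Persisting only after the submit call returns trades a possible duplicate tx after a crash for the guarantee that a failed submission is never silently marked as done; the chain-side replay error then closes the loop.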

Validation:

  • PATH=/home/openclaw/.local/go/bin:$PATH go test ./supernode/recheck ./pkg/storage/queries ./supernode/storage_challenge ./supernode/cmd ./pkg/lumera/modules/audit => PASS
  • PATH=/home/openclaw/.local/go/bin:$PATH go test ./supernode/host_reporter ./supernode/self_healing ./supernode/transport/grpc/self_healing ./supernode/recheck ./pkg/storage/queries ./supernode/storage_challenge ./supernode/cmd ./pkg/lumera/modules/audit => PASS
  • PATH=/home/openclaw/.local/go/bin:$PATH go vet ./supernode/recheck ./pkg/storage/queries ./supernode/storage_challenge ./supernode/cmd ./pkg/lumera/modules/audit ./supernode/host_reporter ./supernode/self_healing ./supernode/transport/grpc/self_healing => PASS
  • git diff --check => PASS
  • PATH=/home/openclaw/.local/go/bin:$PATH go test ./... => expected local environment failure only in pkg/storage/files due to missing go-webp system headers webp/decode.h and webp/encode.h; other visible packages pass.

Parent: supernode/LEP-6-heal-op-dispatch @ 043fba4.

@j-rafique j-rafique self-assigned this May 4, 2026
@roomote-v0

roomote-v0 Bot commented May 4, 2026

All four items from previous reviews have been addressed. No new issues found in this revision.

  • isAlreadySubmittedError overly broad string matching (supernode/recheck/attestor.go): Fixed -- now matches only "recheck evidence already submitted".
  • Recheck holds mutex across full network I/O (supernode/storage_challenge/lep6_recheck.go): Fixed -- chain queries moved before lock acquisition.
  • fmt.Errorf wraps nil error (supernode/storage_challenge/lep6_recheck.go): Fixed -- separate error/nil handling with clear message.
  • DispatchEpoch holds d.mu across chain queries (supernode/storage_challenge/lep6_dispatch.go): Fixed -- mutex acquisition moved after GetParams/GetEpochAnchor/GetAssignedTargets chain queries, now only held for the dispatchTarget loop.

Comment thread supernode/recheck/attestor.go
Comment thread supernode/storage_challenge/lep6_recheck.go Outdated
Comment thread supernode/storage_challenge/lep6_recheck.go Outdated
Comment on lines +23 to +25
paramsResp, err := d.client.Audit().GetParams(ctx)
if err != nil || paramsResp == nil {
	return recheck.RecheckResult{}, fmt.Errorf("lep6 recheck: get params: %w", err)

When paramsResp == nil but err == nil, fmt.Errorf("lep6 recheck: get params: %w", err) wraps a nil error, producing the message "lep6 recheck: get params: %!w(<nil>)". This makes the log output confusing for operators debugging a nil-response scenario. A simple guard like if err != nil { return ..., fmt.Errorf("...: %w", err) }; return ..., fmt.Errorf("lep6 recheck: get params returned nil") would produce a clearer message.

Suggested change
  paramsResp, err := d.client.Audit().GetParams(ctx)
- if err != nil || paramsResp == nil {
- 	return recheck.RecheckResult{}, fmt.Errorf("lep6 recheck: get params: %w", err)
+ if err != nil {
+ 	return recheck.RecheckResult{}, fmt.Errorf("lep6 recheck: get params: %w", err)
+ }
+ if paramsResp == nil {
+ 	return recheck.RecheckResult{}, fmt.Errorf("lep6 recheck: get params returned nil response")
+ }
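
To see why the combined check is confusing, the small program below wraps a nil error with %w and compares the result with the split guard. It is a standalone sketch: `response`, `checkParams`, and `nilWrapMessage` are hypothetical helpers, not code from the PR. In current Go releases a nil %w operand is rendered as the formatting artifact `%!w(<nil>)`, and the resulting error wraps nothing.

```go
package main

import (
	"errors"
	"fmt"
)

// response is a hypothetical stand-in for the params query response.
type response struct{}

// nilWrapMessage reproduces the original code path where err is nil but
// the response is missing, and returns the resulting error text.
func nilWrapMessage() string {
	var err error // nil, as when the query "succeeds" with no response
	return fmt.Errorf("lep6 recheck: get params: %w", err).Error()
}

// checkParams mirrors the split guard: real errors are wrapped, and a nil
// response gets its own explicit message.
func checkParams(resp *response, err error) error {
	if err != nil {
		return fmt.Errorf("lep6 recheck: get params: %w", err)
	}
	if resp == nil {
		return errors.New("lep6 recheck: get params returned nil response")
	}
	return nil
}

func main() {
	fmt.Println(nilWrapMessage())       // lep6 recheck: get params: %!w(<nil>)
	fmt.Println(checkParams(nil, nil))  // explicit nil-response message
	fmt.Println(checkParams(&response{}, nil) == nil)
}
```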

@j-rafique j-rafique force-pushed the supernode/LEP-6-recheck-evidence branch from b69d272 to 5df3adc Compare May 4, 2026 16:01
Comment on lines +170 to +172
func (d *LEP6Dispatcher) DispatchEpoch(ctx context.Context, epochID uint64) error {
	d.mu.Lock()
	defer d.mu.Unlock()

DispatchEpoch acquires d.mu here and holds it across GetParams, GetEpochAnchor, and GetAssignedTargets chain queries (lines 174-197) before reaching any d.buffer access. Since Recheck also acquires d.mu, a slow chain query in DispatchEpoch blocks all recheck candidates for the duration. The same narrowing applied to Recheck (moving chain queries before the lock) could be applied here -- acquire the lock only before the dispatchTarget loop at line 226, after the chain queries complete. The severity depends on chain query latency and how often recheck ticks overlap with epoch dispatch.
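
The narrowing described here, running chain queries first and taking the mutex only around buffer access, can be sketched as below. This is an assumption-laden miniature rather than the dispatcher itself: `chainClient`, `dispatcher`, `probeClient`, and `runProbe` are hypothetical, a single `AssignedTargets` call stands in for the GetParams/GetEpochAnchor/GetAssignedTargets sequence, and the probe uses `sync.Mutex.TryLock` (Go 1.18+) only to show that the lock is free during the query.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// chainClient is a hypothetical stand-in for the chain query surface.
type chainClient interface {
	AssignedTargets(ctx context.Context, epochID uint64) ([]string, error)
}

type dispatcher struct {
	client chainClient
	mu     sync.Mutex // guards buffer only
	buffer []string
}

// DispatchEpoch performs the slow chain query before taking the lock, so
// a concurrent recheck is blocked only for the duration of the loop.
func (d *dispatcher) DispatchEpoch(ctx context.Context, epochID uint64) error {
	targets, err := d.client.AssignedTargets(ctx, epochID)
	if err != nil {
		return fmt.Errorf("dispatch epoch %d: assigned targets: %w", epochID, err)
	}

	d.mu.Lock()
	defer d.mu.Unlock()
	for _, t := range targets {
		// Stand-in for the per-target dispatch that writes to the buffer.
		d.buffer = append(d.buffer, "result:"+t)
	}
	return nil
}

// probeClient records whether the dispatcher's mutex was free while the
// chain query was in flight.
type probeClient struct {
	d        *dispatcher
	lockFree bool
}

func (p *probeClient) AssignedTargets(context.Context, uint64) ([]string, error) {
	if p.d.mu.TryLock() {
		p.lockFree = true
		p.d.mu.Unlock()
	}
	return []string{"sn-a", "sn-b"}, nil
}

// runProbe wires the probe to a dispatcher and runs one epoch.
func runProbe() (lockFree bool, buffered int, err error) {
	probe := &probeClient{}
	d := &dispatcher{client: probe}
	probe.d = d
	err = d.DispatchEpoch(context.Background(), 42)
	return probe.lockFree, len(d.buffer), err
}

func main() {
	fmt.Println(runProbe())
}
```

With the original ordering (lock first, then query), the probe's `TryLock` would fail, which is the contention the comment describes.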

@j-rafique j-rafique force-pushed the supernode/LEP-6-recheck-evidence branch from 5df3adc to f151e1c Compare May 4, 2026 16:21
@j-rafique j-rafique force-pushed the supernode/LEP-6-recheck-evidence branch from f151e1c to f120f6e Compare May 4, 2026 16:30
@j-rafique j-rafique merged commit 1ee63e1 into supernode/LEP-6-chain-client-extensions May 4, 2026
6 checks passed