Handle req-resp responses with multiple chunks#407
Conversation
Greptile SummaryThis PR fixes multi-chunk req-resp decoding by reworking
Confidence Score: 3/5The encoding fix is correct and tested, but the range-sync retry path is broken for multi-batch ranges, causing failures to be silently dropped. The encoding fix is sound and well-tested. The concern is handle_retry_range_sync: the deduplication key (start_slot, total_end_slot) is never stored for multi-batch ranges, so the guard always fires and retries are always skipped. Long-range sync will silently stall after any first-batch failure. crates/net/p2p/src/lib.rs (handle_retry_range_sync) and crates/net/p2p/src/req_resp/handlers.rs
|
| Filename | Overview |
|---|---|
| crates/net/p2p/src/req_resp/encoding.rs | Core fix: decode_payload now reads one varint-prefixed snappy payload at a time via read_varint + read_snappy_frame helpers; tests cover multi-chunk, large, and empty-payload cases. |
| crates/net/p2p/src/req_resp/handlers.rs | Adds BlocksByRange handling and range-sync retry; retry guard in handle_retry_range_sync is broken for multi-batch ranges, plus a redundant remove call. |
| crates/net/p2p/src/lib.rs | Adds pending_range_requests, PendingRequestKind, and handle_retry_range_sync; retry guard contains the inverted/wrong-key bug. |
| crates/net/p2p/src/req_resp/mod.rs | Re-exports new functions; no issues. |
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
crates/net/p2p/src/lib.rs:436-448
**Retry guard is inverted and uses a stale/wrong key — range sync retries never fire**
`still_needed` is named backwards: it is `true` when `pending_range_requests` does NOT contain the key, yet the function returns early (`return`) precisely when `still_needed` is `true`, so it _skips_ the retry when it should proceed and _proceeds_ when it should skip. More critically, the lookup key is `(start_slot, total_end_slot)`, but `pending_range_requests` stores `(effective_start, batch_end)` where `batch_end = min(total_end_slot, effective_start + MAX_REQUEST_BLOCKS − 1)`. For any multi-batch range, `batch_end < total_end_slot`, so the lookup always returns `false`, `still_needed` is always `true`, and the retry is unconditionally skipped. Range sync failures silently go unretried.
### Issue 2 of 2
crates/net/p2p/src/req_resp/handlers.rs:330-340
Redundant removal — `pending_range_requests` is already cleared by the caller before `handle_blocks_by_range_response` is invoked, making this a guaranteed no-op that adds confusion about which site is authoritative.
```suggestion
async fn handle_blocks_by_range_response(
server: &mut P2PServer,
blocks: Vec<SignedBlock>,
peer: PeerId,
start_slot: u64,
end_slot: u64,
total_end_slot: u64,
) {
// Note: (start_slot, end_slot) has already been removed from
// pending_range_requests by the caller before this function is invoked.
```
Reviews (1): Last reviewed commit: "fix req-resp multi-chunk response decodi..." | Re-trigger Greptile
6ae58ff to
90891dd
Compare
18035de to
c07324e
Compare
🗒️ Description / Motivation
decode_payloadpreviously read the entire remaining stream, causing BlocksByRange / multi-block responses to fail when the first chunk decoder consumed bytes belonging to following chunks.What Changed
crates/net/p2p/src/req_resp/encoding.rs.decode_payloadto read one varint-prefixed snappy payload at a time.Correctness / Behavior Guarantees
Tests Added / Run
cargo fmt --checkcargo test -p ethlambda-p2pRelated Issues / PRs
✅ Verification Checklist
make fmt— cleanmake lint(clippy with-D warnings) — cleancargo test --workspace --release— all passing