fix(hook): PendingToolRecoveryHook skips patching when memory has historical ToolResults#1512
Conversation
…s historical ToolResultBlocks The hook's `userProvidedResults` check scanned `event.getInputMessages()` which includes the full memory snapshot + user callArgs (see AgentBase.notifyPreCall). Any historical ToolResultBlock in memory from previous tool calls caused the hook to think the user was providing results, silently skipping the auto-recovery. This made the hook ineffective in any conversation that had previously used tools. Fix: extract only the callArgs portion (beyond the memory snapshot boundary) before checking for user-provided ToolResultBlocks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
…ction Cover the defensive branches in the callArgs extraction logic: - callArgs empty (resume scenario, memorySize == inputMessages.size) - user-provided ToolResultBlock in callArgs (HITL scenario) - historical ToolResultBlocks in memory snapshot (the original bug scenario) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AgentScopeJavaBot
left a comment
There was a problem hiding this comment.
🤖 AI Review
The fix targets a real and high-impact bug: PendingToolRecoveryHook.handlePreCall was scanning the full event.getInputMessages() (memory snapshot + user callArgs) for ToolResultBlocks, so any conversation that had ever invoked a tool produced a false positive on userProvidedResults and silently skipped the auto-recovery patch, leaving an orphaned ToolUseBlock that later crashed ReActAgent.doCall with IllegalStateException. The new code slices off the callArgs portion using memory.getMessages().size() as the snapshot boundary, which is the correct shape for the bug. Test coverage is strong: a new PendingToolRecoveryHookTest exercises the three boundary behaviours (historical results present, callArgs empty / resume, user-provided result), and the end-to-end testAutoRecoveryWithHistoricalToolResults in HookStopAgentTest reproduces the production crash path. The main architectural concern (also raised by the existing AI review) is the implicit coupling: the snapshot boundary is recomputed from live memory at hook-execution time, which depends on no higher-priority hook having mutated memory or input length in between. Safe for all built-in hooks today, but worth hardening with an explicit boundary on PreCallEvent in a follow-up.
| // check the user's current input for ToolResultBlocks, not the entire memory | ||
| // history which naturally contains ToolResultBlocks from previous tool calls. | ||
| List<Msg> inputMessages = event.getInputMessages(); | ||
| int memorySize = memory.getMessages().size(); |
There was a problem hiding this comment.
[minor] The callArgs boundary is recomputed here from live memory (memory.getMessages().size()), while inputMessages was built from a snapshot taken at the start of AgentBase.notifyPreCall. This works today because no built-in hook with priority < 10 mutates memory or replaces inputMessages in PreCallEvent, but it bakes in an implicit invariant: a user-defined hook in the 0-9 priority range (the Critical system hooks band documented on Hook.priority()) that calls memory.addMessage(...) will silently shift the boundary, causing either the first N real callArgs to be dropped (when memory.size() > snapshotSize) or the tail of the original snapshot to be misclassified as callArgs (when memory was truncated) — re-introducing the very class of bug this PR fixes. Consider exposing the snapshot boundary on PreCallEvent directly so the contract cannot be invalidated by sibling hooks:
// In AgentBase.notifyPreCall, populate alongside the existing snapshotSize:
PreCallEvent event = new PreCallEvent(this, fullInput, snapshotSize);
// Hook side becomes ordering-independent:
int callArgsStart = event.getCallArgsStartIndex();
List<Msg> callArgs = inputMessages.subList(callArgsStart, inputMessages.size());| List<Msg> inputMessages = event.getInputMessages(); | ||
| int memorySize = memory.getMessages().size(); | ||
| List<Msg> callArgs = | ||
| (inputMessages != null && inputMessages.size() > memorySize) |
There was a problem hiding this comment.
[nit] The inputMessages != null guard is unreachable: PreCallEvent's constructor wraps the input in new ArrayList<>(Objects.requireNonNull(inputMessages, ...)) and setInputMessages also calls Objects.requireNonNull, so event.getInputMessages() cannot return null. The branch can collapse to inputMessages.size() > memorySize ? inputMessages.subList(memorySize, inputMessages.size()) : List.of(). Pure readability nit — not blocking.
| List<Msg> callArgs = | ||
| (inputMessages != null && inputMessages.size() > memorySize) | ||
| ? inputMessages.subList(memorySize, inputMessages.size()) | ||
| : List.of(); |
There was a problem hiding this comment.
[minor] The inputMessages.size() > memorySize check is strict-greater, so the inputMessages.size() < memorySize case (memory grew, or a preceding hook shortened inputMessages) silently degrades to callArgs = List.of() and the patch is skipped without any signal. That state is unexpected and almost certainly indicates a hook-ordering bug — emitting a log.warn (or even an assertion) when inputMessages.size() < memorySize would make such regressions diagnosable instead of manifesting as silent recovery failures downstream.
| memory.getMessages().stream() | ||
| .flatMap(m -> m.getContentBlocks(ToolResultBlock.class).stream()) | ||
| .anyMatch(tr -> "pending_call".equals(tr.getId())); | ||
| assertTrue(patched, "Hook should patch the pending tool call despite historical results"); |
There was a problem hiding this comment.
[minor] patchesDespiteHistoricalToolResults only asserts that a ToolResultBlock for pending_call exists in memory after the hook runs. Tightening the assertion to also check that the output text starts with the [ERROR] prefix (matching the contract enforced in buildErrorToolResult and already asserted in HookStopAgentTest) would catch silent regressions in the recovery payload format, e.g. if someone later refactors buildErrorToolResult without updating both call sites.
| @@ -420,6 +420,101 @@ void testNewMsgWithPendingToolUseContinuesActing() { | |||
| "Memory should contain an error ToolResultBlock for the pending tool call" | |||
| + " id='tool1'"); | |||
There was a problem hiding this comment.
[praise] testAutoRecoveryWithHistoricalToolResults reproduces the production failure faithfully (historical tool call + result pairs, then a fresh pending ToolUseBlock, then a second call) and validates the fix through the full ReActAgent path rather than the hook in isolation. The inline comment block explaining the bug scenario also makes the regression contract very clear to future readers.
Summary
PendingToolRecoveryHook.handlePreCallchecksevent.getInputMessages()forToolResultBlocks to decide whether the user is providing tool results (HITL/Clarify scenario). However,event.getInputMessages()includes the full memory snapshot + user callArgs (constructed inAgentBase.notifyPreCall).Any historical
ToolResultBlockfrom previous tool calls in memory causesuserProvidedResults = true, silently skipping the auto-recovery patch. This makes the hook ineffective in any conversation that has previously used tools — which is essentially all real-world conversations.Root Cause
Fix
Extract only the callArgs portion (beyond the memory snapshot boundary) before checking:
Reproduction
ToolResultBlocks)ToolUseBlockin memoryuserProvidedResultsistruedue to historical results → skips patchReActAgent.doCallthrowsIllegalStateException: Pending tool calls exist without resultsTest plan
testAutoRecoveryWithHistoricalToolResults— pre-populates memory with historical tool call + result pairs, then verifies the hook still patches pending tool callsHookStopAgentTesttests pass (no regression)🤖 Generated with Claude Code