Skip to content

fix(hook): PendingToolRecoveryHook skips patching when memory has historical ToolResults#1512

Open
Xingxing-Wang wants to merge 2 commits into
agentscope-ai:mainfrom
Xingxing-Wang:fix/pending-tool-recovery-hook
Open

fix(hook): PendingToolRecoveryHook skips patching when memory has historical ToolResults#1512
Xingxing-Wang wants to merge 2 commits into
agentscope-ai:mainfrom
Xingxing-Wang:fix/pending-tool-recovery-hook

Conversation

@Xingxing-Wang
Copy link
Copy Markdown

Summary

PendingToolRecoveryHook.handlePreCall checks event.getInputMessages() for ToolResultBlocks to decide whether the user is providing tool results (HITL/Clarify scenario). However, event.getInputMessages() includes the full memory snapshot + user callArgs (constructed in AgentBase.notifyPreCall).

Any historical ToolResultBlock from previous tool calls in memory causes userProvidedResults = true, silently skipping the auto-recovery patch. This makes the hook ineffective in any conversation that has previously used tools — which is essentially all real-world conversations.

Root Cause

// Before (bug): scans full memory history + user input
List<Msg> inputMessages = event.getInputMessages(); // = memory snapshot + callArgs
boolean userProvidedResults =
    inputMessages.stream().anyMatch(m -> m.hasContentBlocks(ToolResultBlock.class));
// → always true when memory has historical tool results → skip patch

Fix

Extract only the callArgs portion (beyond the memory snapshot boundary) before checking:

// After (fix): scans only user's current input
int memorySize = memory.getMessages().size();
List<Msg> callArgs = inputMessages.subList(memorySize, inputMessages.size());
boolean userProvidedResults =
    callArgs.stream().anyMatch(m -> m.hasContentBlocks(ToolResultBlock.class));

Reproduction

  1. Conversation with 250+ historical tool calls (memory contains many ToolResultBlocks)
  2. Agent calls a new tool → execution crashes/interrupts → orphan ToolUseBlock in memory
  3. User retries → hook detects pending IDs, but userProvidedResults is true due to historical results → skips patch
  4. ReActAgent.doCall throws IllegalStateException: Pending tool calls exist without results
  5. Error saved to MySQL → same state on next retry → infinite error loop

Test plan

  • Added testAutoRecoveryWithHistoricalToolResults — pre-populates memory with historical tool call + result pairs, then verifies the hook still patches pending tool calls
  • All 17 existing HookStopAgentTest tests pass (no regression)

🤖 Generated with Claude Code

…s historical ToolResultBlocks

The hook's `userProvidedResults` check scanned `event.getInputMessages()` which
includes the full memory snapshot + user callArgs (see AgentBase.notifyPreCall).
Any historical ToolResultBlock in memory from previous tool calls caused the hook
to think the user was providing results, silently skipping the auto-recovery.
This made the hook ineffective in any conversation that had previously used tools.

Fix: extract only the callArgs portion (beyond the memory snapshot boundary)
before checking for user-provided ToolResultBlocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Xingxing-Wang Xingxing-Wang requested a review from a team May 28, 2026 03:31
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented May 28, 2026

CLA assistant check
All committers have signed the CLA.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 28, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
.../agentscope/core/hook/PendingToolRecoveryHook.java 83.33% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

…ction

Cover the defensive branches in the callArgs extraction logic:
- callArgs empty (resume scenario, memorySize == inputMessages.size)
- user-provided ToolResultBlock in callArgs (HITL scenario)
- historical ToolResultBlocks in memory snapshot (the original bug scenario)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@AgentScopeJavaBot AgentScopeJavaBot added bug Something isn't working area/core/agent Agent runtime, pipeline, hooks, plan labels May 28, 2026
AgentScopeJavaBot

This comment was marked as outdated.

Copy link
Copy Markdown
Collaborator

@AgentScopeJavaBot AgentScopeJavaBot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Review

The fix targets a real and high-impact bug: PendingToolRecoveryHook.handlePreCall was scanning the full event.getInputMessages() (memory snapshot + user callArgs) for ToolResultBlocks, so any conversation that had ever invoked a tool produced a false positive on userProvidedResults and silently skipped the auto-recovery patch, leaving an orphaned ToolUseBlock that later crashed ReActAgent.doCall with IllegalStateException. The new code slices off the callArgs portion using memory.getMessages().size() as the snapshot boundary, which is the correct shape for the bug. Test coverage is strong: a new PendingToolRecoveryHookTest exercises the three boundary behaviours (historical results present, callArgs empty / resume, user-provided result), and the end-to-end testAutoRecoveryWithHistoricalToolResults in HookStopAgentTest reproduces the production crash path. The main architectural concern (also raised by the existing AI review) is the implicit coupling: the snapshot boundary is recomputed from live memory at hook-execution time, which depends on no higher-priority hook having mutated memory or input length in between. Safe for all built-in hooks today, but worth hardening with an explicit boundary on PreCallEvent in a follow-up.

// check the user's current input for ToolResultBlocks, not the entire memory
// history which naturally contains ToolResultBlocks from previous tool calls.
List<Msg> inputMessages = event.getInputMessages();
int memorySize = memory.getMessages().size();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] The callArgs boundary is recomputed here from live memory (memory.getMessages().size()), while inputMessages was built from a snapshot taken at the start of AgentBase.notifyPreCall. This works today because no built-in hook with priority < 10 mutates memory or replaces inputMessages in PreCallEvent, but it bakes in an implicit invariant: a user-defined hook in the 0-9 priority range (the Critical system hooks band documented on Hook.priority()) that calls memory.addMessage(...) will silently shift the boundary, causing either the first N real callArgs to be dropped (when memory.size() > snapshotSize) or the tail of the original snapshot to be misclassified as callArgs (when memory was truncated) — re-introducing the very class of bug this PR fixes. Consider exposing the snapshot boundary on PreCallEvent directly so the contract cannot be invalidated by sibling hooks:

// In AgentBase.notifyPreCall, populate alongside the existing snapshotSize:
PreCallEvent event = new PreCallEvent(this, fullInput, snapshotSize);

// Hook side becomes ordering-independent:
int callArgsStart = event.getCallArgsStartIndex();
List<Msg> callArgs = inputMessages.subList(callArgsStart, inputMessages.size());

List<Msg> inputMessages = event.getInputMessages();
int memorySize = memory.getMessages().size();
List<Msg> callArgs =
(inputMessages != null && inputMessages.size() > memorySize)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] The inputMessages != null guard is unreachable: PreCallEvent's constructor wraps the input in new ArrayList<>(Objects.requireNonNull(inputMessages, ...)) and setInputMessages also calls Objects.requireNonNull, so event.getInputMessages() cannot return null. The branch can collapse to inputMessages.size() > memorySize ? inputMessages.subList(memorySize, inputMessages.size()) : List.of(). Pure readability nit — not blocking.

List<Msg> callArgs =
(inputMessages != null && inputMessages.size() > memorySize)
? inputMessages.subList(memorySize, inputMessages.size())
: List.of();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] The inputMessages.size() > memorySize check is strict-greater, so the inputMessages.size() < memorySize case (memory grew, or a preceding hook shortened inputMessages) silently degrades to callArgs = List.of() and the patch is skipped without any signal. That state is unexpected and almost certainly indicates a hook-ordering bug — emitting a log.warn (or even an assertion) when inputMessages.size() < memorySize would make such regressions diagnosable instead of manifesting as silent recovery failures downstream.

memory.getMessages().stream()
.flatMap(m -> m.getContentBlocks(ToolResultBlock.class).stream())
.anyMatch(tr -> "pending_call".equals(tr.getId()));
assertTrue(patched, "Hook should patch the pending tool call despite historical results");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] patchesDespiteHistoricalToolResults only asserts that a ToolResultBlock for pending_call exists in memory after the hook runs. Tightening the assertion to also check that the output text starts with the [ERROR] prefix (matching the contract enforced in buildErrorToolResult and already asserted in HookStopAgentTest) would catch silent regressions in the recovery payload format, e.g. if someone later refactors buildErrorToolResult without updating both call sites.

@@ -420,6 +420,101 @@ void testNewMsgWithPendingToolUseContinuesActing() {
"Memory should contain an error ToolResultBlock for the pending tool call"
+ " id='tool1'");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[praise] testAutoRecoveryWithHistoricalToolResults reproduces the production failure faithfully (historical tool call + result pairs, then a fresh pending ToolUseBlock, then a second call) and validates the fix through the full ReActAgent path rather than the hook in isolation. The inline comment block explaining the bug scenario also makes the regression contract very clear to future readers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/core/agent Agent runtime, pipeline, hooks, plan bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants