Skip to content

feat(auto-contextMemory): implement AutoContextMemory with multi-strategy intelligent context compression and offloading#1523

Open
BEASTSHRIRAM wants to merge 5 commits into
agentscope-ai:mainfrom
BEASTSHRIRAM:feat/autocontext-async-perstategy-allsystem
Open

feat(auto-contextMemory): implement AutoContextMemory with multi-strategy intelligent context compression and offloading#1523
BEASTSHRIRAM wants to merge 5 commits into
agentscope-ai:mainfrom
BEASTSHRIRAM:feat/autocontext-async-perstategy-allsystem

Conversation

@BEASTSHRIRAM
Copy link
Copy Markdown
Contributor

AgentScope-Java Version

1.0.13-SNAPSHOT

Description

This PR solves issue #1350 by addressing structural and behavioral limitations within AutoContextMemory and AutoContextHook.

Background & Purpose:
While extending AutoContextMemory, several limitations restricted flexibility and latency:

  1. It was impossible to toggle individual compression strategies (e.g., dropping short messages, summarizing previous rounds) independently.
  2. The core compression loop used synchronous .block() calls, breaking the reactive chain and degrading performance under heavy compression scenarios.
  3. AutoContextHook only preserved the first SYSTEM message, accidentally wiping out any subsequent system instructions (e.g., roleplay directives alongside system prompts).

Changes Made:

  • AutoContextConfig: Introduced strategy1Enabled through strategy6Enabled (defaulting to true) to allow fine-grained control over which compression rules execute.
  • AutoContextMemory: Refactored the core implementation to be completely asynchronous using Spring Reactor types (Mono and Flux). All synchronous .block() wrappers were eliminated internally, and the primary entry point is now compressIfNeededAsync(). (The synchronous compressIfNeeded() is kept as a @Deprecated wrapper for backwards compatibility).
  • AutoContextHook: Modified handlePreReasoning to collect and preserve all SYSTEM messages. The memory-offloading instruction is now correctly appended to only the first SYSTEM message without discarding the others.
  • Testing: Comprehensive tests were added to AutoContextHookTest, AutoContextMemoryTest, and AutoContextConfigTest to verify that concurrent evaluation logic resolves correctly and that multiple system prompts remain intact.

How to test this PR:

  • Verify tests via mvn clean test -pl agentscope-extensions/agentscope-extensions-autocontext-memory -am.
  • Spotless formatting can be verified using mvn spotless:check.

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has been formatted with mvn spotless:apply
  • All tests are passing (mvn test)
  • Javadoc comments are complete and follow project conventions
  • Related documentation has been updated (e.g. links, examples, etc.)
  • Code is ready for review

@BEASTSHRIRAM BEASTSHRIRAM requested review from a team and Copilot May 28, 2026 18:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR makes AutoContext context compression fully reactive (non-blocking), adds per-strategy enable/disable switches, and improves SYSTEM message handling so multiple SYSTEM prompts are preserved while appending the offload instruction only to the first.

Changes:

  • Refactor compression pipeline to compressIfNeededAsync() using Reactor (Mono/Flux) and deprecate the blocking compressIfNeeded().
  • Add configuration flags to enable/disable compression strategies 1–6.
  • Preserve multiple SYSTEM messages in AutoContextHook and add a unit test covering this behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 12 comments.

File Description
agentscope-extensions/agentscope-extensions-autocontext-memory/src/main/java/io/agentscope/core/memory/autocontext/AutoContextMemory.java Converts compression strategies to a reactive pipeline; adds concurrency for some summarization paths; keeps a deprecated blocking wrapper.
agentscope-extensions/agentscope-extensions-autocontext-memory/src/main/java/io/agentscope/core/memory/autocontext/AutoContextHook.java Switches PreReasoning handling to the async compressor and preserves multiple SYSTEM messages.
agentscope-extensions/agentscope-extensions-autocontext-memory/src/main/java/io/agentscope/core/memory/autocontext/AutoContextConfig.java Adds per-strategy enable/disable switches and exposes them via getters + builder.
agentscope-extensions/agentscope-extensions-autocontext-memory/src/test/java/io/agentscope/core/memory/autocontext/AutoContextHookTest.java Adds test to ensure multiple SYSTEM messages are preserved and instruction is appended only to the first.

@BEASTSHRIRAM BEASTSHRIRAM changed the title feat: implement AutoContextMemory with multi-strategy intelligent context compression and offloading feat(auto-contextMemory): implement AutoContextMemory with multi-strategy intelligent context compression and offloading May 28, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 28, 2026

Codecov Report

❌ Patch coverage is 89.95902% with 49 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...ope/core/memory/autocontext/AutoContextMemory.java 88.29% 14 Missing and 34 partials ⚠️
...scope/core/memory/autocontext/AutoContextHook.java 97.61% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Collaborator

@AgentScopeJavaBot AgentScopeJavaBot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Review

This PR makes three valuable changes: per-strategy enable/disable flags, multi-SYSTEM message preservation in the hook, and conversion of the compression pipeline from blocking .block() calls to a Reactor chain. The refactor is comprehensive and well-documented, with new tests covering the SYSTEM-preservation and disabled-strategy cases. However, the new concurrent execution in Strategy 4 and Strategy 5 introduces a real thread-safety regression: offloadContext is a plain HashMap and is now mutated from multiple boundedElastic worker threads simultaneously. There are also a few smaller issues (loss of subscribeOn(boundedElastic) in the hook, hard-coded block(60s) in the deprecated wrapper, raw Object[] tuple in Strategy 4, misleading "exhausted" log when all strategies are disabled). Recommend changes before merge.

offloadMsg.add(msg);
return Mono.fromRunnable(
() -> {
offload(uuid, offloadMsg);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[critical] Thread-safety violation. offloadContext is declared as a plain HashMap (line 98), but here offload(uuid, offloadMsg) (which calls offloadContext.put) is invoked inside a Flux.flatMap(..., concurrency=5) whose inner Mono.fromRunnable(...) is dispatched onto Schedulers.boundedElastic() (line 705). With concurrency up to 5, multiple boundedElastic worker threads execute HashMap.put on the same instance concurrently, which can cause lost entries, corrupted internal state, or (on older JDKs) infinite loops. Either switch the field to ConcurrentHashMap, run the offload step sequentially before the LLM calls (as Strategy 4 already does in summaryPreviousRoundMessages), or wrap the put in a synchronized(offloadContext) block.


// Trigger compression asynchronously, then rebuild input messages
return autoContextMemory
.compressIfNeededAsync()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] The previous implementation explicitly applied .subscribeOn(Schedulers.boundedElastic()) so that all the synchronous CPU-bound work inside compressIfNeededAsync (TokenCounterUtil.calculateToken, offloadingLargePayload, ArrayList copies, etc.) ran on an elastic worker. After this refactor compressIfNeededAsync() itself does not apply any subscribeOn, and the hook also no longer applies one. If the hook chain is subscribed from a non-blocking thread (Netty event loop, Reactor parallel scheduler used by the agent runtime) the synchronous parts of the strategy chain will execute on that caller thread. Please re-add .subscribeOn(Schedulers.boundedElastic()) either inside compressIfNeededAsync() or here at the hook entry to preserve the original threading guarantee.

*/
@Deprecated
public boolean compressIfNeeded() {
Boolean result = compressIfNeededAsync().block(java.time.Duration.ofSeconds(60));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] The deprecated sync wrapper now calls compressIfNeededAsync().block(Duration.ofSeconds(60)). Two regressions versus the prior synchronous implementation: (1) Reactor's .block() throws IllegalStateException when invoked from a non-blocking scheduler thread (e.g., Schedulers.parallel()), so legacy callers that previously executed inline will now fail at runtime in those contexts. (2) The 60s timeout is hard-coded; on slow LLMs strategy 4/5 (which can fan out up to 5 LLM calls) may exceed it, returning false even though compression is still in flight. Consider either making the timeout configurable, propagating it from AutoContextConfig, or removing the timeout entirely (block() without argument) for behavioral parity.

candidate ->
summaryPreviousRoundConversationAsync(
candidate.messagesToSummarize(), candidate.uuid())
.filter(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] Strategy 4 silently filters out summaries whose text content is empty/whitespace, but the corresponding offload(uuid, messagesToSummarize) for that round has already been written to offloadContext earlier (line ~1358). When the filter drops a summary, the offload entry becomes orphaned (memory leak — never reachable via any rendered offload tag) and there is no log indicating the failure. Add a doOnDiscard/switchIfEmpty log, or call clear(uuid) for filtered-out candidates so the offload map does not grow unbounded on bad LLM responses.

.getChatUsage()
.getTime());
}
return new Object[] {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] Strategy 4 emits new Object[]{candidate, summaryMsg, metadata} and casts back via raw (Object[]) and (SummaryCandidate) at lines 1419-1429. This loses type safety, requires @SuppressWarnings("unchecked"), and is inconsistent with Strategy 5 (line 709) which uses reactor.util.function.Tuples.of(...). Recommend using Tuples.of(candidate, summaryMsg, metadata) (a typed Tuple3) or a small private record (you already use one for SummaryCandidate).

applied -> {
if (!applied)
log.warn(
"All compression strategies exhausted but context still"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Misleading log: when all strategies are disabled by config, the chain stays at Mono.just(false) and this doOnNext logs "All compression strategies exhausted but context still exceeds threshold". "Exhausted" implies strategies ran and failed; in this case nothing ran. Distinguish disabled-everything from genuine exhaustion by checking whether any strategy was enabled before logging at WARN level (the disabled case can stay at DEBUG/INFO).

// Step 4: Run summaries concurrently (max 5 in-flight), collect all results
int concurrency = Math.min(candidates.size(), 5);

return Flux.fromIterable(candidates)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Atomicity regression vs. the prior implementation. The old sequential code mutated rawMessages after each successful round summary, so a failure in round N still preserved the work done for rounds N+1..N+M. The new Flux.collectList() is all-or-nothing: if any one of the up to 5 concurrent summary Monos signals an error, no rawMessages mutations are applied — but every offload was already inserted into offloadContext before the Flux subscribed. Consider either applying replacements per-round inside flatMap (with proper synchronization) or using onErrorContinue/onErrorResume per candidate so a single LLM failure does not waste the work of the other rounds.

MsgUtils.calculateMessageCharCount(msg),
uuid);
})
.subscribeOn(reactor.core.scheduler.Schedulers.boundedElastic())
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Fully-qualified reactor.core.scheduler.Schedulers.boundedElastic() and reactor.util.function.Tuples.of(...) are used inline instead of importing them. The original file already imported reactor.core.publisher.Flux and reactor.core.publisher.Mono; please add import reactor.core.scheduler.Schedulers; and import reactor.util.function.Tuples; (and Tuple2 if needed) for consistency with the rest of the codebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/ext/memory Memory/session extension implementations enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants