feat(binding-mcp): mcp · proxy cache option (#1737)#1774
Conversation
Adds the test-first scaffold for the upcoming mcp cache binding: McpCacheIT skeleton with all 22 planned test methods (Group A warmup tests active, Groups B/C/D/F/G/I marked @ignore until their scripts land), four IT zilla.yaml configs, schema patch entries for kind: cache and options.warmup/ttl, and six fully-written Group A warmup .rpt scripts (lifecycle, tools/list, resources/list, prompts/list, lifecycle-persists, guarded-credentials). https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds the missing client.rpt for each Group A cache warmup scenario and a CacheIT class in the spec project that runs every script pair peer-to-peer without Zilla. Verifies the scripts are self-consistent before any cache binding implementation exists. Verified locally: ./mvnw -pl specs/binding-mcp.spec verify -Dit.test=CacheIT runs all 6 tests to green. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…pts (#1737) Splits the monolithic CacheIT/McpCacheIT into per-group classes (*WarmupIT, *ListIT, etc.) to keep each group focused as the script set grows. Group A (warmup) renamed to *WarmupIT. Adds Group B (list operations served from cache) — 4 scenarios: agent initialize, tools/list, resources/list, prompts/list. Each scenario carries paired client.rpt (agent at app0) and server.rpt (cache facade) so CacheListIT can verify the agent↔cache contract peer-to-peer. McpCacheListIT pairs the agent client.rpt with the Group A warmup server.rpt at app1 so the cache is populated before the agent's list arrives. B5 (list-before-warmup) stays @ignored with a script TODO. Verified locally: CacheWarmupIT 6/6 + CacheListIT 4/4 green peer-to-peer. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…#1737) The cache binding is one hop above the proxy; proxy fanout across exits is already covered by McpProxyIT (shouldList*WithToolkitMulti). The only cache-specific concern in the original Group D was resilience to a downstream error during warmup, which is more naturally a Group A (warmup) scenario than a fanout one. Adds cache.warmup.session.downstream.error: downstream ABORTs the tools/list stream during warmup; the lifecycle session must survive so the cache can continue with other list types or retry later. CacheWarmupIT 7/7 green peer-to-peer. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Renames/splits the CacheWarmupIT and CacheListIT pairs (peer-to-peer
and engine-driven) into method-scoped ITs aligned with the issue's
Responsibilities-by-MCP-method table:
CacheLifecycleIT — warmup session open/persist/error/guard,
agent initialize-from-cache
CacheToolsListIT — tools/list warmup + served-from-cache
CacheResourcesListIT — resources/list warmup + served-from-cache
CachePromptsListIT — prompts/list warmup + served-from-cache
Same applies to the McpCache* counterparts. No script files moved;
only the IT-class references changed. All 11 peer-to-peer tests
verified green.
Remaining roster slots (CacheToolsCallIT/ResourcesReadIT/PromptsGetIT
for pass-through invocations, and the per-list-method store/refresh
coverage) will be added as their scripts come online.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Replaces the separate kind: cache binding with options.cache on the
existing kind: proxy binding. Since all original two-binding topologies
have an exact equivalent in the folded model, no expressiveness is lost
and the engine "kind: cache" prerequisite goes away.
Also:
- Renames "warmup" → "hydrate" throughout (more precise terminology
for cache population).
- Flattens the warmup wrapper: authorization/store/ttl now live
directly under options.cache instead of options.cache.warmup.
- Drops the guarded vs. unguarded test split — the .rpt scripts don't
observe authorization on BEGIN so the scenarios produce identical
transcripts.
- Renames IT classes: CacheXIT → ProxyCacheXIT and McpCacheXIT →
McpProxyCacheXIT to align with the existing McpProxyIT neighbour.
- Configs rewritten: cache.yaml/cache.multi.yaml/cache.refresh.yaml
→ proxy.cache.yaml/proxy.cache.multi.yaml/proxy.cache.refresh.yaml;
each declares kind: proxy with options.cache and a stores: memory0
reference.
- Schema patch: drop "cache" from kind enum; replace flat
options.warmup/options.ttl with options.cache {store, ttl,
authorization}; store is required.
Verified: 10/10 ProxyCache*IT peer-to-peer tests pass.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Renames scenarios and test methods to use "hydrate" / "serve" as verbs rather than as noun-modifiers. Examples: cache.hydrate.session.tools.list → cache.hydrate.tools cache.agent.tools.list.from.cache → cache.serve.tools.list shouldPopulateToolsViaHydrate → shouldHydrateTools shouldServeAgentToolsListFromCache → shouldServeToolsList shouldKeepHydrateSessionOpenAfter... → shouldHydratePersist Verified: 10/10 ProxyCache*IT peer-to-peer tests pass. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds per-method refresh scenarios that model the cache re-issuing a list call on the same hydrate session after a TTL elapses, plus a refresh-error case where the refresh attempt aborts and the cache must retain its prior cached entry. Scenarios: cache.refresh.tools / cache.refresh.resources / cache.refresh.prompts cache.refresh.tools.error Tests: ProxyCacheToolsListIT.shouldRefreshTools ProxyCacheToolsListIT.shouldRefreshToolsError ProxyCacheResourcesListIT.shouldRefreshResources ProxyCachePromptsListIT.shouldRefreshPrompts (engine-driven counterparts added too) Also renames cache.hydrate.downstream.error -> cache.hydrate.error (and shouldHydrateDownstreamError -> shouldHydrateError) for the single-qualifier convention. Lease-contention coverage (cache.refresh.tools.contended) is deferred: the lease behavior is store-level, only meaningfully testable engine-driven via either a TestStore seeding hook or a multi-worker EngineRule. Both are downstream work from the cache binding implementation; the wire-level refresh tests above cover the protocol shape. Verified: 14/14 ProxyCache*IT peer-to-peer tests pass. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds three scenarios where the agent's list request arrives while the cache is still hydrating. The cache facade holds the request read in full, then waits for a per-method hydrate barrier to fire before writing the cached response. True timing isn't observable peer-to-peer, but the scripts document the required wire ordering for engine-driven tests to enforce. Scenarios + tests: cache.serve.tools.list.hydrating shouldServeToolsListHydrating cache.serve.resources.list.hydrating shouldServeResourcesListHydrating cache.serve.prompts.list.hydrating shouldServePromptsListHydrating Verified: 17/17 ProxyCache*IT peer-to-peer tests pass. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds TestStoreOptionsConfig with a Map<String,String> entries
field, exposed via the standard options-config builder/adapter
pattern, and wires it through TestStoreContext so each new
TestStoreHandler is pre-populated with the configured entries.
Enables tests to set up store state declaratively before a binding
that uses the store begins operating — e.g., seeding a lease lock
key so a binding observes a held lease and exercises its
already-locked code path deterministically.
Example:
stores:
memory0:
type: test
options:
entries:
tools.lock: "worker-0"
Verified: engine unit tests (320/320), engine.spec ITs (39/39),
and binding-mcp.spec ProxyCache*IT (17/17) all pass.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds proxy.cache.contended.yaml — a TestStore configured with tools.lock pre-seeded to a foreign worker id — and a corresponding McpProxyCacheToolsListIT.shouldRefreshToolsContended test method (currently @ignore'd until the cache binding implementation lands). When enabled, the test verifies that the cache binding consults the lease before issuing a refresh tools/list: with the lock held in the store, putIfAbsent returns the seeded value and the refresh path is skipped. The downstream server.rpt models only the hydrate exchange, so any spurious refresh tools/list would hit an unmatched stream and fail the test. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…#1737) Replaces the TestStore-seeded contention approach with a more faithful multi-worker engine test: - cache.refresh.tools.contended/{client,server}.rpt — models two hydrate sessions, exactly two tools/list calls on the wire (one initial hydrate by lease-winner, one refresh by lease-winner). Second worker's lifecycle is observed but its tools/list never hits the wire because the lease was lost. - McpProxyCacheContentionIT — new engine-driven IT class configured with ENGINE_WORKERS=2 so both cache binding instances genuinely race for the hydrate / refresh leases against the shared store-memory. @ignore'd until cache binding implementation lands. - ProxyCacheToolsListIT.shouldRefreshToolsContended — peer-to-peer counterpart added to the existing list IT. - proxy.cache.contended.yaml dropped — no longer needed (the test uses proxy.cache.refresh.yaml which already references store-memory). Verified: 18/18 ProxyCache*IT peer-to-peer tests pass (was 17, added shouldRefreshToolsContended). https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
| // Multi-worker cache contention tests. Configured with workers=2 so each | ||
| // worker instantiates its own cache binding and the two race for the | ||
| // hydrate / refresh leases backed by the shared store. The wire pattern | ||
| // observable at the downstream is two lifecycle sessions but only the | ||
| // lease-winner issues each list call. |
There was a problem hiding this comment.
Remove these type of comments from the ITs.
The scenarios and scripts should be the documentation.
There was a problem hiding this comment.
proxy.cache.toolkit.yaml is sufficient here, even though it has multiple toolkit routes.
| routes: | ||
| - exit: app1 |
There was a problem hiding this comment.
| routes: | |
| - exit: app1 | |
| exit: app1 |
| # Downstream errors out during hydrate tools/list. Lifecycle session must | ||
| # survive so the cache can retry later or continue with other list types. | ||
|
|
There was a problem hiding this comment.
Remove these kinds of comments from the spec scripts.
- proxy.cache.yaml / proxy.cache.refresh.yaml: use binding-level exit instead of single-element routes for the single-exit case. - proxy.cache.multi.yaml → proxy.cache.toolkit.yaml. - Remove explanatory # comments from scripts; scenarios + script bodies are the documentation. - Remove class-level comment from McpProxyCacheContentionIT; same reasoning. Verified: 18/18 ProxyCache*IT peer-to-peer tests pass. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…tentionIT The static import of ENGINE_WORKERS was placed after java/org statics with a blank-line separator. Per the project convention static imports go in a single block alphabetically sorted by full path, so io.aklivity.* comes first. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
The engine-driven McpProxyCache*IT tests fail at engine bring-up because the proxy binding does not yet recognize options.cache. This is the expected test-first state, but CI cannot distinguish "expected failure until impl" from "regression". Add class-level @ignore("TODO: enable when proxy cache option lands") to all 5 McpProxyCache*IT classes: McpProxyCacheLifecycleIT, McpProxyCacheToolsListIT, McpProxyCacheResourcesListIT, McpProxyCachePromptsListIT, McpProxyCacheContentionIT. The peer-to-peer ProxyCache*IT tests in specs/binding-mcp.spec remain active and green — they verify script self-consistency independent of the binding implementation. Verified: runtime/binding-mcp clean verify → BUILD SUCCESS, 146 tests run, 5 skipped (the ignored cache ITs). https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
First step of the proxy cache implementation: add the config-layer types that map to options.cache in the schema. No runtime behavior yet — McpProxyFactory still ignores the parsed config. - McpCacheConfig: immutable POJO with store, per-method ttl, authorization map (guard-name → credentials) - McpCacheConfigBuilder: fluent builder mirroring the existing McpAuthorizationConfigBuilder pattern - McpOptionsConfig: add cache field + 4-arg constructor - McpOptionsConfigBuilder: add cache() method (nested-builder pattern matching authorization()) - McpOptionsConfigAdapter: serialize/deserialize the cache block with ttl + per-guard authorization credentials Verified: binding-mcp tests pass, checkstyle clean, binding-mcp.spec SchemaTest still green. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…1737) Phase B of the proxy cache implementation. When options.cache is present, attach() now: - resolves the unconditional exit route - allocates a HydrateSession (initialId/replyId etc.) - schedules a signaler tick that fires immediately - the tick handler issues a lifecycle BEGIN downstream with sessionId="hydrate-1", modelled after KafkaGrpcRemoteServerFactory The session sends reply WINDOW on receipt of the downstream's BEGIN reply, and END on detach. No list-method enumeration yet; tools/ resources/prompts hydrate, store integration, lease coordination, and refresh land in later phases. McpProxyCacheLifecycleIT remains @ignore'd until all four scenarios in the class pass; verified locally that the shouldHydrate scenario itself now runs to green when temporarily un-Ignored. Also adds store-memory test dependency (the cache configs reference type: memory for their backing store). https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Phase C — after the hydrate lifecycle reply arrives, the cache now chains tools/list, resources/list, prompts/list as three sequential sub-streams on the same hydrate session. Each list stream sends BEGIN+END (close write), receives BEGIN reply, DATA and END, then signals the parent HydrateSession to start the next. Response bodies are discarded for now; store integration arrives in Phase D. McpProxyCacheLifecycleIT now passes all 4 tests with the engine (shouldHydrate, shouldHydratePersist, shouldHydrateError, shouldServeInitialize) — class-level @ignore removed. McpProxyCache{ToolsList,ResourcesList,PromptsList,Contention}IT remain @ignore'd until serve-from-cache / refresh / lease land. Verified: ./mvnw -pl runtime/binding-mcp clean verify → 149 pass, 4 ITs skipped (the still-Ignored cache list / contention ITs). https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Phase D — resolve the configured store at attach() time via
context.supplyStore(resolveId.applyAsLong(cache.store)), thread the
handle through HydrateSession into each HydrateListStream, and on
each list reply END write the accumulated body to the store under
the per-method key ("tools" / "resources" / "prompts") with no
expiry (Long.MAX_VALUE).
Response bodies are buffered byte-by-byte into a per-list-stream
byte[] that grows as needed. On END the accumulated bytes are
decoded as UTF-8 and pushed to the StoreHandler via put().
No serve-from-cache yet — agent list requests still pass through
to the existing proxy code path. Phase E will intercept them and
respond from the store.
Verified: ./mvnw -pl runtime/binding-mcp clean verify → 149 pass,
4 list/contention ITs still @ignore'd.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds McpCacheListServer that intercepts tools/resources/prompts list streams when options.cache is configured on a proxy binding, looks up the cached envelope via StoreHandler.get(), and emits the cached bytes as DATA followed by END without forwarding to upstream. Un-ignores the hydrate + serve tests across the per-kind cache IT classes; periodic refresh and hydrating-wait tests remain ignored pending later phases.
…lel (#1737) Move the per-kind store-key plumbing out of HydrateSession into a new McpListCache attached to McpBindingConfig, dropping the redundant cacheStores map. HydrateSession becomes a populator only; the cache (backed by StoreHandler) is the source of truth for "is kind X ready?" via an async get. With per-kind status independent of session sequencing, the three list-stream round-trips dispatch on the same worker tick - wall-clock hydration drops to a single round-trip, and an error on one kind no longer delays the others or blocks a retry on reconnect. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
| private static final String MCP_TYPE_NAME = "mcp"; | ||
|
|
||
| private static final int SIGNAL_INITIATE_HYDRATE = 1; | ||
| private static final String HYDRATE_SESSION_ID = "hydrate-1"; |
There was a problem hiding this comment.
This should not be hardcoded in the implementation.
Instead it should use a session id supplier, same as other binding types, and override to fix the value for the ITs that need a deterministic value.
| } | ||
| } | ||
|
|
||
| private final class HydrateSession |
There was a problem hiding this comment.
| private final class HydrateSession | |
| private final class McpHydrateSession |
- Rename `HydrateSession` to `McpHydrateSession` to follow the binding's type-prefixed inner class convention. - Make `McpOptionsConfig` constructor package-private; construction is via `McpOptionsConfig.builder()`. - Replace the hardcoded `"hydrate-1"` constant in `McpProxyFactory` with a `Supplier<String>` obtained from `McpConfiguration.sessionIdSupplier()`. Cache ITs configure `MCP_SESSION_ID_NAME` to a static method returning `"hydrate-1"`, mirroring the `McpServerIT` override pattern. - Replace the three flat `Duration ttlTools/ttlResources/ttlPrompts` fields on `McpCacheConfig` with a nested `McpCacheTtlConfig` (built via `McpCacheConfigBuilder.ttl()` returning `McpCacheTtlConfigBuilder`). `McpOptionsConfigAdapter` serializes and deserializes the nested form. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…ngConfig Replace the two factory-level maps on `McpProxyFactory` (`sessions: Map<String, McpLifecycleServer>` shared across all bindings, and `hydrateSessions: Long2ObjectHashMap<McpHydrateSession>`) with per-binding fields on `McpBindingConfig`: - `sessions: Map<String, McpProxySession>` — now correctly scoped per binding rather than per worker. Session ids are only meaningful within one binding's namespace. - `hydrate: McpProxyHydrate` — the per-binding hydrate session. `McpProxySession` and `McpProxyHydrate` are package-visible interfaces declared in `internal.config` so `McpBindingConfig` can type the fields without depending on the inner classes inside `McpProxyFactory`. The inner classes implement the interfaces — no behavior change. `McpLifecycleServer` gains a constructor reference to its `McpBindingConfig` so `cleanup` can call `binding.sessions.remove(sessionId)`. This sets up the per-binding hook that the upcoming per-kind factory extraction will route state through. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Move McpLifecycleServer and McpLifecycleClient from McpProxyFactory into a new McpProxyLifecycleFactory as inner classes - same enclosing-instance pattern, just a smaller enclosing factory. McpLifecycleServer (and the cross-class accessors `sender`, `originId`, `routedId`, `sessionId`, `supplyClient`) is package-private so the still-inline call/list dispatch in McpProxyFactory can pattern-match against it. McpLifecycleClient is also package-private because McpClient and McpListClient hold typed references to it. Introduce `Int2ObjectHashMap<BindingHandler> factories` on McpProxyFactory and delegate KIND_LIFECYCLE to the new factory via the dispatch table; other kinds remain inline until subsequent phases extract them. Mirrors KafkaClientFactory's per-kind factory pattern. McpProxyFactory drops 560 lines. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…ompts/get, resources/read Introduce an abstract McpProxyItemFactory implementing BindingHandler plus three concrete per-kind subclasses (McpProxyToolsCallFactory, McpProxyPromptsGetFactory, McpProxyResourcesReadFactory). The base owns the shared stream state machine - the McpServer and McpClient inner classes move verbatim out of McpProxyFactory - and exposes three hooks for the kind-specific bits: protected abstract int kind(); protected abstract void injectInitialBeginEx(McpBeginExFW.Builder, String sid, String identifier); protected abstract void injectReplyBeginEx(McpBeginExFW.Builder, String sid, McpBeginExFW upstream); Each subclass implements ~30 lines: its KIND constant and the tools-call/prompts-get/resources-read variants of the BEGIN extension builder. Naming "Item" reflects that the three operations are different verbs (call/get/read) acting on a single identified MCP item - opposite to the List kinds. McpProxyFactory registers all three subclasses in its existing Int2ObjectHashMap<BindingHandler> factories map next to the Phase 2 McpProxyLifecycleFactory entry; the newStream dispatcher's else branch collapses to factories.get(kind).newStream(...). The local McpServer, McpClient, and rewriteReplyBeginEx are removed from McpProxyFactory along with three now-unused flyweight fields (flushRO, challengeRO, mcpChallengeExRW) and the McpChallengeExFW import. Net change: McpProxyFactory shrinks 735 lines (3071 -> 2336); McpProxyItemFactory adds 1229 lines. Each subclass receives the same (McpConfiguration, EngineContext, LongFunction<McpBindingConfig>) constructor signature as the Phase 2 lifecycle factory, and the per-factory do-helpers (doBegin, doData, doEnd, doAbort, doFlush, doChallenge, doReset, doWindow) are local copies rather than parent-shared - matches Phase 2 precedent. No visibility widening was needed beyond what Phase 2 already opened on McpLifecycleServer/McpLifecycleClient. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…e hydrate ITs
The parallel-hydrate path in McpHydrateSession.onBegin iterates over
[KIND_TOOLS_LIST, KIND_RESOURCES_LIST, KIND_PROMPTS_LIST] and dispatches
all three list streams on the same worker tick. The per-kind cache ITs
(McpProxyCache{Resources,Prompts}ListIT.shouldHydrate{Resources,Prompts})
each only `read` their own kind's BEGIN ext, so the other two streams
emit unexpected BEGINs that fail the script assertion. Updating each
script to accept three BEGINs would push parallel-hydrate setup into
tests whose intent is the single-kind hydrate behavior in isolation.
Add MCP_HYDRATE_KIND_FILTER (IntPredicate, default `k -> true`) to
McpConfiguration, exposed as `hydrateKindFilter()`. Mirrors the existing
MCP_SESSION_ID supplier pattern: tests resolve a `Class::method` static
reference; the decoder loads it via MethodHandle. Production keeps the
default — all three kinds hydrate in parallel — so the contention IT
exercises the real behavior unchanged. Each per-kind IT now configures
the filter to its single KIND_*_LIST so only that kind's BEGIN is
emitted and the existing per-kind scripts pass unmodified.
Result locally: 3 hydrate failures fixed
(McpProxyCacheToolsListIT had timed out, also resolved). Three
`shouldServe*` failures remain — cache-serve reads empty payload — and
are independent of hydrate dispatch.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…ompts/list, resources/list
Mirror the Phase 3 item-factory shape for the list slice. Introduce an
abstract McpProxyListFactory implementing BindingHandler plus three
concrete per-kind subclasses (McpProxyToolsListFactory,
McpProxyPromptsListFactory, McpProxyResourcesListFactory). The base
owns the shared state machine - McpListClient, McpListClientDecoder
with its ten decode states, McpListServer (passthrough), and
McpCacheListServer (cache-serve variant), plus the indexOfByte JSON
helper - and exposes seven hooks for the kind-specific bits:
protected abstract int kind();
protected abstract void injectInitialBeginEx(McpBeginExFW.Builder, String sid);
protected abstract void injectReplyBeginEx(McpBeginExFW.Builder, String sid);
protected abstract DirectBuffer listReplyOpenPrelude();
protected abstract JsonParserFactory listItemParserFactory();
protected abstract String arrayKey();
protected abstract String idKey();
Each subclass owns its per-kind prelude bytes (the JSON envelope-open
literal), JsonParserFactory for the streaming item parser, array key
("tools" / "prompts" / "resources"), and id key ("name" or "uri" for
resources). The kind field is removed from McpListClient, McpListServer,
and McpCacheListServer - each factory instance is bound to one kind, so
the seven `switch (kind)` blocks in those classes collapse to direct
hook calls.
Cache-vs-passthrough is now dispatched internally in
McpProxyListFactory.newStream: when binding.cache is non-null an
McpCacheListServer is constructed, otherwise an McpListServer with its
McpListClient via lifecycle.supplyClient. The McpProxyFactory factories
map gets three new entries (KIND_TOOLS_LIST / KIND_PROMPTS_LIST /
KIND_RESOURCES_LIST) and the newStream dispatcher collapses to
factories.get(kind).newStream(...) for every dispatched kind, including
lifecycle - the per-kind factories enforce their own session/route
preconditions.
HydrateListStream, McpHydrateSession, and SIGNAL_INITIATE_HYDRATE stay
in McpProxyFactory because they coordinate multi-kind hydrate from one
session and aren't request-time dispatch paths; the small kind-switch
in HydrateListStream.initiate persists.
McpProxyFactory shrinks 1777 lines (2343 -> 566), shedding all list
machinery plus six now-unused do* helpers (doBegin, doData, doAbort,
two doFlush overloads, doChallenge, doReset), unused flyweight RO/RW
fields and imports, and the no-longer-called sessionId(McpBeginExFW)
helper. doEnd, doWindow, and a local newStream remain because
HydrateListStream still calls them. McpProxyListFactory adds 2016 lines.
No further visibility widening was needed beyond what Phase 2 opened
on the lifecycle accessors.
ITs: 156 pass / 3 fail / 8 skipped, identical to the pre-refactor
baseline. The three `shouldServe*` failures are a pre-existing
cache-serve empty-payload bug unrelated to this refactor.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
The three shouldServe*List runtime ITs raced against hydrate: the agent sent its lifecycle BEGIN, McpProxyLifecycleFactory replied immediately (it has no hydrate awareness), the agent then sent its list BEGIN, and McpCacheListServer hit an empty cache because hydrate was still in flight - the agent received an empty payload. Make these ITs deterministic by pre-seeding the cache instead of racing against hydrate. Mid-hydrate is a distinct scenario that lifecycle gating will cover later (the .hydrating scripts and @ignore'd shouldServe*ListHydrating methods are removed here; when lifecycle gating lands we'll re-introduce a clearer cache.serve.<kind>.refreshing flavor rather than overload "hydrating"). Changes: - New config proxy.cache.seeded.yaml using TestStore with options.entries carrying the tools/resources/prompts JSON. The existing proxy.cache.yaml is unchanged - hydrate ITs continue to verify an empty memory store. - Runtime shouldServe{Tools,Resources,Prompts}List ITs switch to proxy.cache.seeded.yaml and reuse the existing cache.hydrate/server script for the downstream (it only handles the hydrate-1 lifecycle accept + reply, which is exactly what a fully-cached binding produces on the wire when McpHydrateSession.onBegin sees every cache.get returning non-null and spawns no list streams). - Delete the three .hydrating script directories and the @ignore'd shouldServe*ListHydrating test methods (both runtime and peer ITs). Runtime: 156 pass / 0 fail / 5 skipped (was 156p / 3f / 8s). Peer: 15 pass / 0 fail / 0 skipped. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
When proxy.cache is configured, the agent's lifecycle BEGIN reply now waits until the hydrate session has settled every kind permitted by MCP_HYDRATE_KIND_FILTER. By the time an agent's lifecycle is "initialized" from its point of view, every list method served by this binding is in the cache - subsequent tools/list, resources/list, prompts/list requests get a synchronous cache hit and never race the hydrate path. Bindings with no cache configured retain the previous synchronous BEGIN-reply behavior (binding.hydrate == null branch). Mechanism uses the engine's existing signal infrastructure: - McpHydrateSession tracks totalKinds (computed from the filter) and settledKinds. Each kind settles either from cache.get returning non-null at hydrate startup (pre-seeded store) or from the matching HydrateListStream's terminal write (cache.put completion, bodyLen==0 skip-put, or abort/reset). Once settledKinds == totalKinds, the session flips the complete latch and fires signaler.signalNow against every pending agent stream registered via awaitComplete. - McpProxyHydrate gains awaitComplete(originId, routedId, streamId, traceId, signalId). When complete is already true the signal fires immediately (synchronous fast path for the pre-seeded case); otherwise the (streamId, signalId) pair is queued. cleanup discards pending before tearing the session down. - McpProxyLifecycleFactory.McpLifecycleServer.onServerBegin now transitions initial state and emits WINDOW synchronously, then delegates the BEGIN reply via binding.hydrate.awaitComplete using the agent's reply-id as the signal target (the engine registers the binding's stream consumer in throttles[replyId], so SignalFW for that stream id routes back to onServerMessage). A new SignalFW case dispatches to onServerSignal, which fires doDeferredServerBegin for SIGNAL_HYDRATE_COMPLETE. The deferred path guards on replyOpened/replyClosed in case the agent abandoned before complete fired, and rebuilds beginEx inside the deferred path so the shared codecBuffer is fresh at fire time. - HydrateListStream now holds a back-reference to its parent McpHydrateSession (passed via the constructor) and a settled flag to dedupe completion when both onEnd and abort/reset fire on the same stream. McpProxyCacheLifecycleIT.shouldServeInitialize switches to proxy.cache.seeded.yaml: an empty memory store would block forever because the hydrate session's three list streams find no cooperating downstream in cache.hydrate/server. With the pre-seeded test store the hydrate session sees every cache.get return non-null synchronously, the complete latch flips before the agent's lifecycle arrives, and the deferred BEGIN reply takes the synchronous fast path. Runtime: 156 pass / 0 fail / 5 skipped (was 156p / 3f-then-0f after gating with the wrong streamId). https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
McpHydrateSession now uses signaler.signalAt to re-issue list-kind
hydrate streams when the per-kind TTL elapses. Scheduling fires:
- After each kind's initial hydrate completes (HydrateListStream
reaches its terminal: cache.put completion, bodyLen==0 skip-put, or
abort/reset — same gating dedupe applies, so each instance schedules
at most one next refresh).
- After each cache.get returns non-null at startup (pre-seeded value
still refreshes on TTL, so the value stays fresh).
- After each refresh stream's terminal (recursive, same code path).
Each kind has its own signal id (SIGNAL_REFRESH_{TOOLS,RESOURCES,
PROMPTS}) and TTL (McpCacheTtlConfig.tools / .resources / .prompts).
When ttlForKind returns null (no cache.ttl block, or the block omits
that kind), no refresh is scheduled - the cache value stays forever.
On abort/reset of an in-flight refresh, the cache.put never fires, so
the prior cached value is preserved (the test that validates this
behaviour is now un-ignored as shouldRefreshToolsError).
McpHydrateSession.settle takes the kind explicitly and both contributes
to the initial-hydrate gating latch and schedules the next refresh.
HydrateListStream forwards its kind into parent.settle(kind) - no
behavioural change to the gating dedupe.
McpProxyCache{Tools,Resources,Prompts}ListIT.shouldRefresh* and
shouldRefreshToolsError no longer @ignore - now run against
proxy.cache.refresh.yaml (ttl tools=PT1S, resources=PT2S, prompts=PT3S)
via the existing cache.refresh.<kind> / cache.refresh.<kind>.error
spec scripts. Verified: 156 pass / 0 fail / 1 skipped (Contention IT
class-level @ignore for lease coordination remains).
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…ownstream
When options.cache.authorization is configured, McpProxyFactory.attach
resolves the single configured guard via context.supplyGuard and calls
guard.reauthorize(traceId, binding.id, 0L, credentials) once to mint a
session token. The McpHydrateSession and HydrateListStream now stamp
that token onto every outbound BEGIN/END/WINDOW in place of the prior
hardcoded 0L authorization, so any downstream binding that runs a guard
recognises the cache hydrate session as an authenticated principal.
The map is constrained to a single guard entry by the schema patch
(maxProperties: 1 under cache.authorization); the runtime picks that
entry via iterator without defensive assertion, since invalid yaml is
rejected at config-load time.
Token freshness is left to the guard for now - the binding does not
re-reauthorize per BEGIN. If a future guard implementation needs
short-lived tokens, the call site is a single line and can move to
per-stream resolution without changing the API shape.
New test scaffolding:
- specs/.../config/proxy.cache.auth.yaml - engine test_guard +
test store pre-seeded with empty list payloads (so hydrate spawns
no list streams; the authorization assertion lands on the lifecycle
BEGIN alone) + proxy declaring cache.authorization.test_guard with
matching credentials.
- specs/.../streams/application/cache.hydrate.auth/{client,server}.rpt
mirrors cache.hydrate/ but adds option zilla:authorization 1L (the
TestGuard's first reauthorize result) to assert the proxy stamps the
expected long on its hydrate BEGIN.
- shouldHydrateAuth peer IT in ProxyCacheLifecycleIT and engine-driven
IT in McpProxyCacheLifecycleIT.
Verified: spec ProxyCache*IT 16 pass / 0 fail; runtime binding-mcp
157 pass / 0 fail / 1 skipped (was 156p/0f/1s; +1 shouldHydrateAuth).
McpProxyCacheContentionIT remains the only @ignore'd test pending
lease coordination.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
When two workers attach the same proxy-cache binding, both would otherwise race to issue every list call on the wire. Coordinate via the configured store's putIfAbsent so exactly one worker issues each call. Two leases, one shared key namespace on the configured store: - Binding-wide "lifecycle.lock": acquired before opening the hydrate lifecycle stream to downstream. Loser polls every 100ms until the lock frees; winner holds it through the initial hydrate. Released inside markComplete once every kind permitted by the filter has settled, so other workers can then open their own lifecycle (independent stream onward) and observe the now-populated cache. - Per-kind "<kind>.lock": acquired before issuing a HydrateListStream (both initial and refresh paths). Loser at initial-hydrate time markSettled-s without scheduling a refresh - the lease holder owns refresh scheduling because settle (gated by the per-stream settled flag) is the only path that calls scheduleRefresh. Released after cache.put completes (or after onEnd's skip-put branch, or after an abort/reset terminal) and before parent.settle fires. settle is split into markSettled (gating-latch only) and settle (markSettled + scheduleRefresh). cache hits at startup, lease losses at initial hydrate, and lease losses at refresh-time all funnel through markSettled so they contribute to lifecycle gating without arming a refresh signal on a worker that didn't actually do the work. McpProxyCacheContentionIT (was class-level @ignore'd) is now enabled, configured with MCP_HYDRATE_KIND_FILTER restricting to KIND_TOOLS_LIST (since cache.refresh.tools.contended/server.rpt only models the tools exchanges; resources and prompts are not in scope for this scenario). hydrateSessionId now cycles "hydrate-A"/"hydrate-B" per call so the two workers' attach-time calls match the script's session ids. Lease TTLs: - lifecycle: 30s (safety net for crashes; explicit release covers the happy path). - per-kind: 30s ditto. - Retry polling for lifecycle: 100ms. Verified: 157 pass / 0 fail / 0 skipped. Every test that was @ignore'd at the start of the PR is now active and green. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Type-prefixed inner class naming convention matches the rest of the file (McpServer, McpClient, McpListServer, McpCacheListServer, McpLifecycleServer, McpLifecycleClient, McpHydrateSession). https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
TestStoreHandler previously held its own HashMap<String, String> per-handler, so workers running in the same engine each saw an isolated copy of the configured entries (putIfAbsent from one worker was invisible to another). Lease coordination in binding-mcp's cache hydrate path relies on cross-worker putIfAbsent visibility, so move the entries map up one level: TestStore now owns a ConcurrentMap<Long, ConcurrentMap<String, String>> keyed by storeId, and TestStoreContext.attach hands out a reference to the per-storeId shared map to each worker's TestStoreHandler. Seeding via options.entries uses putIfAbsent so the first worker's attach is the one that loads the map; subsequent attaches see the existing entries and skip. No TTL behaviour was added — production binding code releases leases explicitly, and the existing skip-TTL semantics is sufficient for the cache test scenarios. binding-mcp switches its three remaining memory-store configs (proxy.cache.yaml, proxy.cache.refresh.yaml, proxy.cache.toolkit.yaml) to type: test, matching the dependency hygiene rule that implementations must not depend on other implementations - the cache ITs now run entirely against the engine's test-jar store. The store-memory test dependency drops out of runtime/binding-mcp/pom.xml. Verified: 157 pass / 0 fail / 0 skipped (all cache ITs, McpServerIT, McpClientIT, McpProxyIT, McpProxyCacheContentionIT) and the engine spec IT suite (39 / 0 / 0) — no regression in either project from the TestStore sharing change. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
# Conflicts: # runtime/binding-mcp/src/main/java/io/aklivity/zilla/runtime/binding/mcp/internal/McpConfiguration.java # runtime/binding-mcp/src/test/java/io/aklivity/zilla/runtime/binding/mcp/internal/McpConfigurationTest.java
McpProxyFactory had grown to ~566 lines absorbing the proxy-cache feature; the actual dispatch (parse BEGIN ext, look up by kind, delegate to a per-kind factory) was a small slice of that. The rest - McpHydrateSession, McpHydrateListStream, the SIGNAL_INITIATE_HYDRATE / SIGNAL_REFRESH_* / LEASE_TTL_MS / LEASE_RETRY_MS constants, the PendingAwait record, the local newStream/doEnd/doWindow helpers, the cache attach wiring (store resolve, McpListCache instantiate, cacheGuard reauthorize, McpHydrateSession construct, schedule SIGNAL_INITIATE_HYDRATE), and the matching detach cleanup - is now owned by a new sibling McpCacheHydrater. McpCacheHydrater exposes attach(McpBindingConfig) and detach(McpBindingConfig) mirroring the BindingHandler lifecycle the engine drives. McpProxyFactory constructs one in its ctor and delegates both calls. McpProxyFactory drops from 795 lines to 136 (a pure dispatcher now): keeps the factories map and its seven per-kind registrations, the bindings map, supplyGuard for McpBindingConfig construction, mcpBeginExRO/beginRO for parsing the inbound BEGIN to extract the kind, and the new hydrater field. McpCacheHydrater is 724 lines holding everything cache-hydrate: signal ids, lease ttls/retry, PendingAwait record, McpHydrateSession (still implementing McpProxyHydrate so McpProxyLifecycleFactory can defer the agent BEGIN reply via awaitComplete), McpHydrateListStream (settle / cache.put / lease release path unchanged), and per-worker flyweights duplicated for the hydrate stream-write helpers. The hydrate session keeps its own codecBuffer (separate UnsafeBuffer allocated against writeBuffer.capacity), matching the per-handler flyweight pattern. McpBindingConfig gains one accessor: long resolveId(String name) plus the corresponding private ToLongFunction<String> field initialized from binding.resolveId. Needed because McpCacheHydrater.attach receives only the wrapped McpBindingConfig and must resolve the cache store name to a binding id at attach time. Cleaner than passing the raw BindingConfig alongside. Verified: 158 pass / 0 fail / 0 skipped (identical to pre-refactor baseline). Checkstyle 0 violations. JaCoCo coverage met (bundle grew 116 -> 117 classes for the new McpCacheHydrater). https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
| if (binding.options != null && binding.options.cache != null) | ||
| { | ||
| final long storeId = binding.resolveId(binding.options.cache.store); | ||
| final StoreHandler store = supplyStore.apply(storeId); | ||
| binding.cache = new McpListCache(store); |
There was a problem hiding this comment.
I think this initialization logic should belong in the McpBindingConfig constructor, to initialize cache and then this code block can be gated on if (binding.cache != null) instead.
| private record PendingAwait( | ||
| long originId, | ||
| long routedId, | ||
| long streamId, | ||
| long traceId, | ||
| int signalId) | ||
| { | ||
| } |
There was a problem hiding this comment.
| private record PendingAwait( | |
| long originId, | |
| long routedId, | |
| long streamId, | |
| long traceId, | |
| int signalId) | |
| { | |
| } | |
| private record McpSignalHandle( | |
| long originId, | |
| long routedId, | |
| long streamId, | |
| long traceId, | |
| int signalId) | |
| { | |
| } |
Perhaps have a convenience method to invoke the signaler, passing all the record fields to signaler.signalNow?
| { | ||
| if (acquired) | ||
| { | ||
| doLifecycleBegin(traceId); | ||
| } | ||
| else | ||
| { | ||
| signaler.signalAt(currentTimeMillis() + LEASE_RETRY_MS, SIGNAL_INITIATE_HYDRATE, | ||
| this::onInitiateSignal); | ||
| } | ||
| }); |
There was a problem hiding this comment.
This can move to a method called onAcquireLifecycleLeaseComplete, passed to cache.acquireLifecycleLease.
Note: the traceId can be moved into the method body.
| { | ||
| if (McpState.initialOpening(state)) | ||
| { | ||
| return; |
There was a problem hiding this comment.
Avoid multiple returns, instead conditional surrounds remainder of body with inverted condition.
| for (int kind : new int[] { KIND_TOOLS_LIST, KIND_RESOURCES_LIST, KIND_PROMPTS_LIST }) | ||
| { | ||
| if (hydrateKindFilter.test(kind)) | ||
| { | ||
| filtered++; | ||
| } | ||
| } | ||
| totalKinds = filtered; |
There was a problem hiding this comment.
This approach seems overly complex.
Perhaps it would simplify If we had separate signal handlers for refreshing each different kind, and initially populating the cache is treated as just the first refresh.
Perhaps we can move the signaling to mark complete into the McpListCache instead, and then McpCacheHydrater can simplify to be per kind only.
| { | ||
| doEnd(receiver, originId, routedId, initialId, initialSeq, initialAck, initialMax, | ||
| traceId, authorization); | ||
| state = McpState.closedInitial(state); | ||
| } |
There was a problem hiding this comment.
This should be a standard doInitialEnd method.
Address PR #1774 review comments: - #1 cache instantiation moves into McpBindingConfig ctor (supplyStore becomes a third constructor parameter, parallel to supplyGuard). - #2 PendingAwait -> McpSignalHandle as a standalone package-private record with signalVia(Signaler) convenience method; lives in its own file in the stream package. - #3 acquire-lifecycle-lease callback is a named method reference rather than an inline lambda; traceId is re-supplied inside the method body via supplyTraceId.getAsLong() instead of captured. - #4 doLifecycleBegin guard inverted to wrap the body, single return per method. - #5 separate signal handlers per kind; initial population is the first refresh. The binding-level orchestrator (McpProxyCacheHydrater) owns the lifecycle stream and tracks readiness via a populated counter rather than a totalKinds/settledKinds pair on a McpHydrateSession. Per-kind work moves into McpProxyCacheListHydrater (abstract base) with three concrete subclasses (McpProxyCacheToolsListHydrater / ResourcesListHydrater / PromptsListHydrater), mirroring the McpProxyListFactory hierarchy. - #6 cleanup's inline doEnd / state update is consolidated into a standard doInitialEnd helper on McpProxyCacheHydrater. Kind and signal id are passed via the abstract base's constructor as protected final fields rather than abstract methods - only the truly kind-specific behaviour (injectInitialBeginEx and ttl()) remains abstract. Each concrete subclass collapses to ~15 lines. McpProxyHydrate interface and the old McpCacheHydrater are deleted. McpProxyLifecycleFactory now calls binding.hydrater.register(handle) directly with an McpSignalHandle instance instead of going through an interface. McpListCache stays at its current public surface (get / put / acquireLease / releaseLease / acquireLifecycleLease / releaseLifecycleLease) - the readiness tracking lives on the orchestrator because it already knows the filtered kind set from hydrateKindFilter at attach time, so no expecting(kind) plumbing was needed. Files: - McpProxyHydrate.java deleted - McpCacheHydrater.java deleted (replaced) - McpProxyCacheHydrater.java new - per-binding orchestrator - McpProxyCacheListHydrater.java new - abstract base + 3 subclasses - McpSignalHandle.java new - McpBindingConfig.java - supplyStore ctor param, cache field initialised in ctor, hydrate field renamed to hydrater and retyped - McpProxyFactory.java - per-binding hydrater instantiation in attach - McpProxyLifecycleFactory.java - binding.hydrater.register instead of binding.hydrate.awaitComplete - McpClientFactory.java - 1-line ctor signature update for the now 3-arg McpBindingConfig ctor Verified: 158 pass / 0 fail / 0 skipped, identical to baseline. Checkstyle 0 violations. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
| } | ||
| } | ||
|
|
||
| final class McpProxyCacheToolsListHydrater extends McpProxyCacheListHydrater |
There was a problem hiding this comment.
Move these to top level classes.
| } | ||
| } | ||
|
|
||
| final class McpProxyToolsListFactory extends McpProxyListFactory |
There was a problem hiding this comment.
Move these to top level classes.
| } | ||
| } | ||
|
|
||
| final class McpProxyToolsCallFactory extends McpProxyItemFactory |
There was a problem hiding this comment.
Move these to top level classes.
Address PR #1774 review comments asking that each concrete per-kind subclass living as a package-private sibling next to its abstract base be moved to its own top-level file. Three files affected, nine new files created: McpProxyItemFactory.java (1229 -> 1124) loses three subclasses: - McpProxyToolsCallFactory.java - McpProxyPromptsGetFactory.java - McpProxyResourcesReadFactory.java McpProxyListFactory.java (2016 -> 1819) loses three subclasses: - McpProxyToolsListFactory.java - McpProxyPromptsListFactory.java - McpProxyResourcesListFactory.java McpProxyCacheListHydrater.java (329 -> 257) loses three subclasses: - McpProxyCacheToolsListHydrater.java - McpProxyCacheResourcesListHydrater.java - McpProxyCachePromptsListHydrater.java Each new file has the standard Aklivity Community License header, a minimal import set (alphabetical within groups), and contains only the final class declaration extending the base. Base files trim imports that were only used by the lifted subclasses (Map and StreamingJson out of McpProxyListFactory; KIND_TOOLS_LIST / KIND_RESOURCES_LIST / KIND_PROMPTS_LIST static imports out of McpProxyCacheListHydrater). No visibility widening was needed - subclasses already touched only protected abstract hooks on the bases or package-private fields on parent objects. Verified: 158 pass / 0 fail / 0 skipped. Checkstyle 0 violations. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Drop McpCacheTtlConfig and its three per-kind fields (tools, resources, prompts) in favour of a single Duration on McpCacheConfig. No real-world signal that the per-kind knobs are needed; can be reintroduced additively if/when a use case appears. Default lives in the JSON schema (PT5M), with the adapter mirroring it as fallback when ttl is omitted.
onRefreshAcquireLeaseComplete previously only acted on the lease winner; losers stopped refreshing forever, leaving the cluster dependent on a single worker's timer. Mirror the initial-acquire path: losers reschedule their refresh so every worker stays armed and re-attempts on the next tick.
Replace the (BindingConfig, LongFunction<GuardHandler>, LongFunction<StoreHandler>) ctor and its 1-arg sugar with a single (BindingConfig, EngineContext) ctor. McpBindingConfig pulls supplyGuard and supplyStore from the context directly, and the null-guards on the suppliers disappear because every caller now passes a real context. The supplier fields on McpProxyFactory and McpClientFactory go with it - they only existed to be passed through. Bind options.authorization and options.cache to locals so the long null-chains collapse.
| public static String hydrateSessionId() | ||
| { | ||
| return "hydrate-1"; | ||
| } |
There was a problem hiding this comment.
| public static String hydrateSessionId() | |
| { | |
| return "hydrate-1"; | |
| } | |
| public static String sessionId() | |
| { | |
| return "hydrate-1"; | |
| } |
| public static IntPredicate hydrateKindFilter() | ||
| { | ||
| return kind -> kind == KIND_TOOLS_LIST; | ||
| } |
There was a problem hiding this comment.
| public static IntPredicate hydrateKindFilter() | |
| { | |
| return kind -> kind == KIND_TOOLS_LIST; | |
| } | |
| public static IntPredicate hydrateToolsOnly() | |
| { | |
| return kind -> kind == KIND_TOOLS_LIST; | |
| } |
| public static final String MCP_SSE_KEEPALIVE_INTERVAL_NAME = "zilla.binding.mcp.sse.keepalive.interval"; | ||
| public static final String MCP_ALT_SVC_ENABLED_NAME = "zilla.binding.mcp.alt.svc.enabled"; | ||
| public static final String MCP_ALT_SVC_MAX_AGE_NAME = "zilla.binding.mcp.alt.svc.max.age"; | ||
| public static final String MCP_HYDRATE_KIND_FILTER_NAME = "zilla.binding.mcp.hydrate.kind.filter"; |
There was a problem hiding this comment.
| public static final String MCP_HYDRATE_KIND_FILTER_NAME = "zilla.binding.mcp.hydrate.kind.filter"; | |
| public static final String MCP_HYDRATE_FILTER_NAME = "zilla.binding.mcp.hydrate.filter"; |
| assertEquals(MCP_SSE_KEEPALIVE_INTERVAL.name(), MCP_SSE_KEEPALIVE_INTERVAL_NAME); | ||
| assertEquals(MCP_ALT_SVC_ENABLED.name(), MCP_ALT_SVC_ENABLED_NAME); | ||
| assertEquals(MCP_ALT_SVC_MAX_AGE.name(), MCP_ALT_SVC_MAX_AGE_NAME); | ||
| assertEquals(MCP_HYDRATE_KIND_FILTER.name(), MCP_HYDRATE_KIND_FILTER_NAME); |
There was a problem hiding this comment.
| assertEquals(MCP_HYDRATE_KIND_FILTER.name(), MCP_HYDRATE_KIND_FILTER_NAME); | |
| assertEquals(MCP_HYDRATE_FILTER.name(), MCP_HYDRATE_FILTER_NAME); |
| tools: '{"tools":[{"name": "get_weather","title": "Weather Information Provider","description": "Get current weather information for a location","inputSchema": {"type": "object","properties": {"location": {"type": "string","description": "City name or zip code"}},"required": ["location"]},"icons": [{"src": "https://example.com/weather-icon.png","mimeType": "image/png","sizes": ["48x48"]}],"execution": {"taskSupport": "optional"}}]}' | ||
| resources: '{"resources":[{"uri": "file:///docs/welcome.md","name": "welcome","description": "Welcome document","mimeType": "text/markdown"}]}' | ||
| prompts: '{"prompts":[{"name": "summarize","description": "Summarize a document"}]}' |
There was a problem hiding this comment.
Use multi-line yaml syntax to make this more readable.
- Rename MCP_HYDRATE_KIND_FILTER → MCP_HYDRATE_FILTER; expose lease ttl/retry as MCP_LEASE_TTL_MS / MCP_LEASE_RETRY_MS configuration properties (phases A, B) - Split McpListCache into per-kind instances; introduce McpLifecycleCache for lifecycle distributed lock; drop kind parameter from cache ops (phase C) - McpBindingConfig owns per-kind caches (toolsCache, resourcesCache, promptsCache, lifecycleCache) and the hydrater; Optional chains for guard/store resolution; McpProxyFactory attach/detach simplified (phases D, G) - Extract McpHydrateLifecycleStream inner class; guard-then-route ordering; complete End/Abort/Reset state gating; all fields final (phase E) - Decouple McpProxyCacheListHydrater from parent reference; pass EngineContext + LongSupplier supplyAuthorization directly; rename onMessage → onListHydrateMessage (phase F) - McpProxyLifecycleFactory: rename doDeferredServerBegin → doServerBeginDeferred; invert early-return guard (phase H) - McpProxyItemFactory: rename builder parameter b → builder in abstract signatures; update subclasses (phase I) - McpProxyListFactory: replace abstract kind() with final int kind field; make sessionId() abstract; move JsonParserFactory construction to base class constructor accepting List<String> pathIncludes; remove listItemParserFactory() abstract method (phases J, K) - IT helpers: extract sessionId() and hydrateXxxOnly() static helpers; update configure() calls to method-reference form (phase L) - proxy.cache.seeded.yaml: convert embedded JSON to YAML block scalars (phase M) https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
| public final McpListCache resourcesCache; | ||
| public final McpListCache promptsCache; | ||
| public final McpLifecycleCache lifecycleCache; | ||
| public Map<String, McpProxySession> sessions; |
There was a problem hiding this comment.
Can this be made final?
| McpListCache cache) | ||
| { | ||
| super(context, originId, routedId, supplyAuthorization, supplySessionId, onReady, | ||
| leaseTtlMs, cacheTtl, cache, SIGNAL_REFRESH_RESOURCES); |
There was a problem hiding this comment.
Let's avoid passing kind to super and have abstract overrides for each part that needs kind specific awareness.
| MCP_LEASE_TTL_MS = config.property("lease.ttl.ms", Duration.ofSeconds(30).toMillis()); | ||
| MCP_LEASE_RETRY_MS = config.property("lease.retry.ms", 100L); |
There was a problem hiding this comment.
Let's change these to remove the _MS suffix and make their type Duration instead, similar to MAX_AGE above.
| public static final String MCP_LEASE_TTL_MS_NAME = "zilla.binding.mcp.lease.ttl.ms"; | ||
| public static final String MCP_LEASE_RETRY_MS_NAME = "zilla.binding.mcp.lease.retry.ms"; |
There was a problem hiding this comment.
| public static final String MCP_LEASE_TTL_MS_NAME = "zilla.binding.mcp.lease.ttl.ms"; | |
| public static final String MCP_LEASE_RETRY_MS_NAME = "zilla.binding.mcp.lease.retry.ms"; | |
| public static final String MCP_LEASE_TTL_NAME = "zilla.binding.mcp.lease.ttl"; | |
| public static final String MCP_LEASE_RETRY_NAME = "zilla.binding.mcp.lease.retry"; |
| assertEquals(MCP_LEASE_TTL_MS.name(), MCP_LEASE_TTL_MS_NAME); | ||
| assertEquals(MCP_LEASE_RETRY_MS.name(), MCP_LEASE_RETRY_MS_NAME); |
There was a problem hiding this comment.
| assertEquals(MCP_LEASE_TTL_MS.name(), MCP_LEASE_TTL_MS_NAME); | |
| assertEquals(MCP_LEASE_RETRY_MS.name(), MCP_LEASE_RETRY_MS_NAME); | |
| assertEquals(MCP_LEASE_TTL.name(), MCP_LEASE_TTL_NAME); | |
| assertEquals(MCP_LEASE_RETRY.name(), MCP_LEASE_RETRY_NAME); |
| tools: |- | ||
| {"tools":[{"name": "get_weather","title": "Weather Information Provider","description": "Get current weather information for a location","inputSchema": {"type": "object","properties": {"location": {"type": "string","description": "City name or zip code"}},"required": ["location"]},"icons": [{"src": "https://example.com/weather-icon.png","mimeType": "image/png","sizes": ["48x48"]}],"execution": {"taskSupport": "optional"}}]} |
There was a problem hiding this comment.
| tools: |- | |
| {"tools":[{"name": "get_weather","title": "Weather Information Provider","description": "Get current weather information for a location","inputSchema": {"type": "object","properties": {"location": {"type": "string","description": "City name or zip code"}},"required": ["location"]},"icons": [{"src": "https://example.com/weather-icon.png","mimeType": "image/png","sizes": ["48x48"]}],"execution": {"taskSupport": "optional"}}]} | |
| tools: |- | |
| { | |
| "tools": | |
| [ | |
| { | |
| "name": "get_weather", | |
| "title": "Weather Information Provider", | |
| "description": "Get current weather information for a location", | |
| "inputSchema": | |
| { | |
| "type": "object", | |
| "properties": | |
| { | |
| "location": | |
| { | |
| "type": "string", | |
| "description": "City name or zip code" | |
| } | |
| }, | |
| "required": | |
| [ | |
| "location" | |
| ] | |
| }, | |
| "icons": | |
| [ | |
| { | |
| "src": "https://example.com/weather-icon.png", | |
| "mimeType": "image/png", | |
| "sizes": ["48x48"] | |
| } | |
| ], | |
| "execution": | |
| { | |
| "taskSupport": "optional" | |
| } | |
| } | |
| ] | |
| } |
Perhaps we can use this if it folds to JSON with minimal whitespace.
| resources: |- | ||
| {"resources":[{"uri": "file:///docs/welcome.md","name": "welcome","description": "Welcome document","mimeType": "text/markdown"}]} | ||
| prompts: |- | ||
| {"prompts":[{"name": "summarize","description": "Summarize a document"}]} |
There was a problem hiding this comment.
Suggest similar formatting to above feedback for these if it works out for tools.
- Rename MCP_LEASE_TTL_MS/RETRY_MS to MCP_LEASE_TTL/RETRY as Duration properties with ISO-8601 defaults (PT30S, PT0.1S) - Extract list hydrate stream into inner class McpListHydrateStream with ExpandableArrayBuffer body accumulator - Add abstract signalId() override on each list hydrater subclass instead of passing signalId as constructor parameter - Replace String prefix with String8FW in McpRoutePrefix to eliminate per-stream byte allocation in list factory - Remove kind() abstract method from item/list factories; pass kind as constructor parameter and store as final field - Move sessions map initialization into McpBindingConfig constructor #1774
| final long authorization = binding.cacheGuard != null | ||
| ? binding.cacheGuard.reauthorize(traceId, originId, 0L, binding.cacheCredentials) | ||
| : 0L; |
There was a problem hiding this comment.
If the McpLifecycleCache is 1-1 with the McpBindingConfig, then does it make sense to move the cache guard and cache credentials onto the cache, as guard and credentials?
If we scope things correctly, then it seems like the McpProxyCacheHydrater would only need the McpLifecycleCache and not the binding, agree?
| final McpRouteConfig route = binding.resolve(authorization); | ||
| if (route != null) | ||
| { | ||
| binding.lifecycleCache.acquireLifecycle(leaseTtl.toMillis(), |
There was a problem hiding this comment.
Suggest updating the acquireXxx methods to take duration, then convert toMillis() as needed in the acquireXxx implementation.
| signaler.signalAt(currentTimeMillis() + leaseRetry.toMillis(), SIGNAL_INITIATE_LIFECYCLE, | ||
| this::onInitiateLifecycle); |
There was a problem hiding this comment.
| signaler.signalAt(currentTimeMillis() + leaseRetry.toMillis(), SIGNAL_INITIATE_LIFECYCLE, | |
| this::onInitiateLifecycle); | |
| signaler.signalAt(Instant.now().plus(leaseRetry).toEpochMillis(), SIGNAL_INITIATE_LIFECYCLE, | |
| this::onInitiateLifecycle); |
Let's also add an overload of Signaler.signalAt that takes an Instant instead of long, and fills in default method via Instant.toEpochMillis().
| this.initialId = supplyInitialId.applyAsLong(routedId); | ||
| this.replyId = supplyReplyId.applyAsLong(initialId); | ||
| this.replyMax = bufferPool.slotCapacity(); | ||
| doLifecycleBegin(traceId); |
There was a problem hiding this comment.
This method should not be called from the constuctor.
The creator should explicitly call doLifecycleBegin post construction at the constructor call site.
| } | ||
| } | ||
|
|
||
| private void onInitialAcquireLeaseComplete( |
There was a problem hiding this comment.
| private void onInitialAcquireLeaseComplete( | |
| private void onInitialAcquireComplete( |
| cache.acquire(leaseTtl.toMillis(), this::onRefreshAcquireLeaseComplete); | ||
| } | ||
|
|
||
| private void onRefreshAcquireLeaseComplete( |
There was a problem hiding this comment.
| private void onRefreshAcquireLeaseComplete( | |
| private void onRefreshAcquireComplete( |
| final McpBeginExFW beginEx = mcpBeginExRW | ||
| .wrap(codecBuffer, 0, codecBuffer.capacity()) | ||
| .typeId(mcpTypeId) | ||
| .inject(builder -> injectInitialBeginEx(builder, sessionId)) | ||
| .build(); | ||
|
|
||
| receiver = newStream(this::onListHydrateMessage, originId, routedId, initialId, | ||
| initialSeq, initialAck, initialMax, traceId, authorization, 0L, beginEx); | ||
| state = McpState.openingInitial(state); | ||
|
|
||
| doEnd(receiver, originId, routedId, initialId, | ||
| initialSeq, initialAck, initialMax, traceId, authorization); | ||
| state = McpState.closedInitial(state); |
There was a problem hiding this comment.
Avoid doing this behavioral work in the constructor.
Instead have a doListHydrateBegin(...) and a doListHydrateEnd(...) and call them from the call site of the constructor.
| state = McpState.closedReply(state); | ||
| terminal(supplyTraceId.getAsLong()); |
There was a problem hiding this comment.
Need a proper reaction to onListHydrateAbort -> doListHydrateAbort, and onListHydrateReset -> doListHydrateReset with appropriate streamId, ie. initialId or replyId, and gated by McpState.initial/replyClosed.
- McpListCache.acquire and McpLifecycleCache.acquireLifecycle now take Duration instead of long milliseconds; conversion happens internally - Drop "Lease" from on*AcquireComplete callback names on list hydrater - Move cacheGuard and cacheCredentials onto McpLifecycleCache as guard/credentials fields; McpBindingConfig no longer exposes them - Split lifecycle and list hydrate stream constructors from their behavioral work; creators explicitly call doListHydrateBegin/End and doLifecycleBegin post-construction - Add proper onListHydrateAbort/Reset handlers with do* helpers and McpState.initialClosed/replyClosed gating - Use Signaler.signalAt(Instant) overloads with Instant.now().plus(...) for lease retry and cache refresh scheduling #1774
Closes #1737.
Summary
Adds
options.cacheto themcp · proxybinding. When configured, the proxy attaches a single per-binding hydrate session to its downstream exit, populates a shared store withtools/list,resources/list, andprompts/listpayloads, and serves those list calls from cache without round-tripping downstream. Folds what was originally proposed as a separatemcp · cachekind into an option on the existing proxy — every original topology has an exact equivalent in the folded model, and the enginekind: cacheprerequisite drops out.Config
Runtime behaviour
MCP_SESSION_ID) and issues per-kind list BEGINs in parallel. Each kind's response body is written to the configured store under keystools/resources/prompts.signaler.signalNow(delivered to the agent's reply-direction stream — the engine registers binding consumers inthrottles[replyId], where SignalFW is dispatched) until every configured kind has settled. By the time an agent seeslifecycle initialized, every list call hits a populated cache.options.cache.authorizationis set, attach resolves the single guard, callsguard.reauthorize(traceId, binding.id, 0L, credentials)once, and stamps the resulting token onto every outbound BEGIN/END/WINDOW on the hydrate session and its list streams.tools/list,resources/list,prompts/listrequests hitMcpCacheListServer, which reads the cached payload from the store and emits it as DATA + END without forwarding.signaler.signalAt(now + ttl)for that kind. On expiry, a new list stream is issued on the same hydrate session and the cache value is overwritten. Aborts during refresh preserve the prior cached entry.lifecycle.lockarbitrates which worker opens the hydrate session (loser polls at 100 ms intervals); a per-kind<kind>.lockarbitrates which worker issues each list call on the wire. Winners release on hydrate complete (lifecycle lease) or aftercache.put/ abort terminal (per-kind lease). Settle paths split: cache hits funnel throughmarkSettledso only the worker that actually did the list-stream work arms a refresh signal.notifications/{tools,resources,prompts}/list_changed) — deferred per the issue description.Refactor
McpProxyFactoryballooned absorbing cache support. Extracted into four per-kind-family factories, each implementingBindingHandlerand dispatched fromMcpProxyFactory.factories: Int2ObjectHashMap<BindingHandler>:McpProxyLifecycleFactory—KIND_LIFECYCLE.McpProxyItemFactory(abstract base +McpProxyToolsCallFactory/McpProxyPromptsGetFactory/McpProxyResourcesReadFactory) —KIND_TOOLS_CALL/KIND_PROMPTS_GET/KIND_RESOURCES_READ.McpProxyListFactory(abstract base +McpProxyToolsListFactory/McpProxyPromptsListFactory/McpProxyResourcesListFactory) —KIND_TOOLS_LIST/KIND_PROMPTS_LIST/KIND_RESOURCES_LIST.newStreambranches internally onbinding.cache != nullto pickMcpListServer(passthrough) orMcpCacheListServer(serve-from-cache).McpProxyFactoryshrinks from ~3631 lines to ~566 lines and now owns only the dispatch map, attach/detach, and the multi-kind hydrate coordinator (McpHydrateSession+McpHydrateListStream) which cannot be cleanly carved per kind.Spec scripts
cache.hydratecache.hydrate.tools/.resources/.promptscache.hydrate.persistcache.hydrate.errorcache.hydrate.authcache.serve.initializecache.serve.tools.list/.resources.list/.prompts.listcache.refresh.tools/.resources/.promptscache.refresh.tools.errorcache.refresh.tools.contendedITs
specs/binding-mcp.spec/.../streams/cache/)runtime/binding-mcp/...)ProxyCacheLifecycleIT(5 tests)McpProxyCacheLifecycleIT(5 tests)ProxyCacheToolsListIT(5 tests)McpProxyCacheToolsListIT(4 tests)ProxyCacheResourcesListIT(3 tests)McpProxyCacheResourcesListIT(3 tests)ProxyCachePromptsListIT(3 tests)McpProxyCachePromptsListIT(3 tests)McpProxyCacheContentionIT(1 test,ENGINE_WORKERS=2)Every
@Ignorefrom the original test-first scaffold has been removed.Test-only configs
proxy.cache.yaml— empty test store, single worker; used by hydrate ITs.proxy.cache.refresh.yaml— empty test store with per-kind TTLs; refresh + contention ITs.proxy.cache.seeded.yaml— test store pre-seeded with the cached JSON; serve ITs.proxy.cache.auth.yaml— test store + enginetestguard with credentials; auth IT.proxy.cache.toolkit.yaml— multi-route variant for toolkit-flavoured tests.TestStore enhancement (
runtime/enginetest sources)TestStorenow ownsConcurrentMap<Long, ConcurrentMap<String, String>>keyed bystoreId; workers attached to the same store share entries.options.entriesseeds viaputIfAbsentso the first attach wins. Previously the entries map was per-handler, soputIfAbsent-based leases were trivially won by every worker — required forMcpProxyCacheContentionITto observe a single winner. Transparent for single-worker scenarios.binding-mcpno longer test-depends onstore-memory; all cache ITs run ontype: test.Test plan
specs/binding-mcp.specProxyCache*IT) → 16 pass / 0 fail.runtime/binding-mcp) → 158 pass / 0 fail / 0 skipped.specs/engine.spec→ 39 pass / 0 fail after the TestStore sharing change.🤖 Generated with Claude Code
Generated by Claude Code