Skip to content

feat(binding-mcp): mcp · proxy cache option (#1737)#1774

Open
jfallows wants to merge 43 commits into
developfrom
feature/1737-mcp-cache
Open

feat(binding-mcp): mcp · proxy cache option (#1737)#1774
jfallows wants to merge 43 commits into
developfrom
feature/1737-mcp-cache

Conversation

@jfallows
Copy link
Copy Markdown
Contributor

@jfallows jfallows commented May 16, 2026

Closes #1737.

Summary

Adds options.cache to the mcp · proxy binding. When configured, the proxy attaches a single per-binding hydrate session to its downstream exit, populates a shared store with tools/list, resources/list, and prompts/list payloads, and serves those list calls from cache without round-tripping downstream. Folds what was originally proposed as a separate mcp · cache kind into an option on the existing proxy — every original topology has an exact equivalent in the folded model, and the engine kind: cache prerequisite drops out.

Config

options:
  cache:
    store: <store-name>      # required
    ttl:                     # optional; per kind
      tools: PT1S
      resources: PT2S
      prompts: PT3S
    authorization:           # optional; max one entry (schema-enforced)
      <guard-name>:
        credentials: "<value>"

Runtime behaviour

  • Hydrate. At binding attach, the proxy opens one lifecycle stream to downstream (session id from MCP_SESSION_ID) and issues per-kind list BEGINs in parallel. Each kind's response body is written to the configured store under keys tools / resources / prompts.
  • Lifecycle gating. Agent lifecycle BEGIN replies are deferred via signaler.signalNow (delivered to the agent's reply-direction stream — the engine registers binding consumers in throttles[replyId], where SignalFW is dispatched) until every configured kind has settled. By the time an agent sees lifecycle initialized, every list call hits a populated cache.
  • Authorization. When options.cache.authorization is set, attach resolves the single guard, calls guard.reauthorize(traceId, binding.id, 0L, credentials) once, and stamps the resulting token onto every outbound BEGIN/END/WINDOW on the hydrate session and its list streams.
  • Serve. Agent tools/list, resources/list, prompts/list requests hit McpCacheListServer, which reads the cached payload from the store and emits it as DATA + END without forwarding.
  • Periodic refresh. After each kind's terminal write (cache.put completion, bodyLen==0 skip-put, or abort/reset), the binding schedules signaler.signalAt(now + ttl) for that kind. On expiry, a new list stream is issued on the same hydrate session and the cache value is overwritten. Aborts during refresh preserve the prior cached entry.
  • Lease coordination across workers. A binding-wide lifecycle.lock arbitrates which worker opens the hydrate session (loser polls at 100 ms intervals); a per-kind <kind>.lock arbitrates which worker issues each list call on the wire. Winners release on hydrate complete (lifecycle lease) or after cache.put / abort terminal (per-kind lease). Settle paths split: cache hits funnel through markSettled so only the worker that actually did the list-stream work arms a refresh signal.
  • Notification-driven refresh (notifications/{tools,resources,prompts}/list_changed) — deferred per the issue description.

Refactor

McpProxyFactory ballooned absorbing cache support. Extracted into four per-kind-family factories, each implementing BindingHandler and dispatched from McpProxyFactory.factories: Int2ObjectHashMap<BindingHandler>:

  • McpProxyLifecycleFactoryKIND_LIFECYCLE.
  • McpProxyItemFactory (abstract base + McpProxyToolsCallFactory / McpProxyPromptsGetFactory / McpProxyResourcesReadFactory) — KIND_TOOLS_CALL / KIND_PROMPTS_GET / KIND_RESOURCES_READ.
  • McpProxyListFactory (abstract base + McpProxyToolsListFactory / McpProxyPromptsListFactory / McpProxyResourcesListFactory) — KIND_TOOLS_LIST / KIND_PROMPTS_LIST / KIND_RESOURCES_LIST. newStream branches internally on binding.cache != null to pick McpListServer (passthrough) or McpCacheListServer (serve-from-cache).

McpProxyFactory shrinks from ~3631 lines to ~566 lines and now owns only the dispatch map, attach/detach, and the multi-kind hydrate coordinator (McpHydrateSession + McpHydrateListStream) which cannot be cleanly carved per kind.

Spec scripts

Folder Verifies
cache.hydrate hydrate session opens with lifecycle
cache.hydrate.tools / .resources / .prompts each kind populated during hydrate
cache.hydrate.persist lifecycle stays open after all three list streams close
cache.hydrate.error downstream abort during hydrate; lifecycle survives
cache.hydrate.auth hydrate BEGIN carries the guard-issued auth token
cache.serve.initialize agent observes lifecycle initialised once cache is ready
cache.serve.tools.list / .resources.list / .prompts.list each list served from cache, no downstream call
cache.refresh.tools / .resources / .prompts second list call after TTL on same hydrate session
cache.refresh.tools.error refresh aborts; prior cached entry retained
cache.refresh.tools.contended two workers, exactly one issues each list call on the wire

ITs

Peer-to-peer (specs/binding-mcp.spec/.../streams/cache/) Engine-driven (runtime/binding-mcp/...)
ProxyCacheLifecycleIT (5 tests) McpProxyCacheLifecycleIT (5 tests)
ProxyCacheToolsListIT (5 tests) McpProxyCacheToolsListIT (4 tests)
ProxyCacheResourcesListIT (3 tests) McpProxyCacheResourcesListIT (3 tests)
ProxyCachePromptsListIT (3 tests) McpProxyCachePromptsListIT (3 tests)
McpProxyCacheContentionIT (1 test, ENGINE_WORKERS=2)

Every @Ignore from the original test-first scaffold has been removed.

Test-only configs

  • proxy.cache.yaml — empty test store, single worker; used by hydrate ITs.
  • proxy.cache.refresh.yaml — empty test store with per-kind TTLs; refresh + contention ITs.
  • proxy.cache.seeded.yaml — test store pre-seeded with the cached JSON; serve ITs.
  • proxy.cache.auth.yaml — test store + engine test guard with credentials; auth IT.
  • proxy.cache.toolkit.yaml — multi-route variant for toolkit-flavoured tests.

TestStore enhancement (runtime/engine test sources)

TestStore now owns ConcurrentMap<Long, ConcurrentMap<String, String>> keyed by storeId; workers attached to the same store share entries. options.entries seeds via putIfAbsent so the first attach wins. Previously the entries map was per-handler, so putIfAbsent-based leases were trivially won by every worker — required for McpProxyCacheContentionIT to observe a single winner. Transparent for single-worker scenarios. binding-mcp no longer test-depends on store-memory; all cache ITs run on type: test.

Test plan

  • Peer-to-peer ITs (specs/binding-mcp.spec ProxyCache*IT) → 16 pass / 0 fail.
  • Engine-driven binding-mcp ITs (runtime/binding-mcp) → 158 pass / 0 fail / 0 skipped.
  • specs/engine.spec39 pass / 0 fail after the TestStore sharing change.

🤖 Generated with Claude Code


Generated by Claude Code

claude added 12 commits May 13, 2026 05:56
Adds the test-first scaffold for the upcoming mcp cache binding:
McpCacheIT skeleton with all 22 planned test methods (Group A
warmup tests active, Groups B/C/D/F/G/I marked @ignore until their
scripts land), four IT zilla.yaml configs, schema patch entries
for kind: cache and options.warmup/ttl, and six fully-written
Group A warmup .rpt scripts (lifecycle, tools/list, resources/list,
prompts/list, lifecycle-persists, guarded-credentials).

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds the missing client.rpt for each Group A cache warmup scenario
and a CacheIT class in the spec project that runs every script pair
peer-to-peer without Zilla. Verifies the scripts are self-consistent
before any cache binding implementation exists.

Verified locally: ./mvnw -pl specs/binding-mcp.spec verify -Dit.test=CacheIT
runs all 6 tests to green.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…pts (#1737)

Splits the monolithic CacheIT/McpCacheIT into per-group classes
(*WarmupIT, *ListIT, etc.) to keep each group focused as the
script set grows. Group A (warmup) renamed to *WarmupIT.

Adds Group B (list operations served from cache) — 4 scenarios:
agent initialize, tools/list, resources/list, prompts/list. Each
scenario carries paired client.rpt (agent at app0) and server.rpt
(cache facade) so CacheListIT can verify the agent↔cache contract
peer-to-peer. McpCacheListIT pairs the agent client.rpt with the
Group A warmup server.rpt at app1 so the cache is populated before
the agent's list arrives. B5 (list-before-warmup) stays @ignored
with a script TODO.

Verified locally: CacheWarmupIT 6/6 + CacheListIT 4/4 green
peer-to-peer.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…#1737)

The cache binding is one hop above the proxy; proxy fanout across
exits is already covered by McpProxyIT (shouldList*WithToolkitMulti).
The only cache-specific concern in the original Group D was resilience
to a downstream error during warmup, which is more naturally a Group A
(warmup) scenario than a fanout one.

Adds cache.warmup.session.downstream.error: downstream ABORTs the
tools/list stream during warmup; the lifecycle session must survive
so the cache can continue with other list types or retry later.

CacheWarmupIT 7/7 green peer-to-peer.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Renames/splits the CacheWarmupIT and CacheListIT pairs (peer-to-peer
and engine-driven) into method-scoped ITs aligned with the issue's
Responsibilities-by-MCP-method table:

  CacheLifecycleIT       — warmup session open/persist/error/guard,
                           agent initialize-from-cache
  CacheToolsListIT       — tools/list warmup + served-from-cache
  CacheResourcesListIT   — resources/list warmup + served-from-cache
  CachePromptsListIT     — prompts/list warmup + served-from-cache

Same applies to the McpCache* counterparts. No script files moved;
only the IT-class references changed. All 11 peer-to-peer tests
verified green.

Remaining roster slots (CacheToolsCallIT/ResourcesReadIT/PromptsGetIT
for pass-through invocations, and the per-list-method store/refresh
coverage) will be added as their scripts come online.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Replaces the separate kind: cache binding with options.cache on the
existing kind: proxy binding. Since all original two-binding topologies
have an exact equivalent in the folded model, no expressiveness is lost
and the engine "kind: cache" prerequisite goes away.

Also:
- Renames "warmup" → "hydrate" throughout (more precise terminology
  for cache population).
- Flattens the warmup wrapper: authorization/store/ttl now live
  directly under options.cache instead of options.cache.warmup.
- Drops the guarded vs. unguarded test split — the .rpt scripts don't
  observe authorization on BEGIN so the scenarios produce identical
  transcripts.
- Renames IT classes: CacheXIT → ProxyCacheXIT and McpCacheXIT →
  McpProxyCacheXIT to align with the existing McpProxyIT neighbour.
- Configs rewritten: cache.yaml/cache.multi.yaml/cache.refresh.yaml
  → proxy.cache.yaml/proxy.cache.multi.yaml/proxy.cache.refresh.yaml;
  each declares kind: proxy with options.cache and a stores: memory0
  reference.
- Schema patch: drop "cache" from kind enum; replace flat
  options.warmup/options.ttl with options.cache {store, ttl,
  authorization}; store is required.

Verified: 10/10 ProxyCache*IT peer-to-peer tests pass.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Renames scenarios and test methods to use "hydrate" / "serve" as
verbs rather than as noun-modifiers. Examples:

  cache.hydrate.session.tools.list     → cache.hydrate.tools
  cache.agent.tools.list.from.cache    → cache.serve.tools.list
  shouldPopulateToolsViaHydrate        → shouldHydrateTools
  shouldServeAgentToolsListFromCache   → shouldServeToolsList
  shouldKeepHydrateSessionOpenAfter... → shouldHydratePersist

Verified: 10/10 ProxyCache*IT peer-to-peer tests pass.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds per-method refresh scenarios that model the cache re-issuing a
list call on the same hydrate session after a TTL elapses, plus a
refresh-error case where the refresh attempt aborts and the cache must
retain its prior cached entry.

Scenarios:
  cache.refresh.tools / cache.refresh.resources / cache.refresh.prompts
  cache.refresh.tools.error

Tests:
  ProxyCacheToolsListIT.shouldRefreshTools
  ProxyCacheToolsListIT.shouldRefreshToolsError
  ProxyCacheResourcesListIT.shouldRefreshResources
  ProxyCachePromptsListIT.shouldRefreshPrompts
  (engine-driven counterparts added too)

Also renames cache.hydrate.downstream.error -> cache.hydrate.error
(and shouldHydrateDownstreamError -> shouldHydrateError) for the
single-qualifier convention.

Lease-contention coverage (cache.refresh.tools.contended) is
deferred: the lease behavior is store-level, only meaningfully
testable engine-driven via either a TestStore seeding hook or a
multi-worker EngineRule. Both are downstream work from the cache
binding implementation; the wire-level refresh tests above cover
the protocol shape.

Verified: 14/14 ProxyCache*IT peer-to-peer tests pass.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds three scenarios where the agent's list request arrives while
the cache is still hydrating. The cache facade holds the request
read in full, then waits for a per-method hydrate barrier to fire
before writing the cached response. True timing isn't observable
peer-to-peer, but the scripts document the required wire ordering
for engine-driven tests to enforce.

Scenarios + tests:
  cache.serve.tools.list.hydrating       shouldServeToolsListHydrating
  cache.serve.resources.list.hydrating   shouldServeResourcesListHydrating
  cache.serve.prompts.list.hydrating     shouldServePromptsListHydrating

Verified: 17/17 ProxyCache*IT peer-to-peer tests pass.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds TestStoreOptionsConfig with a Map<String,String> entries
field, exposed via the standard options-config builder/adapter
pattern, and wires it through TestStoreContext so each new
TestStoreHandler is pre-populated with the configured entries.

Enables tests to set up store state declaratively before a binding
that uses the store begins operating — e.g., seeding a lease lock
key so a binding observes a held lease and exercises its
already-locked code path deterministically.

Example:

  stores:
    memory0:
      type: test
      options:
        entries:
          tools.lock: "worker-0"

Verified: engine unit tests (320/320), engine.spec ITs (39/39),
and binding-mcp.spec ProxyCache*IT (17/17) all pass.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds proxy.cache.contended.yaml — a TestStore configured with
tools.lock pre-seeded to a foreign worker id — and a corresponding
McpProxyCacheToolsListIT.shouldRefreshToolsContended test method
(currently @ignore'd until the cache binding implementation lands).

When enabled, the test verifies that the cache binding consults
the lease before issuing a refresh tools/list: with the lock held
in the store, putIfAbsent returns the seeded value and the
refresh path is skipped. The downstream server.rpt models only
the hydrate exchange, so any spurious refresh tools/list would
hit an unmatched stream and fail the test.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…#1737)

Replaces the TestStore-seeded contention approach with a more faithful
multi-worker engine test:

- cache.refresh.tools.contended/{client,server}.rpt — models two
  hydrate sessions, exactly two tools/list calls on the wire (one
  initial hydrate by lease-winner, one refresh by lease-winner).
  Second worker's lifecycle is observed but its tools/list never
  hits the wire because the lease was lost.
- McpProxyCacheContentionIT — new engine-driven IT class configured
  with ENGINE_WORKERS=2 so both cache binding instances genuinely
  race for the hydrate / refresh leases against the shared
  store-memory. @ignore'd until cache binding implementation lands.
- ProxyCacheToolsListIT.shouldRefreshToolsContended — peer-to-peer
  counterpart added to the existing list IT.
- proxy.cache.contended.yaml dropped — no longer needed (the test
  uses proxy.cache.refresh.yaml which already references store-memory).

Verified: 18/18 ProxyCache*IT peer-to-peer tests pass (was 17,
added shouldRefreshToolsContended).

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Comment on lines +35 to +39
// Multi-worker cache contention tests. Configured with workers=2 so each
// worker instantiates its own cache binding and the two race for the
// hydrate / refresh leases backed by the shared store. The wire pattern
// observable at the downstream is two lifecycle sessions but only the
// lease-winner issues each list call.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these type of comments from the ITs.
The scenarios and scripts should be the documentation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proxy.cache.toolkit.yaml is sufficient here, even though it has multiple toolkit routes.

Comment on lines +28 to +29
routes:
- exit: app1
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
routes:
- exit: app1
exit: app1

Comment on lines +16 to +18
# Downstream errors out during hydrate tools/list. Lifecycle session must
# survive so the cache can retry later or continue with other list types.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these kinds of comments from the spec scripts.

claude added 9 commits May 16, 2026 04:53
- proxy.cache.yaml / proxy.cache.refresh.yaml: use binding-level exit
  instead of single-element routes for the single-exit case.
- proxy.cache.multi.yaml → proxy.cache.toolkit.yaml.
- Remove explanatory # comments from scripts; scenarios + script bodies
  are the documentation.
- Remove class-level comment from McpProxyCacheContentionIT; same
  reasoning.

Verified: 18/18 ProxyCache*IT peer-to-peer tests pass.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…tentionIT

The static import of ENGINE_WORKERS was placed after java/org statics
with a blank-line separator. Per the project convention static
imports go in a single block alphabetically sorted by full path, so
io.aklivity.* comes first.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
The engine-driven McpProxyCache*IT tests fail at engine bring-up
because the proxy binding does not yet recognize options.cache.
This is the expected test-first state, but CI cannot distinguish
"expected failure until impl" from "regression".

Add class-level @ignore("TODO: enable when proxy cache option lands")
to all 5 McpProxyCache*IT classes:
  McpProxyCacheLifecycleIT, McpProxyCacheToolsListIT,
  McpProxyCacheResourcesListIT, McpProxyCachePromptsListIT,
  McpProxyCacheContentionIT.

The peer-to-peer ProxyCache*IT tests in specs/binding-mcp.spec
remain active and green — they verify script self-consistency
independent of the binding implementation.

Verified: runtime/binding-mcp clean verify → BUILD SUCCESS, 146
tests run, 5 skipped (the ignored cache ITs).

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
First step of the proxy cache implementation: add the config-layer
types that map to options.cache in the schema. No runtime behavior
yet — McpProxyFactory still ignores the parsed config.

- McpCacheConfig: immutable POJO with store, per-method ttl,
  authorization map (guard-name → credentials)
- McpCacheConfigBuilder: fluent builder mirroring the existing
  McpAuthorizationConfigBuilder pattern
- McpOptionsConfig: add cache field + 4-arg constructor
- McpOptionsConfigBuilder: add cache() method (nested-builder
  pattern matching authorization())
- McpOptionsConfigAdapter: serialize/deserialize the cache block
  with ttl + per-guard authorization credentials

Verified: binding-mcp tests pass, checkstyle clean,
binding-mcp.spec SchemaTest still green.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…1737)

Phase B of the proxy cache implementation. When options.cache is
present, attach() now:

- resolves the unconditional exit route
- allocates a HydrateSession (initialId/replyId etc.)
- schedules a signaler tick that fires immediately
- the tick handler issues a lifecycle BEGIN downstream with
  sessionId="hydrate-1", modelled after KafkaGrpcRemoteServerFactory

The session sends reply WINDOW on receipt of the downstream's BEGIN
reply, and END on detach. No list-method enumeration yet; tools/
resources/prompts hydrate, store integration, lease coordination,
and refresh land in later phases.

McpProxyCacheLifecycleIT remains @ignore'd until all four scenarios
in the class pass; verified locally that the shouldHydrate scenario
itself now runs to green when temporarily un-Ignored.

Also adds store-memory test dependency (the cache configs reference
type: memory for their backing store).

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Phase C — after the hydrate lifecycle reply arrives, the cache now
chains tools/list, resources/list, prompts/list as three sequential
sub-streams on the same hydrate session. Each list stream sends
BEGIN+END (close write), receives BEGIN reply, DATA and END, then
signals the parent HydrateSession to start the next.

Response bodies are discarded for now; store integration arrives in
Phase D.

McpProxyCacheLifecycleIT now passes all 4 tests with the engine
(shouldHydrate, shouldHydratePersist, shouldHydrateError,
shouldServeInitialize) — class-level @ignore removed.

McpProxyCache{ToolsList,ResourcesList,PromptsList,Contention}IT
remain @ignore'd until serve-from-cache / refresh / lease land.

Verified: ./mvnw -pl runtime/binding-mcp clean verify → 149 pass,
4 ITs skipped (the still-Ignored cache list / contention ITs).

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Phase D — resolve the configured store at attach() time via
context.supplyStore(resolveId.applyAsLong(cache.store)), thread the
handle through HydrateSession into each HydrateListStream, and on
each list reply END write the accumulated body to the store under
the per-method key ("tools" / "resources" / "prompts") with no
expiry (Long.MAX_VALUE).

Response bodies are buffered byte-by-byte into a per-list-stream
byte[] that grows as needed. On END the accumulated bytes are
decoded as UTF-8 and pushed to the StoreHandler via put().

No serve-from-cache yet — agent list requests still pass through
to the existing proxy code path. Phase E will intercept them and
respond from the store.

Verified: ./mvnw -pl runtime/binding-mcp clean verify → 149 pass,
4 list/contention ITs still @ignore'd.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds McpCacheListServer that intercepts tools/resources/prompts list
streams when options.cache is configured on a proxy binding, looks up
the cached envelope via StoreHandler.get(), and emits the cached bytes
as DATA followed by END without forwarding to upstream.

Un-ignores the hydrate + serve tests across the per-kind cache IT
classes; periodic refresh and hydrating-wait tests remain ignored
pending later phases.
…lel (#1737)

Move the per-kind store-key plumbing out of HydrateSession into a new
McpListCache attached to McpBindingConfig, dropping the redundant
cacheStores map. HydrateSession becomes a populator only; the cache
(backed by StoreHandler) is the source of truth for "is kind X ready?"
via an async get. With per-kind status independent of session
sequencing, the three list-stream round-trips dispatch on the same
worker tick - wall-clock hydration drops to a single round-trip, and
an error on one kind no longer delays the others or blocks a retry
on reconnect.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
private static final String MCP_TYPE_NAME = "mcp";

private static final int SIGNAL_INITIATE_HYDRATE = 1;
private static final String HYDRATE_SESSION_ID = "hydrate-1";
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be hardcoded in the implementation.
Instead it should use a session id supplier, same as other binding types, and override to fix the value for the ITs that need a deterministic value.

}
}

private final class HydrateSession
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private final class HydrateSession
private final class McpHydrateSession

claude added 7 commits May 16, 2026 20:16
- Rename `HydrateSession` to `McpHydrateSession` to follow the binding's
  type-prefixed inner class convention.
- Make `McpOptionsConfig` constructor package-private; construction is
  via `McpOptionsConfig.builder()`.
- Replace the hardcoded `"hydrate-1"` constant in `McpProxyFactory` with
  a `Supplier<String>` obtained from `McpConfiguration.sessionIdSupplier()`.
  Cache ITs configure `MCP_SESSION_ID_NAME` to a static method returning
  `"hydrate-1"`, mirroring the `McpServerIT` override pattern.
- Replace the three flat `Duration ttlTools/ttlResources/ttlPrompts` fields
  on `McpCacheConfig` with a nested `McpCacheTtlConfig` (built via
  `McpCacheConfigBuilder.ttl()` returning `McpCacheTtlConfigBuilder`).
  `McpOptionsConfigAdapter` serializes and deserializes the nested form.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…ngConfig

Replace the two factory-level maps on `McpProxyFactory` (`sessions:
Map<String, McpLifecycleServer>` shared across all bindings, and
`hydrateSessions: Long2ObjectHashMap<McpHydrateSession>`) with per-binding
fields on `McpBindingConfig`:

- `sessions: Map<String, McpProxySession>` — now correctly scoped per
  binding rather than per worker. Session ids are only meaningful within
  one binding's namespace.
- `hydrate: McpProxyHydrate` — the per-binding hydrate session.

`McpProxySession` and `McpProxyHydrate` are package-visible interfaces
declared in `internal.config` so `McpBindingConfig` can type the fields
without depending on the inner classes inside `McpProxyFactory`. The inner
classes implement the interfaces — no behavior change. `McpLifecycleServer`
gains a constructor reference to its `McpBindingConfig` so `cleanup` can
call `binding.sessions.remove(sessionId)`.

This sets up the per-binding hook that the upcoming per-kind factory
extraction will route state through.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Move McpLifecycleServer and McpLifecycleClient from McpProxyFactory
into a new McpProxyLifecycleFactory as inner classes - same
enclosing-instance pattern, just a smaller enclosing factory.
McpLifecycleServer (and the cross-class accessors `sender`, `originId`,
`routedId`, `sessionId`, `supplyClient`) is package-private so the
still-inline call/list dispatch in McpProxyFactory can pattern-match
against it. McpLifecycleClient is also package-private because McpClient
and McpListClient hold typed references to it.

Introduce `Int2ObjectHashMap<BindingHandler> factories` on
McpProxyFactory and delegate KIND_LIFECYCLE to the new factory via the
dispatch table; other kinds remain inline until subsequent phases
extract them. Mirrors KafkaClientFactory's per-kind factory pattern.

McpProxyFactory drops 560 lines.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…ompts/get, resources/read

Introduce an abstract McpProxyItemFactory implementing BindingHandler
plus three concrete per-kind subclasses (McpProxyToolsCallFactory,
McpProxyPromptsGetFactory, McpProxyResourcesReadFactory). The base owns
the shared stream state machine - the McpServer and McpClient inner
classes move verbatim out of McpProxyFactory - and exposes three hooks
for the kind-specific bits:

  protected abstract int kind();
  protected abstract void injectInitialBeginEx(McpBeginExFW.Builder, String sid, String identifier);
  protected abstract void injectReplyBeginEx(McpBeginExFW.Builder, String sid, McpBeginExFW upstream);

Each subclass implements ~30 lines: its KIND constant and the
tools-call/prompts-get/resources-read variants of the BEGIN extension
builder. Naming "Item" reflects that the three operations are different
verbs (call/get/read) acting on a single identified MCP item - opposite
to the List kinds.

McpProxyFactory registers all three subclasses in its existing
Int2ObjectHashMap<BindingHandler> factories map next to the Phase 2
McpProxyLifecycleFactory entry; the newStream dispatcher's else branch
collapses to factories.get(kind).newStream(...). The local McpServer,
McpClient, and rewriteReplyBeginEx are removed from McpProxyFactory
along with three now-unused flyweight fields (flushRO, challengeRO,
mcpChallengeExRW) and the McpChallengeExFW import. Net change:
McpProxyFactory shrinks 735 lines (3071 -> 2336); McpProxyItemFactory
adds 1229 lines.

Each subclass receives the same (McpConfiguration, EngineContext,
LongFunction<McpBindingConfig>) constructor signature as the Phase 2
lifecycle factory, and the per-factory do-helpers (doBegin, doData,
doEnd, doAbort, doFlush, doChallenge, doReset, doWindow) are local
copies rather than parent-shared - matches Phase 2 precedent. No
visibility widening was needed beyond what Phase 2 already opened on
McpLifecycleServer/McpLifecycleClient.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…e hydrate ITs

The parallel-hydrate path in McpHydrateSession.onBegin iterates over
[KIND_TOOLS_LIST, KIND_RESOURCES_LIST, KIND_PROMPTS_LIST] and dispatches
all three list streams on the same worker tick. The per-kind cache ITs
(McpProxyCache{Resources,Prompts}ListIT.shouldHydrate{Resources,Prompts})
each only `read` their own kind's BEGIN ext, so the other two streams
emit unexpected BEGINs that fail the script assertion. Updating each
script to accept three BEGINs would push parallel-hydrate setup into
tests whose intent is the single-kind hydrate behavior in isolation.

Add MCP_HYDRATE_KIND_FILTER (IntPredicate, default `k -> true`) to
McpConfiguration, exposed as `hydrateKindFilter()`. Mirrors the existing
MCP_SESSION_ID supplier pattern: tests resolve a `Class::method` static
reference; the decoder loads it via MethodHandle. Production keeps the
default — all three kinds hydrate in parallel — so the contention IT
exercises the real behavior unchanged. Each per-kind IT now configures
the filter to its single KIND_*_LIST so only that kind's BEGIN is
emitted and the existing per-kind scripts pass unmodified.

Result locally: 3 hydrate failures fixed
(McpProxyCacheToolsListIT had timed out, also resolved). Three
`shouldServe*` failures remain — cache-serve reads empty payload — and
are independent of hydrate dispatch.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…ompts/list, resources/list

Mirror the Phase 3 item-factory shape for the list slice. Introduce an
abstract McpProxyListFactory implementing BindingHandler plus three
concrete per-kind subclasses (McpProxyToolsListFactory,
McpProxyPromptsListFactory, McpProxyResourcesListFactory). The base
owns the shared state machine - McpListClient, McpListClientDecoder
with its ten decode states, McpListServer (passthrough), and
McpCacheListServer (cache-serve variant), plus the indexOfByte JSON
helper - and exposes seven hooks for the kind-specific bits:

  protected abstract int kind();
  protected abstract void injectInitialBeginEx(McpBeginExFW.Builder, String sid);
  protected abstract void injectReplyBeginEx(McpBeginExFW.Builder, String sid);
  protected abstract DirectBuffer listReplyOpenPrelude();
  protected abstract JsonParserFactory listItemParserFactory();
  protected abstract String arrayKey();
  protected abstract String idKey();

Each subclass owns its per-kind prelude bytes (the JSON envelope-open
literal), JsonParserFactory for the streaming item parser, array key
("tools" / "prompts" / "resources"), and id key ("name" or "uri" for
resources). The kind field is removed from McpListClient, McpListServer,
and McpCacheListServer - each factory instance is bound to one kind, so
the seven `switch (kind)` blocks in those classes collapse to direct
hook calls.

Cache-vs-passthrough is now dispatched internally in
McpProxyListFactory.newStream: when binding.cache is non-null an
McpCacheListServer is constructed, otherwise an McpListServer with its
McpListClient via lifecycle.supplyClient. The McpProxyFactory factories
map gets three new entries (KIND_TOOLS_LIST / KIND_PROMPTS_LIST /
KIND_RESOURCES_LIST) and the newStream dispatcher collapses to
factories.get(kind).newStream(...) for every dispatched kind, including
lifecycle - the per-kind factories enforce their own session/route
preconditions.

HydrateListStream, McpHydrateSession, and SIGNAL_INITIATE_HYDRATE stay
in McpProxyFactory because they coordinate multi-kind hydrate from one
session and aren't request-time dispatch paths; the small kind-switch
in HydrateListStream.initiate persists.

McpProxyFactory shrinks 1777 lines (2343 -> 566), shedding all list
machinery plus six now-unused do* helpers (doBegin, doData, doAbort,
two doFlush overloads, doChallenge, doReset), unused flyweight RO/RW
fields and imports, and the no-longer-called sessionId(McpBeginExFW)
helper. doEnd, doWindow, and a local newStream remain because
HydrateListStream still calls them. McpProxyListFactory adds 2016 lines.

No further visibility widening was needed beyond what Phase 2 opened
on the lifecycle accessors.

ITs: 156 pass / 3 fail / 8 skipped, identical to the pre-refactor
baseline. The three `shouldServe*` failures are a pre-existing
cache-serve empty-payload bug unrelated to this refactor.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
The three shouldServe*List runtime ITs raced against hydrate: the agent
sent its lifecycle BEGIN, McpProxyLifecycleFactory replied immediately
(it has no hydrate awareness), the agent then sent its list BEGIN, and
McpCacheListServer hit an empty cache because hydrate was still in
flight - the agent received an empty payload.

Make these ITs deterministic by pre-seeding the cache instead of racing
against hydrate. Mid-hydrate is a distinct scenario that lifecycle
gating will cover later (the .hydrating scripts and @ignore'd
shouldServe*ListHydrating methods are removed here; when lifecycle
gating lands we'll re-introduce a clearer cache.serve.<kind>.refreshing
flavor rather than overload "hydrating").

Changes:
- New config proxy.cache.seeded.yaml using TestStore with options.entries
  carrying the tools/resources/prompts JSON. The existing proxy.cache.yaml
  is unchanged - hydrate ITs continue to verify an empty memory store.
- Runtime shouldServe{Tools,Resources,Prompts}List ITs switch to
  proxy.cache.seeded.yaml and reuse the existing cache.hydrate/server
  script for the downstream (it only handles the hydrate-1 lifecycle
  accept + reply, which is exactly what a fully-cached binding produces
  on the wire when McpHydrateSession.onBegin sees every cache.get
  returning non-null and spawns no list streams).
- Delete the three .hydrating script directories and the @ignore'd
  shouldServe*ListHydrating test methods (both runtime and peer ITs).

Runtime: 156 pass / 0 fail / 5 skipped (was 156p / 3f / 8s).
Peer: 15 pass / 0 fail / 0 skipped.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
claude added 7 commits May 17, 2026 06:36
When proxy.cache is configured, the agent's lifecycle BEGIN reply now
waits until the hydrate session has settled every kind permitted by
MCP_HYDRATE_KIND_FILTER. By the time an agent's lifecycle is
"initialized" from its point of view, every list method served by this
binding is in the cache - subsequent tools/list, resources/list,
prompts/list requests get a synchronous cache hit and never race the
hydrate path. Bindings with no cache configured retain the previous
synchronous BEGIN-reply behavior (binding.hydrate == null branch).

Mechanism uses the engine's existing signal infrastructure:

- McpHydrateSession tracks totalKinds (computed from the filter) and
  settledKinds. Each kind settles either from cache.get returning
  non-null at hydrate startup (pre-seeded store) or from the matching
  HydrateListStream's terminal write (cache.put completion, bodyLen==0
  skip-put, or abort/reset). Once settledKinds == totalKinds, the
  session flips the complete latch and fires signaler.signalNow against
  every pending agent stream registered via awaitComplete.

- McpProxyHydrate gains awaitComplete(originId, routedId, streamId,
  traceId, signalId). When complete is already true the signal fires
  immediately (synchronous fast path for the pre-seeded case); otherwise
  the (streamId, signalId) pair is queued. cleanup discards pending
  before tearing the session down.

- McpProxyLifecycleFactory.McpLifecycleServer.onServerBegin now
  transitions initial state and emits WINDOW synchronously, then
  delegates the BEGIN reply via binding.hydrate.awaitComplete using the
  agent's reply-id as the signal target (the engine registers the
  binding's stream consumer in throttles[replyId], so SignalFW for that
  stream id routes back to onServerMessage). A new SignalFW case
  dispatches to onServerSignal, which fires doDeferredServerBegin for
  SIGNAL_HYDRATE_COMPLETE. The deferred path guards on
  replyOpened/replyClosed in case the agent abandoned before complete
  fired, and rebuilds beginEx inside the deferred path so the shared
  codecBuffer is fresh at fire time.

- HydrateListStream now holds a back-reference to its parent
  McpHydrateSession (passed via the constructor) and a settled flag to
  dedupe completion when both onEnd and abort/reset fire on the same
  stream.

McpProxyCacheLifecycleIT.shouldServeInitialize switches to
proxy.cache.seeded.yaml: an empty memory store would block forever
because the hydrate session's three list streams find no cooperating
downstream in cache.hydrate/server. With the pre-seeded test store the
hydrate session sees every cache.get return non-null synchronously, the
complete latch flips before the agent's lifecycle arrives, and the
deferred BEGIN reply takes the synchronous fast path.

Runtime: 156 pass / 0 fail / 5 skipped (was 156p / 3f-then-0f after
gating with the wrong streamId).

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
McpHydrateSession now uses signaler.signalAt to re-issue list-kind
hydrate streams when the per-kind TTL elapses. Scheduling fires:

- After each kind's initial hydrate completes (HydrateListStream
  reaches its terminal: cache.put completion, bodyLen==0 skip-put, or
  abort/reset — same gating dedupe applies, so each instance schedules
  at most one next refresh).
- After each cache.get returns non-null at startup (pre-seeded value
  still refreshes on TTL, so the value stays fresh).
- After each refresh stream's terminal (recursive, same code path).

Each kind has its own signal id (SIGNAL_REFRESH_{TOOLS,RESOURCES,
PROMPTS}) and TTL (McpCacheTtlConfig.tools / .resources / .prompts).
When ttlForKind returns null (no cache.ttl block, or the block omits
that kind), no refresh is scheduled - the cache value stays forever.
On abort/reset of an in-flight refresh, the cache.put never fires, so
the prior cached value is preserved (the test that validates this
behaviour is now un-ignored as shouldRefreshToolsError).

McpHydrateSession.settle takes the kind explicitly and both contributes
to the initial-hydrate gating latch and schedules the next refresh.
HydrateListStream forwards its kind into parent.settle(kind) - no
behavioural change to the gating dedupe.

McpProxyCache{Tools,Resources,Prompts}ListIT.shouldRefresh* and
shouldRefreshToolsError no longer @ignore - now run against
proxy.cache.refresh.yaml (ttl tools=PT1S, resources=PT2S, prompts=PT3S)
via the existing cache.refresh.<kind> / cache.refresh.<kind>.error
spec scripts. Verified: 156 pass / 0 fail / 1 skipped (Contention IT
class-level @ignore for lease coordination remains).

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…ownstream

When options.cache.authorization is configured, McpProxyFactory.attach
resolves the single configured guard via context.supplyGuard and calls
guard.reauthorize(traceId, binding.id, 0L, credentials) once to mint a
session token. The McpHydrateSession and HydrateListStream now stamp
that token onto every outbound BEGIN/END/WINDOW in place of the prior
hardcoded 0L authorization, so any downstream binding that runs a guard
recognises the cache hydrate session as an authenticated principal.

The map is constrained to a single guard entry by the schema patch
(maxProperties: 1 under cache.authorization); the runtime picks that
entry via iterator without defensive assertion, since invalid yaml is
rejected at config-load time.

Token freshness is left to the guard for now - the binding does not
re-reauthorize per BEGIN. If a future guard implementation needs
short-lived tokens, the call site is a single line and can move to
per-stream resolution without changing the API shape.

New test scaffolding:
- specs/.../config/proxy.cache.auth.yaml - engine test_guard +
  test store pre-seeded with empty list payloads (so hydrate spawns
  no list streams; the authorization assertion lands on the lifecycle
  BEGIN alone) + proxy declaring cache.authorization.test_guard with
  matching credentials.
- specs/.../streams/application/cache.hydrate.auth/{client,server}.rpt
  mirrors cache.hydrate/ but adds option zilla:authorization 1L (the
  TestGuard's first reauthorize result) to assert the proxy stamps the
  expected long on its hydrate BEGIN.
- shouldHydrateAuth peer IT in ProxyCacheLifecycleIT and engine-driven
  IT in McpProxyCacheLifecycleIT.

Verified: spec ProxyCache*IT 16 pass / 0 fail; runtime binding-mcp
157 pass / 0 fail / 1 skipped (was 156p/0f/1s; +1 shouldHydrateAuth).
McpProxyCacheContentionIT remains the only @ignore'd test pending
lease coordination.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
When two workers attach the same proxy-cache binding, both would
otherwise race to issue every list call on the wire. Coordinate via
the configured store's putIfAbsent so exactly one worker issues each
call.

Two leases, one shared key namespace on the configured store:

- Binding-wide "lifecycle.lock": acquired before opening the hydrate
  lifecycle stream to downstream. Loser polls every 100ms until the
  lock frees; winner holds it through the initial hydrate. Released
  inside markComplete once every kind permitted by the filter has
  settled, so other workers can then open their own lifecycle
  (independent stream onward) and observe the now-populated cache.
- Per-kind "<kind>.lock": acquired before issuing a HydrateListStream
  (both initial and refresh paths). Loser at initial-hydrate time
  markSettled-s without scheduling a refresh - the lease holder owns
  refresh scheduling because settle (gated by the per-stream settled
  flag) is the only path that calls scheduleRefresh. Released after
  cache.put completes (or after onEnd's skip-put branch, or after an
  abort/reset terminal) and before parent.settle fires.

settle is split into markSettled (gating-latch only) and settle
(markSettled + scheduleRefresh). cache hits at startup, lease losses
at initial hydrate, and lease losses at refresh-time all funnel
through markSettled so they contribute to lifecycle gating without
arming a refresh signal on a worker that didn't actually do the work.

McpProxyCacheContentionIT (was class-level @ignore'd) is now enabled,
configured with MCP_HYDRATE_KIND_FILTER restricting to KIND_TOOLS_LIST
(since cache.refresh.tools.contended/server.rpt only models the tools
exchanges; resources and prompts are not in scope for this scenario).
hydrateSessionId now cycles "hydrate-A"/"hydrate-B" per call so the
two workers' attach-time calls match the script's session ids.

Lease TTLs:
- lifecycle: 30s (safety net for crashes; explicit release covers the
  happy path).
- per-kind: 30s ditto.
- Retry polling for lifecycle: 100ms.

Verified: 157 pass / 0 fail / 0 skipped. Every test that was
@ignore'd at the start of the PR is now active and green.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Type-prefixed inner class naming convention matches the rest of the
file (McpServer, McpClient, McpListServer, McpCacheListServer,
McpLifecycleServer, McpLifecycleClient, McpHydrateSession).

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
TestStoreHandler previously held its own HashMap<String, String>
per-handler, so workers running in the same engine each saw an
isolated copy of the configured entries (putIfAbsent from one worker
was invisible to another). Lease coordination in binding-mcp's cache
hydrate path relies on cross-worker putIfAbsent visibility, so move
the entries map up one level: TestStore now owns a
ConcurrentMap<Long, ConcurrentMap<String, String>> keyed by storeId,
and TestStoreContext.attach hands out a reference to the per-storeId
shared map to each worker's TestStoreHandler.

Seeding via options.entries uses putIfAbsent so the first worker's
attach is the one that loads the map; subsequent attaches see the
existing entries and skip. No TTL behaviour was added — production
binding code releases leases explicitly, and the existing skip-TTL
semantics is sufficient for the cache test scenarios.

binding-mcp switches its three remaining memory-store configs
(proxy.cache.yaml, proxy.cache.refresh.yaml, proxy.cache.toolkit.yaml)
to type: test, matching the dependency hygiene rule that
implementations must not depend on other implementations - the cache
ITs now run entirely against the engine's test-jar store. The
store-memory test dependency drops out of runtime/binding-mcp/pom.xml.

Verified: 157 pass / 0 fail / 0 skipped (all cache ITs, McpServerIT,
McpClientIT, McpProxyIT, McpProxyCacheContentionIT) and the engine
spec IT suite (39 / 0 / 0) — no regression in either project from
the TestStore sharing change.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
# Conflicts:
#	runtime/binding-mcp/src/main/java/io/aklivity/zilla/runtime/binding/mcp/internal/McpConfiguration.java
#	runtime/binding-mcp/src/test/java/io/aklivity/zilla/runtime/binding/mcp/internal/McpConfigurationTest.java
@jfallows jfallows changed the title test(binding-mcp): scaffold mcp · proxy cache option contract (#1737) feat(binding-mcp): mcp · proxy cache option (#1737) May 18, 2026
@jfallows jfallows marked this pull request as ready for review May 18, 2026 01:24
McpProxyFactory had grown to ~566 lines absorbing the proxy-cache
feature; the actual dispatch (parse BEGIN ext, look up by kind, delegate
to a per-kind factory) was a small slice of that. The rest -
McpHydrateSession, McpHydrateListStream, the SIGNAL_INITIATE_HYDRATE /
SIGNAL_REFRESH_* / LEASE_TTL_MS / LEASE_RETRY_MS constants, the
PendingAwait record, the local newStream/doEnd/doWindow helpers, the
cache attach wiring (store resolve, McpListCache instantiate,
cacheGuard reauthorize, McpHydrateSession construct, schedule
SIGNAL_INITIATE_HYDRATE), and the matching detach cleanup - is now
owned by a new sibling McpCacheHydrater.

McpCacheHydrater exposes attach(McpBindingConfig) and
detach(McpBindingConfig) mirroring the BindingHandler lifecycle the
engine drives. McpProxyFactory constructs one in its ctor and delegates
both calls.

McpProxyFactory drops from 795 lines to 136 (a pure dispatcher now):
keeps the factories map and its seven per-kind registrations, the
bindings map, supplyGuard for McpBindingConfig construction,
mcpBeginExRO/beginRO for parsing the inbound BEGIN to extract the
kind, and the new hydrater field.

McpCacheHydrater is 724 lines holding everything cache-hydrate: signal
ids, lease ttls/retry, PendingAwait record, McpHydrateSession (still
implementing McpProxyHydrate so McpProxyLifecycleFactory can defer the
agent BEGIN reply via awaitComplete), McpHydrateListStream (settle /
cache.put / lease release path unchanged), and per-worker flyweights
duplicated for the hydrate stream-write helpers. The hydrate session
keeps its own codecBuffer (separate UnsafeBuffer allocated against
writeBuffer.capacity), matching the per-handler flyweight pattern.

McpBindingConfig gains one accessor: long resolveId(String name) plus
the corresponding private ToLongFunction<String> field initialized
from binding.resolveId. Needed because McpCacheHydrater.attach
receives only the wrapped McpBindingConfig and must resolve the cache
store name to a binding id at attach time. Cleaner than passing the
raw BindingConfig alongside.

Verified: 158 pass / 0 fail / 0 skipped (identical to pre-refactor
baseline). Checkstyle 0 violations. JaCoCo coverage met (bundle grew
116 -> 117 classes for the new McpCacheHydrater).

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Comment on lines +113 to +117
if (binding.options != null && binding.options.cache != null)
{
final long storeId = binding.resolveId(binding.options.cache.store);
final StoreHandler store = supplyStore.apply(storeId);
binding.cache = new McpListCache(store);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this initialization logic should belong in the McpBindingConfig constructor, to initialize cache and then this code block can be gated on if (binding.cache != null) instead.

Comment on lines +150 to +157
private record PendingAwait(
long originId,
long routedId,
long streamId,
long traceId,
int signalId)
{
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private record PendingAwait(
long originId,
long routedId,
long streamId,
long traceId,
int signalId)
{
}
private record McpSignalHandle(
long originId,
long routedId,
long streamId,
long traceId,
int signalId)
{
}

Perhaps have a convenience method to invoke the signaler, passing all the record fields to signaler.signalNow?

Comment on lines +209 to +219
{
if (acquired)
{
doLifecycleBegin(traceId);
}
else
{
signaler.signalAt(currentTimeMillis() + LEASE_RETRY_MS, SIGNAL_INITIATE_HYDRATE,
this::onInitiateSignal);
}
});
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can move to a method called onAcquireLifecycleLeaseComplete, passed to cache.acquireLifecycleLease.

Note: the traceId can be moved into the method body.

{
if (McpState.initialOpening(state))
{
return;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid multiple returns, instead conditional surrounds remainder of body with inverted condition.

Comment on lines +271 to +278
for (int kind : new int[] { KIND_TOOLS_LIST, KIND_RESOURCES_LIST, KIND_PROMPTS_LIST })
{
if (hydrateKindFilter.test(kind))
{
filtered++;
}
}
totalKinds = filtered;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach seems overly complex.
Perhaps it would simplify If we had separate signal handlers for refreshing each different kind, and initially populating the cache is treated as just the first refresh.

Perhaps we can move the signaling to mark complete into the McpListCache instead, and then McpCacheHydrater can simplify to be per kind only.

Comment on lines +454 to +458
{
doEnd(receiver, originId, routedId, initialId, initialSeq, initialAck, initialMax,
traceId, authorization);
state = McpState.closedInitial(state);
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a standard doInitialEnd method.

Address PR #1774 review comments:
- #1 cache instantiation moves into McpBindingConfig ctor (supplyStore
  becomes a third constructor parameter, parallel to supplyGuard).
- #2 PendingAwait -> McpSignalHandle as a standalone package-private
  record with signalVia(Signaler) convenience method; lives in its own
  file in the stream package.
- #3 acquire-lifecycle-lease callback is a named method reference
  rather than an inline lambda; traceId is re-supplied inside the
  method body via supplyTraceId.getAsLong() instead of captured.
- #4 doLifecycleBegin guard inverted to wrap the body, single return
  per method.
- #5 separate signal handlers per kind; initial population is the first
  refresh. The binding-level orchestrator (McpProxyCacheHydrater) owns
  the lifecycle stream and tracks readiness via a populated counter
  rather than a totalKinds/settledKinds pair on a McpHydrateSession.
  Per-kind work moves into McpProxyCacheListHydrater (abstract base)
  with three concrete subclasses (McpProxyCacheToolsListHydrater /
  ResourcesListHydrater / PromptsListHydrater), mirroring the
  McpProxyListFactory hierarchy.
- #6 cleanup's inline doEnd / state update is consolidated into a
  standard doInitialEnd helper on McpProxyCacheHydrater.

Kind and signal id are passed via the abstract base's constructor as
protected final fields rather than abstract methods - only the truly
kind-specific behaviour (injectInitialBeginEx and ttl()) remains
abstract. Each concrete subclass collapses to ~15 lines.

McpProxyHydrate interface and the old McpCacheHydrater are deleted.
McpProxyLifecycleFactory now calls binding.hydrater.register(handle)
directly with an McpSignalHandle instance instead of going through an
interface.

McpListCache stays at its current public surface (get / put /
acquireLease / releaseLease / acquireLifecycleLease /
releaseLifecycleLease) - the readiness tracking lives on the
orchestrator because it already knows the filtered kind set from
hydrateKindFilter at attach time, so no expecting(kind) plumbing was
needed.

Files:
- McpProxyHydrate.java deleted
- McpCacheHydrater.java deleted (replaced)
- McpProxyCacheHydrater.java new - per-binding orchestrator
- McpProxyCacheListHydrater.java new - abstract base + 3 subclasses
- McpSignalHandle.java new
- McpBindingConfig.java - supplyStore ctor param, cache field
  initialised in ctor, hydrate field renamed to hydrater and retyped
- McpProxyFactory.java - per-binding hydrater instantiation in attach
- McpProxyLifecycleFactory.java - binding.hydrater.register instead of
  binding.hydrate.awaitComplete
- McpClientFactory.java - 1-line ctor signature update for the now
  3-arg McpBindingConfig ctor

Verified: 158 pass / 0 fail / 0 skipped, identical to baseline.
Checkstyle 0 violations.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
}
}

final class McpProxyCacheToolsListHydrater extends McpProxyCacheListHydrater
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move these to top level classes.

}
}

final class McpProxyToolsListFactory extends McpProxyListFactory
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move these to top level classes.

}
}

final class McpProxyToolsCallFactory extends McpProxyItemFactory
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move these to top level classes.

claude added 4 commits May 18, 2026 05:27
Address PR #1774 review comments asking that each concrete per-kind
subclass living as a package-private sibling next to its abstract base
be moved to its own top-level file. Three files affected, nine new
files created:

McpProxyItemFactory.java (1229 -> 1124) loses three subclasses:
- McpProxyToolsCallFactory.java
- McpProxyPromptsGetFactory.java
- McpProxyResourcesReadFactory.java

McpProxyListFactory.java (2016 -> 1819) loses three subclasses:
- McpProxyToolsListFactory.java
- McpProxyPromptsListFactory.java
- McpProxyResourcesListFactory.java

McpProxyCacheListHydrater.java (329 -> 257) loses three subclasses:
- McpProxyCacheToolsListHydrater.java
- McpProxyCacheResourcesListHydrater.java
- McpProxyCachePromptsListHydrater.java

Each new file has the standard Aklivity Community License header, a
minimal import set (alphabetical within groups), and contains only the
final class declaration extending the base. Base files trim imports
that were only used by the lifted subclasses (Map and StreamingJson
out of McpProxyListFactory; KIND_TOOLS_LIST / KIND_RESOURCES_LIST /
KIND_PROMPTS_LIST static imports out of McpProxyCacheListHydrater).

No visibility widening was needed - subclasses already touched only
protected abstract hooks on the bases or package-private fields on
parent objects.

Verified: 158 pass / 0 fail / 0 skipped. Checkstyle 0 violations.

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Drop McpCacheTtlConfig and its three per-kind fields (tools, resources,
prompts) in favour of a single Duration on McpCacheConfig. No real-world
signal that the per-kind knobs are needed; can be reintroduced additively
if/when a use case appears. Default lives in the JSON schema (PT5M), with
the adapter mirroring it as fallback when ttl is omitted.
onRefreshAcquireLeaseComplete previously only acted on the lease winner;
losers stopped refreshing forever, leaving the cluster dependent on a
single worker's timer. Mirror the initial-acquire path: losers reschedule
their refresh so every worker stays armed and re-attempts on the next tick.
Replace the (BindingConfig, LongFunction<GuardHandler>,
LongFunction<StoreHandler>) ctor and its 1-arg sugar with a single
(BindingConfig, EngineContext) ctor. McpBindingConfig pulls supplyGuard
and supplyStore from the context directly, and the null-guards on the
suppliers disappear because every caller now passes a real context. The
supplier fields on McpProxyFactory and McpClientFactory go with it -
they only existed to be passed through. Bind options.authorization and
options.cache to locals so the long null-chains collapse.
Comment on lines +98 to +101
public static String hydrateSessionId()
{
return "hydrate-1";
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static String hydrateSessionId()
{
return "hydrate-1";
}
public static String sessionId()
{
return "hydrate-1";
}

Comment on lines +103 to +106
public static IntPredicate hydrateKindFilter()
{
return kind -> kind == KIND_TOOLS_LIST;
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static IntPredicate hydrateKindFilter()
{
return kind -> kind == KIND_TOOLS_LIST;
}
public static IntPredicate hydrateToolsOnly()
{
return kind -> kind == KIND_TOOLS_LIST;
}

public static final String MCP_SSE_KEEPALIVE_INTERVAL_NAME = "zilla.binding.mcp.sse.keepalive.interval";
public static final String MCP_ALT_SVC_ENABLED_NAME = "zilla.binding.mcp.alt.svc.enabled";
public static final String MCP_ALT_SVC_MAX_AGE_NAME = "zilla.binding.mcp.alt.svc.max.age";
public static final String MCP_HYDRATE_KIND_FILTER_NAME = "zilla.binding.mcp.hydrate.kind.filter";
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static final String MCP_HYDRATE_KIND_FILTER_NAME = "zilla.binding.mcp.hydrate.kind.filter";
public static final String MCP_HYDRATE_FILTER_NAME = "zilla.binding.mcp.hydrate.filter";

assertEquals(MCP_SSE_KEEPALIVE_INTERVAL.name(), MCP_SSE_KEEPALIVE_INTERVAL_NAME);
assertEquals(MCP_ALT_SVC_ENABLED.name(), MCP_ALT_SVC_ENABLED_NAME);
assertEquals(MCP_ALT_SVC_MAX_AGE.name(), MCP_ALT_SVC_MAX_AGE_NAME);
assertEquals(MCP_HYDRATE_KIND_FILTER.name(), MCP_HYDRATE_KIND_FILTER_NAME);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertEquals(MCP_HYDRATE_KIND_FILTER.name(), MCP_HYDRATE_KIND_FILTER_NAME);
assertEquals(MCP_HYDRATE_FILTER.name(), MCP_HYDRATE_FILTER_NAME);

Comment on lines +23 to +25
tools: '{"tools":[{"name": "get_weather","title": "Weather Information Provider","description": "Get current weather information for a location","inputSchema": {"type": "object","properties": {"location": {"type": "string","description": "City name or zip code"}},"required": ["location"]},"icons": [{"src": "https://example.com/weather-icon.png","mimeType": "image/png","sizes": ["48x48"]}],"execution": {"taskSupport": "optional"}}]}'
resources: '{"resources":[{"uri": "file:///docs/welcome.md","name": "welcome","description": "Welcome document","mimeType": "text/markdown"}]}'
prompts: '{"prompts":[{"name": "summarize","description": "Summarize a document"}]}'
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use multi-line yaml syntax to make this more readable.

Copy link
Copy Markdown
Contributor Author

@jfallows jfallows left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See feedback.

- Rename MCP_HYDRATE_KIND_FILTER → MCP_HYDRATE_FILTER; expose lease
  ttl/retry as MCP_LEASE_TTL_MS / MCP_LEASE_RETRY_MS configuration
  properties (phases A, B)
- Split McpListCache into per-kind instances; introduce McpLifecycleCache
  for lifecycle distributed lock; drop kind parameter from cache ops
  (phase C)
- McpBindingConfig owns per-kind caches (toolsCache, resourcesCache,
  promptsCache, lifecycleCache) and the hydrater; Optional chains for
  guard/store resolution; McpProxyFactory attach/detach simplified
  (phases D, G)
- Extract McpHydrateLifecycleStream inner class; guard-then-route
  ordering; complete End/Abort/Reset state gating; all fields final
  (phase E)
- Decouple McpProxyCacheListHydrater from parent reference; pass
  EngineContext + LongSupplier supplyAuthorization directly; rename
  onMessage → onListHydrateMessage (phase F)
- McpProxyLifecycleFactory: rename doDeferredServerBegin →
  doServerBeginDeferred; invert early-return guard (phase H)
- McpProxyItemFactory: rename builder parameter b → builder in abstract
  signatures; update subclasses (phase I)
- McpProxyListFactory: replace abstract kind() with final int kind
  field; make sessionId() abstract; move JsonParserFactory construction
  to base class constructor accepting List<String> pathIncludes;
  remove listItemParserFactory() abstract method (phases J, K)
- IT helpers: extract sessionId() and hydrateXxxOnly() static helpers;
  update configure() calls to method-reference form (phase L)
- proxy.cache.seeded.yaml: convert embedded JSON to YAML block scalars
  (phase M)

https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
public final McpListCache resourcesCache;
public final McpListCache promptsCache;
public final McpLifecycleCache lifecycleCache;
public Map<String, McpProxySession> sessions;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be made final?

McpListCache cache)
{
super(context, originId, routedId, supplyAuthorization, supplySessionId, onReady,
leaseTtlMs, cacheTtl, cache, SIGNAL_REFRESH_RESOURCES);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid passing kind to super and have abstract overrides for each part that needs kind specific awareness.

Comment on lines +81 to +82
MCP_LEASE_TTL_MS = config.property("lease.ttl.ms", Duration.ofSeconds(30).toMillis());
MCP_LEASE_RETRY_MS = config.property("lease.retry.ms", 100L);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change these to remove the _MS suffix and make their type Duration instead, similar to MAX_AGE above.

Comment on lines +55 to +56
public static final String MCP_LEASE_TTL_MS_NAME = "zilla.binding.mcp.lease.ttl.ms";
public static final String MCP_LEASE_RETRY_MS_NAME = "zilla.binding.mcp.lease.retry.ms";
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static final String MCP_LEASE_TTL_MS_NAME = "zilla.binding.mcp.lease.ttl.ms";
public static final String MCP_LEASE_RETRY_MS_NAME = "zilla.binding.mcp.lease.retry.ms";
public static final String MCP_LEASE_TTL_NAME = "zilla.binding.mcp.lease.ttl";
public static final String MCP_LEASE_RETRY_NAME = "zilla.binding.mcp.lease.retry";

Comment on lines +76 to +77
assertEquals(MCP_LEASE_TTL_MS.name(), MCP_LEASE_TTL_MS_NAME);
assertEquals(MCP_LEASE_RETRY_MS.name(), MCP_LEASE_RETRY_MS_NAME);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertEquals(MCP_LEASE_TTL_MS.name(), MCP_LEASE_TTL_MS_NAME);
assertEquals(MCP_LEASE_RETRY_MS.name(), MCP_LEASE_RETRY_MS_NAME);
assertEquals(MCP_LEASE_TTL.name(), MCP_LEASE_TTL_NAME);
assertEquals(MCP_LEASE_RETRY.name(), MCP_LEASE_RETRY_NAME);

Comment on lines +23 to +24
tools: |-
{"tools":[{"name": "get_weather","title": "Weather Information Provider","description": "Get current weather information for a location","inputSchema": {"type": "object","properties": {"location": {"type": "string","description": "City name or zip code"}},"required": ["location"]},"icons": [{"src": "https://example.com/weather-icon.png","mimeType": "image/png","sizes": ["48x48"]}],"execution": {"taskSupport": "optional"}}]}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tools: |-
{"tools":[{"name": "get_weather","title": "Weather Information Provider","description": "Get current weather information for a location","inputSchema": {"type": "object","properties": {"location": {"type": "string","description": "City name or zip code"}},"required": ["location"]},"icons": [{"src": "https://example.com/weather-icon.png","mimeType": "image/png","sizes": ["48x48"]}],"execution": {"taskSupport": "optional"}}]}
tools: |-
{
"tools":
[
{
"name": "get_weather",
"title": "Weather Information Provider",
"description": "Get current weather information for a location",
"inputSchema":
{
"type": "object",
"properties":
{
"location":
{
"type": "string",
"description": "City name or zip code"
}
},
"required":
[
"location"
]
},
"icons":
[
{
"src": "https://example.com/weather-icon.png",
"mimeType": "image/png",
"sizes": ["48x48"]
}
],
"execution":
{
"taskSupport": "optional"
}
}
]
}

Perhaps we can use this if it folds to JSON with minimal whitespace.

Comment on lines +25 to +28
resources: |-
{"resources":[{"uri": "file:///docs/welcome.md","name": "welcome","description": "Welcome document","mimeType": "text/markdown"}]}
prompts: |-
{"prompts":[{"name": "summarize","description": "Summarize a document"}]}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest similar formatting to above feedback for these if it works out for tools.

- Rename MCP_LEASE_TTL_MS/RETRY_MS to MCP_LEASE_TTL/RETRY as Duration
  properties with ISO-8601 defaults (PT30S, PT0.1S)
- Extract list hydrate stream into inner class McpListHydrateStream with
  ExpandableArrayBuffer body accumulator
- Add abstract signalId() override on each list hydrater subclass instead
  of passing signalId as constructor parameter
- Replace String prefix with String8FW in McpRoutePrefix to eliminate
  per-stream byte allocation in list factory
- Remove kind() abstract method from item/list factories; pass kind as
  constructor parameter and store as final field
- Move sessions map initialization into McpBindingConfig constructor

#1774
Comment on lines +227 to +229
final long authorization = binding.cacheGuard != null
? binding.cacheGuard.reauthorize(traceId, originId, 0L, binding.cacheCredentials)
: 0L;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the McpLifecycleCache is 1-1 with the McpBindingConfig, then does it make sense to move the cache guard and cache credentials onto the cache, as guard and credentials?
If we scope things correctly, then it seems like the McpProxyCacheHydrater would only need the McpLifecycleCache and not the binding, agree?

final McpRouteConfig route = binding.resolve(authorization);
if (route != null)
{
binding.lifecycleCache.acquireLifecycle(leaseTtl.toMillis(),
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest updating the acquireXxx methods to take duration, then convert toMillis() as needed in the acquireXxx implementation.

Comment on lines +249 to +250
signaler.signalAt(currentTimeMillis() + leaseRetry.toMillis(), SIGNAL_INITIATE_LIFECYCLE,
this::onInitiateLifecycle);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
signaler.signalAt(currentTimeMillis() + leaseRetry.toMillis(), SIGNAL_INITIATE_LIFECYCLE,
this::onInitiateLifecycle);
signaler.signalAt(Instant.now().plus(leaseRetry).toEpochMillis(), SIGNAL_INITIATE_LIFECYCLE,
this::onInitiateLifecycle);

Let's also add an overload of Signaler.signalAt that takes an Instant instead of long, and fills in default method via Instant.toEpochMillis().

this.initialId = supplyInitialId.applyAsLong(routedId);
this.replyId = supplyReplyId.applyAsLong(initialId);
this.replyMax = bufferPool.slotCapacity();
doLifecycleBegin(traceId);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should not be called from the constuctor.
The creator should explicitly call doLifecycleBegin post construction at the constructor call site.

}
}

private void onInitialAcquireLeaseComplete(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private void onInitialAcquireLeaseComplete(
private void onInitialAcquireComplete(

cache.acquire(leaseTtl.toMillis(), this::onRefreshAcquireLeaseComplete);
}

private void onRefreshAcquireLeaseComplete(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private void onRefreshAcquireLeaseComplete(
private void onRefreshAcquireComplete(

Comment on lines +216 to +228
final McpBeginExFW beginEx = mcpBeginExRW
.wrap(codecBuffer, 0, codecBuffer.capacity())
.typeId(mcpTypeId)
.inject(builder -> injectInitialBeginEx(builder, sessionId))
.build();

receiver = newStream(this::onListHydrateMessage, originId, routedId, initialId,
initialSeq, initialAck, initialMax, traceId, authorization, 0L, beginEx);
state = McpState.openingInitial(state);

doEnd(receiver, originId, routedId, initialId,
initialSeq, initialAck, initialMax, traceId, authorization);
state = McpState.closedInitial(state);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid doing this behavioral work in the constructor.
Instead have a doListHydrateBegin(...) and a doListHydrateEnd(...) and call them from the call site of the constructor.

Comment on lines +250 to +251
state = McpState.closedReply(state);
terminal(supplyTraceId.getAsLong());
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a proper reaction to onListHydrateAbort -> doListHydrateAbort, and onListHydrateReset -> doListHydrateReset with appropriate streamId, ie. initialId or replyId, and gated by McpState.initial/replyClosed.

jfallows pushed a commit that referenced this pull request May 19, 2026
- McpListCache.acquire and McpLifecycleCache.acquireLifecycle now take
  Duration instead of long milliseconds; conversion happens internally
- Drop "Lease" from on*AcquireComplete callback names on list hydrater
- Move cacheGuard and cacheCredentials onto McpLifecycleCache as
  guard/credentials fields; McpBindingConfig no longer exposes them
- Split lifecycle and list hydrate stream constructors from their
  behavioral work; creators explicitly call doListHydrateBegin/End and
  doLifecycleBegin post-construction
- Add proper onListHydrateAbort/Reset handlers with do* helpers and
  McpState.initialClosed/replyClosed gating
- Use Signaler.signalAt(Instant) overloads with Instant.now().plus(...)
  for lease retry and cache refresh scheduling

#1774
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

binding-mcp: add cache option to mcp · proxy binding

2 participants