You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is a coordinated Embedder SDK Hardening request — a set of 14 small, individually-justified additions and fixes that together unblock building higher-level agent frameworks on top of pkg/kit. Each item is scoped, has a proposed API, and can be implemented additively without breaking the existing public surface. Together they're sized to land as a single PR (most items are <50 LOC; a handful are interface extensions that need default-implementation shims).
The catalogue below is grouped into three tiers by impact. Tier 1 items are currently forcing embedders to either drop functionality or maintain non-trivial workarounds; Tier 2 items are simplifications that remove ~100–300 LOC of embedder-side boilerplate apiece; Tier 3 items are papercuts but cheap.
One-PR feasibility: items #1, #5, #7, #9, #10, #11, #12, #14 are pure additive (new fields / new exported symbols / interface methods with default no-op implementations on shipped managers). #2, #3, #4, #6, #8, #13 add new types/functions alongside the existing surface. Nothing requires breaking an existing signature.
Tier 1 — currently forces workarounds or feature drops
1. configPath package-global is written without a mutex
Location:internal/config/config.go:584
varconfigPathstring// SetConfigPath sets the configuration file path for resolving relative pathsfuncSetConfigPath(pathstring) {
configPath=path
}
Call site:pkg/kit/config.go:175 (loadConfigWithEnvSubstitution → config.SetConfigPath), reached from initConfig → kit.New whenever a .kit.yml is discovered in $PWD or $HOME.
Repro: two concurrent kit.New() calls in a process that has a .kit.yml in $HOME race on this write. go test -race catches it immediately:
WARNING: DATA RACE
Write at 0x000000... by goroutine N:
github.com/mark3labs/kit/internal/config.SetConfigPath
.../internal/config/config.go:590
github.com/mark3labs/kit/pkg/kit.loadConfigWithEnvSubstitution
.../pkg/kit/config.go:175
github.com/mark3labs/kit/pkg/kit.(*Kit).initConfig
github.com/mark3labs/kit/pkg/kit.New
The viper.New() per-instance fix from #40 isolated the viper store but left this residual global behind because the v0.79.0 isolation tests set SkipConfig: true, never exercising this path.
Proposed fix: move configPath onto a per-*viper.Viper-keyed sync.Map (or, simpler, just guard the global with a sync.Mutex):
grep -rn 'WithProvider\b\|InProcessProvider\|FakeProvider\|TestProvider' pkg/kit/ returns nothing. There's no seam for an embedder to inject a synchronous in-process responder.
Cost: every embedder writing tests against kit.Kit stands up an httptest.Server that speaks the OpenAI/Anthropic protocol they're targeting. That's 200–400 LOC of harness per project, every test pays a TCP round-trip + JSON encode/decode, and there's no structural guarantee that a KIT_NO_NETWORK=1-style flag is honoured.
Proposed fix: add a Provider option:
// pkg/kit/options.go// Provider is a synchronous in-process replacement for the HTTP-backed// providers that kit constructs from a model string. When non-nil, kit// skips models.CreateProvider entirely and routes every turn through fn.//// Provider takes precedence over Model resolution. Intended primarily for// tests and offline harnesses; production embedders should prefer the// model-string path.typeProviderFnfunc(ctx context.Context, reqProviderRequest) (ProviderResponse, error)
typeProviderRequeststruct {
Messages []LLMMessageTools []ToolSystemstringMaxTokensintStreamingboolTemperature*float64// …whatever the existing internal provider request carries
}
typeProviderResponsestruct {
Messages []LLMMessage// assistant + tool messages in orderStopReasonstringUsage*LLMUsage// Optional: streamed deltas, if Streaming was true and the caller// wants to assert delta order.Deltas []StreamDelta
}
// In Options:typeOptionsstruct {
// … existing fields …// Provider, when non-nil, replaces the model-string-driven provider// construction. The Model field is ignored.ProviderProviderFn
}
Then in kit.go where models.CreateProvider is called, branch on m.opts.Provider != nil and wrap it in an adapter that satisfies the same fantasy.LanguageModel interface the rest of the loop uses.
3. PromptOptions only carries SystemMessage — no per-call model/tool/thinking overrides
Location:pkg/kit/kit.go:2626-2649
typePromptOptionsstruct {
// SystemMessage is prepended as a system message before the user prompt.SystemMessagestring
}
func (m*Kit) PromptWithOptions(ctx context.Context, msgstring, optsPromptOptions) (string, error) {
// …
}
Any embedder that wants session.Prompt(ctx, text, WithModel(...), WithTools(...), WithThinking(...)) semantics must rebuild a fresh *Kit per call — the whole kit.New() → viper.New() → provider validation → Close() cycle, on every prompt. For test-driven harnesses where the actual model call is cheap (in-process fake), this setup-teardown dominates total runtime by 10–100×.
Proposed fix: extend PromptOptions with the override fields that already exist as Options construction-time settings, and have runTurn apply them as scoped overrides:
typePromptOptionsstruct {
SystemMessagestring// Model overrides the agent's configured model for this call only.// Empty string means "use the agent's default".Modelstring// Tools, when non-nil, replaces the agent's core tool set for this// call. Use ExtraTools to add tools without replacing the core set.Tools []Tool// ExtraTools are added to the effective tool set for this call only.ExtraTools []Tool// ThinkingLevel overrides the agent's reasoning level for this call.// Empty string means "use the agent's default".ThinkingLevelstring// ProviderURL / ProviderAPIKey override provider credentials for this// call only — useful for multi-tenant embedders that resolve creds// per request.ProviderURLstringProviderAPIKeystring
}
// Add a streaming-aware variant that returns the full turn:func (m*Kit) PromptResultWithOptions(ctx context.Context, msgstring, optsPromptOptions) (*TurnResult, error)
Internally, runTurn already takes the provider/tools/system as locals; the change is to thread the override through the same pre-call resolution path.
4. PromptResult() semantics in streaming mode are unclear
Location:pkg/kit/kit.go:2650-2657
// PromptResult sends a message and returns the full turn result including// usage statistics and conversation messages.func (m*Kit) PromptResult(ctx context.Context, messagestring) (*TurnResult, error) {
returnm.runTurn(ctx, message, message, []fantasy.Message{
fantasy.NewUserMessage(message),
})
}
When Options.Streaming == true (the default), the only way to observe the streamed deltas is via OnMessageUpdate / OnReasoningDelta hooks. The doc-comment doesn't specify whether PromptResult blocks until end-of-stream or returns when the first chunk arrives, and the returned TurnResult.Messages doesn't carry the deltas. Embedders therefore can't write a deterministic test that asserts "stream emitted chunks A, B, C in order, then final text was 'ABC'" without re-implementing the delta-collection in their own OnMessageUpdate handler.
Proposed fix: clarify the contract and add a captured-delta slice to TurnResult:
typeTurnResultstruct {
// … existing fields …// Stream, when streaming was enabled for this turn, contains every// delta event observed during the turn in emit order. Empty when// Options.Streaming was disabled.Stream []StreamEvent
}
typeStreamEventstruct {
KindStreamEventKind// text_delta | reasoning_start | reasoning_delta | reasoning_end | tool_call_chunkIndexint// step index this delta belongs toTextstring// for *_delta kindsReasoningstring// for reasoning_*ToolNamestring// for tool_call_chunkToolIDstringArgsstring// for tool_call_chunk (accumulating JSON)
}
typeStreamEventKindstringconst (
StreamEventTextDeltaStreamEventKind="text_delta"StreamEventReasoningStartStreamEventKind="reasoning_start"StreamEventReasoningDeltaStreamEventKind="reasoning_delta"StreamEventReasoningEndStreamEventKind="reasoning_end"StreamEventToolCallChunkStreamEventKind="tool_call_chunk"
)
…and explicitly document that PromptResult blocks until end-of-turn regardless of streaming mode.
Tier 2 — meaningfully simplify embedder code
5. ToolOutput has no way to signal "halt the agent loop and return this value"
Embedders building "structured result extraction" patterns (model is asked to call a finish(...) tool with a typed argument, the agent loop terminates, and the typed value is returned to the caller) currently smuggle the captured value out via a sync.Mutex-guarded side-channel because ToolOutput only round-trips a Content string back to the model.
Proposed fix: add an opaque carrier + a halt signal:
typeToolOutputstruct {
ContentstringIsErrorboolData []byteMediaTypestring// FinalValue, when non-nil and Halt is true, is propagated to the// turn's TurnResult.FinalValue and the agent loop terminates without// submitting Content back to the model.FinalValueany// Halt, when true, terminates the agent loop after this tool call.// The model is not given a chance to respond to the tool result.Haltbool
}
And on TurnResult:
typeTurnResultstruct {
// … existing fields …// FinalValue is set when a tool returned ToolOutput with Halt=true.// The dynamic type is whatever the tool handler put in// ToolOutput.FinalValue.FinalValueany// HaltedByTool is the name of the tool that returned Halt=true, or// empty if the turn ended for any other reason.HaltedByToolstring
}
6. No structured provider error taxonomy — wrap with sentinels
Location:grep 'var Err' pkg/kit/ returns only ErrMCPNoToken. Provider errors propagate up as fmt.Errorf("provider %q: %w", name, err) chains where err is whatever the underlying SDK returned — usually a string-formatted HTTP error.
Embedders that want to react to "context window exceeded → auto-compact" or "rate-limited → backoff" or "auth failed → re-prompt for key" must strings.Contains over the error text, which is fragile and provider-specific.
Proposed fix: export sentinels at the kit package level and have every provider wrap with them:
// pkg/kit/errors.go (new file)package kit
import"errors"// Provider-error sentinels. Providers wrap these via fmt.Errorf("%w: %s", …)// so embedders can classify via errors.Is.var (
// ErrContextOverflow indicates the request exceeded the model's// maximum context window. Embedders typically respond by compacting// and retrying.ErrContextOverflow=errors.New("context window exceeded")
// ErrRateLimit indicates the provider throttled the request.// Embedders typically respond by backing off and retrying.ErrRateLimit=errors.New("rate limited by provider")
// ErrAuth indicates a credential / authorization failure.ErrAuth=errors.New("provider authentication failed")
// ErrProviderUnavailable indicates a transient provider/upstream// failure (5xx, network error, timeout).ErrProviderUnavailable=errors.New("provider unavailable")
// ErrInvalidRequest indicates the request was structurally invalid// and retrying will not help.ErrInvalidRequest=errors.New("invalid request to provider")
)
Then audit internal/providers/* and internal/models/providers.go to wrap responses by status code:
When a new provider exposes a new bucket (audio tokens, image tokens, web-search tokens, etc.), embedders mapping kit.LLMUsage into their own typed Usage silently drop the new bucket because the field doesn't exist on fantasy.Usage.
Proposed fix: add an Extra map[string]int64 to fantasy.Usage upstream (or, if that's a separate repo, add kit.LLMUsage as a kit-owned struct that embeds fantasy.Usage plus the extra map). Then have every provider populate the extras it knows about:
typeLLMUsagestruct {
fantasy.Usage// Extra carries provider-specific token buckets that don't fit the// fixed fields on fantasy.Usage. Keys are provider-prefixed// (e.g. "anthropic.cache_creation_input_tokens",// "openai.audio_tokens") to avoid collisions.Extramap[string]int64
}
(The upstream-fantasy route is cleaner if you control that repo; the kit-side wrapper is the back-compat-safe path if you don't.)
8. NewTool[T any] requires a Go-typed T — no untyped schema+handler form
Embedders that source tools dynamically (skill markdown files, MCP server tool catalogs, user-supplied JSON-schema definitions) don't have a Go type at registration time — they have a JSON Schema + a func(ctx, map[string]any) (any, error). To register such a tool today, the embedder must implement the underlying fantasy.AgentTool interface by hand (~140 LOC of boilerplate per project: Info(), Run(ctx, LLMToolCall) (LLMToolResponse, error), ProviderOptions(), SetProviderOptions(...)).
Proposed fix: add an untyped constructor:
// NewRawTool is the schema-driven counterpart to NewTool. Use it when the// tool's input shape isn't known at compile time (e.g. tools loaded from// JSON Schema definitions in skill files or MCP server catalogs).//// schema must be a valid JSON Schema describing the tool's input object.// fn receives the decoded JSON arguments as a map and returns a ToolOutput.// Schema validation is performed before fn is invoked; invalid inputs// short-circuit with an error ToolResponse.funcNewRawTool(
name, descriptionstring,
schemamap[string]any,
fnfunc(ctx context.Context, argsmap[string]any) (ToolOutput, error),
) Tool
Internally this can share most of the implementation with NewTool[T]: decode LLMToolCall.Input (already a JSON string) into map[string]any, validate against schema, invoke fn, wrap the ToolOutput via the existing toolOutputToResponse helper.
9. CompactionEvent has no failure-path variant
Location:pkg/kit/events.go:373-385 and pkg/kit/compaction.go:270-280
// CompactionEvent fires after a successful compaction.typeCompactionEventstruct {
SummarystringOriginalTokensintCompactedTokensintMessagesRemovedintReadFiles []stringModifiedFiles []string
}
The "fires after a successful compaction" doc-comment is accurate — no event fires when compaction fails. Embedders wiring lifecycle telemetry around Kit.Compact (start timestamp, end timestamp, success/failure, error message) have to hand-roll start/end timing because OnCompaction won't fire on the error path.
Proposed fix: either add an Err field to CompactionEvent and always emit (success or failure), or add a separate CompactionFailedEvent:
// Option A — additive on CompactionEvent (preferred, since OnCompaction// already exists):typeCompactionEventstruct {
SummarystringOriginalTokensintCompactedTokensintMessagesRemovedintReadFiles []stringModifiedFiles []string// Err is non-nil when compaction failed. On the failure path,// Summary / *Tokens / *Files are zero-valued.Errerror
}
…and have compactInternal emit a CompactionEvent{Err: err} on the failure return paths.
10. SessionManager.AppendBranchSummary is missing
Location:pkg/kit/sessions.go:223-233
func (m*Kit) CollapseBranch(fromID, toID, summarystring) error {
ifm.session==nil {
returnfmt.Errorf("no session available")
}
// Note: This operation is not directly supported by SessionManager interface// as it requires AppendBranchSummary which is TreeManager-specific.// For custom SessionManagers, this would need to be implemented differently.// For now, we try to use the underlying TreeManager if available.ifadapter, ok:=m.session.(*treeManagerAdapter); ok {
_, err:=adapter.inner.AppendBranchSummary(fromID, summary)
returnerr
}
returnfmt.Errorf("CollapseBranch not supported by custom session manager")
}
Embedders implementing a custom SessionManager (e.g. backed by a relational DB or a remote store) can't support Kit.CollapseBranch because the underlying primitive isn't on the interface.
Proposed fix: promote AppendBranchSummary to SessionManager:
// pkg/kit/session.gotypeSessionManagerinterface {
// … existing methods …// AppendBranchSummary creates a summary entry that replaces the range// [fromID, leaf] on the current branch with a single summary. Returns// the new entry ID. Managers that do not track branch summaries may// return ErrBranchSummaryNotSupported.AppendBranchSummary(fromID, summarystring) (entryIDstring, errerror)
}
varErrBranchSummaryNotSupported=errors.New("session manager does not support branch summaries")
Update treeManagerAdapter (which already has the method on the inner manager) and any other shipped manager. Remove the type-assertion fallback in Kit.CollapseBranch.
Tier 3 — papercuts
11. Kit.Close() doesn't take a context
Location:pkg/kit/kit.go:2790
func (m*Kit) Close() error
Graceful shutdown with a deadline isn't possible.
Proposed fix: add CloseContext(ctx context.Context) error alongside Close(). Keep Close() as return m.CloseContext(context.Background()) for back-compat.
Package-global state (the configPath global in #1, the embedded models registry cache, the global viper if anyone still touches it) survives across tests in the same binary. With v0.79.0's per-instance viper store this is mostly cosmetic, but the leftover configPath global is exactly the kind of process-residual that a ResetForTesting() would scrub.
Proposed fix:
// pkg/kit/export_test.go OR pkg/kit/testing.go behind a build tag://go:build testingpackage kit
// ResetForTesting clears all package-global state. Intended for use in// test-binary teardown / between-test cleanup. Safe to call concurrently// with no in-flight kit.New() calls.funcResetForTesting() {
config.SetConfigPath("")
// …any other globals that get stamped during kit.New
}
13. Kit.Subagent always does a full kit.New() per spawn — add Kit.Fork
Location:pkg/kit/kit.go:1936-2060
Kit.Subagent already inherits provider config and MCP task options via inheritProviderConfig(childOpts, m.v) and inheritMCPTaskOptions(childOpts, m.opts), but the child still goes through full New(ctx, childOpts) — new config store, re-validate provider credentials, possibly re-load the models DB. For embedders that spawn many short-lived sub-tasks, this dominates per-task cost.
Proposed fix: add a lighter-weight Kit.Fork:
// ForkConfig configures a forked Kit instance for a short-lived sub-task.// Unlike Subagent, Fork reuses the parent's provider client, session// manager (if not overridden), event bus, and MCP connections — only the// system prompt and tool set are scoped to the fork.//// The returned Kit must be Close()d to release the fork-local resources.// Closing the fork does not affect the parent.typeForkConfigstruct {
SystemPromptstringTools []ToolExtraTools []Tool// Model is optional; nil/empty means "inherit from parent".Modelstring// SessionManager, if non-nil, scopes the fork to a child session.// nil means "share the parent's session" (the fork's appends go to// the parent's branch).SessionManagerSessionManager
}
func (m*Kit) Fork(ctx context.Context, cfgForkConfig) (*Kit, error)
Internally Fork constructs a *Kit that shares the parent's *viper.Viper, the parent's provider client, and the parent's event bus, only re-allocating what the fork-config actually changes.
All three loaders are path-string-based, so embedders that wrap skill discovery in an fs.FS abstraction (for embed.FS distribution, fstest.MapFS tests, or per-tenant virtual filesystems) can't use the upstream loaders directly.
Proposed fix: add an fs.FS-typed overload:
// LoadSkillsFromFS is the fs.FS-typed counterpart of LoadSkillsFromDir.// It walks fsys starting at root (which may be "." or a subdirectory),// finds *.md files, parses YAML frontmatter + markdown body, and returns// the loaded skills.funcLoadSkillsFromFS(fsys fs.FS, rootstring) ([]*Skill, error)
Implementation is mechanical — share the parsing helper with the existing path-based loader, just substitute fs.WalkDir(fsys, root, …) and fs.ReadFile(fsys, p) for filepath.WalkDir and os.ReadFile.
Motivation / Use Case
Building higher-level agent frameworks on top of pkg/kit repeatedly runs into the same set of paper cuts. The patterns are:
Tools — schema-driven tool registration (skills, MCP catalogs, user-supplied tools) requires hand-rolling the fantasy.AgentTool interface because NewTool[T] is Go-typed (fix: error calling tool #8). Structured-result patterns (model calls finish(typed_arg) and the loop terminates) need a side-channel because ToolOutput has no FinalValue/Halt (feat: open external $EDITOR for composing long prompts via ctrl+x e chord #5).
None of these individually is a blocker, but collectively they push 800–1200 LOC of adapter/workaround code into every embedder, and three of them (#1, #2, #4) force test architectures that are fundamentally slower and less deterministic than they need to be.
Proposed Implementation
See the per-gap "Proposed fix" sections above. All 14 items are designed to be additive — no existing exported signature changes. The interface extensions (#10) ship with default implementations on all shipped managers (treeManagerAdapter already has it; in-memory / no-op managers return ErrBranchSummaryNotSupported). The Options and PromptOptions extensions (#2, #3) are new fields on existing structs (zero-valued = current behavior).
A coding agent implementing this in a single PR would tackle the items in roughly this order:
Tests for each item are colocated: extend pkg/kit/*_test.go with a focused subtest per gap. The data-race for #1 is testable via kit_test.go running two New() calls under t.Parallel() (with -race). The Provider injection (#2) lets every other test in the binary stop spinning up httptest.Servers, which should be a measurable wall-clock improvement on go test ./pkg/kit/....
Alternatives Considered
I've considered workarounds or alternative approaches
For each Tier-1 item, embedder-side workarounds exist and are in production use, but they're either fragile (string-matching error messages), slow (HTTP loopback for every test turn), or correctness-compromising (process-global races mitigated only by SkipConfig: true, which bypasses legitimate .kit.yml loading). The Tier-2/3 items are pure boilerplate-reduction.
Checklist
I've searched existing issues and this hasn't been requested yet
This feature aligns with Kit's design philosophy (TUI-first, extension-based) — every item makes Kit more usable as an embeddable SDK without altering the TUI/extension surface.
Feature Description
This is a coordinated Embedder SDK Hardening request — a set of 14 small, individually-justified additions and fixes that together unblock building higher-level agent frameworks on top of
pkg/kit. Each item is scoped, has a proposed API, and can be implemented additively without breaking the existing public surface. Together they're sized to land as a single PR (most items are <50 LOC; a handful are interface extensions that need default-implementation shims).The catalogue below is grouped into three tiers by impact. Tier 1 items are currently forcing embedders to either drop functionality or maintain non-trivial workarounds; Tier 2 items are simplifications that remove ~100–300 LOC of embedder-side boilerplate apiece; Tier 3 items are papercuts but cheap.
Tier 1 — currently forces workarounds or feature drops
1.
configPathpackage-global is written without a mutexLocation:
internal/config/config.go:584Call site:
pkg/kit/config.go:175(loadConfigWithEnvSubstitution→config.SetConfigPath), reached frominitConfig→kit.Newwhenever a.kit.ymlis discovered in$PWDor$HOME.Repro: two concurrent
kit.New()calls in a process that has a.kit.ymlin$HOMErace on this write.go test -racecatches it immediately:The
viper.New()per-instance fix from #40 isolated the viper store but left this residual global behind because the v0.79.0 isolation tests setSkipConfig: true, never exercising this path.Proposed fix: move
configPathonto a per-*viper.Viper-keyed sync.Map (or, simpler, just guard the global with async.Mutex):…and apply the same
RLockto every existing read site ofconfigPath.2. No pluggable
Providerinjection point — tests must run an HTTP loopbackLocation:
pkg/kit/kit.go:812—models.CreateProvideris the only path to construct a model backend, and it always spins up a real HTTP client.grep -rn 'WithProvider\b\|InProcessProvider\|FakeProvider\|TestProvider' pkg/kit/returns nothing. There's no seam for an embedder to inject a synchronous in-process responder.Cost: every embedder writing tests against
kit.Kitstands up anhttptest.Serverthat speaks the OpenAI/Anthropic protocol they're targeting. That's 200–400 LOC of harness per project, every test pays a TCP round-trip + JSON encode/decode, and there's no structural guarantee that aKIT_NO_NETWORK=1-style flag is honoured.Proposed fix: add a
Provideroption:Then in
kit.gowheremodels.CreateProvideris called, branch onm.opts.Provider != niland wrap it in an adapter that satisfies the samefantasy.LanguageModelinterface the rest of the loop uses.3.
PromptOptionsonly carriesSystemMessage— no per-call model/tool/thinking overridesLocation:
pkg/kit/kit.go:2626-2649Any embedder that wants
session.Prompt(ctx, text, WithModel(...), WithTools(...), WithThinking(...))semantics must rebuild a fresh*Kitper call — the wholekit.New()→viper.New()→ provider validation →Close()cycle, on every prompt. For test-driven harnesses where the actual model call is cheap (in-process fake), this setup-teardown dominates total runtime by 10–100×.Proposed fix: extend
PromptOptionswith the override fields that already exist asOptionsconstruction-time settings, and haverunTurnapply them as scoped overrides:Internally,
runTurnalready takes the provider/tools/system as locals; the change is to thread the override through the same pre-call resolution path.4.
PromptResult()semantics in streaming mode are unclearLocation:
pkg/kit/kit.go:2650-2657When
Options.Streaming == true(the default), the only way to observe the streamed deltas is viaOnMessageUpdate/OnReasoningDeltahooks. The doc-comment doesn't specify whetherPromptResultblocks until end-of-stream or returns when the first chunk arrives, and the returnedTurnResult.Messagesdoesn't carry the deltas. Embedders therefore can't write a deterministic test that asserts "stream emitted chunks A, B, C in order, then final text was 'ABC'" without re-implementing the delta-collection in their ownOnMessageUpdatehandler.Proposed fix: clarify the contract and add a captured-delta slice to
TurnResult:…and explicitly document that
PromptResultblocks until end-of-turn regardless of streaming mode.Tier 2 — meaningfully simplify embedder code
5.
ToolOutputhas no way to signal "halt the agent loop and return this value"Location:
pkg/kit/tools.go:24-44Embedders building "structured result extraction" patterns (model is asked to call a
finish(...)tool with a typed argument, the agent loop terminates, and the typed value is returned to the caller) currently smuggle the captured value out via async.Mutex-guarded side-channel becauseToolOutputonly round-trips aContentstring back to the model.Proposed fix: add an opaque carrier + a halt signal:
And on
TurnResult:6. No structured provider error taxonomy — wrap with sentinels
Location:
grep 'var Err' pkg/kit/returns onlyErrMCPNoToken. Provider errors propagate up asfmt.Errorf("provider %q: %w", name, err)chains whereerris whatever the underlying SDK returned — usually a string-formatted HTTP error.Embedders that want to react to "context window exceeded → auto-compact" or "rate-limited → backoff" or "auth failed → re-prompt for key" must
strings.Containsover the error text, which is fragile and provider-specific.Proposed fix: export sentinels at the kit package level and have every provider wrap with them:
Then audit
internal/providers/*andinternal/models/providers.goto wrap responses by status code:Once landed, embedders do
errors.Is(err, kit.ErrContextOverflow)and lose the string-matching.7.
fantasy.Usagehas fixed buckets — noExtrafor provider-specific tokensLocation:
pkg/kit/types.go:276—kit.LLMUsage = fantasy.Usage(type alias).When a new provider exposes a new bucket (audio tokens, image tokens, web-search tokens, etc.), embedders mapping
kit.LLMUsageinto their own typedUsagesilently drop the new bucket because the field doesn't exist onfantasy.Usage.Proposed fix: add an
Extra map[string]int64tofantasy.Usageupstream (or, if that's a separate repo, addkit.LLMUsageas a kit-owned struct that embedsfantasy.Usageplus the extra map). Then have every provider populate the extras it knows about:(The upstream-
fantasyroute is cleaner if you control that repo; the kit-side wrapper is the back-compat-safe path if you don't.)8.
NewTool[T any]requires a Go-typedT— no untyped schema+handler formLocation:
pkg/kit/tools.go:139Embedders that source tools dynamically (skill markdown files, MCP server tool catalogs, user-supplied JSON-schema definitions) don't have a Go type at registration time — they have a JSON Schema + a
func(ctx, map[string]any) (any, error). To register such a tool today, the embedder must implement the underlyingfantasy.AgentToolinterface by hand (~140 LOC of boilerplate per project:Info(),Run(ctx, LLMToolCall) (LLMToolResponse, error),ProviderOptions(),SetProviderOptions(...)).Proposed fix: add an untyped constructor:
Internally this can share most of the implementation with
NewTool[T]: decodeLLMToolCall.Input(already a JSON string) intomap[string]any, validate againstschema, invokefn, wrap theToolOutputvia the existingtoolOutputToResponsehelper.9.
CompactionEventhas no failure-path variantLocation:
pkg/kit/events.go:373-385andpkg/kit/compaction.go:270-280The "fires after a successful compaction" doc-comment is accurate — no event fires when compaction fails. Embedders wiring lifecycle telemetry around
Kit.Compact(start timestamp, end timestamp, success/failure, error message) have to hand-roll start/end timing becauseOnCompactionwon't fire on the error path.Proposed fix: either add an
Errfield toCompactionEventand always emit (success or failure), or add a separateCompactionFailedEvent:…and have
compactInternalemit aCompactionEvent{Err: err}on the failure return paths.10.
SessionManager.AppendBranchSummaryis missingLocation:
pkg/kit/sessions.go:223-233Embedders implementing a custom
SessionManager(e.g. backed by a relational DB or a remote store) can't supportKit.CollapseBranchbecause the underlying primitive isn't on the interface.Proposed fix: promote
AppendBranchSummarytoSessionManager:Update
treeManagerAdapter(which already has the method on the inner manager) and any other shipped manager. Remove the type-assertion fallback inKit.CollapseBranch.Tier 3 — papercuts
11.
Kit.Close()doesn't take a contextLocation:
pkg/kit/kit.go:2790Graceful shutdown with a deadline isn't possible.
Proposed fix: add
CloseContext(ctx context.Context) erroralongsideClose(). KeepClose()asreturn m.CloseContext(context.Background())for back-compat.12. No
kit.ResetForTesting()under a build tagLocation:
grep -rn 'ResetForTesting' pkg/kit/returns nothing.Package-global state (the
configPathglobal in #1, the embedded models registry cache, the global viper if anyone still touches it) survives across tests in the same binary. With v0.79.0's per-instance viper store this is mostly cosmetic, but the leftoverconfigPathglobal is exactly the kind of process-residual that aResetForTesting()would scrub.Proposed fix:
13.
Kit.Subagentalways does a fullkit.New()per spawn — addKit.ForkLocation:
pkg/kit/kit.go:1936-2060Kit.Subagentalready inherits provider config and MCP task options viainheritProviderConfig(childOpts, m.v)andinheritMCPTaskOptions(childOpts, m.opts), but the child still goes through fullNew(ctx, childOpts)— new config store, re-validate provider credentials, possibly re-load the models DB. For embedders that spawn many short-lived sub-tasks, this dominates per-task cost.Proposed fix: add a lighter-weight
Kit.Fork:Internally
Forkconstructs a*Kitthat shares the parent's*viper.Viper, the parent's provider client, and the parent's event bus, only re-allocating what the fork-config actually changes.14.
LoadSkillsFromFS(fs.FS)overload missingLocation:
pkg/kit/skills.go:26-46All three loaders are path-string-based, so embedders that wrap skill discovery in an
fs.FSabstraction (forembed.FSdistribution,fstest.MapFStests, or per-tenant virtual filesystems) can't use the upstream loaders directly.Proposed fix: add an
fs.FS-typed overload:Implementation is mechanical — share the parsing helper with the existing path-based loader, just substitute
fs.WalkDir(fsys, root, …)andfs.ReadFile(fsys, p)forfilepath.WalkDirandos.ReadFile.Motivation / Use Case
Building higher-level agent frameworks on top of
pkg/kitrepeatedly runs into the same set of paper cuts. The patterns are:httptest.Serverbecause there's noProviderinjection seam (fix: ToolRenderConfig BorderColor and Background fields are ignored #2). Per-test setup time is dominated bykit.New()rebuild because there's no per-call override path (AddMCPServer() creates MCPToolManager without inheriting OAuth AuthHandler and TokenStoreFactory #3). Streaming-mode tests can't deterministically assert delta order (OAuth fails for remote MCP servers without dynamic client registration (e.g. GitHub) — no way to provide ClientID #4). TheconfigPathglobal makes concurrent tests data-race (Add support for overriding model registry and capabilities #1) —SkipConfig: trueis a workaround but doesn't help anyone who legitimately wants.kit.ymlloading.fantasy.AgentToolinterface becauseNewTool[T]is Go-typed (fix: error calling tool #8). Structured-result patterns (model callsfinish(typed_arg)and the loop terminates) need a side-channel becauseToolOutputhas noFinalValue/Halt(feat: open external $EDITOR for composing long prompts via ctrl+x e chord #5).strings.Containsover wrapped messages (feat: highlight @file tokens in input with accent color #6). Failed compactions emit no event, so symmetric lifecycle telemetry has to hand-roll timing (OpenAI login does not become the default provider, and starting with --model openai/... fails until model is reselected in /model #9). Usage maps silently drop new provider buckets (feat(sdk): expose generation and provider params on Options #7).CollapseBranchdoesn't work with custom session managers becauseAppendBranchSummaryisn't on the interface (Shift+Enter does not insert a newline even though the UI hint says it should #10).fs.FS-typed (Text selection in the TUI is awkward and makes copying partial output frustrating #14).Subagentstill pays fullkit.New()cost per spawn (Ctrl+C exits the app instead of clearing the current input #13).Close()can't be bounded by a context (/thinking incorrectly reports that GPT-5.4 does not support thinking/reasoning #11); noResetForTesting()helper (TUI feels visually noisy: user label, large prompt placeholder, and input help take too much space #12).None of these individually is a blocker, but collectively they push 800–1200 LOC of adapter/workaround code into every embedder, and three of them (#1, #2, #4) force test architectures that are fundamentally slower and less deterministic than they need to be.
Proposed Implementation
See the per-gap "Proposed fix" sections above. All 14 items are designed to be additive — no existing exported signature changes. The interface extensions (#10) ship with default implementations on all shipped managers (
treeManagerAdapteralready has it; in-memory / no-op managers returnErrBranchSummaryNotSupported). TheOptionsandPromptOptionsextensions (#2, #3) are new fields on existing structs (zero-valued = current behavior).A coding agent implementing this in a single PR would tackle the items in roughly this order:
internal/providers/*.runTurn.Stream []StreamEventtoTurnResult, populates it in the streaming code path, clarifies the doc-comment.Kit.Fork(Ctrl+C exits the app instead of clearing the current input #13) — new method; shares parent's viper/provider/event-bus references.Tests for each item are colocated: extend
pkg/kit/*_test.gowith a focused subtest per gap. The data-race for #1 is testable viakit_test.gorunning twoNew()calls undert.Parallel()(with-race). The Provider injection (#2) lets every other test in the binary stop spinning uphttptest.Servers, which should be a measurable wall-clock improvement ongo test ./pkg/kit/....Alternatives Considered
For each Tier-1 item, embedder-side workarounds exist and are in production use, but they're either fragile (string-matching error messages), slow (HTTP loopback for every test turn), or correctness-compromising (process-global races mitigated only by
SkipConfig: true, which bypasses legitimate.kit.ymlloading). The Tier-2/3 items are pure boilerplate-reduction.Checklist