Skip to content

feat: embedder SDK hardening — 14 scoped additions for higher-level framework authors #68

@ezynda3

Description

@ezynda3

Feature Description

This is a coordinated Embedder SDK Hardening request — a set of 14 small, individually-justified additions and fixes that together unblock building higher-level agent frameworks on top of pkg/kit. Each item is scoped, has a proposed API, and can be implemented additively without breaking the existing public surface. Together they're sized to land as a single PR (most items are <50 LOC; a handful are interface extensions that need default-implementation shims).

The catalogue below is grouped into three tiers by impact. Tier 1 items are currently forcing embedders to either drop functionality or maintain non-trivial workarounds; Tier 2 items are simplifications that remove ~100–300 LOC of embedder-side boilerplate apiece; Tier 3 items are papercuts but cheap.

One-PR feasibility: items #1, #5, #7, #9, #10, #11, #12, #14 are pure additive (new fields / new exported symbols / interface methods with default no-op implementations on shipped managers). #2, #3, #4, #6, #8, #13 add new types/functions alongside the existing surface. Nothing requires breaking an existing signature.


Tier 1 — currently forces workarounds or feature drops

1. configPath package-global is written without a mutex

Location: internal/config/config.go:584

var configPath string

// SetConfigPath sets the configuration file path for resolving relative paths
func SetConfigPath(path string) {
    configPath = path
}

Call site: pkg/kit/config.go:175 (loadConfigWithEnvSubstitutionconfig.SetConfigPath), reached from initConfigkit.New whenever a .kit.yml is discovered in $PWD or $HOME.

Repro: two concurrent kit.New() calls in a process that has a .kit.yml in $HOME race on this write. go test -race catches it immediately:

WARNING: DATA RACE
Write at 0x000000... by goroutine N:
  github.com/mark3labs/kit/internal/config.SetConfigPath
      .../internal/config/config.go:590
  github.com/mark3labs/kit/pkg/kit.loadConfigWithEnvSubstitution
      .../pkg/kit/config.go:175
  github.com/mark3labs/kit/pkg/kit.(*Kit).initConfig
  github.com/mark3labs/kit/pkg/kit.New

The viper.New() per-instance fix from #40 isolated the viper store but left this residual global behind because the v0.79.0 isolation tests set SkipConfig: true, never exercising this path.

Proposed fix: move configPath onto a per-*viper.Viper-keyed sync.Map (or, simpler, just guard the global with a sync.Mutex):

// internal/config/config.go
var (
    configPathMu sync.RWMutex
    configPath   string
)

func SetConfigPath(path string) {
    configPathMu.Lock()
    defer configPathMu.Unlock()
    configPath = path
}

func GetConfigPath() string {
    configPathMu.RLock()
    defer configPathMu.RUnlock()
    return configPath
}

…and apply the same RLock to every existing read site of configPath.


2. No pluggable Provider injection point — tests must run an HTTP loopback

Location: pkg/kit/kit.go:812models.CreateProvider is the only path to construct a model backend, and it always spins up a real HTTP client.

providerResult, err := models.CreateProvider(ctx, config)

grep -rn 'WithProvider\b\|InProcessProvider\|FakeProvider\|TestProvider' pkg/kit/ returns nothing. There's no seam for an embedder to inject a synchronous in-process responder.

Cost: every embedder writing tests against kit.Kit stands up an httptest.Server that speaks the OpenAI/Anthropic protocol they're targeting. That's 200–400 LOC of harness per project, every test pays a TCP round-trip + JSON encode/decode, and there's no structural guarantee that a KIT_NO_NETWORK=1-style flag is honoured.

Proposed fix: add a Provider option:

// pkg/kit/options.go

// Provider is a synchronous in-process replacement for the HTTP-backed
// providers that kit constructs from a model string. When non-nil, kit
// skips models.CreateProvider entirely and routes every turn through fn.
//
// Provider takes precedence over Model resolution. Intended primarily for
// tests and offline harnesses; production embedders should prefer the
// model-string path.
type ProviderFn func(ctx context.Context, req ProviderRequest) (ProviderResponse, error)

type ProviderRequest struct {
    Messages    []LLMMessage
    Tools       []Tool
    System      string
    MaxTokens   int
    Streaming   bool
    Temperature *float64
    // …whatever the existing internal provider request carries
}

type ProviderResponse struct {
    Messages   []LLMMessage // assistant + tool messages in order
    StopReason string
    Usage      *LLMUsage
    // Optional: streamed deltas, if Streaming was true and the caller
    // wants to assert delta order.
    Deltas []StreamDelta
}

// In Options:
type Options struct {
    // … existing fields …

    // Provider, when non-nil, replaces the model-string-driven provider
    // construction. The Model field is ignored.
    Provider ProviderFn
}

Then in kit.go where models.CreateProvider is called, branch on m.opts.Provider != nil and wrap it in an adapter that satisfies the same fantasy.LanguageModel interface the rest of the loop uses.


3. PromptOptions only carries SystemMessage — no per-call model/tool/thinking overrides

Location: pkg/kit/kit.go:2626-2649

type PromptOptions struct {
    // SystemMessage is prepended as a system message before the user prompt.
    SystemMessage string
}

func (m *Kit) PromptWithOptions(ctx context.Context, msg string, opts PromptOptions) (string, error) {
    // …
}

Any embedder that wants session.Prompt(ctx, text, WithModel(...), WithTools(...), WithThinking(...)) semantics must rebuild a fresh *Kit per call — the whole kit.New()viper.New() → provider validation → Close() cycle, on every prompt. For test-driven harnesses where the actual model call is cheap (in-process fake), this setup-teardown dominates total runtime by 10–100×.

Proposed fix: extend PromptOptions with the override fields that already exist as Options construction-time settings, and have runTurn apply them as scoped overrides:

type PromptOptions struct {
    SystemMessage string

    // Model overrides the agent's configured model for this call only.
    // Empty string means "use the agent's default".
    Model string

    // Tools, when non-nil, replaces the agent's core tool set for this
    // call. Use ExtraTools to add tools without replacing the core set.
    Tools []Tool

    // ExtraTools are added to the effective tool set for this call only.
    ExtraTools []Tool

    // ThinkingLevel overrides the agent's reasoning level for this call.
    // Empty string means "use the agent's default".
    ThinkingLevel string

    // ProviderURL / ProviderAPIKey override provider credentials for this
    // call only — useful for multi-tenant embedders that resolve creds
    // per request.
    ProviderURL    string
    ProviderAPIKey string
}

// Add a streaming-aware variant that returns the full turn:
func (m *Kit) PromptResultWithOptions(ctx context.Context, msg string, opts PromptOptions) (*TurnResult, error)

Internally, runTurn already takes the provider/tools/system as locals; the change is to thread the override through the same pre-call resolution path.


4. PromptResult() semantics in streaming mode are unclear

Location: pkg/kit/kit.go:2650-2657

// PromptResult sends a message and returns the full turn result including
// usage statistics and conversation messages.
func (m *Kit) PromptResult(ctx context.Context, message string) (*TurnResult, error) {
    return m.runTurn(ctx, message, message, []fantasy.Message{
        fantasy.NewUserMessage(message),
    })
}

When Options.Streaming == true (the default), the only way to observe the streamed deltas is via OnMessageUpdate / OnReasoningDelta hooks. The doc-comment doesn't specify whether PromptResult blocks until end-of-stream or returns when the first chunk arrives, and the returned TurnResult.Messages doesn't carry the deltas. Embedders therefore can't write a deterministic test that asserts "stream emitted chunks A, B, C in order, then final text was 'ABC'" without re-implementing the delta-collection in their own OnMessageUpdate handler.

Proposed fix: clarify the contract and add a captured-delta slice to TurnResult:

type TurnResult struct {
    // … existing fields …

    // Stream, when streaming was enabled for this turn, contains every
    // delta event observed during the turn in emit order. Empty when
    // Options.Streaming was disabled.
    Stream []StreamEvent
}

type StreamEvent struct {
    Kind     StreamEventKind // text_delta | reasoning_start | reasoning_delta | reasoning_end | tool_call_chunk
    Index    int             // step index this delta belongs to
    Text     string          // for *_delta kinds
    Reasoning string         // for reasoning_*
    ToolName  string         // for tool_call_chunk
    ToolID    string
    Args      string         // for tool_call_chunk (accumulating JSON)
}

type StreamEventKind string

const (
    StreamEventTextDelta      StreamEventKind = "text_delta"
    StreamEventReasoningStart StreamEventKind = "reasoning_start"
    StreamEventReasoningDelta StreamEventKind = "reasoning_delta"
    StreamEventReasoningEnd   StreamEventKind = "reasoning_end"
    StreamEventToolCallChunk  StreamEventKind = "tool_call_chunk"
)

…and explicitly document that PromptResult blocks until end-of-turn regardless of streaming mode.


Tier 2 — meaningfully simplify embedder code

5. ToolOutput has no way to signal "halt the agent loop and return this value"

Location: pkg/kit/tools.go:24-44

type ToolOutput struct {
    Content   string
    IsError   bool
    Data      []byte
    MediaType string
}

Embedders building "structured result extraction" patterns (model is asked to call a finish(...) tool with a typed argument, the agent loop terminates, and the typed value is returned to the caller) currently smuggle the captured value out via a sync.Mutex-guarded side-channel because ToolOutput only round-trips a Content string back to the model.

Proposed fix: add an opaque carrier + a halt signal:

type ToolOutput struct {
    Content   string
    IsError   bool
    Data      []byte
    MediaType string

    // FinalValue, when non-nil and Halt is true, is propagated to the
    // turn's TurnResult.FinalValue and the agent loop terminates without
    // submitting Content back to the model.
    FinalValue any

    // Halt, when true, terminates the agent loop after this tool call.
    // The model is not given a chance to respond to the tool result.
    Halt bool
}

And on TurnResult:

type TurnResult struct {
    // … existing fields …

    // FinalValue is set when a tool returned ToolOutput with Halt=true.
    // The dynamic type is whatever the tool handler put in
    // ToolOutput.FinalValue.
    FinalValue any

    // HaltedByTool is the name of the tool that returned Halt=true, or
    // empty if the turn ended for any other reason.
    HaltedByTool string
}

6. No structured provider error taxonomy — wrap with sentinels

Location: grep 'var Err' pkg/kit/ returns only ErrMCPNoToken. Provider errors propagate up as fmt.Errorf("provider %q: %w", name, err) chains where err is whatever the underlying SDK returned — usually a string-formatted HTTP error.

Embedders that want to react to "context window exceeded → auto-compact" or "rate-limited → backoff" or "auth failed → re-prompt for key" must strings.Contains over the error text, which is fragile and provider-specific.

Proposed fix: export sentinels at the kit package level and have every provider wrap with them:

// pkg/kit/errors.go (new file)
package kit

import "errors"

// Provider-error sentinels. Providers wrap these via fmt.Errorf("%w: %s", …)
// so embedders can classify via errors.Is.
var (
    // ErrContextOverflow indicates the request exceeded the model's
    // maximum context window. Embedders typically respond by compacting
    // and retrying.
    ErrContextOverflow = errors.New("context window exceeded")

    // ErrRateLimit indicates the provider throttled the request.
    // Embedders typically respond by backing off and retrying.
    ErrRateLimit = errors.New("rate limited by provider")

    // ErrAuth indicates a credential / authorization failure.
    ErrAuth = errors.New("provider authentication failed")

    // ErrProviderUnavailable indicates a transient provider/upstream
    // failure (5xx, network error, timeout).
    ErrProviderUnavailable = errors.New("provider unavailable")

    // ErrInvalidRequest indicates the request was structurally invalid
    // and retrying will not help.
    ErrInvalidRequest = errors.New("invalid request to provider")
)

Then audit internal/providers/* and internal/models/providers.go to wrap responses by status code:

// pseudo-code in the openai/anthropic provider wrappers
switch resp.StatusCode {
case 401, 403:
    return fmt.Errorf("%w: %s", kit.ErrAuth, body)
case 429:
    return fmt.Errorf("%w: %s", kit.ErrRateLimit, body)
case 400:
    if strings.Contains(body, "context_length_exceeded") {
        return fmt.Errorf("%w: %s", kit.ErrContextOverflow, body)
    }
    return fmt.Errorf("%w: %s", kit.ErrInvalidRequest, body)
case 500, 502, 503, 504:
    return fmt.Errorf("%w: %s", kit.ErrProviderUnavailable, body)
}

Once landed, embedders do errors.Is(err, kit.ErrContextOverflow) and lose the string-matching.


7. fantasy.Usage has fixed buckets — no Extra for provider-specific tokens

Location: pkg/kit/types.go:276kit.LLMUsage = fantasy.Usage (type alias).

// charm.land/fantasy@v0.32.0/model.go
type Usage struct {
    InputTokens         int64
    OutputTokens        int64
    TotalTokens         int64
    ReasoningTokens     int64
    CacheCreationTokens int64
    CacheReadTokens     int64
}

When a new provider exposes a new bucket (audio tokens, image tokens, web-search tokens, etc.), embedders mapping kit.LLMUsage into their own typed Usage silently drop the new bucket because the field doesn't exist on fantasy.Usage.

Proposed fix: add an Extra map[string]int64 to fantasy.Usage upstream (or, if that's a separate repo, add kit.LLMUsage as a kit-owned struct that embeds fantasy.Usage plus the extra map). Then have every provider populate the extras it knows about:

type LLMUsage struct {
    fantasy.Usage
    // Extra carries provider-specific token buckets that don't fit the
    // fixed fields on fantasy.Usage. Keys are provider-prefixed
    // (e.g. "anthropic.cache_creation_input_tokens",
    // "openai.audio_tokens") to avoid collisions.
    Extra map[string]int64
}

(The upstream-fantasy route is cleaner if you control that repo; the kit-side wrapper is the back-compat-safe path if you don't.)


8. NewTool[T any] requires a Go-typed T — no untyped schema+handler form

Location: pkg/kit/tools.go:139

func NewTool[TInput any](name, description string, fn func(ctx context.Context, input TInput) (ToolOutput, error)) Tool {
    // …
}

Embedders that source tools dynamically (skill markdown files, MCP server tool catalogs, user-supplied JSON-schema definitions) don't have a Go type at registration time — they have a JSON Schema + a func(ctx, map[string]any) (any, error). To register such a tool today, the embedder must implement the underlying fantasy.AgentTool interface by hand (~140 LOC of boilerplate per project: Info(), Run(ctx, LLMToolCall) (LLMToolResponse, error), ProviderOptions(), SetProviderOptions(...)).

Proposed fix: add an untyped constructor:

// NewRawTool is the schema-driven counterpart to NewTool. Use it when the
// tool's input shape isn't known at compile time (e.g. tools loaded from
// JSON Schema definitions in skill files or MCP server catalogs).
//
// schema must be a valid JSON Schema describing the tool's input object.
// fn receives the decoded JSON arguments as a map and returns a ToolOutput.
// Schema validation is performed before fn is invoked; invalid inputs
// short-circuit with an error ToolResponse.
func NewRawTool(
    name, description string,
    schema map[string]any,
    fn func(ctx context.Context, args map[string]any) (ToolOutput, error),
) Tool

Internally this can share most of the implementation with NewTool[T]: decode LLMToolCall.Input (already a JSON string) into map[string]any, validate against schema, invoke fn, wrap the ToolOutput via the existing toolOutputToResponse helper.


9. CompactionEvent has no failure-path variant

Location: pkg/kit/events.go:373-385 and pkg/kit/compaction.go:270-280

// CompactionEvent fires after a successful compaction.
type CompactionEvent struct {
    Summary         string
    OriginalTokens  int
    CompactedTokens int
    MessagesRemoved int
    ReadFiles       []string
    ModifiedFiles   []string
}

The "fires after a successful compaction" doc-comment is accurate — no event fires when compaction fails. Embedders wiring lifecycle telemetry around Kit.Compact (start timestamp, end timestamp, success/failure, error message) have to hand-roll start/end timing because OnCompaction won't fire on the error path.

Proposed fix: either add an Err field to CompactionEvent and always emit (success or failure), or add a separate CompactionFailedEvent:

// Option A — additive on CompactionEvent (preferred, since OnCompaction
// already exists):
type CompactionEvent struct {
    Summary         string
    OriginalTokens  int
    CompactedTokens int
    MessagesRemoved int
    ReadFiles       []string
    ModifiedFiles   []string

    // Err is non-nil when compaction failed. On the failure path,
    // Summary / *Tokens / *Files are zero-valued.
    Err error
}

…and have compactInternal emit a CompactionEvent{Err: err} on the failure return paths.


10. SessionManager.AppendBranchSummary is missing

Location: pkg/kit/sessions.go:223-233

func (m *Kit) CollapseBranch(fromID, toID, summary string) error {
    if m.session == nil {
        return fmt.Errorf("no session available")
    }
    // Note: This operation is not directly supported by SessionManager interface
    // as it requires AppendBranchSummary which is TreeManager-specific.
    // For custom SessionManagers, this would need to be implemented differently.
    // For now, we try to use the underlying TreeManager if available.
    if adapter, ok := m.session.(*treeManagerAdapter); ok {
        _, err := adapter.inner.AppendBranchSummary(fromID, summary)
        return err
    }
    return fmt.Errorf("CollapseBranch not supported by custom session manager")
}

Embedders implementing a custom SessionManager (e.g. backed by a relational DB or a remote store) can't support Kit.CollapseBranch because the underlying primitive isn't on the interface.

Proposed fix: promote AppendBranchSummary to SessionManager:

// pkg/kit/session.go
type SessionManager interface {
    // … existing methods …

    // AppendBranchSummary creates a summary entry that replaces the range
    // [fromID, leaf] on the current branch with a single summary. Returns
    // the new entry ID. Managers that do not track branch summaries may
    // return ErrBranchSummaryNotSupported.
    AppendBranchSummary(fromID, summary string) (entryID string, err error)
}

var ErrBranchSummaryNotSupported = errors.New("session manager does not support branch summaries")

Update treeManagerAdapter (which already has the method on the inner manager) and any other shipped manager. Remove the type-assertion fallback in Kit.CollapseBranch.


Tier 3 — papercuts

11. Kit.Close() doesn't take a context

Location: pkg/kit/kit.go:2790

func (m *Kit) Close() error

Graceful shutdown with a deadline isn't possible.

Proposed fix: add CloseContext(ctx context.Context) error alongside Close(). Keep Close() as return m.CloseContext(context.Background()) for back-compat.


12. No kit.ResetForTesting() under a build tag

Location: grep -rn 'ResetForTesting' pkg/kit/ returns nothing.

Package-global state (the configPath global in #1, the embedded models registry cache, the global viper if anyone still touches it) survives across tests in the same binary. With v0.79.0's per-instance viper store this is mostly cosmetic, but the leftover configPath global is exactly the kind of process-residual that a ResetForTesting() would scrub.

Proposed fix:

// pkg/kit/export_test.go OR pkg/kit/testing.go behind a build tag:
//go:build testing

package kit

// ResetForTesting clears all package-global state. Intended for use in
// test-binary teardown / between-test cleanup. Safe to call concurrently
// with no in-flight kit.New() calls.
func ResetForTesting() {
    config.SetConfigPath("")
    // …any other globals that get stamped during kit.New
}

13. Kit.Subagent always does a full kit.New() per spawn — add Kit.Fork

Location: pkg/kit/kit.go:1936-2060

Kit.Subagent already inherits provider config and MCP task options via inheritProviderConfig(childOpts, m.v) and inheritMCPTaskOptions(childOpts, m.opts), but the child still goes through full New(ctx, childOpts) — new config store, re-validate provider credentials, possibly re-load the models DB. For embedders that spawn many short-lived sub-tasks, this dominates per-task cost.

Proposed fix: add a lighter-weight Kit.Fork:

// ForkConfig configures a forked Kit instance for a short-lived sub-task.
// Unlike Subagent, Fork reuses the parent's provider client, session
// manager (if not overridden), event bus, and MCP connections — only the
// system prompt and tool set are scoped to the fork.
//
// The returned Kit must be Close()d to release the fork-local resources.
// Closing the fork does not affect the parent.
type ForkConfig struct {
    SystemPrompt string
    Tools        []Tool
    ExtraTools   []Tool
    // Model is optional; nil/empty means "inherit from parent".
    Model string
    // SessionManager, if non-nil, scopes the fork to a child session.
    // nil means "share the parent's session" (the fork's appends go to
    // the parent's branch).
    SessionManager SessionManager
}

func (m *Kit) Fork(ctx context.Context, cfg ForkConfig) (*Kit, error)

Internally Fork constructs a *Kit that shares the parent's *viper.Viper, the parent's provider client, and the parent's event bus, only re-allocating what the fork-config actually changes.


14. LoadSkillsFromFS(fs.FS) overload missing

Location: pkg/kit/skills.go:26-46

func LoadSkill(path string) (*Skill, error)
func LoadSkillsFromDir(dir string) ([]*Skill, error)
func LoadSkills(cwd string) ([]*Skill, error)

All three loaders are path-string-based, so embedders that wrap skill discovery in an fs.FS abstraction (for embed.FS distribution, fstest.MapFS tests, or per-tenant virtual filesystems) can't use the upstream loaders directly.

Proposed fix: add an fs.FS-typed overload:

// LoadSkillsFromFS is the fs.FS-typed counterpart of LoadSkillsFromDir.
// It walks fsys starting at root (which may be "." or a subdirectory),
// finds *.md files, parses YAML frontmatter + markdown body, and returns
// the loaded skills.
func LoadSkillsFromFS(fsys fs.FS, root string) ([]*Skill, error)

Implementation is mechanical — share the parsing helper with the existing path-based loader, just substitute fs.WalkDir(fsys, root, …) and fs.ReadFile(fsys, p) for filepath.WalkDir and os.ReadFile.


Motivation / Use Case

Building higher-level agent frameworks on top of pkg/kit repeatedly runs into the same set of paper cuts. The patterns are:

  1. Tests — every embedder writes a 200-400 LOC OpenAI-compatible httptest.Server because there's no Provider injection seam (fix: ToolRenderConfig BorderColor and Background fields are ignored #2). Per-test setup time is dominated by kit.New() rebuild because there's no per-call override path (AddMCPServer() creates MCPToolManager without inheriting OAuth AuthHandler and TokenStoreFactory #3). Streaming-mode tests can't deterministically assert delta order (OAuth fails for remote MCP servers without dynamic client registration (e.g. GitHub) — no way to provide ClientID #4). The configPath global makes concurrent tests data-race (Add support for overriding model registry and capabilities #1) — SkipConfig: true is a workaround but doesn't help anyone who legitimately wants .kit.yml loading.
  2. Tools — schema-driven tool registration (skills, MCP catalogs, user-supplied tools) requires hand-rolling the fantasy.AgentTool interface because NewTool[T] is Go-typed (fix: error calling tool #8). Structured-result patterns (model calls finish(typed_arg) and the loop terminates) need a side-channel because ToolOutput has no FinalValue/Halt (feat: open external $EDITOR for composing long prompts via ctrl+x e chord #5).
  3. Errors & telemetry — provider errors arrive as opaque wrapped strings, so context-overflow auto-compaction logic ends up doing strings.Contains over wrapped messages (feat: highlight @file tokens in input with accent color #6). Failed compactions emit no event, so symmetric lifecycle telemetry has to hand-roll timing (OpenAI login does not become the default provider, and starting with --model openai/... fails until model is reselected in /model #9). Usage maps silently drop new provider buckets (feat(sdk): expose generation and provider params on Options #7).
  4. Session-tree primitivesCollapseBranch doesn't work with custom session managers because AppendBranchSummary isn't on the interface (Shift+Enter does not insert a newline even though the UI hint says it should #10).
  5. Skills & sub-agents — skill loaders are path-string-only, not fs.FS-typed (Text selection in the TUI is awkward and makes copying partial output frustrating #14). Subagent still pays full kit.New() cost per spawn (Ctrl+C exits the app instead of clearing the current input #13).
  6. CleanupClose() can't be bounded by a context (/thinking incorrectly reports that GPT-5.4 does not support thinking/reasoning #11); no ResetForTesting() helper (TUI feels visually noisy: user label, large prompt placeholder, and input help take too much space #12).

None of these individually is a blocker, but collectively they push 800–1200 LOC of adapter/workaround code into every embedder, and three of them (#1, #2, #4) force test architectures that are fundamentally slower and less deterministic than they need to be.

Proposed Implementation

See the per-gap "Proposed fix" sections above. All 14 items are designed to be additive — no existing exported signature changes. The interface extensions (#10) ship with default implementations on all shipped managers (treeManagerAdapter already has it; in-memory / no-op managers return ErrBranchSummaryNotSupported). The Options and PromptOptions extensions (#2, #3) are new fields on existing structs (zero-valued = current behavior).

A coding agent implementing this in a single PR would tackle the items in roughly this order:

  1. Mechanical / pure-additive first (Add support for overriding model registry and capabilities #1 mutex, feat: open external $EDITOR for composing long prompts via ctrl+x e chord #5 ToolOutput fields, feat(sdk): expose generation and provider params on Options #7 Usage Extra, OpenAI login does not become the default provider, and starting with --model openai/... fails until model is reselected in /model #9 CompactionEvent.Err, /thinking incorrectly reports that GPT-5.4 does not support thinking/reasoning #11 CloseContext, TUI feels visually noisy: user label, large prompt placeholder, and input help take too much space #12 ResetForTesting, Text selection in the TUI is awkward and makes copying partial output frustrating #14 LoadSkillsFromFS) — each is <30 LOC and has no dependencies on the others.
  2. Interface extension with shim (Shift+Enter does not insert a newline even though the UI hint says it should #10 AppendBranchSummary).
  3. New exported constructors / functions (fix: error calling tool #8 NewRawTool).
  4. Provider-error sentinels (feat: highlight @file tokens in input with accent color #6) — touches multiple provider adapters under internal/providers/*.
  5. PromptOptions extension (AddMCPServer() creates MCPToolManager without inheriting OAuth AuthHandler and TokenStoreFactory #3) — extends an existing options struct and threads through runTurn.
  6. Provider injection (fix: ToolRenderConfig BorderColor and Background fields are ignored #2) — adds a new Options field and a branch in the provider-construction path. Largest single item but still ~200 LOC.
  7. Streaming TurnResult fields (OAuth fails for remote MCP servers without dynamic client registration (e.g. GitHub) — no way to provide ClientID #4) — adds Stream []StreamEvent to TurnResult, populates it in the streaming code path, clarifies the doc-comment.
  8. Kit.Fork (Ctrl+C exits the app instead of clearing the current input #13) — new method; shares parent's viper/provider/event-bus references.

Tests for each item are colocated: extend pkg/kit/*_test.go with a focused subtest per gap. The data-race for #1 is testable via kit_test.go running two New() calls under t.Parallel() (with -race). The Provider injection (#2) lets every other test in the binary stop spinning up httptest.Servers, which should be a measurable wall-clock improvement on go test ./pkg/kit/....

Alternatives Considered

  • I've considered workarounds or alternative approaches

For each Tier-1 item, embedder-side workarounds exist and are in production use, but they're either fragile (string-matching error messages), slow (HTTP loopback for every test turn), or correctness-compromising (process-global races mitigated only by SkipConfig: true, which bypasses legitimate .kit.yml loading). The Tier-2/3 items are pure boilerplate-reduction.

Checklist

  • I've searched existing issues and this hasn't been requested yet
  • This feature aligns with Kit's design philosophy (TUI-first, extension-based) — every item makes Kit more usable as an embeddable SDK without altering the TUI/extension surface.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions