feat: embedder SDK hardening — 14 scoped additions for higher-level framework authors

### Feature Description

This is a coordinated **Embedder SDK Hardening** request — a set of 14 small, individually-justified additions and fixes that together unblock building higher-level agent frameworks on top of `pkg/kit`. Each item is scoped, has a proposed API, and can be implemented additively without breaking the existing public surface. Together they're sized to land as a single PR (most items are <50 LOC; a handful are interface extensions that need default-implementation shims).

The catalogue below is grouped into three tiers by impact. Tier 1 items are currently forcing embedders to either drop functionality or maintain non-trivial workarounds; Tier 2 items are simplifications that remove ~100–300 LOC of embedder-side boilerplate apiece; Tier 3 items are papercuts but cheap.

> **One-PR feasibility:** items #1, #5, #7, #9, #10, #11, #12, #14 are pure additive (new fields / new exported symbols / interface methods with default no-op implementations on shipped managers). #2, #3, #4, #6, #8, #13 add new types/functions alongside the existing surface. Nothing requires breaking an existing signature.

---

### Tier 1 — currently forces workarounds or feature drops

#### 1. `configPath` package-global is written without a mutex

**Location:** `internal/config/config.go:584`

```go
var configPath string

// SetConfigPath sets the configuration file path for resolving relative paths
func SetConfigPath(path string) {
    configPath = path
}
```

**Call site:** `pkg/kit/config.go:175` (`loadConfigWithEnvSubstitution` → `config.SetConfigPath`), reached from `initConfig` → `kit.New` whenever a `.kit.yml` is discovered in `$PWD` or `$HOME`.

**Repro:** two concurrent `kit.New()` calls in a process that has a `.kit.yml` in `$HOME` race on this write. `go test -race` catches it immediately:

```
WARNING: DATA RACE
Write at 0x000000... by goroutine N:
  github.com/mark3labs/kit/internal/config.SetConfigPath
      .../internal/config/config.go:590
  github.com/mark3labs/kit/pkg/kit.loadConfigWithEnvSubstitution
      .../pkg/kit/config.go:175
  github.com/mark3labs/kit/pkg/kit.(*Kit).initConfig
  github.com/mark3labs/kit/pkg/kit.New
```

The `viper.New()` per-instance fix from #40 isolated the viper store but left this residual global behind because the v0.79.0 isolation tests set `SkipConfig: true`, never exercising this path.

**Proposed fix:** move `configPath` onto a per-`*viper.Viper`-keyed sync.Map (or, simpler, just guard the global with a `sync.Mutex`):

```go
// internal/config/config.go
var (
    configPathMu sync.RWMutex
    configPath   string
)

func SetConfigPath(path string) {
    configPathMu.Lock()
    defer configPathMu.Unlock()
    configPath = path
}

func GetConfigPath() string {
    configPathMu.RLock()
    defer configPathMu.RUnlock()
    return configPath
}
```

…and apply the same `RLock` to every existing read site of `configPath`.

---

#### 2. No pluggable `Provider` injection point — tests must run an HTTP loopback

**Location:** `pkg/kit/kit.go:812` — `models.CreateProvider` is the only path to construct a model backend, and it always spins up a real HTTP client.

```go
providerResult, err := models.CreateProvider(ctx, config)
```

`grep -rn 'WithProvider\b\|InProcessProvider\|FakeProvider\|TestProvider' pkg/kit/` returns nothing. There's no seam for an embedder to inject a synchronous in-process responder.

**Cost:** every embedder writing tests against `kit.Kit` stands up an `httptest.Server` that speaks the OpenAI/Anthropic protocol they're targeting. That's 200–400 LOC of harness per project, every test pays a TCP round-trip + JSON encode/decode, and there's no structural guarantee that a `KIT_NO_NETWORK=1`-style flag is honoured.

**Proposed fix:** add a `Provider` option:

```go
// pkg/kit/options.go

// Provider is a synchronous in-process replacement for the HTTP-backed
// providers that kit constructs from a model string. When non-nil, kit
// skips models.CreateProvider entirely and routes every turn through fn.
//
// Provider takes precedence over Model resolution. Intended primarily for
// tests and offline harnesses; production embedders should prefer the
// model-string path.
type ProviderFn func(ctx context.Context, req ProviderRequest) (ProviderResponse, error)

type ProviderRequest struct {
    Messages    []LLMMessage
    Tools       []Tool
    System      string
    MaxTokens   int
    Streaming   bool
    Temperature *float64
    // …whatever the existing internal provider request carries
}

type ProviderResponse struct {
    Messages   []LLMMessage // assistant + tool messages in order
    StopReason string
    Usage      *LLMUsage
    // Optional: streamed deltas, if Streaming was true and the caller
    // wants to assert delta order.
    Deltas []StreamDelta
}

// In Options:
type Options struct {
    // … existing fields …

    // Provider, when non-nil, replaces the model-string-driven provider
    // construction. The Model field is ignored.
    Provider ProviderFn
}
```

Then in `kit.go` where `models.CreateProvider` is called, branch on `m.opts.Provider != nil` and wrap it in an adapter that satisfies the same `fantasy.LanguageModel` interface the rest of the loop uses.

---

#### 3. `PromptOptions` only carries `SystemMessage` — no per-call model/tool/thinking overrides

**Location:** `pkg/kit/kit.go:2626-2649`

```go
type PromptOptions struct {
    // SystemMessage is prepended as a system message before the user prompt.
    SystemMessage string
}

func (m *Kit) PromptWithOptions(ctx context.Context, msg string, opts PromptOptions) (string, error) {
    // …
}
```

Any embedder that wants `session.Prompt(ctx, text, WithModel(...), WithTools(...), WithThinking(...))` semantics must rebuild a fresh `*Kit` per call — the whole `kit.New()` → `viper.New()` → provider validation → `Close()` cycle, on every prompt. For test-driven harnesses where the actual model call is cheap (in-process fake), this setup-teardown dominates total runtime by 10–100×.

**Proposed fix:** extend `PromptOptions` with the override fields that already exist as `Options` construction-time settings, and have `runTurn` apply them as scoped overrides:

```go
type PromptOptions struct {
    SystemMessage string

    // Model overrides the agent's configured model for this call only.
    // Empty string means "use the agent's default".
    Model string

    // Tools, when non-nil, replaces the agent's core tool set for this
    // call. Use ExtraTools to add tools without replacing the core set.
    Tools []Tool

    // ExtraTools are added to the effective tool set for this call only.
    ExtraTools []Tool

    // ThinkingLevel overrides the agent's reasoning level for this call.
    // Empty string means "use the agent's default".
    ThinkingLevel string

    // ProviderURL / ProviderAPIKey override provider credentials for this
    // call only — useful for multi-tenant embedders that resolve creds
    // per request.
    ProviderURL    string
    ProviderAPIKey string
}

// Add a streaming-aware variant that returns the full turn:
func (m *Kit) PromptResultWithOptions(ctx context.Context, msg string, opts PromptOptions) (*TurnResult, error)
```

Internally, `runTurn` already takes the provider/tools/system as locals; the change is to thread the override through the same pre-call resolution path.

---

#### 4. `PromptResult()` semantics in streaming mode are unclear

**Location:** `pkg/kit/kit.go:2650-2657`

```go
// PromptResult sends a message and returns the full turn result including
// usage statistics and conversation messages.
func (m *Kit) PromptResult(ctx context.Context, message string) (*TurnResult, error) {
    return m.runTurn(ctx, message, message, []fantasy.Message{
        fantasy.NewUserMessage(message),
    })
}
```

When `Options.Streaming == true` (the default), the only way to observe the streamed deltas is via `OnMessageUpdate` / `OnReasoningDelta` hooks. The doc-comment doesn't specify whether `PromptResult` blocks until end-of-stream or returns when the first chunk arrives, and the returned `TurnResult.Messages` doesn't carry the deltas. Embedders therefore can't write a deterministic test that asserts "stream emitted chunks A, B, C in order, then final text was 'ABC'" without re-implementing the delta-collection in their own `OnMessageUpdate` handler.

**Proposed fix:** clarify the contract and add a captured-delta slice to `TurnResult`:

```go
type TurnResult struct {
    // … existing fields …

    // Stream, when streaming was enabled for this turn, contains every
    // delta event observed during the turn in emit order. Empty when
    // Options.Streaming was disabled.
    Stream []StreamEvent
}

type StreamEvent struct {
    Kind     StreamEventKind // text_delta | reasoning_start | reasoning_delta | reasoning_end | tool_call_chunk
    Index    int             // step index this delta belongs to
    Text     string          // for *_delta kinds
    Reasoning string         // for reasoning_*
    ToolName  string         // for tool_call_chunk
    ToolID    string
    Args      string         // for tool_call_chunk (accumulating JSON)
}

type StreamEventKind string

const (
    StreamEventTextDelta      StreamEventKind = "text_delta"
    StreamEventReasoningStart StreamEventKind = "reasoning_start"
    StreamEventReasoningDelta StreamEventKind = "reasoning_delta"
    StreamEventReasoningEnd   StreamEventKind = "reasoning_end"
    StreamEventToolCallChunk  StreamEventKind = "tool_call_chunk"
)
```

…and explicitly document that `PromptResult` blocks until end-of-turn regardless of streaming mode.

---

### Tier 2 — meaningfully simplify embedder code

#### 5. `ToolOutput` has no way to signal "halt the agent loop and return this value"

**Location:** `pkg/kit/tools.go:24-44`

```go
type ToolOutput struct {
    Content   string
    IsError   bool
    Data      []byte
    MediaType string
}
```

Embedders building "structured result extraction" patterns (model is asked to call a `finish(...)` tool with a typed argument, the agent loop terminates, and the typed value is returned to the caller) currently smuggle the captured value out via a `sync.Mutex`-guarded side-channel because `ToolOutput` only round-trips a `Content` string back to the model.

**Proposed fix:** add an opaque carrier + a halt signal:

```go
type ToolOutput struct {
    Content   string
    IsError   bool
    Data      []byte
    MediaType string

    // FinalValue, when non-nil and Halt is true, is propagated to the
    // turn's TurnResult.FinalValue and the agent loop terminates without
    // submitting Content back to the model.
    FinalValue any

    // Halt, when true, terminates the agent loop after this tool call.
    // The model is not given a chance to respond to the tool result.
    Halt bool
}
```

And on `TurnResult`:

```go
type TurnResult struct {
    // … existing fields …

    // FinalValue is set when a tool returned ToolOutput with Halt=true.
    // The dynamic type is whatever the tool handler put in
    // ToolOutput.FinalValue.
    FinalValue any

    // HaltedByTool is the name of the tool that returned Halt=true, or
    // empty if the turn ended for any other reason.
    HaltedByTool string
}
```

---

#### 6. No structured provider error taxonomy — wrap with sentinels

**Location:** `grep 'var Err' pkg/kit/` returns only `ErrMCPNoToken`. Provider errors propagate up as `fmt.Errorf("provider %q: %w", name, err)` chains where `err` is whatever the underlying SDK returned — usually a string-formatted HTTP error.

Embedders that want to react to "context window exceeded → auto-compact" or "rate-limited → backoff" or "auth failed → re-prompt for key" must `strings.Contains` over the error text, which is fragile and provider-specific.

**Proposed fix:** export sentinels at the kit package level and have every provider wrap with them:

```go
// pkg/kit/errors.go (new file)
package kit

import "errors"

// Provider-error sentinels. Providers wrap these via fmt.Errorf("%w: %s", …)
// so embedders can classify via errors.Is.
var (
    // ErrContextOverflow indicates the request exceeded the model's
    // maximum context window. Embedders typically respond by compacting
    // and retrying.
    ErrContextOverflow = errors.New("context window exceeded")

    // ErrRateLimit indicates the provider throttled the request.
    // Embedders typically respond by backing off and retrying.
    ErrRateLimit = errors.New("rate limited by provider")

    // ErrAuth indicates a credential / authorization failure.
    ErrAuth = errors.New("provider authentication failed")

    // ErrProviderUnavailable indicates a transient provider/upstream
    // failure (5xx, network error, timeout).
    ErrProviderUnavailable = errors.New("provider unavailable")

    // ErrInvalidRequest indicates the request was structurally invalid
    // and retrying will not help.
    ErrInvalidRequest = errors.New("invalid request to provider")
)
```

Then audit `internal/providers/*` and `internal/models/providers.go` to wrap responses by status code:

```go
// pseudo-code in the openai/anthropic provider wrappers
switch resp.StatusCode {
case 401, 403:
    return fmt.Errorf("%w: %s", kit.ErrAuth, body)
case 429:
    return fmt.Errorf("%w: %s", kit.ErrRateLimit, body)
case 400:
    if strings.Contains(body, "context_length_exceeded") {
        return fmt.Errorf("%w: %s", kit.ErrContextOverflow, body)
    }
    return fmt.Errorf("%w: %s", kit.ErrInvalidRequest, body)
case 500, 502, 503, 504:
    return fmt.Errorf("%w: %s", kit.ErrProviderUnavailable, body)
}
```

Once landed, embedders do `errors.Is(err, kit.ErrContextOverflow)` and lose the string-matching.

---

#### 7. `fantasy.Usage` has fixed buckets — no `Extra` for provider-specific tokens

**Location:** `pkg/kit/types.go:276` — `kit.LLMUsage = fantasy.Usage` (type alias).

```go
// charm.land/fantasy@v0.32.0/model.go
type Usage struct {
    InputTokens         int64
    OutputTokens        int64
    TotalTokens         int64
    ReasoningTokens     int64
    CacheCreationTokens int64
    CacheReadTokens     int64
}
```

When a new provider exposes a new bucket (audio tokens, image tokens, web-search tokens, etc.), embedders mapping `kit.LLMUsage` into their own typed `Usage` silently drop the new bucket because the field doesn't exist on `fantasy.Usage`.

**Proposed fix:** add an `Extra map[string]int64` to `fantasy.Usage` upstream (or, if that's a separate repo, add `kit.LLMUsage` as a kit-owned struct that embeds `fantasy.Usage` plus the extra map). Then have every provider populate the extras it knows about:

```go
type LLMUsage struct {
    fantasy.Usage
    // Extra carries provider-specific token buckets that don't fit the
    // fixed fields on fantasy.Usage. Keys are provider-prefixed
    // (e.g. "anthropic.cache_creation_input_tokens",
    // "openai.audio_tokens") to avoid collisions.
    Extra map[string]int64
}
```

(The upstream-`fantasy` route is cleaner if you control that repo; the kit-side wrapper is the back-compat-safe path if you don't.)

---

#### 8. `NewTool[T any]` requires a Go-typed `T` — no untyped schema+handler form

**Location:** `pkg/kit/tools.go:139`

```go
func NewTool[TInput any](name, description string, fn func(ctx context.Context, input TInput) (ToolOutput, error)) Tool {
    // …
}
```

Embedders that source tools dynamically (skill markdown files, MCP server tool catalogs, user-supplied JSON-schema definitions) don't have a Go type at registration time — they have a JSON Schema + a `func(ctx, map[string]any) (any, error)`. To register such a tool today, the embedder must implement the underlying `fantasy.AgentTool` interface by hand (~140 LOC of boilerplate per project: `Info()`, `Run(ctx, LLMToolCall) (LLMToolResponse, error)`, `ProviderOptions()`, `SetProviderOptions(...)`).

**Proposed fix:** add an untyped constructor:

```go
// NewRawTool is the schema-driven counterpart to NewTool. Use it when the
// tool's input shape isn't known at compile time (e.g. tools loaded from
// JSON Schema definitions in skill files or MCP server catalogs).
//
// schema must be a valid JSON Schema describing the tool's input object.
// fn receives the decoded JSON arguments as a map and returns a ToolOutput.
// Schema validation is performed before fn is invoked; invalid inputs
// short-circuit with an error ToolResponse.
func NewRawTool(
    name, description string,
    schema map[string]any,
    fn func(ctx context.Context, args map[string]any) (ToolOutput, error),
) Tool
```

Internally this can share most of the implementation with `NewTool[T]`: decode `LLMToolCall.Input` (already a JSON string) into `map[string]any`, validate against `schema`, invoke `fn`, wrap the `ToolOutput` via the existing `toolOutputToResponse` helper.

---

#### 9. `CompactionEvent` has no failure-path variant

**Location:** `pkg/kit/events.go:373-385` and `pkg/kit/compaction.go:270-280`

```go
// CompactionEvent fires after a successful compaction.
type CompactionEvent struct {
    Summary         string
    OriginalTokens  int
    CompactedTokens int
    MessagesRemoved int
    ReadFiles       []string
    ModifiedFiles   []string
}
```

The "fires after a *successful* compaction" doc-comment is accurate — no event fires when compaction fails. Embedders wiring lifecycle telemetry around `Kit.Compact` (start timestamp, end timestamp, success/failure, error message) have to hand-roll start/end timing because `OnCompaction` won't fire on the error path.

**Proposed fix:** either add an `Err` field to `CompactionEvent` and always emit (success or failure), or add a separate `CompactionFailedEvent`:

```go
// Option A — additive on CompactionEvent (preferred, since OnCompaction
// already exists):
type CompactionEvent struct {
    Summary         string
    OriginalTokens  int
    CompactedTokens int
    MessagesRemoved int
    ReadFiles       []string
    ModifiedFiles   []string

    // Err is non-nil when compaction failed. On the failure path,
    // Summary / *Tokens / *Files are zero-valued.
    Err error
}
```

…and have `compactInternal` emit a `CompactionEvent{Err: err}` on the failure return paths.

---

#### 10. `SessionManager.AppendBranchSummary` is missing

**Location:** `pkg/kit/sessions.go:223-233`

```go
func (m *Kit) CollapseBranch(fromID, toID, summary string) error {
    if m.session == nil {
        return fmt.Errorf("no session available")
    }
    // Note: This operation is not directly supported by SessionManager interface
    // as it requires AppendBranchSummary which is TreeManager-specific.
    // For custom SessionManagers, this would need to be implemented differently.
    // For now, we try to use the underlying TreeManager if available.
    if adapter, ok := m.session.(*treeManagerAdapter); ok {
        _, err := adapter.inner.AppendBranchSummary(fromID, summary)
        return err
    }
    return fmt.Errorf("CollapseBranch not supported by custom session manager")
}
```

Embedders implementing a custom `SessionManager` (e.g. backed by a relational DB or a remote store) can't support `Kit.CollapseBranch` because the underlying primitive isn't on the interface.

**Proposed fix:** promote `AppendBranchSummary` to `SessionManager`:

```go
// pkg/kit/session.go
type SessionManager interface {
    // … existing methods …

    // AppendBranchSummary creates a summary entry that replaces the range
    // [fromID, leaf] on the current branch with a single summary. Returns
    // the new entry ID. Managers that do not track branch summaries may
    // return ErrBranchSummaryNotSupported.
    AppendBranchSummary(fromID, summary string) (entryID string, err error)
}

var ErrBranchSummaryNotSupported = errors.New("session manager does not support branch summaries")
```

Update `treeManagerAdapter` (which already has the method on the inner manager) and any other shipped manager. Remove the type-assertion fallback in `Kit.CollapseBranch`.

---

### Tier 3 — papercuts

#### 11. `Kit.Close()` doesn't take a context

**Location:** `pkg/kit/kit.go:2790`

```go
func (m *Kit) Close() error
```

Graceful shutdown with a deadline isn't possible.

**Proposed fix:** add `CloseContext(ctx context.Context) error` alongside `Close()`. Keep `Close()` as `return m.CloseContext(context.Background())` for back-compat.

---

#### 12. No `kit.ResetForTesting()` under a build tag

**Location:** `grep -rn 'ResetForTesting' pkg/kit/` returns nothing.

Package-global state (the `configPath` global in #1, the embedded models registry cache, the global viper if anyone still touches it) survives across tests in the same binary. With v0.79.0's per-instance viper store this is mostly cosmetic, but the leftover `configPath` global is exactly the kind of process-residual that a `ResetForTesting()` would scrub.

**Proposed fix:**

```go
// pkg/kit/export_test.go OR pkg/kit/testing.go behind a build tag:
//go:build testing

package kit

// ResetForTesting clears all package-global state. Intended for use in
// test-binary teardown / between-test cleanup. Safe to call concurrently
// with no in-flight kit.New() calls.
func ResetForTesting() {
    config.SetConfigPath("")
    // …any other globals that get stamped during kit.New
}
```

---

#### 13. `Kit.Subagent` always does a full `kit.New()` per spawn — add `Kit.Fork`

**Location:** `pkg/kit/kit.go:1936-2060`

`Kit.Subagent` already inherits provider config and MCP task options via `inheritProviderConfig(childOpts, m.v)` and `inheritMCPTaskOptions(childOpts, m.opts)`, but the child still goes through full `New(ctx, childOpts)` — new config store, re-validate provider credentials, possibly re-load the models DB. For embedders that spawn many short-lived sub-tasks, this dominates per-task cost.

**Proposed fix:** add a lighter-weight `Kit.Fork`:

```go
// ForkConfig configures a forked Kit instance for a short-lived sub-task.
// Unlike Subagent, Fork reuses the parent's provider client, session
// manager (if not overridden), event bus, and MCP connections — only the
// system prompt and tool set are scoped to the fork.
//
// The returned Kit must be Close()d to release the fork-local resources.
// Closing the fork does not affect the parent.
type ForkConfig struct {
    SystemPrompt string
    Tools        []Tool
    ExtraTools   []Tool
    // Model is optional; nil/empty means "inherit from parent".
    Model string
    // SessionManager, if non-nil, scopes the fork to a child session.
    // nil means "share the parent's session" (the fork's appends go to
    // the parent's branch).
    SessionManager SessionManager
}

func (m *Kit) Fork(ctx context.Context, cfg ForkConfig) (*Kit, error)
```

Internally `Fork` constructs a `*Kit` that shares the parent's `*viper.Viper`, the parent's provider client, and the parent's event bus, only re-allocating what the fork-config actually changes.

---

#### 14. `LoadSkillsFromFS(fs.FS)` overload missing

**Location:** `pkg/kit/skills.go:26-46`

```go
func LoadSkill(path string) (*Skill, error)
func LoadSkillsFromDir(dir string) ([]*Skill, error)
func LoadSkills(cwd string) ([]*Skill, error)
```

All three loaders are path-string-based, so embedders that wrap skill discovery in an `fs.FS` abstraction (for `embed.FS` distribution, `fstest.MapFS` tests, or per-tenant virtual filesystems) can't use the upstream loaders directly.

**Proposed fix:** add an `fs.FS`-typed overload:

```go
// LoadSkillsFromFS is the fs.FS-typed counterpart of LoadSkillsFromDir.
// It walks fsys starting at root (which may be "." or a subdirectory),
// finds *.md files, parses YAML frontmatter + markdown body, and returns
// the loaded skills.
func LoadSkillsFromFS(fsys fs.FS, root string) ([]*Skill, error)
```

Implementation is mechanical — share the parsing helper with the existing path-based loader, just substitute `fs.WalkDir(fsys, root, …)` and `fs.ReadFile(fsys, p)` for `filepath.WalkDir` and `os.ReadFile`.

---

### Motivation / Use Case

Building higher-level agent frameworks on top of `pkg/kit` repeatedly runs into the same set of paper cuts. The patterns are:

1. **Tests** — every embedder writes a 200-400 LOC OpenAI-compatible `httptest.Server` because there's no `Provider` injection seam (#2). Per-test setup time is dominated by `kit.New()` rebuild because there's no per-call override path (#3). Streaming-mode tests can't deterministically assert delta order (#4). The `configPath` global makes concurrent tests data-race (#1) — `SkipConfig: true` is a workaround but doesn't help anyone who legitimately wants `.kit.yml` loading.
2. **Tools** — schema-driven tool registration (skills, MCP catalogs, user-supplied tools) requires hand-rolling the `fantasy.AgentTool` interface because `NewTool[T]` is Go-typed (#8). Structured-result patterns (model calls `finish(typed_arg)` and the loop terminates) need a side-channel because `ToolOutput` has no `FinalValue`/`Halt` (#5).
3. **Errors & telemetry** — provider errors arrive as opaque wrapped strings, so context-overflow auto-compaction logic ends up doing `strings.Contains` over wrapped messages (#6). Failed compactions emit no event, so symmetric lifecycle telemetry has to hand-roll timing (#9). Usage maps silently drop new provider buckets (#7).
4. **Session-tree primitives** — `CollapseBranch` doesn't work with custom session managers because `AppendBranchSummary` isn't on the interface (#10).
5. **Skills & sub-agents** — skill loaders are path-string-only, not `fs.FS`-typed (#14). `Subagent` still pays full `kit.New()` cost per spawn (#13).
6. **Cleanup** — `Close()` can't be bounded by a context (#11); no `ResetForTesting()` helper (#12).

None of these individually is a blocker, but collectively they push 800–1200 LOC of adapter/workaround code into every embedder, and three of them (#1, #2, #4) force test architectures that are fundamentally slower and less deterministic than they need to be.

### Proposed Implementation

See the per-gap "Proposed fix" sections above. All 14 items are designed to be additive — no existing exported signature changes. The interface extensions (#10) ship with default implementations on all shipped managers (`treeManagerAdapter` already has it; in-memory / no-op managers return `ErrBranchSummaryNotSupported`). The `Options` and `PromptOptions` extensions (#2, #3) are new fields on existing structs (zero-valued = current behavior).

A coding agent implementing this in a single PR would tackle the items in roughly this order:

1. **Mechanical / pure-additive first** (#1 mutex, #5 ToolOutput fields, #7 Usage Extra, #9 CompactionEvent.Err, #11 CloseContext, #12 ResetForTesting, #14 LoadSkillsFromFS) — each is <30 LOC and has no dependencies on the others.
2. **Interface extension with shim** (#10 AppendBranchSummary).
3. **New exported constructors / functions** (#8 NewRawTool).
4. **Provider-error sentinels** (#6) — touches multiple provider adapters under `internal/providers/*`.
5. **PromptOptions extension** (#3) — extends an existing options struct and threads through `runTurn`.
6. **Provider injection** (#2) — adds a new Options field and a branch in the provider-construction path. Largest single item but still ~200 LOC.
7. **Streaming TurnResult fields** (#4) — adds `Stream []StreamEvent` to `TurnResult`, populates it in the streaming code path, clarifies the doc-comment.
8. **`Kit.Fork`** (#13) — new method; shares parent's viper/provider/event-bus references.

Tests for each item are colocated: extend `pkg/kit/*_test.go` with a focused subtest per gap. The data-race for #1 is testable via `kit_test.go` running two `New()` calls under `t.Parallel()` (with `-race`). The Provider injection (#2) lets every other test in the binary stop spinning up `httptest.Server`s, which should be a measurable wall-clock improvement on `go test ./pkg/kit/...`.

### Alternatives Considered

- [x] I've considered workarounds or alternative approaches

For each Tier-1 item, embedder-side workarounds exist and are in production use, but they're either fragile (string-matching error messages), slow (HTTP loopback for every test turn), or correctness-compromising (process-global races mitigated only by `SkipConfig: true`, which bypasses legitimate `.kit.yml` loading). The Tier-2/3 items are pure boilerplate-reduction.

### Checklist

- [x] I've searched existing issues and this hasn't been requested yet
- [x] This feature aligns with Kit's design philosophy (TUI-first, extension-based) — every item makes Kit more usable as an embeddable SDK without altering the TUI/extension surface.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: embedder SDK hardening — 14 scoped additions for higher-level framework authors #68

Feature Description

Tier 1 — currently forces workarounds or feature drops

1. `configPath` package-global is written without a mutex

2. No pluggable `Provider` injection point — tests must run an HTTP loopback

3. `PromptOptions` only carries `SystemMessage` — no per-call model/tool/thinking overrides

4. `PromptResult()` semantics in streaming mode are unclear

Tier 2 — meaningfully simplify embedder code

5. `ToolOutput` has no way to signal "halt the agent loop and return this value"

6. No structured provider error taxonomy — wrap with sentinels

7. `fantasy.Usage` has fixed buckets — no `Extra` for provider-specific tokens

8. `NewTool[T any]` requires a Go-typed `T` — no untyped schema+handler form

9. `CompactionEvent` has no failure-path variant

10. `SessionManager.AppendBranchSummary` is missing

Tier 3 — papercuts

11. `Kit.Close()` doesn't take a context

12. No `kit.ResetForTesting()` under a build tag

13. `Kit.Subagent` always does a full `kit.New()` per spawn — add `Kit.Fork`

14. `LoadSkillsFromFS(fs.FS)` overload missing

Motivation / Use Case

Proposed Implementation

Alternatives Considered

Checklist

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

feat: embedder SDK hardening — 14 scoped additions for higher-level framework authors #68

Description

Feature Description

Tier 1 — currently forces workarounds or feature drops

1. configPath package-global is written without a mutex

2. No pluggable Provider injection point — tests must run an HTTP loopback

3. PromptOptions only carries SystemMessage — no per-call model/tool/thinking overrides

4. PromptResult() semantics in streaming mode are unclear

Tier 2 — meaningfully simplify embedder code

5. ToolOutput has no way to signal "halt the agent loop and return this value"

6. No structured provider error taxonomy — wrap with sentinels

7. fantasy.Usage has fixed buckets — no Extra for provider-specific tokens

8. NewTool[T any] requires a Go-typed T — no untyped schema+handler form

9. CompactionEvent has no failure-path variant

10. SessionManager.AppendBranchSummary is missing

Tier 3 — papercuts

11. Kit.Close() doesn't take a context

12. No kit.ResetForTesting() under a build tag

13. Kit.Subagent always does a full kit.New() per spawn — add Kit.Fork

14. LoadSkillsFromFS(fs.FS) overload missing

Motivation / Use Case

Proposed Implementation

Alternatives Considered

Checklist

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. `configPath` package-global is written without a mutex

2. No pluggable `Provider` injection point — tests must run an HTTP loopback

3. `PromptOptions` only carries `SystemMessage` — no per-call model/tool/thinking overrides

4. `PromptResult()` semantics in streaming mode are unclear

5. `ToolOutput` has no way to signal "halt the agent loop and return this value"

7. `fantasy.Usage` has fixed buckets — no `Extra` for provider-specific tokens

8. `NewTool[T any]` requires a Go-typed `T` — no untyped schema+handler form

9. `CompactionEvent` has no failure-path variant

10. `SessionManager.AppendBranchSummary` is missing

11. `Kit.Close()` doesn't take a context

12. No `kit.ResetForTesting()` under a build tag

13. `Kit.Subagent` always does a full `kit.New()` per spawn — add `Kit.Fork`

14. `LoadSkillsFromFS(fs.FS)` overload missing