Releases: minpeter/plugsuits
@ai-sdk-tool/harness@1.3.2
Fixes
env.ts: moved .env file discovery side effects out of the module top level into a loadDotEnvFilesIfAvailable() helper. The module now loads safely in edge runtimes (Cloudflare Workers, Vercel Edge) without node:fs access. Callers that relied on the automatic .env loading must explicitly call loadDotEnvFilesIfAvailable() from their Node.js entry point.
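A minimal sketch of the pattern this fix describes: the module performs no file access when it is imported, and loading happens only when an entry point asks for it. Everything here is hypothetical and for illustration only (parseDotEnv, loadDotEnvFilesSketch, and the injected readTextFile callback are not the harness API); the real loadDotEnvFilesIfAvailable() helper may be implemented quite differently.

```typescript
// Hypothetical sketch, not the harness implementation.
// A reader that returns file contents, or undefined when the file is missing.
type ReadTextFile = (path: string) => string | undefined;

// Parse KEY=VALUE lines; blank lines and "#" comments are skipped.
function parseDotEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, "$2");
    out[key] = value;
  }
  return out;
}

// Nothing runs at import time. On an edge runtime the caller passes no reader
// (there is no file system), and the function returns without touching anything.
function loadDotEnvFilesSketch(
  readTextFile: ReadTextFile | undefined,
  paths: string[],
  env: Record<string, string | undefined>,
): string[] {
  if (!readTextFile) return [];
  const loaded: string[] = [];
  for (const path of paths) {
    const text = readTextFile(path);
    if (text === undefined) continue;
    for (const [key, value] of Object.entries(parseDotEnv(text))) {
      // Values already present in the environment are not overwritten.
      if (env[key] === undefined) env[key] = value;
    }
    loaded.push(path);
  }
  return loaded;
}
```

In a Node.js entry point the reader would wrap a file-system call; in an edge runtime the call is simply skipped, which is why the import itself stays safe.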
plugsuits@2.3.5
Patch Changes
- e937dc7: Keep env-file loading compatible with Node 18, preserve URL validation for shared AI endpoints, and align CEA's default context limit with the shared AI configuration.
- 5bb3997: Update direct and transitive dependency resolutions across the monorepo, including AI SDK packages, tooling, TypeScript, and runtime adapters. Raise the declared Node.js support floor to 22.19.0 to match upgraded runtime dependencies such as undici 8.
- 8b1919c: Persist user-level agent preferences across sessions (e.g. /translate, /reasoning-mode, /tool-fallback in CEA; /reasoning in minimal-agent) so toggles set in the TUI survive process restarts.
  - Harness: new generic PreferencesStore<T> abstraction with FilePreferencesStore (single atomic JSON document), InMemoryPreferencesStore, LayeredPreferencesStore (configurable merge + write layer), and a shallowMergePreferences helper. Exposed from the package root and a new @ai-sdk-tool/harness/preferences subpath. Intentionally separate from SnapshotStore because preferences are app/user-scoped while snapshots are session-scoped.
  - Harness: new one-line helper createLayeredPreferences({ appName, validate }) that returns { store, userStore, workspaceStore, patch, paths } backed by ~/.${appName}/settings.json (user layer) and ./.${appName}/settings.json (workspace layer, write target). patch(partial) handles the load-merge-save flow on the workspace layer so consumers don't have to reimplement it. Paths, merge strategy, and validator are fully customizable, but the common case is now a single call.
  - Harness: new createTogglePreferenceCommand and createEnumPreferenceCommand factories that collapse the typical "parse args → validate → mutate runtime → persist" slash-command boilerplate into a declarative config object. The typed field: keyof T ensures persistence goes to the right preference key. Supports aliases, custom truthy/falsy words, custom parser, custom validator, and custom enabled/disabled messages.
  - CEA: createUserPreferencesStore() now delegates to createLayeredPreferences and also exposes the full harness bundle alongside the existing public fields. /translate is migrated to createTogglePreferenceCommand; the CEA-local createToggleCommand factory is deleted as dead code. /translate now awaits persistence (previously fire-and-forget), so the command response confirms the disk write.
  - CEA: /reasoning-mode and /tool-fallback continue to use the shared preferences-persistence singleton for now (they have domain-specific selectable-modes logic that is not a good fit for the generic factory yet). configurePreferencesPersistence now accepts an optional bundle argument so future migrations can reuse the harness factories.
  - CEA startup: persisted preferences are applied to AgentManager before CLI flags. Explicit CLI flags (--no-translate, --reasoning-mode on, --tool-fallback, --toolcall-mode) still win for the current process but no longer overwrite the persisted file; they are one-shot overrides only. resolveSharedConfig now accepts rawArgs so callers can distinguish explicit flags from defaults.
  - minimal-agent: gains a /reasoning <on|off> slash command that toggles providerOptions.openai.reasoningEffort via the onBeforeTurn hook and persists the value through createLayeredPreferences. The command definition is a single 10-line factory call thanks to createTogglePreferenceCommand. Header subtitle shows the live reasoning state. Preferences are stored at ~/.minimal-agent/settings.json and ./.minimal-agent/settings.json (separate from CEA so the two agents' defaults don't collide).
- 8b1919c: Harden the user-preferences persistence layer against failure and concurrency, discovered during manual QA of PR #119 and flagged by the Oracle reviewer.
  - Harness: FilePreferencesStore.clear() now deletes the file with rmSync instead of writing "{}". Previously clear() left an empty {} on disk, and un-validated stores would return {} from load() instead of null, breaking the contract shared with InMemoryPreferencesStore. Consumers using a validator were unaffected in practice because empty objects already validated to null.
  - Harness: createLayeredPreferences().patch() now serializes calls through a promise queue. The previous implementation was a plain read-modify-write, so two concurrent patch() calls could both read the same stale workspace JSON and one update would be silently dropped. Concurrency tests cover both different-field and same-field writes.
  - Harness: createTogglePreferenceCommand and createEnumPreferenceCommand now persist before mutating runtime. If disk persistence fails, the runtime state is never touched and the command returns { success: false, message: "Failed to persist …" }. If the runtime set() throws after a successful persist, the disk write is rolled back to the previous value. This eliminates the class of "disk and runtime disagree" bugs across all factory-backed commands.
  - CEA: patchWorkspacePreferences now queues writes per-store via a WeakMap-keyed promise chain, matching the harness bundle semantics. This protects the shim path used by /reasoning-mode and /tool-fallback from the same lost-update race that previously affected the factory path.
  - CEA: applyPersistedPreferencesToAgentManager and applySharedConfigToAgentManager extracted from main.ts into packages/cea/src/entrypoints/preferences-startup.ts. They are now pure functions taking the agent manager and store as arguments, so startup wiring is testable in isolation. Existing main.ts call sites updated to pass dependencies explicitly.
  - CEA: minimal-agent gains a vitest devDep, a test script, vitest.config.ts, and excludes *.test.ts / vitest.config.ts from the TypeScript build output. citty was a dead devDependency and was removed.
  - Tests added:
    - Harness preferences-store.test.ts: clear() ⇒ load() null, clear() removes file on disk, clear() no-op on missing file, cross-instance restart, three concurrent-patch scenarios (different fields, same field, queue recovery after failure).
    - Harness preference-commands.test.ts: persist-first contract: toggle + enum commands do NOT mutate runtime when persistence fails, toggle command rolls back disk when runtime set throws, toggle command returns success only when BOTH disk and runtime succeed.
    - CEA user-preferences.test.ts: concurrent patchWorkspacePreferences preserves all fields, same-field is last-writer-wins, save errors bubble to caller.
    - CEA preferences-persistence.test.ts (new): singleton lifecycle, custom onError handler fires on save failure, concurrent persistPreferencePatch preserves all fields.
    - CEA reasoning-mode.test.ts (new): 7 tests covering status report, valid transitions, invalid mode rejection, persistence, sibling field preservation, no-op "already using", graceful behavior without persistence configured.
    - CEA tool-fallback.test.ts (new): 7 tests in the same shape.
    - CEA preferences-startup.test.ts (new): integration tests for applyPersistedPreferencesToAgentManager and applySharedConfigToAgentManager covering all three fields, CLI-vs-persisted precedence, and the invariant that CLI flags never write to the store.
    - minimal-agent preferences.test.ts (new): 7 tests covering path expectations, schema validation, round-trip, and the /reasoning onBeforeTurn integration (toggle → closure mutation → providerOptions change).
  - Test counts:
    - Harness: 673 → 699 (+26)
    - CEA: 526 → 563 (+37)
    - minimal-agent: 0 → 7 (+7)
    - Total workspace: 1283 → 1353 (+70)
- 496ffdb: Surface the "prompt processing" state that previously looked frozen, and fix follow-up correctness gaps found during post-implementation review.
  - Harness: new LoopHooks.onStreamStart / onFirstStreamPart hooks wrap the agent.stream() call site so consumers driving turns through runAgentLoop can react to the prompt-processing latency gap. onFirstStreamPart receives the current stream part as its first argument (TextStreamPart<ToolSet>) so consumers can inspect part.type to filter framing chunks (start, text-start, …) from visible content. TextStreamPart is re-exported from the harness root for convenience. Docstring clarifies that the TUI has its own independent onStreamStart on AgentTUIConfig.
  - TUI: shows a "Processing..." loader during turn preparation and transitions to "Working..." once the LLM request is in flight. The startup token probe is now non-blocking (fire-and-forget) so the editor accepts input immediately; the context-usage footer starts from the estimated count and quietly upgrades to the real value. During a blocking compaction the foreground loader temporarily switches to "Compacting..." and restores the previous label when the block ends, so users see the actual reason for a long wait. text-start stream parts are now treated as visible, clearing the streaming loader as soon as the assistant view mounts (no more empty-view flicker).
  - Headless: emits a turn-start lifecycle annotation and a matching onStreamStart callback before each LLM request; the event is dropped from trajectory.json (transient UX signal, no step_id) so persisted consumers see identical output. The event fires exactly once per logical turn; overflow and no-output retries no longer re-emit it. New tests cover normal ordering, new-turn vs intermediate-step phases, retry single-emission, and non-persistence in trajectory.json.
  - Headless: the persisted schema_version is corrected from the internal ATIF-v1.6 label to the actual current Harbor spec version ATIF-v1.4 (<https://www.harborframework.com/docs/agents/traject...
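The promise-queue serialization described for patch() above can be sketched as follows. This is an illustration under stated assumptions, not the harness source: AsyncStoreSketch, createSerializedPatch, and createSlowMemoryStore are hypothetical names, and the in-memory store only simulates the asynchronous gap between load and save that makes a plain read-modify-write lose updates.

```typescript
// Hypothetical sketch of serializing read-modify-write through a promise chain.
type PrefsSketch = Record<string, unknown>;

interface AsyncStoreSketch<T> {
  load(): Promise<T | null>;
  save(value: T): Promise<void>;
}

function createSerializedPatch<T extends PrefsSketch>(store: AsyncStoreSketch<T>) {
  // Tail of the queue. It never rejects, so one failed patch cannot block later ones.
  let tail: Promise<unknown> = Promise.resolve();

  return function patch(partial: Partial<T>): Promise<T> {
    const run = async (): Promise<T> => {
      const current = (await store.load()) ?? ({} as T);
      const next = { ...current, ...partial } as T; // shallow merge
      await store.save(next);
      return next;
    };
    // Each call starts only after every earlier call has settled.
    const result = tail.then(run);
    tail = result.catch(() => undefined);
    return result;
  };
}

// In-memory store whose load/save each yield once, so unsynchronized callers would interleave.
function createSlowMemoryStore<T>(): AsyncStoreSketch<T> {
  let data: T | null = null;
  return {
    async load() {
      await Promise.resolve();
      return data;
    },
    async save(value: T) {
      await Promise.resolve();
      data = value;
    },
  };
}
```

With this shape, two concurrent calls such as patch({ a: 1 }) and patch({ b: 2 }) run one after the other, so the second call loads the result of the first and both fields survive. Same-field writes remain last-writer-wins in call order.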
@ai-sdk-tool/tui@3.1.1
Patch Changes
- 5bb3997: Update direct and transitive dependency resolutions across the monorepo, including AI SDK packages, tooling, TypeScript, and runtime adapters. Raise the declared Node.js support floor to 22.19.0 to match upgraded runtime dependencies such as undici 8.
- 496ffdb: Surface the "prompt processing" state that previously looked frozen, and fix follow-up correctness gaps found during post-implementation review.
  - Harness: new LoopHooks.onStreamStart / onFirstStreamPart hooks wrap the agent.stream() call site so consumers driving turns through runAgentLoop can react to the prompt-processing latency gap. onFirstStreamPart receives the current stream part as its first argument (TextStreamPart<ToolSet>) so consumers can inspect part.type to filter framing chunks (start, text-start, …) from visible content. TextStreamPart is re-exported from the harness root for convenience. Docstring clarifies that the TUI has its own independent onStreamStart on AgentTUIConfig.
  - TUI: shows a "Processing..." loader during turn preparation and transitions to "Working..." once the LLM request is in flight. The startup token probe is now non-blocking (fire-and-forget) so the editor accepts input immediately; the context-usage footer starts from the estimated count and quietly upgrades to the real value. During a blocking compaction the foreground loader temporarily switches to "Compacting..." and restores the previous label when the block ends, so users see the actual reason for a long wait. text-start stream parts are now treated as visible, clearing the streaming loader as soon as the assistant view mounts (no more empty-view flicker).
  - Headless: emits a turn-start lifecycle annotation and a matching onStreamStart callback before each LLM request; the event is dropped from trajectory.json (transient UX signal, no step_id) so persisted consumers see identical output. The event fires exactly once per logical turn; overflow and no-output retries no longer re-emit it. New tests cover normal ordering, new-turn vs intermediate-step phases, retry single-emission, and non-persistence in trajectory.json.
  - Headless: the persisted schema_version is corrected from the internal ATIF-v1.6 label to the actual current Harbor spec version ATIF-v1.4 (https://www.harborframework.com/docs/agents/trajectory-format). Documentation across packages/headless/AGENTS.md, packages/headless/README.md, and packages/cea/benchmark/AGENTS.md now separates the internal JSONL streaming protocol (which carries lifecycle annotations such as approval, compaction, interrupt, turn-start) from the ATIF-v1.4 trajectory that TrajectoryCollector writes to disk.
  - Headless: StepMetrics gains the remaining ATIF-v1.4 optional fields (logprobs, prompt_token_ids, completion_token_ids) and TrajectoryJson.final_metrics now aggregates total_cost_usd. TrajectoryJson.extra is typed as a closed record of exactly the three ATIF persistence buckets (approval_events, compaction_events, interrupt_events); new lifecycle types must extend the interface explicitly so the Harbor persistence contract stays type-enforced.
  - CEA: the --atif CLI help text and the benchmark pipeline now reference ATIF-v1.4 (matching the corrected schema_version). The bundled packages/cea/benchmark/test_trajectory.py validator now calls Harbor's official TrajectoryValidator when harbor is importable and falls back to a stricter local shape check otherwise; it enforces per-step metric shapes and rejects bool values where ATIF requires a real number.
  - Addressed PR review feedback:
    - turn-start and onStreamStart now fire strictly after agent.stream() successfully returns, so stream-creation failures no longer produce a false "stream started" signal (reported by Gemini, Codex, and Cubic reviewers).
    - The background startup usage probe is serialized against per-turn probes by a generation token; a stale startup probe can no longer overwrite newer usage data and skew context-pressure metrics.
    - The blocking-compaction spinner swap only stashes the original foreground label on first entry and only restores it when the foreground loader is still live, eliminating both the "Compacting..." wording sticking after unblock and the "Processing..." spinner resurrecting after the first stream part arrived.
    - Restored the post-onSetup updateHeader() call that was accidentally dropped when the startup probe became non-blocking, so any header/footer state that onSetup initialises renders immediately instead of waiting for the first probe to resolve.
    - The bundled Python ATIF validator (test_trajectory.py) no longer accepts bool values where ATIF v1.4 requires a real number: isinstance(True, int) is True in Python, so the old check let invalid metric payloads slip through. Added _is_real_number / _is_real_int helpers that exclude bool.
    - Observer hooks (onStreamStart, onFirstStreamPart) no longer abort a valid stream when the callback throws. Errors are logged via console.error and swallowed in the harness loop, headless runner, and TUI session loop, with the contract documented on LoopHooks.
    - Repaired a regression where LoopHooks.onToolCall had silently dropped out of the public LoopHooks interface while still being destructured inside runAgentLoop. The field is restored to its original signature; consumers that already relied on it are unaffected, and the destructuring now type-checks again.
    - Corrected the LoopHooks.onFirstStreamPart signature as a pre-adoption fix (Cubic P2): the previous (context) => void shape promised in its docstring that consumers could filter on part type, but the callback never received the part. The signature now passes (part: TextStreamPart<ToolSet>, context) so consumers can actually inspect part.type. Zero existing consumers were found across the monorepo (the hook was introduced earlier in this PR), so this is a type-only correction with no runtime migration. New regression tests in loop.test.ts cover single-fire semantics, per-iteration firing, empty-stream skip, and observer-error isolation.
  - Pinned the ATIF v1.4 compliance contract in-source: trajectory-collector.ts, TrajectoryJson, AtifStep, TrajectoryEvent, collectTrajectoryEvent, and runHeadless now carry module/interface-level JSDoc spelling out the Harbor spec version, the allowed steps[*].source values, the extra.* persistence rule, and the stream-vs-snapshot boundary. packages/headless/AGENTS.md gains an "ATIF v1.4 COMPLIANCE" section listing the same invariants, and the atif-events.test.ts suite now declares itself as the executable compliance contract. These are docs-only, but they turn future spec drifts into obvious code-review red flags instead of silent regressions.
  - Review cycle 1 follow-ups (Oracle + Gemini + Codex + Cubic + CodeRabbit):
    - Guarded TrajectoryCollector.writeTo against persisting an invalid zero-step trajectory (Harbor's own validator rejects steps: []). The method now returns boolean: true when a file was written, false when the write was intentionally skipped to keep trajectory.json ATIF-v1.4 compliant.
    - Moved the TUI showLoader("Processing...") call inside the stream-turn try/finally so a thrown prepareMessages (or onBeforeTurn / usage probe / compaction check) no longer leaves the spinner stuck on screen.
    - Tightened the startup usage-probe guard: in addition to the generation token, measureUsageIfAvailable now captures messageHistory.getRevision() at call time and drops its result when the history has mutated mid-probe, preventing stale empty-message usage from overwriting per-turn measurements.
    - Narrowed TrajectoryJson.extra to the three canonical lifecycle buckets (approval_events, compaction_events, interrupt_events) by dropping the Record<string, unknown> intersection. New lifecycle types must now extend the interface explicitly, keeping the ATIF persistence contract type-enforced.
    - Hardened the Python validator: _is_real_number now rejects NaN, Infinity, and -Infinity (all of which json.loads will happily produce from non-strict JSON) via an explicit math.isfinite check.
    - Corrected documentation drift across packages/headless/AGENTS.md, packages/headless/README.md, packages/headless/src/types.ts, packages/headless/src/trajectory-collector.ts, and the root AGENTS.md: approval / compaction / interrupt are persisted under trajectory.extra.*, not JSONL-only; only turn-start and error are transient.
    - Regression test added for the writeTo zero-step guard: "does not write an invalid zero-step trajectory when the stream fails before any step".
  - Review cycle 2 follow-ups (Oracle re-audit):
    - Headless measureUsageIfAvailable now carries the same generation + revision guards the TUI already had. A slow background probe that resolves after a compaction or a newer per-turn probe no longer overwrites fresh usage data.
    - ATIF v1.4 step source contract aligned across code, Python validator, and benchmark docs: user, agent, and system are all permitted (Harbor v1.2+). Previous divergence between AtifStep.source and test_trajectory.py's valid_sources = {user, agent} is resolved.
    - Root README.md headless event list now includes turn-start and points at Harbor's ATIF-v1.4 schema for the persisted trajectory.
- f523de9: Bump outdated dependencies to their latest releases: @ai-sdk-tool/parser 4.1.21, vitest 4.1.5, and @mariozechner/pi-tui 0.68.1. Align the @ai-sdk-tool/tui peer range for @mariozechner/pi-tui to ^0.68.1 and update createAliasAwareAutocompleteProvider to the new async autocomp...
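The observer-hook contract described above (a throwing onStreamStart or onFirstStreamPart callback is logged and swallowed rather than aborting the stream) could look roughly like this. It is a sketch under assumptions: StreamPartSketch, ObserverHooksSketch, notifyObserver, and consumeWithObservers are hypothetical names, the parts are a plain array rather than a real stream, and the actual LoopHooks callbacks also receive a context argument that is omitted here.

```typescript
// Hypothetical sketch of observer-error isolation; not the harness loop itself.
type StreamPartSketch = { type: string };

interface ObserverHooksSketch {
  onStreamStart?: () => void | Promise<void>;
  onFirstStreamPart?: (part: StreamPartSketch) => void | Promise<void>;
}

type LogSketch = (message: string, error: unknown) => void;

// Run one observer callback; a thrown error or rejected promise is logged and dropped.
async function notifyObserver(
  name: string,
  call: (() => void | Promise<void>) | undefined,
  log: LogSketch,
): Promise<void> {
  if (!call) return;
  try {
    await call();
  } catch (error) {
    log(`observer hook ${name} threw; continuing`, error);
  }
}

async function consumeWithObservers(
  parts: StreamPartSketch[],
  hooks: ObserverHooksSketch,
  log: LogSketch = (message, error) => console.error(message, error),
): Promise<StreamPartSketch[]> {
  await notifyObserver("onStreamStart", hooks.onStreamStart, log);
  const seen: StreamPartSketch[] = [];
  let first = true;
  for (const part of parts) {
    if (first) {
      first = false;
      const onFirst = hooks.onFirstStreamPart;
      // The part is passed first so the observer can inspect part.type.
      await notifyObserver("onFirstStreamPart", onFirst ? () => onFirst(part) : undefined, log);
    }
    seen.push(part);
  }
  return seen;
}
```

The design choice is that observers are purely informational: their failures are reported but never change what the consumer of the stream receives. An empty parts array never triggers onFirstStreamPart, matching the "empty-stream skip" case named in the regression tests.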
@ai-sdk-tool/headless@3.1.1
Patch Changes
- 5bb3997: Update direct and transitive dependency resolutions across the monorepo, including AI SDK packages, tooling, TypeScript, and runtime adapters. Raise the declared Node.js support floor to 22.19.0 to match upgraded runtime dependencies such as undici 8.
- 496ffdb: Surface the "prompt processing" state that previously looked frozen, and fix follow-up correctness gaps found during post-implementation review.
  - Harness: new LoopHooks.onStreamStart / onFirstStreamPart hooks wrap the agent.stream() call site so consumers driving turns through runAgentLoop can react to the prompt-processing latency gap. onFirstStreamPart receives the current stream part as its first argument (TextStreamPart<ToolSet>) so consumers can inspect part.type to filter framing chunks (start, text-start, …) from visible content. TextStreamPart is re-exported from the harness root for convenience. Docstring clarifies that the TUI has its own independent onStreamStart on AgentTUIConfig.
  - TUI: shows a "Processing..." loader during turn preparation and transitions to "Working..." once the LLM request is in flight. The startup token probe is now non-blocking (fire-and-forget) so the editor accepts input immediately; the context-usage footer starts from the estimated count and quietly upgrades to the real value. During a blocking compaction the foreground loader temporarily switches to "Compacting..." and restores the previous label when the block ends, so users see the actual reason for a long wait. text-start stream parts are now treated as visible, clearing the streaming loader as soon as the assistant view mounts (no more empty-view flicker).
  - Headless: emits a turn-start lifecycle annotation and a matching onStreamStart callback before each LLM request; the event is dropped from trajectory.json (transient UX signal, no step_id) so persisted consumers see identical output. The event fires exactly once per logical turn; overflow and no-output retries no longer re-emit it. New tests cover normal ordering, new-turn vs intermediate-step phases, retry single-emission, and non-persistence in trajectory.json.
  - Headless: the persisted schema_version is corrected from the internal ATIF-v1.6 label to the actual current Harbor spec version ATIF-v1.4 (https://www.harborframework.com/docs/agents/trajectory-format). Documentation across packages/headless/AGENTS.md, packages/headless/README.md, and packages/cea/benchmark/AGENTS.md now separates the internal JSONL streaming protocol (which carries lifecycle annotations such as approval, compaction, interrupt, turn-start) from the ATIF-v1.4 trajectory that TrajectoryCollector writes to disk.
  - Headless: StepMetrics gains the remaining ATIF-v1.4 optional fields (logprobs, prompt_token_ids, completion_token_ids) and TrajectoryJson.final_metrics now aggregates total_cost_usd. TrajectoryJson.extra is typed as a closed record of exactly the three ATIF persistence buckets (approval_events, compaction_events, interrupt_events); new lifecycle types must extend the interface explicitly so the Harbor persistence contract stays type-enforced.
  - CEA: the --atif CLI help text and the benchmark pipeline now reference ATIF-v1.4 (matching the corrected schema_version). The bundled packages/cea/benchmark/test_trajectory.py validator now calls Harbor's official TrajectoryValidator when harbor is importable and falls back to a stricter local shape check otherwise; it enforces per-step metric shapes and rejects bool values where ATIF requires a real number.
  - Addressed PR review feedback:
    - turn-start and onStreamStart now fire strictly after agent.stream() successfully returns, so stream-creation failures no longer produce a false "stream started" signal (reported by Gemini, Codex, and Cubic reviewers).
    - The background startup usage probe is serialized against per-turn probes by a generation token; a stale startup probe can no longer overwrite newer usage data and skew context-pressure metrics.
    - The blocking-compaction spinner swap only stashes the original foreground label on first entry and only restores it when the foreground loader is still live, eliminating both the "Compacting..." wording sticking after unblock and the "Processing..." spinner resurrecting after the first stream part arrived.
    - Restored the post-onSetup updateHeader() call that was accidentally dropped when the startup probe became non-blocking, so any header/footer state that onSetup initialises renders immediately instead of waiting for the first probe to resolve.
    - The bundled Python ATIF validator (test_trajectory.py) no longer accepts bool values where ATIF v1.4 requires a real number: isinstance(True, int) is True in Python, so the old check let invalid metric payloads slip through. Added _is_real_number / _is_real_int helpers that exclude bool.
    - Observer hooks (onStreamStart, onFirstStreamPart) no longer abort a valid stream when the callback throws. Errors are logged via console.error and swallowed in the harness loop, headless runner, and TUI session loop, with the contract documented on LoopHooks.
    - Repaired a regression where LoopHooks.onToolCall had silently dropped out of the public LoopHooks interface while still being destructured inside runAgentLoop. The field is restored to its original signature; consumers that already relied on it are unaffected, and the destructuring now type-checks again.
    - Corrected the LoopHooks.onFirstStreamPart signature as a pre-adoption fix (Cubic P2): the previous (context) => void shape promised in its docstring that consumers could filter on part type, but the callback never received the part. The signature now passes (part: TextStreamPart<ToolSet>, context) so consumers can actually inspect part.type. Zero existing consumers were found across the monorepo (the hook was introduced earlier in this PR), so this is a type-only correction with no runtime migration. New regression tests in loop.test.ts cover single-fire semantics, per-iteration firing, empty-stream skip, and observer-error isolation.
  - Pinned the ATIF v1.4 compliance contract in-source: trajectory-collector.ts, TrajectoryJson, AtifStep, TrajectoryEvent, collectTrajectoryEvent, and runHeadless now carry module/interface-level JSDoc spelling out the Harbor spec version, the allowed steps[*].source values, the extra.* persistence rule, and the stream-vs-snapshot boundary. packages/headless/AGENTS.md gains an "ATIF v1.4 COMPLIANCE" section listing the same invariants, and the atif-events.test.ts suite now declares itself as the executable compliance contract. These are docs-only, but they turn future spec drifts into obvious code-review red flags instead of silent regressions.
  - Review cycle 1 follow-ups (Oracle + Gemini + Codex + Cubic + CodeRabbit):
    - Guarded TrajectoryCollector.writeTo against persisting an invalid zero-step trajectory (Harbor's own validator rejects steps: []). The method now returns boolean: true when a file was written, false when the write was intentionally skipped to keep trajectory.json ATIF-v1.4 compliant.
    - Moved the TUI showLoader("Processing...") call inside the stream-turn try/finally so a thrown prepareMessages (or onBeforeTurn / usage probe / compaction check) no longer leaves the spinner stuck on screen.
    - Tightened the startup usage-probe guard: in addition to the generation token, measureUsageIfAvailable now captures messageHistory.getRevision() at call time and drops its result when the history has mutated mid-probe, preventing stale empty-message usage from overwriting per-turn measurements.
    - Narrowed TrajectoryJson.extra to the three canonical lifecycle buckets (approval_events, compaction_events, interrupt_events) by dropping the Record<string, unknown> intersection. New lifecycle types must now extend the interface explicitly, keeping the ATIF persistence contract type-enforced.
    - Hardened the Python validator: _is_real_number now rejects NaN, Infinity, and -Infinity (all of which json.loads will happily produce from non-strict JSON) via an explicit math.isfinite check.
    - Corrected documentation drift across packages/headless/AGENTS.md, packages/headless/README.md, packages/headless/src/types.ts, packages/headless/src/trajectory-collector.ts, and the root AGENTS.md: approval / compaction / interrupt are persisted under trajectory.extra.*, not JSONL-only; only turn-start and error are transient.
    - Regression test added for the writeTo zero-step guard: "does not write an invalid zero-step trajectory when the stream fails before any step".
  - Review cycle 2 follow-ups (Oracle re-audit):
    - Headless measureUsageIfAvailable now carries the same generation + revision guards the TUI already had. A slow background probe that resolves after a compaction or a newer per-turn probe no longer overwrites fresh usage data.
    - ATIF v1.4 step source contract aligned across code, Python validator, and benchmark docs: user, agent, and system are all permitted (Harbor v1.2+). Previous divergence between AtifStep.source and test_trajectory.py's valid_sources = {user, agent} is resolved.
    - Root README.md headless event list now includes turn-start and points at Harbor's ATIF-v1.4 schema for the persisted trajectory.
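The generation + revision guards mentioned for measureUsageIfAvailable can be illustrated with a small sketch. The names here (HistorySketch, createGuardedUsageProbe, measure, apply) are hypothetical and the real function's shape is not shown in these notes; the sketch only demonstrates the two checks: a result is dropped if a newer probe has started since, or if the history revision changed while the measurement was in flight.

```typescript
// Hypothetical sketch of a stale-result guard for background usage probes.
interface HistorySketch {
  getRevision(): number;
}

function createGuardedUsageProbe(
  history: HistorySketch,
  measure: () => Promise<number>,
  apply: (tokens: number) => void,
) {
  let generation = 0;

  // Resolves to true when the measurement was applied, false when it was dropped as stale.
  return async function probe(): Promise<boolean> {
    const myGeneration = ++generation; // generation token for this call
    const revisionAtStart = history.getRevision(); // history state at call time
    const tokens = await measure();

    if (myGeneration !== generation) return false; // a newer probe has started since
    if (revisionAtStart !== history.getRevision()) return false; // history mutated mid-probe

    apply(tokens);
    return true;
  };
}
```

A slow startup probe that resolves after a per-turn probe has started fails the first check; a probe that straddles a compaction or a new message fails the second. In both cases the fresher data already applied is left untouched.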
@ai-sdk-tool/harness@1.3.1
Patch Changes
- e937dc7: Keep env-file loading compatible with Node 18, preserve URL validation for shared AI endpoints, and align CEA's default context limit with the shared AI configuration.
- 5bb3997: Update direct and transitive dependency resolutions across the monorepo, including AI SDK packages, tooling, TypeScript, and runtime adapters. Raise the declared Node.js support floor to 22.19.0 to match upgraded runtime dependencies such as undici 8.
- 8b1919c: Persist user-level agent preferences across sessions (e.g. /translate, /reasoning-mode, /tool-fallback in CEA; /reasoning in minimal-agent) so toggles set in the TUI survive process restarts.
  - Harness: new generic PreferencesStore<T> abstraction with FilePreferencesStore (single atomic JSON document), InMemoryPreferencesStore, LayeredPreferencesStore (configurable merge + write layer), and a shallowMergePreferences helper. Exposed from the package root and a new @ai-sdk-tool/harness/preferences subpath. Intentionally separate from SnapshotStore because preferences are app/user-scoped while snapshots are session-scoped.
  - Harness: new one-line helper createLayeredPreferences({ appName, validate }) that returns { store, userStore, workspaceStore, patch, paths } backed by ~/.${appName}/settings.json (user layer) and ./.${appName}/settings.json (workspace layer, write target). patch(partial) handles the load-merge-save flow on the workspace layer so consumers don't have to reimplement it. Paths, merge strategy, and validator are fully customizable, but the common case is now a single call.
  - Harness: new createTogglePreferenceCommand and createEnumPreferenceCommand factories that collapse the typical "parse args → validate → mutate runtime → persist" slash-command boilerplate into a declarative config object. The typed field: keyof T ensures persistence goes to the right preference key. Supports aliases, custom truthy/falsy words, custom parser, custom validator, and custom enabled/disabled messages.
  - CEA: createUserPreferencesStore() now delegates to createLayeredPreferences and also exposes the full harness bundle alongside the existing public fields. /translate is migrated to createTogglePreferenceCommand; the CEA-local createToggleCommand factory is deleted as dead code. /translate now awaits persistence (previously fire-and-forget), so the command response confirms the disk write.
  - CEA: /reasoning-mode and /tool-fallback continue to use the shared preferences-persistence singleton for now (they have domain-specific selectable-modes logic that is not a good fit for the generic factory yet). configurePreferencesPersistence now accepts an optional bundle argument so future migrations can reuse the harness factories.
  - CEA startup: persisted preferences are applied to AgentManager before CLI flags. Explicit CLI flags (--no-translate, --reasoning-mode on, --tool-fallback, --toolcall-mode) still win for the current process but no longer overwrite the persisted file; they are one-shot overrides only. resolveSharedConfig now accepts rawArgs so callers can distinguish explicit flags from defaults.
  - minimal-agent: gains a /reasoning <on|off> slash command that toggles providerOptions.openai.reasoningEffort via the onBeforeTurn hook and persists the value through createLayeredPreferences. The command definition is a single 10-line factory call thanks to createTogglePreferenceCommand. Header subtitle shows the live reasoning state. Preferences are stored at ~/.minimal-agent/settings.json and ./.minimal-agent/settings.json (separate from CEA so the two agents' defaults don't collide).
-
8b1919c: Harden the user-preferences persistence layer against failure and concurrency, discovered during manual QA of PR #119 and flagged by the Oracle reviewer.
- Harness: `FilePreferencesStore.clear()` now deletes the file with `rmSync` instead of writing `"{}"`. Previously `clear()` left an empty `{}` on disk, and un-validated stores would return `{}` from `load()` instead of `null`, breaking the contract shared with `InMemoryPreferencesStore`. Consumers using a validator were unaffected in practice because empty objects already validated to `null`.
- Harness: `createLayeredPreferences().patch()` now serializes calls through a promise queue. The previous implementation was a plain read-modify-write, so two concurrent `patch()` calls could both read the same stale workspace JSON and one update would be silently dropped. Concurrency tests cover both different-field and same-field writes.
- Harness: `createTogglePreferenceCommand` and `createEnumPreferenceCommand` now persist before mutating runtime. If disk persistence fails, the runtime state is never touched and the command returns `{ success: false, message: "Failed to persist …" }`. If the runtime `set()` throws after a successful persist, the disk write is rolled back to the previous value. This eliminates the class of "disk and runtime disagree" bugs across all factory-backed commands.
- CEA: `patchWorkspacePreferences` now queues writes per-store via a `WeakMap`-keyed promise chain, matching the harness bundle semantics. This protects the shim path used by `/reasoning-mode` and `/tool-fallback` from the same lost-update race that previously affected the factory path.
- CEA: `applyPersistedPreferencesToAgentManager` and `applySharedConfigToAgentManager` are extracted from `main.ts` into `packages/cea/src/entrypoints/preferences-startup.ts`. They are now pure functions taking the agent manager and store as arguments, so startup wiring is testable in isolation. Existing `main.ts` call sites are updated to pass dependencies explicitly.
- CEA: minimal-agent gains a `vitest` devDep, a `test` script, and `vitest.config.ts`, and excludes `*.test.ts`/`vitest.config.ts` from the TypeScript build output. `citty` was a dead devDependency and has been removed.
- Tests added:
  - Harness `preferences-store.test.ts`: `clear()` ⇒ `load()` null, `clear()` removes file on disk, `clear()` no-op on missing file, cross-instance restart, three concurrent-patch scenarios (different fields, same field, queue recovery after failure).
  - Harness `preference-commands.test.ts`: persist-first contract (toggle + enum commands do NOT mutate runtime when persistence fails, toggle command rolls back disk when runtime set throws, toggle command returns success only when BOTH disk and runtime succeed).
  - CEA `user-preferences.test.ts`: concurrent `patchWorkspacePreferences` preserves all fields, same-field is last-writer-wins, save errors bubble to caller.
  - CEA `preferences-persistence.test.ts` (new): singleton lifecycle, custom `onError` handler fires on save failure, concurrent `persistPreferencePatch` preserves all fields.
  - CEA `reasoning-mode.test.ts` (new): 7 tests covering status report, valid transitions, invalid mode rejection, persistence, sibling field preservation, no-op "already using", graceful behavior without persistence configured.
  - CEA `tool-fallback.test.ts` (new): 7 tests in the same shape.
  - CEA `preferences-startup.test.ts` (new): integration tests for `applyPersistedPreferencesToAgentManager` and `applySharedConfigToAgentManager` covering all three fields, CLI-vs-persisted precedence, and the invariant that CLI flags never write to the store.
  - minimal-agent `preferences.test.ts` (new): 7 tests covering path expectations, schema validation, round-trip, and the `/reasoning` onBeforeTurn integration (toggle → closure mutation → providerOptions change).
- Test counts:
  - Harness: 673 → 699 (+26)
  - CEA: 526 → 563 (+37)
  - minimal-agent: 0 → 7 (+7)
  - Total workspace: 1283 → 1353 (+70)
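The lost-update race fixed in `patch()` and `patchWorkspacePreferences` can be reproduced, and avoided, with a small promise-chain queue. The sketch below is a generic illustration of that technique, not the harness code: the in-memory `stored` object stands in for the workspace JSON file, and `patchUnsafe`, `patchQueued`, and `demo` are hypothetical names.

```typescript
type Prefs = Record<string, unknown>;

// Stand-in for the workspace settings file; real code would do file I/O.
let stored: Prefs = {};

const delay = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function load(): Promise<Prefs> {
  await delay(5); // simulated I/O latency so overlapping calls interleave
  return { ...stored };
}

async function save(next: Prefs): Promise<void> {
  await delay(5);
  stored = next;
}

// Plain read-modify-write: concurrent callers may read the same snapshot.
async function patchUnsafe(partial: Prefs): Promise<void> {
  const current = await load();
  await save({ ...current, ...partial });
}

// Serialized variant: each patch starts only after the previous one settles.
let queue: Promise<void> = Promise.resolve();

function patchQueued(partial: Prefs): Promise<void> {
  const run = queue.then(async () => {
    const current = await load();
    await save({ ...current, ...partial });
  });
  // Keep the chain alive after a failure; the caller still sees it via `run`.
  queue = run.catch(() => undefined);
  return run;
}

async function demo(): Promise<{ unsafe: Prefs; queued: Prefs }> {
  stored = {};
  await Promise.all([
    patchUnsafe({ translate: true }),
    patchUnsafe({ reasoning: "on" }),
  ]);
  const unsafe = { ...stored };

  stored = {};
  await Promise.all([
    patchQueued({ translate: true }),
    patchQueued({ reasoning: "on" }),
  ]);
  const queued = { ...stored };
  return { unsafe, queued };
}
```

Calling `demo()` shows the difference: the unserialized run keeps only one of the two fields, while the queued run keeps both. Catching on the chain rather than on `run` is what allows the queue to recover after a failed save.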
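The persist-first contract for the preference command factories (persist, then mutate runtime, roll back the disk value if the runtime setter throws) can be sketched as below. This is a simplified stand-in rather than the factory implementation: `ToggleDeps`, `runToggle`, and the message strings are hypothetical, and the real failure message is elided in the notes above.

```typescript
interface CommandResult {
  success: boolean;
  message: string;
}

// Hypothetical dependencies; the real factories take a declarative config.
interface ToggleDeps {
  loadPersisted: () => Promise<boolean | undefined>;
  persist: (value: boolean | undefined) => Promise<void>;
  setRuntime: (value: boolean) => void;
}

async function runToggle(deps: ToggleDeps, next: boolean): Promise<CommandResult> {
  // Remember the old value so a later runtime failure can be undone on disk.
  const previous = await deps.loadPersisted();
  try {
    await deps.persist(next);
  } catch {
    // Disk failed: runtime state is never touched.
    return { success: false, message: "Failed to persist preference" };
  }
  try {
    deps.setRuntime(next);
  } catch {
    // Runtime failed after a successful write: restore the previous value.
    await deps.persist(previous);
    return { success: false, message: "Failed to apply preference" };
  }
  return { success: true, message: `Preference set to ${next ? "on" : "off"}` };
}

async function demo() {
  const state = { disk: false as boolean | undefined, runtime: false };

  const diskFailure = await runToggle(
    {
      loadPersisted: async () => state.disk,
      persist: async () => {
        throw new Error("disk full");
      },
      setRuntime: (value) => {
        state.runtime = value;
      },
    },
    true,
  );

  const runtimeFailure = await runToggle(
    {
      loadPersisted: async () => state.disk,
      persist: async (value) => {
        state.disk = value;
      },
      setRuntime: () => {
        throw new Error("runtime rejected value");
      },
    },
    true,
  );

  return { diskFailure, runtimeFailure, state };
}
```

In both failure paths of `demo()` the disk and runtime values end up equal to their starting state, which is the invariant the changelog entry describes.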
-
496ffdb: Surface the "prompt processing" state that previously looked frozen, and fix follow-up correctness gaps found during post-implementation review.
- Harness: new `LoopHooks.onStreamStart`/`onFirstStreamPart` hooks wrap the `agent.stream()` call site so consumers driving turns through `runAgentLoop` can react to the prompt-processing latency gap. `onFirstStreamPart` receives the current stream part as its first argument (`TextStreamPart<ToolSet>`) so consumers can inspect `part.type` to filter framing chunks (`start`, `text-start`, …) from visible content. `TextStreamPart` is re-exported from the harness root for convenience. Docstring clarifies that the TUI has its own independent `onStreamStart` on `AgentTUIConfig`.
- TUI: shows a `Processing...` loader during turn preparation and transitions to `Working...` once the LLM request is in flight. The startup token probe is now non-blocking (fire-and-forget) so the editor accepts input immediately; the context-usage footer starts from the estimated count and quietly upgrades to the real value. During a blocking compaction the foreground loader temporarily switches to `Compacting...` and restores the previous label when the block ends, so users see the actual reason for a long wait. `text-start` stream parts are now treated as visible, clearing the streaming loader as soon as the assistant view mounts (no more empty-view flicker).
- Headless: emits a `turn-start` lifecycle annotation and a matching `onStreamStart` callback before each LLM request; the event is dropped from `trajectory.json` (transient UX signal, no `step_id`) so persisted consumers see identical output. The event fires exactly once per logical turn: overflow and no-output retries no longer re-emit it. New tests cover normal ordering, `new-turn` vs `intermediate-step` phases, retry single-emission, and non-persistence in `trajectory.json`.
- Headless: the persisted `schema_version` is corrected from the internal `ATIF-v1.6` label to the actual current Harbor spec version `ATIF-v1.4` (<https://www.harborframework.com/docs/agents/traject...
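A consumer of the stream hooks might use `part.type` to decide when a loader should be cleared, along the lines of the sketch below. It uses a local `StreamPartLike` stand-in instead of the real `TextStreamPart<ToolSet>`, and `FRAMING_TYPES`, `isVisiblePart`, and `createLoaderTracker` are hypothetical names; which part types count as framing is an assumption left to the consumer, and the sketch does not assume how often the hook is invoked.

```typescript
// Local stand-in; the real hook receives TextStreamPart<ToolSet>.
interface StreamPartLike {
  type: string;
}

// Assumed set of framing chunk types for this example only.
const FRAMING_TYPES = new Set<string>(["start", "text-start"]);

function isVisiblePart(part: StreamPartLike): boolean {
  return !FRAMING_TYPES.has(part.type);
}

type LoaderLabel = "Processing..." | "Working..." | "idle";

// Tracks which loader label should be shown while a turn is in progress.
function createLoaderTracker() {
  let label: LoaderLabel = "Processing...";
  return {
    // The request is now in flight.
    onStreamStart(): void {
      label = "Working...";
    },
    // Clear the loader only when the part carries visible content.
    onFirstStreamPart(part: StreamPartLike): void {
      if (isVisiblePart(part)) label = "idle";
    },
    current(): LoaderLabel {
      return label;
    },
  };
}

const tracker = createLoaderTracker();
tracker.onStreamStart();
tracker.onFirstStreamPart({ type: "start" }); // framing chunk: loader stays
const afterFraming = tracker.current();
tracker.onFirstStreamPart({ type: "text-delta" }); // visible content: loader clears
const afterContent = tracker.current();
```

A UI that wants the TUI behavior described above, where `text-start` already counts as visible, would simply leave that type out of the framing set.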
plugsuits@2.3.4
Patch Changes
- Updated dependencies [a714664]
- @ai-sdk-tool/harness@1.3.0
- @ai-sdk-tool/tui@3.1.0
- @ai-sdk-tool/headless@3.1.0
@ai-sdk-tool/tui@3.1.0
Minor Changes
- a714664: Add `defineAgent`, `createAgentRuntime`, and `AgentSession` runtime layer to harness. Add `runAgentSessionTUI` and `runAgentSessionHeadless` session adapter helpers to tui and headless. Remove deprecated `SessionStore`, `CheckpointHistory.fromSession()`, and legacy token field aliases (`completionTokens`, `promptTokens`).
@ai-sdk-tool/headless@3.1.0
Minor Changes
- a714664: Add `defineAgent`, `createAgentRuntime`, and `AgentSession` runtime layer to harness. Add `runAgentSessionTUI` and `runAgentSessionHeadless` session adapter helpers to tui and headless. Remove deprecated `SessionStore`, `CheckpointHistory.fromSession()`, and legacy token field aliases (`completionTokens`, `promptTokens`).
@ai-sdk-tool/harness@1.3.0
Minor Changes
- a714664: Add `defineAgent`, `createAgentRuntime`, and `AgentSession` runtime layer to harness. Add `runAgentSessionTUI` and `runAgentSessionHeadless` session adapter helpers to tui and headless. Remove deprecated `SessionStore`, `CheckpointHistory.fromSession()`, and legacy token field aliases (`completionTokens`, `promptTokens`).
plugsuits@2.3.3
Patch Changes
- 5e0768c: Fix review issues: runAgentLoop message retention, isContextOverflowError call sites, setTimeout leak, CEA token estimation, session history separation, per-thread memory tracking, vi.mock hoisting, AgentError export, and lint cleanup
- Updated dependencies [5e0768c]
- @ai-sdk-tool/harness@1.2.4
- @ai-sdk-tool/tui@3.0.2
- @ai-sdk-tool/headless@3.0.3