feat: verifiable sandboxed bash tool (just-bash + ledger exec) #1
Open
New1Direction wants to merge 6 commits into
A persistent just-bash Node sidecar (in-memory FS, no host or network access) gives the agent a real shell that physically cannot touch the host. Each exec returns fs_hash, a hash of the full virtual-FS state, and KorgChat chains every tool call into the korg ledger, so the shell session is tamper-evident and replayable (the same commands run in a fresh sandbox reproduce the same hashes).
- sandbox/sidecar.mjs: stdio JSON-RPC just-bash shell + deterministic FS hash
- korgchat.sandbox: SandboxClient, bash_tool(), tools_with_sandbox()
- --sandbox flag wires it into the CLI session
- tests: persistence, determinism/replay, host isolation, end-to-end ledger recording of the exec with fs_hash (skips cleanly without node)
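For reviewers, a minimal Python sketch of what a "deterministic FS hash" means here. The real hash is computed in JavaScript inside `sandbox/sidecar.mjs`; this version assumes the snapshot is a plain `{path: bytes}` mapping and hashes only paths and contents, so the helper name `fs_hash` and its input shape are illustrative rather than the sidecar's actual code.

```python
import hashlib


def fs_hash(files: dict[str, bytes]) -> str:
    """Order-independent SHA-256 over a snapshot of {path: contents}.

    Assumption: only paths and file bytes are hashed. Timestamps are left
    out on purpose, since they would differ between two replays.
    """
    h = hashlib.sha256()
    for path in sorted(files):
        encoded_path = path.encode("utf-8")
        data = files[path]
        # Length-prefix each field so {"ab": b"c"} and {"a": b"bc"} differ.
        h.update(len(encoded_path).to_bytes(8, "big"))
        h.update(encoded_path)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


# Same files written in a different order produce the same hash.
first = fs_hash({"/work/a.txt": b"hello\n", "/work/b.txt": b"x"})
second = fs_hash({"/work/b.txt": b"x", "/work/a.txt": b"hello\n"})
print(first == second)  # True
```

Sorting the paths is what makes replay work: two sandboxes that end in the same state hash identically regardless of the order in which files were created.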
The sandboxed bash tool can be constrained to a command allowlist, enforced
physically (just-bash registers only allowed commands) AND as a pre-exec AST
verdict that fails closed on denied or dynamically-named commands ($CMD).
Each call carries {decision, reasons, commands_used, mandate_hash} recorded to
the ledger — so what the agent was allowed to run is itself provable.
- sidecar: configure op + parse()-based command extraction + verdict
- shell_mandate(), SandboxClient(mandate=)/.configure(), tools_with_sandbox(mandate=)
- --mandate-allow CLI flag
- tests: allow/deny gating, file-untouched-on-reject, dynamic fail-closed,
verdict recorded in the ledger
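A rough Python sketch of the fail-closed verdict shape. It assumes a naive split on shell control operators instead of the just-bash `parse()` AST the sidecar actually uses, and the `"allow"`/`"deny"` decision strings and the `mandate_hash()` canonicalization are hypothetical. Anything the splitter cannot reason about (command substitution, dynamic names like `$CMD`) is denied rather than guessed at.

```python
import hashlib
import json
import re

# Simplification: split on shell control operators and treat the first word
# of each segment as the command. The sidecar uses just-bash's parse() AST.
_SEGMENTS = re.compile(r"\|\||&&|[;|&\n]")
# Constructs this sketch cannot analyze; they are denied (fail closed).
_OPAQUE = ("$(", "`", "<(", ">(")


def mandate_hash(allow: list[str]) -> str:
    canonical = json.dumps({"allow": sorted(set(allow))}, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verdict(script: str, allow: list[str]) -> dict:
    commands: list[str] = []
    reasons: list[str] = []
    for token in _OPAQUE:
        if token in script:
            reasons.append(f"unanalyzable construct {token!r}")
    for segment in _SEGMENTS.split(script):
        words = segment.split()
        if not words:
            continue
        name = words[0]
        commands.append(name)
        if "$" in name:
            reasons.append(f"dynamic command name {name!r}")
        elif name not in allow:
            reasons.append(f"command {name!r} not in allowlist")
    return {
        "decision": "deny" if reasons else "allow",
        "reasons": reasons,
        "commands_used": commands,
        "mandate_hash": mandate_hash(allow),
    }


print(verdict("ls /tmp | grep log", ["ls", "grep"])["decision"])  # allow
```

The naive splitter errs toward denial (for example `2>&1` is split at `&`), which is the safe direction for an allowlist; an AST-based extractor avoids those false denials.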
A `pay` tool that authorizes payments through the owned goldseel mandate-enforcement model (Modal, serverless). A deterministic spend-cap floor runs first (short-circuiting the pay-per-call model on over-cap payments); then goldseel judges the payment against the authorized intent, yielding a 3-way decision: ACCEPT / REJECT / ESCALATE (an unreachable model defers to a human and never auto-approves). The decision, verdict, and mandate hash are recorded to the ledger.
- korgchat.gate: GoldseelGate (injectable), payment_mandate(), goldseel_pay_tool()
- tests: offline with a fake judge + opt-in live test (KORGCHAT_GOLDSEEL_LIVE=1)
Dogfood note: the wiring is correct and proven live; the currently deployed goldseel over-rejects valid payments (a model-quality / retrain issue, separate from this integration; the gate is model-agnostic via GOLDSEEL_URL).
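The ordering of the gate (cheap deterministic floor, then the paid model, with every failure path escalating) can be sketched in a few lines of Python. `Payment`, `decide()` and the `"approve"`/`"reject"`/`"escalate"` judge strings are hypothetical names for illustration, not the `GoldseelGate` API; the judge is injected, mirroring how the tests use a fake judge.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable

ACCEPT, REJECT, ESCALATE = "ACCEPT", "REJECT", "ESCALATE"


@dataclass(frozen=True)
class Payment:
    recipient: str
    amount: Decimal  # dollars
    intent: str


# Hypothetical judge contract: returns "approve", "reject" or "escalate",
# and raises OSError (e.g. ConnectionError, TimeoutError) when unreachable.
Judge = Callable[[Payment], str]

_VERDICTS = {"approve": ACCEPT, "reject": REJECT, "escalate": ESCALATE}


def decide(payment: Payment, spend_cap: Decimal, judge: Judge) -> dict:
    # 1. Deterministic floor: over-cap payments never reach the paid model.
    if payment.amount > spend_cap:
        return {"decision": REJECT, "floor": "over_cap", "verdict": None}
    # 2. Model judgment; an unreachable model defers to a human.
    try:
        verdict = judge(payment)
    except OSError:
        return {"decision": ESCALATE, "floor": "ok", "verdict": "unreachable"}
    # 3. Anything unrecognized also escalates: never auto-approve.
    return {"decision": _VERDICTS.get(verdict, ESCALATE), "floor": "ok", "verdict": verdict}
```

Putting the floor first means an over-cap request costs nothing and cannot be talked past the cap by a permissive model.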
korgchat.ontology: a controlled vocabulary of recipient categories (synonyms + is-a hierarchy) + a vendor registry. The pay tool resolves known recipients deterministically (ALLOW/DENY, no model call) and only sends genuine unknowns to goldseel, making the ml-inference != ai-inference false-reject structurally impossible. It compounds: learn() writes newly classified recipients back to the registry (optionally persisted), so the known set grows monotonically (fewer model calls and more consistent decisions over time: a data network effect).
- payment_mandate() gains allow_classes/deny_classes (default: deny prohibited)
- pay result records decided_by (ontology vs goldseel), floor verdict, learned
- tests: synonyms/hierarchy, ALLOW/DENY/UNKNOWN, learn+persist, and the pay-tool paths (allow bypasses a reject-happy model; deny bypasses an approve-happy model; unknown consults the model; compounding)
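A minimal sketch of deterministic recipient resolution with synonyms, an is-a hierarchy and a monotonic `learn()`. The `Ontology` class, its fields, and the deny-wins-over-allow rule are assumptions made for this illustration; the example registry reuses the ml-inference / ai-inference case from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    synonyms: dict[str, str]  # alias -> canonical class (lowercase keys)
    parents: dict[str, str]   # canonical class -> parent class (is-a)
    registry: dict[str, str] = field(default_factory=dict)  # recipient -> class

    def canonical(self, label: str) -> str:
        key = label.strip().lower()
        return self.synonyms.get(key, key)

    def lineage(self, cls):
        """The class plus all of its is-a ancestors (cycle-safe)."""
        seen = set()
        while cls is not None and cls not in seen:
            seen.add(cls)
            cls = self.parents.get(cls)
        return seen

    def resolve(self, recipient: str, allow: list[str], deny: list[str]) -> str:
        cls = self.registry.get(recipient)
        if cls is None:
            return "UNKNOWN"  # genuine unknown: only these go to the model
        classes = self.lineage(self.canonical(cls))
        if classes & {self.canonical(c) for c in deny}:
            return "DENY"  # assumption: deny wins over allow
        if classes & {self.canonical(c) for c in allow}:
            return "ALLOW"
        return "UNKNOWN"

    def learn(self, recipient: str, cls: str) -> None:
        # setdefault never overwrites, so the known set only grows.
        self.registry.setdefault(recipient, self.canonical(cls))


onto = Ontology(
    synonyms={"ml-inference": "ai-inference"},
    parents={"ai-inference": "compute", "gambling": "prohibited"},
    registry={"modal.com": "ml-inference"},
)
print(onto.resolve("modal.com", ["ai-inference"], ["prohibited"]))  # ALLOW
```

Because "ml-inference" is canonicalized before the allowlist check, the synonym can no longer be misread as a different category, and no model call is needed.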
goldseel was trained on dollar amounts (e.g. "12.00") + a positive use-counter. The pay tool sent amount_usdc in on-chain micros (12 -> 12000000) and a null counter, so the model saw every $12 payment as $12M over cap (or counter exhausted) and rejected everything. Send the dollar view + a positive counter to the model; the canonical redemption (micros) is unchanged for settlement. This was the real cause of goldseel's 'harshness' (compounded by an old deployed checkpoint). With v0.3.2 @ Q8 + this format, the two-sided benchmark goes 0 false-approve / 0 false-reject (was 0/6 approve).
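The fix boils down to converting the canonical micros into the dollar string the model saw in training, while settlement keeps using micros. A sketch, assuming USDC's 6 decimals (consistent with 12 -> 12000000 above); `to_dollars()`, `model_view()` and the `amount`/`uses_remaining` field names are hypothetical.

```python
from decimal import Decimal

MICROS_PER_DOLLAR = 1_000_000  # USDC has 6 decimal places


def to_dollars(amount_micros: int) -> str:
    """On-chain micros -> the dollar string the model was trained on."""
    return f"{Decimal(amount_micros) / MICROS_PER_DOLLAR:.2f}"


def model_view(amount_micros: int, uses_remaining: int) -> dict:
    # Hypothetical field names; the real goldseel input schema is not in this PR.
    if uses_remaining < 1:
        raise ValueError("the model expects a positive use-counter")
    return {"amount": to_dollars(amount_micros), "uses_remaining": uses_remaining}


redemption = {"amount_usdc": 12_000_000}  # canonical settlement value, unchanged
print(to_dollars(redemption["amount_usdc"]))  # 12.00
```

Using `Decimal` avoids float artifacts when dividing micros, so the string the model sees matches the training format exactly.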
The second compounding loop. When the pay gate ESCALATEs (goldseel defers or is unreachable), the case is logged; a human resolves it; resolved escalations export in goldseel's training format and feed the next retrain, so the cases the model couldn't handle become the ones it learns from. Ontology compounds knowledge; this compounds judgment.
- korgchat.escalation: EscalationLog (record/pending/resolve/export/persist)
- goldseel_pay_tool(escalation_log=...) logs on ESCALATE, returns escalation_id
- GoldseelGate now recognizes the model's 'escalate' verdict (previously mapped to skip)
- tests: record/resolve/export/persist, idempotency, pay-tool wiring, unreachable
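A minimal in-memory sketch of the record/pending/resolve/export loop. The content-addressed escalation id is one possible way to get idempotent recording (an assumption, not necessarily what `korgchat.escalation` does), and the exported `{"input", "label"}` record shape stands in for goldseel's training format, which this PR does not spell out. Persistence is omitted.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class EscalationLog:
    cases: dict[str, dict] = field(default_factory=dict)

    def record(self, payment: dict, reason: str) -> str:
        # Content-addressed id: logging the same case twice keeps one entry.
        key = json.dumps(payment, sort_keys=True, separators=(",", ":"))
        esc_id = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
        self.cases.setdefault(esc_id, {"payment": payment, "reason": reason, "label": None})
        return esc_id

    def pending(self) -> list[str]:
        return [i for i, c in self.cases.items() if c["label"] is None]

    def resolve(self, esc_id: str, label: str) -> None:
        if label not in ("ACCEPT", "REJECT"):
            raise ValueError("a human resolution must be ACCEPT or REJECT")
        self.cases[esc_id]["label"] = label

    def export(self) -> list[dict]:
        # Hypothetical training-record shape; goldseel's real format isn't shown.
        return [{"input": c["payment"], "label": c["label"]}
                for c in self.cases.values() if c["label"] is not None]
```

Only resolved cases are exported, so unreviewed escalations never leak into a retrain as unlabeled data.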
What
Adds a sandboxed `bash` tool to KorgChat, backed by a persistent just-bash Node sidecar: a JS reimplementation of bash + ~90 coreutils over an in-memory filesystem. The shell physically cannot reach the host filesystem or network (network, python, and js are not enabled).
Why it matters
Every `exec` returns `fs_hash`, a hash of the full virtual-FS state after the command. Because KorgChat already hash-chains each tool call into the korg ledger, embedding `fs_hash` makes the agent's shell session tamper-evident and replayable: the same commands run in a fresh sandbox reproduce the same hashes. Verified end-to-end: the exec lands in the ledger between the LLM call and its validation, with the chain intact.
Surface
- `sandbox/sidecar.mjs`: stdio JSON-RPC just-bash shell + deterministic FS-state hash
- `korgchat.sandbox`: `SandboxClient`, `bash_tool()`, `tools_with_sandbox()`
- `--sandbox` CLI flag wires the tool into the session
- `node_modules` gitignored; `sandbox/` carries `package.json` + lockfile (one-time `npm install`)
Test plan
- `tests/test_sandbox.py`: FS persistence across calls, deterministic-replay hash, host-FS isolation, and end-to-end ledger recording of the exec with `fs_hash` + chain continuity
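The chain-continuity check can be illustrated with a small hash chain in Python: each entry commits to the previous entry's hash, so editing a recorded `fs_hash` after the fact breaks verification. This assumes SHA-256 over canonical JSON and a zero genesis hash; `append()`, `verify()` and the record fields are illustrative, not the korg ledger's actual schema.

```python
import hashlib
import json

GENESIS = "0" * 64


def entry_hash(prev_hash: str, record: dict) -> str:
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger: list[dict], record: dict) -> None:
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(ledger: list[dict]) -> bool:
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


ledger: list[dict] = []
append(ledger, {"kind": "llm_call"})
append(ledger, {"kind": "tool", "name": "bash", "cmd": "echo hi > a.txt", "fs_hash": "ab" * 32})
append(ledger, {"kind": "validation", "ok": True})
print(verify(ledger))  # True
ledger[1]["record"]["fs_hash"] = "cd" * 32  # tamper with the recorded FS state
print(verify(ledger))  # False
```

The tool entry sits between the LLM call and its validation, matching the ordering described above; replay then amounts to re-running the recorded commands in a fresh sandbox and comparing each `fs_hash`.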