feat: verifiable sandboxed bash tool (just-bash + ledger exec)#1

Open
New1Direction wants to merge 6 commits into main from feat/sandbox-bash-tool

Conversation

@New1Direction

Owner

What

Adds a sandboxed bash tool to KorgChat, backed by a persistent just-bash Node sidecar: a JS reimplementation of bash plus ~90 coreutils over an in-memory filesystem. The shell cannot reach the host filesystem or network, because the network, Python, and JS features are not enabled.

Why it matters

Every exec returns fs_hash, a hash of the full virtual-FS state after the command. Because KorgChat already hash-chains each tool call into the korg ledger, embedding fs_hash makes the agent's shell session tamper-evident and replayable: the same commands run in a fresh sandbox reproduce the same hashes. Verified end-to-end: the exec entry lands in the ledger between the LLM call and its validation, with the chain intact.
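
As a rough illustration of the replay property, the sketch below hashes a toy path-to-bytes snapshot in a canonical order. It is not the sidecar's actual algorithm (that lives in sandbox/sidecar.mjs and is not shown here); `fs_hash` as a Python function, `replay`, and the snapshot shape are assumptions made for this example.

```python
import hashlib


def fs_hash(files: dict[str, bytes]) -> str:
    """Hash a {path: content} snapshot so that insertion order does not matter.

    Hypothetical stand-in for the sidecar's FS-state hash.
    """
    h = hashlib.sha256()
    for path in sorted(files):  # canonical order: sorted paths
        name = path.encode("utf-8")
        data = files[path]
        # Length-prefix each field so {"ab": b"c"} and {"a": b"bc"} differ.
        h.update(len(name).to_bytes(8, "big"))
        h.update(name)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def replay(writes: list[tuple[str, bytes]]) -> list[str]:
    """Apply (path, content) writes to a fresh toy FS, hashing after each one."""
    fs: dict[str, bytes] = {}
    hashes = []
    for path, content in writes:
        fs[path] = content
        hashes.append(fs_hash(fs))
    return hashes
```

Running `replay` twice on the same list of writes yields identical hash sequences, which is the property the ledger relies on.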

Surface

  • sandbox/sidecar.mjs — stdio JSON-RPC just-bash shell + deterministic FS-state hash
  • korgchat.sandbox: SandboxClient, bash_tool(), tools_with_sandbox()
  • --sandbox CLI flag wires the tool into the session
  • node_modules gitignored; sandbox/ carries package.json + lockfile (one-time npm install)

Test plan

  • tests/test_sandbox.py: FS persistence across calls, deterministic-replay hash, host-FS isolation, and end-to-end ledger recording of the exec with fs_hash + chain continuity
  • Tests skip cleanly when Node/just-bash isn't present (suite stays green on a bare checkout)
  • Full suite: 162 passed, 14 skipped locally
  • CI green

A persistent just-bash Node sidecar (in-memory FS, no host/network access)
gives the agent a real shell that physically cannot touch the host. Each exec
returns fs_hash — a hash of the full virtual-FS state — and KorgChat chains
every tool call into the korg ledger, so the shell session is tamper-evident
and replayable (same commands from a fresh sandbox reproduce the same hashes).

- sandbox/sidecar.mjs: stdio JSON-RPC just-bash shell + deterministic FS hash
- korgchat.sandbox: SandboxClient, bash_tool(), tools_with_sandbox()
- --sandbox flag wires it into the CLI session
- tests: persistence, determinism/replay, host isolation, end-to-end ledger
  recording of the exec with fs_hash (skips cleanly without node)
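
To make "tamper-evident" concrete, here is a minimal hash-chain sketch in which each entry commits to the previous entry's hash. The entry layout, `GENESIS`, and the helper names are assumptions for illustration only; the real korg ledger format is not described in this PR.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical anchor for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable.
    body = json.dumps({"prev": prev_hash, "payload": payload},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(ledger: list, payload: dict) -> None:
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev, "payload": payload,
                   "hash": entry_hash(prev, payload)})


def verify(ledger: list) -> bool:
    """Recompute every link; any edited payload breaks the chain."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Because the exec entry's payload carries fs_hash, changing the recorded filesystem state after the fact changes that entry's hash and invalidates every later link.
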

The sandboxed bash tool can be constrained to a command allowlist. The allowlist
is enforced twice: physically (just-bash registers only the allowed commands)
and as a pre-exec AST verdict that fails closed on denied or dynamically named
commands ($CMD). Each call carries {decision, reasons, commands_used,
mandate_hash}, recorded to the ledger, so what the agent was allowed to run is
itself provable.

- sidecar: configure op + parse()-based command extraction + verdict
- shell_mandate(), SandboxClient(mandate=)/.configure(), tools_with_sandbox(mandate=)
- --mandate-allow CLI flag
- tests: allow/deny gating, file-untouched-on-reject, dynamic fail-closed,
  verdict recorded in the ledger
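
The fail-closed verdict can be sketched in Python as follows. This is a crude approximation: the sidecar uses just-bash's parse() to walk a real AST, whereas this example tokenizes with `shlex`, so it is only a stand-in. The decision strings ("allow"/"deny"), the reason wording, and the way `mandate_hash` is computed are assumptions.

```python
import hashlib
import json
import shlex

PUNCT = "();<>|&"


def mandate_hash(allow: set) -> str:
    # Hypothetical: hash of the sorted allowlist.
    return hashlib.sha256(json.dumps(sorted(allow)).encode("utf-8")).hexdigest()


def command_names(script: str) -> list:
    """Return the word in command position of each simple command (approximate)."""
    lexer = shlex.shlex(script, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    names, at_start, skip_next = [], True, False
    for tok in lexer:
        if tok and all(c in PUNCT for c in tok):
            if "<" in tok or ">" in tok:
                skip_next = True          # next word is a redirect target
            else:
                at_start, skip_next = True, False  # ; | && || ( ) start a command
            continue
        if skip_next:
            skip_next = False
            continue
        if at_start:
            names.append(tok)
            at_start = False
    return names


def verdict(script: str, allow: set) -> dict:
    reasons = []
    try:
        used = command_names(script)
    except ValueError as exc:  # e.g. unbalanced quotes: refuse rather than guess
        used, reasons = [], [f"unparseable script: {exc}"]
    for name in used:
        if "$" in name or "`" in name:
            reasons.append(f"dynamic command name: {name}")
        elif name not in allow:
            reasons.append(f"command not allowed: {name}")
    return {"decision": "deny" if reasons else "allow", "reasons": reasons,
            "commands_used": used, "mandate_hash": mandate_hash(allow)}
```

Anything the tokenizer cannot classify (a variable in command position, an unparseable script, an unlisted command) produces a deny, which mirrors the fail-closed behavior described above.
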

Adds a `pay` tool that authorizes payments through the owned goldseel
mandate-enforcement model (Modal, serverless). A deterministic spend-cap floor
runs first and short-circuits the pay-per-call model when a payment is over
cap. Otherwise goldseel judges the payment against the authorized intent and
returns a 3-way decision: ACCEPT / REJECT / ESCALATE. An unreachable model
defers to a human and never auto-approves. The decision, verdict, and mandate
hash are recorded to the ledger.

- korgchat.gate: GoldseelGate (injectable), payment_mandate(), goldseel_pay_tool()
- tests: offline with a fake judge + opt-in live test (KORGCHAT_GOLDSEEL_LIVE=1)

Dogfood note: the wiring is correct and proven live; the currently-deployed
goldseel over-rejects valid payments (a model-quality / retrain issue, separate
from this integration — the gate is model-agnostic via GOLDSEEL_URL).
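
The control flow described in this commit (deterministic floor first, then the model, with unreachable mapping to ESCALATE) might look like the sketch below. `pay_gate`, the `judge` callable, the `stage` field, and the lowercase verdict strings are hypothetical names for illustration, not the GoldseelGate API.

```python
from typing import Callable


def pay_gate(amount_usd: float, cap_usd: float, intent: str,
             judge: Callable[[float, str], str]) -> dict:
    """Return a 3-way decision; `judge` is an injectable stand-in for the model."""
    # 1. Deterministic floor: over-cap payments never reach the paid model call.
    if amount_usd > cap_usd:
        return {"decision": "REJECT", "stage": "floor", "verdict": None}
    # 2. Model judgment; any failure to reach it defers to a human.
    try:
        raw = judge(amount_usd, intent)
    except Exception:
        return {"decision": "ESCALATE", "stage": "unreachable", "verdict": None}
    # 3. Only explicit accept/reject verdicts are honored; anything else escalates.
    decision = {"accept": "ACCEPT", "reject": "REJECT"}.get(raw, "ESCALATE")
    return {"decision": decision, "stage": "model", "verdict": raw}
```

Mapping every unrecognized or missing verdict to ESCALATE keeps the gate from auto-approving when the model misbehaves or cannot be reached.
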

korgchat.ontology: a controlled vocabulary of recipient categories (synonyms +
is-a hierarchy) plus a vendor registry. The pay tool resolves known recipients
deterministically (ALLOW/DENY, no model call) and sends only genuine unknowns
to goldseel, which makes the ml-inference != ai-inference false-reject
structurally impossible. It compounds: learn() writes newly classified
recipients back to the registry (optionally persisted), so the known set grows
monotonically, giving fewer model calls and more consistent decisions over time
(a data network effect).

- payment_mandate() gains allow_classes/deny_classes (default deny prohibited)
- pay result records decided_by (ontology vs goldseel), floor verdict, learned
- tests: synonyms/hierarchy, ALLOW/DENY/UNKNOWN, learn+persist, and the
  pay-tool paths (allow bypasses a reject-happy model; deny bypasses an
  approve-happy model; unknown consults the model; compounding)
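
A toy version of the resolution logic, assuming a synonym map, a single-parent is-a map, and a vendor-to-class registry. The class below is a sketch and not the korgchat.ontology API; in particular, returning UNKNOWN for a known vendor whose class matches neither list is an assumption.

```python
class Ontology:
    """Hypothetical controlled vocabulary: synonyms, is-a parents, vendor registry."""

    def __init__(self, synonyms=None, parents=None, vendors=None):
        self.synonyms = dict(synonyms or {})   # alias -> canonical class
        self.parents = dict(parents or {})     # class -> parent class (is-a)
        self.vendors = dict(vendors or {})     # recipient -> class

    def canonical(self, cls):
        return self.synonyms.get(cls, cls)

    def ancestors(self, cls):
        """The class itself followed by its is-a ancestors (cycle-safe)."""
        chain, cur = [], self.canonical(cls)
        while cur is not None and cur not in chain:
            chain.append(cur)
            parent = self.parents.get(cur)
            cur = self.canonical(parent) if parent is not None else None
        return chain

    def resolve(self, recipient, allow_classes, deny_classes=("prohibited",)):
        cls = self.vendors.get(recipient)
        if cls is None:
            return "UNKNOWN"                   # genuine unknown: ask the model
        chain = self.ancestors(cls)
        deny = {self.canonical(c) for c in deny_classes}
        allow = {self.canonical(c) for c in allow_classes}
        if any(c in deny for c in chain):      # deny takes precedence
            return "DENY"
        if any(c in allow for c in chain):
            return "ALLOW"
        return "UNKNOWN"

    def learn(self, recipient, cls):
        """Write a newly classified recipient back to the registry."""
        self.vendors[recipient] = self.canonical(cls)
```

Because synonyms are canonicalized before comparison, a vendor registered as ml-inference matches a mandate that allows ai-inference without any model call.
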

goldseel was trained on dollar amounts (e.g. "12.00") plus a positive
use-counter. The pay tool sent amount_usdc in on-chain micros (12 -> 12000000)
and a null counter, so the model saw every $12 payment as a $12M payment that
exceeded the cap (or as having an exhausted counter) and rejected everything.
The fix sends the dollar view and a positive counter to the model; the canonical
redemption (micros) is unchanged for settlement.

This was the real cause of goldseel's 'harshness' (compounded by an old
deployed checkpoint). With v0.3.2 @ Q8 + this format, the two-sided benchmark
goes 0 false-approve / 0 false-reject (was 0/6 approve).
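
The unit mismatch is easy to reproduce. The sketch below converts on-chain micros to the dollar string the model expects while leaving the settlement value untouched. The field names `amount_usd` and `uses_remaining`, and the default counter of 1, are assumptions for illustration; only `amount_usdc` appears in the commit message.

```python
from decimal import Decimal

MICROS_PER_USDC = 1_000_000


def dollar_view(amount_micros: int) -> str:
    """Render micros as a two-decimal dollar string, e.g. 12000000 -> "12.00".

    Sub-cent amounts are rounded, so this view is for the model only.
    """
    dollars = Decimal(amount_micros) / MICROS_PER_USDC
    return str(dollars.quantize(Decimal("0.01")))


def model_view(redemption: dict) -> dict:
    """Build the model-facing payload without mutating the canonical redemption."""
    counter = redemption.get("uses_remaining")
    return {
        "amount_usd": dollar_view(redemption["amount_usdc"]),
        # The model expects a positive counter; a null one is sent as 1 here.
        "uses_remaining": counter if counter else 1,
    }
```

For example, `dollar_view(12_000_000)` returns `"12.00"`, while the redemption dict still holds `12000000` for settlement.
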

The second compounding loop. When the pay gate ESCALATEs (goldseel defers or is
unreachable), the case is logged and a human resolves it. Resolved escalations
export in goldseel's training format and feed the next retrain, so the cases
the model couldn't handle become the ones it learns from. The ontology
compounds knowledge; this compounds judgment.

- korgchat.escalation: EscalationLog (record/pending/resolve/export/persist)
- goldseel_pay_tool(escalation_log=...) logs on ESCALATE, returns escalation_id
- GoldseelGate now recognizes the model's 'escalate' verdict (previously mapped to skip)
- tests: record/resolve/export/persist, idempotency, pay-tool wiring, unreachable
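
A minimal sketch of the record, resolve, and export cycle follows. The content-derived id (which makes record() idempotent), the accepted labels, and the JSON Lines export shape are all assumptions; goldseel's real training format and the EscalationLog API are not shown in this PR.

```python
import hashlib
import json


class EscalationLog:
    """Hypothetical in-memory escalation log."""

    def __init__(self):
        self._cases = {}  # escalation_id -> {"case": ..., "resolution": ...}

    def record(self, case: dict) -> str:
        # Content-derived id: recording the same case twice is a no-op.
        blob = json.dumps(case, sort_keys=True).encode("utf-8")
        eid = hashlib.sha256(blob).hexdigest()[:16]
        self._cases.setdefault(eid, {"case": case, "resolution": None})
        return eid

    def pending(self) -> list:
        return [eid for eid, e in self._cases.items() if e["resolution"] is None]

    def resolve(self, eid: str, label: str) -> None:
        if label not in ("ACCEPT", "REJECT"):
            raise ValueError("a human resolution must be ACCEPT or REJECT")
        self._cases[eid]["resolution"] = label

    def export(self) -> str:
        """Resolved cases only, one JSON object per line."""
        rows = [{"input": e["case"], "label": e["resolution"]}
                for e in self._cases.values() if e["resolution"] is not None]
        return "\n".join(json.dumps(r, sort_keys=True) for r in rows)
```

Only resolved cases are exported, so unresolved escalations never leak into training data with a missing label.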