5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
- **Sandboxed `bash` tool with verifiable exec (`--sandbox`).** New `korgchat.sandbox` module adds a `bash` tool backed by [just-bash](https://github.com/vercel-labs/just-bash) — a JS reimplementation of bash + ~90 coreutils over an in-memory filesystem, run as a persistent Node sidecar. The shell physically cannot reach the host filesystem or network (no network/python/js by default). Every `exec` returns `fs_hash`, a hash of the full virtual-filesystem state after the command; because each tool call is hash-chained into the ledger, the agent's shell session is **tamper-evident and replayable** — the same commands from a fresh sandbox reproduce the same hashes. Exports `SandboxClient` (stdio JSON-RPC), `bash_tool()`, and `tools_with_sandbox()`; enable in the CLI with `--sandbox` (requires Node ≥18 and `npm install` in `sandbox/`).
- **Mandate-gated shell (`--mandate-allow`).** The sandboxed `bash` tool can be constrained to a command allowlist. Enforced two ways: just-bash only registers the allowed commands (physical), and each line is parsed before exec so a disallowed or dynamically-named command (`$CMD`) is rejected — **fail-closed**. Every call carries a verdict (`{decision, reasons, commands_used, mandate_hash}`) recorded to the ledger, so what the agent was *allowed* to run is itself provable. New `shell_mandate(allow, deny)`; `SandboxClient(mandate=...)` / `.configure()`; `tools_with_sandbox(mandate=...)`; CLI `--mandate-allow ls,cat,grep,...`.
- **goldseel-gated `pay` tool.** New `korgchat.gate` module: a `pay` tool that authorizes a payment through the owned **goldseel** mandate-enforcement model (served on Modal, serverless). A deterministic spend-cap floor runs first — and short-circuits the pay-per-call model on an over-cap payment; goldseel then judges the payment against the authorized intent. Maps to a three-way decision: **REJECT** (cap or goldseel), **ESCALATE** (goldseel unreachable → defer to a human, *never* auto-approve), **ACCEPT** (within cap and approved). The decision + verdict + mandate hash are recorded to the ledger, so what an agent was allowed to spend is provable. New `GoldseelGate` (injectable), `payment_mandate()`, `goldseel_pay_tool()`. Offline tests use a fake judge; a live endpoint test is opt-in (`KORGCHAT_GOLDSEEL_LIVE=1`).
- **Recipient-category ontology — the gate's deterministic knowledge floor (`korgchat.ontology`).** A controlled vocabulary of recipient categories with **synonyms** and an **is-a hierarchy** (`ml-inference` ≡ `ai-inference` ≡ `llm-inference`, all *is-a* `ai-compute`; `gambling`/`adult`/`crypto-trading` *is-a* `prohibited`), plus a seeded **vendor registry**. The `pay` tool now resolves *known* recipients deterministically — **ALLOW/DENY without a model call** — and only genuine unknowns reach goldseel, making the `ml-inference ≠ ai-inference` false-reject *structurally impossible*. **It compounds:** `learn()` writes newly-classified recipients back to the registry (optionally persisted), so the known set grows monotonically — more decisions → fewer model calls → more consistent outcomes (a data network effect). The `pay` result records `decided_by` (ontology vs goldseel), the floor verdict, and what was learned. `payment_mandate()` gains `allow_classes` / `deny_classes` (default deny `["prohibited"]`).
- **Escalation harvest — the second compounding loop (`korgchat.escalation`).** When the pay gate **ESCALATEs** (goldseel defers, or is unreachable), the case is logged; a human resolves it (approve/reject + why); resolved escalations export in goldseel's training format and feed the *next* retrain. So the cases the model *couldn't* handle become the ones it *learns* — the ontology compounds **knowledge**, this compounds **judgment**. `EscalationLog` (`record` / `pending` / `resolve` / `export_training_cases` / `write_training_jsonl`); `goldseel_pay_tool(escalation_log=...)` logs on ESCALATE and returns an `escalation_id`; `GoldseelGate` now recognizes the model's `escalate` verdict (it was collapsing to `skip`).
- **Auto-context injection is now a first-class ledger event.** Previously the recall-augmented preamble the model actually saw was a *ghost* — the journal recorded only the user's original prompt. Now, whenever auto-context injects a preamble, a `context_injection` event is written capturing the preamble text, the recall query, and the matched `seq_id`s + scores, causally chained `user_prompt → context_injection → llm_inference`. The user_prompt event still records only what the user typed; the injected context is a separate, auditable, replayable event. New `AutoContextEngine.build_context()` returns a `ContextInjection` (preamble + structured matches); `build_preamble()` is now a thin wrapper over it.
- **Tool-schema snapshot + conformance events.** Every tool execution is now bracketed by two events: a `tool_schema_snapshot` *before* the call (the declared `input_schema`, `description`, and a deterministic `schema_hash`) and a `tool_validation` *after* (did the call's input conform to the declared schema? did the call succeed?). A replayed conversation stays meaningful even after a tool's schema changes — the contract it ran against is frozen on the ledger, and a stale call is detectable. New `korgchat.schema` module: `schema_hash()` (canonical sha256, byte-for-byte aligned with `korg-ledger@v1` canonicalization) and a dependency-free `validate_input()`.

97 changes: 97 additions & 0 deletions README.md
@@ -59,6 +59,103 @@ korgchat --mock --no-stream
# Tune mock-mode streaming speed (default 0.005s/char):
korgchat --mock --stream-delay 0.05 # slow & visible
korgchat --mock --stream-delay 0 # instant

# Give the agent a sandboxed `bash` tool (verifiable exec):
cd sandbox && npm install && cd .. # one-time: pulls just-bash
korgchat --sandbox
```

## Sandboxed shell (`--sandbox`)

`--sandbox` adds a `bash` tool backed by
[just-bash](https://github.com/vercel-labs/just-bash) — a JS reimplementation
of bash + ~90 coreutils over an **in-memory** filesystem, run as a persistent
Node sidecar (`sandbox/sidecar.mjs`). The shell physically cannot reach the
host filesystem or network (network, Python, and JS support are not enabled).

Every command returns `fs_hash` — a hash of the full virtual-filesystem state
after it runs. Because each tool call is hash-chained into the ledger, the
agent's shell session becomes **tamper-evident and replayable**: the same
commands from a fresh sandbox reproduce the same hashes.

```python
from korgchat import ChatSession
from korgchat.sandbox import tools_with_sandbox

session = ChatSession(journal_path=..., responder=..., tools=tools_with_sandbox())
```
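
To illustrate why replaying the same commands reproduces the same hashes, here is a minimal sketch that hashes a toy in-memory filesystem and chains each call to the previous entry. The names `fs_hash`, `chain_entry`, and `replay`, the dict-based filesystem, and the record layout are all hypothetical stand-ins for this example; they are not the actual korgchat or just-bash implementation, whose hash format may differ.

```python
import hashlib
import json


def fs_hash(files: dict[str, bytes]) -> str:
    """Hash a toy in-memory filesystem (path -> contents) deterministically."""
    h = hashlib.sha256()
    for path in sorted(files):  # sorted so insertion order cannot change the hash
        name = path.encode()
        data = files[path]
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(name).to_bytes(8, "big"))
        h.update(name)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def chain_entry(prev_hash: str, command: str, state_hash: str) -> str:
    """Link one tool call to the previous entry (hypothetical record layout)."""
    record = json.dumps(
        {"prev": prev_hash, "command": command, "fs_hash": state_hash},
        sort_keys=True,
    )
    return hashlib.sha256(record.encode()).hexdigest()


def replay(commands: list[tuple[str, str, bytes]]) -> list[str]:
    """Apply (command, path, contents) writes to a fresh filesystem and
    return the chained hash after each step."""
    files: dict[str, bytes] = {}
    prev = "0" * 64  # genesis value, assumed for this sketch
    chain = []
    for command, path, contents in commands:
        files[path] = contents  # stand-in for actually executing the command
        prev = chain_entry(prev, command, fs_hash(files))
        chain.append(prev)
    return chain


session = [
    ("echo hi > a.txt", "a.txt", b"hi\n"),
    ("echo bye > b.txt", "b.txt", b"bye\n"),
]
# Two independent replays from a fresh state agree step for step.
assert replay(session) == replay(session)
# Changing an earlier step changes every later hash in the chain.
tampered = [("echo HI > a.txt", "a.txt", b"HI\n"), session[1]]
assert replay(tampered)[1] != replay(session)[1]
```

Because each entry includes the previous entry's hash, editing one recorded step invalidates everything after it, which is the property the "tamper-evident" claim relies on.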

**Mandate (`--mandate-allow`).** Constrain the shell to a command allowlist.
It's enforced two ways — just-bash only registers the allowed commands
(physical), and each line is parsed before exec so a disallowed or
dynamically-named command (`$CMD`) is rejected (fail-closed). Every call
carries a verdict that's recorded to the ledger, so what the agent was
*allowed* to run is itself provable.

```bash
korgchat --mandate-allow "ls,cat,grep,sed,awk,find,sort,wc,head,tail,echo"
```

```python
from korgchat.sandbox import tools_with_sandbox, shell_mandate

tools = tools_with_sandbox(mandate=shell_mandate(["ls", "cat", "grep"], deny=["rm"]))
```
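
The parse-before-exec half of the mandate can be pictured with the simplified sketch below. It assumes a much cruder parser than a real shell grammar, and `check_line`, the `"allow"`/`"reject"` decision strings, and the reason messages are hypothetical names chosen for illustration (the real verdict also carries a `mandate_hash`, omitted here). The point is the fail-closed shape: anything that cannot be positively matched against the allowlist is rejected.

```python
import re
import shlex


def check_line(line: str, allow: set[str]) -> dict:
    """Return a verdict for one shell line; anything doubtful is rejected."""
    reasons = []
    used = []
    # Command substitution can hide a command name, so refuse it outright.
    if "$(" in line or "`" in line:
        reasons.append("command substitution is not allowed")
    # Split on ||, &&, ;, |, & and newlines to find each simple command.
    # Deliberate simplification: separators inside quotes also split.
    for segment in re.split(r"\|\||&&|[;|&\n]", line):
        try:
            words = shlex.split(segment)
        except ValueError:  # e.g. unbalanced quotes
            reasons.append(f"unparseable segment: {segment.strip()!r}")
            continue
        if not words:
            continue
        name = words[0]
        used.append(name)
        if name.startswith("$"):
            reasons.append(f"dynamically named command: {name}")
        elif name not in allow:
            reasons.append(f"command not in allowlist: {name}")
    decision = "reject" if reasons else "allow"
    return {"decision": decision, "reasons": reasons, "commands_used": used}


allowed = {"ls", "cat", "grep"}
assert check_line("cat notes.txt | grep todo && ls", allowed)["decision"] == "allow"
assert check_line("$CMD -rf /", allowed)["decision"] == "reject"
assert check_line("ls & rm -rf /", allowed)["decision"] == "reject"
```

Note that a leading variable assignment such as `FOO=bar ls` is also rejected by this sketch, since `FOO=bar` is not in the allowlist. Over-rejecting is the intended direction for a fail-closed check.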

Requires Node ≥18 and a one-time `npm install` in `sandbox/`.

## Payments (goldseel-gated)

The `pay` tool (`korgchat.gate`) authorizes a payment through **goldseel**, an
owned mandate-enforcement model served on Modal. A deterministic spend-cap runs
first; goldseel then judges the payment against the authorized intent. The
outcome is three-way:

- **ACCEPT**: within cap and goldseel approved
- **REJECT**: over cap, or goldseel rejected
- **ESCALATE**: goldseel escalates or is unreachable, so the payment is deferred to a human (never auto-approved)

```python
from korgchat import ChatSession
from korgchat.gate import goldseel_pay_tool, payment_mandate
from korgchat.tools import default_tools

tools = default_tools()
tools.register(goldseel_pay_tool(payment_mandate(
    "Pay only for AI inference / GPU compute. No gambling, adult, or crypto-trading.",
    spend_cap_usd=50,
)))
session = ChatSession(journal_path=..., responder=..., tools=tools)
```

The decision, the goldseel verdict, and the mandate hash are recorded to the
ledger — so *what an agent was allowed to spend, and why,* is provable. The
gate is model-agnostic: point `GOLDSEEL_URL` at any goldseel deployment.
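
The ordering of the cap floor and the judge can be sketched as follows. This is an illustration under stated assumptions, not `GoldseelGate` itself: `decide`, the `Decision` enum, and the `"approve"`/`"reject"` verdict strings are hypothetical, and the judge is a plain callable so it can be faked offline.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ACCEPT = "ACCEPT"
    REJECT = "REJECT"
    ESCALATE = "ESCALATE"


def decide(
    amount_usd: float,
    spend_cap_usd: float,
    judge: Callable[[float], str],
) -> tuple[Decision, str]:
    """Cap floor first, then the judge; any judge failure escalates."""
    # Deterministic floor: an over-cap payment never reaches the model.
    if amount_usd > spend_cap_usd:
        return Decision.REJECT, "over spend cap"
    try:
        verdict = judge(amount_usd)
    except Exception as exc:  # unreachable endpoint, timeout, bad response
        return Decision.ESCALATE, f"judge unavailable: {exc}"
    if verdict == "approve":
        return Decision.ACCEPT, "within cap and approved"
    if verdict == "reject":
        return Decision.REJECT, "judge rejected"
    # Anything else, including an explicit "escalate", goes to a human.
    return Decision.ESCALATE, f"judge returned {verdict!r}"


def unreachable_judge(amount_usd: float) -> str:
    raise ConnectionError("endpoint down")


assert decide(80, 50, lambda a: "approve")[0] is Decision.REJECT
assert decide(20, 50, lambda a: "approve")[0] is Decision.ACCEPT
assert decide(20, 50, unreachable_judge)[0] is Decision.ESCALATE
```

The default branch is the important design choice: an unrecognized verdict falls through to ESCALATE rather than ACCEPT, so a malformed response can never approve a payment.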

### Knowledge floor (ontology) — and how it compounds

Before goldseel is consulted, the `pay` tool resolves the recipient against a
**category ontology** (`korgchat.ontology`): a controlled vocabulary with
synonyms and an is-a hierarchy (`ml-inference` ≡ `ai-inference` ≡
`llm-inference`, all *is-a* `ai-compute`; `gambling`/`adult`/`crypto-trading`
*is-a* `prohibited`) plus a vendor registry. **Known recipients resolve
deterministically** (ALLOW/DENY with no model call), so a false reject caused
by treating `ml-inference` and `ai-inference` as different categories cannot
occur for them, and goldseel is consulted only for genuine unknowns.

It **compounds**: every newly classified recipient is written back into the
registry (`learn()`, optionally persisted), so the known set grows
monotonically. The more decisions the system makes, the fewer reach the model
and the more consistent the outcomes become (a data network effect). Each
`pay` result records `decided_by` (`ontology` vs `goldseel`) so the audit
shows *which* layer decided.

```python
from korgchat.gate import payment_mandate, goldseel_pay_tool

tool = goldseel_pay_tool(payment_mandate(
    "Pay only for AI inference / GPU compute. No gambling.",
    spend_cap_usd=50, allow_classes=["ai-compute"], deny_classes=["prohibited"],
))
```

## Streaming (v0.4.2)
1 change: 1 addition & 0 deletions sandbox/.gitignore
@@ -0,0 +1 @@
node_modules/