Status: enhancement, with a concern to bake in when it lands.
Context
Tilth is validated against OpenRouter today (OpenAI SDK underneath; other OpenAI-flavour gateways are on the roadmap but not validated; see README). client.py already does dual-client routing for worker / evaluator / prep, but every role currently points at the same provider.
The feature
Support pointing a role (or the whole harness) at a second provider/model — e.g. evaluator on provider A, worker on provider B; or a local Ollama endpoint for one role.
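A minimal sketch of what per-role targeting could look like, assuming a hypothetical `RoleTarget` record and `resolve_targets` helper (neither exists in client.py today). The model names and the `OLLAMA_API_KEY` variable are placeholders; the two base URLs are the standard OpenRouter and local Ollama OpenAI-compatible endpoints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleTarget:
    """Where one role sends its requests (hypothetical shape, not Tilth's)."""
    base_url: str
    api_key_env: str  # name of the env var holding the key, not the key itself
    model: str


OPENROUTER = "https://openrouter.ai/api/v1"
OLLAMA_LOCAL = "http://localhost:11434/v1"

# Today's situation: every role points at the same provider.
# Model names below are placeholders, not real model ids.
DEFAULT_TARGETS = {
    "worker": RoleTarget(OPENROUTER, "OPENROUTER_API_KEY", "placeholder/worker-model"),
    "evaluator": RoleTarget(OPENROUTER, "OPENROUTER_API_KEY", "placeholder/evaluator-model"),
    "prep": RoleTarget(OPENROUTER, "OPENROUTER_API_KEY", "placeholder/prep-model"),
}


def resolve_targets(overrides: dict[str, RoleTarget]) -> dict[str, RoleTarget]:
    """Return the per-role targets with any overrides applied.

    Unknown role names are rejected so a typo cannot silently fall back
    to the default provider.
    """
    targets = dict(DEFAULT_TARGETS)
    for role, target in overrides.items():
        if role not in targets:
            raise KeyError(f"unknown role: {role!r}")
        targets[role] = target
    return targets


# Example: move only the evaluator to a local Ollama endpoint.
targets = resolve_targets(
    {"evaluator": RoleTarget(OLLAMA_LOCAL, "OLLAMA_API_KEY", "placeholder-local-model")}
)
```

Each resolved target would then be handed to the OpenAI SDK client constructor (`OpenAI(base_url=..., api_key=...)`), which is how the existing dual-client routing could plausibly be extended.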
The concern to bake in (Pi §2.6)
The moment a role runs on a second provider, you hit the provider-capability mismatch class of error: "Grok rejects reasoning_effort," "Cerebras/xAI/Mistral reject store," "no tool-call streaming on Google." Pi maintains a generated per-model capability registry (OpenRouter + models.dev) carrying cost / image / thinking / which-knobs-tolerated flags. Tilth has no per-model capability map today.
Recommendation
Don't build the capability table speculatively. The first multi-provider-per-role error is the signal that a small table (even hand-maintained) has become cheaper than per-incident try/except. Until then, CLAUDE.md's "verify, don't guess" rule is the interim discipline (probe the live response shape before coding to it).
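For scale, here is roughly how small the hand-maintained table could be once it is warranted. This is a sketch under assumptions: `REJECTED_PARAMS` and `filter_request_kwargs` are hypothetical names, the keys are illustrative labels rather than real provider ids, and the entries only restate the examples quoted above; none of them has been probed against a live endpoint here.

```python
# Hypothetical hand-maintained table: request knobs each target is known to reject.
# Entries restate the examples quoted in this issue and are unverified.
REJECTED_PARAMS: dict[str, frozenset[str]] = {
    "grok": frozenset({"reasoning_effort"}),
    "xai": frozenset({"store"}),
    "cerebras": frozenset({"store"}),
    "mistral": frozenset({"store"}),
}


def filter_request_kwargs(target: str, kwargs: dict) -> tuple[dict, list[str]]:
    """Drop kwargs the target is known to reject.

    Returns the kwargs that are safe to send plus a sorted list of the
    dropped names, so the caller can log what was stripped instead of
    failing silently. Targets missing from the table pass through unchanged.
    """
    rejected = REJECTED_PARAMS.get(target, frozenset())
    kept = {name: value for name, value in kwargs.items() if name not in rejected}
    dropped = sorted(name for name in kwargs if name in rejected)
    return kept, dropped


kept, dropped = filter_request_kwargs(
    "cerebras", {"model": "placeholder-model", "store": True, "temperature": 0}
)
```

Returning the dropped names keeps the stripping visible in logs, which fits the "verify, don't guess" rule: a new entry should be added only after the rejection has been observed in a live response.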
Related
Source
Pi-harness considerations doc, §2.6 and §1 (provider-swamp validation).