What problem does this solve
cladding owns something most harnesses lack: a strong, deterministic, execution-based verifier (the 15-stage gate + spec-conformance). But today it is used only as a single-pass PASS/FAIL check on one attempt. The 2025–26 consensus is that the verifier, not the generator, is now the bottleneck and the moat, and that solution coverage scales with the number of candidate attempts as long as a good verifier is available to select among them. cladding has exactly that verifier and isn't exploiting it.
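To make the scaling argument concrete, here is a back-of-the-envelope model. It uses simplifying assumptions that are not from the proposal: attempts are independent, each passes with the same probability p, and the gate never passes a wrong candidate. Under those assumptions, the chance that at least one of K candidates is green is:

```latex
P(\text{at least one green among } K) = 1 - (1 - p)^{K}
% worked example, p = 0.3, K = 5:
% 1 - 0.7^{5} = 1 - 0.16807 \approx 0.83
```

Real attempts from one model are usually correlated, so the gain in practice is typically smaller. The point is only that extra attempts turn into coverage when a reliable selector is available.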
Proposed shape
A best-of-N mode for the drive loop / clad run:
- Generate K candidate implementations of a feature (varying seed/temperature/persona framing).
- Run the gate on each (already isolated per-feature by modules).
- Select the green candidate — or, when several pass, rank by a spec-conformance rubric (oracle coverage, fewest warn-level findings).
- Keep the winner, discard the rest; record the selection in the audit log.
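The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not cladding's implementation. It assumes a gate run can be summarized per candidate as a pass flag, an oracle-coverage fraction and a warn-level finding count. `GateResult`, `Candidate`, `best_of_n`, `generate` and `run_gate` are hypothetical names, not existing cladding APIs. Generation and the gate are injected as callables, and the audit log is a plain list.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class GateResult:
    """Hypothetical per-candidate gate summary; field names are illustrative."""
    passed: bool            # every gate stage green
    oracle_coverage: float  # fraction of spec oracles exercised, 0.0-1.0
    warn_findings: int      # number of warn-level findings


@dataclass(frozen=True)
class Candidate:
    candidate_id: str
    result: GateResult


def rank_key(c: Candidate) -> tuple:
    # Higher oracle coverage first, then fewer warn-level findings,
    # then candidate_id so ties resolve deterministically.
    return (-c.result.oracle_coverage, c.result.warn_findings, c.candidate_id)


def best_of_n(
    k: int,
    generate: Callable[[int], str],          # hypothetical: seed -> candidate id
    run_gate: Callable[[str], GateResult],   # hypothetical: candidate id -> gate summary
    audit_log: list,
) -> Optional[Candidate]:
    """Generate k candidates, gate each one, and keep the best green one."""
    candidates = []
    for seed in range(k):
        cid = generate(seed)  # vary seed/temperature/persona framing per attempt
        candidates.append(Candidate(cid, run_gate(cid)))

    # Only gate-green candidates are eligible; the rubric just orders them.
    green = sorted((c for c in candidates if c.result.passed), key=rank_key)
    winner = green[0] if green else None

    # Losers are simply dropped here; a real harness would also clean up their workspaces.
    audit_log.append({
        "event": "best_of_n_selection",
        "k": k,
        "green": [c.candidate_id for c in green],
        "winner": winner.candidate_id if winner else None,
    })
    return winner


# Usage with stubbed generation and gate results:
results = {
    "cand-0": GateResult(passed=False, oracle_coverage=0.9, warn_findings=0),
    "cand-1": GateResult(passed=True, oracle_coverage=0.8, warn_findings=2),
    "cand-2": GateResult(passed=True, oracle_coverage=0.8, warn_findings=1),
}
log: list = []
chosen = best_of_n(3, lambda s: f"cand-{s}", results.__getitem__, log)
print(chosen.candidate_id)  # cand-2
```

The ranking key is fully deterministic, so the same gate results always select the same winner. That keeps selection reproducible and auditable, in line with the proposal's preference for the gate over an LLM judge.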
This makes cladding's gate a selector, not just a judge, and turns its verification rigor into higher first-pass conformance.
Versioning scope (GOVERNANCE.md §2)
clad run / drive-loop contract: deferring to maintainer scoping, which is why this is an issue first.
In-scope check (GOVERNANCE.md §4.1 / §4.2)
Alternatives considered
- Single-attempt + reflect loop (retry on failure): complementary, not a substitute. Reflect fixes a candidate; best-of-N explores several and selects among them. The two compose.
- LLM-judge selection: rejected as the primary selector, because cladding's deterministic gate is a better, cheaper, non-self-certifying selector. An LLM rubric only tie-breaks among gate-green candidates.
Willing to implement?
The strategic headline of a competitive-gap analysis: cladding's verifier is its differentiator; best-of-N is how it compounds.