[feature] Best-of-N / verifier-in-the-loop selection over the gate #200

## What problem does this solve

cladding owns something most harnesses lack: a strong, deterministic, execution-based **verifier** (the 15-stage gate + spec-conformance). Today it is used only as a single-pass PASS/FAIL check on one attempt. The prevailing 2025–26 view is that the verifier, not the generator, is now the bottleneck and the moat, and that solution coverage scales with the number of candidate attempts *as long as you have a good verifier to select among them*. cladding has exactly that verifier and isn't exploiting it.

## Proposed shape

A `best-of-N` mode for the drive loop / `clad run`:

- Generate *N* candidate implementations of a feature (varying seed, temperature, or persona framing).
- Run the gate on each (already isolated per-feature by `modules`).
- **Select the green candidate.** When several pass, rank them by a spec-conformance rubric (oracle coverage, fewest warn-level findings).
- Keep the winner, discard the rest; record the selection in the audit log.

This makes cladding's gate a *selector*, not just a judge, and turns its verification rigor into higher first-pass conformance.
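The selection step above can be sketched roughly as follows. This is a minimal illustration, not cladding's API: `GateResult`, `Selection`, `select_best`, and the `run_gate` callable are all hypothetical names, and the ranking order (coverage first, then warn count, then candidate index) is an assumption about how the rubric might be applied.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class GateResult:
    """Hypothetical summary of one gate run (not cladding's real schema)."""
    passed: bool            # True only if every gate stage is green
    oracle_coverage: float  # fraction of spec oracles exercised, 0.0 to 1.0
    warn_findings: int      # count of warn-level findings


@dataclass(frozen=True)
class Selection:
    """Hypothetical audit record of one best-of-N selection."""
    winner: Optional[int]       # index of the kept candidate, None if none passed
    green: Tuple[int, ...]      # indices of every gate-green candidate
    discarded: Tuple[int, ...]  # indices of every candidate that was not kept


def select_best(
    candidates: Sequence[str],
    run_gate: Callable[[str], GateResult],
) -> Selection:
    """Run the gate on every candidate and keep the best green one."""
    results = [run_gate(c) for c in candidates]
    green = tuple(i for i, r in enumerate(results) if r.passed)
    if not green:
        # No candidate passed: nothing is kept, everything is discarded.
        return Selection(None, (), tuple(range(len(candidates))))
    # Assumed rubric order: higher oracle coverage, then fewer warn-level
    # findings, then lowest index so the choice stays deterministic.
    winner = min(
        green,
        key=lambda i: (-results[i].oracle_coverage, results[i].warn_findings, i),
    )
    discarded = tuple(i for i in range(len(candidates)) if i != winner)
    return Selection(winner, green, discarded)
```

The returned `Selection` stands in for whatever would be written to the audit log. When no candidate is green the sketch returns `winner=None`; what the drive loop should do in that case is left open here.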

## Versioning scope (GOVERNANCE.md §2)

- [ ] **Minor** — new drive-loop mode
- [x] possibly **Major** if it touches the `clad run` / drive-loop contract — **deferring to maintainer scoping**, which is why this is an issue first.

## In-scope check (GOVERNANCE.md §4.1 / §4.2)

- [x] Not regressing Iron Law conformance — it raises selected-candidate quality
- [x] Not bypassing the anti-self-cert guard — selection is by the gate + an independent rubric, not self-judgment by the author persona
- [x] Not forking the Ironclad spec
- [x] Not cosmetic-only: ships with a test harness exercising N-candidate selection on a fixture feature

## Alternatives considered

- Single-attempt + reflect loop (retry on failure) — complementary, not a substitute. Reflect fixes *a* candidate; best-of-N explores *several* and selects. They compose.
- LLM-judge selection — rejected as primary: cladding's deterministic gate is a better, cheaper, non-self-certifying selector. An LLM rubric only tie-breaks among gate-green candidates.
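To make the second point concrete, here is a small sketch of the intended ordering, under assumptions: the gate verdict filters first, a deterministic rubric score ranks the survivors, and a judge is consulted only when several gate-green candidates tie on that score. `pick_winner`, `rubric_score`, and `judge` are hypothetical names, and the judge is modeled as a plain callable returning a float.

```python
from typing import Callable, Optional, Sequence


def pick_winner(
    gate_green: Sequence[bool],
    rubric_score: Sequence[float],
    judge: Optional[Callable[[int], float]] = None,
) -> Optional[int]:
    """Return the index of the selected candidate, or None if none is green."""
    # The gate is the primary selector: red candidates are never considered.
    green = [i for i, ok in enumerate(gate_green) if ok]
    if not green:
        return None
    # Deterministic rubric next (higher is better in this sketch).
    best = max(rubric_score[i] for i in green)
    tied = [i for i in green if rubric_score[i] == best]
    if len(tied) == 1 or judge is None:
        # No tie, or no judge configured: lowest index wins deterministically.
        return tied[0]
    # The judge only sees candidates that are green and tied on the rubric.
    return max(tied, key=lambda i: (judge(i), -i))
```

Because the judge is called only on the tied subset, it cannot promote a gate-red candidate or override a strictly better rubric score, which is the property the anti-self-cert argument relies on.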

## Willing to implement?

- [ ] Yes
- [x] Open to either — would want maintainer agreement on scope/contract before coding (flagship idea, larger surface).

---
_The strategic headline of a competitive-gap analysis: cladding's verifier is its differentiator; best-of-N is how it compounds._

