Verifier-as-selector: the gate runs single-pass; best-of-N would exploit cladding's strongest asset

### Problem

cladding owns something most harnesses lack: a strong, deterministic, execution-based verifier (the 15-stage gate + spec-conformance). But the autonomous drive loop (`clad run`) uses it as a single-pass PASS/FAIL on **one** attempt. The 2025–26 consensus is that the verifier — not the generator — is the bottleneck and the moat, and that solution coverage scales with the number of candidate attempts *given a good verifier to select among them*. cladding has exactly that verifier and isn't exploiting it.

### Proposed shape

A best-of-N mode for `clad run`: generate N candidate implementations, gate each in isolation, rank the green candidates by a structural rubric, keep the winner, discard the rest, and audit the selection. This makes the gate a **selector**, not just a judge.
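
A minimal sketch of the selection step, for illustration only. It is not cladding's implementation: the `Candidate` fields, `select_best`, and `best_of_n` are hypothetical names, and the rubric is reduced to a single assumed signal (a stub-fallback count, lower is better).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One gated attempt. All field names here are hypothetical."""
    name: str
    green: bool          # did the gate pass this candidate?
    stub_fallbacks: int  # assumed structural signal: lower is better


def select_best(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the best green candidate, or None if every candidate is red."""
    green = [c for c in candidates if c.green]
    if not green:
        return None
    # min() returns the first minimum, so ties keep generation order.
    return min(green, key=lambda c: c.stub_fallbacks)


def best_of_n(attempt: Callable[[int], Candidate], n: int) -> Optional[Candidate]:
    """Run n generate-and-gate attempts, then select among the green ones."""
    return select_best([attempt(i) for i in range(n)])


# Canned attempts standing in for real generate-and-gate runs.
attempts = [
    Candidate("a0", green=False, stub_fallbacks=0),
    Candidate("a1", green=True, stub_fallbacks=3),
    Candidate("a2", green=True, stub_fallbacks=1),
]

single_pass = best_of_n(lambda i: attempts[i], n=1)
winner = best_of_n(lambda i: attempts[i], n=3)
print(single_pass)   # None: the only attempt was red
print(winner.name)   # a2
```

With N=1 the red first attempt leaves nothing to keep, which is the single-pass behaviour described above. With N=3 the selector ignores the red candidate and prefers the green one with fewer stub fallbacks.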

### Verified (independent A/B)

- **Mechanism (deterministic, real `selectBest`):** where the first candidate is red but a later one is green, single-pass MISSES and best-of-N HITS; among several green candidates the selector keeps the higher-quality one (fewest stub-fallbacks).
- **Coverage lift (simulation, per-candidate pass ~ Bernoulli(p)):** P(green) tracks `1-(1-p)^N` — e.g. at p=0.3, N=1 → 0.30 vs N=10 → 0.97.
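
The coverage figure in the second bullet can be reproduced with a few lines. This is a sketch under the same assumption the bullet states (each candidate passes independently with probability p); the function names, trial count, and seed are illustrative choices, not taken from the A/B harness.

```python
import random


def coverage(p: float, n: int) -> float:
    """Closed form: P(at least one green among n independent candidates)."""
    return 1.0 - (1.0 - p) ** n


def simulate(p: float, n: int, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < p for _ in range(n))  # one run: is any candidate green?
        for _ in range(trials)
    )
    return hits / trials


for n in (1, 3, 5, 10):
    print(f"N={n:2d}  closed={coverage(0.3, n):.3f}  simulated={simulate(0.3, n):.3f}")
```

At p=0.3 the closed form gives 0.30 for N=1 and about 0.97 for N=10, matching the numbers quoted above. Attempts from a real generator may not be independent, in which case the closed form should not be read as a prediction.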

### Honest scope

- `clad run` is the **experimental** autonomous surface (the supported path is host-delegated); best-of-N's reach is gated on autonomous-loop adoption.
- N>1 trades **N× generation + gate cost** for higher P(green) — worth it when a human-required halt costs more than N× compute.
- The real-generator pass-rate is **not** measured here (needs live, non-deterministic LLM runs); the A/B proves the selector mechanism + the coverage math it unlocks.
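
The cost trade-off in the second bullet can be made concrete with a toy model. Everything here is an assumption for illustration: a fixed per-candidate cost (generation plus gate), a fixed cost for a human-required halt, independent attempts with pass probability p, and the hypothetical helper names `expected_cost` and `cheapest_n`. None of the numbers are measured.

```python
def expected_cost(p: float, n: int, attempt_cost: float, halt_cost: float) -> float:
    """Expected cost of one best-of-n run under the assumed toy model.

    Pays n * attempt_cost up front, plus halt_cost with probability
    (1 - p) ** n, i.e. when no candidate comes back green.
    """
    return n * attempt_cost + (1.0 - p) ** n * halt_cost


def cheapest_n(p: float, attempt_cost: float, halt_cost: float, max_n: int = 20) -> int:
    """Smallest-expected-cost N in 1..max_n under the same model."""
    return min(
        range(1, max_n + 1),
        key=lambda n: expected_cost(p, n, attempt_cost, halt_cost),
    )


# Expensive halt (50x one attempt): several candidates pay for themselves.
print(cheapest_n(p=0.3, attempt_cost=1.0, halt_cost=50.0))  # 8
# Cheap halt (2x one attempt): single-pass is already the cheapest option.
print(cheapest_n(p=0.3, attempt_cost=1.0, halt_cost=2.0))   # 1
```

Under this model the useful N grows with the ratio of halt cost to attempt cost, which is the direction the bullet describes. The exact break-even depends on the real pass rate, which is not measured here.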

Implemented by F-ac92c812.

Verifier-as-selector: the gate runs single-pass; best-of-N would exploit cladding's strongest asset #209
