Add a reject class for OOD detection by danielbusnz · Pull Request #6 · danielbusnz/routelet

danielbusnz · 2026-05-29T18:30:20Z

Fixes the overconfidence problem the report surfaced: routelet labeled gibberish as a real intent at ~98% confidence, so the confidence gate caught almost no out-of-distribution (OOD) input. A sixth class, none, trained on generated OOD data (Scripts/gen_ood.py), lets the model flag junk directly. Results on the hand-written probes: 80% of OOD rejected vs 17% for the confidence gate, 0 false rejects on the holdout, and holdout accuracy unchanged at 0.925. The teacher/eval never emit none; the class is learned only from generated data. The report now shows the reject-class vs confidence-gate comparison. Paired with the aegis wiring (Intent::None -> defer to Claude).
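For readers comparing the two mechanisms, here is a minimal sketch of the difference in routing logic. It is not routelet's actual code: the intent names, the function names, and the 0.9 threshold are placeholders assumed for illustration, since the PR does not state them.

```python
import math

# Placeholder label set: five real intents plus the sixth "none" reject class.
# The real intent names are not listed in this PR.
INTENTS = ["intent_a", "intent_b", "intent_c", "intent_d", "intent_e"]
LABELS_WITH_REJECT = INTENTS + ["none"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_with_confidence_gate(logits, threshold=0.9):
    """Old mechanism: defer only when the top probability is below a threshold.

    The threshold value is an assumption, not taken from the PR.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return "defer" if probs[best] < threshold else INTENTS[best]


def route_with_reject_class(logits):
    """New mechanism: defer whenever the argmax is the "none" class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = LABELS_WITH_REJECT[best]
    return "defer" if label == "none" else label


# An overconfident five-class output (top probability about 0.98) passes the gate,
# which is the failure mode described above.
gate_result = route_with_confidence_gate([6.0, 1.0, 0.5, 0.2, 0.1])

# With a sixth class, the same kind of input can be rejected directly.
reject_result = route_with_reject_class([1.0, 0.5, 0.2, 0.1, 0.0, 4.0])
```

The point of the sketch is that the gate depends on the model being uncertain about junk, while the reject class only needs the model to have learned what junk looks like.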

A sixth "none" label, trained on generated out-of-distribution data (Scripts/gen_ood.py), lets routelet flag gibberish instead of confidently mislabeling it. On the hand-written probes it rejects 80% of OOD vs 17% for the old confidence gate, with zero false rejects on the holdout and holdout accuracy unchanged at 0.925. The teacher never emits none.
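The contents of Scripts/gen_ood.py are not shown in this PR, so the following is only a hypothetical sketch of what generating "none"-labeled training rows could look like. The function names, the record shape, and the random-letters strategy are all assumptions; the real script may generate OOD text quite differently.

```python
import random
import string


def make_gibberish(rng, min_words=1, max_words=6):
    """Build a short string of random lowercase 'words' (assumed strategy)."""
    words = []
    for _ in range(rng.randint(min_words, max_words)):
        length = rng.randint(2, 10)
        words.append("".join(rng.choice(string.ascii_lowercase) for _ in range(length)))
    return " ".join(words)


def generate_ood_examples(n, seed=0):
    """Return n rows labeled "none"; seeded so the dataset is reproducible."""
    rng = random.Random(seed)
    return [{"text": make_gibberish(rng), "label": "none"} for _ in range(n)]


examples = generate_ood_examples(5)
```

Because the label only ever comes from generated rows like these, the teacher never needs to emit none, which matches the description above.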

report.py now contrasts the confidence gate and the reject class on OOD caught vs real commands wrongly deferred. Drops the confidence histogram and deferral-tradeoff figures, which described the old gate mechanism.
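The two quantities being contrasted can be computed along these lines. This is a sketch rather than the code in report.py; the function names and the boolean-flag input format are assumptions, and the flags below are made-up toy values, not the PR's results.

```python
def rate(flags):
    """Fraction of True values in a list of booleans (0.0 for an empty list)."""
    return sum(flags) / len(flags) if flags else 0.0


def compare_mechanisms(ood_deferred, holdout_deferred):
    """Build one row per mechanism.

    ood_deferred[name]     : True where an OOD probe was deferred (good).
    holdout_deferred[name] : True where a real command was deferred (bad).
    """
    return {
        name: {
            "ood_caught": rate(ood_deferred[name]),
            "wrongly_deferred": rate(holdout_deferred[name]),
        }
        for name in ood_deferred
    }


# Toy flags for illustration only.
table = compare_mechanisms(
    ood_deferred={
        "confidence_gate": [True, False, False, False],
        "reject_class": [True, True, True, False],
    },
    holdout_deferred={
        "confidence_gate": [False, False, True, False],
        "reject_class": [False, False, False, False],
    },
)
```

A mechanism is better when its OOD-caught rate is high and its wrongly-deferred rate stays near zero, which is the trade-off the comparison is meant to show.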

danielbusnz added 2 commits May 29, 2026 14:30

Swap the confidence-gate figures for a reject-class comparison

28c0b83

danielbusnz merged commit c3b9a36 into main May 29, 2026
3 checks passed

danielbusnz deleted the reject-class-ood branch May 29, 2026 18:31

danielbusnz merged 2 commits into main from reject-class-ood
