Speed up training and add OOD deferral analysis by danielbusnz · Pull Request #5 · danielbusnz/routelet

danielbusnz · 2026-05-29T17:58:33Z

Two changes. (1) Training: swap the exhaustive 'unique' pair sampling for oversampling (num_iterations=40, 1 epoch). Runs that took ~60 min and kept dying now finish in ~2 min, with 0.925 holdout accuracy. (2) Report: score an OOD/garbled probe set alongside the holdout, and add a confidence histogram and a deferral-tradeoff curve. The figures show that routelet's max-softmax confidence saturates near 0.98 even on gibberish, so the 0.55 gate deferred ~0% of OOD inputs; ~0.95 is the usable operating point (paired with the matching aegis tuning change).

With sampling_strategy="unique", training generated ~1.2M contrastive pairs on the grown pool, and the ~60 min runs kept dying. Most of those pairs were trivially easy near-duplicates produced by the disfluency augmentation. Oversampling with num_iterations=40 draws ~400 pairs and trains in ~2 min, reaching 0.925 holdout accuracy.
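
A rough sketch of why exhaustive pairing blows up while a fixed pair budget does not. It assumes that 'unique' enumerates every unordered pair of training examples, so the count grows as n(n-1)/2; under that assumption, a pool of about 1,550 examples already yields ~1.2M pairs (unique_pair_count(1549) is 1,198,926). The sampler is a hypothetical stand-in (unique_pair_count and sample_pairs are illustrative names), not the trainer's actual oversampling code, and it does not model the near-duplicate problem; it only shows that cost is set by the budget rather than by the pool size.

```python
import math
import random
from collections import defaultdict


def unique_pair_count(n: int) -> int:
    """Unordered pairs among n examples: n * (n - 1) / 2 (quadratic growth)."""
    return math.comb(n, 2)


def sample_pairs(texts, labels, num_pairs, seed=0):
    """Draw a fixed budget of labelled contrastive pairs (hypothetical helper).

    Even-indexed draws are positive pairs (same label, target 1.0) when some
    label has two or more examples; the rest are negative pairs (different
    labels, target 0.0). Cost depends on num_pairs, not on the pool size.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    label_list = list(by_label)
    if len(label_list) < 2:
        raise ValueError("need at least two labels to form negative pairs")
    # Only labels with two or more examples can supply a positive pair.
    pos_labels = [lab for lab in label_list if len(by_label[lab]) >= 2]

    pairs = []
    for i in range(num_pairs):
        if i % 2 == 0 and pos_labels:
            lab = rng.choice(pos_labels)
            a, b = rng.sample(by_label[lab], 2)  # two distinct same-label texts
            pairs.append((a, b, 1.0))
        else:
            la, lb = rng.sample(label_list, 2)  # two distinct labels
            pairs.append((rng.choice(by_label[la]), rng.choice(by_label[lb]), 0.0))
    return pairs
```

With a budget of 400 pairs the sampler does the same amount of work whether the pool holds a hundred examples or several thousand, which is the property that makes the run time predictable.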

report.py now scores an OOD/garbled probe set alongside the holdout and adds two figures: a confidence histogram and a deferral-tradeoff curve. They show that routelet is overconfident (max-softmax ~0.98 even on gibberish), so the 0.55 gate deferred almost nothing; ~0.95 is the usable operating point.
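
A minimal sketch of the numbers behind the two figures, assuming each input is reduced to its max-softmax confidence and each holdout input also carries a correct/incorrect flag. A gate at threshold t accepts an input when its confidence is at least t and defers it otherwise. deferral_curve and confidence_histogram are hypothetical helpers, not report.py's code, plotting is left out, and the scores in the demo are made up for illustration.

```python
def deferral_curve(id_conf, id_correct, ood_conf, thresholds):
    """Sweep max-softmax gate thresholds over a holdout and an OOD probe set.

    id_conf, id_correct: per-example max-softmax confidence and correctness
    on the in-distribution holdout. ood_conf: confidence on the OOD probes.
    An input is accepted when conf >= t and deferred otherwise.
    Returns (t, holdout_deferral, ood_deferral, accepted_accuracy) rows.
    """
    rows = []
    for t in thresholds:
        accepted = [ok for c, ok in zip(id_conf, id_correct) if c >= t]
        holdout_deferral = 1 - len(accepted) / len(id_conf)
        ood_deferral = sum(c < t for c in ood_conf) / len(ood_conf)
        # Accuracy on what the gate lets through; undefined if nothing passes.
        accepted_accuracy = sum(accepted) / len(accepted) if accepted else float("nan")
        rows.append((t, holdout_deferral, ood_deferral, accepted_accuracy))
    return rows


def confidence_histogram(conf, bins=10):
    """Count confidences into equal-width bins over [0, 1]."""
    counts = [0] * bins
    for c in conf:
        counts[min(int(c * bins), bins - 1)] += 1  # c == 1.0 goes in the last bin
    return counts


if __name__ == "__main__":
    # Hypothetical scores for illustration only, not routelet's measurements.
    holdout_conf = [0.99, 0.97, 0.90, 0.60]
    holdout_ok = [True, True, False, True]
    ood = [0.98, 0.93, 0.72]
    for row in deferral_curve(holdout_conf, holdout_ok, ood, [0.55, 0.95]):
        print("t=%.2f holdout_defer=%.2f ood_defer=%.2f acc=%.2f" % row)
    print(confidence_histogram(ood))
```

When confidences pile up near the top of the histogram, a low threshold such as 0.55 defers almost nothing from either set; the curve makes visible where OOD deferral starts to rise and how much holdout coverage that costs.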

danielbusnz added 2 commits May 29, 2026 13:58

danielbusnz merged commit 61eb50f into main May 29, 2026
3 checks passed

danielbusnz deleted the report-ood-analysis branch May 29, 2026 17:58
