Context
Wesley directive 2026-05-14: a single vendor is not one leaderboard row; vendor × configuration is. If a user hyper-optimizes synthpanel for "healthcare workers in North Central US" or "sanitation workers in K-12 public schools," they should be able to publish that configuration AND the result it produces, so that others in that niche can adopt it.
This converts SynthBench from "vendor comparison shop" → "Hugging Face Leaderboards of synthetic UXR." Community-contributed configurations become the primary research artifact.
What to build
synthbench submit --config <config.yaml> accepts a configuration artifact (in addition to the vendor adapter from #256). Config schema:
```yaml
vendor: synthpanel
vendor_version: "1.4.0"
config_name: "wesley-healthcare-northcentral"
contributor: "@username"
target_subgroup:
  description: "Healthcare workers in NC/MN/WI/IA"
  selector:
    geography: ["NC", "MN", "WI", "IA"]
    occupation: ["healthcare"]
packs: ["healthcare-patient", "general-consumer"]
ensemble_weights: [0.7, 0.3]
extra_prompts:
  system: |
    You are responding as a healthcare professional...
decoding:
  temperature: 0.6
  top_p: 0.9
  model: "anthropic/claude-haiku-4-5"
```
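A minimal sketch of the checks `submit --config` might run before accepting a config, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). The function name `validate_config`, the `REQUIRED_KEYS` tuple, and every rule below (weights pair one-to-one with packs and sum to 1, `top_p` in (0, 1]) are hypothetical, inferred from the example config rather than taken from the real schema.

```python
import math
from typing import Any

# Hypothetical required keys, inferred from the example config above;
# the real schema may differ.
REQUIRED_KEYS = (
    "vendor",
    "vendor_version",
    "config_name",
    "contributor",
    "target_subgroup",
    "packs",
)


def validate_config(cfg: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the config passed."""
    errors: list[str] = []
    for key in REQUIRED_KEYS:
        if key not in cfg:
            errors.append(f"missing required key: {key}")

    # The subgroup must say who it targets, both in prose and as a selector.
    subgroup = cfg.get("target_subgroup")
    if not isinstance(subgroup, dict):
        subgroup = {}
    if not subgroup.get("description"):
        errors.append("target_subgroup.description is required")
    selector = subgroup.get("selector")
    if not isinstance(selector, dict) or not selector:
        errors.append("target_subgroup.selector must be a non-empty mapping")

    # Assumption: ensemble_weights pairs one-to-one with packs and sums to 1.
    packs = cfg.get("packs") or []
    weights = cfg.get("ensemble_weights")
    if weights is not None:
        if len(weights) != len(packs):
            errors.append(
                f"ensemble_weights has {len(weights)} entries but packs has {len(packs)}"
            )
        elif any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
            errors.append("ensemble_weights must be non-negative and sum to 1")

    # Basic range checks on the sampling parameters.
    decoding = cfg.get("decoding") or {}
    temperature = decoding.get("temperature")
    if temperature is not None and temperature < 0:
        errors.append("decoding.temperature must be >= 0")
    top_p = decoding.get("top_p")
    if top_p is not None and not 0 < top_p <= 1:
        errors.append("decoding.top_p must be in (0, 1]")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a contributor fix every issue in one pass; the CLI could then refuse to produce a submission artifact whenever the list is non-empty.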
The leaderboard row key becomes {vendor}/{config_name}. Each config submission shows:
- Vendor, config name, contributor
- Target subgroup description + selector
- Score across the whole bench
- Win-region highlight: where this config beats the vendor's default
- Reproducible config artifact (YAML pinned to harness version)
- Re-run button (kicks off a CI verification against the same hash)
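A sketch of how the row key, the pinned artifact hash, and the win-region highlight could be computed. The function names `row_id`, `pinned_hash`, and `win_regions` and the `harness_version` and `margin` parameters are hypothetical. The hash assumes "pinned to harness version" means hashing the config together with the harness version string. The win-region check assumes higher scores are better and that scores are reported per region (for example per subgroup or question set).

```python
import hashlib
import json
from typing import Any


def row_id(cfg: dict[str, Any]) -> str:
    """Leaderboard row key in the {vendor}/{config_name} form."""
    return f"{cfg['vendor']}/{cfg['config_name']}"


def pinned_hash(cfg: dict[str, Any], harness_version: str) -> str:
    """SHA-256 over canonical JSON of the config plus the harness version.

    sort_keys and compact separators make the digest independent of key
    order and whitespace in the submitted YAML, so a CI re-run can confirm
    it is scoring exactly the same artifact under the same harness.
    """
    payload = {"harness_version": harness_version, "config": cfg}
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def win_regions(
    config_scores: dict[str, float],
    default_scores: dict[str, float],
    margin: float = 0.0,
) -> list[str]:
    """Regions where the config beats the vendor default by more than `margin`.

    Assumes higher is better; regions missing from either side are skipped.
    """
    return sorted(
        region
        for region, score in config_scores.items()
        if region in default_scores and score - default_scores[region] > margin
    )
```

Hashing canonical JSON instead of the raw YAML bytes means that reformatting a config, without changing any value, does not trigger a spurious mismatch on re-run.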
Done when
- docs/configs.md
- synthbench submit --config validates the config and produces a submission artifact
Leading indicator
≥10 user-contributed configurations posted to the leaderboard within 90 days. Bonus signal: ≥3 of them describe a niche subgroup not addressed by any vendor default.
Open question
Gameability: see the Move 4 (held-out validation) issue.