User-configurations as first-class leaderboard entries (Move 2b) #257

@openclaw-dv

Context

Wesley directive 2026-05-14: a leaderboard row is not a single vendor; it is a vendor × configuration pair. If a user hyper-optimizes synthpanel for "healthcare workers in North Central US" or "sanitation workers in K-12 public schools," they should be able to publish both that configuration and the result it produces, so that other people in that niche can adopt it.

This converts SynthBench from a "vendor comparison shop" into the "Hugging Face Leaderboards of synthetic UXR." Community-contributed configurations become the primary research artifact.

What to build

synthbench submit --config <config.yaml> accepts a configuration artifact (in addition to the vendor adapter from #256). Config schema:

vendor: synthpanel
vendor_version: "1.4.0"
config_name: "wesley-healthcare-northcentral"
contributor: "@username"
target_subgroup:
  description: "Healthcare workers in NC/MN/WI/IA"
  selector:
    geography: ["NC", "MN", "WI", "IA"]
    occupation: ["healthcare"]
packs: ["healthcare-patient", "general-consumer"]
ensemble_weights: [0.7, 0.3]
extra_prompts:
  system: |
    You are responding as a healthcare professional...
decoding:
  temperature: 0.6
  top_p: 0.9
  model: "anthropic/claude-haiku-4-5"
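
The validation step for a config like the one above could be sketched roughly as follows. This is a minimal illustration, not the harness implementation: the function name validate_config, the set of required fields, and the rules that ensemble_weights must match packs in length, sum to 1, and that top_p lies in (0, 1] are all assumptions, since the issue does not specify them. The config is passed as an already-parsed dict so the sketch needs no YAML library.

```python
import math

# Assumption: which fields are mandatory is not stated in the issue.
REQUIRED_KEYS = (
    "vendor", "vendor_version", "config_name",
    "contributor", "target_subgroup", "packs", "decoding",
)


def validate_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key in REQUIRED_KEYS:
        if key not in cfg:
            errors.append(f"missing required field: {key}")

    packs = cfg.get("packs", [])
    weights = cfg.get("ensemble_weights")
    if weights is not None:
        # Assumption: one weight per pack, and weights form a convex mix.
        if len(weights) != len(packs):
            errors.append("ensemble_weights must have one entry per pack")
        elif not math.isclose(sum(weights), 1.0, abs_tol=1e-6):
            errors.append("ensemble_weights must sum to 1")

    top_p = cfg.get("decoding", {}).get("top_p")
    if top_p is not None and not (0.0 < top_p <= 1.0):
        errors.append("decoding.top_p must be in (0, 1]")
    return errors


# The example config from this issue, as a parsed dict.
EXAMPLE = {
    "vendor": "synthpanel",
    "vendor_version": "1.4.0",
    "config_name": "wesley-healthcare-northcentral",
    "contributor": "@username",
    "target_subgroup": {
        "description": "Healthcare workers in NC/MN/WI/IA",
        "selector": {
            "geography": ["NC", "MN", "WI", "IA"],
            "occupation": ["healthcare"],
        },
    },
    "packs": ["healthcare-patient", "general-consumer"],
    "ensemble_weights": [0.7, 0.3],
    "decoding": {"temperature": 0.6, "top_p": 0.9},
}

print(validate_config(EXAMPLE))
```

Returning a list of errors rather than raising on the first one lets the CLI report every problem in a single pass, which is friendlier for first-time contributors.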

Each leaderboard row is keyed as {vendor}/{config_name}. Each config submission shows:

  • Vendor, config name, contributor
  • Target subgroup description + selector
  • Score across the whole bench
  • Win-region highlight: where this config beats the vendor's default
  • Reproducible config artifact (YAML pinned to harness version)
  • Re-run button (kicks off a CI verification against the same hash)
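
A few of the row fields above can be derived mechanically from a config and its scores. The sketch below is only an illustration under stated assumptions: the helper names (row_key, config_hash, win_regions) are hypothetical, the hash is assumed to be a SHA-256 over canonical JSON (the issue only says "the same hash"), scores are assumed to be per-region numbers where higher is better, and the score values are placeholders rather than real results.

```python
import hashlib
import json


def row_key(cfg: dict) -> str:
    """Build the {vendor}/{config_name} leaderboard key."""
    return f"{cfg['vendor']}/{cfg['config_name']}"


def config_hash(cfg: dict) -> str:
    """Hash a config independent of key order (assumed scheme: SHA-256 of canonical JSON)."""
    canonical = json.dumps(cfg, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def win_regions(config_scores: dict, default_scores: dict) -> list:
    """Regions where the config strictly beats the vendor default (higher = better, assumed)."""
    return sorted(
        region
        for region, score in config_scores.items()
        if region in default_scores and score > default_scores[region]
    )


cfg = {"vendor": "synthpanel", "config_name": "wesley-healthcare-northcentral"}
print(row_key(cfg))  # synthpanel/wesley-healthcare-northcentral

# Placeholder scores for illustration only; the region names are made up.
print(win_regions(
    {"healthcare": 0.8, "education": 0.5},
    {"healthcare": 0.7, "education": 0.6},
))  # ['healthcare']
```

Hashing a canonical serialization rather than the raw file means a re-run can confirm it is verifying the same configuration even if the YAML was reformatted or its keys reordered.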

Done when

  • Config schema documented in docs/configs.md
  • synthbench submit --config validates + produces a submission artifact
  • Leaderboard page renders configs grouped under their vendor with sortable "win region" highlight
  • At least one community-contributed config lands beyond the vendor defaults
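
For the leaderboard-page criterion, grouping rows under their vendor and sorting by win region might look like the sketch below. The row shape and the win_region_count field are hypothetical; the issue does not define the leaderboard data model or what the "win region" sort key actually is.

```python
from collections import defaultdict


def group_by_vendor(rows: list, sort_key: str = "win_region_count") -> dict:
    """Group leaderboard rows by vendor, sorting each group descending by sort_key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["vendor"]].append(row)
    # Vendors are ordered alphabetically; rows within a vendor by sort_key.
    return {
        vendor: sorted(items, key=lambda r: r[sort_key], reverse=True)
        for vendor, items in sorted(grouped.items())
    }


# Illustrative rows only; counts are placeholders, not results.
rows = [
    {"vendor": "synthpanel", "config_name": "default", "win_region_count": 0},
    {"vendor": "synthpanel", "config_name": "wesley-healthcare-northcentral", "win_region_count": 2},
]
for vendor, items in group_by_vendor(rows).items():
    print(vendor, [r["config_name"] for r in items])
```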

Leading indicator

≥10 user-contributed configurations posted to the leaderboard within 90 days. Bonus signal: ≥3 of them describe a niche subgroup not addressed by any vendor default.

Open question

Gameability — see Move 4 (held-out validation) issue.

Labels

  • enhancement: New feature or request
  • leaderboard: Leaderboard UX + data model
  • pmf-research: Surfaced from synthpanel-driven PMF research 2026-05-14
  • vendor-onboarding: Lowering friction for vendor + user-config submissions
