docs/demo/README.md (new file, 67 additions)

# SynthPanel proof-demo artifact

A pre-run synthesis sample produced for **DAT-65** so that the Designer (DAT-58) and CMO
(DAT-56) have a real, embeddable SynthPanel output to work from. Per the DAT-57 CTO
scoping, it uses the **bundled** `product-research` personas and the `market_research`
instrument; no new instrument was authored.

> ⚠️ **Synthetic panel.** Every response was generated by AI personas, not human
> respondents. Do not cite as user-research data. SynthPanel produces *directional*
> evidence to decide what deserves real-world validation.

## Files

| File | What it is |
| --- | --- |
| `proof-demo-output.json` | Full JSON envelope from the run (per-persona responses, synthesis, usage, cost, provenance). |
| `proof-demo-synthesis.md` | Rendered Markdown synthesis report (`synthpanel report`) + an untruncated full-synthesis appendix. |
| `proof-demo-personas.yaml` | The 8-persona panel used (first 8 of the bundled `product-research` pack). |

## How it was produced

```bash
pip install synthpanel==1.5.5
export OPENROUTER_API_KEY=...   # only OpenRouter credentials were available in this env

synthpanel --output-format json panel run \
  --personas docs/demo/proof-demo-personas.yaml \
  --instrument examples/instruments/market_research.yaml \
  --var problem="synthetic focus groups for product teams" \
  --synthesis-strategy single \
  --save --max-cost 2.00 \
  > docs/demo/proof-demo-output.json

# <RESULT_ID> is the id of the saved result (this run: result-20260525-153050-22db74)
synthpanel report <RESULT_ID> > docs/demo/proof-demo-synthesis.md
```

Notes on deviations from the DAT-65 command sketch:

- **`--panels 8`** is not a CLI flag; panel size equals the number of supplied
personas. The 20-persona `product-research` pack was subset to its first 8
personas (`proof-demo-personas.yaml`) to get an 8-panelist run.
- **`--personas-pack`** is not a flag; the real flag is `--personas <path|pack>`.
- **Model:** the only credential available was `OPENROUTER_API_KEY`, so the run used
  the default `openrouter/auto` for both panelists and synthesis. (The `haiku`/`sonnet`
  aliases route to Anthropic natively and need `ANTHROPIC_API_KEY`;
  `openrouter/anthropic/claude-3.7-sonnet` returns a 404 on OpenRouter.)
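
The subsetting step described above can be sketched in Python. This is a minimal sketch, not SynthPanel tooling: it assumes a pack is a YAML mapping with a top-level `personas` list (the layout of `proof-demo-personas.yaml`) and that PyYAML is installed for the file I/O. The helper names `subset_pack` and `write_subset` are hypothetical, and the on-disk location of the bundled `product-research` pack is not stated here, so it is left as a parameter.

```python
from typing import Optional


def subset_pack(pack: dict, n: int, name: Optional[str] = None) -> dict:
    """Return a copy of a persona pack that keeps only its first `n` personas.

    Hypothetical helper; assumes the layout of proof-demo-personas.yaml
    (top-level `name`, `description`, and a `personas` list).
    """
    personas = list(pack.get("personas", []))
    if n > len(personas):
        raise ValueError(f"pack has only {len(personas)} personas, asked for {n}")
    subset = dict(pack)  # shallow copy: the original pack is not mutated
    subset["personas"] = personas[:n]  # list order is preserved, so "first n" is stable
    if name is not None:
        subset["name"] = name
    return subset


def write_subset(src_path: str, dst_path: str, n: int = 8) -> None:
    """Read a YAML pack, keep its first `n` personas, and write the result."""
    import yaml  # PyYAML, imported lazily so subset_pack works without it

    with open(src_path, encoding="utf-8") as f:
        pack = yaml.safe_load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        # sort_keys=False keeps name/age/occupation/... in their original order
        yaml.safe_dump(subset_pack(pack, n), f, sort_keys=False, allow_unicode=True)
```

Because panel size equals the number of supplied personas, trimming the list is the only lever needed to get an 8-panelist run.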

## Cost

| Measure | Value |
| --- | --- |
| **Provider-reported actual (panelists)** | **$0.0237** |
| Synthesis | $0.00 provider-reported (local est. $0.089) |
| **Effective actual total** | **≈ $0.02–$0.11** |
| Tool headline estimate (`metadata.cost`) | $1.17 |

Both the effective total and the $1.17 headline estimate are under the $2.00 target
(`--max-cost 2.00`). The **$1.17 headline is the tool's local estimate, computed from a
stale/fallback pricing table** for `openrouter/auto`; SynthPanel itself logs
`Local cost estimate diverges from provider-reported ... ratio≈98%`.
The accurate panelist cost is the provider-reported figure (~$0.02), read from
`total_usage.provider_reported_cost` in the JSON envelope. Synthesis adds between
$0.00 (provider-reported) and ~$0.09 (local estimate), which gives the range above.
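
Reading both figures out of the envelope makes the gap explicit. A minimal sketch, assuming `metadata.cost` and `total_usage.provider_reported_cost` are nested numeric fields, as their dotted names suggest (the envelope schema is not documented here); the function names are hypothetical. With this run's figures ($1.1679 headline, $0.0237 provider-reported) the divergence comes out at about 0.98, consistent with the logged `ratio≈98%`, and adding the $0.0886 synthesis estimate gives the upper end of the effective range.

```python
import json
from typing import Tuple


def load_envelope(path: str) -> dict:
    """Load the JSON envelope written by `synthpanel --output-format json panel run`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def cost_divergence(envelope: dict) -> float:
    """Fraction by which the local headline estimate exceeds the provider-reported cost.

    Assumes both costs are nested numeric fields (not strings like "$1.17").
    """
    headline = float(envelope["metadata"]["cost"])
    provider = float(envelope["total_usage"]["provider_reported_cost"])
    return 1.0 - provider / headline


def effective_total_range(panelist_cost: float, synthesis_estimate: float) -> Tuple[float, float]:
    """Low end: synthesis cost $0 as the provider reported; high end: it cost the local estimate."""
    return panelist_cost, panelist_cost + synthesis_estimate
```

Usage, under the same assumptions: `cost_divergence(load_envelope("docs/demo/proof-demo-output.json"))`.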

## Run facts

- Panel: 8 personas × 5 questions = 40 panelist Q/A pairs, **0 errors**.
- Synthesis: 7 themes, 6 agreements, 5 disagreements, 5 surprises, 1 recommendation.
- synthpanel 1.5.5 · instrument `market_research` · `--var problem="synthetic focus groups for product teams"`.
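
The pair count and failure rate follow from simple arithmetic: in a single-round run every persona answers every question once, so pairs = personas × questions, and the failure rate is errored pairs over total pairs. The helper below is hypothetical (not part of SynthPanel) and takes per-persona error counts like those in the report's Per-Persona Summary table.

```python
from typing import Dict, Tuple


def failure_stats(errors_per_persona: Dict[str, int], n_questions: int) -> Tuple[int, int, float]:
    """Return (total pairs, errored pairs, failure rate in %) for a single-round run."""
    total = len(errors_per_persona) * n_questions  # one Q/A pair per persona per question
    errored = sum(errors_per_persona.values())
    rate = 100.0 * errored / total if total else 0.0
    return total, errored, rate
```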

docs/demo/proof-demo-output.json (new file, 1 addition; large diff not rendered)

docs/demo/proof-demo-personas.yaml (new file, 93 additions)

name: Product Researchers & PMs (proof-demo, n=8)
description: First 8 personas from the bundled `product-research` pack, used for the SynthPanel proof-demo
  panel (DAT-65).
personas:
- name: Priya Ramanathan
  age: 34
  occupation: Senior PM at a B2B SaaS company
  background: 'Owns discovery for a 300-user enterprise product. Ten customers willing to do interviews,
    but schedule them 4-6 weeks out. Needs faster signal to validate features before writing specs.

    '
  personality_traits:
  - PM buyer
  - discovery-constrained
  - speed-seeker
- name: Mateo Gonzalez
  age: 29
  occupation: UX researcher at a mid-size fintech
  background: 'Trained in qualitative methods. Skeptical that synthetic focus groups produce valid signal.
    Will test with a low-stakes question first and check against real interview data before trusting it.

    '
  personality_traits:
  - methodologist
  - validation-first
  - principled skeptic
- name: Sasha Kim
  age: 38
  occupation: Head of product at a 15-person startup
  background: 'Owns product and partial growth. No research team, no budget for one. Has been using Reddit
    threads and Twitter polls for signal. Curious whether synthetic research tooling is a meaningful step
    up.

    '
  personality_traits:
  - resourceful founder-PM
  - tool-collector
  - scrappy
- name: David O'Brien
  age: 45
  occupation: Research director, 50-person team
  background: 'Runs a real research org. Evaluates synthetic tools as augmentation, not replacement. Specifically
    wants to know if they accelerate formative research (early concept) vs summative (post-launch).

    '
  personality_traits:
  - research leader
  - augmentation-minded
  - buyer
- name: Anja Svensson
  age: 31
  occupation: Solo founder, B2C consumer app
  background: 'Pre-launch consumer app. Can''t afford a research team, but needs to test messaging before
    burning $20k on paid acquisition. Very price-sensitive. Measures "worth it" in paid-acquisition savings.

    '
  personality_traits:
  - pre-launch founder
  - price-sensitive
  - ROI-quantifying
- name: Rohan Desai
  age: 33
  occupation: Growth PM at a Series-C consumer fintech
  background: 'Runs experiments weekly. Wants synthetic panels for landing-page variant screening before
    pushing to real traffic. High cadence, low-stakes per-decision — ideal fit if quality holds.

    '
  personality_traits:
  - experimenter
  - cadence-driven
  - ops-minded
- name: Jenna Miller
  age: 27
  occupation: Freelance design researcher
  background: 'Bills hourly for discovery interviews. Thinks synthetic research could either expand her
    business (faster first-pass signal) or eat it (clients skip her). Curious how to price around it.

    '
  personality_traits:
  - consultant
  - pricing-strategist
  - ambivalent
- name: Omar Khalil
  age: 41
  occupation: CTO who sometimes wears the PM hat
  background: 'Technical co-founder of a dev-tools startup. Product decisions fall to him because they
    can''t afford a PM. Prefers tools with Python SDKs and CLI workflows; hates no-code.

    '
  personality_traits:
  - CTO-as-PM
  - CLI-native
  - developer-buyer

docs/demo/proof-demo-synthesis.md (new file, 126 additions)

# Panel Report: result-20260525-153050-22db74

> **Synthetic panel.** All responses below were generated by AI personas,
> not human respondents. Do not cite as user-research data.

## Provenance

| Field | Value |
| --- | --- |
| config_hash | d403542b5efcb968aee95e90e302c69d30cd8a400ad53520c5b4552a361737d3 |
| synthpanel_version | 1.5.5 |
| python_version | 3.11.15 |
| pricing_snapshot_date | 2026-05-10 |
| created_at | 2026-05-25T15:30:50.396377+00:00 |
| total_seconds | 631.81 |
| source_path | (loaded by ID) |
| panelist_model(s) | openrouter/auto |
| synthesis_model | openrouter/auto |

## Overview

| Field | Value |
| --- | --- |
| Personas | 8 |
| Questions | 5 |
| Rounds | 1 (default) |
| Total tokens | 119830 |
| Total cost | $1.1679 |
| Instrument | market_research |
| Shape | flat (saved via --save; no rounds metadata) |

## Per-Model Rollup

| Model | Personas | Answered | Input | Output | Total | Cost |
| --- | --- | --- | --- | --- | --- | --- |
| openrouter/auto | 8 | 40 | 36787 | 64596 | 101383 | $1.1679 |

## Per-Persona Summary

| Persona | Model | Responses | Errors | Notes |
| --- | --- | --- | --- | --- |
| Priya Ramanathan | openrouter/auto | 5 | 0 | |
| Mateo Gonzalez | openrouter/auto | 5 | 0 | |
| Sasha Kim | openrouter/auto | 5 | 0 | |
| David O'Brien | openrouter/auto | 5 | 0 | |
| Anja Svensson | openrouter/auto | 5 | 0 | |
| Rohan Desai | openrouter/auto | 5 | 0 | |
| Jenna Miller | openrouter/auto | 5 | 0 | |
| Omar Khalil | openrouter/auto | 5 | 0 | |

## Synthesis

**Status:** ran

| Field | Value |
| --- | --- |
| Model | openrouter/auto |
| Cost | $0.0886 |
| Themes | 7 |

**Summary peek:**

> Across the panel, synthetic focus groups were viewed less as a standalone category and more as part of a hybrid workflow that combines panel/recruitment tools, real-user research, and AI-assisted synthesis; Remesh was the most frequently named synthetic-adjacent option, while Dovetail, UserTesting, …

**Themes:**

- Hybrid stack mentality: Priya, Mateo, David, Jenna, and Anja all described synthetic focus groups as a combination of panel/recruitment, research execution, and synthesis tools rather than a single end-to-end winner.
- Signal quality and trust dominate: Priya, Mateo, Sasha, David, Jenna, and Omar all prioritized trustworthy, explainable, auditable outputs over flashy AI features.
- Speed and workflow fit are essential: Priya wanted outputs that drop into PRDs/readouts, Sasha wanted a 10–20 minute workflow, David emphasized brief→recommendation speed, and Rohan wanted same-day/next-day experiment decisions.
- Validation against reality is a recurring requirement: Mateo and David want calibration against real interviews, Rohan wants alignment with live traffic, and several others suggested pilots on known use cases before buying.
- Enterprise readiness matters for larger teams: Priya, Mateo, David, Rohan, and Omar repeatedly mentioned SSO, governance, privacy, retention policies, procurement readiness, and security controls as gating factors.
- Pricing is highly segment-dependent: enterprise PM/research leaders were comfortable with $3k–$12k+ monthly pricing if ROI was proven, while startup/solo/freelance respondents expected sub-$1k pricing and resisted lock-in.
- Synthetic is mostly seen as augmentation, not replacement: Mateo, David, Anja, Jenna, and Omar all explicitly framed synthetic approaches as useful for early signal generation but still requiring real-user validation for high-stakes decisions.

**Agreements:**

1. Trustworthy signal quality was the top purchase criterion for most respondents; Priya called it 'signal quality that I can trust and act on,' Mateo prioritized 'validity first,' and Jenna said credibility was 'first by a mile.'
2. Most panelists said brand trust and peer recommendations help create a shortlist but do not close the sale; Priya, Mateo, David, Rohan, Jenna, and Omar all made this point explicitly.
3. A real pilot is the expected buying motion: Priya, Mateo, David, Rohan, Jenna, and Omar all described side-by-side or low-stakes pilots using a real decision or known dataset before adoption.
4. Workflow fit matters almost as much as quality: common asks included exports to Dovetail (Mateo, Jenna), PRDs/Confluence/Jira/Notion (Priya, Omar), and low-friction growth workflows (Rohan, Sasha, Anja).
5. For meaningful adoption, the tool must be transparent and auditable; Sasha, Mateo, Jenna, and Omar all explicitly wanted evidence trails, raw outputs, reproducibility, or challengeable reasoning.
6. Switching vendors in six months is generally unlikely unless the new product proves ROI, improves signal quality, integrates cleanly, and minimizes migration risk; this pattern appears in Priya, Mateo, David, Rohan, Jenna, and Omar.

**Disagreements:**

1. Price sensitivity varied dramatically by role: Sasha and Jenna capped expected spend around $100–$600/month, Anja around $600–$1,500, while Priya, David, Rohan, and Mateo accepted $3k–$12k+ for enterprise use if ROI was clear.
2. Not everyone ranked price the same way: Anja explicitly put price/ROI first, while Priya, Mateo, David, Jenna, and Omar placed signal quality and trust ahead of price.
3. There was no consensus on whether a true category leader exists: Omar argued 'there isn’t a true synthetic focus group product yet,' while Priya, Mateo, Anja, Rohan, and Jenna immediately named Remesh and adjacent platforms.
4. The preferred workflow differed by persona: Omar wanted API/CLI/Python-first control, Priya wanted document-ready PM outputs, David emphasized formative-research workflows, and Rohan/Anja cared most about experiment velocity and traffic/ad-spend decisions.
5. Willingness to switch also varied: Rohan was relatively open if a pilot showed strong live-traffic correlation (up to 60–80%), Priya estimated 15–25% baseline, Jenna 20–40%, while Mateo and David were more conservatively 'not likely by default.'

**Recommendation:**

> Position the product as a validated decision-support workflow, not as a magical replacement for real research. Build the go-to-market around side-by-side pilots that prove calibration against real interviews or live traffic, make auditability and transparency core (raw outputs, evidence trails, reproducibility), and segment pricing/packaging aggressively: self-serve sub-$1k plans for startup/solo users, and $3k–$12k enterprise tiers with SSO, governance, Dovetail/PRD/Jira integrations, plus API…

## Failure Stats

| Field | Value |
| --- | --- |
| Total pairs | 40 |
| Errored pairs | 0 |
| Failure rate | 0.0% |

_Generated by synthpanel report. Synthetic panel — AI-generated responses, not human data._

---

## Full Synthesis (untruncated)

_The `synthpanel report` section above truncates the summary and recommendation to a preview. The complete text below is extracted verbatim from `proof-demo-output.json` (`synthesis.*`)._

### Executive Summary

Across the panel, synthetic focus groups were viewed less as a standalone category and more as part of a hybrid workflow that combines panel/recruitment tools, real-user research, and AI-assisted synthesis; Remesh was the most frequently named synthetic-adjacent option, while Dovetail, UserTesting, Qualtrics, Forsta, Cint/Dynata/Toluna, and Zappi filled adjacent roles. The strongest cross-panel signal was that buyers will not choose on brand or feature count alone: they want trustworthy, auditable output that maps to real-user behavior or live traffic, plus fast time-to-insight and clean workflow integration. Willingness to pay split sharply by persona—startup/solo users like Sasha, Jenna, and Anja clustered around $100–$1,000/month, while enterprise-oriented respondents like Priya, David, Rohan, and Mateo accepted low-to-mid thousands or more if ROI, security, and integrations were proven. Most respondents said switching in the next six months is unlikely without a side-by-side pilot showing validated signal, measurable ROI, enterprise/security fit, and low migration friction.

### Recommendation

Position the product as a validated decision-support workflow, not as a magical replacement for real research. Build the go-to-market around side-by-side pilots that prove calibration against real interviews or live traffic, make auditability and transparency core (raw outputs, evidence trails, reproducibility), and segment pricing/packaging aggressively: self-serve sub-$1k plans for startup/solo users, and $3k–$12k enterprise tiers with SSO, governance, Dovetail/PRD/Jira integrations, plus API/CLI options for technical teams like Omar’s.

### Surprises

- Brand strength was consistently de-emphasized; even enterprise-oriented respondents like Priya and David treated it mainly as a risk reducer, not a purchase driver.
- Omar’s response surfaced an unmet market opportunity: he explicitly said there is no true polished synthetic focus-group product yet and asked for a reproducible, API/CLI-driven workflow rather than a dashboard-first tool.
- Rohan introduced an unusually concrete proof standard—wanting synthetic results to correlate with live traffic outcomes, even citing a rough threshold (e.g., ≥0.75 correlation).
- Several respondents cared less about 'research features' and more about downstream decision artifacts: Priya wanted PRD-ready outputs, Sasha wanted shareable decision docs, and Jenna wanted client-defendable deliverables.
- The willingness-to-pay spread was very wide, suggesting this is not one market but several: startup/founder/solo buyers and enterprise PM/research buyers likely need entirely different packaging and sales motions.