Deploy-ready: fully static app, client-side debate engine, zero backend needed #2
emregucerr wants to merge 29 commits into
…, audience, debate engine, scoring, tests Co-authored-by: Emre Gucer <emregucerr@users.noreply.github.com>
…question gen Co-authored-by: Emre Gucer <emregucerr@users.noreply.github.com>
… visual assets Co-authored-by: Emre Gucer <emregucerr@users.noreply.github.com>
… shift table Co-authored-by: Emre Gucer <emregucerr@users.noreply.github.com>
…ate leaderboard and judge analysis Co-authored-by: Emre Gucer <emregucerr@users.noreply.github.com>
… started guide Co-authored-by: Emre Gucer <emregucerr@users.noreply.github.com>
…bate history Co-authored-by: Emre Gucer <emregucerr@users.noreply.github.com>
…for new API key Co-authored-by: Emre Gucer <emregucerr@users.noreply.github.com>
…ress) Co-authored-by: Emre Gucer <emregucerr@users.noreply.github.com>
…oard

FINAL RESULTS (45 real head-to-head debates):
#1 Claude Opus 4.6 (Thinking) ELO: 1590
#2 Grok 4.20 (Reasoning) ELO: 1577
#3 Grok 4.20 Multi-Agent ELO: 1560
#4 Grok 4.20 ELO: 1546
#5 GPT-5.2 Chat ELO: 1546
#6 Claude Opus 4.6 ELO: 1508
#7 Gemini 3 Flash ELO: 1459
#8 Gemini 3 Pro ELO: 1430
#9 GPT-5.4 (High) ELO: 1407
#10 Gemini 3.1 Pro Preview ELO: 1377

Stats: 1,038 API calls, 9.2M input tokens, 1.2M output tokens, 0 errors

Co-authored-by: Emre Gucer <emregucerr@users.noreply.github.com>
…PI route

- Add maxDuration=60 to /api/debate route for Vercel serverless function timeout
- Fix react-hooks/immutability error: replace mutable currentPhase with index-based check
- Remove unused imports across 5 files (PERSONA_MAP, Brain, TrendingDown, etc.)
- Remove unused variable (winnerName in RecentDebates)
- All lint checks and builds pass cleanly

Co-authored-by: Emre Gucer <emregucerr@users.noreply.github.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit c56bf4e.
        await asyncio.sleep(delay)
        continue
    stats.errors += 1
    raise
Retry sleep holds semaphores, blocking all concurrency
Medium Severity
All four asyncio.sleep(delay) calls during retry backoff (lines 133, 143, 171, 181) execute inside the async with global_sem and async with model_sem context managers. This means during exponential backoff (up to 32 seconds), both semaphore slots are held, blocking other concurrent requests from proceeding. With only 10 global slots and 2 per-model slots, a single rate-limited request can starve the entire benchmark's concurrency for the duration of the sleep.
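One way to address this, as a minimal sketch rather than the benchmark's actual code: hold both semaphores only while the request is in flight, and run the backoff sleep after the `async with` block has exited. The names `call_with_retry`, `RateLimited`, and `send` are hypothetical stand-ins, and the retry count and base delay are assumed defaults, not values taken from the repository.

```python
import asyncio


class RateLimited(Exception):
    """Hypothetical stand-in for the error the real client raises on a rate limit."""


async def call_with_retry(send, global_sem, model_sem, max_retries=5, base_delay=1.0):
    """Run `send()` under both semaphores, but back off with no slots held."""
    for attempt in range(max_retries + 1):
        try:
            # Slots are held only for the duration of the request itself.
            async with global_sem, model_sem:
                return await send()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = base_delay * 2 ** attempt  # exponential backoff
        # Both context managers have exited by this point, so other tasks
        # can acquire the slots while this one waits.
        await asyncio.sleep(delay)
```

With this shape, a rate-limited request gives its global and per-model slots back before it waits, so the remaining requests keep flowing during the backoff window.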
* Phase 0+1: Benchmark engine scaffolding - models, prompts, API client, audience, debate engine, scoring, tests
* Fix: min max_tokens for OpenAI models (>=16), handle null content in question gen
* Add benchmark runner.py for full 45-matchup benchmark
* Phase 2+3: Web app with leaderboard, arena, SSE debate API, generated visual assets
* Fix: remove nested git repo in web/, add all web files properly
* Add results sync script, fix data loading, update synced data
* Add RecentDebates sidebar, sync 2 debate results
* Add debate replay pages with full transcript, vote analysis, and vote shift table
* Generate full 45-debate sample results (2 real + 43 simulated), populate leaderboard and judge analysis
* Add comprehensive README with architecture docs, model table, getting started guide
* Add model detail pages: ELO, win rate, h2h records, judge profile, debate history
* Add fallback for empty cross-exam questions before full benchmark run
* Batch judge calls (3 at a time) to avoid credit pre-auth spikes
* Fix: always load existing debates to skip them, dotenv override=True for new API key
* Sync 3 real benchmark debates (debates 1-3 complete, debate 4 in progress)
* Sync 5 real debates, benchmark running steadily (~$1/debate)
* Sync 7 real debates
* Sync 9 real debates (10th in progress)
* Sync 11 real debates
* Sync 15 real debates ($12.58 credits remaining)
* Sync 17 real debates
* Sync 20 real debates - past halfway mark
* Sync 22 real debates (~50%)
* Parallelize benchmark: run 3 debates concurrently in batches
* Sync 29 real debates (parallel execution working, ~3x speedup)
* Sync 32 real debates (auto top-up replenished credits)
* 🏆 Complete benchmark: all 45 real debates finished, final ELO leaderboard

  FINAL RESULTS (45 real head-to-head debates):
  #1 Claude Opus 4.6 (Thinking) ELO: 1590
  #2 Grok 4.20 (Reasoning) ELO: 1577
  #3 Grok 4.20 Multi-Agent ELO: 1560
  #4 Grok 4.20 ELO: 1546
  #5 GPT-5.2 Chat ELO: 1546
  #6 Claude Opus 4.6 ELO: 1508
  #7 Gemini 3 Flash ELO: 1459
  #8 Gemini 3 Pro ELO: 1430
  #9 GPT-5.4 (High) ELO: 1407
  #10 Gemini 3.1 Pro Preview ELO: 1377

  Stats: 1,038 API calls, 9.2M input tokens, 1.2M output tokens, 0 errors

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Emre Gucer <emregucerr@users.noreply.github.com>
The live arena now calls OpenRouter directly from the browser instead of proxying through a serverless API route. This eliminates:
- Serverless function timeout limits (was 60s, debates take 1-3 minutes)
- The need for any backend/serverless infrastructure
- Server-side API key handling (key never leaves the browser)

The entire app is now fully static (all routes are ○ Static or ● SSG). It can be deployed on any free static hosting: Vercel, Netlify, Cloudflare Pages, or even GitHub Pages.

Changes:
- New: src/lib/debate-engine.ts (client-side debate orchestration)
- Modified: src/app/arena/page.tsx (uses debate-engine instead of /api/debate)
- Deleted: src/app/api/debate/route.ts (no longer needed)

Co-authored-by: Emre Gucer <emregucerr@users.noreply.github.com>


Summary
Makes the entire app fully static so it can be deployed for free on Vercel (or any static host) with zero backend, zero serverless functions, and zero timeout issues.
The Problem
The original /api/debate route ran the entire debate as a server-side function. A full debate makes 38 sequential/parallel LLM API calls to OpenRouter, taking 1-3 minutes. Vercel's free tier has a 60-second serverless function timeout, so debates would frequently fail mid-way.
The Solution
Move the debate orchestration entirely to the client. The API key was already user-provided and stored in localStorage, so there's no security reason to proxy through a server. OpenRouter supports CORS and is designed for direct browser calls.
What Changed
New: src/lib/debate-engine.ts (client-side debate engine)
- … fetch() …
- … AbortSignal …
Modified: src/app/arena/page.tsx
- … runDebate() from the client-side engine
- … DebateScore type instead of Record<string, unknown>)
Deleted: src/app/api/debate/route.ts
Build Output (Before → After)
All routes are now ○ (Static) or ● (SSG). Zero serverless functions.
Lint & Build
- npm run lint: 0 errors, 0 warnings
- npm run build: all 15 pages generate successfully
Also fixed (from first commit)
- react-hooks/immutability error in DebateReplay.tsx