feat(vision): Tier-2 AI screen evaluation (deepens score to core+vision) by oxxo · Pull Request #6 · oxxo/protoscan

oxxo · 2026-06-10T20:11:43Z

Summary

The "analyze the screen with AI" Pro tier. A vision model evaluates each screen and emits graded Metric[] for 6 subjective Tier-2 dimensions (clutter, saliency, feedback, consistency, affordance, guidance). These feed the 0-100 score, flip its tier to core+vision, and fill the previously grey radar axes. The free deterministic Tier-1 score is unchanged.
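A minimal sketch of how Tier-2 metrics could fold into the score and flip the tier label, assuming a plain unweighted mean over scored dimensions (the PR does not state the actual weighting). `Metric`, `computeScore`, and the field names are hypothetical. Because na dimensions (`null`) and catalog dimensions with no metric are skipped rather than counted as zero, a core-only run scores the same no matter how many Tier-2 dimensions the catalog lists.

```typescript
// Hypothetical shapes; the real Metric type in core may differ.
type Source = "core" | "vision";

interface Metric {
  dimension: string;
  score: number | null; // 0-100, or null = na (model abstained)
  source: Source;
}

interface ScoreResult {
  score: number; // 0-100
  tier: "core" | "core+vision";
  dimensionsScored: number;
}

// Assumption: unweighted mean over catalog dimensions that actually have a score.
function computeScore(
  catalog: readonly string[],
  coreMetrics: Metric[],
  additionalMetrics: Metric[] = [],
): ScoreResult {
  const byDimension = new Map<string, Metric>();
  for (const m of [...coreMetrics, ...additionalMetrics]) {
    byDimension.set(m.dimension, m);
  }

  let total = 0;
  let scored = 0;
  let hasVision = false;
  for (const dim of catalog) {
    const m = byDimension.get(dim);
    // Missing or na dimensions are skipped, so they never drag the mean down.
    if (m === undefined || m.score === null) continue;
    total += m.score;
    scored += 1;
    if (m.source === "vision") hasVision = true;
  }

  return {
    score: scored === 0 ? 0 : Math.round(total / scored),
    // The tier flips only when at least one vision metric was actually scored.
    tier: hasVision ? "core+vision" : "core",
    dimensionsScored: scored,
  };
}
```

The skip-instead-of-zero rule is what makes a "core-only score is independent of Tier-2 catalog size" invariant hold in this sketch.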

What it does

- core: 4 new Tier-2 dimensions added across the 5 catalog sites (types, dimensions x3, metric DIMENSION_ISSUE_META). Free-score invariant test added: the core-only score is independent of the Tier-2 catalog size, verified byte-identical on EDET (70/100, 8 dims).
- vision: provider abstraction (providers/: VisionProvider + OpenAIVisionProvider + Claude stub + selectProvider), so GPT-4o ships now and Claude can slot in later. The holistic prompt returns {findings, dimensions} with temperature 0 + JSON mode (low run-to-run variance). Each dimension scores 0-100, or null = na when the model can't judge it from a single static screenshot (no hallucinated numbers). A defensive sanitizeVisionResponse drops malformed items. analyzeVision/analyzeVisionProxy return VisionResult {issues, metrics, coverage}.
- cli: --vision (implies --score) and --provider flags; wires additionalMetrics into the score.
- mcp: vision argument (Pro-gated like simulate); passes additionalMetrics and forces scoring.
- server proxy (apps/web/api/vision.ts): holistic prompt + {findings, dimensions}; additive and backward compatible (old clients read .findings; old servers degrade to metrics: []).
- reporters: radar fallback when fewer than 3 dimensions; a "core = deterministic / vision = AI-assessed (may vary)" note in both terminal and HTML output.
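The provider split might look roughly like the sketch below. Only the names `VisionProvider`, `OpenAIVisionProvider`, and `selectProvider` come from this PR; their members, the `ClaudeVisionProvider` class name, and `VisionRequest` are assumptions. The request body uses the OpenAI Chat Completions fields for temperature 0 and JSON mode (`temperature`, `response_format: { type: "json_object" }`) and passes the screenshot as a base64 data URL. No network call is made here.

```typescript
// Hypothetical request shape; only the provider names come from the PR.
interface VisionRequest {
  screenshotBase64: string; // PNG bytes, base64-encoded
  prompt: string; // JSON mode requires the prompt to mention JSON
}

interface VisionProvider {
  readonly name: "openai" | "claude";
  buildRequestBody(req: VisionRequest): Record<string, unknown>;
}

class OpenAIVisionProvider implements VisionProvider {
  readonly name = "openai" as const;
  private readonly model: string;

  constructor(model: string = "gpt-4o") {
    this.model = model;
  }

  buildRequestBody(req: VisionRequest): Record<string, unknown> {
    return {
      model: this.model,
      temperature: 0, // lowest run-to-run variance
      response_format: { type: "json_object" }, // JSON mode
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: req.prompt },
            {
              type: "image_url",
              image_url: { url: `data:image/png;base64,${req.screenshotBase64}` },
            },
          ],
        },
      ],
    };
  }
}

// Stub so a Claude provider can slot in later without touching callers.
class ClaudeVisionProvider implements VisionProvider {
  readonly name = "claude" as const;

  buildRequestBody(): Record<string, unknown> {
    throw new Error("Claude vision provider not implemented yet");
  }
}

function selectProvider(name: string = "openai"): VisionProvider {
  switch (name) {
    case "openai":
      return new OpenAIVisionProvider();
    case "claude":
      return new ClaudeVisionProvider();
    default:
      throw new Error(`Unknown vision provider: ${name}`);
  }
}
```

Under this sketch, the CLI's `--provider` value would simply be passed to `selectProvider`, and callers never branch on the concrete provider.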

Audit

The plan was adversarially audited, and the genuine gaps (Addendum §A-§G) are folded in: non-determinism (temp 0 + clear labeling), per-dimension na/abstention (judge-ability from one static frame), malformed-item validation, partial-data coverage, a dual-emit source tag, and server back-compat. (Most "CRITICAL" audit findings were the plan's own Phase A-D tasks, i.e. false positives from auditing a plan as if it were code.)
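The defensive parse and partial-data coverage can be sketched as follows. `sanitizeVisionResponse` and the `{issues, metrics, coverage}` result are named in the PR, but this signature, the assumption that `dimensions` is an object keyed by dimension name, the minimal `Finding` shape, and the definition of `coverage` (judged dimensions / expected dimensions) are all hypothetical. Malformed items are dropped instead of failing the screen, `null` survives as na, and a response with no `dimensions` key (an older server) degrades to `metrics: []`.

```typescript
const TIER2_DIMENSIONS = [
  "clutter", "saliency", "feedback", "consistency", "affordance", "guidance",
] as const;
type Tier2Dimension = (typeof TIER2_DIMENSIONS)[number];

interface Finding { message: string } // assumed minimal shape
interface VisionMetric { dimension: Tier2Dimension; score: number | null; source: "vision" }
interface VisionResult {
  issues: Finding[];
  metrics: VisionMetric[];
  coverage: number; // judged dimensions / expected dimensions, 0..1 (assumed definition)
}

function isTier2(d: string): d is Tier2Dimension {
  return (TIER2_DIMENSIONS as readonly string[]).includes(d);
}

function sanitizeVisionResponse(raw: unknown): VisionResult {
  const obj: Record<string, unknown> =
    typeof raw === "object" && raw !== null ? (raw as Record<string, unknown>) : {};

  // Keep only findings that are objects with a string message.
  const issues: Finding[] = [];
  for (const f of Array.isArray(obj.findings) ? obj.findings : []) {
    if (typeof f === "object" && f !== null && typeof f.message === "string") {
      issues.push({ message: f.message });
    }
  }

  // Old servers send no `dimensions`; that simply yields metrics: [].
  const metrics: VisionMetric[] = [];
  const dims = obj.dimensions;
  if (typeof dims === "object" && dims !== null && !Array.isArray(dims)) {
    for (const [key, value] of Object.entries(dims)) {
      if (!isTier2(key)) continue; // unknown dimension name: drop
      if (value === null) {
        metrics.push({ dimension: key, score: null, source: "vision" }); // na: model abstained
      } else if (typeof value === "number" && Number.isFinite(value) && value >= 0 && value <= 100) {
        metrics.push({ dimension: key, score: Math.round(value), source: "vision" });
      } // non-numeric or out-of-range scores are dropped
    }
  }

  const judged = metrics.filter((m) => m.score !== null).length;
  return { issues, metrics, coverage: judged / TIER2_DIMENSIONS.length };
}
```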

Verification

100 tests (93 core + 7 vision), tsc --noEmit strict clean across core/vision/cli/mcp. No free-tier regression (EDET deterministic 70/100 unchanged).
Not yet runnable end-to-end: the real --vision AI call needs OPENAI_API_KEY (BYOK), or PROTOSCAN_API_KEY plus a redeploy of the Vercel server (Phase D). @protoscan/vision stays proprietary (not published under MIT).
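The two ways to run --vision could be resolved from the environment roughly like this. The variable names come from the PR; the `resolveVisionMode` helper and the precedence (a BYOK `OPENAI_API_KEY` wins over `PROTOSCAN_API_KEY`) are assumptions for illustration, not documented behavior.

```typescript
type VisionMode =
  | { kind: "byok"; apiKey: string } // call the provider directly (analyzeVision)
  | { kind: "proxy"; apiKey: string } // go through the hosted server (analyzeVisionProxy)
  | { kind: "unavailable"; reason: string };

// Hypothetical helper; the precedence below is an assumption.
function resolveVisionMode(env: Record<string, string | undefined>): VisionMode {
  const openai = env.OPENAI_API_KEY?.trim();
  if (openai) return { kind: "byok", apiKey: openai };

  const protoscan = env.PROTOSCAN_API_KEY?.trim();
  if (protoscan) return { kind: "proxy", apiKey: protoscan };

  return {
    kind: "unavailable",
    reason: "--vision needs OPENAI_API_KEY (BYOK) or PROTOSCAN_API_KEY",
  };
}
```

In Node this would be called as `resolveVisionMode(process.env)`; blank keys are treated as missing.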

🤖 Generated with Claude Code

…+vision

Adds the proprietary AI vision Tier-2: a vision model evaluates each screen and emits graded Metric[] for 6 subjective dimensions (clutter, saliency, feedback, consistency, affordance, guidance) that feed the 0-100 score and flip it to `core+vision`. The free deterministic Tier-1 score is unchanged.

- core: 4 new Tier-2 dimensions across the 5 catalog sites (types, dimensions x3, metric DIMENSION_ISSUE_META) + free-score invariant test (core-only score is independent of Tier-2 catalog size).
- vision: provider abstraction (providers/: VisionProvider + OpenAIVisionProvider + Claude stub + selectProvider). Holistic prompt returns {findings, dimensions} (temperature 0 + JSON mode). Per-dimension score 0-100 or null=na (model abstains when it cannot judge a dim from one static screenshot). Defensive parse (sanitizeVisionResponse) drops malformed items. analyze* return VisionResult {issues, metrics, coverage}. COST_PER_SCREEN 0.005 -> 0.009.
- cli: --vision implies --score, --provider flag, wires additionalMetrics.
- mcp: vision arg (Pro-gated like simulate) + additionalMetrics + force score.
- server proxy (apps/web/api/vision.ts): holistic prompt + {findings,dimensions} (additive: old clients read .findings; old servers degrade to metrics:[]).
- reporters: radar <3-dim fallback; "core deterministic vs vision AI-assessed (may vary)" note in terminal + HTML.

100 tests (93 core + 7 vision), tsc strict clean across core/vision/cli/mcp. EDET deterministic score byte-identical (70/100, 8 dims): no free-tier regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
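The CLI-side rules in the commit message reduce to two small pieces: --vision turns on --score, and the per-screen estimate rises from 0.005 to 0.009. A sketch, assuming a hypothetical `CliFlags` shape and hypothetical `normalizeFlags`/`estimateVisionCost` helpers, and assuming `COST_PER_SCREEN` is in US dollars per screen (the unit is not stated).

```typescript
const COST_PER_SCREEN = 0.009; // was 0.005 before this change; unit assumed to be USD

// Hypothetical flag shape for illustration.
interface CliFlags {
  score: boolean;
  vision: boolean;
  provider?: string;
}

// --vision implies --score: vision metrics only matter if a score is computed.
function normalizeFlags(flags: CliFlags): CliFlags {
  return flags.vision ? { ...flags, score: true } : flags;
}

// Rough run estimate, rounded to cents for display.
function estimateVisionCost(screens: number): number {
  return Math.round(screens * COST_PER_SCREEN * 100) / 100;
}
```

Returning a new object rather than mutating keeps the parsed flags untouched, so the implication stays visible in one place.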

oxxo wants to merge 1 commit into master from feat/tier2-ai-vision
