| title | UX-Ray AI QA Agent |
|---|---|
| emoji | ⚡ |
| colorFrom | purple |
| colorTo | indigo |
| sdk | docker |
| pinned | false |
UX-Ray is an autonomous AI agent designed for startups and indie builders. It simulates the behavior of 1,000 first-time users, navigating your web application to identify UX friction, surface behavior patterns, and generate developer-ready, actionable checklists before you launch.
- ✨ Core Features
- 🏗️ System Architecture
- 💻 Tech Stack
- ⚙️ Environment Variables Reference
- 🚀 Local Installation
- 🕹️ Usage Guide
- 📜 License
- 🧠 Senior QA Algorithmic Explorer: UX-Ray is prompted with strict algorithmic rules. It deduces visual hierarchy based on button sizes (width/height), explicitly tests edge cases, and maps out core workflows without getting stuck in loops.
- 👁️ Spatial & Semantic Awareness: The internal Playwright engine extracts rich DOM metadata, including `width`, `height`, `disabled` states, `aria-label`s, and `href`s. This gives the AI true "sight" of your app.
- ⏳ Smart Auto-Wait Engine: Built directly into the execution engine, this feature detects active spinners or loading text and waits for your app to finish loading before proceeding, mimicking human patience.
- 🛠️ Heuristic UI Evaluation: The AI acts as a Senior UX/UI Engineer, grading your app against Nielsen's 10 Usability Heuristics. It outputs exact, developer-ready CSS and layout fixes (e.g., "Increase the text contrast by changing the color from #555 to #333").
- 🎥 Live Session Replay: Watch the AI test your app in real time, complete with a timeline of events, network interactions, and AI "thoughts".
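The size-based hierarchy and auto-wait checks described above could look roughly like the following. This is a minimal sketch, not UX-Ray's actual implementation: the `ElementMeta` shape, the `rankByProminence` and `isPageLoading` helpers, and the loading keywords are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical shape of the metadata extracted for each interactable element.
interface ElementMeta {
  id: number;
  tag: string;
  text: string;
  width: number;
  height: number;
  disabled: boolean;
  ariaLabel?: string;
  href?: string;
}

// Rank enabled, visible elements by on-screen area (largest first)
// as a rough proxy for visual hierarchy.
function rankByProminence(elements: ElementMeta[]): ElementMeta[] {
  return elements
    .filter((el) => !el.disabled && el.width > 0 && el.height > 0)
    .sort((a, b) => b.width * b.height - a.width * a.height);
}

// Assumed heuristic: the page counts as "loading" if any element's text
// or aria-label mentions a loading keyword.
const LOADING_PATTERN = /\b(loading|please wait|spinner)\b/i;

function isPageLoading(elements: ElementMeta[]): boolean {
  return elements.some(
    (el) => LOADING_PATTERN.test(el.text) || LOADING_PATTERN.test(el.ariaLabel ?? "")
  );
}
```

In a setup like this, the ranked list would be one input to the model's prompt, and `isPageLoading` would gate each step so the agent waits instead of acting on a half-rendered page.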
UX-Ray operates on a continuous, multi-modal autonomous loop using Set-of-Mark Visual Prompting.
graph TD
A[Start Session] --> B[Playwright Launches Sandbox]
subgraph Autonomous Vision Agent Loop
B --> C[Extract DOM & Inject Bounding Boxes]
C --> D[Capture Annotated Screenshot]
D --> E{Is Page Loading?}
E -- Yes --> F[Smart Auto-Wait]
F --> C
E -- No --> G[Gemini 2.0 Multimodal Reasoning]
G --> H[Infer Next Optimal Target ID]
H --> I[Playwright Executor: Click/Type/Scroll]
I --> J[Save Annotated Screenshot to Timeline]
J --> C
end
J -. Session Complete .-> K[Gemini Multimodal UX Audit]
K --> L[Generate Actionable UI Checklist]
- Set-of-Mark Annotation: Before making a decision, the headless browser dynamically injects red numeric bounding boxes over every interactable element on the screen.
- Visual Autonomous Navigation: The agent streams the live, annotated screenshot directly to Gemini 2.0 Flash. The AI uses spatial reasoning to "look" at the screen, read the popups, and output the ID of the exact bounding box it wants to interact with.
- Actionable UX Audit: Once the session completes, the agent uses the captured visual timeline to write a highly technical, bias-free UI/UX report.
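The observe, wait, decide, and act cycle from the diagram can be sketched as a small loop. This is an illustrative sketch under stated assumptions, not the project's code: the `Browser` and `Model` interfaces are hypothetical stand-ins for the Playwright layer and the Gemini client, and `runSession`, `maxSteps`, `maxWaits`, and the 500 ms wait are invented for the example.

```typescript
// Hypothetical types standing in for the Playwright layer and the Gemini client.
interface Mark { id: number; label: string }
interface Observation { marks: Mark[]; screenshot: string; loading: boolean }
type Action =
  | { kind: "click"; targetId: number }
  | { kind: "type"; targetId: number; text: string }
  | { kind: "scroll" }
  | { kind: "done" };

interface Browser {
  observe(): Promise<Observation>; // extract DOM, inject numbered boxes, capture screenshot
  execute(action: Action): Promise<void>;
  wait(ms: number): Promise<void>;
}

interface Model {
  decide(obs: Observation, history: Action[]): Promise<Action>;
}

async function runSession(
  browser: Browser,
  model: Model,
  maxSteps = 20,
  maxWaits = 10
): Promise<Action[]> {
  const timeline: Action[] = [];
  let waits = 0;
  while (timeline.length < maxSteps) {
    const obs = await browser.observe();
    // Auto-wait: re-observe while the page is loading, up to maxWaits times in a row.
    if (obs.loading && waits < maxWaits) {
      waits++;
      await browser.wait(500);
      continue;
    }
    waits = 0;
    const action = await model.decide(obs, timeline);
    if (action.kind === "done") break;
    // Reject target IDs that do not correspond to a mark currently on screen.
    if (action.kind === "click" || action.kind === "type") {
      const targetId = action.targetId;
      if (!obs.marks.some((m) => m.id === targetId)) {
        throw new Error(`Model chose unknown mark ${targetId}`);
      }
    }
    await browser.execute(action);
    timeline.push(action);
  }
  return timeline;
}
```

Keeping the browser and model behind interfaces makes the loop testable with fakes, and the `maxSteps` and `maxWaits` caps are one simple way to keep the agent from looping forever.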
| Component | Technology | Description |
|---|---|---|
| Framework | Next.js 14 | App Router, API Routes for SSE streaming |
| Database | PostgreSQL | Managed via Prisma ORM for session/event storage |
| Automation | Playwright | Headless browser execution and visual DOM annotation |
| Reasoning | Gemini 2.0 Flash | Powers the deep visual navigation and decision logic |
| Reporting | Gemini Pro | Multimodal visual analysis for the final UX Audit report |
| Styling | Tailwind CSS | Custom, highly polished developer interface |
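The table mentions API routes that stream over SSE. As a rough illustration of what one streamed timeline event could look like on the wire, here is a minimal sketch; the `SessionEvent` shape and `formatSseEvent` helper are hypothetical, and only the `event:` / `data:` framing comes from the Server-Sent Events format itself.

```typescript
// Hypothetical event shape for one entry in a session timeline.
interface SessionEvent {
  type: "thought" | "action" | "network";
  step: number;
  message: string;
}

// Serialize one event in the text/event-stream wire format:
// an "event:" line, a "data:" line, then a blank line to end the message.
// JSON.stringify escapes newlines, so the payload always fits on one data line.
function formatSseEvent(event: SessionEvent): string {
  const payload = JSON.stringify(event);
  return `event: ${event.type}\ndata: ${payload}\n\n`;
}
```

In a Next.js route handler, strings like these would typically be encoded and written to a `ReadableStream` that is returned with a `Content-Type: text/event-stream` header.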
Create a .env file in the root directory.
| Variable | Required | Description |
|---|---|---|
| `DATABASE_URL` | Yes | Connection string for your PostgreSQL database (e.g., Supabase) |
| `DIRECT_URL` | Yes | Direct connection string for Prisma migrations |
| `GEMINI_API_KEY` | Yes | API key from Google AI Studio for visual reasoning and reporting |
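Since all three variables are required, it can help to fail fast at startup when one is missing. The following is a minimal sketch of such a check; the `missingEnv` and `assertEnv` helpers are hypothetical and not part of UX-Ray.

```typescript
// The three variables listed as required in the table above.
const REQUIRED_ENV = ["DATABASE_URL", "DIRECT_URL", "GEMINI_API_KEY"] as const;
type RequiredEnv = (typeof REQUIRED_ENV)[number];

// Return the names of required variables that are unset or blank.
function missingEnv(env: Record<string, string | undefined>): RequiredEnv[] {
  return REQUIRED_ENV.filter((name) => !env[name]?.trim());
}

// Throw a single descriptive error listing every missing variable.
function assertEnv(env: Record<string, string | undefined>): void {
  const missing = missingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}
```

Calling `assertEnv(process.env)` once during server startup would surface a misconfigured `.env` file immediately instead of at the first database or Gemini request.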
- Clone the repository: `git clone https://github.com/yourusername/ux-ray.git`, then `cd ux-ray`
- Install dependencies: `npm install`
- Initialize the database: `npx prisma generate`, then `npx prisma db push`
- Run the development server: `npm run dev`
Open http://localhost:3000 to access the dashboard.
- Enter your target URL in the main dashboard.
- Select your personalized Developer Testing Preset (e.g., End-to-End Journey, Aggressive QA Tester, or Conversion Flow).
- Watch the live Session Explorer as the AI isolates your site in a sandbox, reads the spatial layout, and systematically tests buttons, forms, and workflows.
- Click Share report to team to distribute the actionable UX Audit directly to your engineers.
This project is licensed under the MIT License - see the LICENSE file for details.