A collection of cutting-edge AI-powered tools: screenshot-to-code generation, MCP server infrastructure, UI animation showcases, and an Antigravity IDE landing page.
Vision is an ongoing AI tooling monorepo experimenting with multimodal LLM capabilities — from converting screenshots directly into production-ready UI code, to building MCP servers that give AI agents real tools, to creating showcase UIs for animation techniques and the Antigravity IDE ecosystem.
- Overview
- Architecture Diagram
- Monorepo Structure
- Modules
- Tech Stack
- Getting Started
- Roadmap
- Contributing
Vision is a monorepo housing multiple experimental and production-bound AI tools, all built around the same core theme: giving AI agents real superpowers — whether that's eyes (screenshot understanding), hands (MCP tool access), or a voice (landing pages and showcases).
The project is actively developed with an emphasis on:
- Multimodal AI — using vision-capable LLMs to understand and reproduce UI designs
- MCP (Model Context Protocol) — exposing tools to AI agents via the open standard pioneered by Anthropic
- Frontend craft — high-quality CSS animation techniques and TypeScript-first web development
- Antigravity IDE ecosystem — building tools and sites for Google's agentic coding IDE
The SVG architecture diagram is included in this repo at `vision-architecture.svg` and can be embedded in a README as a standard Markdown image link.
The diagram covers the following architectural layers:
- Monorepo Root — all 5 modules branching from the `Vision/` root
- screenshot-to-code — vision LLM → layout analysis → component generation pipeline
- mcp-brain — MCP server, tool registry, and AI host adapter layer
- animation-showcase — CSS engine + JS controller + browser render pipeline
- antigravity-website — TypeScript frontend → Docker → CDN hosting chain
- Shared Infrastructure — Node.js, Python, TypeScript, Docker, Vite, Git
- External AI Integrations — OpenAI, Anthropic, Gemini, Antigravity IDE, Browser APIs, MCP Protocol
Vision/
│
├── _write_test/                  # Write evaluation test harness
│   └── ...                       # Prompt / output eval suite
│
├── animation-showcase/           # CSS & JS animation techniques showcase
│   ├── index.html                # Demo entry point
│   ├── styles/                   # Keyframe, variable, and animation CSS
│   └── scripts/                  # Intersection Observer, GSAP controllers
│
├── antigravity-website/          # Antigravity IDE landing page
│   ├── src/                      # TypeScript component source
│   ├── public/                   # Static assets
│   ├── Dockerfile                # Container build for deployment
│   └── package.json              # Frontend dependencies
│
├── mcp-brain/                    # MCP server infrastructure
│   ├── server.py                 # MCP server entrypoint (Python)
│   ├── tools/                    # Tool definitions (web, fs, exec, api)
│   ├── transport/                # stdio / HTTP / SSE transport adapters
│   └── schema/                   # JSON schema tool specs
│
├── screenshot-to-code/           # Screenshot → UI code pipeline
│   ├── backend/                  # Python FastAPI + vision LLM integration
│   │   ├── main.py               # API entrypoint
│   │   ├── vision.py             # LLM image analysis module
│   │   ├── generator.py          # HTML/CSS/React code generation
│   │   └── requirements.txt      # Python dependencies
│   └── frontend/                 # TypeScript drag-drop UI
│       ├── src/                  # React components
│       └── package.json          # Frontend dependencies
│
└── package-lock.json             # Root npm lockfile (lockfileVersion 3)
Convert any screenshot, mockup, or wireframe directly into clean HTML, CSS, or React code using multimodal LLMs.
How it works:
- User uploads an image (PNG/JPG/WebP) via drag-and-drop frontend
- Image sent to FastAPI backend as `multipart/form-data`
- Vision LLM (GPT-4o or Claude) analyzes layout, colors, typography, and spacing
- Code generator emits structured HTML/CSS or JSX output
- Output displayed in live preview panel with copy/download option
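The upload step above could be guarded by a small format check before the image reaches the vision model. The following is a minimal sketch using only the standard library; the function names and the 10 MB limit are assumptions for illustration, not the project's actual backend code.

```python
from typing import Optional

# Leading "magic" bytes for two of the three upload formats named above.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return a MIME type for PNG/JPEG/WebP bytes, or None if unrecognized."""
    if data.startswith(PNG_MAGIC):
        return "image/png"
    if data.startswith(JPEG_MAGIC):
        return "image/jpeg"
    # WebP is a RIFF container: "RIFF", a 4-byte size field, then "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def validate_upload(data: bytes, max_bytes: int = 10 * 1024 * 1024) -> str:
    """Reject empty, oversized, or unsupported uploads (the limit is hypothetical)."""
    if not data:
        raise ValueError("empty upload")
    if len(data) > max_bytes:
        raise ValueError("upload exceeds size limit")
    mime = sniff_image_type(data)
    if mime is None:
        raise ValueError("unsupported image format")
    return mime
```

Checking the leading bytes rather than the file extension means a renamed file is still classified by its real content.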
Pipeline stages:
Image Input → Vision LLM → Layout Analysis → Style Extraction → Component Generation → Code Output
Stack: Python · FastAPI · OpenAI Vision API / Anthropic Claude · TypeScript · React · Tailwind CSS
Status: 🚧 Active development
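The pipeline stages listed above can be pictured as a chain of functions that each enrich a shared state object. This is only a structural sketch: `PipelineState`, the stage functions, and their hard-coded outputs are hypothetical placeholders, and a real layout stage would call the vision LLM instead of returning fixed data.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class PipelineState:
    """Data carried from Image Input through to Code Output."""
    image_bytes: bytes
    mime_type: str
    layout: Dict[str, Any] = field(default_factory=dict)
    styles: Dict[str, str] = field(default_factory=dict)
    code: str = ""


Stage = Callable[[PipelineState], PipelineState]


def analyze_layout(state: PipelineState) -> PipelineState:
    # Placeholder result; a real stage would send the image to a vision LLM.
    state.layout = {"root": "div", "children": ["header", "main"]}
    return state


def extract_styles(state: PipelineState) -> PipelineState:
    # Placeholder result standing in for extracted colors and typography.
    state.styles = {"font-family": "sans-serif"}
    return state


def generate_code(state: PipelineState) -> PipelineState:
    # Render the layout tree and styles as a single HTML element.
    root = state.layout["root"]
    css = "; ".join(f"{key}: {value}" for key, value in state.styles.items())
    children = "".join(f"<{tag}></{tag}>" for tag in state.layout["children"])
    state.code = f'<{root} style="{css}">{children}</{root}>'
    return state


STAGES: List[Stage] = [analyze_layout, extract_styles, generate_code]


def run_pipeline(state: PipelineState, stages: List[Stage]) -> PipelineState:
    """Apply each stage in order, passing the state along."""
    for stage in stages:
        state = stage(state)
    return state
```

Keeping every stage behind the same signature makes it straightforward to swap one out or to test a stage in isolation.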
A configurable MCP (Model Context Protocol) server that exposes tools to AI agents like Antigravity, Claude Code, and Cursor.
What is MCP? The Model Context Protocol is an open standard that lets AI agents call real tools — file system, web, APIs, databases — through a structured JSON schema interface. mcp-brain is a custom MCP server implementing this protocol.
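To make the "structured JSON schema interface" concrete: MCP messages are JSON-RPC 2.0, and tools are discovered with `tools/list` and invoked with `tools/call`. The sketch below handles just those two methods for a single hypothetical `echo` tool. It omits the initialization handshake and is not the mcp-brain implementation.

```python
import json
from typing import Any, Dict, Optional


def echo(arguments: Dict[str, Any]) -> str:
    """Hypothetical demo tool: return the supplied text unchanged."""
    return str(arguments["text"])


# Hypothetical tool table: each entry carries a JSON Schema for its arguments.
TOOLS: Dict[str, Dict[str, Any]] = {
    "echo": {
        "description": "Return the given text unchanged.",
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
        "handler": echo,
    }
}


def _error(request_id: Optional[int], code: int, message: str) -> str:
    body = {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}
    return json.dumps(body)


def handle_request(raw: str) -> str:
    """Dispatch one JSON-RPC request string and return the response string."""
    request = json.loads(raw)
    request_id = request.get("id")
    method = request.get("method")
    params = request.get("params") or {}

    if method == "tools/list":
        # Advertise tools without leaking the Python handler objects.
        result: Dict[str, Any] = {"tools": [
            {"name": name, "description": spec["description"],
             "inputSchema": spec["inputSchema"]}
            for name, spec in TOOLS.items()
        ]}
    elif method == "tools/call":
        spec = TOOLS.get(params.get("name"))
        if spec is None:
            return _error(request_id, -32602, "Unknown tool")
        text = spec["handler"](params.get("arguments") or {})
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return _error(request_id, -32601, "Method not found")

    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})
```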
Exposed tools (planned/in-progress):
| Tool | Description | Transport |
|---|---|---|
| `web_search` | Fetch and scrape live web data | HTTP |
| `file_read` / `file_write` | Read and write project files | stdio |
| `code_exec` | Run shell commands and scripts | stdio |
| `api_bridge` | Connect to external services | HTTP |
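One way tools like those in the table could be registered is with a decorator that derives the JSON Schema from a function's type hints. The sketch below is an assumption about how such a registry might look: the `tool` decorator, `REGISTRY`, and the `max_chars` parameter are illustrative names, and only a few primitive types are mapped.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Mapping from Python annotations to JSON Schema type names (primitives only).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

# Hypothetical in-memory registry keyed by tool name.
REGISTRY: Dict[str, Dict[str, Any]] = {}


def tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register func and build its input schema from the signature."""
    hints = get_type_hints(func)
    properties: Dict[str, Dict[str, str]] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unmapped or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    REGISTRY[func.__name__] = {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {"type": "object", "properties": properties,
                        "required": required},
        "handler": func,
    }
    return func


@tool
def file_read(path: str, max_chars: int = 65536) -> str:
    """Read and return a project file as text."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read(max_chars)
```

Parameters with defaults are left out of `required`, so `file_read` advertises `path` as mandatory and `max_chars` as optional.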
Transport modes: stdio (local agents) · HTTP + SSE (remote agents)
Compatible with: Antigravity IDE · Claude Code · Cursor · Gemini CLI · Any MCP-compliant host
Stack: Python · TypeScript · MCP Protocol · JSON Schema · FastAPI
Status: 🚧 Core protocol scaffolding underway
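The two transport modes named above mostly differ in how a JSON-RPC message is framed on the wire. As a rough sketch (helper names are hypothetical): stdio carries one JSON message per line, while Server-Sent Events wrap the same payload in `event:` and `data:` fields terminated by a blank line.

```python
import json
from typing import Any, Dict, Iterator, TextIO


def encode_stdio(message: Dict[str, Any]) -> str:
    """Frame a message for stdio: compact JSON on a single line."""
    # json.dumps escapes newlines inside strings, so the output stays on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


def read_stdio(stream: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield one decoded message per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def encode_sse(message: Dict[str, Any], event: str = "message") -> str:
    """Frame a message as a Server-Sent Event."""
    payload = json.dumps(message, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"
```

Because both encoders take the same dictionary, the tool-handling logic can stay independent of which transport delivered the request.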
A curated showcase of advanced CSS and JavaScript animation techniques for reference and inspiration.
Techniques demonstrated:
| Category | Techniques |
|---|---|
| CSS-native | Keyframe animations, custom properties, clip-path morphing, 3D transforms |
| Scroll-driven | Intersection Observer triggers, scroll-linked progress |
| Interaction | Hover states, focus transitions, micro-interactions |
| Advanced | Particle systems, SVG morphing, Lottie JSON playback, Canvas API |
| Planned | WebGL shaders, GSAP timeline demos, physics-based spring animations |
Stack: HTML5 · CSS3 · Vanilla JS · GSAP · Canvas API · Lottie
Status: 🚧 Adding new animations progressively
A landing page for the Antigravity AI coding IDE ecosystem — Google's agentic Gemini-powered development environment.
Page sections:
| Section | Status |
|---|---|
| Hero / headline | 🚧 In progress |
| Feature highlights | 🚧 In progress |
| Live demo embed | 🔮 Planned |
| Pricing tiers | 🔮 Planned |
| CTA / Sign up | 🔮 Planned |
| Blog / Docs links | 🔮 Planned |
| Footer / Legal | 🔮 Planned |
Deployment: Containerized via Dockerfile for any Docker-compatible host; the static Vite build output can also be deployed to Vercel or Netlify.
Stack: TypeScript · Vite · React · Tailwind CSS · Docker
Status: 🚧 Early frontend scaffolding
Internal test harness for evaluating AI writing and code generation output quality.
Used to benchmark prompt variations, measure output consistency, and validate that LLM responses meet quality bars for other modules in the repo.
Status: 🔮 Experimental / internal tooling
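"Measuring output consistency" could be approximated by comparing repeated outputs of the same prompt against each other. The sketch below uses the standard library's `difflib` as a stand-in similarity metric; the function names are hypothetical and the harness itself may use a different measure.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence


def consistency_score(outputs: Sequence[str]) -> float:
    """Mean pairwise similarity in [0, 1]; 1.0 means every output is identical."""
    if len(outputs) < 2:
        return 1.0
    return mean(
        SequenceMatcher(None, first, second).ratio()
        for first, second in combinations(outputs, 2)
    )


def rank_prompts(results: Dict[str, Sequence[str]]) -> List[str]:
    """Order prompt variants from most to least consistent."""
    return sorted(results, key=lambda name: consistency_score(results[name]),
                  reverse=True)
```

A character-level ratio is crude for code or long prose, but it is cheap and gives a first signal for which prompt variant produces the most stable output.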
Backend:
| Technology | Version | Module | Purpose |
|---|---|---|---|
| Python | 3.10+ | screenshot-to-code, mcp-brain | Core runtime |
| FastAPI | 0.100+ | screenshot-to-code | REST API framework |
| Uvicorn | Latest | screenshot-to-code | ASGI server |
| OpenAI SDK | Latest | screenshot-to-code | GPT-4o Vision API |
| Anthropic SDK | Latest | mcp-brain | Claude API |
| Pillow | 10.x | screenshot-to-code | Image I/O |
| OpenCV | 4.x | screenshot-to-code | Image preprocessing |
| python-dotenv | Latest | All Python | Env config |
Frontend:
| Technology | Version | Module | Purpose |
|---|---|---|---|
| TypeScript | 5.x | antigravity-website, screenshot-to-code | Type-safe JS |
| React | 18+ | screenshot-to-code | UI framework |
| Vite | 5.x | antigravity-website | Build tool + HMR |
| Tailwind CSS | 3.x | All frontend | Utility CSS |
| GSAP | 3.x | animation-showcase | Animation library |
| HTML5 / CSS3 | — | animation-showcase | Native animations |
Infrastructure:
| Technology | Purpose |
|---|---|
| Docker | Container runtime for antigravity-website |
| Node.js 18+ | JS runtime, npm workspace root |
| npm (lockfileVersion 3) | Package management |
| Git / GitHub | Version control, public repo |
Protocols:
| Protocol | Used In | Purpose |
|---|---|---|
| MCP (stdio) | mcp-brain | Local agent tool communication |
| MCP (HTTP/SSE) | mcp-brain | Remote agent streaming transport |
| REST / JSON | screenshot-to-code | Client-server API |
| JSON Schema | mcp-brain | Tool definition specs |
Prerequisites:
| Requirement | Version | Check |
|---|---|---|
| Python | 3.10+ | python --version |
| Node.js | 18+ | node --version |
| Docker | Latest | docker --version |
| npm | 9+ | npm --version |
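The checks in the table can also be scripted. The following is a small sketch with a hypothetical helper name; it verifies the running Python version and only confirms that `node`, `npm`, and `docker` are on the `PATH`, without comparing their versions.

```python
import shutil
import sys
from typing import Dict

# Command-line tools from the requirements table, looked up on PATH.
REQUIRED_TOOLS = ("node", "npm", "docker")


def check_prerequisites() -> Dict[str, bool]:
    """Report which prerequisites appear to be satisfied."""
    status = {"python>=3.10": sys.version_info >= (3, 10)}
    for name in REQUIRED_TOOLS:
        # Presence check only; run the commands in the table for exact versions.
        status[name] = shutil.which(name) is not None
    return status
```

Calling `check_prerequisites()` returns a dictionary such as `{"python>=3.10": True, "node": True, ...}`, with values depending on the local machine.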
git clone https://github.com/Aka-Nine/Vision.git
cd Vision

# screenshot-to-code backend
cd screenshot-to-code/backend
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Create .env
echo "OPENAI_API_KEY=your-key-here" > .env
uvicorn main:app --reload
Frontend:
cd ../frontend
npm install && npm run dev

# mcp-brain (from the repository root)
cd mcp-brain
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Run MCP server (stdio mode for local agents)
python server.py --transport stdio
To connect with Antigravity IDE, add to your mcp.json:
{
"mcpServers": {
"vision-brain": {
"command": "python",
"args": ["path/to/Vision/mcp-brain/server.py"]
}
}
}

# antigravity-website (from the repository root)
cd antigravity-website
npm install
npm run dev
Or with Docker:
docker build -t antigravity-site .
docker run -p 3000:3000 antigravity-site

No build step needed — open `animation-showcase/index.html` directly in a browser, or serve it:
cd animation-showcase
npx serve .

- Component-level code generation (React, Vue, Svelte)
- Multi-frame / multi-page detection
- Figma export integration
- Dark/light mode variant generation
- CSS variable extraction pipeline
- Complete stdio transport implementation
- HTTP + SSE streaming transport
- Web search tool (Playwright-based)
- File system tool (sandboxed read/write)
- Shell execution tool (scoped)
- Tool authentication & rate limiting
- Tool schema auto-generation from Python functions
- WebGL particle system demo
- GSAP timeline showcase
- Spring physics animations
- Scroll-linked progress bars
- Searchable / filterable demo index
- Complete all page sections
- Responsive mobile layout
- CMS integration for blog
- SEO metadata
- Lighthouse 90+ score
- GitHub Actions CI/CD pipeline
- Shared TypeScript types package
- ESLint + Prettier config
- Jest / Vitest test coverage
- Environment variable management (dotenv + secrets)
- Monorepo tooling (Turborepo or nx)
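For the roadmap item on a sandboxed file system tool, one common approach is to resolve every requested path and refuse anything that lands outside a fixed project root. The sketch below illustrates that idea only; `SandboxError`, `resolve_in_sandbox`, and `sandboxed_read` are hypothetical names, not existing mcp-brain code.

```python
from pathlib import Path


class SandboxError(PermissionError):
    """Raised when a requested path escapes the sandbox root."""


def resolve_in_sandbox(root: Path, user_path: str) -> Path:
    """Resolve user_path under root, rejecting traversal and absolute paths."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise SandboxError(f"path escapes sandbox: {user_path}")
    return candidate


def sandboxed_read(root: Path, user_path: str, max_chars: int = 65536) -> str:
    """Read at most max_chars characters from a file inside the sandbox."""
    path = resolve_in_sandbox(root, user_path)
    with path.open("r", encoding="utf-8") as handle:
        return handle.read(max_chars)
```

Resolving before comparing matters: a plain string prefix check would accept `project/../secrets.txt`, whereas the resolved path is clearly outside the root.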
This is an active personal project — contributions, issues, and ideas welcome.
- Fork the repository
- Create a branch: `git checkout -b feature/your-feature`
- Commit: `git commit -m "feat: describe change"`
- Push: `git push origin feature/your-feature`
- Open a Pull Request
MIT License — see LICENSE for details.
Built with curiosity, caffeine, and multimodal LLMs ☕
Starring: Python · TypeScript · MCP · OpenAI · Anthropic · Gemini · React · Docker