Natural language to robot trajectory. Type a task, watch a Franka Panda execute it.
Runs entirely in the browser (MuJoCo WASM + Three.js). Only the LLM call goes to a backend.
Put.All.Cubes.on.the.Plate.mp4
cd frontend
cp .env.example .env
npm install
npx viteOpen http://localhost:5173. That's it. The default config points at a hosted backend that proxies to Groq's free tier, so no API key needed.
The hosted demo uses my own endpoint (qwen/qwen3-32b) that is rate-limited to 20-60 requests per minute. Please be gentle with it.
Edit frontend/.env:
# Any OpenAI-compatible endpoint works
VITE_API_BASE_URL=http://localhost:11434 # Ollama
VITE_MODEL=llama3
# Or direct to a provider (CORS must be enabled)
VITE_API_BASE_URL=https://api.groq.com/openai
VITE_MODEL=qwen/qwen3-32bIf you're calling a provider directly (not through a proxy), you'll need to add an API key header. See frontend/src/core/llmClient.ts.
The model must support tool calling (function calling) via the OpenAI-compatible API. Models known to work:
qwen/qwen3-32b(Groq) - recommended, supports reasoningmeta-llama/llama-4-scout-17b-16e-instruct(Groq)gpt-4o(OpenAI)- Any Ollama model with tool support
Note on reasoning models: Models like Qwen3 and DeepSeek-R1 can spend all their output tokens on thinking and return no tool calls. The system prompt includes "Keep your reasoning brief" to mitigate this, but very long/complex tasks may still fail. If you get empty plans, try a non-reasoning model or simplify the task.
frontend/ TypeScript + React + Vite
src/
core/ Pure logic (scene, primitives, resolver, LLM client)
sim/ MuJoCo WASM + Three.js (loader, visualizer, IK, animator)
components/ React UI
public/mujoco/ Franka Panda model + scene XML
vendor/mujoco/ MuJoCo WASM runtime
src/ Python CLI (original prototype, not used by web demo)
- Scene extraction reads object positions from the MuJoCo simulation
- LLM receives a text scene description + tool definitions, returns tool calls like
pick("Red Cube"),place("Plate") - Resolver expands tool calls into waypoints (name to coordinates, higher-order primitives to atomic steps)
- Animator drives the robot arm to each waypoint using IK
Add one function in frontend/src/core/primitives.ts:
const push: PrimitiveDef = {
name: "push",
description: "Push an object sideways.",
parameters: {
type: "object",
properties: {
object_name: { type: "string" },
direction: { type: "string", description: "left or right" },
},
required: ["object_name", "direction"],
},
handler: (scene, params): Waypoint[] => {
// ... return waypoints
},
};Add it to the LLM_PRIMITIVES array and the LLM can use it immediately.
Only needed if you want to host your own proxy (to hide API keys or add rate limiting).
cd backend/ # or wherever your proxy lives
pip install -r requirements.txt
GROQ_API_KEY=your_key python app.pyThe proxy forwards /v1/chat/completions to Groq with the API key injected. The frontend's VITE_API_BASE_URL should point at it.
cd frontend
npx vitest runThe system was designed and implemented by myself back in 2023. This demo is a re-implementation of the original concept. AI was used to speed up the process.