Camera/audio/cmd web bridges + Mac+LM-Studio agentic compose by tfius · Pull Request #2277 · dimensionalOS/dimos

tfius · 2026-05-28T04:39:53Z

Summary

Adds three web bridges and a local-LLM hackathon quickstart, exposing the
robot's I/O over plain HTTP/WS so external processes (e.g. an MLX VLM running
elsewhere on the same Mac) can drive perception + control without speaking
any internal dimos APIs.

New modules (Mac- and sim-friendly, no CUDA)

camera-mjpeg-module — port 7780
- GET /video_feed/color_image MJPEG (multipart/x-mixed-replace)
- GET /snapshot/color_image single JPEG
- CORS open
audio-ws-module — port 7781
- ws://…/audio_out binary int16 PCM frames (with a JSON {event:"format",…} hello)
- GET /audio_info reports rate/channels
- POST /play accepts WAV bytes → robot speaker (megaphone path)
cmd-bridge-module — port 7782
- POST /cmd_vel, POST /path (raw Twist or semantic forward/left/degrees),
  POST /stop, GET /pose
- Open-loop; designed for a VLM that re-plans each iteration
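
A client of audio-ws-module has to tell the JSON format hello apart from the binary PCM frames. The sketch below shows one way to do that once a message has been received. It assumes the hello arrives as a text message, that it carries `sample_rate` and `channels` keys (the names reported by /audio_info), and that samples are little-endian; none of these is confirmed by the PR text, and `handle_ws_message` is a hypothetical helper.

```python
import json
import struct


def handle_ws_message(message, state):
    """Process one message from ws://.../audio_out.

    Text messages are treated as the JSON format hello and update `state`.
    Binary messages are treated as int16 PCM and returned as a list of ints.
    The hello's key names and the little-endian layout are assumptions.
    """
    if isinstance(message, str):
        hello = json.loads(message)
        if hello.get("event") == "format":
            state["sample_rate"] = hello.get("sample_rate")
            state["channels"] = hello.get("channels")
        return None

    # Two bytes per int16 sample; ignore a trailing odd byte if present.
    count = len(message) // 2
    return list(struct.unpack(f"<{count}h", message[: count * 2]))
```

Any WebSocket client library can feed its received messages into this function; the decoding itself needs only the standard library.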

Go2 audio I/O (real robot only — sim has no audio)

UnitreeWebRTCConnection.audio_stream() → AudioMessage(data, sample_rate, channels),
hooks the existing WebRTC audio transceiver
UnitreeWebRTCConnection.play_wav_bytes(wav) — uploads via audiohub.upload_megaphone,
enters megaphone, sleeps for the WAV's duration, then exits cleanly
New skills on GO2Connection: play_wav(path), play_wav_b64(base64)
New streams: audio: Out[AudioMessage], audio_in: In[bytes]
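
play_wav_bytes holds megaphone mode for the WAV's duration, so it needs the clip length from the raw bytes. A minimal sketch of that calculation with the standard `wave` module follows; `wav_duration_seconds` and `make_silent_wav` are hypothetical helpers for illustration, not functions from the PR.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the playback length of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV, handy for testing without a robot."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = int16
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```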

Launcher scripts

go2-start.sh — hackathon quickstart: wifi/ping/NTP checks, env-based LLM
presets (LMSTUDIO=1, MLXVLM=1), auto-injects mcp-server + mcp-client
when an LLM preset is set against a non-agentic blueprint
sim-with-llm.sh — thin wrapper that sets SIMULATION=1 and forwards
Both probe /v1/models before launch and surface a useful error if the
backend isn't up
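
The launcher scripts do the /v1/models probe in shell. The same check can be sketched in Python for clients that want to verify the backend before talking to it. `backend_is_up` is a hypothetical helper, and the only assumption about the server is that an OpenAI-compatible backend answers GET /v1/models with a 2xx status when it is running.

```python
import urllib.error
import urllib.request


def backend_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/models answers with a 2xx status.

    `base_url` is expected to end in /v1, e.g. http://127.0.0.1:1234/v1.
    Connection errors, timeouts, and HTTP error statuses all count as down.
    """
    url = base_url.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False
```

For example, `backend_is_up("http://127.0.0.1:1234/v1")` would check the LM Studio preset and `backend_is_up("http://127.0.0.1:8080/v1")` the mlxvlm preset.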

Docs

journal/2026-05-27-camera-audio-bridges.md — implementation notes,
review fixes, Mac+local-LLM agentic landscape
journal/2026-05-27-mlxvlm-robot-integration-prompt.md — drop-in brief
for an external Claude/agent that consumes these endpoints

Includes (from earlier feat/gemini-go2-2245 work by @grmkris, not by me)

feat(go2): all-Gemini VL for the Mac-only agentic blueprint
feat(take_picture), feat(follow_person), feat(go2): tilt_body,
save/load map, room_scan, etc.

Test plan

./sim-with-llm.sh lmstudio boots; agent reaches the endpoint
./sim-with-llm.sh mlxvlm reaches mlxvlm at :8080
Open http://127.0.0.1:7780/video_feed/color_image — MJPEG renders
curl -X POST -d '{"steps":[{"forward":0.5,"duration":2}]}' http://127.0.0.1:7782/path
drives the sim Go2 forward
dimos mcp list-tools shows observe, play_wav, play_wav_b64
On real Go2: WebSocket audio_out streams PCM; POST /play plays
through the dog's speaker and exits megaphone cleanly
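
The curl call in the test plan can also be issued from Python, which is closer to how a VLM loop would use the bridge. The sketch below builds the same POST /path request with the standard library; `build_path_request` and `send` are hypothetical helpers, and the step fields are copied from the curl example rather than from a published schema.

```python
import json
import urllib.request


def build_path_request(base_url, steps):
    """Build a POST /path request carrying {"steps": [...]} as JSON."""
    body = json.dumps({"steps": steps}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/path",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send a prepared request and return (status, response text)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


# Same payload as the curl example: drive forward at 0.5 for 2 seconds.
req = build_path_request("http://127.0.0.1:7782", [{"forward": 0.5, "duration": 2}])
```

Calling `send(req)` while the sim and cmd-bridge-module are running should have the same effect as the curl command.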

🤖 Generated with Claude Code

…d capture-viewer tool Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the custom Go2FullRecorder with the stock go2-memory recorder for the capture demo (camera + lidar + odom is enough for the frames+trajectory viewer). Point the capture-viewer at recording_go2.db and ignore recording sidecars + the MuJoCo runtime log. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New TakePictureSkill subscribes to color_image, caches the latest frame, and on take_picture() JPEG-encodes it and POSTs to robomoo's /api/robot/frame with a shared bearer token (ROBOMOO_URL / ROBOT_INGEST_TOKEN from env). Wired into unitree_go2_agentic_gemini and registered as take-picture-skill so the agent can call it ("take a picture"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

take_picture now attaches the robot's odom pose (poseX/poseY) + label so the web can pin captures on the map. New MapUploader subscribes global_costmap, renders it with turbo_image, and POSTs the PNG + grid metadata to robomoo /api/robot/map (throttled). Both wired into unitree_go2_agentic_gemini. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Consolidate the uncommitted Gemini layer onto the branch:
- Gemini VL model (dimos/models/vl/gemini.py) wired into create()/types
- --detection-model CLI knob + prefetch
- Navigation/PersonFollow vl_model_name wiring
- explore_and_capture skill gated on FrontierExplorerSpec
- Gemini image/text embeddings raise instead of returning random vectors
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ueprint The dog has no onboard brain — all compute runs on the Mac. Moondream is unusable there (~6 min/inference + Metal crash, which aborted the blueprint at startup and looked like a connection failure). Route every VL path to Gemini: PersonFollow vl_model_name moondream->gemini, plus .global_config(detection_model=gemini) for look_out_for / PerceiveLoop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

GeminiSpeakSkill called GeminiTTSNode.consume_text() on every speak(), which spawned a fresh worker thread + subscription each call (the repeated 'Starting GeminiTTSNode' log) and leaked the old ones. Wire the TTS pipeline ONCE in start() against a long-lived Subject; each speak() just pushes onto it and the node drains FIFO. Default speak() to blocking=False so the agent isn't stalled on synthesis; blocking=True still waits, matching on the emitted utterance so a concurrent non-blocking speak can't trip the wait. Make consume_text idempotent as defense-in-depth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

take_picture blocked the agent on a synchronous httpx.post (up to 30s) before returning. Snapshot the current frame+pose synchronously, then dispatch the JPEG encode + POST to a daemon thread and return at once. Refactor the upload into _upload_frame(frame, pose, ...) shared by both the skill and the explore capture loop (via a thin _upload_current wrapper); track outstanding upload threads and join them in stop(). Trade-off: the skill no longer returns the storage key or surfaces HTTP errors to the agent — upload failures are logged only. That is the intended cost of returning fast. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
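
The pattern this commit describes (snapshot synchronously, upload on a daemon thread, join outstanding threads in stop()) can be sketched as follows. `BackgroundUploader` is a hypothetical class for illustration, and `upload_fn` stands in for the real encode-and-POST step; this is not the code from take_picture_skill.py.

```python
import threading


class BackgroundUploader:
    """Run uploads on daemon threads and join them on stop()."""

    def __init__(self, upload_fn):
        self._upload_fn = upload_fn  # callable(frame, pose); may raise
        self._threads = []
        self._lock = threading.Lock()

    def submit(self, frame, pose):
        """Return at once; the upload runs in the background."""
        thread = threading.Thread(target=self._run, args=(frame, pose), daemon=True)
        with self._lock:
            # Forget finished threads so the list does not grow without bound.
            self._threads = [t for t in self._threads if t.is_alive()]
            thread.start()
            self._threads.append(thread)

    def _run(self, frame, pose):
        try:
            self._upload_fn(frame, pose)
        except Exception as exc:
            # Failures are logged only; the caller has already returned.
            print(f"upload failed: {exc}")

    def stop(self, timeout=5.0):
        """Wait for outstanding uploads, up to `timeout` seconds each."""
        with self._lock:
            threads = list(self._threads)
        for thread in threads:
            thread.join(timeout)
```

As in the commit's trade-off, the caller of `submit` never sees the upload result.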

The Go2 camera is body-fixed; the only way to look up/down is to pitch the body with the Euler sport command (api_id 1007), which was not exposed and needs a parameter payload execute_sport_command can't send. Add a tilt_body(pitch_deg, roll_deg, yaw_deg) skill that publishes Euler to SPORT_MOD with the {"parameter": {"data": {x,y,z}}} payload (same publish_request path execute_sport_command uses), converting degrees->radians and clamping to the safe standing envelope (±0.75/±0.75/±0.6 rad). Negative pitch looks up. Note: pitch sign and the {data:{x,y,z}} vs flat {x,y,z} payload form should be confirmed on-robot; the clamp is the safety net. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
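
The degrees-to-radians conversion and clamp can be sketched as below. The limits are the ones quoted in the commit message; mapping them to pitch, roll, and yaw in that order is an assumption, and `tilt_to_radians` is a hypothetical helper. The sketch deliberately stops before building the `{x,y,z}` payload, since the commit itself marks the axis mapping as unconfirmed.

```python
import math

# Safe standing envelope in radians, as quoted in the commit message.
# Assumption: the three values apply to pitch, roll, yaw in that order.
LIMITS_RAD = {"pitch": 0.75, "roll": 0.75, "yaw": 0.6}


def clamp(value, limit):
    """Clamp value into [-limit, +limit]."""
    return max(-limit, min(limit, value))


def tilt_to_radians(pitch_deg=0.0, roll_deg=0.0, yaw_deg=0.0):
    """Convert requested angles to radians and clamp each axis."""
    requested = {"pitch": pitch_deg, "roll": roll_deg, "yaw": yaw_deg}
    return {
        axis: clamp(math.radians(deg), LIMITS_RAD[axis])
        for axis, deg in requested.items()
    }
```

A request of -20 degrees pitch passes through unchanged (about -0.349 rad), while 90 degrees is clamped to the 0.75 rad limit.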

On a Mac (no CUDA) follow_person falls back to 'redetect', which ran a full synchronous Gemini detection every control cycle — capping the loop at ~0.5-1 Hz so the robot acted on stale velocity commands (the laggy feel). Decouple the two: a background thread re-detects every _redetect_period (0.8s) to anchor a cheap local OpenCV tracker, while the control loop runs at _frequency (20 Hz), updating the tracker locally and publishing a fresh twist each cycle. _create_tracker() picks the best available tracker (CSRT > KCF > MIL, across the main and legacy cv2 namespaces) so it auto-upgrades to CSRT where opencv-contrib exists and falls back to MIL (base OpenCV) here. Lost handling keeps the existing _lost_timeout semantics. The EdgeTAM (CUDA) path and auto mode-resolution are untouched, so GPU machines are unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@Skill

…el in one call Tilting then capturing as separate agent calls is racy: tilt_body returns before the body settles and take_picture snapshots instantly, so the photo can catch a mid-tilt view. Add a deterministic tilt_and_capture(pitch_deg=-20, note, settle_s=1.0) that runs the whole sequence in the background — tilt, wait to settle, snapshot + upload the tilted view, then re-level — and returns at once. Reuses the existing ModuleRef-spec pattern: a new TiltSpec resolves structurally to UnitreeSkillContainer.tilt_body (which @Skill exposes over rpc), so the capture module can aim the body-fixed camera without owning the WebRTC connection. Negative pitch looks up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

map_uploader now ships a value-preserving grayscale occupancy PNG (free=0, occupied=1..100, unknown=255) instead of a baked turbo image, so the web can recolor + overlay client-side. Add scripts/export_recording.py: reads a memory2 SqliteStore .db and pushes a top-down lidar occupancy map, downsampled odom trajectory, and CLIP-embedded keyframes (+thumbnails+pose) to robomoo's /api/robot/*. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
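
The value-preserving encoding (free=0, occupied=1..100, unknown=255) amounts to a per-cell mapping into one byte. The sketch below assumes the costmap marks unknown cells with a negative value, as ROS occupancy grids do with -1; the PR text does not say how unknown is represented internally, and both function names are hypothetical.

```python
def occupancy_to_gray(cell: int) -> int:
    """Map one occupancy cell to a grayscale byte.

    Assumption: negative values mean "unknown". Known cells keep their
    0..100 occupancy value so a client can recolor them later.
    """
    if cell < 0:
        return 255
    return min(cell, 100)


def grid_to_gray_rows(grid):
    """Convert a 2D grid (list of rows) into one bytes object per row."""
    return [bytes(occupancy_to_gray(cell) for cell in row) for row in grid]
```

The resulting rows can be handed to any PNG encoder as 8-bit grayscale data.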

Wifi/ROBOT_IP sanity check, NTP sync, venv bootstrap, and `exec dimos run <blueprint>` (default `unitree-go2-basic`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CameraMjpegModule (`camera-mjpeg-module`) republishes the `color_image` stream as an HTTP MJPEG feed and a single-JPEG snapshot, with CORS open: GET /video_feed/color_image multipart/x-mixed-replace GET /snapshot/color_image image/jpeg Default port 7780. Compose with any blueprint that publishes `color_image` (sim or real Go2): dimos --simulation run unitree-go2-basic camera-mjpeg-module Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds full bidirectional audio support for the real Go2:
- UnitreeWebRTCConnection.audio_stream() emits AudioMessage (int16 PCM + sample_rate + channels) by hooking the existing WebRTC audio transceiver and activating it via switchAudioChannel. Float frames are scaled before int16 cast.
- UnitreeWebRTCConnection.play_wav_bytes(wav) uploads + plays a WAV through the audiohub megaphone, then exits megaphone mode after the clip's duration so subsequent commands work normally.
- GO2Connection exposes audio: Out[AudioMessage] and audio_in: In[bytes], plus two skills:
  play_wav(wav_path): local-filesystem WAV
  play_wav_b64(wav_b64): base64 WAV for remote MCP clients
- New AudioWsModule (`audio-ws-module`) bridges to the browser:
  WebSocket /audio_out: binary PCM frames
  GET /audio_info: sample_rate + channels (initial WS frame also broadcasts {"event":"format", ...})
  POST /play: WAV body for robot speaker
Simulation has no audio; the wiring is hasattr-guarded so unitree-go2-basic still works in sim.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ATION mode
- BRIDGES=1 (default) appends `camera-mjpeg-module audio-ws-module` to the dimos run argv so the web endpoints come up automatically.
- SIMULATION=1 swaps in `--simulation`, skips wifi/ping/NTP checks, installs the `sim` extra on first venv bootstrap.
- EXTRA="…" lets callers tack on additional modules.
- Banner prints every endpoint that will be available (command center, MJPEG, snapshot, audio WS, audio play).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rapper
go2-start.sh
- LMSTUDIO=1 routes McpClient to LM Studio's OpenAI-compatible server at http://127.0.0.1:1234/v1 (LMSTUDIO_MODEL=qwen/qwen3-8b by default).
- MLXVLM=1 routes to the mlxvlm Gemma-4 server at http://127.0.0.1:8080/v1 (MLXVLM_MODEL defaults to gemma-4-E4B-it-MLX-4bit).
- Probes /v1/models before launch; bails with a useful hint if the backend isn't up.
- Both presets set OPENAI_BASE_URL + OPENAI_API_KEY and pass `-o mcpclient.model=openai:<model>` to dimos run.
- Warns when an LLM preset is set against a non-agentic blueprint.
sim-with-llm.sh
- One-liner: `./sim-with-llm.sh mlxvlm` runs the sim with the agentic blueprint pointed at mlxvlm; `lmstudio` and `ollama` variants too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

unitree-go2-agentic pulls in SecurityModule (EdgeTAM/CUDA), local Moondream VL (crashes Metal), and OpenAI-hardcoded TTS — none of which boot on Apple Silicon. unitree-go2-agentic-gemini is the existing Mac skeleton that disables those and uses Gemini for VL/embeddings/TTS. The chat LLM override (LMSTUDIO/MLXVLM) still applies — it's just the McpClient. Surface a GOOGLE_API_KEY warning early since Gemini VL/embed/TTS fail at runtime without it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ompose
unitree-go2-agentic-gemini imports GeminiSpeakSkill at module load (before any --disable can apply), and google-genai isn't installed, so the gemini blueprint crashed at import. Switch the default compose to:
  unitree-go2-basic (camera + viz, no Google imports)
  + mcp-server + mcp-client (MCP agent loop)
  + unitree-skill-container (wait / current_time / sport / tilt_body)
  + camera-mjpeg-module + audio-ws-module (from BRIDGES=1)
For chat: LMSTUDIO=1 or MLXVLM=1 reroutes McpClient as before.
For ollama: keep the existing unitree-go2-agentic-ollama (clean Mac compose).
Trade-off: no relative_move skill (needs the nav stack). Publish to /cmd_vel directly if movement is needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

UnitreeSkillContainer requires a NavigationInterfaceSpec module (replanning-a-star-planner + nav-stack chain). Without it, build fails: "No module met that spec." Minimal default is now just: unitree-go2-basic + mcp-server + mcp-client + bridges Skills available: observe, play_wav, play_wav_b64. Header documents how to layer in unitree-skill-container + replanning-a-star-planner when movement skills are needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Document the blueprint compatibility matrix on Apple Silicon, the skill availability per compose, and the VL backend gotcha (qwen is Alibaba cloud, not local MLX; moondream crashes Metal; gemini is Google cloud). Notes the missing piece: a Mac-local VL backend (mlxvlm/openai_compat) that would unlock the richer NavigationSkill / PersonFollow containers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous start() did `loop.run_until_complete(server.serve())` from a daemon thread and used `run_coroutine_threadsafe` to broadcast audio frames, which races the loop startup and silently raises "Event loop stopped before Future completed" when uvicorn shuts down or fails to bind. Now:
- uvicorn.Server.run() owns its own loop and lifecycle.
- Subscriber thread enqueues frames; a coroutine inside the uvicorn loop (spawned at FastAPI startup) drains the queue and fans out to WebSocket clients.
- Bind errors (port already in use) are logged with the actual OSError and a kill hint, instead of a generic "loop stopped".
- Drops oldest frame on overflow so latency stays bounded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
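
The drop-oldest policy mentioned in this commit can be sketched with a bounded `queue.Queue`. `put_drop_oldest` is a hypothetical helper showing only the overflow handling on the producer side; it is not the module's actual code, and the consumer (the coroutine inside the uvicorn loop) is left out.

```python
import queue


def put_drop_oldest(q, item):
    """Enqueue `item`, discarding the oldest entries if the queue is full.

    Returns True if at least one older item was dropped to make room.
    """
    dropped = False
    while True:
        try:
            q.put_nowait(item)
            return dropped
        except queue.Full:
            try:
                q.get_nowait()  # discard the oldest frame
                dropped = True
            except queue.Empty:
                # A consumer emptied the queue in between; just retry.
                pass
```

With a queue of size 2, pushing frames 1, 2, 3 leaves 2 and 3 queued, so a slow consumer always sees the most recent audio rather than a growing backlog.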

… brief
cmd-bridge-module exposes the robot's drive interface over HTTP so an external VLM (mlxvlm Gemma-4) can run a perceive-act loop:
  POST /cmd_vel: one Twist + duration
  POST /path: sequence of Twist steps (raw or semantic forward/left/degrees)
  POST /stop: cancel any in-flight /path
  GET /pose: current base_link pose in world frame
Open-loop (no SLAM, no obstacle avoidance) so it works in sim and on the bare-metal robot without the nav stack. The VLM is expected to re-plan each iteration from a fresh camera frame.
journal/2026-05-27-mlxvlm-robot-integration-prompt.md is the brief for the Claude working on the mlxvlm side: endpoint contract, the `/api/robot/navigate` perceive-act loop, JSON schema for the model's per-step reply, failure modes, and sim-only testing instructions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

BRIDGES=1 now appends camera-mjpeg-module + audio-ws-module + cmd-bridge-module, and the banner lists the new cmd_vel/path/stop/pose endpoints alongside the camera and audio URLs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…mode Real-robot LLM launches (LMSTUDIO=1 ./go2-start.sh) used to forget the McpClient, so the -o mcpclient.model=... override silently bound to nothing and LM Studio sat unused. go2-start.sh now mirrors what sim-with-llm.sh was doing: when an LLM preset is set against a non-agentic blueprint, mcp-server + mcp-client are appended to EXTRA. The check is skipped when the blueprint already includes an agent (e.g. unitree-go2-agentic-ollama). sim-with-llm.sh becomes a thin wrapper: it only picks the backend preset and forwards to go2-start.sh with SIMULATION=1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-05-28T04:44:13Z

Greptile Summary

This PR adds three HTTP/WebSocket bridges (MJPEG camera, PCM audio, cmd_vel) to expose robot I/O over plain HTTP for external VLM/LLM processes, adds Go2 audio I/O via WebRTC, several new skills (Gemini TTS, macOS say, take_picture, map_uploader), a Gemini-all blueprint, and two hackathon launcher scripts.

Web bridges (mjpeg_module, audio_ws_module, cmd_bridge_module): each runs uvicorn in a daemon thread, uses proper cross-thread handoffs, and binds to loopback only.
Audio I/O (UnitreeWebRTCConnection): audio_stream() correctly uses a Subject + finally_action teardown; play_wav_bytes() fires a coroutine into the existing event loop — but exit_megaphone() is missing from the finally block, which can leave the robot unable to move.
Launchers (go2-start.sh, sim-with-llm.sh): pre-flight wifi/ping/NTP/LLM checks with LLM auto-injection; a hardcoded developer filesystem path appears in one error message.

Confidence Score: 3/5

Safe to merge with one fix: the megaphone-exit bug in play_wav_bytes must be addressed before deploying on a real Go2, as it can leave the robot unresponsive to all motion commands until manually reset.

The audio WebRTC path in connection.py enters megaphone mode and then calls exit_megaphone() outside any finally block. A coroutine cancellation during active playback will leave the robot permanently in megaphone mode, blocking every subsequent motion command. The rest of the PR — the three web bridges, Gemini TTS skill, take-picture skill, and launcher scripts — is well-structured and largely correct.

dimos/robot/unitree/connection.py (play_wav_bytes / _upload_play_exit) requires the exit_megaphone finally-block fix before real-robot use. go2-start.sh has a minor hardcoded path to address.

Important Files Changed

Filename	Overview
dimos/robot/unitree/connection.py	Adds AudioMessage dataclass, audio_stream() observable, and play_wav_bytes(). The exit_megaphone() call is not wrapped in a finally block, so cancellation leaves the robot stuck in megaphone mode.
dimos/web/audio_ws_module.py	New WebSocket bridge for robot audio (mic out + WAV playback). Thread/async handoff via a bounded queue is correctly implemented; _clients is only accessed from the same asyncio loop so no race. Clean.
dimos/web/cmd_bridge_module.py	New HTTP bridge for robot motion commands. Lock-and-cancel design is correct; duration field lacks an upper bound that could stall the thread-pool worker indefinitely.
dimos/web/mjpeg_module.py	New MJPEG + snapshot HTTP module. Frame encoding is correct (RGB to BGR before imencode), snapshot is lock-protected, CORS is open. No issues.
dimos/robot/unitree/go2/connection.py	Adds audio/audio_in streams and play_wav/play_wav_b64 skills to GO2Connection. Uses hasattr guards for sim compatibility. Clean.
go2-start.sh	New hackathon launcher with wifi/ping/NTP/LLM-endpoint checks. Contains a hardcoded developer local path (/Users/tex/repos/…) in the mlxvlm error message.
dimos/agents/skills/gemini_speak_skill.py	New Gemini TTS-backed speak skill. Idempotent consumer wiring, fire-and-forget queue, blocking wait with text matching, and resource cleanup are all handled correctly.
dimos/agents/skills/take_picture_skill.py	New skill for one-shot and explore-and-capture photo upload. Background threads tracked and joined on stop; tilt always re-leveled in finally. Clean.
sim-with-llm.sh	Thin wrapper around go2-start.sh that routes the backend argument (mlxvlm/lmstudio/ollama) and sets SIMULATION=1. No issues.
dimos/stream/audio/tts/node_gemini.py	New Gemini TTS pipeline node. Worker thread and subscription lifecycle are correctly managed; dispose() drains the queue and joins the thread before completing subjects.

Sequence Diagram

sequenceDiagram
    participant ExtClient as External VLM/Client
    participant MjpegMod as CameraMjpegModule (port 7780)
    participant AudioMod as AudioWsModule (port 7781)
    participant CmdMod as CmdBridgeModule (port 7782)
    participant GO2Conn as GO2Connection
    participant UnitreeConn as UnitreeWebRTCConnection

    GO2Conn->>MjpegMod: color_image stream
    GO2Conn->>AudioMod: audio (AudioMessage) stream
    UnitreeConn->>GO2Conn: video_stream / audio_stream

    ExtClient->>MjpegMod: GET /video_feed/color_image (MJPEG)
    MjpegMod-->>ExtClient: multipart/x-mixed-replace JPEG frames

    ExtClient->>AudioMod: WS /audio_out
    AudioMod-->>ExtClient: binary int16 PCM frames

    ExtClient->>AudioMod: POST /play (WAV bytes)
    AudioMod->>GO2Conn: audio_in (bytes)
    GO2Conn->>UnitreeConn: play_wav_bytes(wav)
    UnitreeConn->>UnitreeConn: upload_megaphone, enter, sleep, exit

    ExtClient->>CmdMod: POST /cmd_vel or /path
    CmdMod->>GO2Conn: cmd_vel (Twist)

    ExtClient->>CmdMod: GET /pose
    CmdMod-->>ExtClient: x, y, z, theta

Reviews (1): Last reviewed commit: "fix(go2-start): auto-inject mcp-server/m..."

greptile-apps · 2026-05-28T04:44:16Z

+                await hub.upload_megaphone(tmp_path)
+                await hub.enter_megaphone()
+                # Hold megaphone for the clip's duration, plus a small flush margin,
+                # then release so other commands work normally.
+                await asyncio.sleep(duration + 0.5)
+                await hub.exit_megaphone()


If asyncio.sleep() is cancelled (e.g. the event loop shuts down mid-play) or any awaitable after enter_megaphone() raises, exit_megaphone() is never called and the robot is permanently stuck in megaphone mode until manually reset, rendering all movement commands ineffective.

Suggested change

-                await hub.upload_megaphone(tmp_path)
-                await hub.enter_megaphone()
-                # Hold megaphone for the clip's duration, plus a small flush margin,
-                # then release so other commands work normally.
-                await asyncio.sleep(duration + 0.5)
-                await hub.exit_megaphone()
+                await hub.upload_megaphone(tmp_path)
+                await hub.enter_megaphone()
+                try:
+                    # Hold megaphone for the clip's duration, plus a small flush margin,
+                    # then release so other commands work normally.
+                    await asyncio.sleep(duration + 0.5)
+                finally:
+                    await hub.exit_megaphone()
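
The effect of the try/finally in the suggested change can be checked without a robot. The sketch below uses a hypothetical `FakeHub` stand-in (not the real audiohub API) and cancels the task mid-sleep to show that the exit call still runs.

```python
import asyncio


class FakeHub:
    """Stand-in for the audio hub; only tracks megaphone state."""

    def __init__(self):
        self.in_megaphone = False

    async def enter_megaphone(self):
        self.in_megaphone = True

    async def exit_megaphone(self):
        self.in_megaphone = False


async def play_with_guard(hub, duration):
    await hub.enter_megaphone()
    try:
        await asyncio.sleep(duration)
    finally:
        # Runs on normal completion, on exceptions, and on cancellation.
        await hub.exit_megaphone()


async def demo():
    hub = FakeHub()
    task = asyncio.create_task(play_with_guard(hub, 60.0))
    await asyncio.sleep(0.01)  # let the task enter megaphone mode
    during = hub.in_megaphone
    task.cancel()  # simulate shutdown mid-playback
    try:
        await task
    except asyncio.CancelledError:
        pass
    return during, hub.in_megaphone
```

`asyncio.run(demo())` reports megaphone mode as active during playback and released after cancellation. With the original straight-line code the second value would stay True.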

greptile-apps · 2026-05-28T04:44:18Z

+    if [[ "$LMSTUDIO" == "1" ]]; then
+      c_red "  start LM Studio's Local Server (Cmd-Shift-2) and load a tool-capable model"
+    else
+      c_red "  start mlxvlm: cd /Users/tex/repos/ai/mlx/mlxvlm && scripts/start-all.sh"


This error message contains a hardcoded path from the developer's local machine (/Users/tex/repos/ai/mlx/mlxvlm). Every other user will see an incorrect and confusing instruction.

Suggested change

-      c_red "  start mlxvlm: cd /Users/tex/repos/ai/mlx/mlxvlm && scripts/start-all.sh"
+      c_red "  start mlxvlm server and ensure it is listening on port 8080"

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-05-28T04:44:19Z

+class TwistRequest(BaseModel):
+    linear: list[float] = Field(default=[0.0, 0.0, 0.0], min_length=3, max_length=3)
+    angular: list[float] = Field(default=[0.0, 0.0, 0.0], min_length=3, max_length=3)
+    duration: float = 0.5  # seconds to hold the command before stopping


duration has no upper bound. A caller can send duration: 999999 (about 11.6 days), blocking the uvicorn thread-pool worker (and the _drive_lock) for that whole time, making /stop unable to interrupt ongoing execution and preventing subsequent commands from acquiring the lock.

Suggested change

-    duration: float = 0.5  # seconds to hold the command before stopping
+    duration: float = Field(default=0.5, ge=0.0, le=30.0)  # seconds to hold the command before stopping
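
Bounding the field limits how long a request can hold the worker. A complementary idea, sketched here as an assumption rather than as what cmd_bridge_module.py does, is to wait on a `threading.Event` instead of sleeping, so a stop request can end the hold early. `hold_command` and `MAX_DURATION_S` are hypothetical names; note that this sketch clamps an oversized duration, whereas the suggested `Field` bound would reject the request.

```python
import threading

MAX_DURATION_S = 30.0  # mirrors the le=30.0 bound in the suggested change


def hold_command(duration, stop_event):
    """Hold a command for `duration` seconds, capped at MAX_DURATION_S.

    Returns True if the full (capped) duration elapsed, or False if
    `stop_event` was set first, e.g. by a /stop handler.
    """
    bounded = max(0.0, min(duration, MAX_DURATION_S))
    # Event.wait returns True as soon as the event is set, False on timeout.
    stopped = stop_event.wait(timeout=bounded)
    return not stopped
```

A handler for /stop would call `stop_event.set()`, and the drive loop would clear the event before starting the next command.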

bogwi and others added 29 commits May 25, 2026 13:46

Forward SHM transports to Rerun and unify Go2 replay IPC

b80e3c1

fix: mypy

c48366d

fix: Greptile P1

928c08f

feat: add gemini/local speak skills, agentic gemini Go2 blueprint, an…

eb622d6

…d capture-viewer tool Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore: add go2-start.sh hackathon quickstart

fd3254c

Wifi/ROBOT_IP sanity check, NTP sync, venv bootstrap, and `exec dimos run <blueprint>` (default `unitree-go2-basic`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs(journal): camera + audio web bridges

22ef9a1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tfius requested a review from leshy as a code owner May 28, 2026 04:39

tfius requested review from mustafab0, paul-nechifor and spomichter as code owners May 28, 2026 04:39

greptile-apps Bot reviewed May 28, 2026

leshy added the hackaton label May 29, 2026

tfius wants to merge 29 commits into dimensionalOS:main from tfius:feat/web-bridges-llm-presets

4 participants
