Feat/find seat by Ueti999 · Pull Request #2299 · dimensionalOS/dimos

Ueti999 · 2026-05-28T16:27:56Z

Problem

For the hackathon, we were missing an "empty-seat guidance" feature on the
G1/Go2 stack. We needed a minimal loop where the robot detects an empty
seat from the camera, then either guides a person there or walks there
autonomously.

Closes DIM-XXX

Solution

Adds YOLO-based empty-seat detection (SeatFinderSkill / SeatPlanner)
and three Go2 agentic blueprints that use it.

Split into 4 commits, one module per commit:

#	Commit	What
1	`37687aa47`	`dimos/agents/skills/` — `seat_finder.py` (continuous YOLO + in-place scan) and `seat_planner.py` (on-demand, 3D-projection only)
2	`fa5159165`	`dimos/robot/unitree/go2/blueprints/agentic/` — `unitree_go2_guide.py` (guide-dog mode), `unitree_go2_seat_demo.py` (+ `_record` / `_reuse`)
3	`b483f95db`	`seat_check_webcam.py` — standalone webcam test, no robot needed
4	`3fa5d633a`	39 debug capture JPGs under `seat_finder_debug/`

Design choices:

SeatFinderSkill publishes cmd_vel for the in-place scan (SCAN_YAW_RATE = 0.5 rad/s, 10 Hz tick; one revolution takes 2π / 0.5 ≈ 12.6 s at the commanded rate, within the ~14 s scan window).
SeatPlanner never publishes cmd_vel — only emits goal_request. Used in the manual-map → MCP-trigger flow, which is the safer path.
unitree_go2_guide drops SecurityModule / SpatialMemory / PerceiveLoopSkill / PersonFollowSkill to lower GPU load (not needed for guiding).
Occupancy test: OCCUPANCY_OVERLAP = 0.2 — a seat is "occupied" if a person bbox covers ≥20% of its area.
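
To make the occupancy rule concrete, here is a minimal sketch of a coverage test that matches the description above: the intersection of the seat and person boxes divided by the seat's own area. The `Box` type and the function names are hypothetical stand-ins, not the `Detection2DBBox` API or the actual `_is_occupied` implementation.

```python
from typing import NamedTuple

OCCUPANCY_OVERLAP = 0.2  # threshold quoted in the PR description


class Box(NamedTuple):
    """Axis-aligned pixel box (hypothetical stand-in for a 2D detection)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def seat_coverage(seat: Box, person: Box) -> float:
    """Fraction of the seat box covered by the person box, in [0, 1]."""
    w = min(seat.x2, person.x2) - max(seat.x1, person.x1)
    h = min(seat.y2, person.y2) - max(seat.y1, person.y1)
    if w <= 0 or h <= 0 or seat.area() == 0:
        return 0.0  # no overlap, or a degenerate seat box
    return (w * h) / seat.area()


def is_occupied(seat: Box, persons: list[Box]) -> bool:
    """A seat counts as occupied if any person covers enough of it."""
    return any(seat_coverage(seat, p) >= OCCUPANCY_OVERLAP for p in persons)
```

Normalising by the seat area rather than by the union (IoU) keeps a large person box over a small chair from being scored as low overlap.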

How to Test

A. Webcam only (no robot):
.venv/bin/python seat_check_webcam.py [--camera 0]
Overlay: green = empty seat, red = occupied, blue = person.

B. Manual map → MCP trigger (real Go2):

Launch the unitree_go2_seat_demo blueprint.
Drive the Go2 manually — click-to-goal in Rerun or keyboard teleop — to build the voxel map.
From another terminal:
dimos mcp call find_empty_seat_now
SeatPlanner picks an empty seat → projects it to 3D → publishes goal_request → A* plans the path.

C. Guide-dog mode:
python -m dimos.robot.unitree.go2.blueprints.agentic.unitree_go2_guide
Agent + voice I/O + SeatFinder + nav stack guide a leashed person to an empty seat.

greptile-apps · 2026-05-28T16:32:39Z

Greptile Summary

Adds an end-to-end empty-seat guidance feature to the Go2 stack: YOLO-based detection (SeatFinderSkill, SeatPlanner, SeatGuideSkillContainer), three agentic blueprints, a Vercel-hosted TTS relay for phone speaker output, OpenRouter model routing in McpClient, and supporting infrastructure (SHM→Rerun forwarding, optional SpatialMemory, MCP SSE fan-out).

SeatFinderSkill / SeatPlanner: continuous YOLO detection on a sharpness-filtered stream, in-place scan loop with cmd_vel publishing, and 3D projection via pointcloud; prior review thread covers the blocking get_next(), non-interruptible scan loop, timestamp mismatch, and _voice_worker lifecycle issues.
Vercel speaker app (apps/seat-guide-speaker-vercel): serverless relay that stores the latest TTS message in in-process memory and serves it to a polling browser page; the Authorization header sent by the robot is never validated by the handler.
McpClient OpenRouter routing: new helper functions correctly select the OpenRouter base URL and headers (X-OpenRouter-Title is the current canonical name); a global OPENROUTER_MODEL env-var override can silently replace the explicit model name with no log warning.

Confidence Score: 4/5

Safe to merge for hackathon/demo use; the Vercel speaker endpoint is publicly writable (token is sent but never checked), and direct_move exposes unbounded velocity commands to the LLM — both are low-impact for a demo context but worth addressing before production use.

The core detection and navigation logic is well-structured. The remaining open issues from the prior review thread (scan-loop interruptibility, blocking get_next(), voice-worker lifecycle) are not yet addressed. Two new findings in this pass — the unauthenticated Vercel TTS endpoint and the unclamped direct_move duration — add to the list but are also demo-scope concerns rather than blocking correctness bugs.

dimos/agents/skills/seat_finder.py and dimos/agents/skills/seat_planner.py still carry the threading issues noted in prior rounds; apps/seat-guide-speaker-vercel/api/[...speaker].js needs a token validation pass before any non-demo deployment; dimos/robot/unitree/unitree_skill_container.py (direct_move) would benefit from a duration cap.
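
One way to bound `direct_move` is to clamp its arguments before they reach the motion layer. The sketch below is illustrative only: the limits, the function name `bounded_move`, and the argument order are assumptions, since the real `direct_move` signature is not shown in this thread.

```python
import math

# Hypothetical limits; the real values would depend on the robot and venue.
MAX_MOVE_DURATION_S = 5.0
MAX_LINEAR_MPS = 0.6
MAX_YAW_RADPS = 1.0


def _clamp(value: float, limit: float) -> float:
    """Clamp value to [-limit, limit], rejecting NaN and infinity."""
    if not math.isfinite(value):
        raise ValueError(f"non-finite command value: {value!r}")
    return max(-limit, min(limit, value))


def bounded_move(
    vx: float, vy: float, yaw_rate: float, duration_s: float
) -> tuple[float, float, float, float]:
    """Return (vx, vy, yaw_rate, duration_s) with every field bounded."""
    if not math.isfinite(duration_s) or duration_s <= 0:
        raise ValueError(f"invalid duration: {duration_s!r}")
    return (
        _clamp(vx, MAX_LINEAR_MPS),
        _clamp(vy, MAX_LINEAR_MPS),
        _clamp(yaw_rate, MAX_YAW_RADPS),
        min(duration_s, MAX_MOVE_DURATION_S),
    )
```

Raising on non-finite input instead of clamping it avoids silently turning a malformed LLM argument into a full-speed command.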

Security Review

Unauthenticated /api/speak endpoint (apps/seat-guide-speaker-vercel/api/[...speaker].js): The robot sends Authorization: Bearer <token> but the serverless handler never reads or validates that header. Any caller with the Vercel URL can inject arbitrary TTS text without a token. The SEAT_GUIDE_SPEAKER_TOKEN env var creates a misleading impression of access control that does not exist server-side.
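
The missing check amounts to comparing the presented bearer token against the configured one before accepting the POST. The actual handler is JavaScript; the sketch below shows the same logic in Python for consistency with the rest of the PR, and the function name `is_authorized` and the plain-dict header shape are assumptions.

```python
import hmac


def is_authorized(headers: dict[str, str], expected_token: str) -> bool:
    """Return True only if the request carries the expected bearer token."""
    if not expected_token:
        return False  # fail closed when no token is configured server-side
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {k.lower(): v for k, v in headers.items()}
    scheme, _, presented = lowered.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest runs in constant time to avoid leaking the token
    # through response-time differences.
    return hmac.compare_digest(
        presented.strip().encode("utf-8"), expected_token.encode("utf-8")
    )
```

A handler would return 401 when this check fails, before touching the stored message.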

Important Files Changed

Filename	Overview
dimos/agents/skills/seat_finder.py	New YOLO-based seat finder skill with in-place scan loop; `pointcloud.get_next()` has no timeout (blocks forever if stream is unavailable) and the scan loop cannot be interrupted by `stop()` — both flagged in prior review thread.
dimos/agents/skills/seat_planner.py	On-demand seat planner with voice-trigger worker; `_voice_worker` thread is not joined on `stop()` and `_voice_busy` lock is not reset, allowing stale-worker collisions on restart — flagged in prior thread.
apps/seat-guide-speaker-vercel/api/[...speaker].js	Vercel serverless TTS relay; the `/api/speak` POST endpoint never validates the `Authorization` header despite `SEAT_GUIDE_SPEAKER_TOKEN` being configured on the robot side.
dimos/agents/mcp/mcp_client.py	Adds OpenRouter routing logic with correct header names (`X-OpenRouter-Title`) and proper fallback chain; routing logic is sound, though `OPENROUTER_MODEL` env var silently overrides the explicit model config when both are set.
dimos/robot/unitree/unitree_skill_container.py	Adds `direct_move` and updates `relative_move` with x/y compat aliases; `direct_move` is exposed as an LLM-callable `@skill` with no duration upper bound, allowing the robot to run indefinitely under adversarial or errant prompts.
dimos/agents/skills/seat_guide.py	Large new skill container with planner, navigation helpers, and VLM-based observation provider; logic appears clean with proper locking, though it duplicates the `_is_occupied` / annotation helpers from `seat_finder.py`.
dimos/agents/web_human_input.py	Rewired to import `SeatGuideRequestSpec` and route voice/text directly to SeatGuide; the cloud speaker POST runs synchronously in the subscriber callback which blocks for up to 5 s if the endpoint is down, but this is low-frequency.
dimos/agents/mcp/mcp_server.py	Adds SSE fan-out endpoint and `agent_send` skill; clean implementation with proper queue cleanup on stop and keepalive pings.
dimos/core/coordination/module_coordinator.py	Adds SHM transport forwarding to RerunBridgeModule after stream wiring; logic is straightforward and guarded by an early return when the bridge isn't deployed.
dimos/agents/skills/navigation.py	Makes `_spatial_memory` optional with early-return guards; changes are minimal and correct.
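
For the `OPENROUTER_MODEL` override noted in the `mcp_client.py` row, a small warning makes the replacement visible in logs. This is a sketch under assumptions: `resolve_model` is a hypothetical helper name, and only the `OPENROUTER_MODEL` variable comes from the review.

```python
import logging
import os
from collections.abc import Mapping

logger = logging.getLogger(__name__)


def resolve_model(configured: str, env: Mapping[str, str] = os.environ) -> str:
    """Pick the model name, logging when the env var replaces the config."""
    override = env.get("OPENROUTER_MODEL", "").strip()
    if override and override != configured:
        logger.warning(
            "OPENROUTER_MODEL=%r overrides configured model %r",
            override,
            configured,
        )
        return override
    return configured
```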

Sequence Diagram

sequenceDiagram
    participant Browser as iPhone Browser
    participant Vercel as Vercel /api
    participant Robot as Go2 (WebInput)
    participant MCP as McpServer / McpClient
    participant Skill as SeatFinderSkill / SeatPlanner
    participant Nav as A* Planner

    Robot->>Vercel: "POST /api/speak {text, device, Authorization: Bearer token}"
    Note over Vercel: Token NOT validated
    Vercel-->>Robot: 200 OK
    Browser->>Vercel: "GET /api/latest?device=go2-demo (polls 700ms)"
    Vercel-->>Browser: "{message: {text, id}}"
    Browser->>Browser: speechSynthesis.speak(text)

    Robot->>MCP: human_input / dimos mcp call find_empty_seat_now
    MCP->>Skill: RPC find_empty_seat_now() / find_empty_seat()
    Skill->>Skill: _scan_in_place() — publish cmd_vel yaw
    Skill->>Skill: _select_empty_seats(detections)
    Skill->>Skill: pointcloud.get_next()
    Skill->>Skill: Detection3DPC.from_2d(best, pointcloud)
    Skill->>Nav: goal_request.publish(pose)
    Nav-->>Robot: "A* path → robot walks to seat"

Reviews (3): Last reviewed commit: "chore: clean up seat finder branch artif..."

greptile-apps · 2026-05-28T16:32:43Z

+            return f"Could not resolve the camera transform, cannot locate the {label}."
+
+        best = max(candidates, key=lambda d: d.bbox_2d_volume())
+        target3d = Detection3DPC.from_2d(


get_next() without a timeout can block forever

self.pointcloud.get_next() is called with no timeout argument. If the pointcloud stream is slow or disconnected the skill thread hangs indefinitely, holding any scan motion lock and making the robot unresponsive. SeatPlanner already uses self.pointcloud.get_next(timeout=2.0) — the same guard should be applied here.

greptile-apps · 2026-05-28T16:32:45Z

+        if detections is None or not candidates:
+            return f"No {label} found after looking around."
+
+        pointcloud = self.pointcloud.get_next()


Add a timeout to avoid blocking indefinitely when the pointcloud stream is unavailable, mirroring the pattern already used in SeatPlanner.

Suggested change

-        pointcloud = self.pointcloud.get_next()
+        try:
+            pointcloud = self.pointcloud.get_next(timeout=2.0)
+        except Exception as e:
+            return f"Pointcloud unavailable: {e}"

greptile-apps · 2026-05-28T16:32:48Z

+            linear=Vector3(0.0, 0.0, 0.0), angular=Vector3(0.0, 0.0, SCAN_YAW_RATE)
+        )
+        try:
+            while True:
+                with self._lock:
+                    detections = self._latest
+                candidates = selector(detections.detections) if detections is not None else []
+                if candidates or time.time() >= deadline:
+                    return detections, candidates
+                now = time.time()
+                if now >= next_log:
+                    seen = [d.name for d in detections.detections] if detections else []
+                    logger.info(f"SeatFinder scan: rotating, currently seeing {seen}")
+                    next_log = now + SCAN_LOG_EVERY
+                self.cmd_vel.publish(yaw)
+                time.sleep(SCAN_TICK)
+        finally:
+            self.cmd_vel.publish(Twist.zero())
+
+    def _select_empty_seats(self, detections: list[Detection2DBBox]) -> list[Detection2DBBox]:
+        seats = [d for d in detections if d.name in SEAT_CLASSES]
+        persons = [d for d in detections if d.name == "person"]
+        return [s for s in seats if not self._is_occupied(s, persons)]
+


Scan loop cannot be interrupted by stop()

_scan_in_place loops with time.sleep(SCAN_TICK) and only exits when a candidate is found or the 14-second deadline expires. stop() publishes Twist.zero() once, but the next iteration of the scan loop immediately overwrites it with the yaw command, so the robot keeps rotating for up to 14 s after an emergency stop or user cancellation. A threading.Event should be set in stop() and checked inside the loop (or waited on in place of time.sleep()).
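
A minimal sketch of that pattern follows, with the publisher reduced to a callable so the example is self-contained. The class name `InterruptibleScanner` and the `"yaw"` / `"zero"` placeholders are hypothetical; in the skill they would be the yaw `Twist` and `Twist.zero()` sent through `cmd_vel.publish`.

```python
import threading
import time
from collections.abc import Callable

SCAN_TICK = 0.1  # seconds, i.e. the 10 Hz tick from the PR description


class InterruptibleScanner:
    def __init__(self, publish: Callable[[str], None], timeout_s: float = 14.0):
        self._publish = publish
        self._timeout_s = timeout_s
        self._stop_event = threading.Event()

    def scan(self, has_candidate: Callable[[], bool]) -> bool:
        """Rotate until a candidate appears, the deadline passes, or stop()."""
        self._stop_event.clear()
        deadline = time.monotonic() + self._timeout_s
        try:
            while not self._stop_event.is_set():
                if has_candidate():
                    return True
                if time.monotonic() >= deadline:
                    return False
                self._publish("yaw")
                # wait() doubles as the tick sleep and returns early on stop().
                self._stop_event.wait(SCAN_TICK)
            return False
        finally:
            # Always the last command, so stop() cannot be overwritten.
            self._publish("zero")

    def stop(self) -> None:
        self._stop_event.set()
```

Because the zero command is published from the loop's own `finally` block, a stop request ends the rotation within one tick instead of racing the next yaw publish. Clearing the event at the start of `scan()` leaves a small window where a concurrent `stop()` could be lost; a real module would guard that with its existing lock.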

greptile-apps · 2026-05-28T16:32:50Z

+            return f"No {label} found after looking around."
+
+        pointcloud = self.pointcloud.get_next()
+        transform = self.tf.get("camera_optical", pointcloud.frame_id, detections.image.ts, 5.0)
+        if not transform:
+            return f"Could not resolve the camera transform, cannot locate the {label}."
+
+        best = max(candidates, key=lambda d: d.bbox_2d_volume())
+        target3d = Detection3DPC.from_2d(
+            best,
+            world_pointcloud=pointcloud,
+            camera_info=self.config.camera_info,
+            world_to_optical_transform=transform,
+        )
+        if target3d is None:
+            return f"Found a {label} but could not compute its 3D position."
+
+        pose = target3d.pose
+        self.goal_request.publish(pose)
+        return (
+            f"Found a {label} at ({pose.position.x:.2f}, {pose.position.y:.2f}). "
+            "Navigating there now."
+        )
+
+    def _scan_in_place(self, selector: Any) -> tuple[ImageDetections2D | None, list[Detection2DBBox]]:


Temporal mismatch between YOLO detections and pointcloud for 3D projection

_scan_in_place returns the detections captured at the moment the scan stopped. self.pointcloud.get_next() then fetches the next pointcloud to arrive — which is from a later timestamp. Detection3DPC.from_2d fuses the 2-D bbox (from the older frame) with the newer world cloud, potentially projecting the seat to the wrong 3D position if the robot or the scene moved between the two frames.

greptile-apps · 2026-05-28T16:33:02Z

+#!/usr/bin/env python3
+"""Standalone webcam test for the YOLO empty-seat logic (no robot needed).
+
+Mirrors SeatFinderSkill's detection: YOLO detects chairs/couches and people,
+then a seat is "occupied" if a person box overlaps it past a threshold.
+
+Usage:
+    .venv/bin/python seat_check_webcam.py [--camera 0]
+
+Overlay:


Test script placed at repository root

seat_check_webcam.py lives at the repo root rather than under a tests/, scripts/, or examples/ directory. Consider moving it to a location that signals "this is not shipped code".

greptile-apps · 2026-05-29T04:02:44Z

+    @rpc
+    def stop(self) -> None:
+        if self._subscription is not None:
+            self._subscription.dispose()
+            self._subscription = None
+        super().stop()


Voice worker keeps publishing after stop()

stop() disposes the frame subscription and the human-input subscription (via register_disposable), but it does not signal or await the _voice_worker daemon thread. If stop() is called while a voice worker is mid-flight inside find_empty_seat_now(), the worker will continue executing — including the self.goal_request.publish(pose) call — on a module that is nominally stopped. Additionally, _voice_busy is not reset on stop, so if the worker is still running when the module is restarted and a new keyword fires, _on_human_input will silently drop the trigger ("already running") until the old worker completes.
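
The lifecycle described above can be sketched as a worker that receives a stop event, a non-blocking busy lock that is always released when the worker exits, and a `stop()` that signals and joins. `VoiceTrigger` and its method names are hypothetical; only the `_voice_worker` / `_voice_busy` roles come from the review.

```python
import threading
from collections.abc import Callable


class VoiceTrigger:
    def __init__(self, task: Callable[[threading.Event], None]):
        self._task = task  # should check the event before publishing a goal
        self._stop_event = threading.Event()
        self._busy = threading.Lock()
        self._worker: threading.Thread | None = None

    def trigger(self) -> bool:
        """Start a worker unless one is already running."""
        if not self._busy.acquire(blocking=False):
            return False  # already running: drop this trigger
        self._stop_event.clear()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()
        return True

    def _run(self) -> None:
        try:
            self._task(self._stop_event)
        finally:
            self._busy.release()  # never leave the busy flag stuck

    def stop(self, timeout_s: float = 2.0) -> None:
        """Signal the worker and wait for it to finish."""
        self._stop_event.set()
        worker = self._worker
        if worker is not None:
            worker.join(timeout=timeout_s)
```

The task is expected to test the event right before `goal_request.publish(pose)`, so a stopped module does not emit a goal, and releasing the lock in `finally` means a restart is not blocked by a stale flag.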

SeatFinderSkill: continuous YOLO-based empty-seat detector with in-place scan (slow yaw rate + 10Hz cmd_vel publishing) that surfaces the nearest unoccupied chair/couch/bench using person-box overlap as occupancy test. SeatPlanner: on-demand variant for the manual-map flow — operator drives the Go2 to build the voxel map, then triggers find_empty_seat_now via MCP; the planner projects the chosen 2D detection to a 3D PoseStamped on goal_request without publishing any cmd_vel itself.

unitree_go2_guide: slim "guide dog" blueprint that leads a leashed person to an empty seat. Drops SecurityModule, SpatialMemory, PerceiveLoopSkill and PersonFollowSkill to keep GPU/compute light; keeps nav stack, agent, voice I/O and SeatFinder. unitree_go2_seat_demo (+ _record, _reuse): manual-map → on-demand YOLO seat-find blueprint variants wired to SeatPlanner via McpServer. The record/reuse variants attach a RecordingModule for capturing and replaying the manual mapping pass.

Webcam-only test for the empty-seat logic — no robot needed. Mirrors SeatFinderSkill's detection (YOLO chairs/couches + people, occupancy via person-box overlap) and renders an annotated overlay: green = empty seat, red = occupied seat, blue = person. Usage: .venv/bin/python seat_check_webcam.py [--camera 0]

39 annotated YOLO frames (~7MB) from seat_check_webcam.py and the live SeatFinder runs, kept for repro and regression debugging of the empty-seat detection logic.

bogwi and others added 8 commits May 25, 2026 13:46

Forward SHM transports to Rerun and unify Go2 replay IPC

b80e3c1

fix: mypy

c48366d

fix: Greptile P1

928c08f

feat: add Go2 SeatGuide hardware flow

49e2b61

merge: integrate macOS SHM replay routing

353b19f

feat(mcp): support OpenRouter agent models

26dae6c

feat(seat-guide): add Go2 hardware guidance flow

00c0ad4

feat(seat-guide): add Vercel phone speaker relay

f87edf7

Ueti999 requested review from leshy, mustafab0, paul-nechifor and spomichter as code owners May 28, 2026 16:27

greptile-apps Bot reviewed May 28, 2026

huaruic added 6 commits May 29, 2026 04:05

Merge branch 'dimensionalOS:main' into feat/seat-guide-hardware

5d04493

chore(seat-guide): trim Vercel speaker dependencies

06a5bc6

refactor(seat-guide): remove unsupported Go2 audio path

9e6fa3c

fix(seat-guide): close phone relay guidance loop

e0e6bed

chore(seat-guide): remove obsolete moondream camera demo

a1167d6

Merge pull request #1 from cheer-up-hackathon/feat/seat-guide-hardware

d018235

leshy added the hackaton label May 29, 2026

huaruic force-pushed the feat/find-seat branch from 3fa5d63 to 4015040 May 29, 2026 03:57

greptile-apps Bot reviewed May 29, 2026

huaruic and others added 6 commits May 29, 2026 12:09

Merge branch 'dimensionalOS:main' into main

12ef54c

chore: add seat-finder debug captures

f3d9e39

chore: clean up seat finder branch artifacts

5f65d10

huaruic force-pushed the feat/find-seat branch from 4015040 to 5f65d10 May 29, 2026 04:11

Ueti999 wants to merge 20 commits into dimensionalOS:main from cheer-up-hackathon:feat/find-seat

4 participants
