Feat/find seat #2299
Greptile Summary
Adds an end-to-end empty-seat guidance feature to the Go2 stack: YOLO-based detection (
Confidence Score: 4/5
Safe to merge for hackathon/demo use; the Vercel speaker endpoint is publicly writable (token is sent but never checked), and
The core detection and navigation logic is well-structured. The remaining open issues from the prior review thread (scan-loop interruptibility, blocking
| Filename | Overview |
|---|---|
| dimos/agents/skills/seat_finder.py | New YOLO-based seat finder skill with in-place scan loop; pointcloud.get_next() has no timeout (blocks forever if stream is unavailable) and the scan loop cannot be interrupted by stop() — both flagged in prior review thread. |
| dimos/agents/skills/seat_planner.py | On-demand seat planner with voice-trigger worker; _voice_worker thread is not joined on stop() and _voice_busy lock is not reset, allowing stale-worker collisions on restart — flagged in prior thread. |
| apps/seat-guide-speaker-vercel/api/[...speaker].js | Vercel serverless TTS relay; the /api/speak POST endpoint never validates the Authorization header despite SEAT_GUIDE_SPEAKER_TOKEN being configured on the robot side. |
| dimos/agents/mcp/mcp_client.py | Adds OpenRouter routing logic with correct header names (X-OpenRouter-Title) and proper fallback chain; routing logic is sound, though OPENROUTER_MODEL env var silently overrides the explicit model config when both are set. |
| dimos/robot/unitree/unitree_skill_container.py | Adds direct_move and updates relative_move with x/y compat aliases; direct_move is exposed as an LLM-callable @skill with no duration upper bound, allowing the robot to run indefinitely under adversarial or errant prompts. |
| dimos/agents/skills/seat_guide.py | Large new skill container with planner, navigation helpers, and VLM-based observation provider; logic appears clean with proper locking, though shares duplicated _is_occupied / annotation helpers with seat_finder.py. |
| dimos/agents/web_human_input.py | Rewired to import SeatGuideRequestSpec and route voice/text directly to SeatGuide; the cloud speaker POST runs synchronously in the subscriber callback which blocks for up to 5 s if the endpoint is down, but this is low-frequency. |
| dimos/agents/mcp/mcp_server.py | Adds SSE fan-out endpoint and agent_send skill; clean implementation with proper queue cleanup on stop and keepalive pings. |
| dimos/core/coordination/module_coordinator.py | Adds SHM transport forwarding to RerunBridgeModule after stream wiring; logic is straightforward and guarded by an early return when the bridge isn't deployed. |
| dimos/agents/skills/navigation.py | Makes _spatial_memory optional with early-return guards; changes are minimal and correct. |
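The direct_move row above flags an LLM-callable skill with no upper bound on duration. Below is a minimal sketch of a guard. It assumes a hypothetical MAX_DIRECT_MOVE_S limit and a hypothetical clamp_duration helper. Neither exists in the PR, and the right limit depends on the deployment.

```python
import math

# Hypothetical upper bound; the PR does not define one.
MAX_DIRECT_MOVE_S = 5.0


def clamp_duration(duration: float, max_s: float = MAX_DIRECT_MOVE_S) -> float:
    """Return a safe motion duration in seconds.

    Non-finite values are rejected (NaN would slip through min()/max()),
    and everything else is clamped into [0, max_s] so an errant tool call
    cannot keep the robot moving indefinitely.
    """
    if not math.isfinite(duration):
        raise ValueError(f"duration must be finite, got {duration!r}")
    return min(max(duration, 0.0), max_s)
```

Inside direct_move, the clamped value would replace the raw argument before any cmd_vel is published.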
Sequence Diagram
sequenceDiagram
participant Browser as iPhone Browser
participant Vercel as Vercel /api
participant Robot as Go2 (WebInput)
participant MCP as McpServer / McpClient
participant Skill as SeatFinderSkill / SeatPlanner
participant Nav as A* Planner
Robot->>Vercel: "POST /api/speak {text, device, Authorization: Bearer token}"
Note over Vercel: Token NOT validated
Vercel-->>Robot: 200 OK
Browser->>Vercel: "GET /api/latest?device=go2-demo (polls 700ms)"
Vercel-->>Browser: "{message: {text, id}}"
Browser->>Browser: speechSynthesis.speak(text)
Robot->>MCP: human_input / dimos mcp call find_empty_seat_now
MCP->>Skill: RPC find_empty_seat_now() / find_empty_seat()
Skill->>Skill: _scan_in_place() — publish cmd_vel yaw
Skill->>Skill: _select_empty_seats(detections)
Skill->>Skill: pointcloud.get_next()
Skill->>Skill: Detection3DPC.from_2d(best, pointcloud)
Skill->>Nav: goal_request.publish(pose)
Nav-->>Robot: "A* path → robot walks to seat"
Reviews (3): Last reviewed commit: "chore: clean up seat finder branch artif..."
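The web_human_input.py row in the summary notes that the cloud speaker POST runs synchronously in the subscriber callback and can block for up to 5 s. One common way to keep the callback non-blocking is to hand the request to a single-worker executor. SpeakerRelay and its post callable are hypothetical stand-ins for whatever HTTP call the module actually makes.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class SpeakerRelay:
    """Hypothetical helper: sends TTS text to the cloud speaker off the callback thread."""

    def __init__(self, post: Callable[[str], None]) -> None:
        self._post = post
        # A single worker keeps messages in order and limits requests to one in flight.
        self._pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="speaker")

    def on_text(self, text: str) -> Future:
        """Subscriber callback: enqueue the POST and return immediately."""
        return self._pool.submit(self._safe_post, text)

    def _safe_post(self, text: str) -> None:
        try:
            self._post(text)
        except Exception as exc:
            # Log instead of leaving the error unobserved in a Future nobody reads.
            print(f"speaker POST failed: {exc}")

    def close(self) -> None:
        # Wait for queued messages to drain before shutting down.
        self._pool.shutdown(wait=True)
```

The executor's queue is unbounded. If the endpoint stays down, keeping only the latest message may be preferable to replaying a backlog.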
    return f"Could not resolve the camera transform, cannot locate the {label}."

best = max(candidates, key=lambda d: d.bbox_2d_volume())
target3d = Detection3DPC.from_2d(
get_next() without a timeout can block forever
self.pointcloud.get_next() is called with no timeout argument. If the pointcloud stream is slow or disconnected the skill thread hangs indefinitely, holding any scan motion lock and making the robot unresponsive. SeatPlanner already uses self.pointcloud.get_next(timeout=2.0) — the same guard should be applied here.
if detections is None or not candidates:
    return f"No {label} found after looking around."

pointcloud = self.pointcloud.get_next()
Add a timeout to avoid blocking indefinitely when the pointcloud stream is unavailable, mirroring the pattern already used in SeatPlanner.
Suggested change:
- pointcloud = self.pointcloud.get_next()
+ try:
+     pointcloud = self.pointcloud.get_next(timeout=2.0)
+ except Exception as e:
+     return f"Pointcloud unavailable: {e}"
        linear=Vector3(0.0, 0.0, 0.0), angular=Vector3(0.0, 0.0, SCAN_YAW_RATE)
    )
    try:
        while True:
            with self._lock:
                detections = self._latest
            candidates = selector(detections.detections) if detections is not None else []
            if candidates or time.time() >= deadline:
                return detections, candidates
            now = time.time()
            if now >= next_log:
                seen = [d.name for d in detections.detections] if detections else []
                logger.info(f"SeatFinder scan: rotating, currently seeing {seen}")
                next_log = now + SCAN_LOG_EVERY
            self.cmd_vel.publish(yaw)
            time.sleep(SCAN_TICK)
    finally:
        self.cmd_vel.publish(Twist.zero())

def _select_empty_seats(self, detections: list[Detection2DBBox]) -> list[Detection2DBBox]:
    seats = [d for d in detections if d.name in SEAT_CLASSES]
    persons = [d for d in detections if d.name == "person"]
    return [s for s in seats if not self._is_occupied(s, persons)]
Scan loop cannot be interrupted by stop()
_scan_in_place loops with time.sleep(SCAN_TICK) and only exits when a candidate is found or the 14-second deadline expires. stop() publishes Twist.zero() once, but the next iteration of the spin loop immediately overwrites it with the yaw command. As a result, the robot keeps rotating for up to 14 s after an emergency stop or user cancellation. A threading.Event should be set in stop() and checked (or used as the sleep target) inside the loop.
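The fix suggested above swaps time.sleep(SCAN_TICK) for threading.Event.wait(SCAN_TICK), which returns early as soon as the event is set. Below is a minimal sketch. InterruptibleScanner is hypothetical: its publish callable stands in for cmd_vel.publish, and the strings "yaw" and "zero" stand in for the yaw and Twist.zero() messages.

```python
import threading
import time
from typing import Callable

SCAN_TICK = 0.1  # 10 Hz, as in the PR


class InterruptibleScanner:
    """Hypothetical stand-in for SeatFinderSkill's in-place scan loop."""

    def __init__(self, publish: Callable[[str], None]) -> None:
        self._publish = publish
        self._stop = threading.Event()

    def scan(self, found: Callable[[], bool], timeout_s: float) -> bool:
        """Rotate until found() is true, the deadline passes, or stop() is called."""
        self._stop.clear()  # reset at the start of each scan
        deadline = time.monotonic() + timeout_s
        try:
            while not self._stop.is_set():
                if found():
                    return True
                if time.monotonic() >= deadline:
                    return False
                self._publish("yaw")
                # Sleeps one tick, but wakes immediately if stop() sets the event.
                self._stop.wait(SCAN_TICK)
            return False
        finally:
            # Always leave the robot stationary, however the loop exits.
            self._publish("zero")

    def stop(self) -> None:
        self._stop.set()
```

Because the final zero command is published in finally, a stop() during the scan leaves the last cmd_vel at zero. The spin loop can no longer overwrite it.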
        return f"No {label} found after looking around."

    pointcloud = self.pointcloud.get_next()
    transform = self.tf.get("camera_optical", pointcloud.frame_id, detections.image.ts, 5.0)
    if not transform:
        return f"Could not resolve the camera transform, cannot locate the {label}."

    best = max(candidates, key=lambda d: d.bbox_2d_volume())
    target3d = Detection3DPC.from_2d(
        best,
        world_pointcloud=pointcloud,
        camera_info=self.config.camera_info,
        world_to_optical_transform=transform,
    )
    if target3d is None:
        return f"Found a {label} but could not compute its 3D position."

    pose = target3d.pose
    self.goal_request.publish(pose)
    return (
        f"Found a {label} at ({pose.position.x:.2f}, {pose.position.y:.2f}). "
        "Navigating there now."
    )

def _scan_in_place(self, selector: Any) -> tuple[ImageDetections2D | None, list[Detection2DBBox]]:
Temporal mismatch between YOLO detections and pointcloud for 3D projection
_scan_in_place returns the detections captured at the moment the scan stopped. self.pointcloud.get_next() then fetches the next pointcloud to arrive — which is from a later timestamp. Detection3DPC.from_2d fuses the 2-D bbox (from the older frame) with the newer world cloud, potentially projecting the seat to the wrong 3D position if the robot or the scene moved between the two frames.
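A common mitigation for this mismatch is to buffer recent pointclouds and pick the one whose timestamp is closest to the detection frame's ts, rejecting matches outside a tolerance. Below is a minimal sketch. It assumes a hypothetical TimestampedBuffer that stores (ts, cloud) pairs in arrival order. Whether dimos streams expose an equivalent is not established here.

```python
from __future__ import annotations

import bisect
from collections import deque
from typing import Any


class TimestampedBuffer:
    """Keeps the last maxlen (timestamp, item) pairs, assumed to arrive in ts order."""

    def __init__(self, maxlen: int = 20) -> None:
        self._items: deque[tuple[float, Any]] = deque(maxlen=maxlen)

    def add(self, ts: float, item: Any) -> None:
        self._items.append((ts, item))

    def closest(self, ts: float, tolerance: float) -> Any | None:
        """Return the item nearest to ts, or None if none is within tolerance seconds."""
        if not self._items:
            return None
        stamps = [t for t, _ in self._items]
        i = bisect.bisect_left(stamps, ts)
        # The nearest stamp is either at the insertion point or just before it.
        neighbours = [j for j in (i - 1, i) if 0 <= j < len(stamps)]
        best = min(neighbours, key=lambda j: abs(stamps[j] - ts))
        if abs(stamps[best] - ts) > tolerance:
            return None
        return self._items[best][1]
```

In the skill, the lookup would key on detections.image.ts, the same timestamp already passed to self.tf.get. This replaces taking whatever get_next() returns next.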
#!/usr/bin/env python3
"""Standalone webcam test for the YOLO empty-seat logic (no robot needed).

Mirrors SeatFinderSkill's detection: YOLO detects chairs/couches and people,
then a seat is "occupied" if a person box overlaps it past a threshold.

Usage:
.venv/bin/python seat_check_webcam.py [--camera 0]

Overlay:
Test script placed at repository root
seat_check_webcam.py lives at the repo root rather than under a tests/, scripts/, or examples/ directory. Consider moving it to a location that signals "this is not shipped code".
feat(seat-guide): add Go2 hardware guidance flow
@rpc
def stop(self) -> None:
    if self._subscription is not None:
        self._subscription.dispose()
        self._subscription = None
    super().stop()
Voice worker keeps publishing after stop()
stop() disposes the frame subscription and the human-input subscription (via register_disposable), but it does not signal or await the _voice_worker daemon thread. If stop() is called while a voice worker is mid-flight inside find_empty_seat_now(), the worker will continue executing — including the self.goal_request.publish(pose) call — on a module that is nominally stopped. Additionally, _voice_busy is not reset on stop, so if the worker is still running when the module is restarted and a new keyword fires, _on_human_input will silently drop the trigger ("already running") until the old worker completes.
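Below is a minimal sketch of the lifecycle this comment asks for. stop() cancels the current worker and joins it with a bounded timeout. "Busy" is derived from the worker's state, so a stale, already-cancelled worker never blocks a new trigger, and the worker re-checks its cancel event before the side effect. VoiceTrigger and publish_goal are hypothetical stand-ins for SeatPlanner, _voice_worker, _voice_busy and goal_request.publish.

```python
from __future__ import annotations

import threading
import time
from typing import Callable


class VoiceTrigger:
    """Hypothetical stand-in for SeatPlanner's voice-trigger handling."""

    def __init__(self, publish_goal: Callable[[str], None], work_s: float = 0.2) -> None:
        self._publish_goal = publish_goal
        self._work_s = work_s  # simulated duration of find_empty_seat_now()
        self._lock = threading.Lock()
        self._stopped = False
        self._worker: threading.Thread | None = None
        self._cancel: threading.Event | None = None

    def _busy(self) -> bool:
        # Busy only while a live, non-cancelled worker exists.
        return (
            self._worker is not None
            and self._worker.is_alive()
            and self._cancel is not None
            and not self._cancel.is_set()
        )

    def on_keyword(self) -> bool:
        """Spawn a worker unless stopped or busy; return whether one started."""
        with self._lock:
            if self._stopped or self._busy():
                return False
            cancel = threading.Event()  # per-worker, so a stale worker stays cancelled
            worker = threading.Thread(target=self._run, args=(cancel,), daemon=True)
            self._cancel, self._worker = cancel, worker
            worker.start()
            return True

    def _run(self, cancel: threading.Event) -> None:
        time.sleep(self._work_s)  # stands in for a non-interruptible search
        # Re-checking narrows (but does not fully close) the publish-after-stop window.
        if not cancel.is_set():
            self._publish_goal("seat pose")

    def stop(self, join_timeout: float = 2.0) -> None:
        with self._lock:
            self._stopped = True
            cancel, worker = self._cancel, self._worker
        if cancel is not None:
            cancel.set()
        if worker is not None:
            worker.join(timeout=join_timeout)

    def start(self) -> None:
        with self._lock:
            self._stopped = False
```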
SeatFinderSkill: continuous YOLO-based empty-seat detector with an in-place scan (slow yaw rate, 10 Hz cmd_vel publishing). It surfaces the nearest unoccupied chair/couch/bench, using person-box overlap as the occupancy test. SeatPlanner: on-demand variant for the manual-map flow. The operator drives the Go2 to build the voxel map, then triggers find_empty_seat_now via MCP. The planner projects the chosen 2D detection to a 3D PoseStamped on goal_request without publishing any cmd_vel itself.
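The person-box-overlap occupancy test can be sketched as follows. The PR description quantifies it as OCCUPANCY_OVERLAP = 0.2, meaning a person box covering at least 20% of the seat's area. That reads as intersection over seat area rather than IoU, and the sketch follows that reading. Box and is_occupied are hypothetical; the PR's _is_occupied works on Detection2DBBox objects, and its exact formula is not shown here.

```python
from dataclasses import dataclass

OCCUPANCY_OVERLAP = 0.2  # fraction of the seat box a person box must cover


@dataclass(frozen=True)
class Box:
    """Axis-aligned pixel box: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def intersection_area(a: Box, b: Box) -> float:
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def is_occupied(seat: Box, persons: list[Box], threshold: float = OCCUPANCY_OVERLAP) -> bool:
    """A seat is occupied if any person box covers >= threshold of the seat's area."""
    seat_area = seat.area()
    if seat_area == 0.0:
        return False
    return any(intersection_area(seat, p) / seat_area >= threshold for p in persons)


def select_empty(seats: list[Box], persons: list[Box]) -> list[Box]:
    return [s for s in seats if not is_occupied(s, persons)]
```

Normalising by the seat's area, not the union, means a large person box partly over a small chair still counts. IoU would shrink in that case.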
unitree_go2_guide: slim "guide dog" blueprint that leads a leashed person to an empty seat. Drops SecurityModule, SpatialMemory, PerceiveLoopSkill and PersonFollowSkill to keep GPU/compute load light; keeps the nav stack, agent, voice I/O and SeatFinder.
unitree_go2_seat_demo (+ _record, _reuse): manual-map → on-demand YOLO seat-find blueprint variants wired to SeatPlanner via McpServer. The record/reuse variants attach a RecordingModule for capturing and replaying the manual mapping pass.
Webcam-only test for the empty-seat logic — no robot needed. Mirrors
SeatFinderSkill's detection (YOLO chairs/couches + people, occupancy
via person-box overlap) and renders an annotated overlay:
green = empty seat, red = occupied seat, blue = person.
Usage:
.venv/bin/python seat_check_webcam.py [--camera 0]
39 annotated YOLO frames (~7MB) from seat_check_webcam.py and the live SeatFinder runs, kept for repro and regression debugging of the empty-seat detection logic.
Problem
For the hackathon, we were missing an "empty-seat guidance" feature on the
G1/Go2 stack. We needed a minimal loop where the robot detects an empty
seat from the camera, then either guides a person there or walks there
autonomously.
Closes DIM-XXX
Solution
Adds YOLO-based empty-seat detection (SeatFinderSkill/SeatPlanner) and three Go2 agentic blueprints that use it.
Split into 4 commits, one module per commit:
- 37687aa47 (dimos/agents/skills/): seat_finder.py (continuous YOLO + in-place scan) and seat_planner.py (on-demand, 3D projection only)
- fa5159165 (dimos/robot/unitree/go2/blueprints/agentic/): unitree_go2_guide.py (guide-dog mode), unitree_go2_seat_demo.py (+ _record/_reuse)
- b483f95db: seat_check_webcam.py, a standalone webcam test (no robot needed)
- 3fa5d633a: seat_finder_debug/ (annotated YOLO debug frames)

Design choices:
- SeatFinderSkill publishes cmd_vel for the in-place scan (SCAN_YAW_RATE = 0.5 rad/s, 10 Hz tick, ~14 s scan window; at the commanded rate, one full revolution takes 2π / 0.5 ≈ 12.6 s).
- SeatPlanner never publishes cmd_vel; it only emits goal_request. It is used in the manual-map → MCP-trigger flow, which is the safer path.
- unitree_go2_guide drops SecurityModule / SpatialMemory / PerceiveLoopSkill / PersonFollowSkill to lower GPU load (they are not needed for guiding).
- OCCUPANCY_OVERLAP = 0.2: a seat is "occupied" if a person bbox covers ≥20% of its area.

How to Test
A. Webcam only (no robot):
.venv/bin/python seat_check_webcam.py [--camera 0]
Overlay: green = empty seat, red = occupied, blue = person.
B. Manual map → MCP trigger (real Go2):
Launch the unitree_go2_seat_demo blueprint and drive the Go2 to build the voxel map, then trigger the seat search:
dimos mcp call find_empty_seat_now
SeatPlanner picks an empty seat → projects it to 3D → publishes goal_request → A* plans the path.

C. Guide-dog mode:
python -m dimos.robot.unitree.go2.blueprints.agentic.unitree_go2_guide
Agent + voice I/O + SeatFinder + nav stack guide a leashed person to an empty seat.