
Feat/find seat #2299

Open
Ueti999 wants to merge 20 commits into dimensionalOS:main from cheer-up-hackathon:feat/find-seat

Ueti999 commented May 28, 2026

Problem

For the hackathon, we were missing an "empty-seat guidance" feature on the
G1/Go2 stack. We needed a minimal loop where the robot detects an empty
seat from the camera, then either guides a person there or walks there
autonomously.

Closes DIM-XXX

Solution

Adds YOLO-based empty-seat detection (SeatFinderSkill / SeatPlanner)
and three Go2 agentic blueprints that use it.

Split into 4 commits, one module per commit:

| # | Commit | What |
|---|---|---|
| 1 | 37687aa47 | dimos/agents/skills/seat_finder.py (continuous YOLO + in-place scan) and seat_planner.py (on-demand, 3D-projection only) |
| 2 | fa5159165 | dimos/robot/unitree/go2/blueprints/agentic/unitree_go2_guide.py (guide-dog mode), unitree_go2_seat_demo.py (+ _record / _reuse) |
| 3 | b483f95db | seat_check_webcam.py: standalone webcam test, no robot needed |
| 4 | 3fa5d633a | 39 debug capture JPGs under seat_finder_debug/ |

Design choices:

  • SeatFinderSkill publishes cmd_vel for the in-place scan (SCAN_YAW_RATE = 0.5 rad/s, 10 Hz tick, ~14 s scan deadline, enough for one full revolution, which takes 2π / 0.5 ≈ 12.6 s).
  • SeatPlanner never publishes cmd_vel — only emits goal_request. Used in the manual-map → MCP-trigger flow, which is the safer path.
  • unitree_go2_guide drops SecurityModule / SpatialMemory / PerceiveLoopSkill / PersonFollowSkill to lower GPU load (not needed for guiding).
  • Occupancy test: OCCUPANCY_OVERLAP = 0.2 — a seat is "occupied" if a person bbox covers ≥20% of its area.
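The occupancy rule in the last bullet can be sketched as follows. This is a minimal illustration, not the PR's actual _is_occupied / _select_empty_seats code: it assumes axis-aligned pixel boxes given as (x1, y1, x2, y2), and the Box type and helper names are hypothetical. The seat classes in the usage line (chair, couch, bench) are taken from the commit message; the real SEAT_CLASSES may differ.

```python
from typing import NamedTuple

OCCUPANCY_OVERLAP = 0.2  # fraction of the seat box a person box must cover


class Box(NamedTuple):
    """Hypothetical axis-aligned detection box in pixel coordinates."""
    name: str
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two boxes (0.0 when they do not overlap)."""
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def is_occupied(seat: Box, persons: list[Box], threshold: float = OCCUPANCY_OVERLAP) -> bool:
    """Occupied if any single person box covers >= threshold of the seat's own area."""
    seat_area = seat.area()
    if seat_area <= 0.0:
        return False
    return any(intersection_area(seat, p) / seat_area >= threshold for p in persons)


def select_empty_seats(detections: list[Box], seat_classes: set[str]) -> list[Box]:
    seats = [d for d in detections if d.name in seat_classes]
    persons = [d for d in detections if d.name == "person"]
    return [s for s in seats if not is_occupied(s, persons)]
```

For example, a 100 x 100 chair box overlapped by a 20 x 100 strip of a person box is covered at exactly 20% and counts as occupied. Normalizing by the seat's area instead of using IoU means a large person box that covers most of a small chair still marks it occupied, even though the union (and thus the IoU) would be large.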

How to Test

A. Webcam only (no robot):
.venv/bin/python seat_check_webcam.py [--camera 0]
Overlay: green = empty seat, red = occupied, blue = person.

B. Manual map → MCP trigger (real Go2):

  1. Launch the unitree_go2_seat_demo blueprint.
  2. Drive the Go2 manually — click-to-goal in Rerun or keyboard teleop — to build the voxel map.
  3. From another terminal:
    dimos mcp call find_empty_seat_now
    SeatPlanner picks an empty seat → projects it to 3D → publishes goal_request → A* plans the path.
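Step 3 depends on turning a 2-D seat box into a 3-D goal. One common way to do that with a world-frame point cloud is to project every point into the image with a pinhole model, keep the points that land inside the box, and take a robust center. The sketch below illustrates that technique only; it is not claimed to be how Detection3DPC.from_2d works internally. It assumes an optical frame with z forward, x right, y down, a world-to-optical rotation R and translation t, and intrinsics fx, fy, cx, cy; seat_position_from_bbox and world_to_optical are hypothetical names.

```python
from statistics import median
from typing import Optional

Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]
BBox = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def world_to_optical(p: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """p_optical = R @ p_world + t, with (R, t) the assumed world->optical transform."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def seat_position_from_bbox(
    points_world: list[Vec3],
    bbox: BBox,
    fx: float, fy: float, cx: float, cy: float,
    R: Mat3, t: Vec3,
) -> Optional[Vec3]:
    """Estimate a seat's world position from the cloud points that project into its box."""
    x1, y1, x2, y2 = bbox
    inside: list[Vec3] = []
    for p in points_world:
        X, Y, Z = world_to_optical(p, R, t)
        if Z <= 0.0:  # point is behind the camera
            continue
        u = fx * X / Z + cx  # pinhole projection to pixel coordinates
        v = fy * Y / Z + cy
        if x1 <= u <= x2 and y1 <= v <= y2:
            inside.append(p)  # keep the original world-frame point
    if not inside:
        return None
    # Per-axis median is robust to floor/wall points that also fall inside the box.
    return (
        median(q[0] for q in inside),
        median(q[1] for q in inside),
        median(q[2] for q in inside),
    )
```

The resulting point would then be wrapped in a pose and published on goal_request, which per the description is SeatPlanner's only output.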

C. Guide-dog mode:
python -m dimos.robot.unitree.go2.blueprints.agentic.unitree_go2_guide
Agent + voice I/O + SeatFinder + nav stack guide a leashed person to an empty seat.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 28, 2026

Greptile Summary

Adds an end-to-end empty-seat guidance feature to the Go2 stack: YOLO-based detection (SeatFinderSkill, SeatPlanner, SeatGuideSkillContainer), three agentic blueprints, a Vercel-hosted TTS relay for phone speaker output, OpenRouter model routing in McpClient, and supporting infrastructure (SHM→Rerun forwarding, optional SpatialMemory, MCP SSE fan-out).

  • SeatFinderSkill / SeatPlanner: continuous YOLO detection on a sharpness-filtered stream, in-place scan loop with cmd_vel publishing, and 3D projection via pointcloud; prior review thread covers the blocking get_next(), non-interruptible scan loop, timestamp mismatch, and _voice_worker lifecycle issues.
  • Vercel speaker app (apps/seat-guide-speaker-vercel): serverless relay that stores the latest TTS message in in-process memory and serves it to a polling browser page; the Authorization header sent by the robot is never validated by the handler.
  • McpClient OpenRouter routing: new helper functions correctly select the OpenRouter base URL and headers (X-OpenRouter-Title is the current canonical name); a global OPENROUTER_MODEL env-var override can silently replace the explicit model name with no log warning.
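A minimal way to address the silent override is to keep the current precedence (OPENROUTER_MODEL wins) but log when it replaces an explicitly configured model. The sketch below is hypothetical: resolve_model is not the McpClient helper, and the model strings in the usage are placeholders.

```python
import logging
import os
from collections.abc import Mapping
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_model(
    explicit_model: Optional[str], env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Keep the env-var override, but make it visible when it changes the configured model."""
    env = os.environ if env is None else env
    override = env.get("OPENROUTER_MODEL")
    if override and explicit_model and override != explicit_model:
        logger.warning(
            "OPENROUTER_MODEL=%r overrides the configured model %r", override, explicit_model
        )
    return override or explicit_model
```

Inverting the precedence (explicit config wins, env var is only a fallback) would be the other reasonable choice; either way, the conflict should not pass silently.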

Confidence Score: 4/5

Safe to merge for hackathon/demo use, with two caveats: the Vercel speaker endpoint is publicly writable (the token is sent but never checked), and direct_move exposes unbounded velocity commands to the LLM. Both are low-impact in a demo context but worth addressing before production use.

The core detection and navigation logic is well-structured. The remaining open issues from the prior review thread (scan-loop interruptibility, blocking get_next(), voice-worker lifecycle) are not yet addressed. Two new findings in this pass — the unauthenticated Vercel TTS endpoint and the unclamped direct_move duration — add to the list but are also demo-scope concerns rather than blocking correctness bugs.

dimos/agents/skills/seat_finder.py and dimos/agents/skills/seat_planner.py still carry the threading issues noted in prior rounds; apps/seat-guide-speaker-vercel/api/[...speaker].js needs a token validation pass before any non-demo deployment; dimos/robot/unitree/unitree_skill_container.py (direct_move) would benefit from a duration cap.
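A duration cap for direct_move could look like the sketch below, applied before any velocity reaches cmd_vel. The direct_move signature is not shown in this PR page, so the parameter names (vx, vy, wz, duration) and all limit values are hypothetical and would need tuning for the Go2.

```python
import math

MAX_DIRECT_MOVE_S = 3.0  # hypothetical upper bound on a single LLM-issued move
MAX_LINEAR_MPS = 0.5     # hypothetical linear speed limit
MAX_ANGULAR_RPS = 1.0    # hypothetical yaw-rate limit


def clamp(value: float, limit: float) -> float:
    """Clamp value into [-limit, limit]."""
    return max(-limit, min(limit, value))


def sanitize_direct_move(
    vx: float, vy: float, wz: float, duration: float
) -> tuple[float, float, float, float]:
    """Bound an LLM-supplied velocity command before it is executed."""
    if not all(math.isfinite(x) for x in (vx, vy, wz, duration)):
        raise ValueError("direct_move arguments must be finite numbers")
    duration = min(max(duration, 0.0), MAX_DIRECT_MOVE_S)  # no negative or runaway durations
    return (
        clamp(vx, MAX_LINEAR_MPS),
        clamp(vy, MAX_LINEAR_MPS),
        clamp(wz, MAX_ANGULAR_RPS),
        duration,
    )
```

Rejecting non-finite values matters because an LLM can emit strings like "inf" that parse to a float.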

Security Review

  • Unauthenticated /api/speak endpoint (apps/seat-guide-speaker-vercel/api/[...speaker].js): The robot sends Authorization: Bearer <token> but the serverless handler never reads or validates that header. Any caller with the Vercel URL can inject arbitrary TTS text without a token. The SEAT_GUIDE_SPEAKER_TOKEN env var creates a misleading impression of access control that does not exist server-side.
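The missing check is small. The real handler is JavaScript and would return 401 on failure; the Python below only illustrates the logic. It assumes the server has access to the same secret the robot sends (for example via an env var named like SEAT_GUIDE_SPEAKER_TOKEN; whether that variable is set on the Vercel side is an assumption), and is_authorized is a hypothetical name.

```python
import hmac
from collections.abc import Mapping
from typing import Optional


def is_authorized(headers: Mapping[str, str], expected_token: Optional[str]) -> bool:
    """Accept only `Authorization: Bearer <expected_token>`; fail closed otherwise."""
    if not expected_token:
        return False  # no server-side token configured: reject instead of allowing everyone
    # HTTP header names are case-insensitive.
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return False
    # Constant-time comparison avoids leaking the token through response timing.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

Failing closed when no token is configured avoids the current situation, where the presence of SEAT_GUIDE_SPEAKER_TOKEN suggests access control that the server does not enforce.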

Important Files Changed

| Filename | Overview |
|---|---|
| dimos/agents/skills/seat_finder.py | New YOLO-based seat finder skill with in-place scan loop; pointcloud.get_next() has no timeout (blocks forever if the stream is unavailable) and the scan loop cannot be interrupted by stop(). Both were flagged in the prior review thread. |
| dimos/agents/skills/seat_planner.py | On-demand seat planner with voice-trigger worker; the _voice_worker thread is not joined on stop() and the _voice_busy lock is not reset, allowing stale-worker collisions on restart. Flagged in the prior thread. |
| apps/seat-guide-speaker-vercel/api/[...speaker].js | Vercel serverless TTS relay; the /api/speak POST endpoint never validates the Authorization header despite SEAT_GUIDE_SPEAKER_TOKEN being configured on the robot side. |
| dimos/agents/mcp/mcp_client.py | Adds OpenRouter routing logic with correct header names (X-OpenRouter-Title) and a proper fallback chain; routing logic is sound, though the OPENROUTER_MODEL env var silently overrides the explicit model config when both are set. |
| dimos/robot/unitree/unitree_skill_container.py | Adds direct_move and updates relative_move with x/y compat aliases; direct_move is exposed as an LLM-callable @skill with no duration upper bound, allowing the robot to run indefinitely under adversarial or errant prompts. |
| dimos/agents/skills/seat_guide.py | Large new skill container with planner, navigation helpers, and VLM-based observation provider; logic appears clean with proper locking, though it duplicates the _is_occupied / annotation helpers from seat_finder.py. |
| dimos/agents/web_human_input.py | Rewired to import SeatGuideRequestSpec and route voice/text directly to SeatGuide; the cloud speaker POST runs synchronously in the subscriber callback, which blocks for up to 5 s if the endpoint is down, but this is low-frequency. |
| dimos/agents/mcp/mcp_server.py | Adds SSE fan-out endpoint and agent_send skill; clean implementation with proper queue cleanup on stop and keepalive pings. |
| dimos/core/coordination/module_coordinator.py | Adds SHM transport forwarding to RerunBridgeModule after stream wiring; logic is straightforward and guarded by an early return when the bridge isn't deployed. |
| dimos/agents/skills/navigation.py | Makes _spatial_memory optional with early-return guards; changes are minimal and correct. |
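For the web_human_input.py finding, one way to keep a slow or dead speaker endpoint from stalling the subscriber callback is to hand the text to a bounded queue drained by a worker thread. This is a hypothetical sketch: SpeakerForwarder is not a DimOS class, and post stands in for whatever function performs the HTTP POST with its 5 s timeout.

```python
import logging
import queue
import threading
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class SpeakerForwarder:
    """Hypothetical helper: runs the blocking speaker POST on a worker thread."""

    def __init__(self, post: Callable[[str], None], maxsize: int = 8) -> None:
        self._post = post  # e.g. a function wrapping the HTTP POST and its timeout
        self._queue: "queue.Queue[Optional[str]]" = queue.Queue(maxsize=maxsize)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def on_text(self, text: str) -> None:
        """Subscriber callback: returns immediately; drops the message if the backlog is full."""
        try:
            self._queue.put_nowait(text)
        except queue.Full:
            logger.warning("speaker backlog full, dropping message")

    def _run(self) -> None:
        while not self._stop.is_set():
            text = self._queue.get()
            if text is None:  # wake-up sentinel from close()
                break
            try:
                self._post(text)
            except Exception:
                # A dead endpoint must not kill the worker thread.
                logger.warning("speaker POST failed", exc_info=True)

    def close(self, timeout: float = 5.0) -> None:
        self._stop.set()
        try:
            self._queue.put_nowait(None)  # wake the worker if it is blocked in get()
        except queue.Full:
            pass  # worker is busy and will see _stop after its current POST
        self._thread.join(timeout)
```

Dropping messages when the backlog is full is a deliberate choice here: stale speech is less useful than fresh speech, and an unbounded queue would grow without limit while the endpoint is down.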

Sequence Diagram

```mermaid
sequenceDiagram
    participant Browser as iPhone Browser
    participant Vercel as Vercel /api
    participant Robot as Go2 (WebInput)
    participant MCP as McpServer / McpClient
    participant Skill as SeatFinderSkill / SeatPlanner
    participant Nav as A* Planner

    Robot->>Vercel: "POST /api/speak {text, device, Authorization: Bearer token}"
    Note over Vercel: Token NOT validated
    Vercel-->>Robot: 200 OK
    Browser->>Vercel: "GET /api/latest?device=go2-demo (polls 700ms)"
    Vercel-->>Browser: "{message: {text, id}}"
    Browser->>Browser: speechSynthesis.speak(text)

    Robot->>MCP: human_input / dimos mcp call find_empty_seat_now
    MCP->>Skill: RPC find_empty_seat_now() / find_empty_seat()
    Skill->>Skill: _scan_in_place() — publish cmd_vel yaw
    Skill->>Skill: _select_empty_seats(detections)
    Skill->>Skill: pointcloud.get_next()
    Skill->>Skill: Detection3DPC.from_2d(best, pointcloud)
    Skill->>Nav: goal_request.publish(pose)
    Nav-->>Robot: "A* path → robot walks to seat"
```

Reviews (3): Last reviewed commit: "chore: clean up seat finder branch artif..."

```python
    return f"Could not resolve the camera transform, cannot locate the {label}."

best = max(candidates, key=lambda d: d.bbox_2d_volume())
target3d = Detection3DPC.from_2d(
```

P1 get_next() without a timeout can block forever

self.pointcloud.get_next() is called with no timeout argument. If the pointcloud stream is slow or disconnected the skill thread hangs indefinitely, holding any scan motion lock and making the robot unresponsive. SeatPlanner already uses self.pointcloud.get_next(timeout=2.0) — the same guard should be applied here.

```python
if detections is None or not candidates:
    return f"No {label} found after looking around."

pointcloud = self.pointcloud.get_next()
```

P1 Add a timeout to avoid blocking indefinitely when the pointcloud stream is unavailable, mirroring the pattern already used in SeatPlanner.

Suggested change
```diff
-pointcloud = self.pointcloud.get_next()
+try:
+    pointcloud = self.pointcloud.get_next(timeout=2.0)
+except Exception as e:
+    return f"Pointcloud unavailable: {e}"
```

Comment on lines +182 to +205
```python
        linear=Vector3(0.0, 0.0, 0.0), angular=Vector3(0.0, 0.0, SCAN_YAW_RATE)
    )
    try:
        while True:
            with self._lock:
                detections = self._latest
            candidates = selector(detections.detections) if detections is not None else []
            if candidates or time.time() >= deadline:
                return detections, candidates
            now = time.time()
            if now >= next_log:
                seen = [d.name for d in detections.detections] if detections else []
                logger.info(f"SeatFinder scan: rotating, currently seeing {seen}")
                next_log = now + SCAN_LOG_EVERY
            self.cmd_vel.publish(yaw)
            time.sleep(SCAN_TICK)
    finally:
        self.cmd_vel.publish(Twist.zero())

def _select_empty_seats(self, detections: list[Detection2DBBox]) -> list[Detection2DBBox]:
    seats = [d for d in detections if d.name in SEAT_CLASSES]
    persons = [d for d in detections if d.name == "person"]
    return [s for s in seats if not self._is_occupied(s, persons)]
```

P1 Scan loop cannot be interrupted by stop()

_scan_in_place loops with time.sleep(SCAN_TICK) and only exits when a candidate is found or the 14-second deadline expires. stop() publishes Twist.zero() once, but the next iteration of the spin loop immediately overwrites it with the yaw command — so the robot keeps rotating for up to 14 s after an emergency stop or user cancellation. A threading Event should be set in stop() and checked (or used as a sleep target) inside the loop.
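A minimal sketch of the suggested fix, assuming a threading.Event that stop() sets before publishing Twist.zero() (and that start() clears). The scan is abstracted as callables so the example is self-contained: poll stands in for reading self._latest and running the selector, publish_yaw and publish_zero for the cmd_vel publishes. scan_until and these parameter names are hypothetical.

```python
import threading
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

SCAN_TICK = 0.1      # s, the 10 Hz tick from the PR description
SCAN_TIMEOUT = 14.0  # s, the scan deadline mentioned above


def scan_until(
    poll: Callable[[], Optional[T]],
    publish_yaw: Callable[[], None],
    publish_zero: Callable[[], None],
    stop_event: threading.Event,
    timeout: float = SCAN_TIMEOUT,
    tick: float = SCAN_TICK,
) -> Optional[T]:
    """Rotate until poll() returns a result, the deadline passes, or stop_event is set."""
    deadline = time.monotonic() + timeout
    try:
        while not stop_event.is_set():
            found = poll()
            if found is not None or time.monotonic() >= deadline:
                return found
            publish_yaw()
            # Event.wait() replaces time.sleep(): it returns True as soon as stop()
            # sets the event, so no further yaw command is published after a stop.
            if stop_event.wait(tick):
                break
        return None
    finally:
        publish_zero()  # always leave the robot with a zero-velocity command
```

Using monotonic time for the deadline also avoids surprises if the wall clock is adjusted during a scan.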

Comment on lines +152 to +176
```python
        return f"No {label} found after looking around."

    pointcloud = self.pointcloud.get_next()
    transform = self.tf.get("camera_optical", pointcloud.frame_id, detections.image.ts, 5.0)
    if not transform:
        return f"Could not resolve the camera transform, cannot locate the {label}."

    best = max(candidates, key=lambda d: d.bbox_2d_volume())
    target3d = Detection3DPC.from_2d(
        best,
        world_pointcloud=pointcloud,
        camera_info=self.config.camera_info,
        world_to_optical_transform=transform,
    )
    if target3d is None:
        return f"Found a {label} but could not compute its 3D position."

    pose = target3d.pose
    self.goal_request.publish(pose)
    return (
        f"Found a {label} at ({pose.position.x:.2f}, {pose.position.y:.2f}). "
        "Navigating there now."
    )

def _scan_in_place(self, selector: Any) -> tuple[ImageDetections2D | None, list[Detection2DBBox]]:
```

P2 Temporal mismatch between YOLO detections and pointcloud for 3D projection

_scan_in_place returns the detections captured at the moment the scan stopped. self.pointcloud.get_next() then fetches the next pointcloud to arrive — which is from a later timestamp. Detection3DPC.from_2d fuses the 2-D bbox (from the older frame) with the newer world cloud, potentially projecting the seat to the wrong 3D position if the robot or the scene moved between the two frames.
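One way to remove the mismatch is to buffer recent pointclouds and pick the one whose timestamp is closest to the detection frame (detections.image.ts in the code above), rejecting the match if the gap exceeds a tolerance. The sketch assumes messages arrive in timestamp order; TimestampedBuffer and the tolerance value are hypothetical.

```python
from bisect import bisect_left
from collections import deque
from typing import Any, Optional


class TimestampedBuffer:
    """Hypothetical ring buffer of (timestamp, message) pairs, appended in time order."""

    def __init__(self, maxlen: int = 50) -> None:
        self._items: deque[tuple[float, Any]] = deque(maxlen=maxlen)

    def add(self, ts: float, msg: Any) -> None:
        self._items.append((ts, msg))

    def closest(self, ts: float, tolerance: float) -> Optional[Any]:
        """Message nearest to ts, or None if the buffer is empty or the gap exceeds tolerance."""
        if not self._items:
            return None
        stamps = [t for t, _ in self._items]
        i = bisect_left(stamps, ts)
        # The nearest stamp is either just before or just after the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(stamps)]
        best = min(candidates, key=lambda j: abs(stamps[j] - ts))
        if abs(stamps[best] - ts) > tolerance:
            return None
        return self._items[best][1]
```

The pointcloud subscription would call add() for every message, and the skill would call closest(detections.image.ts, tolerance) instead of get_next(), returning an error string when no cloud is close enough in time.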

Comment thread demo_seat_check_webcam.py
Comment on lines +1 to +10
```python
#!/usr/bin/env python3
"""Standalone webcam test for the YOLO empty-seat logic (no robot needed).

Mirrors SeatFinderSkill's detection: YOLO detects chairs/couches and people,
then a seat is "occupied" if a person box overlaps it past a threshold.

Usage:
    .venv/bin/python seat_check_webcam.py [--camera 0]

Overlay:
```

P2 Test script placed at repository root

seat_check_webcam.py lives at the repo root rather than under a tests/, scripts/, or examples/ directory. Consider moving it to a location that signals "this is not shipped code".


Comment on lines +106 to +111
```python
@rpc
def stop(self) -> None:
    if self._subscription is not None:
        self._subscription.dispose()
        self._subscription = None
    super().stop()
```

P1 Voice worker keeps publishing after stop()

stop() disposes the frame subscription and the human-input subscription (via register_disposable), but it does not signal or await the _voice_worker daemon thread. If stop() is called while a voice worker is mid-flight inside find_empty_seat_now(), the worker will continue executing — including the self.goal_request.publish(pose) call — on a module that is nominally stopped. Additionally, _voice_busy is not reset on stop, so if the worker is still running when the module is restarted and a new keyword fires, _on_human_input will silently drop the trigger ("already running") until the old worker completes.
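A sketch of a worker lifecycle that addresses both points: stop() cancels and joins the in-flight worker, and "busy" is derived from the worker's liveness instead of a separate flag that can go stale. Each run gets a fresh cancel event, so a worker that outlives the join timeout stays cancelled even after a restart. VoiceTrigger and the action callback are hypothetical; in SeatPlanner the action would be the body of find_empty_seat_now(), checking the event before calling self.goal_request.publish(pose).

```python
import threading
from typing import Callable, Optional


class VoiceTrigger:
    """Hypothetical keyword-triggered worker that stop() can cancel and await."""

    def __init__(self, action: Callable[[threading.Event], None]) -> None:
        self._action = action  # must check the event before any side effect
        self._lock = threading.Lock()
        self._worker: Optional[threading.Thread] = None
        self._cancel = threading.Event()

    def on_keyword(self) -> bool:
        """Start a worker unless one is still running; returns False if dropped."""
        with self._lock:
            if self._worker is not None and self._worker.is_alive():
                return False  # "already running"
            self._cancel = threading.Event()  # fresh event per run
            self._worker = threading.Thread(
                target=self._action, args=(self._cancel,), daemon=True
            )
            self._worker.start()
            return True

    def stop(self, timeout: float = 5.0) -> None:
        with self._lock:
            worker, cancel = self._worker, self._cancel
            self._worker = None  # a restart is never blocked by a stale worker
        cancel.set()
        if worker is not None:
            worker.join(timeout)
```

A worker that is mid-flight when stop() is called sees its event set and skips the goal publish, and a new keyword after a restart is accepted immediately.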

huaruic and others added 6 commits May 29, 2026 12:09
SeatFinderSkill: continuous YOLO-based empty-seat detector with in-place
scan (slow yaw rate + 10Hz cmd_vel publishing) that surfaces the nearest
unoccupied chair/couch/bench using person-box overlap as occupancy test.

SeatPlanner: on-demand variant for the manual-map flow — operator drives
the Go2 to build the voxel map, then triggers find_empty_seat_now via
MCP; the planner projects the chosen 2D detection to a 3D PoseStamped on
goal_request without publishing any cmd_vel itself.

unitree_go2_guide: slim "guide dog" blueprint that leads a leashed person
to an empty seat. Drops SecurityModule, SpatialMemory, PerceiveLoopSkill
and PersonFollowSkill to keep GPU/compute light; keeps nav stack, agent,
voice I/O and SeatFinder.

unitree_go2_seat_demo (+ _record, _reuse): manual-map → on-demand YOLO
seat-find blueprint variants wired to SeatPlanner via McpServer. The
record/reuse variants attach a RecordingModule for capturing and replaying
the manual mapping pass.

Webcam-only test for the empty-seat logic — no robot needed. Mirrors
SeatFinderSkill's detection (YOLO chairs/couches + people, occupancy
via person-box overlap) and renders an annotated overlay:
green = empty seat, red = occupied seat, blue = person.

Usage:
    .venv/bin/python seat_check_webcam.py [--camera 0]

39 annotated YOLO frames (~7MB) from seat_check_webcam.py and the live
SeatFinder runs, kept for repro and regression debugging of the
empty-seat detection logic.

4 participants