Drop-in Guide: agentic 3-phase guide-robot blueprint for Unitree Go2 (muShanghai 2026) #2289
arome3 wants to merge 4 commits into
…eprint for Unitree Go2
Submission for muShanghai 2026 (Dimensional Robot Hackathon) — Agents track,
with secondary fit in Autonomy & Navigation.
The blueprint adds the **interaction-intelligence layer** on top of dimOS's
existing motion + task layers, turning the Go2 into a zero-prep guide robot
for unfamiliar buildings.
Phases:
1. Generative priming — operator drives the Go2; at each spot it observes
the scene via `describe_scene` (OpenAI Vision), proposes a tag name via
`speak`, and the operator confirms with a single keypress. Every
confirmation grounds a `tag_location` call.
2. Guided navigation — visitor says "take me to X"; Claude logs the
grounding decision (`log_nav_decision`), waves Hello, sets the goal via
`navigate_with_text`, and auto-waves on arrival when the planner's
/goal_reached fires (new `ArrivalAnnouncerSkill`).
3. Reactive Q&A — `list_tagged_places`, `narrate_tour`, `what_did_you_skip`.
New experimental skills:
- arrival_announcer_skill — subscribes to /goal_reached and injects a
synthetic [SYSTEM ARRIVAL EVENT] into McpClient's human_input so the
agent auto-greets on arrival
- decision_audit_skill — JSONL nav-decision trace for grounded review
- lead_with_follow_skill — pauses + speaks "I'll wait for you" when the
visitor lags behind
- reactive_qa_skills — list/narrate/skip-recall helpers
- scene_caption_skill — synchronous OpenAI Vision captioning that Claude
can use directly during priming (vs async `observe`)
- teleop_velocity_skill — Out[Twist] WASD teleop that bypasses the nav
planner; useful when costmap refuses pure-rotation or pure-backward
goals during priming
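The JSONL nav-decision trace listed for `decision_audit_skill` above can be pictured with a minimal sketch. The field names `query`, `matched_tier`, `confidence`, and `target` and the file name `nav_trace.jsonl` come from the PR description; the `ts` field, the function names, and the example tier values are assumptions for illustration, not the skill's actual code.

```python
import json
import time
from pathlib import Path
from typing import Any


def append_nav_decision(
    trace_path: Path,
    query: str,
    matched_tier: str,
    confidence: float,
    target: str,
) -> dict[str, Any]:
    """Append one grounding decision as a single JSON line and return it."""
    record = {
        "ts": time.time(),  # assumed extra field: wall-clock timestamp
        "query": query,
        "matched_tier": matched_tier,
        "confidence": confidence,
        "target": target,
    }
    trace_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier decisions; one record per line is what
    # makes the file JSONL rather than a single JSON document.
    with trace_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_trace(trace_path: Path) -> list[dict[str, Any]]:
    """Load every decision back for review, skipping blank lines."""
    with trace_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each line is an independent JSON object, a reviewer can tail or grep the file while the robot is still running.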
The blueprint composes from `unitree_go2` (not `unitree_go2_spatial`) to
avoid CUDA-required modules (EdgeTAM/SecurityModule) on Apple Silicon.
NOTE: depends on PR dimensionalOS#2245 (camera fix) for live Go2 hardware demo.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greptile Summary
This follow-up commit addresses the P1 and P2 findings from the initial Drop-in Guide review: the teleop blocking-sleep was moved off the agent's tool-call thread, class-level mutable defaults were removed from
Confidence Score: 3/5
The fix commit resolves most issues from the initial review, but ArrivalAnnouncerSkill still has an unregistered subscription that can inject duplicate arrival prompts on module restart, the exact pattern that was patched in SceneCaptionSkill in the same commit.
Most previous findings are now addressed: the teleop blocking-sleep is gone, class-level mutable defaults are removed, and scene_caption's subscription is properly disposed. However, ArrivalAnnouncerSkill.start() still calls self.goal_reached.subscribe() without register_disposable, so on stop/restart the old callback is not unsubscribed. After a restart, every goal_reached pulse fires the callback twice, injecting two [SYSTEM ARRIVAL EVENT] prompts and causing the agent to speak two arrival lines back-to-back. The fix is a one-liner mirroring what was already done for SceneCaptionSkill in this same commit.
dimos/experimental/arrival_announcer_skill.py: subscription made without register_disposable, leaking on stop/restart
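The leak described in this summary can be reproduced with a small stand-in. The `Topic` and `Announcer` classes below are hypothetical and are not the dimOS stream or module API; the sketch only assumes that `subscribe()` returns something that undoes the subscription, and that `stop()` is the place to call it.

```python
from typing import Callable


class Topic:
    """Stand-in for a pub/sub stream (hypothetical, not the dimOS API)."""

    def __init__(self) -> None:
        self._callbacks: list[Callable[[bool], None]] = []

    def subscribe(self, cb: Callable[[bool], None]) -> Callable[[], None]:
        self._callbacks.append(cb)

        def unsubscribe() -> None:
            if cb in self._callbacks:
                self._callbacks.remove(cb)

        return unsubscribe

    def publish(self, value: bool) -> None:
        for cb in list(self._callbacks):
            cb(value)


class Announcer:
    """Keeps every unsubscribe handle so stop() can undo start()."""

    def __init__(self, goal_reached: Topic) -> None:
        self.goal_reached = goal_reached
        self.prompts: list[str] = []
        self._disposables: list[Callable[[], None]] = []

    def start(self) -> None:
        unsubscribe = self.goal_reached.subscribe(self._on_goal_reached)
        self._disposables.append(unsubscribe)  # the step the review asks for

    def stop(self) -> None:
        while self._disposables:
            self._disposables.pop()()

    def _on_goal_reached(self, reached: bool) -> None:
        if reached:
            self.prompts.append("[SYSTEM ARRIVAL EVENT]")
```

With the handle stored, a start/stop/start cycle leaves exactly one live callback, so a single arrival pulse yields a single prompt. Dropping the `append` line in `start()` is enough to get two prompts after a restart.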
Sequence Diagram
sequenceDiagram
participant Visitor
participant McpClient as Claude (McpClient)
participant ArrivalAnnouncer as ArrivalAnnouncerSkill
participant NavPlanner as Nav Planner
participant Speak as SpeakSkill
Visitor->>McpClient: "take me to the copier"
McpClient->>McpClient: log_nav_decision(...)
McpClient->>Speak: speak("Going to the copier — follow me.")
McpClient->>McpClient: execute_sport_command("Hello")
McpClient->>NavPlanner: navigate_with_text("copier")
NavPlanner-->>McpClient: goal accepted
Note over NavPlanner,ArrivalAnnouncer: Robot navigating...
alt Visitor says "wait"
Visitor->>McpClient: "wait, I'm behind"
McpClient->>NavPlanner: stop_navigation()
McpClient->>Speak: speak("I'll wait for you here.")
Visitor->>McpClient: "okay, I'm ready"
McpClient->>NavPlanner: navigate_with_text("copier")
end
NavPlanner->>ArrivalAnnouncer: "goal_reached=True"
ArrivalAnnouncer->>McpClient: human_input("[SYSTEM ARRIVAL EVENT]")
McpClient->>Speak: speak("Here we are at the copier. Anything else?")
McpClient->>McpClient: execute_sport_command("Hello")
Reviews (4): Last reviewed commit: "fix(drop-in-guide): address Greptile rev..."
            system_prompt=DROP_IN_GUIDE_SYSTEM_PROMPT,
        ),
        _common_agentic,
    ).global_config(n_workers=14, obstacle_avoidance=False)
obstacle_avoidance=False applies to all navigation, including Phase 2
global_config(obstacle_avoidance=False) is a blueprint-wide setting that turns off collision avoidance for every nav call, including the autonomous guided navigation in Phase 2, where the robot moves among real visitors. The PR notes this was needed to stop the robot from trotting in place during teleop priming (Phase 1), but as written the setting disables safety for the entire session. A robot guiding visitors through a building without obstacle avoidance can drive into people, furniture, or walls if the planner path passes through an obstacle.
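One way to scope the setting per phase is sketched below. The `Phase` enum and `obstacle_avoidance_enabled` helper are hypothetical; nothing here is an existing dimOS option, and how the flag would be applied at runtime is left open.

```python
from enum import Enum


class Phase(Enum):
    """The three runtime phases named in the PR (enum itself is assumed)."""

    PRIMING = "priming"
    GUIDED_NAV = "guided_nav"
    REACTIVE_QA = "reactive_qa"


def obstacle_avoidance_enabled(phase: Phase) -> bool:
    """Disable avoidance only while the operator is teleoperating.

    Priming is operator-driven, so a human is watching the robot; every
    autonomous phase keeps collision avoidance on.
    """
    return phase is not Phase.PRIMING


def nav_settings(phase: Phase) -> dict[str, bool]:
    """Settings a phase transition could hand to the nav stack."""
    return {"obstacle_avoidance": obstacle_avoidance_enabled(phase)}
```

The design choice is to make the unsafe value the exception that must be named, so any phase added later defaults to avoidance on.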
        period = 1.0 / PUB_HZ

        deadline = time.time() + dur
        pubs = 0
        while time.time() < deadline:
            self.tele_cmd_vel.publish(msg)
            pubs += 1
            time.sleep(period)

        # Stop: publish zero velocity a few times to be sure.
        zero = Twist(Vector3(), Vector3())
        for _ in range(3):
            self.tele_cmd_vel.publish(zero)
            time.sleep(period)
Blocking sleep in skill body freezes the agent's tool-call thread
teleop_velocity holds the calling thread in a time.sleep() loop for up to 5 seconds. The agent (McpClient) invokes skills synchronously, so while this loop is running the agent cannot process any incoming message — including the visitor saying "wait, I'm behind", which is the signature interaction of Phase 2b. Any voice-pause command issued while a teleop move is in progress will be silently delayed until the full duration_s expires, making the pause-on-command guarantee unreliable.
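A non-blocking variant could look roughly like the sketch below: the skill call returns at once and a worker thread publishes until the duration elapses or a cancel event is set. `BurstPublisher`, the `publish` callable, and the `PUB_HZ` value are assumptions for illustration; the real skill publishes `Twist` messages on `tele_cmd_vel`, which a plain float stands in for here.

```python
import threading
import time
from typing import Callable, Optional

PUB_HZ = 10.0  # assumed publish rate; the real constant's value is not shown


class BurstPublisher:
    """Publishes a velocity for a bounded time without blocking the caller."""

    def __init__(self, publish: Callable[[float], None]) -> None:
        self._publish = publish
        self._lock = threading.Lock()  # created per instance, not on the class
        self._thread: Optional[threading.Thread] = None
        self._cancel: Optional[threading.Event] = None

    def start_burst(self, speed: float, duration_s: float) -> str:
        self.stop()  # cancel any burst that is still in flight
        cancel = threading.Event()
        thread = threading.Thread(
            target=self._run, args=(speed, duration_s, cancel), daemon=True
        )
        with self._lock:
            self._cancel, self._thread = cancel, thread
        thread.start()
        return "burst started"  # returns immediately

    def stop(self) -> None:
        with self._lock:
            cancel, thread = self._cancel, self._thread
            self._cancel = self._thread = None
        if cancel is not None:
            cancel.set()
        if thread is not None:
            thread.join()
        self._publish(0.0)  # always finish with a zero command

    def _run(self, speed: float, duration_s: float, cancel: threading.Event) -> None:
        deadline = time.monotonic() + duration_s
        period = 1.0 / PUB_HZ
        while time.monotonic() < deadline and not cancel.is_set():
            self._publish(speed)
            cancel.wait(period)  # sleeps, but wakes early on cancel
        self._publish(0.0)
```

Using `Event.wait(period)` instead of `time.sleep(period)` is what lets a stop request interrupt the burst within one period rather than after the full duration.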
    _tagged: list[dict[str, Any]] = []
    _skipped: list[dict[str, Any]] = []
Mutable class-level list defaults shared across instances
_tagged and _skipped are declared as class attributes holding mutable lists. Python class-level mutables are shared by every instance of the class. start() does reinitialize them per-instance, but any code path that appends to these lists before start() is called — or any second ReactiveQASkills instance that misses its start() — will silently share state with other instances. The same pattern exists in DecisionAuditSkill._trace. The idiomatic fix is to initialize them in start() only (remove the class-level assignment).
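The sharing behaviour is easy to demonstrate in isolation. The two classes below are illustrative stand-ins, not the actual `ReactiveQASkills` code; they only mirror the attribute name `_tagged` and the `start()` lifecycle from the comment.

```python
from typing import Any


class SharedTags:
    # One list object lives on the class, so every instance sees it.
    _tagged: list[dict[str, Any]] = []

    def tag(self, name: str) -> None:
        self._tagged.append({"name": name})


class PerInstanceTags:
    # No class-level default: the list exists only after start().
    def start(self) -> None:
        self._tagged: list[dict[str, Any]] = []

    def tag(self, name: str) -> None:
        self._tagged.append({"name": name})
```

A tag added through one `SharedTags` instance shows up on every other instance. With `PerInstanceTags`, each instance gets its own list in `start()`, and calling `tag()` before `start()` raises `AttributeError` instead of silently writing into shared state.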
    while not cancel.is_set():
        state = self._navigation.get_state()
        if state == NavigationState.IDLE:
            if self._navigation.is_goal_reached():
                logger.info(f"lead_to: arrived at '{destination}'")
                return
            if not paused:
                # Nav exited idle without success: either failed or no
                # goal was ever set by the agent.
                logger.info(
                    f"lead_to: nav exited IDLE before reaching goal "
                    f"'{destination}'. The agent likely needs to call "
                    f"navigate_with_text/set_goal first."
                )
                return
_lead_loop can exit immediately if nav hasn't transitioned from IDLE yet
lead_to is called right after navigate_with_text, but the loop's first get_state() call may still see NavigationState.IDLE if the planner hasn't had time to transition to NAVIGATING. In that case, since paused starts as False, the branch at line 116–124 fires and the function returns immediately with a "nav exited IDLE before reaching goal" log, abandoning follower monitoring for the whole trip. Adding a brief startup grace period or checking whether a goal is pending would prevent this false early exit.
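A startup grace period of the kind suggested here might be sketched as follows. `NavState`, `wait_for_nav_start`, and the 2-second default are assumptions for illustration; the real loop would query the navigation module's own state enum.

```python
import time
from enum import Enum
from typing import Callable


class NavState(Enum):
    """Minimal stand-in for the planner's state enum (assumed)."""

    IDLE = 0
    NAVIGATING = 1


STARTUP_GRACE_S = 2.0  # assumed value; tune to the planner's real latency


def wait_for_nav_start(
    get_state: Callable[[], NavState],
    grace_s: float = STARTUP_GRACE_S,
    poll_s: float = 0.05,
) -> bool:
    """Return True once the planner leaves IDLE, False if it never does.

    IDLE observed inside the grace window is treated as "not started yet"
    rather than "goal failed", which avoids the false early exit.
    """
    deadline = time.monotonic() + grace_s
    while time.monotonic() < deadline:
        if get_state() is not NavState.IDLE:
            return True
        time.sleep(poll_s)
    # One last check so a transition right at the deadline is not missed.
    return get_state() is not NavState.IDLE
```

The monitoring loop would call this once before its main `while` and only interpret IDLE as a failure afterwards.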
Single-file HTML deck at assets/drop_in_guide_slides.html, styled to match the operator-console aesthetic (dark bg, mint accents, italic serif + monospace). Self-contained: open in any browser, print to PDF for slides delivery.
Six slides:
01 cover
02 the thesis: inverting system integration
03 the three intelligences: motion / task / interaction
04 3-phase runtime: priming, guidance, reactive Q&A
05 defining gesture: voice-triggered pause + auto-arrival wave
06 submission summary: PR dimensionalOS#2289 + field-test results
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The HTML deck rendered to PDF for static slide delivery during the live judging session. Stored alongside the HTML source so reviewers can pick the format that fits their flow.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    @skill
    def lead_to(self, destination: str) -> str:
        """Lead a visitor to a destination, **pausing if they fall behind**.

        Call this AFTER `log_nav_decision` and `speak`, in place of
        `navigate_with_text`, when there is a visitor being guided (not just
        a delivery). The robot will:

        1. Begin moving toward the destination using the same nav stack as
           `navigate_with_text` (the agent should have already set the goal
           via `navigate_with_text` before calling `lead_to`).
        2. Continuously check that the perception pipeline is producing
           camera frames (proxy for "visitor still in view").
        3. If frames stop arriving: cancel the nav goal, call `speak("I'll
           wait for you")`, and poll for reacquisition.
        4. On reacquisition: return to the agent for a follow-up call.
        5. On arrival: return to the agent for a `speak("Here's the X")`.

        This is the defining gesture that distinguishes Drop-in Guide from a
        delivery bot. Use it whenever a person is following the robot.

        Args:
            destination: the tagged-location name to lead the visitor to.
                Must match a name from `tag_location`.
        """
        if not destination or not destination.strip():
            return "Error: destination is required."

        with self._lead_lock:
            if self._lead_thread is not None and self._lead_thread.is_alive():
                return "Already leading somewhere. Call `stop_navigation` first."
            self._stop_event = threading.Event()
            self._lead_thread = threading.Thread(
                target=self._lead_loop,
                args=(destination.strip(), self._stop_event),
                daemon=True,
                name="LeadWithFollow",
            )
            self._lead_thread.start()

        return (
            f"Leading to '{destination}'. I'll pause if the perception "
            "pipeline loses sight of you and resume when it recovers."
        )
lead_to docstring promises speak call that never happens
The @skill docstring presented to Claude states "If frames stop arriving: cancel the nav goal, call speak("I'll wait for you"), and poll for reacquisition." But _lead_loop only calls self._navigation.cancel_goal() — there is no speak call anywhere in the thread. LeadWithFollowSkill holds no reference to a speak skill, so it physically cannot emit audio. In Phase 2c, the robot silently cancels the nav goal mid-trip with no audio feedback; visitors observe the robot stopping with no explanation, contradicting the "defining gesture" this module is meant to provide.
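One way to make the pause audible is to hand the monitor a speak callable alongside the cancel callable, so both happen in one place. `LeadMonitor` and its constructor arguments are hypothetical; how a skill would obtain a reference to the speak skill in dimOS is not shown in this PR.

```python
from typing import Callable


class LeadMonitor:
    """Cancels the nav goal and tells the visitor why, exactly once per pause."""

    def __init__(
        self,
        cancel_goal: Callable[[], None],
        speak: Callable[[str], None],
    ) -> None:
        self._cancel_goal = cancel_goal
        self._speak = speak
        self.paused = False

    def on_visitor_lost(self) -> None:
        if self.paused:
            return  # already waiting; do not repeat the line
        self.paused = True
        self._cancel_goal()
        self._speak("I'll wait for you")

    def on_visitor_reacquired(self) -> None:
        self.paused = False
```

Passing plain callables keeps the monitor testable without a robot: a test can supply two lambdas and check the order of calls.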
P1 fixes:
- teleop_velocity: move publish loop to background thread (_publish_burst).
Previously time.sleep() in the skill body held the McpClient's tool-call
thread for up to 5s, blocking the agent from processing the visitor's
"wait, I'm behind" command mid-burst — undermining the Phase 2b
voice-pause guarantee. Now the skill returns immediately and a single
burst thread publishes at PUB_HZ until duration or cancel. teleop_stop
also cancels any in-flight burst before publishing zeros.
- drop_in_guide blueprint: document the obstacle_avoidance=False tradeoff
as a hackathon-scope decision with a clear path to per-phase toggling
in production (track as top follow-up).
P2 fixes:
- ReactiveQASkills, DecisionAuditSkill: drop the class-level mutable list
defaults (_tagged, _skipped, _trace). start() already reinitializes
per-instance; the class-level assignment was a shared-state footgun.
- LeadWithFollowSkill: add _STARTUP_GRACE_S window so _lead_loop doesn't
exit immediately on IDLE before the planner transitions to NAVIGATING
after navigate_with_text. Was silently dropping every guided trip
where lead_to was called right after navigate_with_text.
P3 fixes:
- LeadWithFollowSkill.lead_to docstring: remove the self-contradiction
("in place of navigate_with_text" vs "the agent should have already
set the goal via navigate_with_text"). Always call AFTER, never instead.
- SceneCaptionSkill: register the color_image subscription as a disposable
so the _on_image callback is cleaned up on stop().
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thanks @greptile-apps — addressed all 6 in commit 6b59163. Triage: P1
P2
P3
The voice-
All 6 fixes verified in 6b59163. Quick per-item confirmation: P1 ✅
P2 ✅
P3 ✅
One minor new item worth tracking post-hackathon:
    # teleop_velocity_skill.py
    _burst_lock: threading.Lock = threading.Lock()  # class-level mutable default
    _burst_thread: threading.Thread | None = None
    _burst_cancel: threading.Event | None = None
Otherwise the commit is clean: the Phase 2b voice-pause correctness guarantee is now structurally sound. LGTM.
muShanghai 2026 Hackathon Submission — Agents track
Drop-in Guide is an interaction-intelligence layer on dimOS that turns a Unitree Go2 into a zero-prep guide robot for unfamiliar buildings.
Submitted to the Agents track with secondary fit in Autonomy & Navigation.
What this PR adds
A new blueprint `drop-in-guide` plus six `dimos/experimental/*_skill.py` modules that compose three coherent phases:
- … `describe_scene`) to propose a tag name; operator confirms with one keypress; `tag_location` fires. (`scene_caption_skill`, `teleop_velocity_skill`)
- … `log_nav_decision`), waves, calls `navigate_with_text`. Auto-waves on arrival when the planner's `/goal_reached` fires. If the visitor says "wait" mid-trip, the robot pauses and resumes on "okay". (`arrival_announcer_skill`, voice-wait prompt rules)
- … (`reactive_qa_skills`)

Validated in replay (`data/go2_short.db`)
- `ArrivalAnnouncerSkill` `/goal_reached` subscription wires correctly
- `[SYSTEM ARRIVAL EVENT]` triggers `speak` + `Hello` wave in correct order
- … `stop_navigation` + `speak("I'll wait for you here.")` (signature line)
- … `log_nav_decision` + `navigate_with_text` (resume same target)

Architecture decisions
- `unitree_go2`, not `unitree_go2_spatial`: avoids CUDA-required modules (EdgeTAM/SecurityModule) so the demo runs natively on Apple Silicon. Real Go2 hardware is the target.
- `describe_scene` instead of dimOS's default async `observe`: the Vision result returns inline to Claude, eliminating the tool_update race that made the priming dialogue brittle.
- `teleop_velocity_skill` publishes directly to `MovementManager.tele_cmd_vel`, sidestepping the nav planner's costmap, which refuses pure-rotation and pure-backward goals at low map quality during the first walkthrough.
- `/goal_reached` → synthetic human_input: `ArrivalAnnouncerSkill` subscribes to the planner's arrival signal and injects a `[SYSTEM ARRIVAL EVENT]` prompt into McpClient's `human_input` stream, so the agent auto-greets without a polling loop.
- … `query`, `matched_tier`, `confidence`, `target` to `nav_trace.jsonl` for grounded review. Methodology ported from operator-grounded verification work in healthcare AI.
- … (`lead_with_follow_skill`) is unreliable on macOS because the Go2 camera streams continuously regardless of where the visitor is. Voice-triggered pause (visitor says "wait") is more reliable and ships in the prompt as PHASE 2b.

Field-debugging gotchas (worth a heads-up for reviewers)
Tested on real Go2 at venue:
- `obstacle_avoidance=True` (default) silently blocked all forward velocity while letting rotation through; the symptom was the robot trotting in place. The blueprint sets `obstacle_avoidance=False` in `global_config` to keep the demo flowing.
- … `danvi/experimental/route-replay-through-SHM`) is a runtime dependency for the live demo: it contains a camera fix without which `describe_scene` gets stale frames.
- … `sudo route add -net 224.0.0.0/4 -interface lo0` after every WiFi event.

How to run
Then either:
- … `http://localhost:5555` (Whisper + chat UI, ships with dimOS)

Files added
- `dimos/robot/unitree/go2/blueprints/agentic/drop_in_guide.py`: main blueprint composition + 3-phase system prompt
- `dimos/experimental/arrival_announcer_skill.py`
- `dimos/experimental/decision_audit_skill.py`
- `dimos/experimental/lead_with_follow_skill.py`
- `dimos/experimental/reactive_qa_skills.py`
- `dimos/experimental/scene_caption_skill.py`
- `dimos/experimental/teleop_velocity_skill.py`
- `DROP_IN_GUIDE_README.md`: full thesis, three-intelligences framing, demo arc, operator-console screenshot
- `assets/drop_in_guide_operator_console.png`: UI capture (LFS-opt-out via single .gitattributes line)
- `dimos/robot/all_blueprints.py`: register `drop-in-guide` blueprint + `scene-caption-skill` module

Test plan
- `dimos --replay --replay-db data/go2_short.db run drop-in-guide --daemon` → 22 modules start
- … `I` → `Y` → confirm tag via `list_tagged_places`

🤖 Generated with Claude Code