[MuShanghai DimOS Hackathon 2026] Goldie — Team Perception#2294
Open
mkuwdev wants to merge 22 commits into
fix normal macos
Copies the working Goldie webapp into /webapp verbatim (Next 16 Pages Router, Tailwind v4, on-device STT + TTS, voice/manual modes, joystick, PWA) so it keeps working as-is. The previous App-Router stub is preserved in webapp/SCAFFOLD-REFERENCE.md, together with the monorepo backend's agent_state/token contract, for later merging. Adds a webapp/AGENTS.md briefing. Also removes the build artifacts (.next) that the scaffold had committed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On iOS, browser SpeechSynthesis silently drops speak() calls made outside a user gesture, and agent replies arrive asynchronously over SSE, so responses were shown but never heard. Replace it with OpenAI gpt-4o-mini-tts behind a server-side /api/tts route. Audio plays through a single <audio> element that is unlocked by one in-gesture tap, which iOS allows for async playback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/api/tts imports the openai SDK; un-ignore webapp/package.json so the dependency is reproducible from a clean clone / deploy build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…quick actions
Speak only the agent's final replies:
- classify `{kind:"ai"|"tool"|"system"}` envelopes and TTS only `kind==="ai"`;
tool/system stay on screen but silent. Mock now emits the same envelopes for
dev/prod parity.
TTS voice tuning:
- default to a female voice (`shimmer`); steer pace via gpt-4o-mini-tts
`instructions` (numeric `speed` isn't honored by this model). Documented in
`.env.example`.
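A minimal sketch of the voice tuning described above, not the actual /api/tts code. The model and default voice come from this commit message; the environment variable names `TTS_VOICE` and `TTS_INSTRUCTIONS`, the helper name `buildSpeechRequest`, and the instruction wording are all hypothetical.

```typescript
interface SpeechRequestBody {
  model: string;
  voice: string;
  input: string;
  instructions: string;
}

// Hypothetical defaults; the real values are documented in .env.example.
const DEFAULT_VOICE = "shimmer";
const DEFAULT_INSTRUCTIONS = "Speak calmly, at a slightly slower than normal pace.";

// Builds the JSON body for a text-to-speech request.
function buildSpeechRequest(
  text: string,
  env: { TTS_VOICE?: string; TTS_INSTRUCTIONS?: string } = {},
): SpeechRequestBody {
  return {
    model: "gpt-4o-mini-tts",
    voice: env.TTS_VOICE ?? DEFAULT_VOICE,
    input: text,
    // Pace is steered through free-text instructions; no numeric `speed` is sent.
    instructions: env.TTS_INSTRUCTIONS ?? DEFAULT_INSTRUCTIONS,
  };
}
```

A server-side route could pass this object to the OpenAI SDK's `audio.speech.create` call, which keeps the API key off the device.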
Barge-in UX:
- while the user is holding to speak, suppress incoming TTS so the previous
turn's reply doesn't talk over them.
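The two speaking rules so far (only `kind === "ai"` is spoken, and nothing is spoken while the user holds to speak) can be sketched as one gate. This is an illustration under assumptions: the names `AgentEnvelope`, `shouldSpeak`, and `userIsHolding` are hypothetical and do not come from the webapp source.

```typescript
// Envelope kinds named in this commit message.
type Kind = "ai" | "tool" | "system";

interface AgentEnvelope {
  kind: Kind;
  text: string;
}

// Decides whether an incoming envelope should be sent to TTS.
// `userIsHolding` is assumed to be true while the hold-to-speak button is down.
function shouldSpeak(envelope: AgentEnvelope, userIsHolding: boolean): boolean {
  if (envelope.kind !== "ai") return false; // tool/system: shown on screen, never spoken
  if (userIsHolding) return false;          // barge-in: do not talk over the user
  return true;
}
```

Keeping the decision in a pure function makes it straightforward to log the outcome (spoken or muted) next to each frame.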
Network robustness (`dimos.ts`):
- `fetchWithRetry` (8s timeout + retries) on submit_query / unitree command /
interrupt; timeout-only on status polling. Commands now throw on failure,
and the UI shows "Couldn't reach the robot — try again" instead of silently
swallowing the loss.
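A rough sketch of a timeout-plus-retry wrapper in the spirit of `fetchWithRetry`; the real `dimos.ts` is not shown here and may differ. The 8s timeout comes from this commit message. The retry count, the fixed backoff, and the injected `fetchFn` parameter (added so the sketch runs without a network) are assumptions.

```typescript
// Minimal shape this sketch needs from a fetch implementation.
type FetchLike = (
  url: string,
  init: { method?: string; body?: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number }>;

interface RetryOptions {
  timeoutMs?: number; // 8000 matches the commit message
  retries?: number;   // assumed; the source does not give a count
  backoffMs?: number; // assumed fixed pause between attempts
}

async function fetchWithRetry(
  fetchFn: FetchLike,
  url: string,
  init: { method?: string; body?: string } = {},
  { timeoutMs = 8000, retries = 2, backoffMs = 250 }: RetryOptions = {},
): Promise<{ ok: boolean; status: number }> {
  let lastError: unknown = new Error("no attempt made");
  for (let attempt = 0; attempt <= retries; attempt++) {
    const controller = new AbortController();
    // Abort the request if it has not settled within the timeout.
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fetchFn(url, { ...init, signal: controller.signal });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return res;
    } catch (err) {
      lastError = err; // timeout, network failure, or non-2xx status
    } finally {
      clearTimeout(timer);
    }
    if (attempt < retries) {
      await new Promise((resolve) => setTimeout(resolve, backoffMs));
    }
  }
  // Throw instead of swallowing, so the UI can report the failure.
  throw lastError;
}
```

In a browser the global `fetch` can be passed as `fetchFn`. For simplicity this sketch retries on every failure, including 4xx responses; status polling could call it with `retries: 0` to get the timeout-only behavior.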
STT hardening:
- tear down the previous SpeechRecognition session before starting a new one
(no overlapping mic sessions / stale callbacks); `onend` ignores superseded
sessions; start/stop wrapped against iOS state-throwing.
- map real error codes to user-facing messages (`no-speech` / `network` /
`aborted` no longer mislabeled "check mic permission"); `aborted` stays
silent.
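The error-code mapping could look roughly like the sketch below. The codes are the standard Web Speech API `SpeechRecognitionErrorEvent.error` values; the function name `sttErrorMessage` and every message string are illustrative and not taken from the app.

```typescript
// Maps a SpeechRecognition error code to a user-facing message.
// Returns null when the error should stay silent.
function sttErrorMessage(code: string): string | null {
  switch (code) {
    case "aborted":
      return null; // expected when a session is torn down on purpose
    case "no-speech":
      return "I didn't hear anything. Try again.";
    case "network":
      return "Speech service unreachable. Check the connection.";
    case "audio-capture":
      return "No microphone was found.";
    case "not-allowed":
    case "service-not-allowed":
      return "Check microphone permission.";
    default:
      return "Speech recognition failed. Try again.";
  }
}
```

Only the two permission-related codes point the user at mic permission, which avoids the mislabeling this commit fixes.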
Quick actions:
- buttons now Sit / Jump / Stand, sent through `/submit_query` as
natural-language commands ("sit"/"jump"/"stand up") so the agent narrates
them like voice turns.
Dev diagnostics:
- per-frame `📩 agent-msg [kind=..]` log and `🔊 tts(SPOKEN|muted, kind=..)`
log printed to the dev terminal; `🎤⚠️ stt-error: <code>` for STT errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
webapp: replace scaffold with Goldie (voice + manual control app)
WebInput previously created an empty `agent_responses` text stream and never
fed it. Subscribe to the agent's LCM "/agent" channel and forward each
`BaseMessage` to `agent_responses` as a typed JSON envelope:
{"kind": "ai" | "tool" | "system", "text": "<content>"}
`human` messages (the echoed user input) are skipped. The webapp reads this
envelope and speaks only `kind == "ai"` (tool/system are shown as status).
Also: STT is now a hard dependency (the try/except around WhisperNode is
gone), and audio_subject is no longer optional.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
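On the webapp side, reading this envelope might look like the sketch below. It assumes each frame on `agent_responses` is a single JSON string in the shape shown above; the names `Envelope` and `parseEnvelope` are hypothetical, and the backend itself is Python and is not reproduced here.

```typescript
type EnvelopeKind = "ai" | "tool" | "system";

interface Envelope {
  kind: EnvelopeKind;
  text: string;
}

const ENVELOPE_KINDS: readonly string[] = ["ai", "tool", "system"];

// Parses one frame from the agent_responses stream.
// Returns null for anything that does not match the envelope shape.
function parseEnvelope(raw: string): Envelope | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const { kind, text } = data as { kind?: unknown; text?: unknown };
  if (typeof kind !== "string" || ENVELOPE_KINDS.indexOf(kind) === -1) return null;
  if (typeof text !== "string") return null;
  return { kind: kind as EnvelopeKind, text };
}
```

Rejecting unknown kinds means a stray `human` frame would be dropped rather than spoken, even though the backend already skips those.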
backend: forward LLM agent replies on agent_responses as typed envelopes
- Replace upstream DimOS README with Goldie hackathon submission doc covering story, what we built, architecture, stair-climbing achievement, challenges, and quick start
- Add docs/goldie-architecture.png
- Add docs/screenshots/ (splash, voice, manual mode)
- Add webapp/TECHFLOW.md (full end-to-end channel trace)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Goldie webapp
docs: update README with team name, video link, numbered challenges, …
docs: fix demo video link
Hackathon submission for the MuShanghai DimOS Hackathon 2026 by Team Perception
(Joy Munn, Yichu Lau, Cecilia Zhang, Brecht Davos, Figo Saleh).
We built Goldie, a voice-controlled guide-dog interface for the Unitree Go2, designed for low-vision and blind users. Speak a destination; the dog confirms it out loud and navigates you there, narrating along the way.
What's included
- webapp/: iPhone PWA with hold-to-speak voice control, OpenAI TTS replies, manual joystick teleop, and a full mock for development without a robot
Links