[MuShanghai DimOS Hackathon 2026] Goldie — Team Perception by mkuwdev · Pull Request #2294 · dimensionalOS/dimos

mkuwdev · 2026-05-28T16:00:46Z

Hackathon submission for the MuShanghai DimOS Hackathon 2026 by Team Perception
(Joy Munn, Yichu Lau, Cecilia Zhang, Brecht Davos, Figo Saleh).

We built Goldie, a voice-controlled guide-dog interface for the Unitree Go2, designed for low-vision and blind users. Speak a destination, and the dog confirms it out loud and navigates you there, narrating along the way.

What's included

- `webapp/`: iPhone PWA with hold-to-speak voice control, OpenAI TTS replies, manual joystick teleop, and a full mock for development without a robot
- Typed agent message envelopes, so the phone knows what to speak vs. show
- Direct move skill with stall recovery, enabling the Go2 to climb stairs
- macOS support fixes for the full DimOS stack

Links

Drops the working Goldie webapp into /webapp verbatim (Next 16 Pages Router, Tailwind v4, on-device STT + TTS, voice/manual modes, joystick, PWA) so it keeps working as-is. The previous App-Router stub is preserved in webapp/SCAFFOLD-REFERENCE.md together with the monorepo backend's agent_state/token contract for later merging. Adds webapp/AGENTS.md briefing. Also drops the build artifacts (.next) the scaffold had committed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Browser SpeechSynthesis silently drops async speak() on iOS (agent replies arrive over SSE, outside a user gesture), so responses were shown but never heard. Replace it with OpenAI gpt-4o-mini-tts behind a server-side /api/tts route, played through a single <audio> element unlocked by one in-gesture tap, which iOS allows for async playback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
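The unlock-then-play pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `GesturePlayer`, `AudioLike`, the silent clip URL, and the `?text=` query shape for `/api/tts` are all assumptions made for the example. In a browser, an `HTMLAudioElement` would be passed where `AudioLike` is expected.

```typescript
// Minimal surface of an <audio> element that this sketch relies on.
// (Hypothetical interface; an HTMLAudioElement satisfies it.)
interface AudioLike {
  src: string;
  play(): Promise<void>;
}

// Hypothetical helper: one shared audio element, unlocked once by a tap.
class GesturePlayer {
  private audio: AudioLike;
  private unlocked = false;

  constructor(audio: AudioLike) {
    this.audio = audio;
  }

  // Call synchronously from a tap/click handler. Starting playback of a
  // short silent clip inside the gesture is what lets later async
  // play() calls on the same element go through on iOS.
  unlock(silentClipUrl: string): void {
    if (this.unlocked) return;
    this.audio.src = silentClipUrl;
    // A rejection here is ignored; what matters is the in-gesture call.
    this.audio.play().catch(() => undefined);
    this.unlocked = true;
  }

  // Can be called later from async code (for example an SSE handler).
  // Returns false if the element was never unlocked.
  // ASSUMPTION: the TTS route accepts the text as a GET query parameter.
  async speak(text: string, ttsEndpoint = "/api/tts"): Promise<boolean> {
    if (!this.unlocked) return false;
    this.audio.src = `${ttsEndpoint}?text=${encodeURIComponent(text)}`;
    await this.audio.play();
    return true;
  }
}
```

Reusing a single element is the point of the design: the element that was started inside the gesture is the same one that later plays each reply.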

…quick actions

Speak only the agent's final replies:
- classify `{kind:"ai"|"tool"|"system"}` envelopes and TTS only `kind==="ai"`; tool/system stay on screen but silent. Mock now emits the same envelopes for dev/prod parity.

TTS voice tuning:
- default to a female voice (`shimmer`); steer pace via gpt-4o-mini-tts `instructions` (numeric `speed` isn't honored by this model). Documented in `.env.example`.

Barge-in UX:
- while the user is holding to speak, suppress incoming TTS so the previous turn's reply doesn't talk over them.

Network robustness (`dimos.ts`):
- `fetchWithRetry` (8s timeout + retries) on submit_query / unitree command / interrupt; timeout-only on status polling. Commands now throw on failure, and the UI shows "Couldn't reach the robot — try again" instead of silently swallowing the loss.

STT hardening:
- tear down the previous SpeechRecognition session before starting a new one (no overlapping mic sessions / stale callbacks); `onend` ignores superseded sessions; start/stop wrapped against iOS state-throwing.
- map real error codes to user-facing messages (`no-speech` / `network` / `aborted` no longer mislabeled "check mic permission"); `aborted` stays silent.

Quick actions:
- buttons now Sit / Jump / Stand, sent through `/submit_query` as natural-language commands ("sit"/"jump"/"stand up") so the agent narrates them like voice turns.

Dev diagnostics:
- per-frame `📩 agent-msg [kind=..]` log and `🔊 tts(SPOKEN|muted, kind=..)` log printed to the dev terminal; `🎤⚠️ stt-error: <code>` for STT errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
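The network-robustness item above could look roughly like the sketch below. Only the name `fetchWithRetry` and the 8s timeout come from the commit message; the option names, the retry count, the backoff, and the injectable `fetchImpl` (added here so the function can be exercised without a robot) are assumptions, and the actual `dimos.ts` may differ.

```typescript
// Hypothetical option names; only the 8 s timeout is taken from the PR.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

interface RetryOptions {
  timeoutMs?: number; // per-attempt timeout
  retries?: number; // extra attempts after the first (assumed default)
  backoffMs?: number; // base delay between attempts (assumed default)
  fetchImpl?: FetchLike; // injectable for tests; defaults to global fetch
}

async function fetchWithRetry(
  url: string,
  init: RequestInit = {},
  opts: RetryOptions = {},
): Promise<Response> {
  const { timeoutMs = 8000, retries = 2, backoffMs = 250, fetchImpl = fetch } = opts;
  let lastError: unknown;

  for (let attempt = 0; attempt <= retries; attempt++) {
    // Each attempt gets its own AbortController so a timeout on one
    // attempt does not cancel the next. The timeout only takes effect
    // if the fetch implementation honors the abort signal.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fetchImpl(url, { ...init, signal: controller.signal });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return res;
    } catch (err) {
      lastError = err; // remember the failure and fall through to retry
    } finally {
      clearTimeout(timer);
    }
    if (attempt < retries) {
      // Linear backoff before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, backoffMs * (attempt + 1)));
    }
  }

  // Throw instead of swallowing, so the UI can show its error message.
  const detail = lastError instanceof Error ? lastError.message : String(lastError);
  throw new Error(`request to ${url} failed after ${retries + 1} attempts: ${detail}`);
}
```

Under these assumptions, the "timeout-only" behavior for status polling corresponds to passing `{ retries: 0 }`. Retrying a command that may already have reached the robot can repeat it, so a caller would want to retry only requests that are safe to send twice.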

WebInput previously created an empty `agent_responses` text stream and never fed it. Subscribe to the agent's LCM "/agent" channel and forward each `BaseMessage` to `agent_responses` as a typed JSON envelope:

    {"kind": "ai" | "tool" | "system", "text": "<content>"}

`human` messages (the echoed user input) are skipped. The webapp reads this envelope and speaks only `kind == "ai"` (tool/system are shown as status).

Also: STT is now a hard dependency (the try/except around WhisperNode is gone), and audio_subject is no longer optional.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
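On the webapp side, consuming the envelope described above might look like this sketch. The envelope shape is taken from the commit message; the names `parseEnvelope` and `routeEnvelope`, and the choice to drop malformed frames silently, are assumptions for illustration.

```typescript
type EnvelopeKind = "ai" | "tool" | "system";

interface AgentEnvelope {
  kind: EnvelopeKind;
  text: string;
}

// Parse one frame from the agent_responses stream.
// Returns null for anything that is not a well-formed envelope.
function parseEnvelope(raw: string): AgentEnvelope | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const { kind, text } = data as Record<string, unknown>;
  const kindOk = kind === "ai" || kind === "tool" || kind === "system";
  if (!kindOk || typeof text !== "string") return null;
  return { kind: kind as EnvelopeKind, text };
}

// Decide what the phone does with an envelope: only "ai" is spoken,
// "tool" and "system" are shown as status without audio.
function routeEnvelope(env: AgentEnvelope): "speak" | "show" {
  return env.kind === "ai" ? "speak" : "show";
}
```

For example, `routeEnvelope(parseEnvelope('{"kind":"tool","text":"navigating"}')!)` yields `"show"`, while an envelope with `kind` set to `"ai"` yields `"speak"`.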

- Replace upstream DimOS README with Goldie hackathon submission doc covering story, what we built, architecture, stair-climbing achievement, challenges, and quick start
- Add docs/goldie-architecture.png
- Add docs/screenshots/ (splash, voice, manual mode)
- Add webapp/TECHFLOW.md (full end-to-end channel trace)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CeciliaZ030 and others added 22 commits May 27, 2026 01:47

update

c913b62

mcp optimization

6889392

webapp scafold

106afa6

fix normal macos

1985fb6

Merge pull request #1 from CeciliaZ030/yc/fix-macos

f142e71

fix normal macos

webapp: track package.json so the openai TTS dep is captured

7144af2

/api/tts imports the openai SDK; un-ignore webapp/package.json so the dependency is reproducible from a clean clone / deploy build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix

dbcecc3

stairs update

bfd889f

Merge pull request #5 from CeciliaZ030/yc/stairs

337c026

stairs

Merge pull request #4 from CeciliaZ030/goldie-webapp

2b9cb36

webapp: replace scaffold with Goldie (voice + manual control app)

Merge pull request #6 from CeciliaZ030/goldie-agent-envelopes

aebb67c

backend: forward LLM agent replies on agent_responses as typed envelopes

Merge pull request #7 from CeciliaZ030/goldie-webapp

25b076a

Goldie webapp

docs: update README with team name, video link, numbered challenges, …

fad6445

…team members

Merge pull request #8 from CeciliaZ030/goldie-webapp

d22d640

docs: update README with team name, video link, numbered challenges, …

docs: fix demo video link

44e0d40

Merge pull request #9 from CeciliaZ030/goldie-webapp

f8049ea

docs: fix demo video link

mkuwdev requested review from arkluc, leshy, mustafab0, paul-nechifor and spomichter as code owners May 28, 2026 16:00

leshy added the hackaton label May 29, 2026

mkuwdev wants to merge 22 commits into dimensionalOS:main from CeciliaZ030:main
