From fe668e6450fa87041cbb742987db2fe36879e981 Mon Sep 17 00:00:00 2001
From: "Wenjie F." <wenjiefu8@gmail.com>
Date: Fri, 29 May 2026 00:17:47 +0800
Subject: [PATCH 1/8] Add Fetch hackathon submission (Team Pivot)

---
 hackathon/fetch/README.md | 91 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100644 hackathon/fetch/README.md
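
Note for reviewers (not part of the diff): the loop diagram in the README
(camera frame -> vision LLM -> decision -> act, on a ~1 s cadence) can be
sketched roughly as below. This is an illustration under stated assumptions,
not the Fetch implementation: `run_scan_loop`, `get_frame`, `decide`, and `act`
are hypothetical names, and the fixed-period pacing is an assumption about how
the ~1 s cadence might be held.

```python
import time
from typing import Any, Callable, Dict, List


def run_scan_loop(
    get_frame: Callable[[], Any],
    decide: Callable[[Any], Dict[str, Any]],
    act: Callable[[Dict[str, Any]], None],
    period_s: float = 1.0,
    max_iterations: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[Dict[str, Any]]:
    """Run frame -> decide -> act, pacing each iteration to roughly period_s."""
    decisions: List[Dict[str, Any]] = []
    for _ in range(max_iterations):
        started = clock()
        decision = decide(get_frame())  # stand-in for the vision-LLM call
        act(decision)                   # stand-in for move / speak / snap photo
        decisions.append(decision)
        # Sleep only for the time left in this period, never a negative amount.
        remaining = period_s - (clock() - started)
        if remaining > 0:
            sleep(remaining)
    return decisions
```

Passing `sleep` and `clock` as parameters keeps the sketch testable without
waiting in real time; a real loop would run until stopped rather than for a
fixed iteration count.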

diff --git a/hackathon/fetch/README.md b/hackathon/fetch/README.md
new file mode 100644
index 0000000000..3a37c8c387
--- /dev/null
+++ b/hackathon/fetch/README.md
@@ -0,0 +1,91 @@
+# Fetch
+
+Hackathon submission from **Team Pivot** — Philip Seifi ([@seifip](https://github.com/seifip)), Wenjie Fu ([@Wenjix](https://github.com/Wenjix)), and GuoZi ([@GuoZhuoRan](https://github.com/GuoZhuoRan)).
+
+**A Unitree Go2 robot dog that trades ice-cold Cokes for instant photos.**
+
+## If You Only Have 60 Seconds
+
+1. Watch the demo: _publishing to YouTube — link coming shortly._
+2. Read the project: https://github.com/seifip/robodog-fetch
+3. Run it yourself (zero hardware — phone or laptop browser camera):
+   https://github.com/seifip/robodog-fetch#quickstart-run-it-yourself
+
+## One Sentence
+
+Fetch works a crowd like a tiny, soda-carrying street performer: a vision LLM
+"reads the room" on every camera frame and decides where to move, what to say, and
+when to snap the photo — all running as a single FastAPI + WebSocket server you can
+try from a phone browser **before any robot is involved**.
+
+```text
+camera frame (+depth)
+  -> vision LLM (OpenAI / Gemini)
+  -> decision (state, cmd_vel, line, photo?)
+  -> act (move Go2, speak, snap photo)
+  ^------------------ ~1s scan loop ------------------v
+```
+
+## What Matters
+
+- **DimOS is the runtime.** Fetch reuses the DimOS teleop web pattern — a FastAPI
+  server serves an HTTPS phone UI, the phone streams camera frames over a WebSocket,
+  and the server returns motion / speech / photo decisions. On the dog it drives
+  DimOS Unitree WebRTC control + LiDAR.
+- **Fetch is the behavior layer.** The vision-LLM decision loop, persona, approach/
+  trade/photo state machine, and voice all sit on top of DimOS primitives.
+- **Three camera sources, one loop:** phone browser camera (zero hardware), Record3D
+  USB RGBD (real iPhone LiDAR depth), and a live Unitree Go2 over Wi-Fi.
+
+## What We Built
+
+- A vision-LLM that evaluates each frame and emits a structured decision: target,
+  `cmd_vel`, spoken line, and photo-framing readiness.
+- The full interaction flow: scan for a relaxed guest → obstacle-aware approach →
+  wave + personalized one-liner → "grab a Coke, pose" → snap the instant photo + dance.
+- A single-page phone UI (camera feed, previews, decision display, audio routing,
+  photo flow) backed by a FastAPI + WebSocket server.
+- Runtime-switchable **voice**: one-way TTS across Cartesia / Gemini Live / OpenAI,
+  plus an opt-in **two-way Gemini Live** conversation that drives the dog through
+  tool calls (`accept_offer`, `take_photo`, `celebrate`, `do_trick`, `stop_and_reset`).
+- Safety/privacy guardrails: humor constrained to visible, non-sensitive context;
+  LiDAR/depth-enforced `<4m` stop and obstacle avoidance.
+
+## Reviewer Map
+
+| Question | Open this |
+| --- | --- |
+| What is the demo? | _Publishing to YouTube — link coming shortly._ |
+| Where is the full source? | https://github.com/seifip/robodog-fetch |
+| How do I run it (no hardware)? | https://github.com/seifip/robodog-fetch#quickstart-run-it-yourself |
+| How does the decision loop work? | https://github.com/seifip/robodog-fetch#how-it-works-at-a-glance |
+| What's the DimOS integration? | https://github.com/seifip/robodog-fetch#built-on-dimos |
+| Where's the DimOS runtime? | https://github.com/dimensionalOS/dimos |
+
+## How to Run
+
+Zero-hardware path (phone or laptop browser camera), from the DimOS monorepo root:
+
+```bash
+python -m dimos.experimental.fetch.iphone_middleware --host 0.0.0.0 --port 8455
+```
+
+Open `https://127.0.0.1:8455/fetch` and tap **Record** to start the ~1-second scan
+loop. The full quickstart (Record3D USB and live Go2 paths, provider keys, and the
+voice modes) is in the project README.
+
+## Scope Boundary
+
+This PR is a hackathon **submission pointer**: the full source, demo video, and
+assets are hosted externally at https://github.com/seifip/robodog-fetch. It adds a
+single markdown file under `hackathon/` and **does not vendor Fetch into DimOS or
+modify any DimOS runtime code**. (Fetch is designed to live at
+`dimos/experimental/fetch/` in the monorepo; that vendoring is intentionally out of
+scope for this submission.)
+
+## Validation
+
+- `pytest -q` in the project repo is green — all provider calls are mocked, so there
+  are no real API calls (covers the 26 middleware tests plus the policy, conversation,
+  and TTS suites).
+- This submission adds markdown only; no DimOS code is touched.

From aed8d7ed72435fa664b8065ee63690aa54f5358b Mon Sep 17 00:00:00 2001
From: "Wenjie F." <wenjiefu8@gmail.com>
Date: Fri, 29 May 2026 00:27:21 +0800
Subject: [PATCH 2/8] Enrich Fetch submission README with technical detail

---
 hackathon/fetch/README.md | 63 +++++++++++++++++++++++++--------------
 1 file changed, 40 insertions(+), 23 deletions(-)
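
Note for reviewers (not part of the diff): this patch says
`FetchPolicy.analyze_frame()` normalizes the model's JSON into a decision dict.
A minimal sketch of that kind of normalization follows. Everything in it is an
assumption made for illustration: the function name `normalize_decision`, the
`cmd_vel` keys (`linear_x`, `angular_z`), the speed limits, and the state names
are hypothetical and are not taken from the Fetch source. Only the four
top-level fields (state, cmd_vel, line, photo) come from the README diagram.

```python
import json
from typing import Any, Dict

# Hypothetical limits and state names for illustration only.
MAX_LINEAR = 0.5   # m/s
MAX_ANGULAR = 1.0  # rad/s
KNOWN_STATES = {"scan", "approach", "trade", "photo"}


def _clamp(value: Any, limit: float) -> float:
    """Coerce to float and clamp to [-limit, limit]; bad input becomes 0.0."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return 0.0
    if number != number:  # NaN never equals itself
        return 0.0
    return max(-limit, min(limit, number))


def normalize_decision(raw: str) -> Dict[str, Any]:
    """Turn a model's JSON reply into a decision dict with safe defaults."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    cmd = data.get("cmd_vel")
    if not isinstance(cmd, dict):
        cmd = {}
    state = data.get("state")
    return {
        # Unknown or missing states fall back to scanning.
        "state": state if isinstance(state, str) and state in KNOWN_STATES else "scan",
        "cmd_vel": {
            "linear_x": _clamp(cmd.get("linear_x"), MAX_LINEAR),
            "angular_z": _clamp(cmd.get("angular_z"), MAX_ANGULAR),
        },
        "line": str(data.get("line") or ""),
        # Only an explicit JSON true triggers a photo.
        "photo": data.get("photo") is True,
    }
```

The design choice sketched here is to fail toward a stationary, silent
decision: malformed output yields zero velocity, no spoken line, and no photo.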

diff --git a/hackathon/fetch/README.md b/hackathon/fetch/README.md
index 3a37c8c387..9735b44826 100644
--- a/hackathon/fetch/README.md
+++ b/hackathon/fetch/README.md
@@ -7,7 +7,7 @@ Hackathon submission from **Team Pivot** — Philip Seifi ([@seifip](https://git
 ## If You Only Have 60 Seconds
 
 1. Watch the demo: _publishing to YouTube — link coming shortly._
-2. Read the project: https://github.com/seifip/robodog-fetch
+2. Full source: https://github.com/seifip/robodog-fetch
 3. Run it yourself (zero hardware — phone or laptop browser camera):
    https://github.com/seifip/robodog-fetch#quickstart-run-it-yourself
 
@@ -28,28 +28,44 @@ camera frame (+depth)
 
 ## What Matters
 
-- **DimOS is the runtime.** Fetch reuses the DimOS teleop web pattern — a FastAPI
-  server serves an HTTPS phone UI, the phone streams camera frames over a WebSocket,
-  and the server returns motion / speech / photo decisions. On the dog it drives
-  DimOS Unitree WebRTC control + LiDAR.
+- **DimOS is the runtime.** Fetch reuses the DimOS teleop web pattern (HTTPS phone
+  UI, WebSocket camera frames, motion/speech/photo decisions) and drives a **real
+  Unitree Go2 over DimOS's WebRTC stack** — with selectable connection modes
+  (`auto` / `local_ap` / `local_sta`) so it reaches the dog on its local-AP network
+  at `192.168.12.1` as well as standard Wi-Fi.
 - **Fetch is the behavior layer.** The vision-LLM decision loop, persona, approach/
   trade/photo state machine, and voice all sit on top of DimOS primitives.
-- **Three camera sources, one loop:** phone browser camera (zero hardware), Record3D
-  USB RGBD (real iPhone LiDAR depth), and a live Unitree Go2 over Wi-Fi.
+- **Real-time by design.** A ~1-second scan loop and low-latency speech (Cartesia
+  Sonic by default) keep the interaction feeling live, not turn-based.
 
 ## What We Built
 
-- A vision-LLM that evaluates each frame and emits a structured decision: target,
-  `cmd_vel`, spoken line, and photo-framing readiness.
-- The full interaction flow: scan for a relaxed guest → obstacle-aware approach →
-  wave + personalized one-liner → "grab a Coke, pose" → snap the instant photo + dance.
-- A single-page phone UI (camera feed, previews, decision display, audio routing,
-  photo flow) backed by a FastAPI + WebSocket server.
+- A vision-LLM that turns each frame into a structured decision: target, `cmd_vel`,
+  spoken line, and photo-framing readiness.
+- The full interaction flow: scan for a relaxed guest → obstacle-aware approach
+  (turning to keep the subject in frame) → wave + personalized one-liner →
+  "grab a Coke, pose" → snap the instant photo + dance.
+- **Instant photo → the guest's phone.** Shots save locally and can mirror to an
+  iCloud or Google Drive folder (`FETCH_PHOTO_MIRROR_DIRS`) so the demo phone syncs
+  the picture seconds after it's taken.
+- A single-page phone UI (camera feed, previews, live decision display, audio
+  routing, photo flow) backed by FastAPI + WebSocket.
 - Runtime-switchable **voice**: one-way TTS across Cartesia / Gemini Live / OpenAI,
-  plus an opt-in **two-way Gemini Live** conversation that drives the dog through
-  tool calls (`accept_offer`, `take_photo`, `celebrate`, `do_trick`, `stop_and_reset`).
-- Safety/privacy guardrails: humor constrained to visible, non-sensitive context;
-  LiDAR/depth-enforced `<4m` stop and obstacle avoidance.
+  plus opt-in **two-way Gemini Live** conversation that drives the dog through tool
+  calls (`accept_offer`, `take_photo`, `celebrate`, `do_trick`, `stop_and_reset`).
+- Safety/privacy guardrails: humor limited to visible, non-sensitive context;
+  LiDAR/depth-enforced `<4 m` stop and obstacle avoidance.
+
+## Under the Hood (for the technically curious)
+
+| Piece | What it does |
+| --- | --- |
+| **Camera sources** | One loop, three inputs: phone browser camera (zero hardware), Record3D USB RGBD (real iPhone LiDAR depth), and a live Go2 over WebRTC. |
+| **Vision policy** | `FetchPolicy.analyze_frame()` sends image + prompt to the vision LLM and normalizes the JSON into a decision dict (default OpenAI `gpt-5-mini`; `--vision-provider gemini` for `gemini-3.5-flash`). |
+| **Go2 transport** | DimOS Unitree WebRTC; `--robot-connection-method auto\|local_ap\|local_sta` (default `local_ap`) + `--robot-ip` select how to reach the dog. |
+| **Voice** | Provider-switchable TTS at runtime (no restart) plus an optional persistent Gemini Live session with server-side VAD / barge-in. |
+| **Photos** | Capture to `static/captures/`, optionally mirrored to iCloud/Drive folders via `FETCH_PHOTO_MIRROR_DIRS`. |
+| **Tests** | **76 passing tests**, all providers mocked — no live API calls needed to review (policy, middleware routes, TTS, conversation tools, photo saving). |
 
 ## Reviewer Map
 
@@ -71,8 +87,9 @@ python -m dimos.experimental.fetch.iphone_middleware --host 0.0.0.0 --port 8455
 ```
 
 Open `https://127.0.0.1:8455/fetch` and tap **Record** to start the ~1-second scan
-loop. The full quickstart (Record3D USB and live Go2 paths, provider keys, and the
-voice modes) is in the project README.
+loop. To drive a real dog, add `--robot-ip 192.168.12.1 --robot-connection-method
+local_ap`. The full quickstart (Record3D USB, live Go2, provider keys, and voice
+modes) is in the project README.
 
 ## Scope Boundary
 
@@ -81,11 +98,11 @@ assets are hosted externally at https://github.com/seifip/robodog-fetch. It adds
 single markdown file under `hackathon/` and **does not vendor Fetch into DimOS or
 modify any DimOS runtime code**. (Fetch is designed to live at
 `dimos/experimental/fetch/` in the monorepo; that vendoring is intentionally out of
-scope for this submission.)
+scope for this pointer.)
 
 ## Validation
 
-- `pytest -q` in the project repo is green — all provider calls are mocked, so there
-  are no real API calls (covers the 26 middleware tests plus the policy, conversation,
-  and TTS suites).
+- `pytest -q` in the project repo: **76 passed**, providers mocked (no real API
+  calls) — covers policy normalization, middleware routes, TTS, conversation tools,
+  and photo saving.
 - This submission adds markdown only; no DimOS code is touched.

From 616be61d3c8fee30aba4eeb7393c05a4a7001dae Mon Sep 17 00:00:00 2001
From: "Wenjie F." <wenjiefu8@gmail.com>
Date: Fri, 29 May 2026 00:31:01 +0800
Subject: [PATCH 3/8] Note branded photos and Gemini 2.5 Flash-Lite demo config

---
 hackathon/fetch/README.md | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/hackathon/fetch/README.md b/hackathon/fetch/README.md
index 9735b44826..912d6307cf 100644
--- a/hackathon/fetch/README.md
+++ b/hackathon/fetch/README.md
@@ -35,8 +35,9 @@ camera frame (+depth)
   at `192.168.12.1` as well as standard Wi-Fi.
 - **Fetch is the behavior layer.** The vision-LLM decision loop, persona, approach/
   trade/photo state machine, and voice all sit on top of DimOS primitives.
-- **Real-time by design.** A ~1-second scan loop and low-latency speech (Cartesia
-  Sonic by default) keep the interaction feeling live, not turn-based.
+- **Real-time by design.** A ~1-second scan loop, a low-latency vision model (the
+  live demo runs on **Gemini 2.5 Flash-Lite**), and fast speech (Cartesia Sonic by
+  default) keep the interaction feeling live, not turn-based.
 
 ## What We Built
 
@@ -45,9 +46,10 @@ camera frame (+depth)
 - The full interaction flow: scan for a relaxed guest → obstacle-aware approach
   (turning to keep the subject in frame) → wave + personalized one-liner →
   "grab a Coke, pose" → snap the instant photo + dance.
-- **Instant photo → the guest's phone.** Shots save locally and can mirror to an
-  iCloud or Google Drive folder (`FETCH_PHOTO_MIRROR_DIRS`) so the demo phone syncs
-  the picture seconds after it's taken.
+- **Instant, Fetch-branded photo → the guest's phone.** Each capture is composited
+  with the Fetch logo in a Polaroid-style branded photo view (with a print sound),
+  saved locally and optionally mirrored to an iCloud or Google Drive folder
+  (`FETCH_PHOTO_MIRROR_DIRS`) so the demo phone syncs it seconds after it's taken.
 - A single-page phone UI (camera feed, previews, live decision display, audio
   routing, photo flow) backed by FastAPI + WebSocket.
 - Runtime-switchable **voice**: one-way TTS across Cartesia / Gemini Live / OpenAI,
@@ -61,10 +63,10 @@ camera frame (+depth)
 | Piece | What it does |
 | --- | --- |
 | **Camera sources** | One loop, three inputs: phone browser camera (zero hardware), Record3D USB RGBD (real iPhone LiDAR depth), and a live Go2 over WebRTC. |
-| **Vision policy** | `FetchPolicy.analyze_frame()` sends image + prompt to the vision LLM and normalizes the JSON into a decision dict (default OpenAI `gpt-5-mini`; `--vision-provider gemini` for `gemini-3.5-flash`). |
+| **Vision policy** | `FetchPolicy.analyze_frame()` sends image + prompt to a provider-selectable vision LLM (OpenAI or Gemini) and normalizes the JSON into a decision dict; the live demo runs on **Gemini 2.5 Flash-Lite** for the lowest latency. |
 | **Go2 transport** | DimOS Unitree WebRTC; `--robot-connection-method auto\|local_ap\|local_sta` (default `local_ap`) + `--robot-ip` select how to reach the dog. |
 | **Voice** | Provider-switchable TTS at runtime (no restart) plus an optional persistent Gemini Live session with server-side VAD / barge-in. |
-| **Photos** | Capture to `static/captures/`, optionally mirrored to iCloud/Drive folders via `FETCH_PHOTO_MIRROR_DIRS`. |
+| **Photos** | Fetch-branded capture (logo composited via `<canvas>`) to `static/captures/`, optionally mirrored to iCloud/Drive folders via `FETCH_PHOTO_MIRROR_DIRS`. |
 | **Tests** | **76 passing tests**, all providers mocked — no live API calls needed to review (policy, middleware routes, TTS, conversation tools, photo saving). |
 
 ## Reviewer Map

From d6b078c3f18ca34b23d19d2d7eb388b483ef9aac Mon Sep 17 00:00:00 2001
From: "Wenjie F." <wenjiefu8@gmail.com>
Date: Fri, 29 May 2026 00:48:15 +0800
Subject: [PATCH 4/8] Add published YouTube demo link

---
 hackathon/fetch/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hackathon/fetch/README.md b/hackathon/fetch/README.md
index 912d6307cf..52cc11e404 100644
--- a/hackathon/fetch/README.md
+++ b/hackathon/fetch/README.md
@@ -6,7 +6,7 @@ Hackathon submission from **Team Pivot** — Philip Seifi ([@seifip](https://git
 
 ## If You Only Have 60 Seconds
 
-1. Watch the demo: _publishing to YouTube — link coming shortly._
+1. Watch the demo: https://www.youtube.com/watch?v=8hHYE1239wg
 2. Full source: https://github.com/seifip/robodog-fetch
 3. Run it yourself (zero hardware — phone or laptop browser camera):
    https://github.com/seifip/robodog-fetch#quickstart-run-it-yourself
@@ -73,7 +73,7 @@ camera frame (+depth)
 
 | Question | Open this |
 | --- | --- |
-| What is the demo? | _Publishing to YouTube — link coming shortly._ |
+| What is the demo? | https://www.youtube.com/watch?v=8hHYE1239wg |
 | Where is the full source? | https://github.com/seifip/robodog-fetch |
 | How do I run it (no hardware)? | https://github.com/seifip/robodog-fetch#quickstart-run-it-yourself |
 | How does the decision loop work? | https://github.com/seifip/robodog-fetch#how-it-works-at-a-glance |

From 16f94b8a36ead0c19b252d1f73df2eb537c526a2 Mon Sep 17 00:00:00 2001
From: "Wenjie F." <wenjiefu8@gmail.com>
Date: Fri, 29 May 2026 00:49:21 +0800
Subject: [PATCH 5/8] Reframe reviewer path as 90 seconds

---
 hackathon/fetch/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hackathon/fetch/README.md b/hackathon/fetch/README.md
index 52cc11e404..00fa90d5d0 100644
--- a/hackathon/fetch/README.md
+++ b/hackathon/fetch/README.md
@@ -4,7 +4,7 @@ Hackathon submission from **Team Pivot** — Philip Seifi ([@seifip](https://git
 
 **A Unitree Go2 robot dog that trades ice-cold Cokes for instant photos.**
 
-## If You Only Have 60 Seconds
+## If You Only Have 90 Seconds
 
 1. Watch the demo: https://www.youtube.com/watch?v=8hHYE1239wg
 2. Full source: https://github.com/seifip/robodog-fetch

From b5fce1b1e6221008af7c593b3d569dc304d091d6 Mon Sep 17 00:00:00 2001
From: "Wenjie F." <wenjiefu8@gmail.com>
Date: Fri, 29 May 2026 01:43:25 +0800
Subject: [PATCH 6/8] Add why-a-beach, UX latency, and roadmap/business
 sections

---
 hackathon/fetch/README.md | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)
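
Note for reviewers (not part of the diff): this patch states that camera frames
are downscaled (≤640 px) before analysis. The size arithmetic can be sketched
as below, assuming the limit applies to the longest side and that frames are
never upscaled. Both are assumptions, and `downscaled_size` is a hypothetical
helper rather than a function from the Fetch source.

```python
from typing import Tuple


def downscaled_size(width: int, height: int, max_side: int = 640) -> Tuple[int, int]:
    """Return a size whose longest side is at most max_side, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    # Guard against a zero-sized dimension on extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 1920x1080 frame maps to 640x360, while a 480x320 frame is left
unchanged.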

diff --git a/hackathon/fetch/README.md b/hackathon/fetch/README.md
index 00fa90d5d0..0e530ce3a9 100644
--- a/hackathon/fetch/README.md
+++ b/hackathon/fetch/README.md
@@ -35,9 +35,21 @@ camera frame (+depth)
   at `192.168.12.1` as well as standard Wi-Fi.
 - **Fetch is the behavior layer.** The vision-LLM decision loop, persona, approach/
   trade/photo state machine, and voice all sit on top of DimOS primitives.
-- **Real-time by design.** A ~1-second scan loop, a low-latency vision model (the
-  live demo runs on **Gemini 2.5 Flash-Lite**), and fast speech (Cartesia Sonic by
-  default) keep the interaction feeling live, not turn-based.
+- **Real-time by design.** We benchmarked round-trip latency across vision and speech
+  models (`scripts/latency_bench.py`) and run the fastest combo — **Gemini 2.5
+  Flash-Lite** vision + **Cartesia Sonic** speech; camera frames are downscaled
+  (≤640 px) before analysis, and the whole scan loop lands around one second.
+
+## Why a beach?
+
+Quadrupeds earn their keep on terrain wheels can't handle, so we built Fetch around
+that. We chose **sand** for a form-factor reason: the Go2's camera sits low and looks
+*up* at standing people, but on a beach people sit or lie on the sand, which puts them
+at the dog's eye-line and makes the interaction feel natural. It is also feasible
+today: quadrupeds already run on sand
+([RaiBo](https://techxplore.com/news/2023-01-raibo-versatile-robo-dog-sandy-beach.html)
+at 3 m/s) and [sand-walking foot
+adaptations](https://www.popsci.com/technology/robot-moose/) cut foot sinkage ~46%.
 
 ## What We Built
 
@@ -93,6 +105,17 @@ loop. To drive a real dog, add `--robot-ip 192.168.12.1 --robot-connection-metho
 local_ap`. The full quickstart (Record3D USB, live Go2, provider keys, and voice
 modes) is in the project README.
 
+## What's Next
+
+- **Sense the trade.** The Go2 EDU's [foot-force sensors](https://www.unitree.com/go2/foot/)
+  could detect a Coke lifted from the back via the change in total load — closing the
+  loop without the camera's framing check.
+- **Real sand.** Fit sand-walking foot adaptations for an outdoor beach deployment.
+
+**The bigger picture:** Fetch is an autonomous **brand ambassador** and **mobile
+vendor** (Coca-Cola here), pointing toward fleets of autonomous robot-dog vendors that
+self-resupply at beachside bars/vendors or autonomous resupply stations.
+
 ## Scope Boundary
 
 This PR is a hackathon **submission pointer**: the full source, demo video, and

From b39f7d4e2d53bd84e9ffe99b6705b9a3e0c8797f Mon Sep 17 00:00:00 2001
From: "Wenjie F." <wenjiefu8@gmail.com>
Date: Fri, 29 May 2026 01:52:09 +0800
Subject: [PATCH 7/8] Promote business framing to early 'The opportunity'
 section

---
 hackathon/fetch/README.md | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/hackathon/fetch/README.md b/hackathon/fetch/README.md
index 0e530ce3a9..856fadab4b 100644
--- a/hackathon/fetch/README.md
+++ b/hackathon/fetch/README.md
@@ -26,6 +26,14 @@ camera frame (+depth)
   ^------------------ ~1s scan loop ------------------v
 ```
 
+## The opportunity
+
+Fetch is an autonomous **brand ambassador** and **mobile vendor** — here for Coca-Cola
+— that hands out product, creates a memorable branded moment, and walks away with the
+guest's photo. The longer-term vision: fleets of autonomous robot-dog vendors that roam
+the beach and **self-resupply** at beachside bars and vendors, or at dedicated
+autonomous resupply stations.
+
 ## What Matters
 
 - **DimOS is the runtime.** Fetch reuses the DimOS teleop web pattern (HTTPS phone
@@ -112,10 +120,6 @@ modes) is in the project README.
   loop without the camera's framing check.
 - **Real sand.** Fit sand-walking foot adaptations for an outdoor beach deployment.
 
-**The bigger picture:** Fetch is an autonomous **brand ambassador** and **mobile
-vendor** (Coca-Cola here), pointing toward fleets of autonomous robot-dog vendors that
-self-resupply at beachside bars/vendors or autonomous resupply stations.
-
 ## Scope Boundary
 
 This PR is a hackathon **submission pointer**: the full source, demo video, and

From a8f545b9d54de4f043f894b95757bf4f6af136d4 Mon Sep 17 00:00:00 2001
From: "Wenjie F." <wenjiefu8@gmail.com>
Date: Fri, 29 May 2026 10:32:06 +0800
Subject: [PATCH 8/8] Document camera/LiDAR capture, iCloud/Drive sync, and
 Xiaomi-printer print pipeline

---
 hackathon/fetch/README.md | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/hackathon/fetch/README.md b/hackathon/fetch/README.md
index 856fadab4b..7e8ffa9930 100644
--- a/hackathon/fetch/README.md
+++ b/hackathon/fetch/README.md
@@ -66,10 +66,11 @@ adaptations](https://www.popsci.com/technology/robot-moose/) cut foot sinkage ~4
 - The full interaction flow: scan for a relaxed guest → obstacle-aware approach
   (turning to keep the subject in frame) → wave + personalized one-liner →
   "grab a Coke, pose" → snap the instant photo + dance.
-- **Instant, Fetch-branded photo → the guest's phone.** Each capture is composited
-  with the Fetch logo in a Polaroid-style branded photo view (with a print sound),
-  saved locally and optionally mirrored to an iCloud or Google Drive folder
-  (`FETCH_PHOTO_MIRROR_DIRS`) so the demo phone syncs it seconds after it's taken.
+- **Instant photo → the guest's hands.** Each shot is captured from the Go2's camera +
+  LiDAR and composited with the Fetch logo (Polaroid-style branded view + print sound).
+  Shots and our demo recordings sync to iCloud / Google Drive via mirror folders
+  (`FETCH_PHOTO_MIRROR_DIRS`); at the event, a synced phone sends each shot to a
+  **Xiaomi mini-printer** through the printer's app for an instant physical print.
 - A single-page phone UI (camera feed, previews, live decision display, audio
   routing, photo flow) backed by FastAPI + WebSocket.
 - Runtime-switchable **voice**: one-way TTS across Cartesia / Gemini Live / OpenAI,
@@ -86,7 +87,7 @@ adaptations](https://www.popsci.com/technology/robot-moose/) cut foot sinkage ~4
 | **Vision policy** | `FetchPolicy.analyze_frame()` sends image + prompt to a provider-selectable vision LLM (OpenAI or Gemini) and normalizes the JSON into a decision dict; the live demo runs on **Gemini 2.5 Flash-Lite** for the lowest latency. |
 | **Go2 transport** | DimOS Unitree WebRTC; `--robot-connection-method auto\|local_ap\|local_sta` (default `local_ap`) + `--robot-ip` select how to reach the dog. |
 | **Voice** | Provider-switchable TTS at runtime (no restart) plus an optional persistent Gemini Live session with server-side VAD / barge-in. |
-| **Photos** | Fetch-branded capture (logo composited via `<canvas>`) to `static/captures/`, optionally mirrored to iCloud/Drive folders via `FETCH_PHOTO_MIRROR_DIRS`. |
+| **Photos** | Fetch-branded capture (logo composited via `<canvas>`) to `static/captures/`, mirrored to iCloud/Drive folders via `FETCH_PHOTO_MIRROR_DIRS`; a synced phone then prints it on a Xiaomi mini-printer via the printer's app. |
 | **Tests** | **76 passing tests**, all providers mocked — no live API calls needed to review (policy, middleware routes, TTS, conversation tools, photo saving). |
 
 ## Reviewer Map