odin - Object Detection Interpretive Narrator

Odin captures your webcam, runs Ultralytics YOLO object detection, estimates rough distance from a single camera, infers motion and pose-related cues (e.g. possible waving, hands near a surface), and speaks a natural-language summary with gTTS + pygame. Optional Purdue GenAI Studio can polish the same structured context over an OpenAI-compatible HTTP API.

Narration is deduplicated: it only speaks again when the scene “fingerprint” changes (objects, position, distance bucket, motion/pose hints, or lighting)—not on a fixed repeat loop.

Features

Area	What it does
Detection	`yolo26n.pt` (auto-downloaded on first run). Configurable via `YOLO_MODEL`.
Scene interpretation	scikit-image luminance, edges, and quadrant brightness → text briefing for GenAI
Depth (approximate)	Maps bounding-box size to spoken ranges like “about three feet away.” Tune `odin/depth.py` for your webcam if needed.
Motion	Frame differencing on a downscaled gray stream for person ROI movement cues.
Pose	Optional `yolo11n-pose.pt` (throttled) for arm/hand hints.
Speech	gTTS + pygame with a queue so updates are not dropped.
Purdue GenAI	If `PURDUE_GENAI_API_KEY` is set, structured sensor lines + local draft are sent to GenAI Studio. Without a key, local narration still runs.

Project layout

main.py — Camera loop, YOLO, scikit-image layout, TTS, optional GenAI.
odin/ — depth.py, motion.py, pose_hints.py, narration.py, narrative_builder.py.
tests/ — python -m unittest discover -s tests -v

Setup

Python 3.10+ recommended.
Install dependencies:
```
pip install -r requirements.txt
```
Optional — Purdue GenAI Studio

API key: GenAI Studio UI → avatar → Settings → Account → API Keys.

Copy .env.example to .env:
```
PURDUE_GENAI_API_KEY=your-key-here
```
Or in PowerShell:
```
$env:PURDUE_GENAI_API_KEY = "your-key"
```
Optional: PURDUE_GENAI_URL, PURDUE_GENAI_MODEL (default llama3.1:latest).

Usage

python main.py

q — Quit
l — Change TTS language (e.g. en, es)

Adjust YOLO_CONFIDENCE, CAMERA_INDEX, POSE_EVERY_N_FRAMES at the top of main.py.

Limitations

Distance is a monocular heuristic, not a depth sensor.
Gesture hints are best-effort and can false-positive in clutter.
gTTS needs network access to synthesize speech.

Dependencies

See requirements.txt: ultralytics, opencv-python, numpy, scikit-image, requests, python-dotenv, gTTS, pygame.

Maninder Kaur

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
__pycache__		__pycache__
odin		odin
tests		tests
.env.example		.env.example
README.md		README.md
main.py		main.py
name.names		name.names
requirements.txt		requirements.txt
yolo26n.pt		yolo26n.pt
yolov3.cfg		yolov3.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

odin - Object Detection Interpretive Narrator

Features

Project layout

Setup

Usage

Limitations

Dependencies

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

odin - Object Detection Interpretive Narrator

Features

Project layout

Setup

Usage

Limitations

Dependencies

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages