A local CLI tool that downloads audio from a URL, transcribes it on-device, and stores the result in a searchable database.
Runs entirely locally, with no cloud APIs or other hosted services.
Primarily macOS-compatible for now; support for other platforms will expand in the future.
- Download audio from YouTube, podcast RSS feeds, and any other yt-dlp-compatible source
- Transcribe using mlx-whisper (for Apple Silicon)
- Two search modes on transcripts:
  - Keyword search: exact/full-text via SQLite FTS5
  - Semantic search: meaning-based via Qdrant vector database
- macOS on Apple Silicon (M-series); support for other platforms will expand in the future
- Python 3.11+
- uv (package manager)
- ffmpeg
brew install ffmpeg

# Create virtualenv and install dependencies
uv venv --python 3.11
source .venv/bin/activate
uv pip install -r requirements.txt

# Auto-detect language
uv run python main.py ingest "https://www.youtube.com/watch?v=..."
# Specify language (faster, more accurate)
uv run python main.py ingest "https://..." --language fr
# Re-process a URL already in the database
uv run python main.py ingest "https://..." --force

# Keyword / full-text search
uv run python main.py search keyword "budget deficit"
# Semantic search (find related ideas, not just exact words)
uv run python main.py search semantic "financial planning strategies"
# Limit results (default: 10)
uv run python main.py search keyword "climate" --limit 5

Results show the source title, timestamp range, and the matching transcript segment.
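As a rough illustration of how one result line could be rendered from a title, a timestamp range, and a segment, here is a minimal sketch. The helper names `format_timestamp` and `format_result` and the exact output layout are assumptions for illustration, not the tool's actual code or output format.

```python
def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_result(title: str, start_s: float, end_s: float, text: str) -> str:
    """Hypothetical layout: title, timestamp range, then the matching segment."""
    span = f"{format_timestamp(start_s)} - {format_timestamp(end_s)}"
    return f"{title} [{span}]\n  {text}"


print(format_result("Econ podcast", 3725.4, 3731.0, "The budget deficit widened."))
```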
Audio URL
→ yt-dlp download + convert to MP3
→ mlx-whisper transcribe → timestamped segments
→ SQLite (FTS5) store segments for keyword search
→ Qdrant store embeddings for semantic search
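The keyword-search step of this pipeline can be sketched with Python's built-in `sqlite3` module, assuming the local SQLite build includes FTS5 (most do). The table name `segments`, its columns, the sample rows, and the `keyword_search` helper are hypothetical; the project's real schema may differ.

```python
import sqlite3

# In-memory database for illustration; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE segments USING fts5("
    "title, start_s UNINDEXED, end_s UNINDEXED, text)"
)

# Made-up timestamped segments, shaped like a transcriber's output.
rows = [
    ("Econ podcast", 0.0, 4.2, "The budget deficit widened again this quarter."),
    ("Econ podcast", 4.2, 9.8, "Households are rethinking how they save."),
    ("Climate talk", 0.0, 6.1, "Climate policy depends on long-term funding."),
]
conn.executemany("INSERT INTO segments VALUES (?, ?, ?, ?)", rows)


def keyword_search(query: str, limit: int = 10):
    """Return (title, start_s, end_s, text) rows matching an FTS5 query, best first."""
    sql = (
        "SELECT title, start_s, end_s, text FROM segments "
        "WHERE segments MATCH ? ORDER BY rank LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()


# Double quotes inside the query string make FTS5 treat it as an exact phrase.
hits = keyword_search('"budget deficit"')
for title, start_s, end_s, text in hits:
    print(f"{title} [{float(start_s):.1f}s - {float(end_s):.1f}s] {text}")
```

Marking the timestamp columns `UNINDEXED` keeps them retrievable with each hit without adding them to the full-text index.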
| Layer | Tool |
|---|---|
| Audio download | yt-dlp + ffmpeg |
| Transcription | mlx-whisper (Apple MLX) |
| Keyword search | SQLite FTS5 |
| Semantic search | Qdrant |
| Embeddings | sentence-transformers (multilingual MiniLM) |
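To show what "meaning-based" ranking amounts to, here is a dependency-free sketch of nearest-neighbour search by cosine similarity. The 3-dimensional vectors are toy stand-ins for real sentence-transformers embeddings, and the in-memory dictionary stands in for a Qdrant collection. Qdrant supports cosine distance among other metrics; which one this project configures is not stated here, so treat the choice as an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (made up) keyed by the segment text they represent.
segment_vectors = {
    "saving for retirement": [0.9, 0.1, 0.0],
    "rainfall patterns": [0.0, 0.2, 0.9],
    "household budgeting": [0.8, 0.3, 0.1],
}


def semantic_search(query_vector, limit=10):
    """Return (score, text) pairs sorted from most to least similar."""
    scored = [
        (cosine_similarity(query_vector, vec), text)
        for text, vec in segment_vectors.items()
    ]
    scored.sort(reverse=True)
    return scored[:limit]


# Pretend this is the embedding of "financial planning strategies".
query_vector = [1.0, 0.2, 0.0]
results = semantic_search(query_vector, limit=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Neither top result shares a word with the query; they rank highly only because their vectors point in a similar direction, which is the behaviour keyword search cannot provide.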
- The Whisper and embedding models are downloaded on first run and cached (~5 GB overall).
- Audio files and model weights are gitignored — they must be set up locally.