qatd/voxearch

Voxearch

A local CLI tool that downloads audio from a URL, transcribes it on-device, and stores the result in a searchable database.

Runs entirely locally, with no cloud APIs or other external services.

Currently targets macOS; support for other platforms is planned.

Features

  • Download audio from YouTube, podcast RSS feeds, and other yt-dlp-compatible sources
  • Transcribe using mlx-whisper (for Apple Silicon)
  • Two search modes on transcripts:
    • Keyword search — exact/full-text via SQLite FTS5
    • Semantic search — meaning-based via Qdrant vector database
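
The difference between the two modes can be illustrated with a toy, standard-library sketch. Everything in it is an assumption made for illustration: the three-dimensional vectors are hand-made stand-ins for real embeddings, a plain substring match stands in for FTS5, and cosine similarity is only one of the metrics a vector database might be configured with.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical segments paired with made-up "embeddings".
# Real embeddings have hundreds of dimensions and come from a model.
SEGMENTS = [
    ("We need to cut the budget deficit.", [0.9, 0.1, 0.0]),
    ("Saving for retirement takes discipline.", [0.8, 0.3, 0.1]),
    ("The storm hit the coast at night.", [0.0, 0.1, 0.9]),
]


def keyword_hits(term):
    """Keyword mode: return segments that literally contain the term."""
    return [text for text, _ in SEGMENTS if term.lower() in text.lower()]


def semantic_hits(query_vec, limit=2):
    """Semantic mode: rank segments by vector similarity to the query."""
    ranked = sorted(SEGMENTS, key=lambda s: cosine(query_vec, s[1]), reverse=True)
    return [text for text, _ in ranked[:limit]]
```

A keyword query for "budget" only finds the first segment, while a finance-like query vector also surfaces the retirement segment even though it shares no words with the query.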

Requirements

  • macOS on Apple Silicon (M-series); support for other platforms is planned
  • Python 3.11+
  • uv (package manager)
  • ffmpeg
brew install ffmpeg
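
A quick preflight check for the tools listed above could look like the following sketch. It is not part of the project: `missing_requirements`, `REQUIRED_TOOLS`, and `MIN_PYTHON` are hypothetical names, and the check covers only the Python version and the presence of `uv` and `ffmpeg` on the PATH, not the operating system or chip.

```python
import shutil
import sys

# Hypothetical constants mirroring the requirements list.
REQUIRED_TOOLS = ("uv", "ffmpeg")
MIN_PYTHON = (3, 11)


def missing_requirements(which=shutil.which, version=None):
    """Return a list of human-readable problems; an empty list means all good.

    `which` and `version` are injectable so the check can be exercised
    without touching the real environment.
    """
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `missing_requirements()` with no arguments inspects the current interpreter and PATH.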

Setup

# Create virtualenv and install dependencies
uv venv --python 3.11
source .venv/bin/activate
uv pip install -r requirements.txt

Usage

Ingest an audio URL

# Auto-detect language
uv run python main.py ingest "https://www.youtube.com/watch?v=..."

# Specify language (faster, more accurate)
uv run python main.py ingest "https://..." --language fr

# Re-process a URL already in the database
uv run python main.py ingest "https://..." --force
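
This README does not describe how an already-ingested URL is detected. One plausible approach, sketched here with a hypothetical `sources` table and hypothetical helper names, is to key on the URL and skip the work unless `--force` is given.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and make sure the (hypothetical) sources table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sources (url TEXT PRIMARY KEY, title TEXT)"
    )
    return conn


def should_ingest(conn, url, force=False):
    """Return True if the URL is new, or if the caller asked to re-process it."""
    if force:
        return True
    row = conn.execute("SELECT 1 FROM sources WHERE url = ?", (url,)).fetchone()
    return row is None


def record_source(conn, url, title):
    """Insert the source, replacing any earlier row for the same URL."""
    conn.execute(
        "INSERT OR REPLACE INTO sources (url, title) VALUES (?, ?)", (url, title)
    )
    conn.commit()
```

With this shape, a second `ingest` of the same URL is a no-op, and `--force` bypasses the lookup.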

Search transcripts

# Keyword / full-text search
uv run python main.py search keyword "budget deficit"

# Semantic search (find related ideas, not just exact words)
uv run python main.py search semantic "financial planning strategies"

# Limit results (default: 10)
uv run python main.py search keyword "climate" --limit 5

Results show the source title, timestamp range, and the matching transcript segment.
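
As an illustration of how such a result line might be assembled, here is a small sketch. The exact output format of the tool is not documented here, so the layout, `format_timestamp`, and `format_result` are assumptions.

```python
def format_timestamp(seconds):
    """Render a position in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_result(title, start, end, text):
    """Combine title, timestamp range, and segment text into one display string."""
    time_range = f"{format_timestamp(start)} - {format_timestamp(end)}"
    return f"{title} [{time_range}]\n  {text.strip()}"
```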

Architecture

Audio URL
  → yt-dlp          download + convert to MP3
  → mlx-whisper     transcribe → timestamped segments
  → SQLite (FTS5)   store segments for keyword search
  → Qdrant          store embeddings for semantic search

Layer            Tool
Audio download   yt-dlp + ffmpeg
Transcription    mlx-whisper (Apple MLX)
Keyword search   SQLite FTS5
Semantic search  Qdrant
Embeddings       sentence-transformers (multilingual MiniLM)
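
The transcribe-then-store step for keyword search can be sketched with the standard library alone. This is not the project's actual schema: the `segments` table, its column names, and the helper functions are hypothetical, the input is assumed to be a Whisper-style result dict with a `segments` list of `start`/`end`/`text` entries, and the Python build is assumed to ship SQLite with FTS5 enabled.

```python
import sqlite3


def segments_to_rows(title, result):
    """Flatten a Whisper-style result dict into (title, start, end, text) rows."""
    return [
        (title, float(seg["start"]), float(seg["end"]), seg["text"].strip())
        for seg in result["segments"]
    ]


def store_segments(conn, rows):
    """Create the FTS5 table if needed and insert the rows.

    Only the text column is indexed; the other columns are stored as-is.
    """
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS segments "
        "USING fts5(title UNINDEXED, start_s UNINDEXED, end_s UNINDEXED, text)"
    )
    conn.executemany(
        "INSERT INTO segments (title, start_s, end_s, text) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def keyword_search(conn, query, limit=10):
    """Run the query as an exact FTS5 phrase, best matches first."""
    phrase = '"' + query.replace('"', '""') + '"'
    return conn.execute(
        "SELECT title, start_s, end_s, text FROM segments "
        "WHERE segments MATCH ? ORDER BY rank LIMIT ?",
        (phrase, limit),
    ).fetchall()
```

Quoting the query makes FTS5 treat it as a phrase, so word order matters; dropping the quotes would instead match segments containing all the words in any order.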

Notes

  • The Whisper and embedding models are downloaded on first run and cached (~5 GB in total).
  • Audio files and model weights are gitignored — they must be set up locally.
