A self-contained Windows desktop app for offline audio transcription, powered by faster-whisper (CTranslate2). Built with .NET 10 + Windows Forms.
It downloads, installs and manages Python and all its dependencies for you. No setup required beyond launching the .exe.
- Zero-setup install — first launch downloads embedded Python + faster-whisper + CUDA libraries automatically
- NVIDIA GPU acceleration — auto-detects card, picks the right compute_type (special-cased for RTX 50-series Blackwell), with CPU fallback
- All Whisper models — tiny, base, small, medium, large-v2, large-v3, large-v3-turbo, turbo
- Speaker diarization — optional, via pyannote.audio; guides you through the HuggingFace token + gated-model acceptance flow with clickable links
- Initial prompt — pass a vocabulary hint (proper nouns, jargon) to improve recognition of domain-specific terms
- 10-language UI — Italian, English, Spanish, French, German, Portuguese, Russian, Chinese, Japanese, Arabic
- Real-time output — segments appear as they are produced, with optional timestamps
- Export — save as .txt or .srt (SubRip subtitles)
- Persisted settings — remembers your last model, language, prompt and UI choices
- Drag & drop — drop an audio file onto the window
- "Open with…" — pass the audio path as the first CLI argument
- Powerful CLI — fully scriptable, including auto-start and auto-quit
- Single-file portable .exe — ~50 MB, no installer
- Windows 10/11 (x64)
- ~1 GB free disk for Python + faster-whisper, plus model size (75 MB – 3 GB)
- Optional: NVIDIA GPU with CUDA 12 support for hardware acceleration
- Download WhisperTranscriber.exe from the latest release
- Place it anywhere and double-click
The first launch will download Python (~25 MB), faster-whisper, and CUDA libraries (~500 MB) into %LOCALAPPDATA%\WhisperTranscriber\. This takes a few minutes; subsequent launches are instant.
Launch the app, pick an audio file (browse or drag & drop), choose a model, then click Transcribe. Output appears live; save to .txt or .srt when done.
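To illustrate what the .srt export involves, here is a minimal Python sketch of turning timed segments into SubRip cues. It is not the app's actual code: the `Segment` class and both function names are hypothetical, and only the SubRip layout itself (cue number, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is standard.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical segment shape: start/end in seconds plus the text."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a SubRip HH:MM:SS,mmm timestamp."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

A plain .txt export would be the same loop without the cue number and timing line.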
WhisperTranscriber.exe [audio-file] [options]
-m, --model <name>      Model (tiny, base, small, medium, large-v3, turbo, ...)
-l, --language <code>   Audio language (it, en, fr, de, es, pt, ru, zh, ja, ar, auto)
-p, --prompt <text>     Initial prompt (vocabulary hint)
-d, --diarize           Identify speakers (requires HuggingFace token)
-o, --output <path>     Save result to file (.txt or .srt)
    --no-timestamps     Output without timestamps (alias --plain)
    --timestamps        Force timestamp inclusion
-s, --start             Start transcription automatically
-q, --quit              Close app on completion (implies --start)
-h, --help              Show help
Examples:
WhisperTranscriber.exe "interview.aac" -m large-v3-turbo -l it -s
WhisperTranscriber.exe "lecture.mp3" -m medium -l en -p "Riemann, eigenvalue, Hilbert" -o "lecture.txt" -q
WhisperTranscriber.exe "meeting.wav" -m large-v3 -d -o "meeting.srt" -q

Right-click an audio file in Explorer → Open with → Choose another app → browse to WhisperTranscriber.exe. The file is preselected when the app opens.
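Because the CLI supports auto-start and auto-quit, batch jobs can be scripted. The sketch below is one way to do that from Python, using only the flags documented above (-m, -l, -o, -q). The helper names `build_command` and `transcribe_folder` are hypothetical, and the exe path is assumed to be reachable as given. Exit codes are not documented here, so the script does not check them.

```python
import subprocess
from pathlib import Path


def build_command(exe: str, audio: Path, model: str, language: str) -> list[str]:
    """Build an argument list from the documented flags; output goes next to the audio."""
    output = audio.with_suffix(".srt")
    return [
        exe, str(audio),
        "-m", model,
        "-l", language,
        "-o", str(output),
        "-q",  # close on completion; documented as implying --start
    ]


def transcribe_folder(exe: str, folder: Path, model: str = "medium",
                      language: str = "auto") -> None:
    """Run the app once per .mp3 file in the folder, one after another."""
    for audio in sorted(folder.glob("*.mp3")):
        # check=False: the app's exit codes are an unknown, so they are ignored here.
        subprocess.run(build_command(exe, audio, model, language), check=False)
```

For example, `transcribe_folder("WhisperTranscriber.exe", Path("recordings"))` would process each file in turn, assuming the app exits once each transcription is saved.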
When you enable Identify speakers for the first time:
- The app asks for a free HuggingFace token (saved locally, never sent anywhere else)
- You must accept the user conditions on all three pyannote model pages:
- If a model has not been accepted yet, the app will pop up a dialog with a clickable link straight to that model page
Once configured, every transcription with diarization enabled prefixes each segment with a speaker label: SPEAKER_00:, SPEAKER_01:, etc.
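If you post-process a saved transcript, the speaker prefixes make it easy to split text per speaker. The following Python sketch assumes each line contains a label of the form `SPEAKER_NN:` followed by the text, possibly after a timestamp. The exact line layout of the saved file is an assumption, and `group_by_speaker` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed shape: anything (e.g. a timestamp), then "SPEAKER_NN:", then the text.
SPEAKER_LINE = re.compile(r"(SPEAKER_\d+):\s*(.*)")


def group_by_speaker(lines: list[str]) -> dict[str, list[str]]:
    """Collect the text of each labelled line under its speaker label."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        match = SPEAKER_LINE.search(line)
        if match:  # lines without a speaker label are skipped
            grouped[match.group(1)].append(match.group(2).strip())
    return dict(grouped)
```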
All operations write to %LOCALAPPDATA%\WhisperTranscriber\log.txt. Open the data folder via the Open folder button to inspect logs, clear cached models, or do a full reset (delete the whole folder).
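When the app is not responding, the log can also be read without opening the UI. This Python sketch prints the last lines of the log file at the location given above. The UTF-8 encoding is an assumption, and `log_path` and `tail` are hypothetical helper names.

```python
import os
from collections import deque
from pathlib import Path


def log_path() -> Path:
    # Location stated in this README: %LOCALAPPDATA%\WhisperTranscriber\log.txt
    return Path(os.environ.get("LOCALAPPDATA", "")) / "WhisperTranscriber" / "log.txt"


def tail(path: Path, count: int = 20) -> list[str]:
    """Return the last `count` lines of a text file without trailing newlines."""
    # Encoding is assumed to be UTF-8; undecodable bytes are replaced, not fatal.
    with path.open(encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

Calling `tail(log_path())` right after a failed first launch should show the most recent download or install messages.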
Common issues:
- Stuck on first launch → check log.txt; usually a network problem during pip install
- cublas64_12.dll not found → likely you have an NVIDIA GPU but old drivers; update or fall back to CPU
- Pipeline is None during diarization → token invalid or you haven't accepted all the gated models above
- .NET 10 + Windows Forms
- faster-whisper (CTranslate2 backend)
- OpenAI Whisper (models)
- pyannote.audio (diarization)
- Embedded Python 3.11
MIT © 2026 Davide Fasolo
