faze79/WhisperTranscriber

Whisper Transcriber

A self-contained Windows desktop app for offline audio transcription, powered by faster-whisper (CTranslate2). Built with .NET 10 + Windows Forms.

It downloads, installs and manages Python and all its dependencies for you. No setup required beyond launching the .exe.

Features

  • Zero-setup install — first launch downloads embedded Python + faster-whisper + CUDA libraries automatically
  • NVIDIA GPU acceleration — auto-detects card, picks the right compute_type (special-cased for RTX 50-series Blackwell), with CPU fallback
  • All Whisper models: tiny, base, small, medium, large-v2, large-v3, large-v3-turbo, turbo
  • Speaker diarization — optional, via pyannote.audio; guides you through the HuggingFace token + gated-model acceptance flow with clickable links
  • Initial prompt — pass a vocabulary hint (proper nouns, jargon) to dramatically improve accuracy
  • 10-language UI — Italian, English, Spanish, French, German, Portuguese, Russian, Chinese, Japanese, Arabic
  • Real-time output — segments appear as they are produced, with optional timestamps
  • Export — save as .txt or .srt (SubRip subtitles)
  • Persisted settings — remembers your last model, language, prompt and UI choices
  • Drag & drop — drop an audio file onto the window
  • "Open with…" — pass the audio path as the first CLI argument
  • Powerful CLI — fully scriptable, including auto-start and auto-quit
  • Single-file portable .exe — ~50 MB, no installer
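
For readers who want to post-process exported files, the sketch below shows how timed segments map onto the SubRip (.srt) layout named in the Export feature. It is an illustration of the file format only, not the app's own export code; the function names `srt_timestamp` and `to_srt` and the `(start, end, text)` tuple shape are assumptions made for this example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SubRip puts a comma before the milliseconds)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start, end, text) tuples as numbered SubRip cues.

    The tuple shape is hypothetical; adapt it to however your segments are stored.
    """
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Joining with a newline leaves one blank line between consecutive cues.
    return "\n".join(cues)
```

Each cue is a running number, a time range, and the text, with a blank line separating cues.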

Requirements

  • Windows 10/11 (x64)
  • ~1 GB free disk for Python + faster-whisper, plus model size (75 MB – 3 GB)
  • Optional: NVIDIA GPU with CUDA 12 support for hardware acceleration

Installation

  1. Download WhisperTranscriber.exe from the latest release
  2. Place it anywhere and double-click

The first launch will download Python (~25 MB), faster-whisper, and CUDA libraries (~500 MB) into %LOCALAPPDATA%\WhisperTranscriber\. This takes a few minutes; subsequent launches are instant.

Usage

GUI

Launch the app, pick an audio file (browse or drag & drop), choose a model, then click Transcribe. Output appears live; save to .txt or .srt when done.

CLI

WhisperTranscriber.exe [audio-file] [options]

  -m, --model <name>      Model (tiny, base, small, medium, large-v3, turbo, ...)
  -l, --language <code>   Audio language (it, en, fr, de, es, pt, ru, zh, ja, ar, auto)
  -p, --prompt <text>     Initial prompt (vocabulary hint)
  -d, --diarize           Identify speakers (requires HuggingFace token)
  -o, --output <path>     Save result to file (.txt or .srt)
      --no-timestamps     Output without timestamps (alias --plain)
      --timestamps        Force timestamp inclusion
  -s, --start             Start transcription automatically
  -q, --quit              Close app on completion (implies --start)
  -h, --help              Show help

Examples:

WhisperTranscriber.exe "interview.aac" -m large-v3-turbo -l it -s

WhisperTranscriber.exe "lecture.mp3" -m medium -l en -p "Riemann, eigenvalue, Hilbert" -o "lecture.txt" -q

WhisperTranscriber.exe "meeting.wav" -m large-v3 -d -o "meeting.srt" -q
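
Because the CLI can start and quit on its own, it can be driven from a script. The following is a minimal Python sketch of a batch run over a folder, using only the flags listed above. It assumes that WhisperTranscriber.exe is on PATH (or in the working directory) and that the process exits once -q closes the app; the helper names `build_command` and `transcribe_folder` are made up for this example.

```python
import subprocess
from pathlib import Path

EXE = "WhisperTranscriber.exe"  # assumption: reachable via PATH or the working directory


def build_command(audio: Path, model: str = "medium", language: str = "auto",
                  diarize: bool = False) -> list[str]:
    """Build an argument list that saves <audio>.srt next to the input, then quits."""
    args = [EXE, str(audio), "-m", model, "-l", language,
            "-o", str(audio.with_suffix(".srt")), "-q"]
    if diarize:
        args.append("-d")  # requires the HuggingFace token setup described in this README
    return args


def transcribe_folder(folder: Path, pattern: str = "*.mp3") -> None:
    """Run the app once per matching file, one after another."""
    for audio in sorted(folder.glob(pattern)):
        # The app's exit codes are not documented here, so failures are not raised.
        subprocess.run(build_command(audio), check=False)
```

Passing the arguments as a list avoids quoting problems with file names that contain spaces.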

"Open with…" integration

Right-click an audio file in Explorer → Open with → Choose another app → browse to WhisperTranscriber.exe. The file is preselected when the app opens.

Speaker diarization setup

When you enable Identify speakers for the first time:

  1. The app asks for a free HuggingFace token (saved locally, never sent anywhere else)
  2. You must accept the user conditions on all three pyannote model pages
  3. If a model has not been accepted yet, the app will pop up a dialog with a clickable link straight to that model page

Once configured, every transcription with diarize on prefixes each segment with SPEAKER_00:, SPEAKER_01:, etc.
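
If you want to work with a diarized transcript afterwards, the speaker prefixes are easy to parse. The sketch below groups utterances by speaker from a saved .txt file. It assumes each line has the shape `SPEAKER_00: text` with timestamps disabled; the exact line layout of the app's output may differ, and `group_by_speaker` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumed line shape: "SPEAKER_00: some text" (saved with --no-timestamps).
SPEAKER_LINE = re.compile(r"^(SPEAKER_\d+):\s*(.*)$")


def group_by_speaker(transcript: str) -> dict[str, list[str]]:
    """Collect each speaker's utterances in order; unlabeled lines are skipped."""
    turns = defaultdict(list)
    for line in transcript.splitlines():
        match = SPEAKER_LINE.match(line.strip())
        if match:
            speaker, text = match.groups()
            turns[speaker].append(text)
    return dict(turns)
```

The result maps each label to its list of utterances, which is a convenient starting point for renaming speakers or counting turns.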

Troubleshooting

All operations write to %LOCALAPPDATA%\WhisperTranscriber\log.txt. Open the data folder via the Open folder button to inspect logs, clear cached models, or do a full reset (delete the whole folder).

Common issues:

  • Stuck on first launch → check log.txt; usually a network problem during pip install
  • cublas64_12.dll not found → likely you have an NVIDIA GPU but old drivers; update or fall back to CPU
  • Pipeline is None during diarization → token invalid or you haven't accepted all the gated models above
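
To check the log without opening the app, a short script can print its last lines. This is a sketch under a few assumptions: the log lives at the path given above, it is readable as UTF-8 (undecodable bytes are replaced), and the helper names `log_path` and `tail` are invented for this example.

```python
import os
from collections import deque
from pathlib import Path


def log_path() -> Path:
    """Resolve %LOCALAPPDATA%\\WhisperTranscriber\\log.txt (location taken from this README)."""
    return Path(os.environ["LOCALAPPDATA"]) / "WhisperTranscriber" / "log.txt"


def tail(path: Path, count: int = 20) -> list[str]:
    """Return the last `count` lines of a text file."""
    # Assumption: UTF-8 log; bytes that do not decode are replaced instead of raising.
    with path.open(encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

On Windows, `print("\n".join(tail(log_path())))` shows the most recent entries, which is usually enough to spot a failed download or a missing DLL.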

License

MIT © 2026 Davide Fasolo
