Whisper Transcriber

A self-contained Windows desktop app for offline audio transcription, powered by faster-whisper (CTranslate2). Built with .NET 10 + Windows Forms.

It downloads, installs and manages Python and all its dependencies for you. No setup required beyond launching the .exe.

Features

Zero-setup install — first launch downloads embedded Python + faster-whisper + CUDA libraries automatically
NVIDIA GPU acceleration — auto-detects card, picks the right compute_type (special-cased for RTX 50-series Blackwell), with CPU fallback
All Whisper models — tiny, base, small, medium, large-v2, large-v3, large-v3-turbo, turbo
Speaker diarization — optional, via pyannote.audio; guides you through the HuggingFace token + gated-model acceptance flow with clickable links
Initial prompt — pass a vocabulary hint (proper nouns, jargon) to dramatically improve accuracy
10-language UI — Italian, English, Spanish, French, German, Portuguese, Russian, Chinese, Japanese, Arabic
Real-time output — segments appear as they are produced, with optional timestamps
Export — save as .txt or .srt (SubRip subtitles)
Persisted settings — remembers your last model, language, prompt and UI choices
Drag & drop — drop an audio file onto the window
"Open with…" — pass the audio path as the first CLI argument
Powerful CLI — fully scriptable, including auto-start and auto-quit
Single-file portable .exe — ~50 MB, no installer
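For readers unfamiliar with the .srt export mentioned in the feature list, the following is a minimal sketch of how timed segments map onto SubRip cues. It is not the app's own export code: the `Segment` class and both function names are hypothetical, and only the cue layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) comes from the SubRip format itself.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical container for one transcribed segment (times in seconds)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SubRip timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list) -> str:
    """Render segments as numbered SubRip cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

Calling `to_srt([Segment(0.0, 1.25, "Hello")])` yields a single cue numbered 1 that spans `00:00:00,000 --> 00:00:01,250`.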

Requirements

Windows 10/11 (x64)
~1 GB free disk for Python + faster-whisper, plus model size (75 MB – 3 GB)
Optional: NVIDIA GPU with CUDA 12 support for hardware acceleration

Installation

Download WhisperTranscriber.exe from the latest release
Place it anywhere and double-click

The first launch will download Python (~25 MB), faster-whisper, and CUDA libraries (~500 MB) into %LOCALAPPDATA%\WhisperTranscriber\. This takes a few minutes; subsequent launches are instant.

Usage

GUI

Launch the app, pick an audio file (browse or drag & drop), choose a model, then click Transcribe. Output appears live; save to .txt or .srt when done.

CLI

WhisperTranscriber.exe [audio-file] [options]

  -m, --model <name>      Model (tiny, base, small, medium, large-v3, turbo, ...)
  -l, --language <code>   Audio language (it, en, fr, de, es, pt, ru, zh, ja, ar, auto)
  -p, --prompt <text>     Initial prompt (vocabulary hint)
  -d, --diarize           Identify speakers (requires HuggingFace token)
  -o, --output <path>     Save result to file (.txt or .srt)
      --no-timestamps     Output without timestamps (alias --plain)
      --timestamps        Force timestamp inclusion
  -s, --start             Start transcription automatically
  -q, --quit              Close app on completion (implies --start)
  -h, --help              Show help

Examples:

WhisperTranscriber.exe "interview.aac" -m large-v3-turbo -l it -s

WhisperTranscriber.exe "lecture.mp3" -m medium -l en -p "Riemann, eigenvalue, Hilbert" -o "lecture.txt" -q

WhisperTranscriber.exe "meeting.wav" -m large-v3 -d -o "meeting.srt" -q
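Because the CLI can start and quit on its own, it can be driven from a script. The sketch below batches a folder of files from Python, using only the options listed above. It rests on several assumptions: `WhisperTranscriber.exe` is reachable on PATH or in the working directory, the helper names are hypothetical, and the app's exit code is not documented, so the script does not rely on it.

```python
import subprocess
from pathlib import Path
from typing import Optional

# Assumption: the executable is on PATH or in the current working directory.
EXE = "WhisperTranscriber.exe"


def build_command(audio: Path, model: str = "medium", language: str = "auto",
                  prompt: Optional[str] = None, diarize: bool = False) -> list:
    """Build an argument list from the documented options only."""
    output = audio.with_suffix(".srt")  # write the subtitles next to the audio
    cmd = [EXE, str(audio), "-m", model, "-l", language, "-o", str(output)]
    if prompt:
        cmd += ["-p", prompt]
    if diarize:
        cmd.append("-d")
    cmd.append("-q")  # close the app on completion (implies --start)
    return cmd


def transcribe_folder(folder: Path, pattern: str = "*.mp3", **options) -> None:
    """Run one transcription per matching file, one after another."""
    for audio in sorted(folder.glob(pattern)):
        # check=False: the exit code's meaning is not documented, so ignore it.
        subprocess.run(build_command(audio, **options), check=False)
```

For example, `transcribe_folder(Path("recordings"), model="large-v3", language="en")` would process every .mp3 in a hypothetical `recordings` folder sequentially, which avoids loading several models onto the GPU at once.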

"Open with…" integration

Right-click an audio file in Explorer → Open with → Choose another app → browse to WhisperTranscriber.exe. The file is preselected when the app opens.

Speaker diarization setup

When you enable Identify speakers for the first time:

The app asks for a free HuggingFace token (saved locally, never sent anywhere else)
You must accept the user conditions on all three gated pyannote model pages
If a model has not been accepted yet, the app will pop up a dialog with a clickable link straight to that model page

Once configured, every transcription run with diarization enabled prefixes each segment with a speaker label: SPEAKER_00:, SPEAKER_01:, and so on.
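The speaker labels make a saved transcript easy to post-process. As an illustration, the sketch below groups the lines of a plain-text export by speaker. It assumes a no-timestamp .txt export in which each line looks like `SPEAKER_00: some text`; the exact spacing and layout of the real output are not specified here, so the pattern is deliberately lenient, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: one labelled segment per line, e.g. "SPEAKER_00: some text".
SPEAKER_LINE = re.compile(r"^\s*(SPEAKER_\d+):\s*(.*)$")


def group_by_speaker(transcript: str) -> dict:
    """Collect the text of each labelled line under its speaker label."""
    grouped = defaultdict(list)
    for line in transcript.splitlines():
        match = SPEAKER_LINE.match(line)
        if match:  # lines without a speaker label are skipped
            label, text = match.groups()
            grouped[label].append(text.strip())
    return dict(grouped)
```

Passing the contents of a saved transcript returns a dictionary keyed by label, which can then be used to rename SPEAKER_00 and the others to real names.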

Troubleshooting

All operations write to %LOCALAPPDATA%\WhisperTranscriber\log.txt. Open the data folder via the Open folder button to inspect logs, clear cached models, or do a full reset (delete the whole folder).
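If you prefer to inspect the data folder from a script, a sketch along these lines may help. It reports how much disk space each top-level entry uses and shows the end of log.txt. The function names are hypothetical, the layout inside the folder is not assumed, and the log's encoding is a guess (UTF-8, with undecodable bytes replaced).

```python
import os
from pathlib import Path


def data_folder() -> Path:
    # The folder named above: %LOCALAPPDATA% joined with "WhisperTranscriber".
    return Path(os.environ["LOCALAPPDATA"]) / "WhisperTranscriber"


def size_by_entry(root: Path) -> dict:
    """Sum file sizes in bytes for each top-level file or folder under root."""
    sizes = {}
    for entry in root.iterdir():
        if entry.is_dir():
            sizes[entry.name] = sum(
                f.stat().st_size for f in entry.rglob("*") if f.is_file()
            )
        else:
            sizes[entry.name] = entry.stat().st_size
    return sizes


def tail(path: Path, lines: int = 20) -> list:
    """Return the last few lines of a text file such as log.txt."""
    # Assumption: the log is UTF-8; bad bytes are replaced rather than raising.
    text = path.read_text(encoding="utf-8", errors="replace")
    return text.splitlines()[-lines:]
```

On Windows, `size_by_entry(data_folder())` shows which entries take the most space, and `tail(data_folder() / "log.txt")` prints the most recent log lines.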

Common issues:

Stuck on first launch → check log.txt; usually a network problem during pip install
cublas64_12.dll not found → you likely have an NVIDIA GPU with outdated drivers; update the drivers or fall back to CPU
Pipeline is None during diarization → the token is invalid or you haven't accepted all the gated models above

Built with

.NET 10 + Windows Forms
faster-whisper (CTranslate2 backend)
OpenAI Whisper (models)
pyannote.audio (diarization)
Embedded Python 3.11
