Scribus is a small Python toolkit for turning audio into text—and, for longer material, turning YouTube lectures into readable PDFs. It ships as a library plus two entry points: a command-line interface and an optional desktop GUI (PyQt5).
- What it does
- Which path should I use?
- Prerequisites
- Installation
- Quick start
- Desktop application (GUI)
- Command line: YouTube → PDF (Whisper)
- Command line: local audio → text (Google)
- Troubleshooting
- Project layout
- Development
- License and responsible use
| Capability | How it works | Best for |
|---|---|---|
| Local file → text (Google) | pydub loads audio; SpeechRecognition calls Google’s public Web Speech API. | Short clips, quick experiments. Needs internet. Subject to Google’s limits and terms—not a bulk production service. |
| YouTube → PDF (Whisper) | yt-dlp downloads audio; faster-whisper transcribes on your machine; ReportLab writes a structured PDF. | Long talks, lectures, interviews. Works offline after download; first run may fetch Whisper model weights. |
| GUI | PyQt5 front end around the YouTube → Whisper → PDF pipeline. | Users who prefer a window over a terminal. |
You are responsible for complying with YouTube’s Terms of Service, the rights of content owners, and applicable law when downloading or reusing material.
- I have a file on disk and only need a short transcript → use the Google CLI (minimal install is enough).
- I have a YouTube link and want a full session as a PDF → use the GUI or scribus youtube (default requirements.txt or [gui] / [lecture] extras).
- I need maximum control or automation → use the CLI with flags for model, device, and language.
- Python 3.10 or newer
- ffmpeg on your PATH (used by pydub and by yt-dlp’s audio post-processing)
Ubuntu / Debian
sudo apt-get update && sudo apt-get install -y ffmpeg
macOS (Homebrew)
brew install ffmpeg
Windows
Install ffmpeg from a trusted build (for example the official downloads) and ensure the ffmpeg executable is on your PATH. Package managers such as Chocolatey (choco install ffmpeg) or Scoop are common choices.
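Whichever platform you are on, it can help to confirm that ffmpeg is visible from the same environment that will run Scribus. The sketch below uses only the standard library; the helper names find_ffmpeg and ffmpeg_version_line are hypothetical and are not part of Scribus.

```python
import shutil
import subprocess


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


def ffmpeg_version_line(path):
    """Run `<path> -version` and return the first line it prints (or "")."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH; install it as described above.")
    else:
        print(f"Found {path}: {ffmpeg_version_line(path)}")
```

Run it inside the activated virtual environment, since PATH can differ between shells.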
Clone the repository, create a virtual environment, then install the package in editable form so the scribus and scribus-gui commands are available.
requirements.txt lists pydub, SpeechRecognition, yt-dlp, faster-whisper, reportlab, and PyQt5, so one file covers the usual YouTube → PDF and GUI workflows.
git clone https://github.com/Leli254/Scribus.git
cd Scribus
python3 -m venv .venv
source .venv/bin/activate # Windows CMD: .venv\Scripts\activate.bat
# Windows PowerShell: .venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -e .
Equivalent dependency set via pyproject.toml:
python -m pip install -e ".[gui]"
Use [lecture] if you want YouTube + Whisper + PDF without the PyQt5 GUI.
Smaller download; no YouTube, Whisper, PDF export, or GUI:
python -m pip install -r requirements-minimal.txt
python -m pip install -e .
| Extra | Installs |
|---|---|
| [youtube] | yt-dlp |
| [whisper] | faster-whisper |
| [pdf] | reportlab |
| [lecture] | youtube + whisper + pdf |
| [gui] | lecture + PyQt5 |
| [dev] | pytest and reportlab (for tests) |
After Default or Alternative installation:
# Desktop app
scribus-gui
# Or from the repo without activating scripts:
python -m scribus_gui
# YouTube → PDF (replace URL and output path)
scribus youtube "https://www.youtube.com/watch?v=VIDEO_ID" --pdf lecture.pdf --model small
# Local file, one segment, plain text (Google)
python -m scribus path/to/audio.mp3 --start 1:45 --end 1:55 --out transcript.txt
The first Whisper run can download large model files; pick a smaller --model (for example tiny or base) if you are constrained on RAM or disk.
- Start scribus-gui (or python -m scribus_gui).
- Paste a YouTube watch URL.
- Choose the output PDF path (use Browse…).
- Optionally set language (ISO-639-1, for example en); leave blank for automatic detection.
- Choose a Whisper model (small is a reasonable default balance of speed and quality).
- Click Start and watch the log and progress bar.
Cancel asks the pipeline to stop between major steps. Transcription itself may not interrupt instantly inside a single model call.
scribus youtube "https://www.youtube.com/watch?v=VIDEO_ID" --pdf lecture.pdf
| Option | Description |
|---|---|
| --pdf PATH | Required. Output PDF file. |
| --model NAME | Whisper model: tiny, base, small, medium, large-v2, large-v3 (default: small). |
| --language CODE | ISO-639-1 (for example en). Omit for auto-detect. |
| --device VALUE | auto, cpu, or cuda. |
| --compute-type VALUE | Passed to faster-whisper (for example int8, float16); see upstream docs. |
| -v, --verbose | Debug logging. |
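For automation, the options above can be assembled into an argument list and handed to subprocess. This is a minimal sketch, not part of Scribus: build_youtube_command is a hypothetical helper, and the model check simply mirrors the names listed in the table, which the real CLI may or may not enforce.

```python
# Model names copied from the options table (assumption: the list is exhaustive).
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large-v2", "large-v3")


def build_youtube_command(url, pdf_path, model="small", language=None,
                          device=None, compute_type=None, verbose=False):
    """Build the argv list for `scribus youtube` from keyword options."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {KNOWN_MODELS}")
    cmd = ["scribus", "youtube", url, "--pdf", str(pdf_path), "--model", model]
    if language:
        cmd += ["--language", language]        # omit for auto-detect
    if device:
        cmd += ["--device", device]            # auto, cpu, or cuda
    if compute_type:
        cmd += ["--compute-type", compute_type]
    if verbose:
        cmd.append("--verbose")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) to execute it.
    print(build_youtube_command(
        "https://www.youtube.com/watch?v=VIDEO_ID", "lecture.pdf",
        model="base", language="en", device="cpu",
    ))
```

Passing a list rather than a single shell string avoids quoting problems with URLs that contain characters such as & or ?.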
Works with any format pydub can decode (via ffmpeg). Requires network access for Google recognition.
python -m scribus path/to/audio.mp3 --start 1:45 --end 1:55 --out transcript.txt
Shim (same CLI):
python main.py path/to/audio.mp3 --start 1:45 --end 1:55 --out transcript.txt
If the package is not installed but the repo is on disk:
PYTHONPATH=. python -m scribus path/to/audio.mp3 --start 0s --end 30s --out out.txt
| Form | Example | Meaning |
|---|---|---|
| Milliseconds | 90000 or 90000ms | 90 seconds |
| Seconds | 90s or 12.5s | Wall-clock seconds |
| Minutes and seconds | 1:45 | 1 minute 45 seconds |
| Hours (optional) | 0:1:05 | 1 minute 5 seconds |
A bare integer is interpreted as milliseconds (for example 105000 is 1:45).
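To make the accepted time forms concrete, here is one way they could be converted to milliseconds. This is an illustrative sketch written from the table above, not Scribus's actual parser; the function name parse_time_ms is hypothetical, and details such as whether fractional values are allowed in the colon form are assumptions.

```python
import re

_NUMBER = r"\d+(?:\.\d+)?"
_UNIT_RE = re.compile(rf"({_NUMBER})(ms|s)?")
_PART_RE = re.compile(_NUMBER)


def parse_time_ms(text):
    """Convert '90000', '90000ms', '12.5s', '1:45' or '0:1:05' to milliseconds."""
    text = text.strip()
    if ":" in text:
        parts = text.split(":")
        if not 2 <= len(parts) <= 3 or not all(_PART_RE.fullmatch(p) for p in parts):
            raise ValueError(f"invalid time value: {text!r}")
        seconds = 0.0
        for part in parts:                 # [hours,] minutes, seconds
            seconds = seconds * 60 + float(part)
        return round(seconds * 1000)
    match = _UNIT_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid time value: {text!r}")
    value, unit = match.groups()
    if unit == "s":
        return round(float(value) * 1000)
    return round(float(value))             # bare number or explicit 'ms'


if __name__ == "__main__":
    for sample in ("90000", "90000ms", "90s", "12.5s", "1:45", "0:1:05"):
        print(sample, "->", parse_time_ms(sample))
```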
Omit --start and --end to transcribe the entire file in chunks (configurable):
python -m scribus path/to/episode.mp3 --max-chunk-ms 55000 --out whole_episode.txt
| Flag | Purpose |
|---|---|
| --out FILE | UTF-8 transcript with trailing newline |
| --audio-out FILE | Export WAV (selected segment, or full decode in full-file mode) |
| --language CODE | BCP-47 tag (default en-US) |
| --max-chunk-ms N | Maximum chunk length in full-file mode |
| --max-segment-ms N | Maximum allowed span for --start/--end (default: 30 minutes) |
| --allow-huge-segment | Allow spans wider than --max-segment-ms |
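The interaction between --max-chunk-ms, --max-segment-ms, and --allow-huge-segment can be pictured with a small sketch. It assumes fixed-size, back-to-back chunks and a simple span check; Scribus may split or validate differently (for example at silences), and both function names are hypothetical.

```python
DEFAULT_MAX_SEGMENT_MS = 30 * 60 * 1000   # 30 minutes, per the table above


def plan_chunks(total_ms, max_chunk_ms):
    """Split [0, total_ms) into consecutive (start, end) spans of at most max_chunk_ms."""
    if total_ms < 0 or max_chunk_ms <= 0:
        raise ValueError("total_ms must be >= 0 and max_chunk_ms must be > 0")
    return [
        (start, min(start + max_chunk_ms, total_ms))
        for start in range(0, total_ms, max_chunk_ms)
    ]


def check_segment(start_ms, end_ms, max_segment_ms=DEFAULT_MAX_SEGMENT_MS,
                  allow_huge=False):
    """Raise ValueError if the span is empty or wider than the limit."""
    if end_ms <= start_ms:
        raise ValueError("--end must be later than --start")
    if end_ms - start_ms > max_segment_ms and not allow_huge:
        raise ValueError("segment exceeds --max-segment-ms; "
                         "pass --allow-huge-segment to override")


if __name__ == "__main__":
    # A 2-minute file with --max-chunk-ms 55000 yields three chunks.
    print(plan_chunks(120_000, 55_000))
```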
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Validation, I/O, API, or processing error |
| 2 | Invalid CLI usage (argparse) |
| 130 | Interrupted (Ctrl+C) |
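Scripts that wrap the CLI can branch on these exit codes. The following is a minimal sketch under the assumption that the package is importable as python -m scribus in the current interpreter; describe_exit and run_scribus are hypothetical names, not part of Scribus.

```python
import subprocess
import sys

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    0: "Success",
    1: "Validation, I/O, API, or processing error",
    2: "Invalid CLI usage (argparse)",
    130: "Interrupted (Ctrl+C)",
}


def describe_exit(code):
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"Unexpected exit code {code}")


def run_scribus(args):
    """Run `python -m scribus <args>` and return (exit_code, meaning)."""
    completed = subprocess.run([sys.executable, "-m", "scribus", *args])
    return completed.returncode, describe_exit(completed.returncode)


if __name__ == "__main__":
    code, meaning = run_scribus(sys.argv[1:])
    print(f"scribus exited with {code}: {meaning}", file=sys.stderr)
    sys.exit(code)
```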
“yt-dlp is required for YouTube downloads”
You are missing optional dependencies. Install them with:
python -m pip install -r requirements.txt
or python -m pip install -e ".[youtube]" / ".[lecture]" / ".[gui]", then try again.
“Could not decode audio” / ffmpeg errors
Install ffmpeg and confirm ffmpeg -version works in the same terminal (and virtual environment) you use to run Scribus.
Whisper is slow or uses too much memory
Use a smaller --model (tiny, base). On CPU, keep the modest chunking defaults that faster-whisper already uses; on a GPU, set --device cuda, provided your drivers and CUDA build match your environment.
GUI does not start
Confirm that PyQt5 is installed (python -m pip show PyQt5) and that you are launching from the environment where scribus-gui was installed (run pip install -e . after installing the dependencies).
Scribus/
├── Assets/ # Branding and sample notes
├── scribus/ # Core package (CLI, audio, YouTube, Whisper, PDF, pipeline)
├── scribus_gui/ # PyQt5 application
├── tests/ # pytest
├── main.py # Compatibility entry → CLI
├── pyproject.toml
├── requirements.txt # Default (full) dependency set
├── requirements-minimal.txt
├── README.md
└── LICENSE
python -m pip install -e ".[dev]"
pytest
This project is released under the MIT License.
Use of third-party services (Google speech endpoints) and of YouTube content is governed by their respective terms. Scribus is a tool; you remain responsible for how you use it.
