Scribus

Scribus is a small Python toolkit for turning audio into text—and, for longer material, turning YouTube lectures into readable PDFs. It ships as a library plus two entry points: a command-line interface and an optional desktop GUI (PyQt5).




What it does

| Capability | How it works | Best for |
| --- | --- | --- |
| Local file → text (Google) | pydub loads audio; SpeechRecognition calls Google’s public Web Speech API. | Short clips, quick experiments. Needs internet. Subject to Google’s limits and terms; not a bulk production service. |
| YouTube → PDF (Whisper) | yt-dlp downloads audio; faster-whisper transcribes on your machine; ReportLab writes a structured PDF. | Long talks, lectures, interviews. Works offline after download; first run may fetch Whisper model weights. |
| GUI | PyQt5 front end around the YouTube → Whisper → PDF pipeline. | Users who prefer a window over a terminal. |

You are responsible for complying with YouTube’s Terms of Service, the rights of content owners, and applicable law when downloading or reusing material.


Which path should I use?

  • I have a file on disk and only need a short transcript → use the Google CLI (minimal install is enough).
  • I have a YouTube link and want a full session as a PDF → use the GUI or scribus youtube (default requirements.txt or [gui] / [lecture] extras).
  • I need maximum control or automation → use the CLI with flags for model, device, and language.

Prerequisites

  • Python 3.10 or newer
  • ffmpeg on your PATH (used by pydub and by yt-dlp’s audio post-processing)

Installing ffmpeg

Ubuntu / Debian

sudo apt-get update && sudo apt-get install -y ffmpeg

macOS (Homebrew)

brew install ffmpeg

Windows
Install ffmpeg from a trusted build (for example the official downloads) and ensure the ffmpeg executable is on your PATH. Package managers such as Chocolatey (choco install ffmpeg) or Scoop are common choices.


Installation

Clone the repository, create a virtual environment, then install the package in editable form so the scribus and scribus-gui commands are available.

Default (recommended): full stack

requirements.txt lists pydub, SpeechRecognition, yt-dlp, faster-whisper, reportlab, and PyQt5, so one file covers the usual YouTube → PDF and GUI workflows.

git clone https://github.com/Leli254/Scribus.git
cd Scribus
python3 -m venv .venv
source .venv/bin/activate          # Windows CMD: .venv\Scripts\activate.bat
                                   # Windows PowerShell: .venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -e .

Alternative: extras only (no requirements.txt)

Equivalent dependency set via pyproject.toml:

python -m pip install -e ".[gui]"

Use [lecture] if you want YouTube + Whisper + PDF without the PyQt5 GUI.

Minimal: Google-only CLI

Smaller download; no YouTube, Whisper, PDF export, or GUI:

python -m pip install -r requirements-minimal.txt
python -m pip install -e .

Optional dependency groups (reference)

| Extra | Installs |
| --- | --- |
| [youtube] | yt-dlp |
| [whisper] | faster-whisper |
| [pdf] | reportlab |
| [lecture] | youtube + whisper + pdf |
| [gui] | lecture + PyQt5 |
| [dev] | pytest and reportlab (for tests) |

Quick start

After the Default or Alternative installation:

# Desktop app
scribus-gui

# Or from the repo without activating scripts:
python -m scribus_gui

# YouTube → PDF (replace URL and output path)
scribus youtube "https://www.youtube.com/watch?v=VIDEO_ID" --pdf lecture.pdf --model small

# Local file, one segment, plain text (Google)
python -m scribus path/to/audio.mp3 --start 1:45 --end 1:55 --out transcript.txt

The first Whisper run can download large model files; pick a smaller --model (for example tiny or base) if you are constrained on RAM or disk.


Desktop application (GUI)

  1. Start scribus-gui (or python -m scribus_gui).
  2. Paste a YouTube watch URL.
  3. Choose the output PDF path (use Browse…).
  4. Optionally set language (ISO-639-1, for example en); leave blank for automatic detection.
  5. Choose a Whisper model (small is a reasonable default balance of speed and quality).
  6. Click Start and watch the log and progress bar.

Cancel asks the pipeline to stop between major steps. Transcription itself may not interrupt instantly inside a single model call.
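
The behavior described above is characteristic of cooperative cancellation: the worker checks a flag between steps instead of being interrupted in the middle of a call. The sketch below illustrates that general pattern only; it is not taken from the Scribus source, and `run_pipeline`, the step names, and the use of `threading.Event` are assumptions made for the example.

```python
import threading
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step], cancel: threading.Event) -> List[str]:
    """Run steps in order, checking the cancel flag only between steps.

    Returns the names of the steps that completed. A step that is already
    running is never interrupted, which is why a long transcription call
    may not stop immediately.
    """
    completed: List[str] = []
    for name, action in steps:
        if cancel.is_set():  # checked between steps, never inside one
            break
        action()
        completed.append(name)
    return completed


cancel = threading.Event()
# Hypothetical stand-ins for download, transcription, and PDF export.
steps: List[Step] = [
    ("download", lambda: None),
    ("transcribe", cancel.set),  # simulates Cancel being clicked mid-step
    ("write_pdf", lambda: None),
]
print(run_pipeline(steps, cancel))
```

In this run the cancel request arrives while the second step is executing, so that step still finishes and only the final step is skipped.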


Command line: YouTube → PDF (Whisper)

scribus youtube "https://www.youtube.com/watch?v=VIDEO_ID" --pdf lecture.pdf

| Option | Description |
| --- | --- |
| --pdf PATH | Required. Output PDF file. |
| --model NAME | Whisper model: tiny, base, small, medium, large-v2, large-v3 (default: small). |
| --language CODE | ISO-639-1 (for example en). Omit for auto-detect. |
| --device VALUE | auto, cpu, or cuda. |
| --compute-type VALUE | Passed to faster-whisper (for example int8, float16); see upstream docs. |
| -v, --verbose | Debug logging. |
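
For automation, it can help to assemble the command from Python rather than by string concatenation. The following sketch builds an argument list using only the flags listed in the table above; the helper name `build_youtube_command` and its parameters are hypothetical and are not part of Scribus.

```python
import shlex
from typing import List, Optional


def build_youtube_command(
    url: str,
    pdf_path: str,
    model: str = "small",
    language: Optional[str] = None,
    device: Optional[str] = None,
    compute_type: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble an argv list for `scribus youtube` from the documented options."""
    argv = ["scribus", "youtube", url, "--pdf", pdf_path, "--model", model]
    if language:  # leave the flag out to keep automatic language detection
        argv += ["--language", language]
    if device:
        argv += ["--device", device]
    if compute_type:
        argv += ["--compute-type", compute_type]
    if verbose:
        argv.append("--verbose")
    return argv


command = build_youtube_command(
    "https://www.youtube.com/watch?v=VIDEO_ID",
    "lecture.pdf",
    model="base",
    language="en",
    device="cpu",
)
# Print a shell-quoted line; pass `command` to subprocess.run to execute it.
print(shlex.join(command))
```

Passing a list to `subprocess.run` avoids shell quoting problems with URLs that contain `?` or `&`.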

Command line: local audio → text (Google)

Works with any format pydub can decode (via ffmpeg). Requires network access for Google recognition.

python -m scribus path/to/audio.mp3 --start 1:45 --end 1:55 --out transcript.txt

Shim (same CLI):

python main.py path/to/audio.mp3 --start 1:45 --end 1:55 --out transcript.txt

If the package is not installed but the repo is on disk:

PYTHONPATH=. python -m scribus path/to/audio.mp3 --start 0s --end 30s --out out.txt

Time formats for --start / --end

| Form | Example | Meaning |
| --- | --- | --- |
| Milliseconds | 90000 or 90000ms | 90 seconds |
| Seconds | 90s or 12.5s | Seconds (fractions allowed) |
| Minutes and seconds | 1:45 | 1 minute 45 seconds |
| Hours (optional) | 0:1:05 | 0 hours, 1 minute, 5 seconds |

A bare integer is interpreted as milliseconds (for example 105000 is 1:45).
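
To make the rules above concrete, here is a minimal sketch of a parser that accepts the documented forms and returns milliseconds. It is an illustration written from the table, not the parser Scribus ships; the function name `parse_time_ms` and its error messages are assumptions, and the real CLI may accept or reject edge cases differently.

```python
def parse_time_ms(text: str) -> int:
    """Convert a --start/--end style value to milliseconds (illustrative only)."""
    value = text.strip().lower()
    if not value:
        raise ValueError("empty time value")
    if ":" in value:
        parts = value.split(":")
        if len(parts) not in (2, 3):
            raise ValueError(f"expected M:SS or H:M:SS, got {text!r}")
        seconds = 0.0
        for part in parts:  # fold hours -> minutes -> seconds
            seconds = seconds * 60 + float(part)
        return round(seconds * 1000)
    if value.endswith("ms"):  # must be tested before the plain "s" suffix
        return round(float(value[:-2]))
    if value.endswith("s"):
        return round(float(value[:-1]) * 1000)
    return int(value)  # a bare integer means milliseconds


for sample in ("90000", "90000ms", "90s", "12.5s", "1:45", "0:1:05"):
    print(sample, "->", parse_time_ms(sample))
```

Checking the `ms` suffix before the `s` suffix matters, because every value ending in `ms` also ends in `s`.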

Full file with Google (chunked)

Omit --start and --end to transcribe the entire file in chunks (configurable):

python -m scribus path/to/episode.mp3 --max-chunk-ms 55000 --out whole_episode.txt
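
The effect of `--max-chunk-ms` can be pictured as cutting the file into consecutive windows no longer than the limit. The sketch below assumes fixed-length, non-overlapping chunks; that is an assumption for illustration, and Scribus may choose its boundaries differently (for example, at pauses). The name `chunk_ranges` is hypothetical.

```python
from typing import Iterator, Tuple


def chunk_ranges(total_ms: int, max_chunk_ms: int) -> Iterator[Tuple[int, int]]:
    """Yield (start_ms, end_ms) windows that cover [0, total_ms) in order."""
    if max_chunk_ms <= 0:
        raise ValueError("max_chunk_ms must be positive")
    for start in range(0, total_ms, max_chunk_ms):
        # The last window is shortened so it never runs past the end.
        yield start, min(start + max_chunk_ms, total_ms)


# A 2-minute file with the 55-second limit used in the command above.
print(list(chunk_ranges(120_000, 55_000)))
# [(0, 55000), (55000, 110000), (110000, 120000)]
```

Because a pydub `AudioSegment` is sliced by milliseconds, each pair could be applied as `audio[start:end]` to obtain the corresponding piece.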

Useful flags (Google path)

| Flag | Purpose |
| --- | --- |
| --out FILE | UTF-8 transcript with trailing newline |
| --audio-out FILE | Export WAV (selected segment, or full decode in full-file mode) |
| --language CODE | BCP-47 tag (default en-US) |
| --max-chunk-ms N | Maximum chunk length in full-file mode |
| --max-segment-ms N | Maximum allowed span for --start/--end (default: 30 minutes) |
| --allow-huge-segment | Allow spans wider than --max-segment-ms |
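
The interaction between `--max-segment-ms` and `--allow-huge-segment` amounts to a span check on the selected range. The following sketch shows one way such a check could look; `check_segment` is a hypothetical helper, and the requirement that the end lies after the start is an assumption rather than documented Scribus behavior.

```python
DEFAULT_MAX_SEGMENT_MS = 30 * 60 * 1000  # 30 minutes, the documented default


def check_segment(
    start_ms: int,
    end_ms: int,
    max_segment_ms: int = DEFAULT_MAX_SEGMENT_MS,
    allow_huge: bool = False,
) -> int:
    """Return the span in milliseconds, or raise ValueError if it is not allowed."""
    if start_ms < 0 or end_ms <= start_ms:
        raise ValueError("end must be after start, and start must not be negative")
    span = end_ms - start_ms
    if span > max_segment_ms and not allow_huge:
        raise ValueError(
            f"segment of {span} ms exceeds the {max_segment_ms} ms limit; "
            "raise the limit or allow huge segments"
        )
    return span


print(check_segment(105_000, 115_000))  # the 1:45 to 1:55 example: a 10 s span
```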

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Validation, I/O, API, or processing error |
| 2 | Invalid CLI usage (argparse) |
| 130 | Interrupted (Ctrl+C) |
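
When Scribus is driven from a script, the exit codes above can be turned into readable messages. This is a small sketch under that assumption; `describe_exit` is a hypothetical helper, and the child process here is a stand-in that simply exits with status 2 so the example runs without Scribus installed.

```python
import subprocess
import sys

# Meanings copied from the exit code table above.
EXIT_MEANINGS = {
    0: "success",
    1: "validation, I/O, API, or processing error",
    2: "invalid CLI usage (argparse)",
    130: "interrupted (Ctrl+C)",
}


def describe_exit(code: int) -> str:
    """Map a process return code to the documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


# Stand-in child process that exits with status 2; replace the argv list
# with a real `scribus ...` command to check an actual run.
result = subprocess.run([sys.executable, "-c", "raise SystemExit(2)"], check=False)
print(result.returncode, "->", describe_exit(result.returncode))
```

Using `check=False` keeps `subprocess.run` from raising on a non-zero status, so the script can decide how to react to each code.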

Troubleshooting

“yt-dlp is required for YouTube downloads”
You are missing optional dependencies. Install them with:

python -m pip install -r requirements.txt

or python -m pip install -e ".[youtube]" / ".[lecture]" / ".[gui]", then try again.
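
Before reinstalling, it can be useful to see which optional packages the active environment is actually missing. The sketch below checks import names with `importlib.util.find_spec`; the helper `missing_packages` is hypothetical, and the mapping lists the import names of the optional dependencies named in this README.

```python
import importlib.util
from typing import Dict, List

# Distribution name (as installed with pip) -> top-level import name.
OPTIONAL_MODULES: Dict[str, str] = {
    "yt-dlp": "yt_dlp",
    "faster-whisper": "faster_whisper",
    "reportlab": "reportlab",
    "PyQt5": "PyQt5",
}


def missing_packages(modules: Dict[str, str] = OPTIONAL_MODULES) -> List[str]:
    """Return the distribution names whose import name cannot be found."""
    return [
        dist
        for dist, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


print(missing_packages() or "all optional dependencies found")
```

Run it with the same interpreter that runs Scribus (for example, inside the activated virtual environment), since each environment has its own set of installed packages.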

“Could not decode audio” / ffmpeg errors
Install ffmpeg and confirm ffmpeg -version works in the same terminal (and virtual environment) you use to run Scribus.
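
The same check can be made from Python, which confirms that the interpreter running Scribus sees ffmpeg on its PATH. This is a minimal sketch; the helper name `tool_version_line` is an assumption made for the example.

```python
import shutil
import subprocess
from typing import Optional


def tool_version_line(name: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<name> -version`, or None if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


version = tool_version_line()
print(version if version is not None else "ffmpeg was not found on PATH")
```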

Whisper is slow or uses too much memory
Use a smaller --model (tiny or base). On CPU, keep the modest chunking defaults that faster-whisper already uses; on GPU, set --device cuda, provided your drivers and CUDA build match your environment.

GUI does not start
Confirm that PyQt5 is installed (python -m pip show PyQt5) and that you are launching from the environment where scribus-gui was installed (pip install -e . after the dependencies).


Project layout

Scribus/
├── Assets/                 # Branding and sample notes
├── scribus/                # Core package (CLI, audio, YouTube, Whisper, PDF, pipeline)
├── scribus_gui/            # PyQt5 application
├── tests/                  # pytest
├── main.py                 # Compatibility entry → CLI
├── pyproject.toml
├── requirements.txt        # Default (full) dependency set
├── requirements-minimal.txt
├── README.md
└── LICENSE

Development

python -m pip install -e ".[dev]"
pytest

License and responsible use

This project is released under the MIT License.

Use of third-party services (Google speech endpoints) and of YouTube content is governed by their respective terms. Scribus is a tool; you remain responsible for how you use it.

