Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,6 @@ _site
.tmp/
vendor/bundle
.bundle
__pycache__/
*.py[cod]
*$py.class
84 changes: 84 additions & 0 deletions scripts/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,90 @@

This directory contains scripts for managing and converting transcripts for Gary's Economics.

## download.py

This script downloads YouTube video transcripts from the Gary's Economics channel and saves them in the `./transcripts` folder format with metadata and VTT files.

### What it does:

1. Fetches the list of all videos from the YouTube channel
2. For each video, downloads metadata (title, views, likes, etc.)
3. Downloads English subtitles (preferring manual subtitles over auto-generated)
4. Saves data to `./transcripts/YYYY-MM-DD-slug/` folders with:
- `meta.json`: Video metadata
- `transcript.vtt`: WebVTT subtitle file

### Usage:

```bash
python3 scripts/download.py [--offset N]
```

**Options:**
- `--offset N`: Start processing from video number N (useful for resuming interrupted downloads)

### Notes:

- Requires `yt-dlp` to be installed
- Skips videos that already have transcript folders
- Applies rate limiting (0.5s delay between videos) to avoid API restrictions
- Prefers subtitles in order: en-GB → en-orig → en
- Downloads both manual and auto-generated subtitles if manual are not available

## update.py

This script combines the functionality of `download.py` and `convert_transcripts.py`, downloading transcripts and directly creating Jekyll posts and VTT caption files. It also supports updating existing posts with current view counts and like counts.

### What it does:

1. Fetches the list of all videos from the YouTube channel
2. Identifies existing posts by reading all `youtube_id` values from `_posts/` files
3. For videos **not yet in the site**:
- Downloads metadata and subtitles from YouTube
- Creates Jekyll post in `_posts/` with YAML front matter
- Copies VTT file to `_includes/captions/` named by YouTube ID
4. When using `--update` flag, for **all existing videos**:
- Fetches current metadata from YouTube
- Updates `view_count` and `like_count` in the corresponding `_posts/` file

### Usage:

```bash
# Download new videos and create posts
python3 scripts/update.py

# Update view and like counts for all existing posts
python3 scripts/update.py --update

# Start from a specific offset
python3 scripts/update.py --offset 50
```

**Options:**
- `--update`: Update existing posts with current view_count and like_count
- `--offset N`: Start processing from video number N

### Output Format:

**Post files** (`_posts/YYYY-MM-DD-slug.md`):
- YAML front matter with metadata (title, date, youtube_id, view_count, like_count, etc.)
- Title and descriptions properly escaped for YAML
- Tags and categories as lists
- Reference to caption file

**Caption files** (`_includes/captions/YOUTUBE_ID.vtt`):
- WebVTT format subtitle files
- Named by YouTube video ID for easy reference

### Notes:

- Requires `yt-dlp` to be installed
- Creates `_posts/` and `_includes/captions/` directories if they don't exist
- Special characters in titles and descriptions are properly escaped
- Multi-line descriptions use YAML literal block scalar format
- Applies rate limiting (0.5s delay between videos)
- In update mode, only fetches metadata (faster than full download)

## convert_transcripts.py

This script converts transcript data from the `./transcripts` folder into the Jekyll format used by `_posts` and `_includes/captions`.
Expand Down
Loading