Add script to process YouTube videos and linter for post/caption consistency #15

Open
Copilot wants to merge 4 commits into main from copilot/add-youtube-vtt-download-script

Copilot AI commented Oct 26, 2025

Contributor
  • Explore repository structure and understand existing code
  • Create process_video.py script to download VTT files and create Jekyll post
  • Create linter.py script to check consistency between posts and captions
  • Test linter.py script (passes with 349 matching posts/VTT files)
  • Update scripts/README.md to document both new scripts
  • Address code review feedback (imports, exceptions, .gitignore)
  • Pass CodeQL security scan (0 alerts)
  • Inline extract_youtube_id_from_vtt_filename function (requested in PR comment)
  • Final verification of all changes

Summary

Successfully implemented two new Python scripts and addressed all feedback:

process_video.py (302 lines) - Downloads the VTT caption files for a YouTube video using yt-dlp and creates a Jekyll post with complete frontmatter
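For readers unfamiliar with the download step, here is a minimal sketch of how a caption-only yt-dlp call might be assembled and skipped when files already exist. It is not the code from process_video.py: the function names, the output template, and the `<id>.<lang>.vtt` naming are assumptions for illustration, and the exact command in issue #14 may use different options.

```python
from pathlib import Path


def build_subtitle_command(video_id: str, captions_dir: Path) -> list[str]:
    """Build a yt-dlp invocation that fetches VTT subtitles only (hypothetical helper)."""
    return [
        "yt-dlp",
        "--skip-download",      # fetch captions, not the video itself
        "--write-subs",         # uploaded subtitles
        "--write-auto-subs",    # auto-generated subtitles
        "--sub-format", "vtt",
        "--sub-langs", "all",
        # Assumed layout: subtitle files land as <id>.<lang>.vtt in captions_dir.
        "-o", str(captions_dir / "%(id)s.%(ext)s"),
        f"https://www.youtube.com/watch?v={video_id}",
    ]


def has_captions(video_id: str, captions_dir: Path) -> bool:
    """Return True if at least one VTT file for this video already exists."""
    return any(captions_dir.glob(f"{video_id}.*.vtt"))
```

A caller could check `has_captions(...)` first and pass the command list to `subprocess.run(..., check=True)` only when it returns False, which matches the "download if they do not already exist" requirement in the prompt.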

linter.py (172 lines) - Validates consistency between _posts/ and _includes/captions/ directories
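The two-way check the linter performs can be pictured as a pair of set differences. The sketch below is only an illustration under that assumption; `find_mismatches` is a hypothetical name, and how linter.py actually collects the IDs from posts and captions is not shown in this thread.

```python
def find_mismatches(
    post_ids: set[str], caption_ids: set[str]
) -> tuple[list[str], list[str]]:
    """Compare YouTube IDs found in _posts/ with those found in _includes/captions/.

    Returns (posts_without_captions, captions_without_posts), each sorted.
    """
    posts_without_captions = sorted(post_ids - caption_ids)
    captions_without_posts = sorted(caption_ids - post_ids)
    return posts_without_captions, captions_without_posts


if __name__ == "__main__":
    # Illustrative IDs only; the real scripts read them from the repository.
    missing_captions, missing_posts = find_mismatches(
        {"Ja9dTjY3uWU", "aaaaaaaaaaa"}, {"Ja9dTjY3uWU", "bbbbbbbbbbb"}
    )
    print("Posts without captions:", missing_captions)
    print("Captions without posts:", missing_posts)
```

A linter built this way would typically exit with a non-zero status when either list is non-empty, so it can gate CI.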

Latest change: Merged extract_youtube_id_from_vtt_filename() inline into get_all_vtt_files() since it was only called once, simplifying the code structure.
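To make the inlining concrete, here is a rough sketch of what a `get_all_vtt_files()` with the ID extraction folded in could look like. The signature, return type, and the `<youtube_id>.<lang>.vtt` filename convention are assumptions; the actual function in linter.py may differ.

```python
from collections import defaultdict
from pathlib import Path


def get_all_vtt_files(captions_dir: Path) -> dict[str, list[Path]]:
    """Map each YouTube ID to the VTT files found for it (assumed shape)."""
    vtt_files: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(captions_dir.glob("*.vtt")):
        # Inlined ID extraction: take everything before the first dot,
        # assuming names like "<youtube_id>.<lang>.vtt".
        youtube_id = path.name.split(".", 1)[0]
        vtt_files[youtube_id].append(path)
    return dict(vtt_files)
```

With a single call site, keeping the one-line extraction inside the loop avoids an extra function without hiding any logic.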

Original prompt

for youtube video id Ja9dTjY3uWU, write a new file in /scripts with a python script, re-using the existing code files in this folder in other files and passing the video ID as an arg, to download all the VTT files if they do not already exist, using the yt-dlp command in issue #14, and also create the /_posts/ jekyll post markdown file with frontmatter for this video, with all the metadata provided in all other /_posts files in the frontmatter. Update the ./scripts/README to explain what it does using the same structure as that file's existing information for the existing code files.

Then write another new file in the /scripts folder called linter.py that checks that for each transcript file in /_includes/captions there is a /_posts/ file with the same youtube ID, and for each /_posts/ file there is at least one VTT transcript file in /_includes/captions with the youtube ID as the $BASE_NAME.



Copilot AI and others added 2 commits October 26, 2025 22:52
Co-authored-by: davelab6 <261579+davelab6@users.noreply.github.com>
…handling, add __pycache__ to .gitignore

Co-authored-by: davelab6 <261579+davelab6@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add script to download VTT files and create Jekyll post" to "Add script to process YouTube videos and linter for post/caption consistency" Oct 26, 2025
Copilot AI requested a review from davelab6 October 26, 2025 23:01
@davelab6

Contributor

@copilot since extract_youtube_id_from_vtt_filename() is only called once, merge it into the function that calls it.

Co-authored-by: davelab6 <261579+davelab6@users.noreply.github.com>

Copilot AI commented Oct 26, 2025

Contributor Author

@copilot since extract_youtube_id_from_vtt_filename() is only called once, merge it into the function that calls it.

Done in 14d9644. The function is now inlined into get_all_vtt_files() where it's used.

@davelab6 davelab6 marked this pull request as ready for review October 26, 2025 23:20
2 participants