Reference health: scanner, CI workflow, and bib cleanup #38
Merged
Adds database/pyscripts/check_references.py, a standalone scanner that parses papers.bib, HEAD-checks every URL in url/pdf/html fields, and writes a structured report to database/build/reference_check.json. Supports --recover to query the Wayback Machine availability API for broken NERC URLs. Wired into the CLI as `python pswiki.py check-refs`.

Scan findings fixed in papers.bib:
- 5 broken NERC URLs replaced with Wayback Machine archive snapshots (nerc2011ancillary, nerc2013terminology, nerc2010flexible, nerc2023faq, and non-NERC wollenberg2015cimhistory)
- 2 PJM manual entries pinned from floating current-doc URLs to versioned archive/ paths (pjm2025m3v69 → m03v69, pjm2024m14b → m14bv57)
- 2 unrecoverable entries annotated with note fields (nerc2015bal0011, nyiso2024ancillary)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
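For readers who want a feel for the extraction and recovery steps described in the commit above, here is a minimal sketch. It is not the code in check_references.py: the regex-based parsing, the function names (`extract_urls`, `wayback_query_url`, `closest_snapshot`), and the one-field-per-line assumption are all illustrative. The only external detail relied on is the public Wayback Machine availability endpoint and its `archived_snapshots.closest` response shape. No network request is made here.

```python
import json
import re
from urllib.parse import urlencode

# Assumption: each field sits on one line as `name = {value}` or `name = "value"`.
FIELD_RE = re.compile(
    r'^\s*(url|pdf|html)\s*=\s*[{"]([^}"]+)[}"]', re.IGNORECASE | re.MULTILINE
)
# Matches the start of an entry such as `@misc{some_key,` and captures the key.
ENTRY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def extract_urls(bib_text):
    """Return {citation_key: {field: url}} for http(s) values in url/pdf/html."""
    found = {}
    starts = [(m.start(), m.group(1)) for m in ENTRY_RE.finditer(bib_text)]
    for i, (pos, key) in enumerate(starts):
        # An entry's body runs until the next entry starts (or end of file).
        end = starts[i + 1][0] if i + 1 < len(starts) else len(bib_text)
        fields = {
            name.lower(): value.strip()
            for name, value in FIELD_RE.findall(bib_text[pos:end])
            if value.strip().startswith(("http://", "https://"))
        }
        if fields:
            found[key] = fields
    return found


def wayback_query_url(url):
    """Build the availability-API query URL for a possibly broken link."""
    return "https://archive.org/wayback/available?" + urlencode({"url": url})


def closest_snapshot(response_text):
    """Return the closest archived snapshot URL from an API response, or None."""
    closest = json.loads(response_text).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

A real scanner would fetch `wayback_query_url(...)` and pass the response body to `closest_snapshot`; a dedicated BibTeX parser would be more robust than the regexes used here.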
Adds .github/workflows/check-references.yml: quarterly scan (Jan/Apr/Jul/Oct 8) plus push trigger on papers.bib changes. Two jobs: scan (runs check-refs --recover, uploads artifact) and report (renders and updates a persistent tracking GitHub Issue with broken refs grouped by recoverability, involved wiki terms, and a collapsed agent-instructions JSON block for skill pick-up).

Extends check_references.py with find_citing_terms() so each broken entry in the JSON report now carries a used_by list of _wiki/*.md paths.

Adds render_ref_issue.py, the standalone renderer called by the workflow to generate the issue body.

Also adds changelog entry for all reference health work on this branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
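The used_by lookup described above could work roughly as follows. This is a hypothetical sketch, not the find_citing_terms() from this PR: the name `find_citing_terms_sketch`, the whole-token matching rule, and the flat `*.md` glob are assumptions, since the commit does not say how citations are written in the wiki files.

```python
import re
from pathlib import Path


def find_citing_terms_sketch(key, wiki_dir):
    """Return sorted paths (relative to wiki_dir's parent) of .md files citing `key`.

    Assumption: a citation is any occurrence of the key as a whole token,
    i.e. not embedded in a longer identifier.
    """
    pattern = re.compile(r"(?<![\w-])" + re.escape(key) + r"(?![\w-])")
    root = Path(wiki_dir)
    hits = []
    for path in sorted(root.glob("*.md")):
        if pattern.search(path.read_text(encoding="utf-8")):
            # Report paths like "_wiki/term.md" rather than absolute paths.
            hits.append(path.relative_to(root.parent).as_posix())
    return hits
```

Matching on whole tokens keeps a key such as `nerc2015bal0011` from also matching a longer key that merely starts with it.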
- Remove pdf field from 64 entries where it duplicated url (al-folio display artifact)
- Remove html field from all 21 entries (al-folio display artifact; url is the canonical field)
- Remove bibtex_show = {true} from all 123 entries (stripped by bib2json.py, never surfaced)
- Clean url for giraldez2024large and shah2024interconnection (removed ephemeral refresh params)
- Fix matpowerv71: url now points to versioned v7.1 PDF; note field records documentation index
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
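The cleanup rules in the commit above can be pictured as a small transformation over parsed entries. The sketch below assumes each entry is already a plain dict of field names to values; `sanitize_entry` and `strip_query` are illustrative names, and dropping the entire query string is an assumption, since the commit does not list which refresh parameters were removed.

```python
from urllib.parse import urlsplit, urlunsplit


def sanitize_entry(fields):
    """Return a copy of a bib entry's fields with al-folio display fields removed."""
    cleaned = dict(fields)
    # Drop pdf only when it merely duplicates url.
    if "pdf" in cleaned and cleaned["pdf"] == cleaned.get("url"):
        del cleaned["pdf"]
    # html and bibtex_show are removed unconditionally.
    cleaned.pop("html", None)
    cleaned.pop("bibtex_show", None)
    return cleaned


def strip_query(url):
    """Remove the query string from a URL, keeping scheme, host, path, and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))
```

A `pdf` field that points somewhere other than `url` is left alone, which matches the commit's wording that only duplicates were removed.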
- Update nerc2015bal0011 URL to NERCipedia archive (NERC.com PDF was broken)
- Add "Inactive." to CPS2 description to flag superseded status
- Remove stale danger block warning about unavailable reference
- Remove erroneous page reference (p3) from BAL-001-1 citation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Reclassify HTTP 403 as server_error (bot-blocking CDNs and DOI resolvers return 403 to automated clients but work in browsers; wrong to flag as broken)
- Sort issue table rows so entries with no used_by terms appear last
- Update section heading to "5xx / 403 / timeout" to match new classification

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
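A minimal sketch of the reclassification and row ordering described above. Only the `server_error` label and the `used_by` field come from the commit; the other category names (`ok`, `broken`, `timeout`), the function names, and the use of `None` to represent a timeout are assumptions made for illustration.

```python
def classify_status(status):
    """Map an HTTP status (or None for a timeout) to a report category."""
    if status is None:
        return "timeout"  # assumed label; grouped with 5xx / 403 in the issue heading
    if 200 <= status < 400:
        return "ok"
    # 403 is treated like a 5xx: bot-blocking servers often return it to
    # automated clients even though the link works in a browser.
    if status == 403 or status >= 500:
        return "server_error"
    return "broken"


def sort_rows(rows):
    """Order rows so entries with an empty or missing used_by list come last."""
    # sorted() is stable, so rows keep their original relative order otherwise.
    return sorted(rows, key=lambda row: not row.get("used_by"))
```

With this ordering, references that are actually cited in wiki terms surface at the top of the issue table.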
- Document check-refs command in Developer CLI section of development.md
- Add bib sanitization and scanner false-positive fix to changelog
- Add check-refs to CLI command list in changelog entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- Scanner (`python pswiki.py check-refs`): scans all `papers.bib` URLs for broken links, with optional Wayback Machine recovery for NERC entries; reclassifies HTTP 403 as "possibly temporary" to avoid false positives from bot-blocking CDNs (PJM, DOI resolvers)
- CI workflow (`.github/workflows/check-references.yml`): runs on schedule and on `papers.bib` push; surfaces results in a persistent GitHub Issue (#37) with a structured agent-instructions block
- `papers.bib` fixes: resolved 7 broken citations (5 NERC → Wayback Machine archives, 2 PJM floating → versioned archive paths); fixed 2 ESIG entries with ephemeral query params; updated `nerc2015bal0011` to NERCipedia archive; fixed `nerc2013terminology` with Wayback Machine snapshot
- `papers.bib` sanitization: removed al-folio display fields (`pdf`, `html`, `bibtex_show`) no longer used by the MkDocs renderer
- `check-refs` added to Developer CLI section in `docs/development.md`; changelog updated

Closes #11
Closes #37
Test plan
- […] `main` touching `papers.bib`)
- `reference_check.json` artifact is uploaded and downloadable from the Actions run
- Run `python pswiki.py check-refs` locally and confirm no regressions

🤖 Generated with Claude Code