Reference health: scanner, CI workflow, and bib cleanup#38

Merged: hanzelei merged 7 commits into main from fix/broken-references on May 27, 2026

hanzelei commented on May 27, 2026

Summary

  • Reference URL scanner (python pswiki.py check-refs) — scans all papers.bib URLs for broken links, with optional Wayback Machine recovery for NERC entries; reclassifies HTTP 403 as "possibly temporary" to avoid false positives from bot-blocking CDNs (PJM, DOI resolvers)
  • Quarterly CI workflow (.github/workflows/check-references.yml) — runs on schedule and on papers.bib push; surfaces results in a persistent GitHub Issue (#37) with structured agent instructions block
  • papers.bib fixes — resolved 7 broken citations (5 NERC → Wayback Machine archives, 2 PJM floating → versioned archive paths); fixed 2 ESIG entries with ephemeral query params; updated nerc2015bal0011 to NERCipedia archive; fixed nerc2013terminology with Wayback Machine snapshot
  • papers.bib sanitization — removed al-folio display fields (pdf, html, bibtex_show) no longer used by the MkDocs renderer
  • Term updates — CPS2 description updated (marked Inactive, stale danger block removed); operating-reserve reference updated
  • Docs: check-refs added to the Developer CLI section in docs/development.md; changelog updated

Closes #11
Closes #37

Test plan

  • Confirm CI workflow triggers on merge (push to main touching papers.bib)
  • Verify reference_check.json artifact is uploaded and downloadable from the Actions run
  • Verify tracking issue body is updated and a comment is posted with the run link
  • Spot-check that PJM and DOI URLs no longer appear in the "broken" table (should be "possibly temporary" or absent)
  • Run python pswiki.py check-refs locally and confirm no regressions

🤖 Generated with Claude Code

hanzelei and others added 7 commits May 27, 2026 12:58
Adds database/pyscripts/check_references.py — a standalone scanner that
parses papers.bib, HEAD-checks every URL in url/pdf/html fields, and
writes a structured report to database/build/reference_check.json.
Supports --recover to query the Wayback Machine availability API for
broken NERC URLs. Wired into the CLI as `python pswiki.py check-refs`.

Scan findings fixed in papers.bib:
- 5 broken URLs replaced with Wayback Machine archive snapshots
  (NERC entries nerc2011ancillary, nerc2013terminology, nerc2010flexible,
  nerc2023faq, and the non-NERC entry wollenberg2015cimhistory)
- 2 PJM manual entries pinned from floating current-doc URLs to versioned
  archive/ paths (pjm2025m3v69 → m03v69, pjm2024m14b → m14bv57)
- 2 unrecoverable entries annotated with note fields
  (nerc2015bal0011, nyiso2024ancillary)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
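
A minimal sketch of the kind of lookup `--recover` performs, assuming the public Wayback Machine availability endpoint (`https://archive.org/wayback/available`) and its documented JSON response shape. The function names `build_availability_url`, `closest_snapshot`, and `recover` are hypothetical and are not taken from check_references.py.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

WAYBACK_API = "https://archive.org/wayback/available"


def build_availability_url(target_url: str) -> str:
    """Build the availability-API query URL for one possibly broken link."""
    return WAYBACK_API + "?" + urllib.parse.urlencode({"url": target_url})


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest archived snapshot URL from an API response, if any."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None  # no snapshot recorded for this URL


def recover(target_url: str, timeout: float = 10.0) -> Optional[str]:
    """Query the API over the network and return a snapshot URL, or None."""
    query = build_availability_url(target_url)
    with urllib.request.urlopen(query, timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Keeping the response parsing (`closest_snapshot`) separate from the network call makes the recovery logic testable without contacting archive.org.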
Adds .github/workflows/check-references.yml: quarterly scan (Jan/Apr/Jul/Oct 8)
plus push trigger on papers.bib changes. Two jobs: scan (runs check-refs --recover,
uploads artifact) and report (renders and updates a persistent tracking GitHub Issue
with broken refs grouped by recoverability, involved wiki terms, and a collapsed
agent-instructions JSON block for skill pick-up).

Extends check_references.py with find_citing_terms() so each broken entry in the
JSON report now carries a used_by list of _wiki/*.md paths. Adds render_ref_issue.py,
the standalone renderer called by the workflow to generate the issue body.

Also adds changelog entry for all reference health work on this branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
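
The `used_by` lookup could be sketched as below. This is an illustration only: it assumes a citation key appears verbatim in the Markdown source of each citing term, which the commit message does not confirm, and `find_citing_terms_sketch` is a hypothetical stand-in rather than the actual `find_citing_terms()` implementation.

```python
import re
from pathlib import Path
from typing import List


def find_citing_terms_sketch(bib_key: str, wiki_dir: Path) -> List[str]:
    """Return sorted paths of *.md files in wiki_dir that mention bib_key."""
    # Reject matches embedded in a longer key (e.g. "nerc2015bal00112").
    pattern = re.compile(r"(?<![\w-])" + re.escape(bib_key) + r"(?![\w-])")
    hits = []
    for md_file in sorted(wiki_dir.glob("*.md")):
        text = md_file.read_text(encoding="utf-8", errors="replace")
        if pattern.search(text):
            hits.append(md_file.as_posix())
    return hits
```

For example, `find_citing_terms_sketch("nerc2015bal0011", Path("_wiki"))` would list every term page that cites that entry, which is the information the issue table needs to show which wiki terms a broken reference affects.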
- Remove pdf field from 64 entries where it duplicated url (al-folio display artifact)
- Remove html field from all 21 entries (al-folio display artifact; url is the canonical field)
- Remove bibtex_show = {true} from all 123 entries (stripped by bib2json.py, never surfaced)
- Clean url for giraldez2024large and shah2024interconnection (removed ephemeral refresh params)
- Fix matpowerv71: url now points to versioned v7.1 PDF; note field records documentation index

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
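
The field cleanup described above can be sketched on entries represented as plain dicts. This is an assumption-laden illustration: the real edits were made to papers.bib directly, `sanitize_entry` and `strip_query` are hypothetical helpers, and dropping the entire query string is a simplification of "removed ephemeral refresh params" (the actual parameter names are not given in the commit).

```python
from urllib.parse import urlsplit, urlunsplit

# Fields the commit removes unconditionally (al-folio display artifacts).
DISPLAY_ONLY_FIELDS = ("html", "bibtex_show")


def sanitize_entry(entry: dict) -> dict:
    """Return a copy of a bib entry without al-folio display fields."""
    cleaned = {k: v for k, v in entry.items() if k not in DISPLAY_ONLY_FIELDS}
    # Drop pdf only when it merely duplicates url; a distinct pdf is kept.
    if "pdf" in cleaned and cleaned["pdf"] == cleaned.get("url"):
        del cleaned["pdf"]
    return cleaned


def strip_query(url: str) -> str:
    """Remove the query string from a URL, keeping path and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))
```

Returning a new dict instead of mutating the input keeps the original entry available for a before/after diff.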
- Update nerc2015bal0011 URL to NERCipedia archive (NERC.com PDF was broken)
- Add "Inactive." to CPS2 description to flag superseded status
- Remove stale danger block warning about unavailable reference
- Remove erroneous page reference (p3) from BAL-001-1 citation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Reclassify HTTP 403 as server_error (bot-blocking CDNs and DOI resolvers
  return 403 to automated clients but work in browsers, so flagging them as
  broken is a false positive)
- Sort issue table rows so entries with no used_by terms appear last
- Update section heading to "5xx / 403 / timeout" to match new classification

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
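
The reclassification and row ordering in this commit might look roughly like the sketch below. Only the `server_error` label and the `used_by` field come from the commit message; the `ok` and `broken` labels, the function names, and the use of `urllib` are assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import List, Optional


def head_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Send a HEAD request; return the HTTP status, or None on network failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as exceptions
    except OSError:
        return None  # URLError, timeouts, and resets are OSError subclasses


def classify_status(status: Optional[int]) -> str:
    """Map a HEAD result to a report category."""
    if status is None:
        return "server_error"  # timeout: possibly temporary
    if 200 <= status < 400:
        return "ok"
    if status == 403 or status >= 500:
        return "server_error"  # bot-blocking 403s and 5xx: possibly temporary
    return "broken"  # remaining 4xx, e.g. 404 or 410


def sort_rows(rows: List[dict]) -> List[dict]:
    """Order rows so entries with no citing terms come last (stable sort)."""
    return sorted(rows, key=lambda row: not row.get("used_by"))
```

Because `sorted` is stable, rows keep their original relative order within the "cited" and "uncited" groups.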
- Document check-refs command in Developer CLI section of development.md
- Add bib sanitization and scanner false-positive fix to changelog
- Add check-refs to CLI command list in changelog entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
hanzelei merged commit 8616534 into main on May 27, 2026
2 checks passed

Development

Successfully merging this pull request may close these issues.

  • [CI] Reference Health Tracker
  • nerc2013terminology point to missing document