
Fix SND (Sunderland) scraper #360

Open
symroe wants to merge 1 commit into master from fix/SND-scraper



symroe commented Jun 17, 2026

Member

What broke

Sunderland's CMIS server (committees.sunderland.gov.uk) uses a TLS certificate that is not trusted by wreq's embedded BoringSSL CA bundle, so every request failed immediately with CERTIFICATE_VERIFY_FAILED. A transient 503 outage on the server initially masked the certificate problem (documented in #359); once the server came back up on 2026-06-17, the certificate error turned out to be the root cause.
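To illustrate what switching off certificate verification means at the TLS layer, here is a minimal sketch using Python's standard `ssl` module. It does not use wreq, whose internals are not shown in this PR, and the helper name `build_ssl_context` is hypothetical.

```python
import ssl


def build_ssl_context(verify: bool = True) -> ssl.SSLContext:
    """Return a client TLS context.

    Hypothetical helper for illustration; not part of wreq or the scraper.
    """
    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


strict = build_ssl_context(verify=True)
relaxed = build_ssl_context(verify=False)
print(strict.verify_mode, relaxed.verify_mode)
```

With verification off, the connection is still encrypted but the server's identity is no longer checked, which is why a setting like this is usually limited to the one scraper that needs it.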

Additionally, the CMIS councillor list cards each contain a PenPicResize img element with the councillor's thumbnail photo — but the base CMISCouncillorScraper doesn't extract photos. This PR overrides get_single_councillor to extract and resolve those photo URLs.

What was fixed

  • Added verify_requests = False to the Scraper class — bypasses wreq's BoringSSL cert validation so requests reach the server
  • Added get_single_councillor override to extract the PenPicResize thumbnail from each list-page card and set councillor.photo_url (resolves relative /committees/CMIS5/images/... paths against the base URL)
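The photo extraction described above could look roughly like the following sketch. It uses only the standard library and is not the code from this PR: the names `PenPicParser` and `extract_photo_url` are hypothetical, the card markup is invented, and it assumes PenPicResize is a CSS class on the img element.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed base URL, built from the host named in this PR.
BASE_URL = "https://committees.sunderland.gov.uk/"


class PenPicParser(HTMLParser):
    """Collect the src of the first <img> whose class list includes PenPicResize."""

    def __init__(self):
        super().__init__()
        self.photo_src = None

    def handle_starttag(self, tag, attrs):
        if tag != "img" or self.photo_src is not None:
            return
        attr_map = dict(attrs)
        # Assumption: PenPicResize is a class name; the real markup may differ.
        classes = (attr_map.get("class") or "").split()
        if "PenPicResize" in classes:
            self.photo_src = attr_map.get("src")


def extract_photo_url(card_html, base_url=BASE_URL):
    """Return an absolute photo URL for one list card, or None if no photo is found."""
    parser = PenPicParser()
    parser.feed(card_html)
    if not parser.photo_src:
        return None
    # urljoin resolves root-relative paths against the scheme and host of base_url.
    return urljoin(base_url, parser.photo_src)


# Hypothetical card markup for illustration only.
sample_card = (
    '<div class="card"><img class="PenPicResize" '
    'src="/committees/CMIS5/images/CMIS/PenPics/Thumbnails/1974.jpg"></div>'
)
print(extract_photo_url(sample_card))
```

In the real scraper the result would be assigned to councillor.photo_url inside the get_single_councillor override; returning None for cards without a photo keeps that assignment optional.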

Scrape results

Metric | Count
Councillors found | 75
With email address | 74
With photo | 75

Closes #359.


Generated by Claude Code

Sunderland's CMIS server (committees.sunderland.gov.uk) uses a TLS
certificate not trusted by wreq's embedded BoringSSL CA bundle. Adding
verify_requests = False bypasses cert verification so wreq can reach the
endpoint. The server had a transient outage (503) that cleared up, so
the fix can now be verified.

Additionally, the CMIS list-page card divs each contain a PenPicResize
img element with the councillor's headshot. The base CMISCouncillorScraper
does not extract photos, so this scraper overrides get_single_councillor
to pull the photo URL from the list-page HTML and resolve it against the
base URL. All 75 councillors now have photos.

Fixes #359.

symroe commented Jun 17, 2026

Member Author

Re-scrape after 3804833

First scrape with verify_requests = False and the get_single_councillor photo extraction in place.

Metric | Count
Councillors found | 75
With email address | 74
With photo | 75

Photos: the PenPicResize thumbnail is present on all 75 councillor list cards. Sample URL: https://committees.sunderland.gov.uk/committees/CMIS5/images/CMIS/PenPics/Thumbnails/1974.jpg — returns HTTP 200.

Previously: 0 photos (the base CMISCouncillorScraper does not extract photos). Now: all 75 councillors have photos.
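For reference, coverage figures like these can be computed from scraped records with a few lines of Python. This is only a sketch: the `CouncillorRecord` class, the `summarise` function, and the three sample records are hypothetical and are not taken from the real scrape.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CouncillorRecord:
    """Hypothetical stand-in for a scraped councillor."""
    name: str
    email: Optional[str] = None
    photo_url: Optional[str] = None


def summarise(records: List[CouncillorRecord]) -> dict:
    """Count how many records exist and how many carry an email or a photo."""
    return {
        "Councillors found": len(records),
        "With email address": sum(1 for r in records if r.email),
        "With photo": sum(1 for r in records if r.photo_url),
    }


# Made-up sample data, not the Sunderland results.
sample_records = [
    CouncillorRecord("A", "a@example.org", "https://example.org/a.jpg"),
    CouncillorRecord("B", None, "https://example.org/b.jpg"),
    CouncillorRecord("C", "c@example.org", "https://example.org/c.jpg"),
]
summary = summarise(sample_records)
for metric, count in summary.items():
    print(f"{metric} | {count}")
```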


Generated by Claude Code


Development

Successfully merging this pull request may close these issues.

SND (Sunderland) scraper failing — cert fix identified but CMIS server returning 503
