Fix SND (Sunderland) scraper #360
Open
symroe wants to merge 1 commit into
Sunderland's CMIS server (committees.sunderland.gov.uk) uses a TLS certificate not trusted by wreq's embedded BoringSSL CA bundle. Adding verify_requests = False bypasses cert verification so wreq can reach the endpoint. The server had a transient outage (503) that cleared up, so the fix can now be verified. Additionally, the CMIS list-page card divs each contain a PenPicResize img element with the councillor's headshot. The base CMISCouncillorScraper does not extract photos, so this scraper overrides get_single_councillor to pull the photo URL from the list-page HTML and resolve it against the base URL. All 75 councillors now have photos. Fixes #359.
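The photo extraction described above can be illustrated with a minimal standard-library sketch. It is not the code from this PR: `PenPicParser`, `extract_photo_url`, the `BASE_URL` value, the card markup, and the `example.jpg` file name are all hypothetical, and the real `get_single_councillor` override may parse the card differently. It only shows the general idea of finding the `PenPicResize` img in a list-page card and resolving its `src` against a base URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed base URL for illustration; the real scraper's base URL may differ.
BASE_URL = "https://committees.sunderland.gov.uk/committees/cmis5/"


class PenPicParser(HTMLParser):
    """Record the src of the first <img> whose class list includes PenPicResize."""

    def __init__(self):
        super().__init__()
        self.photo_src = None

    def handle_starttag(self, tag, attrs):
        if tag != "img" or self.photo_src is not None:
            return
        attr_map = dict(attrs)
        # Attribute values can be None (valueless attributes), so default to "".
        classes = (attr_map.get("class") or "").split()
        if "PenPicResize" in classes:
            self.photo_src = attr_map.get("src")


def extract_photo_url(card_html, base_url=BASE_URL):
    """Return an absolute photo URL for one card, or None if no photo is present."""
    parser = PenPicParser()
    parser.feed(card_html)
    if not parser.photo_src:
        return None
    # urljoin resolves root-relative paths against the scheme and host of base_url.
    return urljoin(base_url, parser.photo_src)


# Hypothetical card markup; the file name is a placeholder.
card = (
    '<div class="card">'
    '<img class="PenPicResize" src="/committees/CMIS5/images/example.jpg">'
    "</div>"
)
photo_url = extract_photo_url(card)
missing = extract_photo_url('<div class="card"><img src="/logo.png"></div>')
```

Returning `None` when a card has no matching img lets a caller leave the photo field unset instead of storing a broken URL.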
Re-scrape after 3804833
Initial scrape with
Photos: Previously: 0 photos (base CMISCouncillorScraper does not extract photos). Now: 75 → all covered.
Generated by Claude Code
What broke
Sunderland's CMIS server (committees.sunderland.gov.uk) uses a TLS certificate not trusted by wreq's embedded BoringSSL CA bundle. Every request failed immediately with CERTIFICATE_VERIFY_FAILED. The server also had a transient 503 outage that obscured the certificate failure (documented in #359); the server came back up on 2026-06-17, revealing the cert error as the root cause.

Additionally, the CMIS councillor list cards each contain a PenPicResize img element with the councillor's thumbnail photo, but the base CMISCouncillorScraper doesn't extract photos. This PR overrides get_single_councillor to extract and resolve those photo URLs.

What was fixed
- Added verify_requests = False to the Scraper class: bypasses wreq's BoringSSL cert validation so requests reach the server
- Added get_single_councillor override to extract the PenPicResize thumbnail from each list-page card and set councillor.photo_url (resolves relative /committees/CMIS5/images/... paths against the base URL)

Scrape results
Closes #359.
Generated by Claude Code