Fix WOX (West Oxfordshire) scraper #373

Open
symroe wants to merge 1 commit into master from fix/WOX-scraper

Conversation

symroe commented Jun 21, 2026

Member

What broke

West Oxfordshire's ModGov endpoint (meetings.westoxon.gov.uk) serves valid councillor XML to standard HTTP clients: curl returns HTTP 200 in ~2.7 seconds from a non-Lambda IP. Requests sent from Lambda with wreq's Firefox133 TLS fingerprint time out instead. The server's WAF drops the connection during the TLS handshake, before sending any response. This is the same pattern previously fixed on DAC, LBH, NYE, EPS, NWM, and NEW.

What was fixed

  • councillors.py: added http_lib = "playwright". Playwright drives Chromium, whose standard TLS fingerprint passes the WAF. The server certificate is valid, so no verify_requests flag is needed.
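As a rough illustration of the shape of this change, the sketch below sets the attribute on a scraper subclass. The base class here is a hypothetical stand-in (the real framework class, its import path, and its default client name are not shown in this PR), and the `base_url` scheme is assumed; only `http_lib = "playwright"` and the `verify_requests` name come from the description above.

```python
# Hypothetical stand-in for the framework's ModGov scraper base class.
# The real class, its import path, and its defaults are not shown in this PR.
class ModGovCouncillorScraper:
    http_lib = "wreq"        # assumed name for the default HTTP client
    verify_requests = True   # assumed default: verify TLS certificates


class Scraper(ModGovCouncillorScraper):
    # Host taken from the PR description; the https scheme is an assumption.
    base_url = "https://meetings.westoxon.gov.uk"

    # The one-line change described in this PR: fetch pages through
    # Playwright (Chromium) instead of the default client.
    http_lib = "playwright"

    # verify_requests is deliberately left untouched because the
    # server certificate is valid.


print(Scraper.http_lib, Scraper.verify_requests)
```

Keeping the override as a class attribute means the change stays local to this one council and leaves the default client in place for every other scraper.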

Scrape results

| Metric | Count |
|---|---|
| Councillors found | TBC after Lambda run |
| With email address | TBC after Lambda run |
| With photo | TBC after Lambda run |

The HTTPS endpoint is confirmed live and returning valid ModGov XML from a non-Lambda IP. Counts will be confirmed once the Lambda scraper runs with Playwright.
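Once a Lambda run returns the XML, the three counts in the table could be derived along these lines. This is a minimal sketch, not the scraper's actual counting code: the element names (`councillor`, `email`, `photobigurl`) are assumptions about the ModGov response shape, and the sample document is made-up illustrative data, not West Oxfordshire output.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; element names are assumed and the data is invented.
SAMPLE_XML = """<councillorsbyward>
  <wards>
    <ward>
      <wardtitle>Example Ward</wardtitle>
      <councillors>
        <councillor>
          <fullusername>Councillor A</fullusername>
          <email>a@example.org</email>
          <photobigurl>https://example.org/a.jpg</photobigurl>
        </councillor>
        <councillor>
          <fullusername>Councillor B</fullusername>
          <email></email>
        </councillor>
      </councillors>
    </ward>
  </wards>
</councillorsbyward>"""


def count_metrics(xml_text):
    """Return the three scrape metrics for a ModGov-style XML document."""
    root = ET.fromstring(xml_text)
    # iter() matches the exact tag name, so <councillors> wrappers are skipped.
    councillors = list(root.iter("councillor"))

    def has_text(node, tag):
        # Missing elements give None and empty elements give ""; both count as absent.
        value = node.findtext(tag)
        return bool(value and value.strip())

    return {
        "councillors_found": len(councillors),
        "with_email": sum(has_text(c, "email") for c in councillors),
        "with_photo": sum(has_text(c, "photobigurl") for c in councillors),
    }


counts = count_metrics(SAMPLE_XML)
print(counts)
```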


Generated by Claude Code

meetings.westoxon.gov.uk serves valid ModGov XML to standard HTTP
clients (curl returns HTTP 200 in ~2.7s) but wreq's Firefox133 TLS
fingerprint times out from Lambda. The server's WAF drops the
connection during the TLS handshake, consistent with the same pattern
fixed on DAC, LBH, NYE, EPS, NWM, and NEW. Playwright's Chromium TLS
fingerprint passes the WAF.
symroe commented Jun 21, 2026

Member Author

Re-scrape after 7f7a5f1

Added http_lib = "playwright" to bypass the WAF's TLS fingerprint block. The endpoint returns HTTP 200 with valid ModGov XML from a non-Lambda IP (confirmed via curl in ~2.7s). Actual councillor counts require a Lambda run with Playwright.

| Metric | Count |
|---|---|
| Councillors found | TBC after Lambda run |
| With email address | TBC after Lambda run |
| With photo | TBC after Lambda run |

Generated by Claude Code
