Fix NYE (North Yorkshire) scraper #364

Open
symroe wants to merge 1 commit into master from fix/NYE-scraper
@symroe symroe commented Jun 18, 2026

Member

What broke

North Yorkshire's edemocracy server (edemocracy.northyorks.gov.uk) fails from Lambda during the TLS handshake. The error surfaces as a ConnectionResetError reported with errno 110 (ETIMEDOUT); on Linux errno 110 is a timeout, while a reset is ECONNRESET (errno 104), so the handshake most likely times out rather than being actively reset. The server accepts the TCP connection but the SSL exchange never completes, which is consistent with WAF-level TLS fingerprint filtering on wreq's Firefox133 client hello. From a clean IP with system SSL (curl), the endpoint returns HTTP 200 with valid ModGov XML.
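To tell a handshake timeout from a reset when reproducing this, a small standard-library probe can report which stage failed and how. This is only a sketch: `classify` and `probe` are hypothetical helper names, and Python's `ssl` module sends its own client hello, so the probe shows where a connection fails but does not reproduce wreq's Firefox133 fingerprint.

```python
import errno
import socket
import ssl


def classify(exc: BaseException) -> str:
    """Map a connection exception to a coarse failure label."""
    # ssl.SSLError subclasses OSError, so it must be checked first.
    if isinstance(exc, ssl.SSLError):
        return "tls-error"
    # Covers kernel ETIMEDOUT (errno 110 on Linux) and socket timeouts.
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    # ECONNRESET (errno 104 on Linux).
    if isinstance(exc, ConnectionResetError):
        return "reset"
    if isinstance(exc, OSError):
        return f"os-error:{exc.errno}"
    return "unknown"


def probe(host: str, port: int = 443, timeout: float = 10.0) -> tuple:
    """Return (stage, detail): stage is 'tcp', 'tls', or 'ok'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return ("tcp", classify(exc))
    try:
        ctx = ssl.create_default_context()  # system trust store
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return ("ok", tls.version() or "")
    except OSError as exc:
        return ("tls", classify(exc))
    finally:
        sock.close()
```

Calling `probe("edemocracy.northyorks.gov.uk")` from Lambda and from a clean IP would show whether the failure is at the TCP or the TLS stage in each environment.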

What was fixed

  • Added http_lib = "playwright". Playwright drives Chromium, which presents a standard Chrome TLS fingerprint and so avoids the server-side WAF rule that blocks wreq's fingerprint. Chromium also uses the system cert store, which trusts the server's valid cert, so no verify_requests flag is needed.
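For illustration, fetching the same endpoint through Chromium might look roughly like the sketch below. It is not the framework's actual Playwright integration: `modgov_url` and `fetch_with_chromium` are hypothetical helpers, and the `mgWebService.asmx/GetCouncillorsByWard` path is an assumption about the ModGov endpoint rather than something stated in this PR.

```python
from urllib.parse import urljoin

# Assumed ModGov web-service path; not taken from this PR.
MODGOV_PATH = "mgWebService.asmx/GetCouncillorsByWard"


def modgov_url(base_url: str) -> str:
    """Build the councillor XML URL from a council's base URL."""
    return urljoin(base_url.rstrip("/") + "/", MODGOV_PATH)


def fetch_with_chromium(url: str, timeout_ms: int = 30_000) -> str:
    """Load the URL in headless Chromium and return the raw response body."""
    # Imported lazily so this module loads without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # The navigation uses Chromium's network stack, so the TLS
            # client hello is the browser's rather than wreq's.
            response = page.goto(url, timeout=timeout_ms)
            if response is None or not response.ok:
                status = response.status if response else "no response"
                raise RuntimeError(f"unexpected response: {status}")
            return response.text()
        finally:
            browser.close()
```

Usage would be along the lines of `fetch_with_chromium(modgov_url("https://edemocracy.northyorks.gov.uk"))`, which needs Playwright and its Chromium build installed.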

Scrape results

The scraper could not be run locally because the build environment sits behind a TLS inspection proxy. The endpoint returns HTTP 200 with valid councillor XML from a clean IP.

| Metric | Count |
|---|---|
| Councillors found | TBC after Lambda run |
| With email address | TBC |
| With photo | TBC |
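Once a Lambda run returns the XML, the three counts could be derived with something like the following sketch. The element names (`councillor`, `email`, `photobigurl`) are assumptions about the ModGov response shape, and `SAMPLE_XML` is made-up illustration data, not output from this endpoint.

```python
import xml.etree.ElementTree as ET

# Invented sample; element names are assumed, not confirmed by this PR.
SAMPLE_XML = """<councillorsbyward>
  <wards>
    <ward>
      <wardtitle>Example Ward</wardtitle>
      <councillors>
        <councillor>
          <councillorid>1</councillorid>
          <fullusername>Councillor A Example</fullusername>
          <photobigurl>https://example.org/a.jpg</photobigurl>
          <workaddress><email>a@example.org</email></workaddress>
        </councillor>
        <councillor>
          <councillorid>2</councillorid>
          <fullusername>Councillor B Example</fullusername>
          <photobigurl></photobigurl>
          <workaddress><email></email></workaddress>
        </councillor>
      </councillors>
    </ward>
  </wards>
</councillorsbyward>"""


def has_text(element: ET.Element, path: str) -> bool:
    """True if the first match for path exists and has non-blank text."""
    node = element.find(path)
    return node is not None and bool((node.text or "").strip())


def summarise(xml_text: str) -> dict:
    """Count councillors, and how many have an email and a photo URL."""
    root = ET.fromstring(xml_text)
    councillors = root.findall(".//councillor")
    return {
        "found": len(councillors),
        "with_email": sum(has_text(c, ".//email") for c in councillors),
        "with_photo": sum(has_text(c, "photobigurl") for c in councillors),
    }
```

The dictionary keys map onto the three rows of the results table.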

Generated by Claude Code

North Yorkshire's edemocracy server fails from Lambda with a
connection timeout during the TLS handshake: connections using wreq's
Firefox133 fingerprint are dropped before the server responds. Locally
with curl (system SSL) the endpoint returns 200 OK with valid XML.
Playwright (Chromium) presents a Chrome TLS fingerprint, which should
pass the WAF.