Fix EPS (Epsom and Ewell) scraper #365

Open
symroe wants to merge 1 commit into master from fix/EPS-scraper

symroe commented Jun 19, 2026

Member

What broke

Epsom and Ewell's ModernGov endpoint (democracy.epsom-ewell.gov.uk) serves valid councillor XML to standard HTTP clients: curl returns HTTP 200 with the full XML in ~2.4s. Requests from Lambda using wreq's Firefox133 TLS fingerprint time out instead, because the server's WAF drops the connection during the TLS handshake, before any response is sent. This matches the pattern already seen on NWM, NYE, LBH, and DAC.

What was fixed

  • Added http_lib = "playwright". Playwright drives Chromium, whose standard Chrome TLS fingerprint passes the WAF. No verify_requests flag is needed because the server's certificate is valid.
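As a rough illustration of what a per-council override like this amounts to, here is a minimal sketch. The class names, the `describe_client` helper, the `"wreq"` default, the `verify_requests = True` default, and the `https://` scheme are all assumptions made for illustration; only the `http_lib = "playwright"` setting and the `verify_requests` flag name come from this PR, and the project's real base class may look quite different.

```python
class BaseScraperSketch:
    """Hypothetical stand-in for the project's scraper base class."""

    # Assumed default: the PR describes wreq as the client that times out.
    http_lib = "wreq"
    # Assumed default: certificate verification stays on unless overridden.
    verify_requests = True


class EpsomAndEwellSketch(BaseScraperSketch):
    """Hypothetical EPS scraper carrying the one-line change from this PR."""

    # Host taken from the PR description; the scheme is an assumption.
    base_url = "https://democracy.epsom-ewell.gov.uk"
    # The actual fix: fetch through Playwright instead of the default client.
    http_lib = "playwright"


def describe_client(scraper_cls):
    """Return the HTTP settings a scraper class would run with."""
    return {
        "http_lib": scraper_cls.http_lib,
        "verify_requests": scraper_cls.verify_requests,
    }


print(describe_client(EpsomAndEwellSketch))
```

Under these assumptions the override touches only the client choice, and certificate verification keeps its inherited value.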

Scrape results

Raw XML confirmed 35 councillors with photo URLs. Emails are on individual councillor detail pages and will be fetched during the full scrape.

| Metric | Count |
| --- | --- |
| Councillors found | 35 (confirmed from XML) |
| With email address | TBC after Lambda run |
| With photo | 35 (confirmed from XML) |
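For anyone wanting to reproduce the "confirmed from XML" counts by hand, a minimal sketch of the check follows. The element names (`councillor`, `photobigurl`, and so on) are an assumed shape for the GetCouncillorsByWard response rather than a copy of the real payload, and the sample document is invented placeholder data, not Epsom and Ewell's councillors.

```python
import xml.etree.ElementTree as ET

# Invented sample data in an assumed response shape (not the real payload).
SAMPLE_XML = """
<councillorsbyward>
  <wards>
    <ward>
      <wardtitle>Example Ward</wardtitle>
      <councillors>
        <councillor>
          <councillorid>1</councillorid>
          <fullusername>Councillor Example One</fullusername>
          <photobigurl>https://example.org/photos/1.jpg</photobigurl>
        </councillor>
        <councillor>
          <councillorid>2</councillorid>
          <fullusername>Councillor Example Two</fullusername>
          <photobigurl></photobigurl>
        </councillor>
      </councillors>
    </ward>
  </wards>
</councillorsbyward>
"""


def summarise_councillors(xml_text):
    """Count councillor elements and how many carry a non-empty photo URL."""
    root = ET.fromstring(xml_text.strip())
    # iter() matches the exact tag name, so <councillors> is not counted.
    councillors = list(root.iter("councillor"))
    with_photo = sum(
        1 for c in councillors if (c.findtext("photobigurl") or "").strip()
    )
    return {"found": len(councillors), "with_photo": with_photo}


print(summarise_councillors(SAMPLE_XML))  # {'found': 2, 'with_photo': 1}
```

Run against the saved response from the real endpoint, the same function should report the 35 / 35 figures above, provided the assumed element names match.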

Generated by Claude Code

…nt block

democracy.epsom-ewell.gov.uk returns valid ModGov XML to standard HTTP clients
(curl, Chrome) but wreq's Firefox133 TLS fingerprint times out from Lambda —
the server's WAF drops the connection during TLS handshake before returning
a response. Playwright uses Chromium's standard Chrome TLS fingerprint, which
passes the WAF. The endpoint serves valid councillor XML once the TLS block
is bypassed.

symroe commented Jun 19, 2026

Member Author

Re-scrape after 58e7606

Initial fix: added http_lib = "playwright" to bypass WAF TLS fingerprint block.

Raw XML from democracy.epsom-ewell.gov.uk/mgWebService.asmx/GetCouncillorsByWard (fetched directly) confirms 35 councillors with photo URLs. Emails are fetched from individual councillor pages during the full scrape, so the email count will be confirmed after the Lambda run.

| Metric | Count |
| --- | --- |
| Councillors found | 35 (confirmed from XML) |
| With email address | TBC after Lambda run |
| With photo | 35 (confirmed from XML) |

Generated by Claude Code
