Fix LBH (Lambeth) scraper #363

Open
symroe wants to merge 1 commit into master from fix/LBH-scraper


symroe (Member) commented Jun 18, 2026

What broke

Lambeth's ModernGov endpoint (moderngov.lambeth.gov.uk) responds in ~5 seconds to standard clients (curl, system SSL: HTTP 200 with valid XML), but requests sent with wreq's Firefox133 TLS fingerprint time out after 30 seconds from Lambda. The server completes the TLS handshake for standard fingerprints but stalls on wreq's specific client hello, which is consistent with WAF-level TLS fingerprint filtering.
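The "standard client" half of this comparison could be reproduced with a timing probe like the sketch below. It is illustrative only: `probe` is a hypothetical helper, not code from this repository, and it uses Python's standard-library TLS stack, so it does not reproduce wreq's Firefox133 client hello.

```python
import socket
import time
import urllib.error
import urllib.request


def probe(url, timeout=30.0, opener=urllib.request.urlopen):
    """Fetch url once and return (outcome, status, elapsed_seconds).

    outcome is one of "ok", "http_error", "timeout" or "error".
    `opener` is injectable so the function can be exercised offline.
    """
    start = time.monotonic()
    status = None
    try:
        with opener(url, timeout=timeout) as response:
            response.read()  # read the body so a stalled transfer also counts
            status = response.status
        outcome = "ok"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        outcome, status = "http_error", exc.code
    except (TimeoutError, socket.timeout):
        # Raised when the read stalls after the connection is open.
        outcome = "timeout"
    except urllib.error.URLError as exc:
        # Connect-phase timeouts arrive wrapped in URLError.reason.
        timed_out = isinstance(exc.reason, (TimeoutError, socket.timeout))
        outcome = "timeout" if timed_out else "error"
    return outcome, status, time.monotonic() - start
```

Calling `probe("https://moderngov.lambeth.gov.uk/")` from a clean IP and again from Lambda would show whether the stall depends on the network location or only on the client fingerprint. The councillor XML path is not given in this PR, so the bare host is used here as a placeholder.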

What was fixed

  • Added http_lib = "playwright" — Playwright uses Chromium's standard Chrome TLS fingerprint, bypassing the WAF that blocks wreq's Firefox133 fingerprint. The server cert is valid and trusted by Chromium, so no additional cert flags are needed.
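As a rough sketch of what the `http_lib = "playwright"` switch implies, the snippet below shows one way a per-scraper setting could select a Chromium-backed fetch. Everything except the `http_lib` name is an assumption made for illustration: `ExampleScraper`, `LambethExample`, `FETCHERS` and both fetch functions are hypothetical, and `urllib` stands in for the project's real default client (wreq). The Playwright path uses `page.goto` because it goes through the browser's own network stack, which is what carries Chromium's TLS client hello.

```python
import urllib.request


def fetch_with_urllib(url, timeout_s=30.0):
    """Stand-in for the default client (the real default here is wreq)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_with_playwright(url, timeout_ms=30_000):
    """Load url in headless Chromium and return the raw response body."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            # Navigation is performed by Chromium itself, not by Python.
            response = page.goto(url, timeout=timeout_ms)
            if response is None:
                raise RuntimeError(f"no response for {url}")
            return response.text()
        finally:
            browser.close()


# Hypothetical registry mapping an http_lib value to a fetch function.
FETCHERS = {"urllib": fetch_with_urllib, "playwright": fetch_with_playwright}


class ExampleScraper:
    http_lib = "urllib"  # hypothetical default for this sketch

    def fetcher(self):
        try:
            return FETCHERS[self.http_lib]
        except KeyError:
            raise ValueError(f"unknown http_lib: {self.http_lib!r}") from None

    def get(self, url):
        return self.fetcher()(url)


class LambethExample(ExampleScraper):
    http_lib = "playwright"  # the one-line override this PR describes
```

Keeping the choice as a class attribute means a single council can opt in to the slower browser-based fetch without changing the client used by every other scraper.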

Scrape results

Cannot run locally (TLS inspection proxy in the build environment). The endpoint returns HTTP 200 with valid councillor XML from a clean IP.

| Metric | Count |
| --- | --- |
| Councillors found | TBC after Lambda run |
| With email address | TBC |
| With photo | TBC |

Generated by Claude Code

Lambeth's ModernGov server (moderngov.lambeth.gov.uk) responds in ~5s
for standard clients (curl) but wreq's Firefox133 TLS fingerprint times
out from Lambda. Playwright (Chromium) uses a standard Chrome TLS
fingerprint that passes the server's WAF, bypassing the block.