
Fix DAC (Dacorum) scraper #362

Open
symroe wants to merge 1 commit into master from fix/DAC-scraper

symroe commented Jun 18, 2026

Member

What broke

Dacorum's ModernGov endpoint (democracy.dacorum.gov.uk) returns valid XML to standard HTTP clients (curl, system SSL), but requests made from Lambda with wreq's Firefox133 TLS fingerprint time out. The server's WAF appears to drop the connection during the TLS handshake, before any response is sent. The previous verify_requests = False fix (PR #346) disabled certificate validation but did not address the WAF block on wreq's specific TLS ClientHello.

What was fixed

  • Replaced verify_requests = False with http_lib = "playwright". Playwright drives Chromium, whose standard Chrome TLS fingerprint passes the WAF. The certificate itself is valid (this was never a cert issue), so verify_requests = False is not needed alongside playwright.
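As a rough illustration of the change, the scraper class might look like the sketch below. The base class `ModGovScraperBase`, its default values, and the `https://` scheme on `base_url` are hypothetical stand-ins so the snippet runs on its own; only the `verify_requests` and `http_lib` attribute names and the hostname come from this PR.

```python
# Hypothetical stand-in for the framework's ModernGov base scraper.
# The real base class and its defaults are not shown in this PR.
class ModGovScraperBase:
    http_lib = "wreq"        # assumed default HTTP client
    verify_requests = True   # assumed default: validate certificates


class Scraper(ModGovScraperBase):
    # Hostname is from the PR description; the scheme is an assumption.
    base_url = "https://democracy.dacorum.gov.uk"

    # Before (PR #346):
    #     verify_requests = False
    # After: fetch through Playwright (Chromium) so the TLS ClientHello
    # matches a standard Chrome fingerprint. Certificate validation
    # stays at whatever the base class uses.
    http_lib = "playwright"
```

Because the override is a single class attribute, reverting to the default client would only require deleting that one line.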

Scrape results

Cannot run locally: a TLS inspection proxy in the build environment interferes with both wreq and Playwright. From a clean IP the endpoint returns HTTP 200 with valid ModGov XML, which confirms the data is available once the WAF block is bypassed.

| Metric | Count |
| --- | --- |
| Councillors found | TBC after Lambda run |
| With email address | TBC |
| With photo | TBC |
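The clean-IP check described above (HTTP 200 plus parseable XML) could be reproduced with the standard library alone, roughly as follows. The `ENDPOINT` path is an assumption based on the usual ModernGov web-service route and is not stated in this PR; the function names are illustrative.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Tuple

# Assumed URL: hostname from the PR, path guessed from the common
# ModernGov web-service layout. Adjust to the real endpoint.
ENDPOINT = (
    "https://democracy.dacorum.gov.uk"
    "/mgWebService.asmx/GetCouncillorsByWard"
)


def is_well_formed_xml(body: bytes) -> bool:
    """Return True if body parses as XML, False otherwise."""
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True


def check_endpoint(url: str = ENDPOINT, timeout: float = 10.0) -> Tuple[int, bool]:
    """Fetch url with the system TLS stack; return (status, xml_ok)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, is_well_formed_xml(resp.read())
```

Calling `check_endpoint()` from a network without TLS interception should return the status code and whether the body parsed. It exercises only the system TLS stack, so it says nothing about how wreq or Playwright behave.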

Generated by Claude Code

Dacorum's democracy server returns valid data to standard HTTP clients
(curl 200 OK) but wreq's Firefox TLS fingerprint times out from Lambda,
suggesting the server's WAF blocks that specific TLS client hello.
Playwright (Chromium) uses a standard Chrome fingerprint that passes
through the WAF.

The previous verify_requests = False fix was not sufficient because the
timeout happens at the TCP/TLS layer before cert validation, so removing
the cert-related flag changes nothing. Switching to playwright sidesteps
the wreq fingerprint block entirely.

2 participants