
Fix NEW (New Forest) scraper #372

Open
symroe wants to merge 1 commit into master from fix/NEW-scraper

symroe commented Jun 20, 2026

Member

What broke

New Forest's ModGov endpoint (democracy.newforest.gov.uk) times out from Lambda when requested with wreq's Firefox TLS fingerprint. The server's WAF drops the connection during the TLS handshake. The pattern is identical to the recently fixed scrapers (EPS, NYE, LBH, DAC), where the WAF selectively blocks wreq's Firefox133 client hello while accepting standard browser TLS fingerprints.
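One way to check where a connection stalls is a small standard-library probe like the sketch below. It is not part of this PR, and the function name `probe_tls_handshake` and its return labels are made up for illustration. Note that Python's `ssl` module sends its own client hello, which is neither wreq's Firefox133 fingerprint nor Chromium's, so a result only says whether that particular fingerprint gets through; it cannot confirm the WAF behaviour described above on its own.

```python
import socket
import ssl


def probe_tls_handshake(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Report where a TLS connection attempt stops.

    Returns one of: 'ok', 'connect_failed', 'handshake_timeout',
    'handshake_error'. The labels are illustrative, not a project convention.
    """
    context = ssl.create_default_context()

    # Step 1: plain TCP connect. A failure here is not a TLS problem.
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "connect_failed"

    # Step 2: TLS handshake on the connected socket, with the same timeout.
    try:
        with raw:
            raw.settimeout(timeout)
            with context.wrap_socket(raw, server_hostname=host):
                return "ok"
    except socket.timeout:
        # TCP connected, but the peer never answered the client hello.
        return "handshake_timeout"
    except (ssl.SSLError, OSError):
        # The peer answered, but the handshake was rejected or reset.
        return "handshake_error"
```

A `handshake_timeout` result from Lambda alongside a clean result from a browser would be consistent with fingerprint-based filtering, though it would not prove it.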

What was fixed

  • councillors.py: added http_lib = "playwright" — Playwright uses Chromium's standard Chrome TLS fingerprint, which passes the WAF. No verify_requests flag is needed since the server's certificate is valid.
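The change amounts to one class attribute. The sketch below shows its shape only: `ModGovCouncillorScraper` is a hypothetical stand-in defined here so the snippet runs on its own, and the `"wreq"` default and the `https://` scheme on `base_url` are assumptions rather than values taken from the diff.

```python
class ModGovCouncillorScraper:
    """Hypothetical stand-in for the project's ModGov base scraper."""

    base_url = ""
    http_lib = "wreq"  # assumed default HTTP backend


class Scraper(ModGovCouncillorScraper):
    # Scheme is assumed; the PR only names the host.
    base_url = "https://democracy.newforest.gov.uk"

    # The fix: fetch pages through Playwright (Chromium) instead of wreq,
    # so the request carries a Chrome TLS fingerprint.
    http_lib = "playwright"
```

No `verify_requests` override appears here because, as noted above, the server's certificate is valid.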

Scrape results

| Metric | Count |
| --- | --- |
| Councillors found | TBC after Lambda run |
| With email address | TBC after Lambda run |
| With photo | TBC after Lambda run |

Cannot run locally due to DNS resolution restrictions in the build environment. Counts will be confirmed once the Lambda scraper runs with Playwright.


Generated by Claude Code

democracy.newforest.gov.uk times out from Lambda with wreq's Firefox
TLS fingerprint — the server's WAF drops the connection during the
TLS handshake. Playwright uses Chromium's standard Chrome TLS
fingerprint, which passes the WAF.

symroe commented Jun 20, 2026

Member Author

Re-scrape after e593c8c

Added http_lib = "playwright" to bypass the wreq timeout on democracy.newforest.gov.uk.

Cannot run locally (DNS resolution for external hosts blocked in build environment). The wreq timeout is consistent with WAF-level TLS fingerprint filtering — the same pattern seen on EPS, NYE, LBH, DAC, all of which were fixed with http_lib = "playwright". Playwright uses Chromium's Chrome TLS fingerprint rather than wreq's Firefox133 fingerprint.

| Metric | Count |
| --- | --- |
| Councillors found | TBC after Lambda run |
| With email address | TBC after Lambda run |
| With photo | TBC after Lambda run |

Generated by Claude Code
