Fix EAT (Eastleigh) scraper #371
Open
symroe wants to merge 1 commit into
meetings.eastleigh.gov.uk returns HTTP 403 to wreq's Firefox TLS fingerprint from Lambda. Switching to Playwright presents Chromium's TLS fingerprint and executes any JS challenge, which should get past the WAF filter.
Re-scrape after f2bb011
Added
Cannot run locally (DNS resolution for external hosts is blocked in the build environment). The endpoint returns 403 to both wreq and standard web clients from this IP range, which is consistent with a WAF fingerprint filter or an IP filter. If the block is fingerprint-based, Playwright's Chromium TLS fingerprint and JS execution should bypass it (same pattern as the recent EPS, NYE, LBH fixes).
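For anyone re-checking this from a machine that can reach the host, a minimal sketch of how to tell an HTTP-layer rejection (such as the 403 here) apart from a TLS or network failure, using only the standard library. `classify_fetch` is a hypothetical helper, not part of the scraper, and urllib's TLS fingerprint differs from both wreq's and Chromium's, so it will not necessarily see the same response the Lambda client does.

```python
import ssl
import urllib.error
import urllib.request


def classify_fetch(url: str, timeout: float = 10.0) -> str:
    """Return 'ok', 'http-<code>', 'tls-error', or 'network-error'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "ok" if resp.status < 400 else f"http-{resp.status}"
    except urllib.error.HTTPError as exc:
        # The TLS handshake (if any) succeeded; the server answered with an error status.
        return f"http-{exc.code}"
    except urllib.error.URLError as exc:
        # Handshake or certificate problems surface as an SSLError in exc.reason.
        if isinstance(exc.reason, ssl.SSLError):
            return "tls-error"
        return "network-error"
    except OSError:
        # Timeouts and socket errors that were not wrapped in URLError.
        return "network-error"
```

A result of `http-403` would match the diagnosis in this PR (the request reaches the server and is refused), whereas `tls-error` or `network-error` would point to a different cause.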
Generated by Claude Code
What broke
Eastleigh's ModGov endpoint (meetings.eastleigh.gov.uk) returns HTTP 403 Forbidden to wreq's Firefox TLS fingerprint from Lambda. The server's WAF or firewall is rejecting the request at the HTTP layer. This is not a certificate issue: the TLS handshake completes, but the response is 403. The endpoint is live but blocks the specific HTTP client used on Lambda.
What was fixed
councillors.py: added http_lib = "playwright". Playwright uses Chromium's standard TLS fingerprint and executes any JS challenge, which should bypass the WAF filter that blocks wreq's Firefox fingerprint.
Scrape results
Cannot run locally due to DNS resolution restrictions in the build environment. Counts will be confirmed once the Lambda scraper runs with Playwright.
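As an illustration of the one-line change to councillors.py described under "What was fixed", here is a minimal sketch. Only `http_lib = "playwright"` and the hostname come from this PR; the class names, the `base_url` attribute, the `https` scheme, and the `"wreq"` default are assumptions made so the example runs on its own, and the real file may be laid out differently.

```python
# Hypothetical stand-in for the project's ModGov councillor base class,
# defined here only so the sketch is self-contained.
class ModGovCouncillorScraper:
    base_url = ""
    http_lib = "wreq"  # assumed default HTTP client


class Scraper(ModGovCouncillorScraper):
    base_url = "https://meetings.eastleigh.gov.uk"  # hostname from the PR; scheme assumed
    http_lib = "playwright"  # the change in this PR: fetch via Playwright's Chromium
```

Setting the attribute on the council's own scraper class keeps the override local to Eastleigh, so other councils continue to use the default client.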
Generated by Claude Code