
Fix ELN (East Lothian) scraper #361

Open
symroe wants to merge 1 commit into master from fix/ELN-scraper

symroe (Member) commented Jun 18, 2026

What broke

East Lothian's website migrated from an older Drupal theme to LocalGov Drupal. The councillors list moved from /councillors/name (which now returns HTTP 301) to /council-and-democracy/councillors. The new page uses a completely different DOM: councillors are rendered as div.card elements inside a Drupal Views accordion (div.view-id-councillors), replacing the old ul.list--political / .list__item structure. Because the old container selector no longer matched anything, get_list_container() returned an empty list and get_list_container()[0] raised IndexError: list index out of range.
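The failure mode above is a bare [0] on an empty match list. As a minimal sketch only (first_container and its arguments are hypothetical, not part of the scraper's code), a guard that names the selector and the page turns the next layout change into a self-explanatory error instead of an IndexError:

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def first_container(matches: Sequence[T], selector: str, page: str) -> T:
    """Return the first element matched by selector, or fail with context.

    Hypothetical helper: indexing an empty list with [0] raises a bare
    IndexError ("list index out of range"), which says nothing about
    which selector or page stopped matching.
    """
    if not matches:
        raise LookupError(
            f"selector {selector!r} matched nothing on {page}; "
            "the page URL or layout may have changed"
        )
    return matches[0]


# Example: the pre-migration selector matched nothing at the old base_url.
try:
    first_container([], "ul.list--political", "/councillors/name")
except LookupError as exc:
    print(exc)
```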

What was fixed

  • metadata.json: updated base_url from .../councillors/name to .../council-and-democracy/councillors
  • councillors.py:
      • new container selector .view-id-councillors; item selector .card
      • name, ward, and party are now extracted inline from each list-page card (.views-field-title a, .views-field-field-ward .field-content, .views-field-field-party .field-content)
      • the detail page is fetched only for the email address (a[href^=mailto])
      • photo taken from .views-field-field-image img, preferring data-src and falling back to src
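To show how the new selectors map onto one card, here is a standard-library sketch using html.parser. It is not the scraper's own code: CardParser, VOID and TEXT_FIELDS are hypothetical names, and it assumes each div.card contains the field wrappers listed above, with the name inside an a element and ward/party inside a .field-content element.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Set, Tuple

# Elements that never get a closing tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

# Field wrapper class (from the PR's selectors) -> output key.
TEXT_FIELDS = {
    "views-field-title": "name",
    "views-field-field-ward": "ward",
    "views-field-field-party": "party",
}


class CardParser(HTMLParser):
    """Collect name/ward/party/photo from each div.card (hypothetical sketch)."""

    def __init__(self) -> None:
        super().__init__()
        # Open elements as (tag, classes, is_card_root).
        self.stack: List[Tuple[str, Set[str], bool]] = []
        self.cards: List[Dict[str, str]] = []
        self.current: Optional[Dict[str, str]] = None

    def _inside_class(self, cls: str) -> bool:
        return any(cls in classes for _, classes, _ in self.stack)

    def _inside_tag(self, tag: str) -> bool:
        return any(t == tag for t, _, _ in self.stack)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = set((attr.get("class") or "").split())
        in_image = self._inside_class("views-field-field-image")
        if tag == "img" and self.current is not None and in_image:
            # Lazy-loaded images keep the real URL in data-src; fall back to src.
            src = attr.get("data-src") or attr.get("src")
            if src:
                self.current.setdefault("photo", src)
        if tag in VOID:
            return
        is_root = tag == "div" and "card" in classes and self.current is None
        if is_root:
            self.current = {}
        self.stack.append((tag, classes, is_root))

    def handle_endtag(self, tag):
        # Ignore void end tags and stray closers that match nothing open.
        if tag in VOID or not self._inside_tag(tag):
            return
        while self.stack:
            t, _, is_root = self.stack.pop()
            if is_root:
                # The card's own div just closed: the record is complete.
                self.cards.append(self.current)
                self.current = None
            if t == tag:
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if self.current is None or not text:
            return
        for wrapper, key in TEXT_FIELDS.items():
            if not self._inside_class(wrapper):
                continue
            # Name comes from the link text; ward/party from .field-content,
            # which skips any "Ward:"-style label rendered beside the value.
            if key == "name":
                ok = self._inside_tag("a")
            else:
                ok = self._inside_class("field-content")
            if ok:
                previous = self.current.get(key, "")
                self.current[key] = f"{previous} {text}".strip()
```

Feed it the list page's HTML with parser.feed(html) followed by parser.close(), then read parser.cards, one dict per councillor. Email is deliberately absent, matching the change above: it still comes from each councillor's detail page.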

Scrape results

Verified locally against the live site (2026-06-18):

Metric                Count
Councillors found     22
With email address    22
With photo            22

Generated by Claude Code

The council moved its councillors list from /councillors/name to
/council-and-democracy/councillors as part of a CMS upgrade to
LocalGov Drupal. The old URL now returns HTTP 301, and the new page
uses a completely different DOM structure (Drupal Views with accordion
panes, one div.card per councillor). The old ul.list--political
selector therefore returned an empty list, causing the "list index out
of range" error.

Updated:
- metadata.json: base_url to /council-and-democracy/councillors
- councillors.py: container selector .view-id-councillors, item
  selector .card; extract name/ward/party from list-page card fields
  directly; fetch detail page only for email; photo from img in card

symroe (Member, Author) commented Jun 18, 2026

Re-scrape after e07b5e2

New URL + selector fix applied. Scraped live against eastlothian.gov.uk on 2026-06-18.

Metric                Count
Councillors found     22
With email address    22
With photo            22

All 22 councillors have an email address and a photo. Photos are full-resolution PNGs from /sites/default/files/ (e.g. Councillor_Brooke_Ritchie.png, 600×800). Emails follow the initial+surname@eastlothian.gov.uk pattern.
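A spot check of the email claim could look like the sketch below. The helper names are hypothetical, and it assumes "initial+surname" means the first initial followed by the surname, lower-cased, with non-letters dropped, and that the name passed in carries no "Councillor" prefix. Hyphenated or multi-word surnames may not follow that rule, so a mismatch is a prompt for a manual look rather than proof of a scraping bug.

```python
import re
from typing import Optional


def email_from_mailto(href: str) -> Optional[str]:
    """Extract the address from a mailto: link, dropping any ?subject=... part."""
    if not href.lower().startswith("mailto:"):
        return None
    address = href[len("mailto:"):].split("?", 1)[0].strip()
    return address or None


def expected_email(full_name: str, domain: str = "eastlothian.gov.uk") -> str:
    """ASSUMED pattern: first initial + surname, lower-case letters only."""
    parts = full_name.split()
    local = re.sub(r"[^a-z]", "", (parts[0][0] + parts[-1]).lower())
    return f"{local}@{domain}"


def matches_pattern(full_name: str, href: str) -> bool:
    """True if the scraped mailto: address equals the assumed pattern."""
    address = email_from_mailto(href)
    return address is not None and address.lower() == expected_email(full_name)
```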


Generated by Claude Code
