Building a directory of local representatives #4
slug: building-a-directory-of-local-representatives
Building a directory of local representatives
The Problem
Local government contact information for mayors, council members, and the like is scattered across thousands of municipal websites in inconsistent formats. It is not programmatically accessible to civic app developers and journalists unless they do the upfront work of scraping everything themselves.
Where directories do exist, the data is frequently paywalled, stale, or scoped only to the largest cities.
There are a couple of challenges to solving this problem:
In 2026, a lot of this can be solved with a carefully worded prompt to an LLM API and a Playwright browser. Using this method, CivicPatch scrapes each jurisdiction a couple of times a year (to keep the data up to date) and publishes the results to a public repository. This has a few benefits, assuming downstream consumers are able to use CivicPatch-derived data:
Users this problem affects
Local Representatives Data and Data Standards: Some References
Civic tech has been working on this problem for over a decade, yet standards remain fragmented, and sustaining open, free access to representative data has been an ongoing challenge.
Related discussions: openstates/jurisdictions#54 — Develop Shared Standards for Demonstration/Review
Project
CivicPatch uses a multi-step pipeline to collect official records for each jurisdiction.
Google Gemini 2.5 Flash is used on the first scrape of a jurisdiction to research who the expected elected officials are — establishing a ground truth to compare against the extracted results. This reduces the review burden on community maintainers by surfacing discrepancies automatically, rather than requiring them to verify from scratch.
DeepSeek-V3 is used as a structured extraction model: given rendered page content, it produces typed JSON records for each official found, including name, role, geographic designation (ward, district, at-large), email, phone, and profile URL.
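To make the shape of those typed records concrete, here is a minimal sketch of how extraction output could be parsed and validated. It is not CivicPatch's actual schema: the class name `OfficialRecord`, the JSON key names, and the split of the geographic designation into a kind and a value are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumed vocabulary, taken from the examples in the text (ward, district, at-large).
DESIGNATION_KINDS = {"ward", "district", "at-large"}


@dataclass(frozen=True)
class OfficialRecord:
    """Hypothetical record type; field names are illustrative, not the real schema."""
    name: str
    role: str
    designation_kind: Optional[str] = None   # "ward", "district", or "at-large"
    designation_value: Optional[str] = None  # e.g. "3" for Ward 3
    email: Optional[str] = None
    phone: Optional[str] = None
    profile_url: Optional[str] = None


def parse_officials(raw_json: str) -> list[OfficialRecord]:
    """Parse a JSON array of officials, rejecting entries that break the schema."""
    records = []
    for item in json.loads(raw_json):
        name = (item.get("name") or "").strip()
        role = (item.get("role") or "").strip()
        if not name or not role:
            raise ValueError(f"record is missing name or role: {item!r}")
        kind = item.get("designation_kind")
        if kind is not None and kind not in DESIGNATION_KINDS:
            raise ValueError(f"unknown designation kind: {kind!r}")
        records.append(OfficialRecord(
            name=name,
            role=role,
            designation_kind=kind,
            designation_value=item.get("designation_value"),
            email=item.get("email"),
            phone=item.get("phone"),
            profile_url=item.get("profile_url"),
        ))
    return records
```

Failing loudly on a malformed record, rather than silently dropping it, keeps bad model output from reaching the review step unnoticed.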
Results are submitted as pull requests to a public repository. A web-based review interface allows community maintainers to inspect, approve, or correct results before merging. A rule-based review step automatically flags discrepancies: missing officials the research step expected to find, unexpected extras, or role mismatches.
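The three kinds of discrepancy named above (missing officials, unexpected extras, role mismatches) can be sketched as a plain comparison between the researched list and the extracted list. This is an illustration under assumptions, not the project's implementation: the `Flag` type, the `flag_discrepancies` function, and the choice to match officials by case-insensitive, whitespace-normalized name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    """Hypothetical flag type for the reviewer."""
    kind: str   # "missing", "unexpected", or "role_mismatch"
    name: str
    detail: str = ""


def _key(text: str) -> str:
    # Assumed matching rule: ignore case and collapse runs of whitespace.
    return " ".join(text.lower().split())


def flag_discrepancies(expected: dict[str, str], extracted: dict[str, str]) -> list[Flag]:
    """Compare two name -> role mappings and list what a reviewer should check."""
    exp = {_key(n): (n, r) for n, r in expected.items()}
    ext = {_key(n): (n, r) for n, r in extracted.items()}
    flags = []
    for key, (name, role) in exp.items():
        if key not in ext:
            flags.append(Flag("missing", name))
        elif _key(ext[key][1]) != _key(role):
            detail = f"expected {role!r}, extracted {ext[key][1]!r}"
            flags.append(Flag("role_mismatch", name, detail))
    for key, (name, _role) in ext.items():
        if key not in exp:
            flags.append(Flag("unexpected", name))
    return flags
```

Exact name matching is the simplest possible rule; real municipal pages vary in middle initials, nicknames, and honorifics, so a production version would likely need fuzzier matching.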
Success Criteria
Speed
Coverage
Accuracy
Pipeline output is reviewed by community maintainers before publishing, so these thresholds represent the bar for extraction quality that makes review faster than manual entry — not a requirement for perfect output.
Guardrails
CivicPatch collects information that municipal governments have published on their public websites about officials acting in their public roles. The data collected (names, roles, and contact information) is the same information local governments are expected to publish for constituent access. Term start and end dates are collected where available, but they are a known limitation: their accuracy is not currently guaranteed.
A human review step is built into the pipeline before data is published. Community maintainers verify AI-extracted results before they are merged into the open-data repository.
The evaluation framework is run on every prompt change to catch regressions in extraction quality.
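As a rough illustration of what a regression check on a prompt change might look like, the sketch below scores extracted names against a hand-labeled set and compares the score with a stored baseline. The metric (name-level precision and recall), the function names, and the tolerance value are assumptions; the source does not describe the framework's internals.

```python
def precision_recall(golden: set[str], extracted: set[str]) -> tuple[float, float]:
    """Name-level precision and recall of one extraction against a labeled set."""
    true_positives = len(golden & extracted)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(golden) if golden else 0.0
    return precision, recall


def has_regressed(baseline: float, candidate: float, tolerance: float = 0.01) -> bool:
    """True when the candidate score falls below the baseline by more than the tolerance.

    The default tolerance is an arbitrary placeholder, not a project setting.
    """
    return candidate < baseline - tolerance
```

A small tolerance avoids failing a prompt change over noise from a single jurisdiction while still catching a real drop.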
Cost limits per pipeline run prevent runaway LLM spend.
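One way such a limit could work is a per-run budget object that every LLM call reports to. The sketch below is an assumption about the mechanism, not the project's code: `RunBudget`, `BudgetExceeded`, and the per-million-token price parameters are hypothetical, and no real provider prices are implied.

```python
class BudgetExceeded(RuntimeError):
    """Raised when recording a call would push the run past its cost limit."""


class RunBudget:
    """Hypothetical per-run spend tracker."""

    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               usd_per_million_input: float, usd_per_million_output: float) -> float:
        """Record the cost of one call; abort the run if the limit would be exceeded."""
        cost = (input_tokens * usd_per_million_input
                + output_tokens * usd_per_million_output) / 1_000_000
        if self.spent_usd + cost > self.limit_usd:
            raise BudgetExceeded(
                f"run would spend ${self.spent_usd + cost:.4f}, "
                f"limit is ${self.limit_usd:.4f}"
            )
        self.spent_usd += cost
        return cost
```

Because token counts are only known after a call returns, this guard stops further calls rather than preventing the one that crossed the line; estimating cost up front from the prompt size would tighten the bound.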
What we're not doing
The aim of this project is to automate the discovery and extraction of publicly published official contact information.
The project does not:
If the pipeline reaches sufficient quality and coverage across US municipalities, future work will expand to all municipalities enumerated in the CivicPatch jurisdictions repository (or the OpenStates jurisdictions repo).