Overview
CongressWatch v3 marks a fundamental shift in how we handle government data. To prepare for high-volume data (individual stock trades, 20+ votes per member, and industry-level donor mapping), we have moved from a "monolithic" single-file data structure to a Decoupled Detail Architecture.
Core Updates
- The Infrastructure Split (Performance & Scaling)
Previously, all member data was crammed into a single members.json file. That was fast for 100 members, but the file grows with every member and every record, and it risked crashing mobile devices once we hit 500+ members with deep histories.
• The Leaderboard (data/members.json): Now a "Lightweight" index. It contains only the fields required to render the home screen grid (Scores, Flags, Photos, and High-level Stats).
• The Vault (data/details/[BIOGUIDE_ID].json): Every member now has a dedicated JSON file. This "Vault" stores the heavy forensic evidence: itemized FEC totals, full voting history, and deep SEC EDGAR signal metadata.
• Impact: Initial site load time is reduced by ~60%, making the project "App-Ready" for mobile users.
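The split described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual build script: the function name split_members and the field names in INDEX_FIELDS are assumptions, since the notes only say the index carries scores, flags, photos, and high-level stats.

```python
import json
from pathlib import Path

# Hypothetical field names: the notes only say the index holds scores,
# flags, photos, and high-level stats.
INDEX_FIELDS = ("bioguide_id", "name", "score", "flags", "photo_url", "stats")


def split_members(members, out_dir):
    """Write a lightweight index plus one detail file per member."""
    out = Path(out_dir)
    details_dir = out / "details"
    details_dir.mkdir(parents=True, exist_ok=True)

    index = []
    for member in members:
        # The index keeps only the fields needed for the home screen grid.
        index.append({k: member[k] for k in INDEX_FIELDS if k in member})
        # The full record goes to details/[BIOGUIDE_ID].json.
        detail_path = details_dir / f"{member['bioguide_id']}.json"
        detail_path.write_text(json.dumps(member, indent=2), encoding="utf-8")

    (out / "members.json").write_text(json.dumps(index), encoding="utf-8")
    return index
```

Calling split_members(all_members, "data") would produce data/members.json and one file per member under data/details/, so the home screen only has to fetch the small index.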
- Corporate Insider Detection (SEC EDGAR v2.0)
We have deployed a hardened Name Normalization Engine to scan the SEC's EDGAR database for Form 4 filings.
• The Signal: We now track when a Member of Congress appears as a "Corporate Insider" (Officer, Director, or 10%+ Stakeholder) in public companies.
• High-Confidence Hits: Initial runs have identified significant corporate footprints for members including Richard McCormick (97 signals), Mike Kennedy (100 signals), and Julie Johnson (100 signals).
• Transparency: Every hit now includes an edgar_signal_type and a specific search variation log (e.g., "via 'M. Dexter'"), ensuring our data-gathering process is auditable.
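To make the audit trail concrete, here is a small sketch of how search variations and a logged hit might look. It is an assumption-heavy illustration, not the engine itself: name_variations, build_hit, the matched_via and filing_id fields, and the signal type values are all hypothetical. Only edgar_signal_type and the "via '...'" log style come from the notes above.

```python
def name_variations(first, last, middle=""):
    """Return de-duplicated search variations, most specific first."""
    first, last, middle = first.strip(), last.strip(), middle.strip()
    if not first or not last:
        raise ValueError("first and last name are required")

    candidates = []
    if middle:
        candidates.append(f"{first} {middle} {last}")
        candidates.append(f"{first} {middle[0]}. {last}")
    candidates.append(f"{first} {last}")
    candidates.append(f"{first[0]}. {last}")
    candidates.append(f"{last}, {first}")

    # Drop case-insensitive duplicates while preserving order.
    seen, result = set(), []
    for candidate in candidates:
        key = candidate.lower()
        if key not in seen:
            seen.add(key)
            result.append(candidate)
    return result


def build_hit(signal_type, variation, filing_id):
    """Build one auditable hit record (field names other than
    edgar_signal_type are hypothetical)."""
    return {
        "edgar_signal_type": signal_type,   # e.g. "officer", "director"
        "matched_via": f"via '{variation}'",
        "filing_id": filing_id,
    }
```

Recording the exact variation that produced each match is what lets a reviewer trace a hit back to the search that found it, and spot false positives caused by overly loose variations.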
- Keyless Legislative Tracking (GovTrack Integration)
Following the retirement of the ProPublica Congress API, we have successfully pivoted to a GovTrack.us pipeline.
• Real-Time Votes: We now pull the 20 most recent votes for every member directly into their individual "Vault" files.
• Zero-Key Architecture: This pipeline is 100% open-source and requires no API keys, ensuring the project remains resilient and easy for new contributors to fork and run.
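The "20 most recent votes" step can be sketched without touching the network. The snippet below assumes each vote record is a dict with an ISO 8601 "created" timestamp and that the Vault file stores votes under a "recent_votes" key; both are illustrative assumptions, not the confirmed GovTrack response shape or the project's schema.

```python
from datetime import datetime

MAX_VOTES = 20  # the notes state 20 recent votes per member


def latest_votes(votes, limit=MAX_VOTES):
    """Return the `limit` most recent votes, newest first."""
    ordered = sorted(
        votes,
        key=lambda v: datetime.fromisoformat(v["created"]),  # assumed field
        reverse=True,
    )
    return ordered[:limit]


def merge_votes(detail, votes):
    """Return a copy of a member's detail record with refreshed votes."""
    updated = dict(detail)
    updated["recent_votes"] = latest_votes(votes)  # hypothetical key
    return updated
```

Keeping the trimming logic separate from the fetch makes it easy to test offline, which fits the goal of a pipeline that new contributors can fork and run without credentials.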
- Anomaly Score 2.0 Logic
The Anomaly Score has been recalibrated to weight corporate involvement and fundraising extremes more heavily:
• Wealth Gap (25%): Net worth trajectory vs. Congressional salary.
• Insider Signals (25%): SEC EDGAR corporate footprints.
• Donor Alignment (20%): PAC vs. Individual contribution ratios.
• Attendance (5%): Missed vote percentages.
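As a rough sketch, the weighting can be read as a weighted sum of component scores. The example below assumes each component has already been normalized to the range 0 to 1, which the notes do not specify; the function name anomaly_score and the dictionary keys are hypothetical. Note that the four weights listed above add up to 75%; how the remaining 25% is allocated is not stated here, so the sketch leaves it out rather than guess.

```python
# Weights as listed in the notes. They sum to 0.75; the remaining 0.25
# is not specified in the notes and is deliberately omitted.
WEIGHTS = {
    "wealth_gap": 0.25,
    "insider_signals": 0.25,
    "donor_alignment": 0.20,
    "attendance": 0.05,
}


def anomaly_score(components):
    """Weighted sum of component scores, each expected in [0.0, 1.0].

    Missing components count as 0.0.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = components.get(name, 0.0)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
        total += weight * value
    return total
```

With this shape, a member who maxes out only the insider component would contribute 0.25 to the total, the same as one who maxes out only the wealth gap component, which matches the equal 25% weighting listed above.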
Technical Roadmap: What’s Next?
- House Bulk XML Integration: Moving from "Corporate Insider" signals to "Personal Stock Trades" using the House Clerk’s bulk disclosure data.
- Employer-to-Industry Mapping: Transforming raw FEC donor names into "Industry Profiles" (e.g., Pharma, Defense, Tech).
- Bill Similarity Engine (NLP): Comparing bill text against lobbying templates to detect "Ghostwritten" legislation.
How to Contribute
We are looking for developers and data scientists to help refine our CIK (Central Index Key) matching to further reduce false positives in common-name SEC searches.
"Sunlight is the best disinfectant."
— The CongressWatch Team (of 1)