GitHub - mrsudo404/leadhunter: A web-based Google Maps lead scraper. Enter categories and cities in a form, watch live progress in your browser, and download Excel reports with scored leads.

A free, open-source Google Maps + Yelp lead scraper.
Type categories and cities. Get a scored Excel of local businesses ranked by how badly they need a new site.

Quick start · Features · Scoring · Architecture · Roadmap · Author

What is this?

A small tool I built because I was tired of paying $99/month for a leads service that returned the same five plumbers every time. It drives a headless Chromium around Google Maps and Yelp, pulls phone, email, website, socials, and a one-line "personal snippet" off each business site, then dumps everything into a scored Excel file.

No API keys. No sign-ups. No paid services.

Why LeadHunter

$0 / month

Apollo charges $99.
Snov.io charges $39.
This is free forever.

2 sources

Google Maps for the primary scrape, Yelp to catch businesses Maps missed. Merged by name.

Outreach-ready

Every lead comes with phone, email, a personalization snippet, and four social URLs.

Features

Feature	What it does
Smart lead scoring	Every business gets a 0–100 score based on whether they have a working website, SSL, reviews, ratings, and a real phone. Color-coded green/yellow/red in the Excel output.
Email extraction	Visits the business homepage, then falls back to `/contact` and `/about`. Filters out `noreply`, placeholders like `user@domain.com`, and image filenames.
Personalization snippets	Pulls the meta description or first real paragraph from each business site. Drop it straight into your cold-email opener.
Social URL discovery	Scrapes Facebook, Instagram, LinkedIn, and X/Twitter URLs from the site footer. Filters share-intent and namespace URLs.
Multi-source merge	Google Maps is the primary source. Yelp augments matched leads with profile URLs and surfaces businesses that Maps missed.
Website quality detection	Identifies broken sites, Wix landing pages, Linktree-as-website, SSL errors, missing HTTPS, and slow/thin sites, all separately scored.
Live progress streaming	Server-Sent Events stream every search step, every lead found, and every error to the UI in real time. Polling fallback if SSE breaks.
Polished Excel output	23 columns sorted by score, color-coded rows, frozen header, auto-filters, plus a Summary sheet with cross-platform overlap stats.
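
For the curious, the email extraction step can be pictured roughly like this. It is a minimal sketch, not the code in scraper_core.py: the regex, the deny-lists, and the names `candidate_pages` and `extract_emails` are all assumptions made for illustration.

```python
import re
from urllib.parse import urljoin

# A deliberately loose email pattern; real-world extraction is messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical deny-lists, chosen to mirror the filters described above.
BLOCKED_LOCAL_PARTS = {"noreply", "no-reply", "donotreply"}
PLACEHOLDER_EMAILS = {"user@domain.com", "email@example.com", "name@example.com"}
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def candidate_pages(site_url: str) -> list[str]:
    """Pages to try in order: homepage, then /contact, then /about."""
    return [urljoin(site_url, path) for path in ("/", "/contact", "/about")]


def extract_emails(html: str) -> list[str]:
    """Return plausible contact emails found in a page, in first-seen order."""
    found: list[str] = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        local_part = email.split("@", 1)[0]
        if local_part in BLOCKED_LOCAL_PARTS:
            continue  # automated senders are useless for outreach
        if email in PLACEHOLDER_EMAILS:
            continue  # template leftovers
        if email.endswith(IMAGE_SUFFIXES):
            continue  # things like logo@2x.png look like emails to a regex
        if email not in found:
            found.append(email)
    return found
```

The idea is to fetch each URL from `candidate_pages` in turn and stop at the first page where `extract_emails` returns something.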

Quick start

# 1. Clone
git clone https://github.com/mrsudo404/leadhunter.git
cd leadhunter

# 2. Install
pip install -r requirements.txt
playwright install chromium

# 3. Launch
python app.py

Open http://localhost:5000 and you're in.

macOS users: port 5000 is taken by AirPlay Receiver. Either turn it off (System Settings → General → AirDrop & Handoff) or change port=5000 near the bottom of app.py.

First run: start small — 1 category × 1 city × 20 results — to verify everything works. About 3 minutes.

What you get

Column	What's in it
Lead Score	0–100, color-coded green / yellow / red
Business Name	Cleaned
Phone	Formatted (`+1 (512) 555-1234`)
Email	Extracted from website (homepage → /contact → /about)
Website	Direct URL
Personal Snippet	One-line summary for cold emails
Sources	`google_maps`, `yelp`, or both
Social links	Facebook, Instagram, LinkedIn, X/Twitter
Website Status	`NO_WEBSITE` / `BROKEN_SITE` / `WEAK_SITE` / `HAS_SITE` / …
Rating + Reviews	From Google Maps, augmented with Yelp counts when higher
Yelp & Maps URLs	Direct profile links for manual deep-dives
Address & Category	Normalized

Plus a Summary sheet with totals, hot/warm/cold counts, emails found, and how many businesses appeared on both Google Maps and Yelp vs. just one.
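
To make the Sources column and the overlap numbers concrete, here is a rough sketch of a merge-by-name step. The normalization rule, the dict fields, and the function names are assumptions for illustration; the actual matching in scraper_core.py may differ.

```python
import re


def name_key(name: str) -> str:
    """Normalize a business name so small punctuation differences still match."""
    cleaned = re.sub(r"[^a-z0-9 ]+", "", name.lower())
    return " ".join(cleaned.split())


def merge_leads(maps_leads: list[dict], yelp_leads: list[dict]) -> list[dict]:
    """Google Maps is primary; Yelp augments matches and adds what Maps missed."""
    merged: dict[str, dict] = {}
    for lead in maps_leads:
        merged[name_key(lead["name"])] = {**lead, "sources": ["google_maps"]}
    for lead in yelp_leads:
        key = name_key(lead["name"])
        if key in merged:
            existing = merged[key]
            if "yelp" not in existing["sources"]:
                existing["sources"].append("yelp")
            existing["yelp_url"] = lead.get("yelp_url")
            # Keep the higher review count, as the Rating + Reviews row describes.
            existing["reviews"] = max(existing.get("reviews", 0), lead.get("reviews", 0))
        else:
            merged[key] = {**lead, "sources": ["yelp"]}
    return list(merged.values())


def overlap_stats(leads: list[dict]) -> dict[str, int]:
    """Counts for a summary: on both platforms vs. just one."""
    stats = {"both": 0, "maps_only": 0, "yelp_only": 0}
    for lead in leads:
        sources = set(lead["sources"])
        if sources == {"google_maps", "yelp"}:
            stats["both"] += 1
        elif sources == {"google_maps"}:
            stats["maps_only"] += 1
        else:
            stats["yelp_only"] += 1
    return stats
```

Exact-name matching is cheap but will treat two different "Main Street Dental" offices as one business, so a real merge would probably also compare phone or address.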

Lead scoring

Website signals	Points	Social proof	Points
No website at all	+50	50+ reviews	+25
Broken site (HTTP 4xx/5xx)	+45	20–49 reviews	+20
Facebook / Yelp / Linktree as website	+40	5–19 reviews	+10
Site times out or won't connect	+35	Rating 4.0+	+15
SSL broken	+30	Rating 3.5–3.9	+10
No HTTPS	+25	Valid phone	+10
Bare-bones site (<2KB)	+25	Email found	+5
Slow site (10+ seconds)	+20	Site is fine	+5

Hot: your priority list.

Warm: solid follow-ups.

Cold: has a decent site already.
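
The point table above can be read as a simple additive score. The sketch below uses the point values from the table, but everything else is an assumption: that exactly one website signal applies per business, that the parts are summed, and that the total is capped at 100. The key names and the `Lead` class are hypothetical, not the ones in scraper_core.py.

```python
from dataclasses import dataclass

# Points from the table above; the key names are made up for this sketch.
WEBSITE_POINTS = {
    "no_website": 50,
    "broken": 45,
    "social_as_site": 40,  # Facebook / Yelp / Linktree used as the website
    "timeout": 35,
    "ssl_broken": 30,
    "no_https": 25,
    "thin": 25,            # bare-bones site under 2KB
    "slow": 20,            # 10+ seconds to load
    "fine": 5,
}


@dataclass
class Lead:
    website_signal: str
    reviews: int = 0
    rating: float = 0.0
    valid_phone: bool = False
    email_found: bool = False


def score_lead(lead: Lead) -> int:
    """Sum website and social-proof points, capped at 100 (assumed rule)."""
    points = WEBSITE_POINTS.get(lead.website_signal, 0)

    if lead.reviews >= 50:
        points += 25
    elif lead.reviews >= 20:
        points += 20
    elif lead.reviews >= 5:
        points += 10

    if lead.rating >= 4.0:
        points += 15
    elif lead.rating >= 3.5:
        points += 10

    if lead.valid_phone:
        points += 10
    if lead.email_found:
        points += 5

    return min(points, 100)
```

Under these assumptions, a business with no website, 60 reviews, a 4.5 rating, a valid phone, and an email adds up to 105 and is capped at 100, while a business with a fine site, 10 reviews, and a 3.7 rating lands at 25.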

Architecture

flowchart LR
    UI[Browser UI<br/>vanilla JS]
    Flask[Flask Server<br/>app.py]
    Worker[Playwright Worker<br/>scraper_core.py]
    GMaps[Google Maps]
    Yelp[Yelp]
    Sites[Business Websites]
    Excel[Excel Export<br/>openpyxl]

    UI -->|POST /api/start| Flask
    Flask -->|background thread| Worker
    Worker -->|scrape listings| GMaps
    Worker -->|scrape + augment| Yelp
    Worker -->|fetch homepage| Sites
    Sites -.->|email, snippet, socials| Worker
    Worker -->|merge + score| Flask
    Flask -.->|SSE stream| UI
    Flask -->|on complete| Excel
    Excel -.->|download| UI

    classDef src fill:#8b5cf6,stroke:#fff,color:#fff
    classDef app fill:#ec4899,stroke:#fff,color:#fff
    classDef out fill:#f97316,stroke:#fff,color:#fff
    class GMaps,Yelp,Sites src
    class UI,Flask,Worker app
    class Excel out
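
The "background thread" and "SSE stream" arrows in the diagram boil down to a queue between the worker and a generator that formats Server-Sent Events frames. Here is a stdlib-only sketch of that pattern. The event names, payloads, and function names are assumptions, and `run_job` is a stand-in for the real scraper.

```python
import json
import queue
import threading

_DONE = object()  # sentinel the worker puts on the queue when it finishes


def format_sse(event: str, data: dict) -> str:
    """One Server-Sent Events frame: event name, JSON payload, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def run_job(events: queue.Queue) -> None:
    """Stand-in for the scraper worker; pushes progress as it goes."""
    events.put(("step", {"message": "searching plumbers in Austin"}))
    events.put(("lead", {"name": "Example Plumbing", "score": 85}))
    events.put(_DONE)


def stream(events: queue.Queue):
    """Generator that blocks on the queue and yields SSE frames until done."""
    while True:
        item = events.get()
        if item is _DONE:
            yield format_sse("done", {})
            return
        event, data = item
        yield format_sse(event, data)


events: queue.Queue = queue.Queue()
threading.Thread(target=run_job, args=(events,), daemon=True).start()
frames = list(stream(events))
```

In a Flask route, a generator like this is typically returned as `Response(stream(events), mimetype="text/event-stream")`, and the browser side reads it with `EventSource`.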

Who this is for

Freelance web developers looking for small businesses with broken or missing websites
Digital agencies building targeted outreach lists for a specific niche and city
SEO consultants identifying businesses with weak Google presence
Cold-email tools that need a free upstream source of business contact data
Anyone who'd rather not pay $99 a month for the same lead-gen subscription everyone else uses

A few warnings

Warning	What to do
Google will throttle aggressive scraping.	Stick to 300–500 businesses per day per IP.
You got blocked.	Wait 12–24 hours or switch IP, or just lower results-per-search.
Excel locks files it has open.	Close the output file in Excel before re-running, or the save fails.
Selectors break every few months.	Fix is in `extract_business_details` inside scraper_core.py.
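
If you script around the tool, one way to respect the daily ceiling is a small counter that resets each day. This is a hypothetical helper, not something LeadHunter ships. The class name and the default of 300 are assumptions, and the count lives in memory, so it resets on restart.

```python
import datetime as dt
from typing import Optional


class DailyBudget:
    """Tracks how many businesses were scraped today against a fixed cap."""

    def __init__(self, limit: int = 300):
        self.limit = limit
        self.day: Optional[dt.date] = None
        self.used = 0

    def try_spend(self, n: int = 1, today: Optional[dt.date] = None) -> bool:
        """Reserve n scrapes; returns False if that would exceed today's cap."""
        today = today or dt.date.today()
        if today != self.day:
            # New day: start counting from zero again.
            self.day, self.used = today, 0
        if self.used + n > self.limit:
            return False
        self.used += n
        return True
```

Call `try_spend(20)` before a 20-result search and skip the search when it returns False.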

Project layout

leadhunter/
├── app.py              Flask server, routes, SSE streaming
├── scraper_core.py     Playwright scraper + website enrichment + Excel export
├── requirements.txt    Python dependencies
├── templates/
│   └── index.html      Single-file vanilla-JS UI
├── assets/
│   └── screenshot.png  README screenshot
├── outputs/            Excel files end up here (auto-created)
├── LICENSE             MIT
└── README.md           You're reading it

Want to watch it work?

By default Chromium runs headless so you don't see anything. To watch it click around (this is genuinely fun the first time):

Open scraper_core.py
Find the launch() call near line 360
Change headless=True to headless=False

A browser window will pop up and you can watch Google Maps being navigated in real time.

Troubleshooting

"Address already in use" on port 5000

On macOS, port 5000 is taken by AirPlay Receiver. Either disable it in System Settings → General → AirDrop & Handoff, kill the process (lsof -i :5000, find the PID, kill <pid>), or change the port in app.py.
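
To check whether something is already listening before you start the server, a quick stdlib probe works. The helper names and the list of fallback ports below are assumptions for illustration.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        return probe.connect_ex((host, port)) == 0


def first_free_port(candidates=(5000, 5001, 8000, 8080)) -> int:
    """Return the first candidate port nothing is listening on."""
    for port in candidates:
        if not port_in_use(port):
            return port
    raise RuntimeError("no free port among candidates")
```

If `port_in_use(5000)` returns True, set the port in app.py to whatever `first_free_port()` gives you.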

Browser opens then immediately closes

Either you set headless=False and your machine has no display, or Chromium didn't install properly:

python -m playwright install chromium

Form submits but nothing happens

Open browser DevTools (F12). Most likely Flask died and the fetch is timing out. Check the terminal you started it in.

Selectors returning empty data

Google changed their HTML. Fix is in extract_business_details inside scraper_core.py. Open a place URL in a real browser, F12, find the new selector.

Yelp listings missing

Yelp sometimes throws Cloudflare bot checks. If you see "Yelp skip: blocked" in the log, retry in a few minutes from a different IP. Google Maps results still work either way.

Roadmap

Stuff I want to add when I have time. PRs welcome.

Tech stack fingerprint — detect Wix / Squarespace / GoDaddy / old WordPress from HTML signatures
Domain age via WHOIS + "last updated" heuristic from homepage copyright year
Mobile responsiveness check — viewport meta + media queries
Page speed score — homepage TTFB and full-load timing
SQLite history — don't re-export businesses you've already exported
CRM-ready CSV exports — Apollo, HubSpot, Pipedrive field mappings
Bounding-box search — drop a Google Maps URL with viewport instead of city name
Scheduled scrapes — weekly cron, email new hot leads

Contributing

Found a bug or have an idea? Open an issue, or email contact@waqaskhan.com.pk directly. If you build something cool on top of it, let me know.

When sending a PR, keep changes focused — one feature or fix per PR. The selectors break every few months, so PRs that update them are always welcome.

Legal

Scraping public data from Google Maps and Yelp is in a gray zone. US courts have generally allowed public-data scraping (hiQ v. LinkedIn), but both Google's and Yelp's terms of service prohibit it. Practically, what happens if you push too hard is they block your IP for a while.

Don't resell raw scraped data. Don't run this at industrial scale. If you're doing this for a real business, switch to the Google Places API. It has a free usage tier (historically a $200/month credit; check current Google Maps Platform pricing), which covers a lot.

License

MIT. Do whatever you want with it. If you ship a product on top of this, a link back is appreciated but not required.

Built by MRSUDO404

If LeadHunter saves you a few hours of cold-email research, the best thank-you is a star on the repo and a link back from your site.

Got a hiring lead, partnership idea, or just want to say hi? Drop me an email.
