mrsudo404/leadhunter

LeadHunter banner

A free, open-source Google Maps + Yelp lead scraper.
Type categories and cities. Get a scored Excel of local businesses ranked by how badly they need a new site.

Python 3.10+  ·  Playwright  ·  Flask  ·  MIT License  ·  Built by MRSUDO404

Quick start  ·  Features  ·  Scoring  ·  Architecture  ·  Roadmap  ·  Author


LeadHunter Google Maps lead scraper UI


What is this? A small tool I built because I was tired of paying $99/month for a leads service that returned the same five plumbers every time. It drives headless Chromium through Google Maps and Yelp, collects phone, email, website, socials, and a one-line "personal snippet" from each business site, then writes everything to a scored Excel file.

No API keys. No sign-ups. No paid services.

Why LeadHunter

$0 / month

Apollo charges $99.
Snov.io charges $39.
This is free forever.

2 sources

Google Maps for the primary scrape, Yelp to catch businesses Maps missed. Merged by name.

Outreach-ready

Every lead comes with phone, email, a personalization snippet, and four social URLs.

Features

Smart lead scoring

Every business gets a 0–100 score based on whether they have a working website, SSL, reviews, ratings, and a real phone. Color-coded green/yellow/red in the Excel output.

Email extraction

Visits the business homepage, then falls back to /contact and /about. Filters out noreply, placeholders like user@domain.com, and image filenames.
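The filtering step can be sketched roughly like this. It is a minimal illustration, not the code in scraper_core.py: the regex, the filter lists, and the function name `extract_emails` are all assumptions.

```python
import re

# Loose pattern for candidate addresses; real-world HTML needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical filter lists, not taken from scraper_core.py.
PLACEHOLDERS = {"user@domain.com", "email@example.com", "name@email.com"}
BLOCKED_LOCAL_PREFIXES = ("noreply", "no-reply", "donotreply")
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_emails(html: str) -> list[str]:
    """Return unique, plausible contact emails in order of first appearance."""
    seen: set[str] = set()
    results: list[str] = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        local_part = email.split("@", 1)[0]
        if email in seen or email in PLACEHOLDERS:
            continue
        if local_part.startswith(BLOCKED_LOCAL_PREFIXES):
            continue
        # Retina assets such as logo@2x.png look like emails to the regex.
        if email.endswith(IMAGE_SUFFIXES):
            continue
        seen.add(email)
        results.append(email)
    return results
```

Running it over the homepage HTML first, and only then over /contact and /about, would reproduce the fallback order described above.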

Personalization snippets

Pulls the meta description or first real paragraph from each business site. Drop it straight into your cold-email opener.
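One way to do this with only the standard library is sketched below. The class and function names are hypothetical, and the `min_length=40` cutoff for what counts as a "real" paragraph is an assumption, not the value LeadHunter uses.

```python
from html.parser import HTMLParser


class SnippetParser(HTMLParser):
    """Collect the meta description and the text of each <p> element."""

    def __init__(self) -> None:
        super().__init__()
        self.meta_description = ""
        self.paragraphs: list[str] = []
        self._in_paragraph = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("name") or "").lower() == "description":
            self.meta_description = (attr_map.get("content") or "").strip()
        elif tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            # Collapse runs of whitespace so the snippet fits on one line.
            self.paragraphs.append(" ".join("".join(self._buffer).split()))
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def personal_snippet(html: str, min_length: int = 40) -> str:
    """Prefer the meta description; otherwise the first long-enough paragraph."""
    parser = SnippetParser()
    parser.feed(html)
    if parser.meta_description:
        return parser.meta_description
    for paragraph in parser.paragraphs:
        if len(paragraph) >= min_length:
            return paragraph
    return ""
```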

Social URL discovery

Scrapes Facebook, Instagram, LinkedIn, and X/Twitter URLs from the site footer. Filters share-intent and namespace URLs.
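A rough sketch of the filtering idea follows. The host table, the blocked path segments, and both function names are assumptions for illustration; "namespace URLs" is read here as platform-level pages (login, hashtag, the bare domain) rather than business profiles.

```python
from typing import Optional
from urllib.parse import urlparse

SOCIAL_HOSTS = {
    "facebook.com": "facebook",
    "instagram.com": "instagram",
    "linkedin.com": "linkedin",
    "twitter.com": "twitter",
    "x.com": "twitter",
}

# Hypothetical list: first path segments that point at share widgets or
# platform pages instead of a business profile.
NON_PROFILE_SEGMENTS = {
    "sharer", "sharer.php", "share", "intent", "shareArticle",
    "dialog", "plugins", "home", "login", "hashtag",
}


def classify_social(url: str) -> Optional[str]:
    """Return the platform name for a profile URL, or None if it should be skipped."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    platform = SOCIAL_HOSTS.get(host)
    if platform is None:
        return None
    segments = [s for s in parsed.path.split("/") if s]
    if not segments or segments[0] in NON_PROFILE_SEGMENTS:
        return None
    return platform


def pick_social_urls(urls: list[str]) -> dict[str, str]:
    """Keep the first usable URL per platform."""
    found: dict[str, str] = {}
    for url in urls:
        platform = classify_social(url)
        if platform and platform not in found:
            found[platform] = url
    return found
```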

Multi-source merge

Google Maps is the primary source. Yelp augments matched leads with profile URLs and surfaces businesses that Maps missed.
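The merge could look something like the sketch below. The lead dictionaries, their keys (`name`, `reviews`, `yelp_url`), and the normalization rules are assumptions; the real matching logic in scraper_core.py may differ.

```python
import re

# Hypothetical noise words dropped before comparing names.
NOISE_WORDS = {"llc", "inc", "co", "the"}


def normalize_name(name: str) -> str:
    """Lowercase, drop apostrophes and punctuation, and remove noise words."""
    lowered = name.lower().replace("&", " and ")
    lowered = lowered.replace("'", "").replace("\u2019", "")
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", lowered)
    return " ".join(w for w in cleaned.split() if w not in NOISE_WORDS)


def merge_leads(maps_leads: list[dict], yelp_leads: list[dict]) -> list[dict]:
    """Google Maps leads come first; Yelp augments matches and adds the rest."""
    merged: dict[str, dict] = {}
    for lead in maps_leads:
        merged[normalize_name(lead["name"])] = {**lead, "sources": ["google_maps"]}
    for lead in yelp_leads:
        key = normalize_name(lead["name"])
        if key in merged:
            target = merged[key]
            target["sources"].append("yelp")
            target["yelp_url"] = lead.get("yelp_url", "")
            # Keep the larger review count when Yelp reports more.
            target["reviews"] = max(target.get("reviews", 0), lead.get("reviews", 0))
        else:
            merged[key] = {**lead, "sources": ["yelp"]}
    return list(merged.values())
```

Matching on name alone will merge two different businesses that share a name in the same search, so a real implementation might also compare phone or address.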

Website quality detection

Identifies broken sites, Wix landing pages, Linktree-as-website, SSL errors, missing HTTPS, and slow/thin sites — all separately scored.
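As an illustration, the checks can be expressed as a pure function over the result of one homepage fetch. Everything here is a sketch under assumptions: `ProbeResult`, the signal labels, the order of the checks, and the host list are hypothetical, and the 2 KB and 10 second cutoffs are borrowed from the scoring table in this README. Wix landing-page detection is left out.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Hosts that are a profile page rather than a real website.
PROFILE_HOSTS = ("facebook.com", "yelp.com", "linktr.ee")


@dataclass
class ProbeResult:
    """Outcome of fetching a homepage (hypothetical structure)."""
    url: str
    status_code: Optional[int]  # None when the connection failed or timed out
    body_bytes: int = 0
    load_seconds: float = 0.0
    ssl_error: bool = False


def website_signal(probe: Optional[ProbeResult]) -> str:
    """Map a probe result to a single signal label, most severe check first."""
    if probe is None or not probe.url:
        return "no_website"
    parsed = urlparse(probe.url)
    host = (parsed.hostname or "").lower()
    if any(host == h or host.endswith("." + h) for h in PROFILE_HOSTS):
        return "profile_as_website"
    if probe.ssl_error:
        return "ssl_broken"
    if probe.status_code is None:
        return "unreachable"
    if probe.status_code >= 400:
        return "broken"
    if parsed.scheme != "https":
        return "no_https"
    if probe.body_bytes < 2048:
        return "bare_bones"
    if probe.load_seconds >= 10:
        return "slow"
    return "fine"
```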

Live progress streaming

Server-Sent Events stream every search step, every lead found, and every error to the UI in real time. Polling fallback if SSE breaks.
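For readers new to SSE, the wire format is just text frames separated by a blank line. The sketch below shows one way a worker thread could hand events to a streaming generator through a queue; the helper names and the `type` key are assumptions, not the actual code in app.py.

```python
import json
import queue
from typing import Iterator, Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one event as an SSE frame: optional event name, then JSON data."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload stays on a single data line.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def event_stream(events: "queue.Queue") -> Iterator[str]:
    """Yield frames until the worker pushes None as an end-of-stream marker."""
    while True:
        item = events.get()
        if item is None:
            break
        yield format_sse(item, event=item.get("type"))
```

In Flask, a generator like this can be returned as `Response(event_stream(q), mimetype="text/event-stream")`, and the browser side reads it with `EventSource`.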

Polished Excel output

23 columns sorted by score, color-coded rows, frozen header, auto-filters, plus a Summary sheet with cross-platform overlap stats.

Quick start

# 1. Clone
git clone https://github.com/mrsudo404/leadhunter.git
cd leadhunter

# 2. Install
pip install -r requirements.txt
playwright install chromium

# 3. Launch
python app.py

Open http://localhost:5000 and you're in.

macOS users: port 5000 is taken by AirPlay Receiver. Either turn it off (System Settings → General → AirDrop & Handoff) or change port=5000 near the bottom of app.py.

First run: start small — 1 category × 1 city × 20 results — to verify everything works. About 3 minutes.

What you get

| Column | What's in it |
| --- | --- |
| Lead Score | 0–100, color-coded green / yellow / red |
| Business Name | Cleaned |
| Phone | Formatted (`+1 (512) 555-1234`) |
| Email | Extracted from website (homepage → /contact → /about) |
| Website | Direct URL |
| Personal Snippet | One-line summary for cold emails |
| Sources | google_maps, yelp, or both |
| Social links | Facebook, Instagram, LinkedIn, X/Twitter |
| Website Status | NO_WEBSITE / BROKEN_SITE / WEAK_SITE / HAS_SITE / … |
| Rating + Reviews | From Google Maps, augmented with Yelp counts when higher |
| Yelp & Maps URLs | Direct profile links for manual deep-dives |
| Address & Category | Normalized |

Plus a Summary sheet with totals, hot/warm/cold counts, emails found, and how many businesses appeared on both Google Maps and Yelp vs. just one.
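The phone format shown in the table can be produced with a few lines of Python. This is a sketch that assumes US numbers only; `format_us_phone` is a hypothetical helper, not a function from the repo.

```python
import re


def format_us_phone(raw: str) -> str:
    """Format a US number as +1 (AAA) BBB-CCCC, or return "" if it is not valid."""
    digits = re.sub(r"\D", "", raw)
    # Drop a leading country code so 11-digit and 10-digit inputs match.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return ""
    return f"+1 ({digits[:3]}) {digits[3:6]}-{digits[6:]}"
```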

Lead scoring

| Website signals | Points | Social proof | Points |
| --- | --- | --- | --- |
| No website at all | +50 | 50+ reviews | +25 |
| Broken site (HTTP 4xx/5xx) | +45 | 20–49 reviews | +20 |
| Facebook / Yelp / Linktree as website | +40 | 5–19 reviews | +10 |
| Site times out or won't connect | +35 | Rating 4.0+ | +15 |
| SSL broken | +30 | Rating 3.5–3.9 | +10 |
| No HTTPS | +25 | Valid phone | +10 |
| Bare-bones site (<2KB) | +25 | Email found | +5 |
| Slow site (10+ seconds) | +20 | Site is fine | +5 |

Hot: Your priority list.
Warm: Solid follow-ups.
Cold: Has a decent site already.
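To make the arithmetic concrete, here is a sketch of how the point values above could combine. Several things are assumptions: that exactly one website signal applies per lead, that points are summed and capped at 100, the signal key names, and above all the hot/warm cutoffs (70 and 40), which this README does not state.

```python
# Point values copied from the scoring table; the keys are hypothetical labels.
WEBSITE_POINTS = {
    "no_website": 50,
    "broken": 45,
    "profile_as_website": 40,
    "unreachable": 35,
    "ssl_broken": 30,
    "no_https": 25,
    "bare_bones": 25,
    "slow": 20,
    "fine": 5,
}


def review_points(reviews: int) -> int:
    if reviews >= 50:
        return 25
    if reviews >= 20:
        return 20
    if reviews >= 5:
        return 10
    return 0


def rating_points(rating: float) -> int:
    if rating >= 4.0:
        return 15
    if rating >= 3.5:
        return 10
    return 0


def lead_score(website_signal: str, reviews: int = 0, rating: float = 0.0,
               has_phone: bool = False, has_email: bool = False) -> int:
    """Sum one website signal plus social-proof points, capped at 100."""
    total = WEBSITE_POINTS.get(website_signal, 0)
    total += review_points(reviews) + rating_points(rating)
    total += 10 if has_phone else 0
    total += 5 if has_email else 0
    return min(total, 100)


def tier(score: int, hot: int = 70, warm: int = 40) -> str:
    """Bucket a score; the default cutoffs are placeholders, not LeadHunter's."""
    if score >= hot:
        return "hot"
    if score >= warm:
        return "warm"
    return "cold"
```

For example, a business with a broken site, 25 reviews, a 3.7 rating, and a valid phone would get 45 + 20 + 10 + 10 = 85 under these assumptions.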

Architecture

flowchart LR
    UI[Browser UI<br/>vanilla JS]
    Flask[Flask Server<br/>app.py]
    Worker[Playwright Worker<br/>scraper_core.py]
    GMaps[Google Maps]
    Yelp[Yelp]
    Sites[Business Websites]
    Excel[Excel Export<br/>openpyxl]

    UI -->|POST /api/start| Flask
    Flask -->|background thread| Worker
    Worker -->|scrape listings| GMaps
    Worker -->|scrape + augment| Yelp
    Worker -->|fetch homepage| Sites
    Sites -.->|email, snippet, socials| Worker
    Worker -->|merge + score| Flask
    Flask -.->|SSE stream| UI
    Flask -->|on complete| Excel
    Excel -.->|download| UI

    classDef src fill:#8b5cf6,stroke:#fff,color:#fff
    classDef app fill:#ec4899,stroke:#fff,color:#fff
    classDef out fill:#f97316,stroke:#fff,color:#fff
    class GMaps,Yelp,Sites src
    class UI,Flask,Worker app
    class Excel out

Who this is for

  • Freelance web developers looking for small businesses with broken or missing websites
  • Digital agencies building targeted outreach lists for a specific niche and city
  • SEO consultants identifying businesses with weak Google presence
  • Cold-email tools that need a free upstream source of business contact data
  • Anyone who'd rather not pay $99 a month for the same lead-gen subscription everyone else uses

A few warnings

| Warning | What to do |
| --- | --- |
| Google will throttle aggressive scraping. | Stick to 300–500 businesses per day per IP. |
| If you get blocked, wait 12–24 hours or switch IP. | Or just lower results-per-search. |
| Don't open the same Excel file twice. | Close Excel before re-running; saves fail otherwise. |
| Selectors break every few months. | Fix is in extract_business_details inside scraper_core.py. |
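If you script your own runs, a small counter is one way to stay under the daily ceiling. LeadHunter does not ship this; `DailyBudget` is a hypothetical helper, and it only tracks usage in memory for a single process.

```python
from datetime import date
from typing import Optional


class DailyBudget:
    """Allow at most `limit` scraped businesses per calendar day."""

    def __init__(self, limit: int = 300) -> None:
        self.limit = limit
        self._day: Optional[date] = None
        self._used = 0

    def try_consume(self, count: int = 1, today: Optional[date] = None) -> bool:
        """Reserve `count` lookups; return False once the day's budget is spent."""
        today = today or date.today()
        if today != self._day:
            # New day: reset the counter.
            self._day, self._used = today, 0
        if self._used + count > self.limit:
            return False
        self._used += count
        return True
```

Checking `budget.try_consume(20)` before each 20-result search and stopping on `False` would keep a run within the 300 to 500 range suggested above.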

Project layout

leadhunter/
├── app.py              Flask server, routes, SSE streaming
├── scraper_core.py     Playwright scraper + website enrichment + Excel export
├── requirements.txt    Python dependencies
├── templates/
│   └── index.html      Single-file vanilla-JS UI
├── assets/
│   └── screenshot.png  README screenshot
├── outputs/            Excel files end up here (auto-created)
├── LICENSE             MIT
└── README.md           You're reading it

Want to watch it work?

By default Chromium runs headless so you don't see anything. To watch it click around (this is genuinely fun the first time):

  1. Open scraper_core.py
  2. Find the launch() call near line 360
  3. Change headless=True to headless=False

A browser window will pop up and you can watch Google Maps being navigated in real time.

Troubleshooting

"Address already in use" on port 5000

On macOS, port 5000 is taken by AirPlay Receiver. Either disable it in System Settings → General → AirDrop & Handoff, kill the process (lsof -i :5000, find the PID, kill <pid>), or change the port in app.py.
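If you'd rather check from Python than with lsof, a quick probe like the one below tells you whether something is already listening. `port_in_use` is a hypothetical helper, not part of the repo.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 5000 busy:", port_in_use(5000))
```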

Browser opens then immediately closes

Either you set headless=False and your machine has no display, or Chromium didn't install properly:

python -m playwright install chromium

Form submits but nothing happens

Open browser DevTools (F12). Most likely Flask died and the fetch is timing out. Check the terminal you started it in.

Selectors returning empty data

Google changed their HTML. Fix is in extract_business_details inside scraper_core.py. Open a place URL in a real browser, F12, find the new selector.

Yelp listings missing

Yelp sometimes throws Cloudflare bot checks. If you see "Yelp skip: blocked" in the log, retry in a few minutes from a different IP. Google Maps results still work either way.

Roadmap

Stuff I want to add when I have time. PRs welcome.

  • Tech stack fingerprint — detect Wix / Squarespace / GoDaddy / old WordPress from HTML signatures
  • Domain age via WHOIS + "last updated" heuristic from homepage copyright year
  • Mobile responsiveness check — viewport meta + media queries
  • Page speed score — homepage TTFB and full-load timing
  • SQLite history — don't re-export businesses you've already exported
  • CRM-ready CSV exports — Apollo, HubSpot, Pipedrive field mappings
  • Bounding-box search — drop a Google Maps URL with viewport instead of city name
  • Scheduled scrapes — weekly cron, email new hot leads

Contributing

Found a bug or have an idea? Open an issue, or email contact@waqaskhan.com.pk directly. If you build something cool on top of it, let me know.

When sending a PR, keep changes focused — one feature or fix per PR. The selectors break every few months, so PRs that update them are always welcome.

Legal

Scraping public data from Google Maps and Yelp is in a gray zone. US courts have generally allowed public-data scraping (hiQ v. LinkedIn), but both Google's and Yelp's terms of service prohibit it. Practically, what happens if you push too hard is they block your IP for a while.

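Don't resell raw scraped data. Don't run this at industrial scale. If you're doing this for a real business, switch to the Google Places API. Google Maps Platform has offered a $200/month free credit, which covers a lot, but its pricing changes over time, so check the current free-usage terms before relying on that number.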

License

MIT. Do whatever you want with it. If you ship a product on top of this, a link back is appreciated but not required.




Built by MRSUDO404

Website Email GitHub



If LeadHunter saves you a few hours of cold-email research, the best thank-you is a star on the repo and a link back from your site.

Got a hiring lead, partnership idea, or just want to say hi? Drop me an email.


About

A web-based Google Maps lead scraper. Enter categories and cities in a form, watch live progress in your browser, and download Excel reports with scored leads.