LemonSqueeze

A web app for scraping Reddit posts and comments, built to make collecting data for research easier — no API keys, no authentication headaches.

What it does

Scrapes posts and full comment threads from any public subreddit
Supports multiple sort modes (new, top, hot) in a single run
Deduplicates posts across sort modes so you don't get repeats
Optional keyword analysis — define your own categories and score posts by relevance
Exports to CSV (opens right in Excel/Google Sheets) and JSON (for Python/R scripts)
Processing, analysis, and export all happen in the browser; nothing gets stored on a server
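
The deduplication across sort modes can be pictured as keying posts by ID and keeping the first copy seen. This is a minimal sketch in Python, not the app's own JavaScript; the `id` field and the `dedupe_posts` name are assumptions for illustration and are not taken from app.js.

```python
def dedupe_posts(batches):
    """Merge post lists from several sort modes, keeping the first copy of each post."""
    seen = set()
    merged = []
    for batch in batches:
        for post in batch:
            post_id = post["id"]  # assumed field name, not confirmed from app.js
            if post_id in seen:
                continue  # already collected under an earlier sort mode
            seen.add(post_id)
            merged.append(post)
    return merged


new_posts = [{"id": "a1", "title": "First"}, {"id": "b2", "title": "Second"}]
top_posts = [{"id": "b2", "title": "Second"}, {"id": "c3", "title": "Third"}]
merged = dedupe_posts([new_posts, top_posts])
print([p["id"] for p in merged])  # ['a1', 'b2', 'c3']
```

Keeping the first copy preserves the order in which sort modes were scraped, so a post found under "new" stays ahead of the same post found again under "top".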

How it works

The frontend is a static site hosted on Netlify. When you hit "Squeeze," it sends requests to a serverless function that pulls data from PullPush.io (a public Reddit data mirror). Posts come back to the browser where they get processed, analyzed (if you turned on keywords), and packaged into downloadable files.

No Reddit API credentials needed. No account setup. Just enter a subreddit and go.
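
As a rough illustration of the kind of request the serverless function might send, the sketch below builds a PullPush search URL in Python. The base URL and the parameter names (`subreddit`, `size`, `sort`, `sort_type`, `before`) are assumptions based on PullPush's Pushshift-style search API and should be checked against its documentation; `build_search_url` is a hypothetical helper and does not mirror scrape.mjs.

```python
from urllib.parse import urlencode

# Assumed endpoint for submission search; verify against the PullPush docs.
BASE_URL = "https://api.pullpush.io/reddit/search/submission/"


def build_search_url(subreddit, size=100, before=None):
    """Build the URL for one page of posts (hypothetical helper)."""
    params = {
        "subreddit": subreddit,
        "size": min(size, 100),  # PullPush returns at most 100 results per call
        "sort": "desc",
        "sort_type": "created_utc",
    }
    if before is not None:
        params["before"] = before  # page backwards from this Unix timestamp
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("science"))
```

The function only assembles the URL and makes no network call, so it can be tried without touching PullPush's rate limit.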

Running locally

You need Node.js and the Netlify CLI:

npm install -g netlify-cli
netlify dev

This starts a local dev server at http://localhost:8888 with the serverless functions wired up.

Deploying

The app deploys to Netlify. Push to main and it picks up changes automatically (or run netlify deploy --prod manually).

The config lives in netlify.toml — it publishes the web/ folder and bundles the functions from netlify/functions/.

Project structure

web/
  index.html    — the UI
  app.js        — scraping logic, keyword analysis, CSV/JSON export
  style.css     — styling
  favicon.svg   — lemon icon

netlify/
  functions/
    scrape.mjs  — serverless function that talks to PullPush.io

Keyword analysis (optional)

If you turn on keyword analysis in the UI, you can define categories with lists of keywords. Each post and comment gets scored by how many keywords it matches. The scores and matched categories show up as extra columns in the CSV export.

You can customize the categories to whatever you're researching — the defaults are just examples.
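
The scoring described above can be sketched as a count of matching keywords per category. This is a minimal Python illustration under stated assumptions, not the logic in app.js: the category names, the keywords, the `score_text` function, and the use of case-insensitive substring matching are all hypothetical.

```python
def score_text(text, categories):
    """Count keyword matches per category; return (total score, matched categories)."""
    lowered = text.lower()
    total = 0
    matched = []
    for name, keywords in categories.items():
        # Assumed matching rule: case-insensitive substring match.
        hits = sum(1 for kw in keywords if kw.lower() in lowered)
        if hits:
            total += hits
            matched.append(name)
    return total, matched


# Example categories, invented for this sketch.
categories = {
    "pricing": ["price", "expensive", "discount"],
    "service": ["support", "refund"],
}
score, matched = score_text("Too expensive, and the refund took weeks.", categories)
print(score, matched)  # 2 ['pricing', 'service']
```

Substring matching is simple but loose: "price" would also match "priced" or "overpriced". A stricter variant could match on whole words instead.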

Limits

PullPush.io caps requests at 100 results per call and has rate limits (~15 requests/min)
Very large scrapes (thousands of posts with comments) will take a while since each post's comments need a separate request
PullPush mirrors Reddit data with some delay, so the very latest posts might not show up immediately
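
To get a feel for how these limits add up, the Python sketch below estimates a lower bound on scrape time from the two figures above. It assumes a single sort mode, one comment request per post, and no retries; `estimate_minutes` is a hypothetical helper, not part of the app.

```python
import math

RESULTS_PER_CALL = 100    # cap per request, as stated above
REQUESTS_PER_MINUTE = 15  # approximate rate limit, as stated above


def estimate_minutes(num_posts, with_comments=True):
    """Rough lower bound on scrape time, ignoring retries and network latency."""
    post_requests = math.ceil(num_posts / RESULTS_PER_CALL)
    # Assumption: each post's comments take exactly one extra request.
    comment_requests = num_posts if with_comments else 0
    return (post_requests + comment_requests) / REQUESTS_PER_MINUTE


print(round(estimate_minutes(1000), 1))         # 67.3
print(round(estimate_minutes(1000, False), 1))  # 0.7
```

Under these assumptions, 1,000 posts with comments need 1,010 requests, or a little over an hour, while the same posts without comments need 10 requests and under a minute.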

Built by

Jonas Heller — Assistant Professor of Marketing, Maastricht University.
