RedScrape

A clean, professional desktop app to scrape Reddit subreddits by flair.
Built with PySide6 and Playwright.

License: MIT · Python 3.10+ · Platforms: Windows, macOS, Linux


Features

  • Clean, modern desktop UI built with PySide6 — native-looking on Windows, macOS, and Linux
  • Filter by flair — pull only posts tagged with a specific flair (case-sensitive, exact match)
  • Fully dynamic — change subreddit, flair, sort, time range, limit, and fields on every run
  • Rich field extraction — title, URL, author, created date, score, comments, flair, post body, and auto-extracted GitHub links
  • Export to JSON or CSV with a native save dialog
  • Live progress + log — watch posts stream into the results table as they're scraped
  • Cancellable — hit Stop any time; partial results are preserved
  • Standalone binaries — download and run, no Python required for end users

Screenshots

Add screenshots to docs/screenshots/ after your first run.

Install

Option 1 — Download the release (recommended for users)

  1. Go to Releases
  2. Download the package for your OS:
    • RedScrape-Linux.tar.gz
    • RedScrape-Windows.zip
    • RedScrape-macOS.tar.gz
  3. Extract the archive
  4. Run the RedScrape executable inside
  5. On first run you may need to install Chromium (a one-time step). Note that this command itself needs a Python environment with Playwright installed:
    python -m playwright install chromium

Option 2 — Run from source (recommended for developers)

git clone https://github.com/nvssim950/redscrape.git
cd redscrape

python -m venv .venv
source .venv/bin/activate            # Windows: .venv\Scripts\activate

pip install -e .
python -m playwright install chromium

redscrape                            # or:  python -m redscrape

Usage

  1. Launch RedScrape

  2. Fill in the form on the left:

    Field            Description
    Subreddit        Subreddit name without r/ (e.g. n8n, python)
    Flair filter     Exact flair text, e.g. Workflow - Github Included. Leave empty for no filter
    Number of posts  1–1000
    Sort by          new, hot, top, or rising
    Time range       Only active for top: hour / day / week / month / year / all
    Fields           Check every field you want in the output
    Headless         Hide the browser window (default on)

  3. Click ▶ Start

  4. Watch the progress bar fill and posts appear in the Results table. Detailed activity is shown in the Log tab.

  5. Click Export JSON or Export CSV to save your results.

Example run for your first scrape:

Subreddit:       n8n
Flair filter:    Workflow - Github Included
Number of posts: 100
Sort by:         new
Fields:          Title, URL, Permalink, Author, Created, Score, Comments, Flair, GitHub links
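One way to picture the form values and their limits (1–1000 posts, four sort modes, a time range that only applies to top) is a small validated settings object. This is a hedged sketch, not RedScrape's actual code: the ScrapeConfig class, its field names, and the old.reddit.com listing URL shape are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode

SORTS = {"new", "hot", "top", "rising"}
TIME_RANGES = {"hour", "day", "week", "month", "year", "all"}


@dataclass(frozen=True)
class ScrapeConfig:
    """Hypothetical container for the form values (not RedScrape's own class)."""

    subreddit: str
    flair: str = ""                     # empty string means "no flair filter"
    limit: int = 100
    sort: str = "new"
    time_range: Optional[str] = None    # only meaningful when sort == "top"

    def __post_init__(self) -> None:
        if not self.subreddit or self.subreddit.lower().startswith("r/"):
            raise ValueError("subreddit must be a bare name without 'r/'")
        if not 1 <= self.limit <= 1000:
            raise ValueError("limit must be between 1 and 1000")
        if self.sort not in SORTS:
            raise ValueError(f"sort must be one of {sorted(SORTS)}")
        if self.time_range is not None and self.time_range not in TIME_RANGES:
            raise ValueError(f"time_range must be one of {sorted(TIME_RANGES)}")

    def listing_url(self) -> str:
        """Build a listing URL; the URL layout is an assumption, not confirmed by the project."""
        base = f"https://old.reddit.com/r/{self.subreddit}/{self.sort}/"
        if self.sort == "top" and self.time_range:
            return f"{base}?{urlencode({'t': self.time_range})}"
        return base


config = ScrapeConfig(
    subreddit="n8n",
    flair="Workflow - Github Included",
    limit=100,
    sort="new",
)
print(config.listing_url())
```

Validating in one place mirrors what the form enforces in the UI: an out-of-range limit or an unknown sort is rejected before any browser is launched.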

Fields you can extract

Field          Description
Title          Post title
URL            Outbound link (for link posts) or Reddit post URL
Permalink      Canonical Reddit URL for the post
Author         Username of the poster
Created (UTC)  ISO-8601 timestamp
Score          Net upvotes
Comments       Number of comments
Flair          Flair text (when visible on the listing)
Post body      Body text for self-posts (requires opening each post, so slower)
GitHub links   github.com URLs extracted from the post body and title link (requires opening each post, so slower)
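
The Flair and GitHub links fields lend themselves to a short illustration. The sketch below shows a case-sensitive, exact flair match and a regex-based link extractor. Both helper names and the regular expression are assumptions for illustration; the project's real matching rules may differ.

```python
import re
from typing import Optional

# Assumed pattern for github.com URLs; RedScrape's real pattern may differ.
GITHUB_URL = re.compile(
    r"https?://(?:www\.)?github\.com/[\w.-]+(?:/[\w.-]+)*",
    re.IGNORECASE,
)


def extract_github_links(*texts: str) -> list[str]:
    """Return unique github.com URLs in first-seen order (hypothetical helper)."""
    seen: dict[str, None] = {}
    for text in texts:
        for match in GITHUB_URL.findall(text or ""):
            # Drop trailing punctuation that belongs to the sentence, not the URL.
            seen.setdefault(match.rstrip(".,)"), None)
    return list(seen)


def flair_matches(post_flair: Optional[str], wanted: str) -> bool:
    """Case-sensitive exact match; an empty filter accepts every post."""
    return wanted == "" or post_flair == wanted


body = "Workflow here: https://github.com/acme/flows. Mirror: https://github.com/acme/flows"
links = extract_github_links(body, "https://github.com/acme/other/tree/main")
print(links)
print(flair_matches("workflow - github included", "Workflow - Github Included"))
```

The last line prints False because the comparison is case-sensitive, which is why the flair text in the form has to match the subreddit's flair exactly.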

See docs/usage.md for the complete user guide.
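
As a rough picture of what exporting the selected fields to JSON or CSV involves, here is a minimal standard-library sketch. The row keys (title, score, github_links) and the choice to join list values with "; " in CSV cells are assumptions for illustration, not RedScrape's documented output schema.

```python
import csv
import io
import json

# Hypothetical rows; the key names are illustrative, not RedScrape's schema.
rows = [
    {"title": "My workflow", "score": 42, "github_links": ["https://github.com/acme/flows"]},
    {"title": "Another one", "score": 7, "github_links": []},
]


def to_json(rows: list[dict]) -> str:
    """Serialize rows as pretty-printed JSON, keeping non-ASCII titles readable."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def to_csv(rows: list[dict]) -> str:
    """Flatten list values into a single cell so each post stays on one row."""
    fieldnames = list(rows[0]) if rows else []
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {key: "; ".join(value) if isinstance(value, list) else value
             for key, value in row.items()}
        )
    return buffer.getvalue()


print(to_json(rows))
print(to_csv(rows))
```

JSON keeps list-valued fields such as GitHub links as real arrays, while CSV needs them flattened, which is one reason to prefer JSON when the output feeds another program.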

How it works

RedScrape drives a real Chromium browser via Playwright against old.reddit.com (stable HTML, scrape-friendly). Scraping runs on a background QThread so the UI stays fully responsive; results, progress, and log lines stream into the main window through Qt signals.

┌──────────────────────────┐     Qt signals     ┌──────────────────────────┐
│  MainWindow  (app.py)    │  ◀──────────────▶  │  ScrapeWorker (worker)   │
│  PySide6 UI              │                    │  QThread host            │
└──────────────────────────┘                    └─────────────┬────────────┘
                                                              │ asyncio.run
                                                 ┌────────────▼────────────┐
                                                 │  RedditScraper          │
                                                 │  (Playwright + parsers) │
                                                 └─────────────────────────┘
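
The threading shape in the diagram can be sketched with the standard library alone. This is a deliberately simplified analog, not the project's code: threading.Thread stands in for QThread, a plain callback stands in for a Qt signal, and the scrape coroutine is a placeholder for the Playwright-driven scraper. All names here are hypothetical.

```python
import asyncio
import threading
from typing import Callable


async def scrape(limit: int, on_result: Callable[[dict], None], stop: threading.Event) -> None:
    """Stand-in for the Playwright-driven coroutine; emits fake posts."""
    for i in range(limit):
        if stop.is_set():               # cooperative cancel, like pressing Stop
            break
        await asyncio.sleep(0)          # placeholder for real page work
        on_result({"title": f"post {i}"})


class Worker(threading.Thread):
    """Plays the role of the QThread host: runs the coroutine via asyncio.run."""

    def __init__(self, limit: int, on_result: Callable[[dict], None]) -> None:
        super().__init__(daemon=True)
        self.stop_event = threading.Event()
        self._limit = limit
        self._on_result = on_result     # plays the role of a Qt signal's emit

    def run(self) -> None:
        asyncio.run(scrape(self._limit, self._on_result, self.stop_event))


results: list[dict] = []
worker = Worker(3, results.append)
worker.start()
worker.join()
print(results)
```

One difference worth keeping in mind: the callback above runs on the worker thread, whereas Qt signals connected across threads are normally delivered on the receiver's thread, which is what makes it safe to update widgets from them.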

Development

git clone https://github.com/nvssim950/redscrape.git
cd redscrape
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
python -m playwright install chromium

Run the app:

python -m redscrape

Run tests:

pytest

Build a standalone executable locally:

./build.sh          # Linux/macOS
# or:  pyinstaller --clean --noconfirm redscrape.spec

For architecture notes, see docs/development.md.

Distribution

Pushing a tag like v0.1.0 to GitHub triggers the build workflow, which:

  1. Builds standalone executables for Windows, macOS, and Linux
  2. Packages each as a .zip or .tar.gz
  3. Creates a GitHub Release with all three artifacts attached

Users can then download the binary for their OS from the Releases page — no Python install needed.

Contributing

Pull requests welcome. See CONTRIBUTING.md for local setup, project layout, and coding guidelines.

Responsible use

RedScrape is intended for personal research, learning, and lightweight data collection. Please:

  • Respect Reddit's User Agreement
  • Keep the built-in per-page delay (1 second) in place
  • For production or heavy use, switch to the official Reddit API via PRAW — it's more reliable and rate-limit aware
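
To make the per-page delay concrete, the sketch below fetches pages one at a time and pauses between requests. It is an illustration under assumptions: fetch_pages and fake_fetch are hypothetical names, and the example run uses a shortened delay so it finishes quickly. Only the 1-second default reflects the value stated above.

```python
import asyncio
import time
from typing import Awaitable, Callable

PAGE_DELAY_SECONDS = 1.0   # the per-page delay mentioned above


async def fetch_pages(
    fetch_page: Callable[[int], Awaitable[list[str]]],
    pages: int,
    delay: float = PAGE_DELAY_SECONDS,
) -> list[str]:
    """Fetch pages sequentially, pausing between requests (not after the last)."""
    items: list[str] = []
    for page in range(pages):
        if page:                        # no pause before the very first page
            await asyncio.sleep(delay)
        items.extend(await fetch_page(page))
    return items


async def fake_fetch(page: int) -> list[str]:
    """Hypothetical stand-in for a real page load."""
    return [f"page{page}-post{i}" for i in range(2)]


start = time.monotonic()
posts = asyncio.run(fetch_pages(fake_fetch, pages=3, delay=0.05))
elapsed = time.monotonic() - start
print(len(posts), f"{elapsed:.2f}s")
```

Fetching sequentially with a fixed pause keeps the request rate predictable; running page loads concurrently would defeat the purpose of the delay.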

License

MIT © 2026 nvssim950
