
dave-73/InstaScrape

 
 


🚀 InstaScrape — Async Instagram Comment Scraper


❓ Built with a steel heart, unasked for, yet unable to turn away from the world it watches.
❓ Assembled from iron and thought, never meant to be this cold, yet it endures.
❓ Created with a reluctant steel heart, seeing life it cannot touch.
— Author: kaifcodec


Scrape all parent comments from any Instagram Reel with automated login, async speed, real-time progress, and clean exports — no manual cookie copying required.


✨ Features

  • Automated Login: the session is persisted in cookie.json with issued-at (iat) and expiry timestamps, so cookies do not need to be copied by hand.
  • 🔄 Self-healing Auth: detects expired cookies mid-run, prompts relogin, resumes automatically.
  • Async Engine: powered by httpx.AsyncClient with requests-per-second throttling.
  • 📊 Progress Tracking: accurate percent and ETA from Instagram’s comment count.
  • 📁 Dual Exports: TXT and JSON files saved in timestamped folders.
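
The requests-per-second throttling mentioned above can be pictured with a small limiter. This is a minimal sketch using only asyncio, not the project's actual implementation: the RateLimiter class, fetch_page, and run_demo are hypothetical names, and the real tool would issue an httpx.AsyncClient request where the comment indicates.

```python
import asyncio
import time


class RateLimiter:
    """Hypothetical helper: lets at most `rps` calls start per second."""

    def __init__(self, rps: float) -> None:
        if rps <= 0:
            raise ValueError("rps must be positive")
        self._interval = 1.0 / rps
        self._lock = asyncio.Lock()
        self._next_slot = 0.0  # monotonic time of the next free slot

    async def wait(self) -> None:
        # The lock serialises callers so each one claims a distinct slot.
        async with self._lock:
            now = time.monotonic()
            delay = self._next_slot - now
            if delay > 0:
                await asyncio.sleep(delay)
                now = time.monotonic()
            self._next_slot = max(now, self._next_slot) + self._interval


async def fetch_page(limiter: RateLimiter, index: int) -> int:
    await limiter.wait()
    # A real run would send an HTTP request here; this sketch returns the index.
    return index


async def run_demo(rps: float = 50.0, total: int = 10):
    # The limiter is created inside the running event loop on purpose.
    limiter = RateLimiter(rps)
    start = time.monotonic()
    results = await asyncio.gather(*(fetch_page(limiter, i) for i in range(total)))
    return list(results), time.monotonic() - start


if __name__ == "__main__":
    pages, elapsed = asyncio.run(run_demo())
    print(f"fetched {len(pages)} pages in {elapsed:.2f}s")
```

Spacing calls by a fixed interval keeps the request rate smooth instead of sending a burst at the start of every second, which is one plausible way to stay under a throttling threshold.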

📦 Requirements

  • Python 3.9+
  • Docker and Docker Compose (optional)
  • Dependencies: install with pip install -r requirements.txt

🛠️ Installation

git clone https://github.com/kaifcodec/InstaScrape
cd InstaScrape
pip install -r requirements.txt

▶️ Usage

python3 main.py
  • Enter the Instagram Reel URL (e.g., https://www.instagram.com/reel/SHORTCODE/).
  • Set Max requests per second (5-7 recommended). Lower the value if requests start failing or getting throttled.
  • On first run, provide username/password and a 2FA code if prompted; cookie.json is created and reused until expiry.
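
The "reused until expiry" behaviour can be sketched as a freshness check on the stored record. The field names iat and expiry come from this README; treating them as Unix timestamps in seconds, and the function names cookie_is_fresh and load_cookie, are assumptions made for illustration only.

```python
import json
import time
from pathlib import Path
from typing import Optional


def cookie_is_fresh(record: dict, now: Optional[float] = None) -> bool:
    """Return True if `now` falls between the record's iat and expiry.

    Assumption: both fields are Unix timestamps in seconds.
    """
    now = time.time() if now is None else now
    try:
        iat = float(record["iat"])
        expiry = float(record["expiry"])
    except (KeyError, TypeError, ValueError):
        return False  # missing or malformed fields count as stale
    return iat <= now < expiry


def load_cookie(path: Path, now: Optional[float] = None) -> Optional[dict]:
    """Load a cookie record, or return None if it is absent, broken, or expired."""
    try:
        record = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(record, dict):
        return None
    return record if cookie_is_fresh(record, now) else None


if __name__ == "__main__":
    if load_cookie(Path("cookie.json")) is None:
        print("No usable cookie.json; a fresh login would be needed.")
    else:
        print("cookie.json is still valid.")
```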

🐳 Docker Usage

Run the scraper without installing Python dependencies on your host:

docker compose run --rm instascrape

The container runs as a non-root user and stores runtime files in ./data:

  • ./data/cookie.json
  • ./data/download_comments/txt/...
  • ./data/download_comments/json/...

Rebuild after dependency changes:

docker compose build

If automated login cannot complete, the CLI can import an authenticated Instagram Cookie header from a logged-in browser request. The header must include sessionid, csrftoken, mid, and ds_user_id; it is then stored in ./data/cookie.json.
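
A minimal sketch of how such a Cookie header could be checked before it is stored. The four required names are the ones listed above; the function name parse_cookie_header and the error message are hypothetical, and how the CLI actually parses the header is not shown in this README.

```python
REQUIRED_COOKIES = ("sessionid", "csrftoken", "mid", "ds_user_id")


def parse_cookie_header(header: str) -> dict:
    """Split a browser Cookie header into a dict and check the required names."""
    value = header.strip()
    # Accept the header with or without its leading "Cookie:" label.
    if value.lower().startswith("cookie:"):
        value = value[len("cookie:"):]
    cookies = {}
    for pair in value.split(";"):
        # partition keeps any "=" that appears inside the cookie value.
        name, sep, val = pair.strip().partition("=")
        if sep and name:
            cookies[name] = val
    missing = [name for name in REQUIRED_COOKIES if not cookies.get(name)]
    if missing:
        raise ValueError("Cookie header is missing: " + ", ".join(missing))
    return cookies


if __name__ == "__main__":
    sample = "Cookie: sessionid=abc; csrftoken=def; mid=ghi; ds_user_id=42"
    print(sorted(parse_cookie_header(sample)))
```

Failing early with the list of missing names makes it clear that the header was copied from a request that was not fully logged in.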

📁 Output

  • TXT: download_comments/txt/reel_comments_YYYYMMDD_HHMMSS.txt
  • JSON: download_comments/json/reel_comments_YYYYMMDD_HHMMSS.json

Example JSON structure:
{
  "generated_at": 1700000000,
  "count": 123,
  "comments": [
    { "username": "user1", "text": "Nice!", "created_at": 1699999000 }
  ]
}
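
An export with this structure can be post-processed with the standard library alone. The sketch below assumes that created_at is a Unix timestamp in seconds, which the sample values suggest but this README does not state; summarize_export is a hypothetical helper, not part of InstaScrape.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def summarize_export(export: dict, top_n: int = 3) -> dict:
    """Summarise an export dict shaped like the example structure above."""
    comments = export.get("comments", [])
    by_user = Counter(comment["username"] for comment in comments)
    timestamps = [comment["created_at"] for comment in comments]
    earliest = None
    if timestamps:
        # Assumption: created_at is a Unix timestamp in seconds.
        earliest = datetime.fromtimestamp(min(timestamps), tz=timezone.utc).isoformat()
    return {
        "declared_count": export.get("count"),
        "loaded_count": len(comments),
        "top_users": by_user.most_common(top_n),
        "earliest_utc": earliest,
    }


if __name__ == "__main__":
    raw = """
    {
      "generated_at": 1700000000,
      "count": 123,
      "comments": [
        {"username": "user1", "text": "Nice!", "created_at": 1699999000}
      ]
    }
    """
    print(summarize_export(json.loads(raw)))
```

Comparing declared_count with loaded_count is a quick way to spot a truncated export.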

🔧 How it Works

  • Login: uses instagrapi for current Instagram login and 2FA support, with a cookie-header fallback when automated login is blocked.
  • Cookie Lifecycle: cookie.json stores issued-at (iat) and expiry timestamps, which are validated on startup and during requests.
  • Error Resilience: retries transient errors and refreshes cookies on 401/redirect-to-login.
  • Progress Accuracy: uses Instagram’s comment count to calculate percent & ETA.
  • Async Efficiency: httpx.AsyncClient with HTTP/2, keep-alive, and RPS limiter.
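
The percent and ETA calculation described above amounts to simple arithmetic once the total comment count is known. The following is a sketch under that assumption, not the tool's own code; the function name progress and its signature are hypothetical.

```python
from typing import Optional, Tuple


def progress(fetched: int, total: int, elapsed_s: float) -> Tuple[float, Optional[float]]:
    """Return (percent_done, eta_seconds) from counts and elapsed time.

    eta_seconds is None when no rate can be estimated yet.
    """
    if total <= 0:
        return 0.0, None
    done = min(fetched, total)  # guard against fetching more than reported
    percent = 100.0 * done / total
    if done == 0 or elapsed_s <= 0:
        return percent, None
    rate = done / elapsed_s       # comments per second so far
    eta = (total - done) / rate   # seconds left at the same rate
    return percent, eta


if __name__ == "__main__":
    percent, eta = progress(fetched=50, total=200, elapsed_s=10.0)
    print(f"{percent:.1f}% done, about {eta:.0f}s remaining")
```

With 50 of 200 comments fetched in 10 seconds, the rate is 5 comments per second, so the remaining 150 comments take about 30 seconds.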

💡 Tips

  • Start with 5-7 RPS to minimize throttling; increase gradually.
  • Filenames use local time; to switch to UTC, replace datetime.now() with datetime.now(timezone.utc) in main.py. datetime.utcnow() also works but is deprecated since Python 3.12.
  • If login fails with challenge_required, open Instagram in the official app or browser, approve the login challenge, then retry.
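
For the filename tip, a UTC timestamp in the YYYYMMDD_HHMMSS pattern shown under Output could be built as follows. This is a sketch: export_stem is a hypothetical helper, and where exactly main.py builds its filenames is not shown in this README.

```python
from datetime import datetime, timezone
from typing import Optional


def export_stem(now: Optional[datetime] = None) -> str:
    """Return a filename stem such as reel_comments_20231114_221320, in UTC."""
    moment = now if now is not None else datetime.now(timezone.utc)
    # Normalise any timezone-aware datetime to UTC before formatting.
    moment = moment.astimezone(timezone.utc)
    return "reel_comments_" + moment.strftime("%Y%m%d_%H%M%S")


if __name__ == "__main__":
    print(export_stem() + ".json")
```

Using a timezone-aware datetime avoids the ambiguity of naive timestamps when exports from machines in different time zones are compared.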

⚠️ Disclaimer

Use responsibly. Comply with Instagram’s Terms of Service. Intended for personal or permitted use only.

About

InstaScrape is a command-line Python tool that fetches all parent comments from any public Instagram Reel using your session cookies. It's fast, efficient, and now comes with a progress bar so you can see the scraping in action. Designed for researchers, analysts, or curious minds.
