
DarkWebInformer/telegram-scraper

 
 


TGS: Telegram Scraper, Forwarder & Viewer

A Python tool for scraping Telegram channels into local SQLite databases, with built-in real-time message forwarding between channels and a self-contained offline HTML viewer for browsing the results.

___________________  _________
\__    ___/  _____/ /   _____/
  |    | /   \  ___ \_____  \
  |    | \    \_\  \/        \
  |____|  \______  /_______  /
                 \/        \/

Features

Scraping

  • Scrape any channel you're a member of into a per-channel SQLite database (<channel_id>/<channel_id>.db)
  • Captures message text, sender info, dates, replies, views, forwards, reactions, and post authors
  • Optional media download (photos, videos, documents) with a toggle
  • Continuous (live) scraping mode that keeps the DB up to date as new messages arrive
  • Concurrent media downloads with configurable batch size and automatic FloodWait handling
  • Rescrape and "fix missing media" tools to repair partial downloads
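The automatic FloodWait handling mentioned above boils down to a retry-and-sleep loop. Here is a minimal, self-contained sketch of that pattern; `FloodWaitError` is defined locally as a stand-in for `telethon.errors.FloodWaitError` so the snippet runs without Telethon installed, and it is not the scraper's actual implementation:

```python
import time

class FloodWaitError(Exception):
    """Stand-in for telethon.errors.FloodWaitError; carries the wait in seconds."""
    def __init__(self, seconds):
        super().__init__(f"flood wait: {seconds}s")
        self.seconds = seconds

def with_flood_wait(func, *args, max_retries=5, sleep=time.sleep):
    """Call func, sleeping out each FloodWaitError for the server-requested duration."""
    for _ in range(max_retries):
        try:
            return func(*args)
        except FloodWaitError as e:
            sleep(e.seconds)  # honor Telegram's requested back-off, then retry
    raise RuntimeError("gave up after repeated flood waits")
```

This is why very large backfills still take time: every rate-limit hit adds a server-dictated pause.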

Forwarding

  • Real-time forwarding rules from a source channel to a destination channel
  • Filter by content type (text, images, videos, documents)
  • Two modes: copy (sends as a new message, no "Forwarded from" header) or forward (preserves the header)
  • Toggle rules on/off, edit, or delete without restarting
  • Backfill historical messages over a chosen date range with adjustable delay
  • Combined mode runs scraping and forwarding together in a single session
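Conceptually, each rule pairs a source with a destination plus a content-type filter and a mode. The sketch below illustrates how such a rule might be matched against an incoming message; the field names are illustrative, not the script's actual `state.json` schema:

```python
# Hypothetical rule shape; field names are illustrative, not the real schema.
def rule_matches(rule, message):
    """Decide whether an incoming message should be forwarded under this rule."""
    if not rule.get("enabled", True):
        return False
    if message["chat_id"] != rule["source"]:
        return False
    # A bare-text message has no media_type; media messages carry their kind.
    kind = message.get("media_type") or "text"
    return kind in rule["content_types"]

rule = {
    "source": -1001234567890,
    "destination": "@mirror_channel",
    "content_types": {"text", "photo"},
    "mode": "copy",          # "copy" drops the "Forwarded from" header
    "enabled": True,
}

assert rule_matches(rule, {"chat_id": -1001234567890, "media_type": None})
assert not rule_matches(rule, {"chat_id": -1001234567890, "media_type": "video"})
```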

Search

  • Query scraped databases by keyword/regex, date range, sender/username, or media type
  • Advanced search combines multiple filters
  • Search across one channel, several, or all of them at once

Auth & networking

  • Sign in via QR code (displayed as ASCII art in the terminal) or phone number
  • Two-factor authentication supported
  • Optional SOCKS5, SOCKS4, or HTTP proxy with auth, plus a built-in proxy connectivity test

Export

  • One-shot export of every tracked channel to CSV and JSON, written next to the database

Offline viewer (telegram-viewer.html)

  • Single-file HTML app you can open directly in a browser, no server or install required
  • Loads .db files via sql.js (compiled SQLite running in the browser)
  • Drop in the channel folder to attach scraped media (photos and videos render inline)
  • Load multiple chats at once and switch between them
  • Per-chat search and a global search across all loaded chats
  • Strict CSP, so it runs entirely offline with no network requests after load

Requirements

  • Python 3.9+
  • A Telegram account
  • Telegram API credentials (api_id and api_hash), available at https://my.telegram.org

Python dependencies

pip install telethon qrcode

Optional, only needed if you want to route through a proxy:

pip install pysocks "python-socks[asyncio]"
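For reference, this is the general shape of a SOCKS5 proxy configuration as Telethon accepts it; the scraper's own proxy prompts produce an equivalent setup. `API_ID` and `API_HASH` below are placeholders, not real credentials:

```python
import socks
from telethon import TelegramClient

API_ID, API_HASH = 12345, "0123456789abcdef0123456789abcdef"  # placeholders

# (proxy_type, host, port, rdns, username, password) — the last three are optional
proxy = (socks.SOCKS5, "127.0.0.1", 1080, True, "user", "pass")
client = TelegramClient("session", API_ID, API_HASH, proxy=proxy)
```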

Setup

  1. Install dependencies (above).
  2. Run the scraper:
    python telegram_scraper_with_forwarding.py
  3. On first launch it will ask for your api_id and api_hash, then walk you through QR or phone login. Credentials and session state are saved to state.json in the working directory.

Main menu

[S] Scrape channels                [W] Forwarding rules
[C] Continuous scraping            [D] Search database
[B] Combined scrape + forward      [P] Proxy settings
[M] Media scraping: ON/OFF         [Q] Quit
[L] List & add channels
[R] Remove channels
[E] Export data (CSV + JSON)
[T] Rescrape media
[F] Fix missing media

Typical first-run flow: press L to list and add channels you're in, then S for a one-shot scrape, C for live scraping, or B if you also have forwarding rules configured.

Forwarding rules

Open with [W] from the main menu.

  • [A] Add: pick source, pick destination (a tracked channel, a @username, or a raw -100... ID), pick content types, pick copy vs forward.
  • [E] Edit, [T] toggle, [D] delete an existing rule.
  • [S] Start forwarding: listens for new messages and forwards them according to enabled rules.
  • [H] Backfill: replay messages from a date range through a rule (useful when you set up a rule after the fact). The source channel must already be scraped locally.

File layout

After running, your working directory will look something like:

.
├── telegram_scraper_with_forwarding.py
├── telegram-viewer.html
├── state.json                       # config, session state, rules
├── -1001234567890/                  # one folder per channel
│   ├── -1001234567890.db            # SQLite messages DB
│   ├── -1001234567890_<name>.csv    # created by [E] export
│   ├── -1001234567890_<name>.json   # created by [E] export
│   └── media/
│       ├── photos/
│       ├── videos/
│       └── documents/
└── channels_list.csv                # optional, from [L]

Using the offline viewer

  1. Open telegram-viewer.html in any modern browser, or serve it over localhost (see below).
  2. Click Choose Files and select one or more .db files (and optionally exported .json files).
  3. To make media render inline, also point the media picker at the channel folder so the viewer can resolve the relative paths stored in the DB.
  4. Use the per-chat search box to filter messages within a chat, or the top search to search across every chat you've loaded.

The viewer is fully client-side. Nothing is uploaded anywhere; your DB never leaves the browser tab.

Serving over localhost (recommended)

Some browsers restrict what a page opened via file:// can load. sql.js's WebAssembly, the directory picker, and media blobs all behave better when the viewer is served over HTTP. Run a tiny local server in the folder that contains telegram-viewer.html and your channel folders, then visit http://localhost:8000/telegram-viewer.html.

Linux / macOS

cd /path/to/your/scraper-folder
python3 -m http.server 8000

Windows (PowerShell or CMD)

cd C:\path\to\your\scraper-folder
python -m http.server 8000

Stop the server with Ctrl+C. The server only listens on your own machine; nothing is exposed to the network. If port 8000 is taken, pick another (for example, 8080).

If you don't have Python on the path, any other static server works too: npx serve, php -S localhost:8000, VS Code's "Live Server" extension, and so on.

Database schema

Each <channel>/<channel>.db contains a single messages table:

| Column | Type | Notes |
|---|---|---|
| message_id | INTEGER UNIQUE | Telegram message ID |
| date | TEXT | ISO timestamp |
| sender_id | INTEGER | |
| first_name, last_name, username | TEXT | |
| message | TEXT | message body |
| media_type | TEXT | photo, video, document, etc. |
| media_path | TEXT | relative path to the downloaded file |
| reply_to | INTEGER | replied-to message ID |
| post_author | TEXT | for channels with signed posts |
| views, forwards | INTEGER | |
| reactions | TEXT | serialized reaction summary |

The DB uses WAL mode and indexes on message_id and date. The script auto-migrates older DBs that are missing newer columns.
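The table above corresponds roughly to the following DDL. This is an approximation reconstructed from the column list, not the script's exact schema (the `UNIQUE` constraint on `message_id` provides its index implicitly):

```python
import sqlite3

# Approximation of the documented schema; the script's actual DDL may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    message_id  INTEGER UNIQUE,
    date        TEXT,
    sender_id   INTEGER,
    first_name  TEXT,
    last_name   TEXT,
    username    TEXT,
    message     TEXT,
    media_type  TEXT,
    media_path  TEXT,
    reply_to    INTEGER,
    post_author TEXT,
    views       INTEGER,
    forwards    INTEGER,
    reactions   TEXT
);
CREATE INDEX IF NOT EXISTS idx_messages_date ON messages(date);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA journal_mode=WAL")  # no effect in-memory; shown for parity with the on-disk DB
conn.executescript(SCHEMA)
```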

Notes & limits

  • You can only scrape and forward channels your account is a member of, since Telegram enforces this server-side.
  • Telegram applies aggressive rate limits. The scraper handles FloodWaitError automatically by sleeping the requested duration, but very large backfills will still take time.
  • state.json contains your api_id, api_hash, and references to your session, so treat it like a credential file.
  • Be mindful of Telegram's Terms of Service and the rules of any channel you scrape or repost from.

Troubleshooting

  • "Proxy dependencies not installed": run pip install pysocks "python-socks[asyncio]" and restart.
  • Login keeps failing: delete the .session file in the working directory and log in again.
  • Media is missing in the viewer: make sure you also picked the channel folder (or its parent) in the media picker so relative paths resolve.
  • Some messages show no media even after scraping: use [F] Fix missing media from the main menu to retry just the missing files.

