CSFD Scraper is a high-performance tool for extracting structured movie and TV data from CSFD pages. It avoids slow, resource-heavy crawling and delivers clean film metadata at scale. It is built for developers and data teams who need fast access to movie ratings, reviews, and credits.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
CSFD Scraper collects detailed information from CSFD movie, series, and user pages and converts it into structured data suitable for analytics and automation. It removes the overhead of rendering and focuses on speed, consistency, and low resource usage. This project is ideal for developers, analysts, and researchers working with movie datasets.
- Optimized for static HTML parsing without rendering
- Supports movies, series, episodes, users, reviews, and ratings
- Handles large-scale URL collections through sitemap discovery
- Designed for predictable memory usage and stable throughput
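Sitemap discovery amounts to pulling page URLs out of sitemap XML. The sketch below is a minimal illustration using only the standard library; the function name `extract_sitemap_urls` is hypothetical and is not taken from this project's `sitemap.rs`. It assumes a standard sitemap layout with `<loc>` elements and does not decode XML entities.

```rust
/// Collects the text of every `<loc>...</loc>` element in a sitemap document.
/// Hypothetical helper: it scans plain text and does not decode XML entities,
/// so a production parser would more likely use a real XML library.
pub fn extract_sitemap_urls(xml: &str) -> Vec<String> {
    const OPEN: &str = "<loc>";
    const CLOSE: &str = "</loc>";

    let mut urls = Vec::new();
    let mut rest = xml;

    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            Some(end) => {
                // Trim whitespace that pretty-printed sitemaps often include.
                let url = after_open[..end].trim();
                if !url.is_empty() {
                    urls.push(url.to_string());
                }
                rest = &after_open[end + CLOSE.len()..];
            }
            // Unterminated element: stop scanning instead of guessing.
            None => break,
        }
    }
    urls
}
```

Because the scan walks the input once and keeps only the extracted URLs, memory grows with the number of URLs rather than with any parsed document tree.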
| Feature | Description |
|---|---|
| Multiple request types | Extract views, reviews, ratings, users, and sitemap URLs. |
| Sitemap crawling | Collects all published CSFD URLs in a single controlled process. |
| Header overrides | Supports global and per-request header customization. |
| High concurrency | Parallel request handling for faster data collection. |
| Structured outputs | Returns normalized JSON objects ready for processing. |
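One plausible way to combine global and per-request header overrides is sketched below. The function name `merge_headers` and the rule that per-request values win are assumptions made for illustration; the project's actual precedence rules are not documented here. Names are lowercased before merging because HTTP header names are case-insensitive.

```rust
use std::collections::BTreeMap;

/// Merges global headers with per-request headers.
/// Assumption: a per-request header replaces a global header of the same name.
/// Names are lowercased so that `User-Agent` and `user-agent` collide.
pub fn merge_headers(
    global: &[(&str, &str)],
    per_request: &[(&str, &str)],
) -> BTreeMap<String, String> {
    let mut merged = BTreeMap::new();
    // Global entries are inserted first, so later per-request entries overwrite them.
    for (name, value) in global.iter().chain(per_request.iter()) {
        merged.insert(name.to_ascii_lowercase(), value.to_string());
    }
    merged
}
```

Calling `merge_headers` with a global `User-Agent` and a per-request `user-agent` yields a single `user-agent` entry carrying the per-request value.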
| Field Name | Field Description |
|---|---|
| header_name | Official movie or series title. |
| rating | Percentage rating score, returned as a string such as "72%". |
| rating_votes_count | Number of user votes. |
| genres | List of associated genres. |
| origin | Country, year, and runtime information. |
| plot_full | Full plot description text. |
| creators | Directors, writers, cast, and crew details. |
| user_name | Username of reviewer or rater. |
| star_rating | User star rating value. |
| comment | Full review or comment text. |
| date | Date of review or rating submission. |
[
  {
    "request_type": "View",
    "url": "https://www.csfd.cz/film/17592-ctyri-svatby-a-jeden-pohreb/prehled/",
    "data": {
      "header_name": "Čtyři svatby a jeden pohřeb",
      "rating": "72%",
      "rating_votes_count": 14484,
      "genres": ["Komedie", "Romantický", "Drama"],
      "origin": "Velká Británie / USA, 1994",
      "plot_full": "Snímek vypráví příběh Charlese..."
    }
  }
]
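In the sample output, `rating` and `origin` arrive as display strings, so downstream code often needs to normalize them. The sketch below shows one way to do that. The helper names and the `Origin` struct are hypothetical, and the parsing rules are inferred only from the single sample above (`"72%"` and `"Velká Británie / USA, 1994"`), so other pages may need extra cases.

```rust
/// Normalized form of the `origin` string (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Origin {
    pub countries: Vec<String>,
    pub year: Option<u16>,
}

/// Parses a rating such as "72%" into a number between 0 and 100.
/// Returns `None` when the value is missing, malformed, or out of range.
pub fn parse_rating_percent(raw: &str) -> Option<u8> {
    let value: u8 = raw.trim().strip_suffix('%')?.trim().parse().ok()?;
    (value <= 100).then_some(value)
}

/// Splits an origin string into countries and a year.
/// Assumption: countries come first, separated by '/', and later
/// comma-separated parts may contain a four-digit year.
pub fn parse_origin(raw: &str) -> Origin {
    let mut parts = raw.split(',').map(str::trim);
    let countries: Vec<String> = parts
        .next()
        .unwrap_or("")
        .split('/')
        .map(str::trim)
        .filter(|country| !country.is_empty())
        .map(String::from)
        .collect();
    // The first remaining part that parses as a number is treated as the year.
    let year = parts.find_map(|part| part.parse::<u16>().ok());
    Origin { countries, year }
}
```

Returning `Option` values rather than failing keeps one incomplete page from interrupting a large batch.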
CSFD Scraper/
├── src/
│   ├── main.rs
│   ├── parser/
│   │   ├── view.rs
│   │   ├── reviews.rs
│   │   ├── ratings.rs
│   │   └── sitemap.rs
│   ├── models/
│   │   ├── movie.rs
│   │   ├── user.rs
│   │   └── review.rs
│   └── config/
│       └── settings.json
├── data/
│   ├── sample_input.json
│   └── sample_output.json
├── Cargo.toml
└── README.md
- Data analysts use it to collect CSFD ratings and reviews, so they can analyze audience sentiment.
- Developers use it to build movie recommendation systems with structured film metadata.
- Researchers use it to study trends in Czech and international cinema.
- Content platforms use it to enrich movie catalogs with ratings, genres, and cast data.
Does this scraper support both movies and series?
Yes, it supports movies, series, episodes, and related subpages using dedicated request types.

Can I extract user reviews and ratings separately?
Yes, reviews and ratings are handled as independent request types with paginated collection.

How does it handle large datasets?
Results are split into manageable parts to maintain stability and consistent output delivery.

Is request order preserved in output?
No, output order may differ due to concurrency, but each result includes identifiers for tracking.
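Since output order can differ from input order, a consumer may want to restore it and split results into parts afterwards. The following sketch assumes the `url` field serves as the tracking identifier, which matches the sample output but is not confirmed as the only identifier. `ScrapeResult`, `restore_input_order`, and `split_into_parts` are hypothetical names.

```rust
use std::collections::HashMap;

/// Minimal stand-in for one output record (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct ScrapeResult {
    pub url: String,
    pub request_type: String,
}

/// Reorders results to follow the order of the input URLs.
/// Results whose URL was not in the input are moved to the end.
pub fn restore_input_order(
    input_urls: &[&str],
    mut results: Vec<ScrapeResult>,
) -> Vec<ScrapeResult> {
    let position: HashMap<&str, usize> = input_urls
        .iter()
        .enumerate()
        .map(|(index, url)| (*url, index))
        .collect();
    // Stable sort: results sharing a key keep their relative order.
    results.sort_by_key(|r| position.get(r.url.as_str()).copied().unwrap_or(usize::MAX));
    results
}

/// Splits results into parts of at most `part_size` items.
/// A `part_size` of zero is treated as one to avoid a panic in `chunks`.
pub fn split_into_parts(results: &[ScrapeResult], part_size: usize) -> Vec<Vec<ScrapeResult>> {
    results
        .chunks(part_size.max(1))
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

Sorting by input position keeps the scraper free to complete requests in any order while still giving consumers a deterministic result set.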
Primary Metric: Processes static CSFD pages in milliseconds per request under normal load.
Reliability Metric: Sustains a high success rate across large URL batches with retry handling.
Efficiency Metric: Operates within low memory limits while maintaining high throughput.
Quality Metric: Extracted fields consistently match on-page content with high completeness.
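The retry handling mentioned under the reliability metric could follow a pattern like the one below. This is a generic sketch, not the project's implementation: `retry_with_backoff`, the attempt limit, and the doubling delay are all assumptions chosen for illustration.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Runs `op` until it succeeds or `max_attempts` is reached.
/// The delay doubles after each failure (base, 2x base, 4x base, ...).
/// `op` receives the 1-based attempt number.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    // Always try at least once, even if the caller passes zero.
    let attempts = max_attempts.max(1);
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(error) if attempt >= attempts => return Err(error),
            Err(_) => {
                // Saturating arithmetic avoids overflow for large attempt counts.
                let factor = 2u32.saturating_pow(attempt - 1);
                sleep(base_delay.saturating_mul(factor));
                attempt += 1;
            }
        }
    }
}
```

A caller would wrap a single page fetch in the closure, so a transient failure costs only a short delay and does not fail the whole batch.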
