GitHub - baggasiddhant/PyScraper: Async Python scraper using Playwright that automates account creation to payment flow and captures UPI details, URLs, timestamps, and screenshots.

Notes

Selector extraction: I first inspected the target site and manually extracted the selectors required for each step: registration, login, closing popups, navigating to deposit, selecting channels and amounts, and reaching the payment page.
Async Playwright flow: Using these selectors, I implemented the automation in flow.py with Playwright’s async API. Each run creates a new account, logs in, and executes the deposit flow.
Result capture: At the end of the flow, the scraper records the UPI ID if one is found or the marker "QR_ONLY" otherwise, saves the payment page URL, and takes a screenshot of the QR/UPI section.
Management: In runner.py, I added retry logic, concurrency control, and CSV writing. This ensures that even if some runs fail due to dynamic UI issues, the scraper continues until 7,000 successful records are collected.
Final dataset: The results are written into output.csv with deterministic screenshot paths, producing a reproducible dataset of 7,000 rows.
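
A minimal sketch of how the retry, concurrency, CSV, and deterministic-path pieces described above could fit together. It is not the code in runner.py: the function names, the CSV column names, the UPI regex, and the `run_XXXXX.png` naming scheme are all assumptions made for illustration, and `flow` stands for any coroutine that takes a run id and returns one row (in the real project that role would be played by the Playwright flow in flow.py).

```python
import asyncio
import csv
import re
from pathlib import Path

# Loose "handle@provider" pattern. This is an assumed heuristic, not the
# repository's rule; it would also match the local part of an email address.
UPI_RE = re.compile(r"\b[\w.\-]{2,}@[A-Za-z]{2,}\b")


def extract_upi(page_text: str) -> str:
    """Return the first UPI-like ID in the text, or "QR_ONLY" if none is found."""
    match = UPI_RE.search(page_text)
    return match.group(0) if match else "QR_ONLY"


def screenshot_path(run_id: int, root: Path = Path("screenshots")) -> Path:
    """Deterministic path: the same run_id always maps to the same file name."""
    return root / f"run_{run_id:05d}.png"


async def run_with_retry(flow, run_id, sem, max_attempts=3):
    """Run one flow under the semaphore, retrying when it raises."""
    async with sem:  # at most `concurrency` flows are active at once
        for attempt in range(1, max_attempts + 1):
            try:
                return await flow(run_id)
            except Exception:
                if attempt == max_attempts:
                    return None  # give up on this run; the caller moves on
                await asyncio.sleep(0)  # placeholder for a real backoff delay


async def collect(flow, target, out_csv, concurrency=5):
    """Launch runs in batches until `target` successful rows have been written."""
    sem = asyncio.Semaphore(concurrency)
    done, next_id = 0, 0
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        # Column names are hypothetical.
        writer.writerow(["run_id", "upi_id", "url", "timestamp", "screenshot"])
        while done < target:
            # Schedule exactly as many runs as are still missing.
            batch = range(next_id, next_id + (target - done))
            next_id = batch.stop
            results = await asyncio.gather(
                *(run_with_retry(flow, i, sem) for i in batch)
            )
            for row in results:
                if row is not None:  # failed runs are skipped, not written
                    writer.writerow(row)
                    done += 1
    return done
```

Under these assumptions, a call such as `asyncio.run(collect(flow, 7000, "output.csv"))` would keep scheduling new runs until 7,000 rows are written, with failed runs simply replaced by fresh ones in the next batch.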

The scraper is tightly coupled to the current structure of the target site. If the site’s UI or selectors change, the automation will need updates.
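
One way to keep such updates contained is to hold every selector in a single mapping keyed by step, so a UI change means editing one table rather than the flow logic. The sketch below is only an illustration of that idea, not how flow.py is organized: the step names, the selector strings, and the `first_matching` helper are all hypothetical placeholders.

```python
# All step names and selector strings are placeholders, not taken from the
# target site. Each step lists candidate selectors in order of preference.
SELECTORS = {
    "register_submit": ("#register-btn", "button[type=submit]"),
    "close_popup": (".popup .close",),
    "deposit_nav": ("a[href*='deposit']",),
}


def first_matching(step, exists):
    """Return the first candidate selector for `step` that `exists` accepts.

    `exists` is any callable mapping a selector string to a bool. In a real
    run it could wrap a check against the live page; here it is left abstract.
    """
    for selector in SELECTORS[step]:
        if exists(selector):
            return selector
    raise LookupError(f"no selector matched for step {step!r}; update SELECTORS")
```

When no candidate matches, the error names the step that broke, which makes it clearer which entry needs to be re-extracted from the site.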

Repository contents (11 commits): PyScraper, __pycache__, screenshots, README.md, failed_runs.log, output.csv, requirements.txt