baggasiddhant/PyScraper

Notes

  • Built with Python 3.13 and Playwright 1.57.0.
  • Designed to keep running until 7,000 successful records have been collected.

Approach

  1. Selector extraction: I first inspected the target site and manually extracted the selectors needed for each step: registration, login, closing popups, navigating to the deposit page, selecting channels and amounts, and reaching the payment page.
  2. Async Playwright flow: Using these selectors, I implemented the automation in flow.py with Playwright’s async API. Each run creates a new account, logs in, and executes the deposit flow.
  3. Result capture: At the end of the flow, the scraper records either the UPI ID or, when the page shows only a QR code, the marker "QR_ONLY". It also saves the payment page URL and takes a screenshot of the QR/UPI section.
  4. Run management: In runner.py, I added retry logic, concurrency control, and CSV writing, so that even when some runs fail because of dynamic UI behavior, the scraper keeps going until 7,000 successful records are collected.
  5. Final dataset: The results are written into output.csv with deterministic screenshot paths, producing a reproducible dataset of 7,000 rows.
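The result-capture and run-management steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in runner.py or flow.py: the `flow` callable stands in for one full register/login/deposit run and is assumed to return the payment section's text plus the page URL, and the names `run_until`, `classify_payment`, `screenshot_path`, the CSV column names, and the UPI regex are all hypothetical.

```python
import asyncio
import csv
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Awaitable, Callable, Optional

# Assumed UPI ID shape: handle@psp (e.g. "merchant123@okaxis"). The lookahead
# rejects matches followed by "." or a word character, so e-mail addresses
# such as "name@gmail.com" are not mistaken for UPI IDs.
UPI_RE = re.compile(r"[A-Za-z0-9._-]{2,256}@[A-Za-z]{2,64}(?![\w.])")
FIELDS = ["index", "upi", "payment_url", "timestamp", "screenshot"]

# Hypothetical flow signature: given a run index and the screenshot path to use,
# perform one register -> login -> deposit run and return (payment text, URL).
Flow = Callable[[int, Path], Awaitable[tuple[str, str]]]


def classify_payment(text: str) -> str:
    """Return the first UPI ID in the payment-section text, else "QR_ONLY"."""
    match = UPI_RE.search(text)
    return match.group(0) if match else "QR_ONLY"


def screenshot_path(index: int, root: str = "screenshots") -> Path:
    """Deterministic path: run 42 always maps to screenshots/run_00042.png."""
    return Path(root) / f"run_{index:05d}.png"


async def run_until(flow: Flow, target: int, out_csv: str, concurrency: int = 5,
                    max_attempts: int = 3, max_runs: Optional[int] = None) -> int:
    """Run `flow` on `concurrency` workers until `target` rows are written to CSV."""
    limit = max_runs if max_runs is not None else 10 * target  # safety cap
    lock = asyncio.Lock()
    written = 0
    next_index = 0

    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()

        async def worker() -> None:
            nonlocal written, next_index
            while True:
                async with lock:  # claim the next run index exactly once
                    if written >= target or next_index >= limit:
                        return
                    index, next_index = next_index, next_index + 1
                shot = screenshot_path(index)
                for _ in range(max_attempts):  # retry transient UI failures
                    try:
                        text, url = await flow(index, shot)
                        break
                    except Exception:
                        continue
                else:
                    continue  # every attempt failed: move on to a fresh index
                async with lock:  # drop results that arrive after the target is met
                    if written < target:
                        writer.writerow({
                            "index": index,
                            "upi": classify_payment(text),
                            "payment_url": url,
                            "timestamp": datetime.now(timezone.utc).isoformat(),
                            "screenshot": str(shot),
                        })
                        written += 1

        await asyncio.gather(*(worker() for _ in range(concurrency)))
    return written
```

Workers claim run indices under a lock, so each index, and therefore each screenshot path, is used at most once. A run that fails `max_attempts` times is skipped rather than blocking the batch, and `max_runs` caps the total number of runs so a permanently broken step cannot loop forever. In practice the call would look like `asyncio.run(run_until(my_flow, target=7000, out_csv="output.csv"))`, where `my_flow` is a placeholder for the Playwright flow.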

Limitations

  • The scraper is tightly coupled to the current structure of the target site. If the site’s UI or selectors change, the automation will need updates.
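One way to contain that coupling, offered as an assumption rather than a description of what the repository does, is to keep every selector from the extraction step in a single mapping and to fail with the step's name when one stops matching. A changed UI then shows up as a clear error in the retry logs instead of a generic timeout. The `SELECTORS` keys and values, `require`, and `SelectorMissing` are hypothetical; only `page.locator`, `Locator.first`, and `Locator.wait_for` from Playwright's async API are used, and `page` is duck-typed so the sketch needs no Playwright import.

```python
from typing import Any

# Hypothetical selector registry: the selectors gathered in step 1 live in one
# mapping, so a UI change on the target site means editing only this dict.
# The values are placeholders, not the real site's selectors.
SELECTORS: dict[str, str] = {
    "register_submit": "#register-submit",
    "login_submit": "#login-submit",
    "close_popup": ".popup .close",
    "deposit_nav": "a[href*='deposit']",
    "payment_panel": ".payment-panel",
}


class SelectorMissing(RuntimeError):
    """Raised when a registered selector stops matching a visible element."""


async def require(page: Any, step: str, timeout_ms: int = 10_000) -> Any:
    """Wait for the element registered for `step`; fail with the step's name if absent.

    `page` is expected to be a Playwright async Page.
    """
    selector = SELECTORS[step]
    locator = page.locator(selector).first
    try:
        await locator.wait_for(state="visible", timeout=timeout_ms)
    except Exception as exc:  # Playwright raises its own TimeoutError here
        raise SelectorMissing(f"step {step!r}: {selector!r} matched nothing visible") from exc
    return locator
```

Inside a flow this might read `button = await require(page, "login_submit")` followed by `await button.click()`; when the site changes, the exception names the step whose selector needs updating.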

About

Async Python scraper using Playwright that automates account creation to payment flow and captures UPI details, URLs, timestamps, and screenshots.
