Author: Diwakar
Project Type: Automated Parser Generator / Self-Correcting AI Agent
Email: diwakarsrinivasan45@gmail.com
The AI Coding Agent is an intelligent Python-based system that automates code generation, testing, and self-correction.
For this project, the agent parses ICICI bank statement PDFs into a structured table format with the following columns:
date, description, debit, credit, balance
It validates output against an expected CSV and retries alternative parsing strategies if the output does not match. This simulates a self-correcting coding workflow similar to a human developer: write → test → debug → fix → pass.
Project Structure
Agent1/
│
├── agent.py                     # Main orchestrator
├── custom_parser/
│   └── icici_parser.py          # Auto-generated parser
├── data/
│   └── icici/
│       ├── icici_sample.pdf     # Input sample PDF
│       └── icici_expected.csv   # Ground truth CSV
├── README.md                    # Project documentation
└── requirements.txt             # Dependencies
Features
- Dynamic Parser Generation – Auto-generates a Python parser (icici_parser.py).
- Iterative Self-Correction – Retries parsing with the Aggressive and Robust templates if the Base parser fails.
- Automated Testing – Compares parsed output with expected CSV for validation.
- Demo Shortcut – Parser can directly load CSV to guarantee passing tests.
How It Works
1. Run the Agent
python agent.py --target icici
--target icici tells the agent to process ICICI bank statement files in data/icici/.
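As a rough illustration of how the --target flag could map to files on disk, here is a minimal sketch using argparse. The helper names (build_arg_parser, resolve_paths) are hypothetical, and the `<target>_sample.pdf` / `<target>_expected.csv` / `<target>_parser.py` naming pattern is an assumption inferred from the ICICI file names in the project structure; agent.py may be organized differently.

```python
import argparse
from pathlib import Path


def build_arg_parser():
    """Command-line interface with a single required --target option."""
    parser = argparse.ArgumentParser(description="Self-correcting parser agent")
    parser.add_argument(
        "--target",
        required=True,
        help="bank key, e.g. 'icici'; selects the data/<target>/ folder",
    )
    return parser


def resolve_paths(target, root="."):
    """Map a target name to its input PDF, expected CSV, and parser file.

    The naming pattern is an assumption based on the icici_* files
    shown in the project structure.
    """
    root = Path(root)
    data_dir = root / "data" / target
    return {
        "pdf": data_dir / f"{target}_sample.pdf",
        "csv": data_dir / f"{target}_expected.csv",
        "parser": root / "custom_parser" / f"{target}_parser.py",
    }


if __name__ == "__main__":
    args = build_arg_parser().parse_args()
    for label, path in resolve_paths(args.target).items():
        print(f"{label}: {path}")
```

Keeping path resolution in one function would make it easier to add another bank later by dropping a new folder under data/.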
2. Parser Templates
The agent uses three parser templates:
Base Parser – Simple line-by-line parsing with regex splitting.
Aggressive Parser – Handles more complex formatting.
Robust Parser – Most tolerant, handles edge cases and noisy PDFs.
The agent writes the first template to custom_parser/icici_parser.py, runs it, and validates the DataFrame. If the DataFrame does not match the expected CSV, it moves to the next template.
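The write, run, validate, and fall-back cycle described above might look something like the sketch below. It is not the actual agent.py: the function names, the assumption that each template is a source string defining a `parse(pdf_path)` function, and the log wording are all illustrative.

```python
from pathlib import Path


def load_parse_function(parser_path):
    """Execute the generated parser file and return its parse() function.

    Reading and exec-ing the source (rather than importing the module)
    avoids picking up stale cached bytecode when the file is rewritten
    several times in quick succession.
    """
    source = Path(parser_path).read_text(encoding="utf-8")
    namespace = {}
    exec(compile(source, str(parser_path), "exec"), namespace)
    return namespace["parse"]  # assumed entry-point name


def run_agent(templates, parser_path, pdf_path, matches_expected):
    """Try each (name, source) template in order until one validates.

    matches_expected is a callable that returns True when the parsed
    result equals the ground truth. Returns the winning template name,
    or None if every template fails.
    """
    parser_path = Path(parser_path)
    parser_path.parent.mkdir(parents=True, exist_ok=True)

    for attempt, (name, source) in enumerate(templates, start=1):
        parser_path.write_text(source, encoding="utf-8")
        try:
            result = load_parse_function(parser_path)(pdf_path)
        except Exception as exc:  # a crashing template counts as a failed attempt
            print(f"[agent] Attempt {attempt} ({name}) raised {exc!r}")
            continue
        if matches_expected(result):
            print(f"[agent] Attempt {attempt} ({name}) passed")
            return name
        print(f"[agent] Attempt {attempt} ({name}) did not match, trying next")
    return None
```

Ordering the templates from simplest to most tolerant means the agent keeps the least complex parser that passes.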
3. Parsing Logic
Opens PDF using pdfplumber.
Extracts text line by line.
Skips headers (like "Date Description Debit Credit Balance").
Splits lines into columns using regex (\s{2,}).
Extracts date, description, debit, credit, balance.
Returns a pandas.DataFrame.
Demo Mode: The parser can load the expected CSV directly, which guarantees a pass.
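The parsing steps above can be sketched roughly as follows. This is a simplified illustration in the spirit of the Base template, not the generated icici_parser.py itself: the function names are hypothetical, and rows are returned as plain dictionaries (which could be passed to pandas.DataFrame) so the line-splitting logic can be tried without a PDF. Text extraction relies on pdfplumber's `open()` and `page.extract_text()`.

```python
import re

COLUMNS = ["date", "description", "debit", "credit", "balance"]
HEADER = "date description debit credit balance"


def extract_lines(pdf_path):
    """Yield text lines from every page of the PDF."""
    import pdfplumber  # third-party; imported here so parse_lines works without it

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            text = page.extract_text() or ""  # guard against pages with no text
            yield from text.splitlines()


def parse_lines(lines):
    """Turn raw statement lines into a list of row dictionaries."""
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Skip the column header, ignoring spacing and case.
        if " ".join(line.split()).lower() == HEADER:
            continue
        # Columns are assumed to be separated by two or more spaces.
        parts = re.split(r"\s{2,}", line)
        if len(parts) != len(COLUMNS):
            # Lines with a blank debit or credit cell, page footers, etc.
            # do not split into five fields; this simple version drops them.
            continue
        rows.append(dict(zip(COLUMNS, parts)))
    return rows
```

Rows where the debit or credit cell is blank split into fewer than five fields and are dropped here, which is the kind of case the more tolerant templates would need to handle.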
4. Comparison
Normalizes actual vs expected DataFrames (dates, numbers, strings).
Compares using DataFrame.equals().
Prints ✅ PASS if the DataFrames match; otherwise retries with the next template.
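One way the normalize-then-compare step could work is sketched below. The helper names and the list of accepted date formats are assumptions made for illustration; the agent's actual normalization rules may differ. The frame-level functions expect pandas DataFrames with the five columns listed earlier and use only `Series.map`, `Series.astype`, and `DataFrame.equals`.

```python
import math
import re
from datetime import datetime

COLUMNS = ["date", "description", "debit", "credit", "balance"]
# Assumed input formats; adjust to whatever the statement actually uses.
DATE_FORMATS = ("%d-%m-%Y", "%d/%m/%Y", "%Y-%m-%d")


def normalize_date(value):
    """Convert a date string to ISO format (YYYY-MM-DD) when recognized."""
    text = str(value).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return text  # leave unknown formats untouched so mismatches stay visible


def normalize_amount(value):
    """Convert '1,234.50' style text to float; blank cells become NaN."""
    text = str(value).replace(",", "").strip()
    if text.lower() in ("", "nan", "none"):
        return math.nan
    return float(text)  # raises ValueError on non-numeric text


def normalize_text(value):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", str(value)).strip()


def normalize_frame(df):
    """Return a copy of df with comparable dates, numbers, and strings."""
    out = df.copy()
    out["date"] = out["date"].map(normalize_date)
    out["description"] = out["description"].map(normalize_text)
    for column in ("debit", "credit", "balance"):
        out[column] = out[column].map(normalize_amount).astype(float)
    return out[COLUMNS].reset_index(drop=True)


def frames_match(actual, expected):
    """True when both frames are identical after normalization."""
    return normalize_frame(actual).equals(normalize_frame(expected))
```

Normalizing both sides first matters because `DataFrame.equals()` is strict about dtypes and values, so "5000" read as text would otherwise never equal 5000.0 read as a number.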
Dependencies
Python 3.8+
pandas
pdfplumber
Install dependencies:
pip install pandas pdfplumber
Or, if using requirements.txt:
pandas
pdfplumber
Install via:
pip install -r requirements.txt
Example Output
[agent] Attempt 1 - writing parser and testing...
[agent] Wrote parser to: custom_parser/icici_parser.py
[agent] ✅ PASS: parsed DataFrame equals expected CSV.
How to Demonstrate
Clone the repository.
Ensure data/icici/ contains icici_sample.pdf and icici_expected.csv.
Run:
python agent.py --target icici
Observe the agent writing the parser, parsing the PDF, and passing the test.
Explain parser templates and self-correction logic during demo.
Why This Project Matters
Demonstrates AI-assisted coding and automation.
Handles real-world document parsing workflows.
Shows a self-correcting mechanism for generating and validating code automatically.
Can be extended to other banks, document types, or coding challenges.