MicroCFO

AI-powered financial compliance platform for Indian MSMEs, featuring automated GST reconciliation, legal compliance monitoring, subsidy discovery, and intelligent vendor negotiations.

🚀 Key Features

✅ Agent A - Visual Auditor

Gemini 2.5 Flash Integration: Real invoice image processing
Fraud Detection: Tampering, handwriting, missing GSTIN detection
Line Item Categorization: Capital Goods, Raw Material, Personal/Entertainment, Service
Orchestrator Triggers: Auto-connects to Agents B & C
Conservative CA-Style Auditing: When in doubt, flag it

✅ Agent B - Legislative Sentinel (Structure-Aware RAG)

Smart Legal Text Splitting: CA-logic based chunking
Vector Database: ChromaDB with semantic search
Context Filtering: Turnover-based compliance filtering
Real-time Monitoring: Government website scraping

✅ Agent C - Subsidy Hunter

Scheme Database: Government subsidy discovery
Benefit Calculation: Estimated subsidy amounts
Sector-Specific: Textile, Manufacturing, Technology focus

✅ Agent D - Negotiator

Router Logic: AI-powered strategy determination
Gemini 3 Flash Integration: Context-aware content generation
A/B Testing: Relationship-focused vs Transactional approaches
Multi-format Output: Telegram messages + Formal emails
Cash Flow Intelligence: Decisions based on financial position

✅ Phase 4: Business Logic & Integration (NEW!)

ERP Adapters: Export to Tally, Zoho Books, CSV, JSON
User Onboarding: Industry and turnover tier selection
Contextual Filtering: Personalized compliance and subsidies
API-First Design: Production-ready integrations

✅ Security & Compliance (NEW!)

Data Encryption: AES-256 encryption at rest
Audit Trails: Comprehensive logging of all actions
Legal Disclaimers: Prominent AI limitation warnings
Guardrails: Draft-only mode, verification required

🎯 Agent A: The Visual Auditor (Complete Implementation)

Key Features

Multimodal Processing: Gemini 2.5 Flash for invoice image analysis
Structured Data Extraction: Vendor, amounts, dates, GSTIN, line items
Fraud Detection: Tampering, handwriting, inconsistencies
Compliance Checking: ITC eligibility, stale invoices, missing documentation
Orchestrator Integration: Auto-triggers Agents B & C based on content

Enhanced Invoice Model

from typing import List, Optional

from pydantic import BaseModel

class Invoice(BaseModel):
    vendor_name: str
    invoice_date: str
    total_amount: float
    tax_amount: float
    line_items: List[LineItem]  # With category classification
    gstin: Optional[str] = None
    # Auditor fields
    is_handwritten: bool = False
    tampering_detected: bool = False
    compliance_flags: List[str] = []
    confidence_score: float = 1.0

Line Item Categories

Capital Goods: Machinery, equipment, plant, vehicles
Raw Material: Production inputs, components
Personal/Entertainment: Food, alcohol, personal expenses
Service: Consulting, software, maintenance

Fraud Detection Capabilities

Tampering Detection: Mismatched fonts, blurred numbers, digital manipulation
Handwriting Identification: Reliability scoring for handwritten bills
GSTIN Validation: Flags invoices that charge tax without a valid GSTIN
Date Staleness: Flags invoices more than 30 days old for an ITC eligibility check
Conservative Flagging: When in doubt, flag for manual review
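
The GSTIN and staleness checks above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the function name `compliance_flags` and the flag strings are hypothetical, the regex only checks the 15-character GSTIN layout (it does not verify the check digit or registration status), and the 30-day cutoff is taken from the list above.

```python
import re
from datetime import date
from typing import List, Optional

# Layout only: 2-digit state code, 10-character PAN, entity code, "Z", check character.
GSTIN_PATTERN = re.compile(r"^[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]$")
STALE_AFTER_DAYS = 30  # cutoff stated in the list above


def compliance_flags(
    gstin: Optional[str], tax_amount: float, invoice_date: date, today: date
) -> List[str]:
    """Return hypothetical flag names for one invoice (conservative: flag when unsure)."""
    flags: List[str] = []
    if tax_amount > 0 and not gstin:
        flags.append("TAX_WITHOUT_GSTIN")
    elif gstin and not GSTIN_PATTERN.match(gstin):
        flags.append("GSTIN_FORMAT_INVALID")
    if (today - invoice_date).days > STALE_AFTER_DAYS:
        flags.append("STALE_INVOICE")
    return flags
```

Passing `today` explicitly keeps the check deterministic in tests.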

Orchestrator Triggers

Capital Goods >₹1L: Auto-triggers Agent C (Subsidy Hunter)
Personal Items: Auto-triggers Agent B (Legal Sentinel) for ITC compliance
Proactive Alerts: Adds subsidy and compliance warnings to invoice response
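
As a rough sketch of the trigger rules above, the orchestrator decision can be expressed as a pure function over categorized line items. The names `agents_to_trigger` and the agent identifiers are hypothetical, and the sketch assumes the ₹1L threshold applies per line item rather than to the invoice total.

```python
from dataclasses import dataclass
from typing import List

CAPITAL_GOODS_THRESHOLD = 100_000  # ₹1L, from the trigger rule above


@dataclass
class LineItem:
    description: str
    amount: float
    category: str  # "Capital Goods", "Raw Material", "Personal/Entertainment", "Service"


def agents_to_trigger(line_items: List[LineItem]) -> List[str]:
    """Return the follow-up agents for an invoice, in a stable order."""
    triggered: List[str] = []
    # Assumption: the threshold is checked against each capital-goods item.
    if any(
        item.category == "Capital Goods" and item.amount > CAPITAL_GOODS_THRESHOLD
        for item in line_items
    ):
        triggered.append("agent_c_subsidy_hunter")
    if any(item.category == "Personal/Entertainment" for item in line_items):
        triggered.append("agent_b_legal_sentinel")
    return triggered
```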

🆕 Agent D: The Negotiator (Complete Implementation)

Architecture: OpenAI Router + Gemini 3 Flash

Phase 1: Router Logic (The Decision Maker)

Smart strategy determination based on financial context:

from enum import Enum

class NegotiationIntent(str, Enum):
    CREDIT_EXTENSION = "credit_extension"        # Cash flow tight
    PAYMENT_CHASE = "payment_chase"              # Overdue receivables
    EARLY_PAYMENT_OFFER = "early_payment_offer"  # Cash surplus

Decision Logic:

Credit Extension: When projected_cash_balance < upcoming_outflows
Payment Chase: When invoice_due_date < today (Overdue)
Early Payment Offer: When cash_surplus is high (proactive optimization)
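
A minimal sketch of the decision logic follows. It is an illustration under stated assumptions rather than the shipped router: `route_intent` and `surplus_multiple` are hypothetical names, the payable amount is counted as part of the required outflows, and "high" surplus is arbitrarily defined as at least twice the invoice amount left over after all outflows.

```python
from datetime import date
from enum import Enum
from typing import Optional


class NegotiationIntent(str, Enum):
    CREDIT_EXTENSION = "credit_extension"
    PAYMENT_CHASE = "payment_chase"
    EARLY_PAYMENT_OFFER = "early_payment_offer"


def route_intent(
    transaction_type: str,  # "payable" or "receivable"
    amount: float,
    due_date: date,
    today: date,
    current_cash_position: float,
    upcoming_outflows: float = 0.0,
    surplus_multiple: float = 2.0,  # assumption: what counts as a "high" surplus
) -> Optional[NegotiationIntent]:
    """Pick a negotiation strategy, or None when no action is warranted."""
    if transaction_type == "receivable":
        # Only chase receivables that are actually overdue.
        return NegotiationIntent.PAYMENT_CHASE if due_date < today else None
    required = amount + upcoming_outflows  # assumption: the invoice itself is an outflow
    if current_cash_position < required:
        return NegotiationIntent.CREDIT_EXTENSION
    if current_cash_position - required >= surplus_multiple * amount:
        return NegotiationIntent.EARLY_PAYMENT_OFFER
    return None
```

Returning `None` for the in-between case keeps the router conservative: no draft is produced unless one of the three conditions clearly holds.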

Phase 2: Generator Logic (Gemini 3 Flash)

AI-powered content generation with Indian business communication style:

# Context-aware prompt template, filled per intent with str.format()
# (PROMPT_TEMPLATE is an illustrative name)
PROMPT_TEMPLATE = """You are the CFO of an Indian MSME. Your goal is to manage cash flow without burning relationships.

Scenario: {intent} for {vendor_name}
Amount: ₹{amount:,.0f}
Style: {tone} (Relationship vs Transactional)

Generate authentic Telegram + Email content referencing Invoice #{invoice_id}"""

Phase 3: A/B Testing (The "Novelty" Feature)

Every negotiation generates two variations:

Option A (Relationship Focus): "We value our long-term partnership..."
Option B (Transactional Focus): "Please find attached the invoice overdue by 3 days..."

Complete MCP Tool Implementation

@mcp.tool()
def generate_negotiation_draft(
    counterparty_name: str,
    amount: float,
    transaction_type: str,  # "payable" or "receivable"
    due_date: str,
    current_cash_position: float,
    upcoming_outflows: float = 0,
    invoice_id: Optional[str] = None  # Optional from typing; None when no invoice number is known
) -> NegotiationDraft: ...

Business Scenarios

Scenario 1: Cash Flow Crunch (Textile MSME)

# Input: ₹8.5L due, only ₹4L available, ₹3.5L outflows
# Router: CREDIT_EXTENSION
# Output: "Hi Gujarat Cotton Mills, need 15 days for Invoice #INV-001 payment. 
#          Cash flow timing issue. Thanks for understanding! 🙏"

Scenario 2: Overdue Payment Chase (IT Services)

# Input: ₹4.8L overdue by 15 days
# Router: PAYMENT_CHASE  
# Output: "Hi MegaCorp Technologies, gentle reminder for Invoice #INV-002 payment.
#          Let us know if any clarification needed. Thanks! 😊"

Scenario 3: Early Payment Opportunity (Manufacturing)

# Input: ₹6.5L due, ₹32L available (strong position)
# Router: EARLY_PAYMENT_OFFER
# Output: "Hi Premium Steel Suppliers, can offer early payment for Invoice #INV-003
#          with 2% discount. Win-win for both! Let me know 😊"

Key Features

Conservative Financial Logic: Protects cash flow while maintaining relationships
Indian Business Context: Appropriate tone and communication style
Multi-format Output: Telegram (160 chars) + Formal email versions
Invoice-specific References: Uses actual invoice numbers for authenticity
Fallback Mode: Works without API keys using template-based generation
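
The fallback mode and the two A/B variants could look roughly like the sketch below. Everything here is illustrative: the template wording, the `fallback_drafts` name, and the decision to trim to the 160-character Telegram limit are assumptions, not the project's actual templates.

```python
from typing import Dict, Optional

TELEGRAM_LIMIT = 160  # character limit stated in the feature list above

# Hypothetical templates keyed by (intent, tone).
TEMPLATES: Dict[tuple, str] = {
    ("credit_extension", "relationship"):
        "Hi {name}, we value our long-term partnership. Could we get 15 more days on {ref} (₹{amount:,.0f})? Thanks!",
    ("credit_extension", "transactional"):
        "Requesting a 15-day extension on {ref} for ₹{amount:,.0f}. Please confirm.",
    ("payment_chase", "relationship"):
        "Hi {name}, we value our long-term partnership. Gentle reminder for {ref} (₹{amount:,.0f}). Thanks!",
    ("payment_chase", "transactional"):
        "{ref} for ₹{amount:,.0f} is overdue. Please arrange payment at the earliest.",
    ("early_payment_offer", "relationship"):
        "Hi {name}, we can pay {ref} (₹{amount:,.0f}) early for a 2% discount. Let us know!",
    ("early_payment_offer", "transactional"):
        "Offer: early payment of {ref} (₹{amount:,.0f}) against a 2% discount. Please confirm.",
}


def fallback_drafts(
    intent: str, counterparty_name: str, amount: float, invoice_id: Optional[str] = None
) -> Dict[str, str]:
    """Build both A/B variants from templates, without calling any LLM."""
    ref = f"Invoice #{invoice_id}" if invoice_id else "the pending invoice"
    drafts: Dict[str, str] = {}
    for tone in ("relationship", "transactional"):
        text = TEMPLATES[(intent, tone)].format(name=counterparty_name, ref=ref, amount=amount)
        drafts[tone] = text[:TELEGRAM_LIMIT]  # trim to the Telegram limit
    return drafts
```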

🆕 Agent B: The Legislative Sentinel (Structure-Aware RAG)

Features

Smart Legal Text Splitting: CA-logic based chunking that preserves legal structure
Metadata Extraction: Automatic tagging of turnover thresholds, sectors, dates
Vector Database: ChromaDB with semantic search capabilities
Context Filtering: Filters out irrelevant laws based on user profile
Real-time Monitoring: Automated scraping of government websites for new notifications

Architecture

Legal Sentinel System
├── legal_ingestion.py (Phase 1: Structure-Aware Ingestion)
│   ├── LegalTextSplitter (CA-Logic splitting)
│   ├── Metadata extraction (turnover, sector, dates)
│   └── Smart chunking (preserves provisos and sub-clauses)
├── vector_database.py (Phase 2: Vector Storage)
│   ├── ChromaDB integration
│   ├── Semantic search with sentence-transformers
│   └── Hybrid search (keyword + semantic)
├── server.py (Phase 3: MCP Tool Implementation)
│   ├── Context fetching from user profile
│   ├── Turnover-based filtering
│   └── Conservative CA-style responses
└── sentinel_monitor.py (Phase 4: Real-time Alerts)
    ├── Government website monitoring
    ├── User relevance checking
    └── Telegram alert system

Quick Start

1. Environment Setup

python setup.py

2. Activate Virtual Environment

# Windows
venv\Scripts\activate

# Unix/Linux/Mac
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Configure API Keys (IMPORTANT - Security)

⚠️ NEVER commit API keys to git!

Create a .env file from the template:

cp .env.example .env

Edit .env and add your actual API keys:

# Get your Gemini API key from: https://makersuite.google.com/app/apikey
GEMINI_API_KEY=your_actual_gemini_key_here

# Or use OpenRouter: https://openrouter.ai/keys
OPENROUTER_API_KEY=your_actual_openrouter_key_here

Alternative: Set environment variables directly

# Windows PowerShell
$env:GEMINI_API_KEY="your_key_here"

# Unix/Linux/Mac
export GEMINI_API_KEY="your_key_here"

📖 See SECURITY.md for complete security guidelines

5. Initialize Legal Database

python setup_legal_db.py

6. Run Tests

# Test Agent A (Visual Auditor)
python test_gemini_direct.py
python final_agent_a_test.py

# Test Agent B (Legal Sentinel)
python test_legal_sentinel.py

# Test Agent D (Negotiator)
python test_agent_d_simple.py
python demo_agent_d_negotiator.py

# Test complete demo
python demo_complete_agent_a.py

7. Start MCP Server

python server.py

8. Test with MCP Inspector

mcp dev server.py

Testing Agent A - Visual Auditor

Process Real Invoice Images

from server import scan_invoice_document

# Process local image
result = scan_invoice_document('invoice.jpg')

# Process URL
result = scan_invoice_document('https://example.com/invoice.png')

# Process base64 data
result = scan_invoice_document('data:image/jpeg;base64,/9j/4AAQ...')

# Use mock data for testing
result = scan_invoice_document('test', use_mock=True)

Example Scenarios

Scenario 1: Textile Machinery Purchase (₹11.8L)

# Invoice contains: Rapier Loom Machine (₹8L) + Warp Feeder (₹1.5L)
# Agent A detects: Capital Goods >₹1L
# Auto-triggers: Agent C (Subsidy Hunter)
# Result: "TUFS Scheme - Up to 25% subsidy (₹2.37L estimated benefit)"

Scenario 2: Restaurant Bill (₹8.5K)

# Invoice contains: Business Lunch + Alcoholic Beverages
# Agent A detects: Personal/Entertainment items
# Auto-triggers: Agent B (Legal Sentinel)
# Result: "ITC WARNING: Section 17(5) - Food & alcohol not eligible"

Scenario 3: Suspicious Invoice

# Agent A detects: Tampering, Missing GSTIN, Handwritten, Stale (>30 days)
# Flags: "CRITICAL: Manual verification required"
# Recommendation: "Do not claim ITC without proper documentation"

Testing the Enhanced Legal Sentinel

Smart Legal Queries

# Test structure-aware responses
check_compliance_law("Can I claim ITC on office supplies if my turnover is 3 crores?")
# Response: "EXEMPT: Your turnover (< 5Cr) is below the 5 crore threshold..."

check_compliance_law("Section 17(5) blocked credits")
# Response: Finds exact section with metadata

check_compliance_law("What are the penalties for late GST filing?")
# Response: Structure-aware search finds Section 47 with specific penalties

Real-time Monitoring

# Test monitoring once
python sentinel_monitor.py run-once

# Start continuous monitoring
python sentinel_monitor.py

Legal Database Features

1. Structure-Aware Text Splitting

Recognizes Section X, Rule Y, Notification No. patterns
Preserves Provided that... clauses with parent sections
Maintains sub-clause (a), (b), (c) relationships
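
One way to picture structure-aware splitting is a line-based splitter that only starts a new chunk at a section-level heading, so provisos and sub-clauses stay attached to their parent. This is a simplified sketch, not the `LegalTextSplitter` from legal_ingestion.py; the regex and the `split_legal_text` name are assumptions, and a body line that happens to begin with "Section 17 ..." would be misread as a heading.

```python
import re
from typing import Dict, List

# A new chunk starts only at these headings; "Provided that..." and "(a)" lines never match.
SECTION_START = re.compile(r"^(?:Section|Rule)\s+\d+[A-Z]*\b|^Notification No\.")


def split_legal_text(text: str) -> List[Dict[str, str]]:
    """Split legal text into chunks that keep provisos and sub-clauses with their section."""
    groups: List[List[str]] = []
    current: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if SECTION_START.match(line) and current:
            groups.append(current)
            current = []
        current.append(line)
    if current:
        groups.append(current)
    return [{"heading": g[0], "text": "\n".join(g)} for g in groups]
```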

2. Intelligent Metadata Extraction

Turnover Thresholds: "turnover exceeds 5 crore" → 50000000
Sector Tags: Textile, Manufacturing, Technology, Works Contract
Effective Dates: Automatic date extraction from notifications
Law Types: GST, Income Tax, Companies Act classification
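
The turnover conversion shown above ("5 crore" → 50000000) can be sketched with a small regex. The function name `extract_turnover_threshold` is hypothetical, and the sketch only returns the first amount it finds in the text.

```python
import re
from typing import Optional

CRORE = 10_000_000
LAKH = 100_000
_AMOUNT = re.compile(r"(\d+(?:\.\d+)?)\s*(crores?|cr|lakhs?)\b", re.IGNORECASE)


def extract_turnover_threshold(text: str) -> Optional[int]:
    """Return the first crore/lakh amount in the text as whole rupees, or None."""
    match = _AMOUNT.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).lower()
    multiplier = LAKH if unit.startswith("lakh") else CRORE
    return int(round(value * multiplier))
```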

3. Context-Aware Filtering

Filters laws based on user's turnover tier
Sector-specific relevance checking
Conservative CA-style interpretations
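
A rough sketch of the filtering step, assuming each chunk carries the `turnover_threshold` and `sector_tag` metadata described above. The semantics are an assumption: a threshold is read as "applies only when turnover exceeds this amount", and untagged chunks are treated as applying to everyone. `filter_chunks` is a hypothetical name.

```python
from typing import Dict, List, Optional


def filter_chunks(
    chunks: List[Dict], turnover: float, sector: Optional[str] = None
) -> List[Dict]:
    """Drop chunks that cannot apply to this user's turnover or sector."""
    kept: List[Dict] = []
    for chunk in chunks:
        threshold = chunk.get("turnover_threshold")
        tag = chunk.get("sector_tag")
        # Assumption: a threshold means "applies when turnover exceeds N".
        if threshold is not None and turnover <= threshold:
            continue
        # Untagged chunks are kept for every sector.
        if sector and tag and tag != sector:
            continue
        kept.append(chunk)
    return kept
```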

4. Hybrid Search Capabilities

Keyword Search: Direct section number lookup
Semantic Search: Concept-based matching
Combined Search: Best of both approaches
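
The README does not say how the keyword and semantic results are combined. One common option is reciprocal rank fusion (RRF), sketched here purely as an illustration of merging two ranked lists of chunk IDs; it is not necessarily what vector_database.py does, and the chunk IDs in the usage note are made up.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists; items ranked highly in several lists rise to the top."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for deterministic output.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))
```

For example, fusing a keyword hit list `["sec_17_5"]` with a semantic list `["sec_16", "sec_17_5", "sec_47"]` puts `sec_17_5` first, because it appears in both.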

🆕 Legal Data Seeding System

The Legal Data Seeding System automates the acquisition and ingestion of foundational Indian legal documents into the MicroCFO vector database. This system establishes the "Base Layer" that powers the Legal Sentinel (Agent B) capabilities.

Overview

The seeding system consists of three main components:

Seed Downloader (scripts/seed_downloader.py): Downloads legal documents from official government sources
Enhanced Legal Ingestion (legal_ingestion.py): Processes PDFs with structure-aware chunking
Seed Data Processor (scripts/seed_data.py): Orchestrates the end-to-end pipeline

Quick Start - Seeding the Database

Step 1: Download Legal Documents

# Download all configured legal documents from government sources
python scripts/seed_downloader.py

# Or specify custom output directory
python scripts/seed_downloader.py --output-dir ./my_legal_docs/

What it downloads:

✅ CGST Act 2017 (from CBIC)
✅ IGST Act 2017 (from CBIC)
✅ Income Tax Act 1961 (from IncomeTaxIndia)
✅ Companies Act 2013 (from India Code)
✅ PLI Textiles Guidelines (from Texprocil)

Features:

Idempotent downloads (skips existing files)
SSL error recovery for government sites
Network timeout handling with exponential backoff
Comprehensive progress reporting

Step 2: Process and Populate Database

# Process downloaded PDFs and populate vector database
python scripts/seed_data.py

# Or specify custom paths
python scripts/seed_data.py --data-dir ./my_legal_docs/ --db-path ./my_legal_db/

What it does:

Scans for PDF files in the data directory
Auto-detects law type from filename
Extracts text with page-by-page progress reporting
Cleans text (removes headers, footers, page numbers)
Splits into structure-aware chunks
Extracts metadata (turnover, sector, dates)
Generates embeddings using sentence transformers
Stores chunks in ChromaDB vector database

Features:

Idempotent processing (skips already-processed documents)
Duplicate detection using file hashes
Detailed progress reporting
Comprehensive error handling
Processing statistics and summary report

Expected Output

Download Phase

INFO - Starting download of 5 documents
INFO - Downloading: Central Goods and Services Tax Act 2017
INFO - URL: https://cbic-gst.gov.in/pdf/cgst-act.pdf
INFO - Successfully downloaded: CGST_Act_2017.pdf (2458624 bytes)
INFO - Downloading: Integrated Goods and Services Tax Act 2017
INFO - Successfully downloaded: IGST_Act_2017.pdf (1234567 bytes)
...
============================================================
Download Summary:
  Total documents: 5
  Successful: 5
  Failed: 0
============================================================

Processing Phase

INFO - SeedDataProcessor initializing...
INFO - ✓ Legal Document Processor initialized successfully
INFO - ✓ Vector Database initialized successfully
INFO - Starting batch document processing
INFO - Found 5 PDF files
[1/5] (20.0%) Document Processing: CGST_Act_2017.pdf
INFO - Processing: CGST_Act_2017.pdf
INFO -   Detected law type: GST
INFO -   Extracting text from 174 pages...
INFO -     Processing page 1/174
INFO -     Processing page 2/174
...
INFO -   ✓ Text extraction complete (174 pages processed)
INFO -   Cleaning extracted text...
INFO -   ✓ Text cleaning complete
INFO -   Chunking text with structure-aware splitting...
INFO -   ✓ Created 245 legal chunks
INFO -   Storing chunks in vector database...
INFO -     Storing chunks 1-25/245 (10.2%)
INFO -     Storing chunks 26-50/245 (20.4%)
...
INFO -   ✓ Storage complete (100%)
INFO -   ✓ Completed in 45.23s

============================================================
LEGAL DATA SEEDING REPORT
============================================================

Summary:
  Total Documents:     5
  Successful:          5
  Failed:              0
  Total Chunks:        1,247
  Total Time:          234.56s

Document Details:
------------------------------------------------------------
  ✓ SUCCESS | CGST_Act_2017.pdf | GST | 245 chunks | 45.23s
  ✓ SUCCESS | IGST_Act_2017.pdf | GST | 198 chunks | 38.12s
  ✓ SUCCESS | Income_Tax_Act_1961.pdf | Income Tax | 512 chunks | 98.45s
  ✓ SUCCESS | Companies_Act_2013.pdf | Corporate Law | 234 chunks | 42.67s
  ✓ SUCCESS | PLI_Textiles_Guidelines.pdf | Subsidy Scheme | 58 chunks | 10.09s
------------------------------------------------------------
Report generated at: 2024-01-15T14:30:45.123456
============================================================

Troubleshooting

SSL Certificate Errors

Government websites often have SSL certificate issues. The downloader automatically retries with SSL verification disabled:

WARNING - SSL verification failed for https://cbic-gst.gov.in/pdf/cgst-act.pdf, retrying without verification

This is expected and handled automatically. The system logs a warning but continues the download.

Network Timeouts

If downloads timeout due to slow government servers, the system retries with exponential backoff:

WARNING - Timeout on attempt 1/3, retrying in 1s
WARNING - Timeout on attempt 2/3, retrying in 2s

The system will retry up to 3 times before giving up on a particular document.
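
The retry behaviour described above can be sketched as follows. This is not the downloader's code: `fetch_with_backoff` is a hypothetical helper that takes the actual download as a callable, and it treats the built-in `TimeoutError` as the timeout signal (a real HTTP client would raise its own exception type).

```python
import time
from typing import Callable, Optional


def fetch_with_backoff(
    fetch: Callable[[], bytes],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch() up to max_attempts times, doubling the wait after each timeout."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TimeoutError:
            if attempt == max_attempts:
                return None  # give up on this document
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, then 2s, ...
    return None
```

Injecting `sleep` makes the delays observable in tests without actually waiting.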

Empty or Malformed PDFs

If a PDF cannot be processed or produces no text:

WARNING - Empty content after extraction and cleaning for CGST_Act_2017.pdf, skipping document

Check the PDF file manually. Some government PDFs may be scanned images without OCR text.

Database Initialization Errors

If the vector database fails to initialize:

ERROR - Failed to initialize Vector Database at './legal_db/': [Errno 13] Permission denied
Common causes:
  - ChromaDB not installed (pip install chromadb)
  - Sentence transformers not installed (pip install sentence-transformers)
  - Insufficient disk space or permissions
  - Corrupted database files (try deleting the legal_db directory)

Follow the suggested fixes:

Ensure all dependencies are installed: pip install -r requirements.txt
Check disk space: df -h (Unix) or dir (Windows)
Check permissions: Ensure write access to the project directory
If corrupted, delete and recreate: rm -rf legal_db/ then re-run

Re-running the Seeding Process

The seeding system is idempotent - you can safely re-run it multiple times:

# Re-running downloads will skip existing files
python scripts/seed_downloader.py
# Output: "File already exists, skipping: CGST_Act_2017.pdf"

# Re-running processing will skip already-processed documents
python scripts/seed_data.py
# Output: "Document CGST_Act_2017.pdf already processed with matching hash"

To force re-processing (e.g., after updating the ingestion logic):

Delete the vector database: rm -rf legal_db/
Re-run the processor: python scripts/seed_data.py

Advanced Configuration

Adding New Legal Sources

Edit scripts/seed_downloader.py and add to the LEGAL_SOURCES list:

LEGAL_SOURCES = [
    # ... existing sources ...
    LegalDocumentSource(
        url="https://example.gov.in/new-act.pdf",
        filename="New_Act_2024.pdf",
        description="New Act 2024",
        law_type="GST"  # or "Income Tax", "Corporate Law", "Subsidy Scheme"
    )
]

Customizing Law Type Detection

The system auto-detects law types from filenames. To customize, edit legal_ingestion.py:

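def detect_law_type_from_filename(filename: str) -> str:
    # Normalize once so pattern checks are case-insensitive
    # (assumed here; keep whatever normalization the existing function uses)
    filename_normalized = filename.upper()
    # Add your custom patterns here
    if "CUSTOM" in filename_normalized: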
        return "Custom Law Type"
    # ... existing patterns ...

Customizing Metadata Extraction

To extract additional metadata, edit legal_ingestion.py:

from typing import Any, Dict, Optional

def extract_metadata_from_text(text: str, law_type: str = "GST") -> Dict[str, Optional[Any]]:
    metadata = {
        'turnover_threshold': None,
        'sector_tag': None,
        'effective_date': None,
        'custom_field': None  # Add your custom field
    }
    # Add extraction logic for custom_field
    # ...

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Legal Data Seeding System                 │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
        ┌─────────────────────────────────────────┐
        │   1. Seed Downloader                    │
        │   (scripts/seed_downloader.py)          │
        │                                         │
        │   • Downloads from government sources   │
        │   • SSL error recovery                  │
        │   • Network timeout handling            │
        │   • Idempotent operations               │
        └─────────────────────────────────────────┘
                              │
                              ▼ PDFs saved to ./data/initial_acts/
                              │
        ┌─────────────────────────────────────────┐
        │   2. Seed Data Processor                │
        │   (scripts/seed_data.py)                │
        │                                         │
        │   • Orchestrates processing pipeline    │
        │   • Duplicate detection                 │
        │   • Progress tracking                   │
        │   • Error handling                      │
        └─────────────────────────────────────────┘
                              │
                              ▼
        ┌─────────────────────────────────────────┐
        │   3. Legal Ingestion Pipeline           │
        │   (legal_ingestion.py)                  │
        │                                         │
        │   • Auto-detect law type                │
        │   • Extract & clean text                │
        │   • Structure-aware chunking            │
        │   • Metadata extraction                 │
        └─────────────────────────────────────────┘
                              │
                              ▼ Legal chunks with metadata
                              │
        ┌─────────────────────────────────────────┐
        │   4. Vector Database                    │
        │   (vector_database.py)                  │
        │                                         │
        │   • Generate embeddings                 │
        │   • Store in ChromaDB                   │
        │   • Create search indices               │
        │   • Enable semantic search              │
        └─────────────────────────────────────────┘
                              │
                              ▼ Populated vector database
                              │
        ┌─────────────────────────────────────────┐
        │   5. Legal Sentinel (Agent B)           │
        │   (server.py)                           │
        │                                         │
        │   • Query legal database                │
        │   • Context-aware filtering             │
        │   • Conservative CA responses           │
        └─────────────────────────────────────────┘

Data Flow

Download Phase: Fetch PDFs from official government URLs
Processing Phase: Extract text, clean, and chunk with structure awareness
Chunking Phase: Create LegalChunk objects with metadata
Storage Phase: Generate embeddings and store in ChromaDB
Query Phase: Legal Sentinel retrieves relevant chunks for user queries

Key Features

Idempotent Operations

Download: Skips files that already exist
Processing: Detects already-processed documents using file hashes
Safe Re-execution: Can re-run the entire pipeline without duplicates
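
Hash-based duplicate detection can be sketched with the standard library. The helper names and the in-memory `processed_hashes` set are assumptions for illustration; the real pipeline presumably persists hashes alongside the vector database, and SHA-256 is a choice made here, not one the README specifies.

```python
import hashlib
from pathlib import Path
from typing import Set


def file_sha256(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in blocks so large PDFs are never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_processing(path: Path, processed_hashes: Set[str]) -> bool:
    """True when this exact file content has not been ingested before."""
    return file_sha256(path) not in processed_hashes
```

Keying on content rather than filename means a re-downloaded, updated PDF with the same name is picked up again.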

Robust Error Handling

SSL Errors: Automatic retry with verification disabled
Network Timeouts: Exponential backoff retry (1s, 2s, 4s)
Malformed PDFs: Graceful handling with detailed error messages
Page Extraction Failures: Continues processing remaining pages

Progress Reporting

Download Progress: File-by-file status with sizes
Page Progress: Page-by-page extraction reporting
Chunk Progress: Number of chunks created per document
Storage Progress: Percentage-based storage progress
Summary Reports: Comprehensive statistics at completion

Metadata Extraction

Turnover Thresholds: Extracts "5 crore", "50 crore" → numeric rupees
Sector Tags: Identifies Textile, Manufacturing, Technology, Trading
Effective Dates: Extracts dates like "w.e.f. 01-04-2023" → ISO format
Section Numbers: Preserves legal section identifiers
Chunk Types: Classifies as main, proviso, or sub_clause
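
As an example of the date normalization above, the sketch below converts a "w.e.f." date to ISO format. It assumes the Indian day-month-year order and a four-digit year; `extract_effective_date` is a hypothetical name.

```python
import re
from datetime import date
from typing import Optional

_WEF = re.compile(r"w\.e\.f\.?\s*(\d{2})[-/.](\d{2})[-/.](\d{4})", re.IGNORECASE)


def extract_effective_date(text: str) -> Optional[str]:
    """Return the first 'w.e.f. DD-MM-YYYY' date as YYYY-MM-DD, or None."""
    match = _WEF.search(text)
    if match is None:
        return None
    day, month, year = (int(group) for group in match.groups())
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None  # e.g. 31-02-2023 is not a real date
```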

File Locations

MicroCFO-MCP-Server/
├── scripts/
│   ├── seed_downloader.py      # Download legal documents
│   └── seed_data.py            # Process and populate database
├── data/
│   └── initial_acts/           # Downloaded PDF files (created automatically)
│       ├── CGST_Act_2017.pdf
│       ├── IGST_Act_2017.pdf
│       ├── Income_Tax_Act_1961.pdf
│       ├── Companies_Act_2013.pdf
│       └── PLI_Textiles_Guidelines.pdf
├── legal_db/                   # Vector database storage (created automatically)
│   ├── chroma.sqlite3          # ChromaDB database
│   └── embeddings/             # Vector embeddings
├── legal_ingestion.py          # Enhanced with seeding support
└── vector_database.py          # Vector database operations

Integration with Legal Sentinel

Once the database is seeded, the Legal Sentinel (Agent B) automatically benefits from the expanded base layer:

# Query the seeded database
result = check_compliance_law(
    "What are the ITC eligibility rules for capital goods?",
    user_context={"turnover": 30000000, "sector": "Manufacturing"}
)

# The system will:
# 1. Search the seeded legal chunks
# 2. Filter by turnover (3 crore < 5 crore threshold)
# 3. Apply sector-specific relevance
# 4. Return structure-aware legal guidance

Performance Considerations

First Run: Initial seeding takes 3-5 minutes for all 5 documents
Embedding Generation: Most time-consuming step (sentence transformers)
Subsequent Runs: Idempotent operations make re-runs very fast
Database Size: Expect ~50-100 MB for the complete legal database
Memory Usage: Peak ~2 GB during embedding generation

Best Practices

Run seeding on a stable network connection - Government sites can be slow
Ensure sufficient disk space - At least 500 MB free
Don't interrupt processing - Let it complete for proper metadata storage
Check the summary report - Verify all documents processed successfully
Test queries after seeding - Ensure Legal Sentinel returns relevant results

Next Steps After Seeding

Once the database is seeded, you can:

Test Legal Sentinel queries: python test_legal_sentinel.py
Start the MCP server: python server.py
Enable real-time monitoring: python sentinel_monitor.py
Query via API: Use the check_compliance_law tool

The Legal Data Seeding System ensures your MicroCFO instance has a comprehensive, up-to-date foundation of Indian legal knowledge for accurate compliance guidance!

Real-time Legal Monitoring

Government Sources Monitored

CBIC (GST notifications)
MCA (Companies Act updates)
Income Tax Department notifications

Alert System

Automatic relevance checking against user profiles
Telegram Bot API integration (configurable)
Daily monitoring schedule

File Structure

MicroCFO-MCP-Server/
├── server.py                    # Main MCP server with all 4 agents
├── legal_ingestion.py           # Phase 1: Smart legal text processing
├── vector_database.py           # Phase 2: Vector storage & search
├── sentinel_monitor.py          # Phase 4: Real-time monitoring
├── setup_legal_db.py            # Database initialization
├── test_legal_sentinel.py       # Legal Sentinel test suite
├── requirements.txt             # Dependencies (includes google-generativeai)
├── setup.py                     # Environment setup
├── README.md                    # This file
│
├── Agent A Tests & Demos:
├── test_gemini_direct.py        # Direct Gemini 2.5 Flash test
├── final_agent_a_test.py        # Complete Agent A integration test
├── demo_complete_agent_a.py     # Full workflow demonstration
├── test_visual_auditor.py       # Comprehensive test suite
├── demo_visual_auditor.py       # Interactive demo script
│
├── Agent D Tests & Demos:
├── test_agent_d_simple.py       # Direct function testing
├── test_agent_d_negotiator.py   # Comprehensive test suite
└── demo_agent_d_negotiator.py   # Interactive negotiation demo

🎯 Complete MicroCFO Workflow

1. Invoice Processing (Agent A)

📸 User uploads invoice image
    ↓
🔍 Gemini 2.5 Flash processes image
    ↓
📊 Structured data extracted (vendor, amounts, items)
    ↓
🕵️ Fraud detection (tampering, handwriting, GSTIN)
    ↓
⚠️ Compliance flags (ITC eligibility, stale invoices)
    ↓
🎯 Orchestrator triggers other agents

2. Automatic Triggers

Capital Goods >₹1L → Agent C (Subsidy Hunter)
Personal Items → Agent B (Legal Sentinel)
Compliance Issues → Conservative warnings

3. Integrated Response

📄 Structured invoice data
🚨 Fraud alerts and compliance warnings
🎯 Proactive subsidy opportunities
📋 Legal compliance guidance
💼 Professional communication templates

Production Deployment

MCP Server Integration

# Your AI assistant can now call:
scan_invoice_document(image_url)           # Agent A - Visual processing
check_compliance_law(query, context)      # Agent B - Legal guidance  
find_applicable_subsidies(sector, amount) # Agent C - Subsidy discovery
generate_negotiation_draft(counterparty, amount, type, due_date, cash_position) # Agent D - Negotiation

API Capabilities

Real-time invoice processing with Gemini 2.5 Flash
Structure-aware legal RAG with ChromaDB
Proactive subsidy discovery with benefit calculation
AI-powered negotiation with cash flow intelligence
Conservative CA-style compliance checking
Automated orchestration between agents

🆕 Phase 4: Business Logic & Integration

ERP Adapters

Export invoices directly to your accounting system:

Supported Formats:

Tally ERP 9 / Tally Prime: XML (single) and CSV (batch)
Zoho Books: JSON API payload
Standard CSV: For Excel and generic accounting software
JSON: Complete data export for custom integrations
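
For the Standard CSV format, an export could be as simple as the sketch below. The column list and the `invoices_to_csv` name are assumptions based on the Invoice fields in this README; this is not the Tally or Zoho Books layout, which have their own schemas.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column set, drawn from the Invoice model fields.
CSV_COLUMNS = ["invoice_id", "vendor_name", "invoice_date", "gstin", "total_amount", "tax_amount"]


def invoices_to_csv(invoices: List[Dict]) -> str:
    """Serialize invoice dicts to CSV text with a fixed header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=CSV_COLUMNS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for invoice in invoices:
        # Missing keys and None values both become empty cells.
        writer.writerow({column: invoice.get(column, "") for column in CSV_COLUMNS})
    return buffer.getvalue()
```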

Usage:

# Export to Tally CSV
curl -X POST http://localhost:8000/api/v1/erp-export/export \
  -H "Content-Type: application/json" \
  -d '{"invoice_ids": ["inv-001"], "format": "tally_csv"}'

# Get supported formats
curl http://localhost:8000/api/v1/erp-export/formats

User Onboarding

Capture user context for personalized experience:

12 Industry Types:

Textile & Apparel, Manufacturing, Technology & IT
Trading, Services, Retail, Construction
Healthcare, Education, Hospitality, Agriculture, Other

4 Turnover Tiers:

Micro: < ₹5 Crore (Composition scheme eligible)
Small: ₹5-20 Crore (MSME benefits)
Medium: ₹20-50 Crore (PLI schemes)
Large: > ₹50 Crore (Full compliance)
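
The tier boundaries above map directly onto a small classifier. The sketch assumes annual turnover in rupees and assigns the shared boundary values (exactly ₹20 Crore and ₹50 Crore) to the lower tier, which the list above leaves open; `turnover_tier` is a hypothetical name.

```python
CRORE = 10_000_000


def turnover_tier(annual_turnover: float) -> str:
    """Map annual turnover in rupees to one of the four onboarding tiers."""
    if annual_turnover < 5 * CRORE:
        return "micro"
    if annual_turnover <= 20 * CRORE:  # assumption: exactly 20 Cr counts as small
        return "small"
    if annual_turnover <= 50 * CRORE:  # assumption: exactly 50 Cr counts as medium
        return "medium"
    return "large"
```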

Benefits:

Agent B filters legal compliance by turnover tier
Agent C shows industry-specific subsidies
Targeted recommendations and alerts

Usage:

# Start onboarding
curl -X POST http://localhost:8000/api/v1/onboarding/start

# Get industries
curl http://localhost:8000/api/v1/onboarding/industries

# Submit industry selection
curl -X POST http://localhost:8000/api/v1/onboarding/step \
  -H "Content-Type: application/json" \
  -d '{"step": "industry_selection", "data": {"industry_type": "textile"}}'

🔒 Security & Compliance

Data Encryption

At Rest: AES-256 encryption for sensitive database columns
In Transit: HTTPS/TLS for all API communication
S3 Storage: Server-side encryption (SSE-S3/SSE-KMS)
Key Management: Secure key storage with rotation support

Audit Trails

Comprehensive Logging: Who, What, When, Where (IP), How
30+ Action Types: All operations tracked
Query & Export: API endpoints for audit log access
Retention: Configurable retention policies

Legal Disclaimers

Prominent Warnings: "AI assistant, not a chartered accountant"
Specific Disclaimers: Legal, financial, tax, negotiation, invoice, subsidy
Guardrails: Draft-only mode, verification required, no auto-send
User Acceptance: Session-based disclaimer tracking

Guardrails

Negotiator: NEVER auto-sends emails (draft-only mode enforced)
Invoice Processing: Verification required, no auto-approval
Legal Queries: No legal advice, always recommend professionals
High-Amount Flagging: Transactions over ₹50,000 flagged
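
A minimal sketch of how the draft-only and high-amount guardrails might be enforced in code. The `GuardedDraft` type, the warning strings, and `apply_guardrails` are all hypothetical; only the ₹50,000 threshold and the draft-only rule come from the list above.

```python
from dataclasses import dataclass, field
from typing import List

HIGH_AMOUNT_THRESHOLD = 50_000  # ₹50,000, from the guardrail list above


@dataclass
class GuardedDraft:
    body: str
    amount: float
    status: str = "draft"  # this sketch never produces a "sent" status
    warnings: List[str] = field(default_factory=list)


def apply_guardrails(body: str, amount: float) -> GuardedDraft:
    """Wrap generated content as a draft that always requires human verification."""
    draft = GuardedDraft(body=body, amount=amount)
    draft.warnings.append("VERIFICATION_REQUIRED")
    if amount > HIGH_AMOUNT_THRESHOLD:
        draft.warnings.append("HIGH_AMOUNT")
    return draft
```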

🚀 Ready for Production!

✅ Agent A: Gemini 2.5 Flash vision processing with fraud detection
✅ Agent B: Structure-aware legal RAG with turnover filtering
✅ Agent C: Subsidy discovery with benefit calculation
✅ Agent D: Professional communication generation
✅ Orchestrator: Automatic agent triggers and workflow
✅ ERP Integration: Export to Tally, Zoho Books, CSV, JSON
✅ User Onboarding: Industry and turnover tier selection
✅ Security: Encryption, audit trails, legal disclaimers
✅ Compliance: Guardrails, verification, professional recommendations

The MicroCFO MCP Server is now a complete, production-ready autonomous CFO assistant with enterprise-grade security and ERP connectivity!
