DataWeb Hackathon
DataIntel AI is a full-stack conversational platform that lets users query any CSV dataset in plain English. It uses a Text-to-SQL architecture powered by Google Gemini 2.5 Flash to convert natural-language questions into SQL, execute them against a temporary SQLite database, and return answers grounded in the actual query results, along with auto-generated visualizations.
The goal of DataIntel AI is to democratize data analysis — no SQL, Python, or technical expertise required.
- Plain English Input: Ask questions like "What is the churn rate by gender?" and get accurate answers.
- Text-to-SQL Engine: Gemini converts your question into an optimized SQLite SELECT query.
- Zero Hallucination: Every answer is derived from real SQL execution on your actual data.
- Dynamic Charts: Bar, Line, and Pie charts rendered automatically using Recharts.
- Smart Detection: Gemini analyzes query results and generates chart data only when a visualization is appropriate.
- Null Handling: If data isn't chartable, the chart section is gracefully skipped.
- Instant Profiling: On CSV upload, get row/column counts, missing values, duplicate rows, and column types.
- Expandable Details: View per-column data types and missing value counts.
- Backend-Powered: The `/api/analyze` endpoint performs deep Pandas analysis.
- Dataset-Aware: On upload, Gemini generates 5 smart questions tailored to your dataset's schema.
- Click-to-Query: Click any suggestion to instantly ask it.
- Token-Efficient: Only column names and types are sent to the LLM (no raw data).
- CSV Export: Spreadsheet-friendly format for further analysis.
- Excel (.xlsx): Formatted workbook powered by OpenPyXL.
- PDF Report: Styled Q&A report generated with ReportLab.
- SQLite on Disk: CSV is ingested into a temporary SQLite database — data lives on disk, not in RAM.
- Auto-Cleanup: The database file is deleted after every request, so no data persists on the server.
- Handles Large Datasets: Unlike in-memory Pandas, SQLite can process millions of rows without crashing.
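The on-disk SQLite and auto-cleanup behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `main.py`: it uses the standard-library `csv` module instead of Pandas to stay dependency-free, and the helper name `temporary_sqlite_from_csv` and the table name `data` are assumptions made for the example.

```python
import csv
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_sqlite_from_csv(csv_path, table="data"):
    """Load a CSV into a throwaway on-disk SQLite file, then delete the file."""
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)  # sqlite3 opens the path itself
    conn = sqlite3.connect(db_path)
    try:
        with open(csv_path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader)
            # Quote column names so spaces or SQL keywords do not break the DDL.
            cols = ", ".join('"' + h.replace('"', '""') + '"' for h in header)
            marks = ", ".join("?" for _ in header)
            conn.execute(f'CREATE TABLE "{table}" ({cols})')
            # The reader is consumed row by row, so the CSV is never fully in RAM.
            # Values are stored as text here; real ingestion would infer types.
            conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
        conn.commit()
        yield conn
    finally:
        conn.close()
        os.remove(db_path)  # auto-cleanup: nothing persists after the request
```

A request handler would wrap its query in `with temporary_sqlite_from_csv(path) as conn:` so that the database file is removed even if the query raises.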
| Domain | Tech Used |
|---|---|
| Frontend | React 19 (Vite), Tailwind CSS v4 |
| Charts | Recharts |
| Icons | Lucide React |
| Routing | React Router DOM |
| Backend | Python FastAPI |
| Database | SQLite (on-disk, temporary) |
| AI / LLM | Google Gemini 2.5 Flash (via LangChain) |
| Data Ingestion | Pandas (CSV → SQLite only) |
| Export | OpenPyXL (Excel), ReportLab (PDF) |
| Tools | Git, GitHub, Vite, Uvicorn |
- Frontend: React SPA with a landing page (`/`) and a dashboard (`/app`). Communicates with the backend via REST APIs.
- Backend: FastAPI server handles file uploads, SQL generation, query execution, and export.
- AI Pipeline (2 LLM Calls per Query):
  - Call #1: Gemini receives the database schema and generates a raw SQLite SELECT query.
  - Call #2: Gemini receives the SQL results and produces a structured JSON response (answer + explanation + chart data).
- Data Flow:
  - CSV → Pandas (ingestion only) → SQLite on disk → DataFrame deleted from RAM.
  - SQL executed via Python's built-in `sqlite3` → results formatted by Gemini → response sent to frontend.
  - SQLite DB file deleted after every request.
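The two-call pipeline above could look roughly like the sketch below. The `llm` argument is a hypothetical stand-in for the Gemini call (any function that takes a prompt string and returns text), the prompt wording is illustrative, and the SELECT-only guard is an added assumption rather than something this README documents.

```python
import json
import sqlite3


def is_safe_select(sql):
    """Accept only a single read-only SELECT statement (conservative check)."""
    stripped = sql.strip().rstrip(";").strip()
    return stripped.lower().startswith("select") and ";" not in stripped


def get_schema(conn, table="data"):
    """Return 'column (TYPE)' strings; no row values are included."""
    info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [f"{name} ({ctype or 'TEXT'})" for _, name, ctype, *_ in info]


def answer_question(conn, question, llm):
    schema = ", ".join(get_schema(conn))
    # Call 1: schema and question in, raw SQL out.
    sql = llm(f"Schema: {schema}\nQuestion: {question}\n"
              "Return one SQLite SELECT query.")
    if not is_safe_select(sql):
        raise ValueError("Refusing to run a non-SELECT statement")
    rows = conn.execute(sql).fetchall()
    # Call 2: query results in, structured JSON out.
    raw = llm(f"Question: {question}\nSQL: {sql}\nRows: {rows}\n"
              "Return JSON with keys answer, explanation, chart.")
    reply = json.loads(raw)
    return {
        "sql": sql,
        "rows": rows,
        "answer": reply.get("answer"),
        "explanation": reply.get("explanation"),
        # A missing or null chart means the frontend skips the chart section.
        "chart": reply.get("chart"),
    }
```

Keeping the model behind a plain callable makes the pipeline easy to exercise with a fake `llm` in tests, without any API key.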
DataIntel-AI/
├── backend/ # Python FastAPI backend
│ ├── main.py # API endpoints & AI pipeline
│ └── requirements.txt # Python dependencies
│
├── frontend/ # React frontend
│ ├── src/
│ │ ├── components/
│ │ │ ├── LandingPage.jsx # Landing page with hero & features
│ │ │ ├── Sidebar.jsx # File upload, data quality, suggestions
│ │ │ ├── ChatArea.jsx # Message thread, input, export
│ │ │ ├── AIMessage.jsx # AI response with charts
│ │ │ ├── UserMessage.jsx # User message bubble
│ │ │ └── FileUpload.jsx # Drag & drop CSV upload
│ │ ├── App.jsx # Dashboard (main app)
│ │ ├── main.jsx # Router setup
│ │ └── index.css # Global styles & animations
│ ├── package.json
│ └── vite.config.js
│
├── WORKFLOW.md # Platform workflow for judges
├── .gitignore
└── README.md
- Node.js (v18 or higher)
- Python 3.10+
- Google Gemini API Key
- Clone the Repository

      git clone https://github.com/thisisAtharv/DataIntel-AI.git
      cd DataIntel-AI

- Backend Setup

      cd backend
      pip install -r requirements.txt

  Update the `GOOGLE_API_KEY` in `main.py` with your Gemini API key.

- Frontend Setup

      cd frontend
      npm install

- Run the Application

      # Terminal 1: Backend
      cd backend
      python main.py

      # Terminal 2: Frontend
      cd frontend
      npm run dev

- Open `http://localhost:5173` in your browser.
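As an alternative to placing the key directly in `main.py`, the backend could read it from an environment variable. The sketch below is only a suggestion: the helper name `load_gemini_key` is hypothetical, and the project as described expects the key to be set in the file itself.

```python
import os


def load_gemini_key(env=os.environ):
    """Return the Gemini API key from the environment, or fail clearly."""
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set GOOGLE_API_KEY before starting the backend")
    return key
```

This keeps the key out of version control; the variable would be exported in the shell before running `python main.py`.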
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/chat` | Send a query + CSV, get AI response with chart data |
| `POST` | `/api/analyze` | Upload CSV, get data quality info (missing, duplicates, types) |
| `POST` | `/api/suggest` | Upload CSV, get 5 AI-generated suggested questions |
| `POST` | `/api/export` | Export chat results as CSV, XLSX, or PDF |
| `GET` | `/` | Health check |
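To give a sense of what the data quality report from `/api/analyze` involves, here is a minimal sketch of the row, column, missing-value, and duplicate counts. It uses the standard-library `csv` module rather than Pandas, omits column type inference, and the function name `profile_csv` and the returned dictionary shape are assumptions, not the endpoint's actual response format.

```python
import csv
from collections import Counter


def profile_csv(path):
    """Compute basic data quality numbers for a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        rows = [tuple(row[c] for c in columns) for row in reader]
    # A cell counts as missing if it is absent or blank.
    missing = {
        c: sum(1 for r in rows if r[i] is None or r[i].strip() == "")
        for i, c in enumerate(columns)
    }
    # Every repeat of an already-seen row counts as one duplicate.
    duplicates = sum(n - 1 for n in Counter(rows).values())
    return {
        "rows": len(rows),
        "columns": len(columns),
        "missing": missing,
        "duplicate_rows": duplicates,
    }
```

The duplicate count follows the same convention as counting rows beyond the first occurrence, so three identical rows contribute two duplicates.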
- Building a Text-to-SQL pipeline with real code execution and zero hallucination.
- Designing a 2-call optimized LLM architecture to minimize API usage.
- Implementing scalable data processing with SQLite on disk instead of in-memory Pandas.
- Creating dynamic data visualizations with conditional chart rendering.
- Building a production-grade React frontend with skeleton loaders, error handling, and export functionality.
- 🔗 Multi-Table Support: Upload multiple CSVs and query across joined tables.
- 🔐 User Authentication: Login system with saved datasets and query history.
- ☁️ Cloud Deployment: Deploy backend on AWS/GCP with persistent database.
- 📱 Mobile Responsiveness: Optimized UI for tablets and smartphones.