thisisAtharv/DataIntel-AI

🧠 DataIntel AI – Conversational Data Intelligence Platform

DataWeb Hackathon

DataIntel AI is a full-stack conversational platform that lets users query any CSV dataset in plain English. It uses a Text-to-SQL architecture powered by Google Gemini 2.5 Flash: each natural-language question is converted into SQL, executed against a temporary SQLite database, and answered from the actual query results, with auto-generated visualizations.

The goal of DataIntel AI is to democratize data analysis — no SQL, Python, or technical expertise required.


✨ Features

💬 Natural Language Querying

  • Plain English Input: Ask questions like "What is the churn rate by gender?" and get accurate answers.
  • Text-to-SQL Engine: Gemini converts your question into an optimized SQLite SELECT query.
  • Grounded Answers: Every answer is derived from real SQL execution on your actual data, so the figures come from the query result rather than from the model's recall.
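
The execution step behind these bullets can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `run_select`, the table name `data`, and the sample columns are assumptions. The guard is deliberately conservative and accepts only a single statement that starts with SELECT.

```python
import sqlite3


def run_select(conn: sqlite3.Connection, sql: str, max_rows: int = 100) -> list:
    """Run one read-only SELECT and return its rows as dictionaries.

    Hypothetical helper: it rejects anything that is not a single SELECT,
    which also rejects harmless queries containing ';' in a string literal.
    """
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not statement.lower().startswith("select"):
        raise ValueError("only SELECT queries are allowed")
    cursor = conn.execute(statement)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchmany(max_rows)]


# Tiny in-memory dataset standing in for an uploaded CSV (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (gender TEXT, churned INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("F", 1), ("F", 0), ("M", 1), ("M", 1)],
)

# The kind of query an LLM might produce for "What is the churn rate by gender?"
rows = run_select(
    conn,
    "SELECT gender, AVG(churned) AS churn_rate FROM data "
    "GROUP BY gender ORDER BY gender;",
)
print(rows)
```

Because the numbers are computed by SQLite, the model only has to get the query right; it never has to recall or estimate values itself.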

📊 Auto-Generated Visualizations

  • Dynamic Charts: Bar, Line, and Pie charts rendered automatically using Recharts.
  • Smart Detection: Gemini analyzes query results and generates chart data only when a visualization is appropriate.
  • Null Handling: If data isn't chartable, the chart section is gracefully skipped.
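
One way to implement the "skip the chart when nothing is chartable" behaviour is to validate the model's JSON before it reaches the chart components. The sketch below is an assumption about how that could look: the field names `chart`, `type`, `data`, `name`, and `value` are hypothetical and are not taken from the project's code.

```python
import json
from typing import Optional

ALLOWED_CHART_TYPES = {"bar", "line", "pie"}


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def extract_chart(raw_response: str) -> Optional[dict]:
    """Return a validated chart spec, or None when there is nothing to plot."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    chart = payload.get("chart")
    if not isinstance(chart, dict):
        return None  # covers both a missing key and an explicit null
    points = chart.get("data")
    if chart.get("type") not in ALLOWED_CHART_TYPES:
        return None
    if not isinstance(points, list) or not points:
        return None
    for point in points:
        if not isinstance(point, dict) or "name" not in point:
            return None
        if not _is_number(point.get("value")):
            return None
    return {"type": chart["type"], "data": points}


chartable = extract_chart(
    '{"answer": "F churns less", '
    '"chart": {"type": "bar", "data": [{"name": "F", "value": 0.5}]}}'
)
not_chartable = extract_chart('{"answer": "There are 42 rows", "chart": null}')
print(chartable, not_chartable)
```

Returning `None` gives the frontend a single, unambiguous signal to omit the chart section.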

🔍 Data Quality Analysis

  • Instant Profiling: On CSV upload, get row/column counts, missing values, duplicate rows, and column types.
  • Expandable Details: View per-column data types and missing value counts.
  • Backend-Powered: The /api/analyze endpoint computes these statistics with Pandas.
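
The profile described above maps onto a handful of standard Pandas calls. The sketch below assumes a hypothetical function `profile_csv` and hypothetical report keys; the real /api/analyze response may be shaped differently.

```python
import io

import pandas as pd


def profile_csv(csv_text: str) -> dict:
    """Compute basic data-quality statistics for a CSV given as text."""
    df = pd.read_csv(io.StringIO(csv_text))
    missing_by_column = df.isna().sum()
    return {
        "rows": int(len(df)),
        "columns": int(df.shape[1]),
        "missing_values": int(missing_by_column.sum()),
        # duplicated() marks every repeat of an earlier identical row.
        "duplicate_rows": int(df.duplicated().sum()),
        "column_types": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "missing_by_column": {col: int(n) for col, n in missing_by_column.items()},
    }


# Illustrative sample: one repeated row and one missing value.
sample = "gender,churned\nF,1\nM,0\nF,1\nM,\n"
report = profile_csv(sample)
print(report)
```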

🧠 AI-Suggested Questions

  • Dataset-Aware: On upload, Gemini generates 5 smart questions tailored to your dataset's schema.
  • Click-to-Query: Click any suggestion to instantly ask it.
  • Token-Efficient: Only column names and types are sent to the LLM (no raw data).
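
The token-efficient idea, sending only column names and types, can be sketched with SQLite's `PRAGMA table_info`. The function names, the prompt wording, and the `customers` table below are illustrative assumptions, not the project's actual prompt.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection, table: str) -> list:
    """Return 'name (TYPE)' strings for each column; no row data is read."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [f"{name} ({col_type or 'UNKNOWN'})" for _, name, col_type, *_ in rows]


def build_suggestion_prompt(conn: sqlite3.Connection, table: str, count: int = 5) -> str:
    """Build a schema-only prompt asking for suggested questions."""
    columns = "\n".join(f"- {line}" for line in describe_schema(conn, table))
    return (
        f"The table `{table}` has these columns:\n{columns}\n"
        f"Suggest {count} analytical questions a user could ask about this data."
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (gender TEXT, tenure INTEGER, email TEXT)")
conn.execute("INSERT INTO customers VALUES ('F', 12, 'private@example.com')")

prompt = build_suggestion_prompt(conn, "customers")
print(prompt)
```

Because the prompt is built from the schema alone, its size does not grow with the number of rows, and cell values never leave the server.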

📥 Export Results

  • CSV Export: Spreadsheet-friendly format for further analysis.
  • Excel (.xlsx): Formatted workbook powered by OpenPyXL.
  • PDF Report: Styled Q&A report generated with ReportLab.
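
As a rough illustration of the CSV export path only, the sketch below serializes a list of result rows with Python's standard `csv` module. The function name and row shape are assumptions; the Excel and PDF paths use OpenPyXL and ReportLab and are not shown here.

```python
import csv
import io


def export_rows_to_csv(rows: list) -> str:
    """Serialize a list of dictionaries to CSV text with a header row."""
    if not rows:
        return ""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = export_rows_to_csv(
    [
        {"gender": "F", "churn_rate": 0.5},
        {"gender": "M", "churn_rate": 1.0},
    ]
)
print(csv_text)
```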

🔒 Scalability & Privacy

  • SQLite on Disk: The CSV is ingested into a temporary SQLite database, so the data being queried lives on disk, not in RAM.
  • Auto-Cleanup: The database file is deleted after every request. No data persists on the server.
  • Handles Large Datasets: Unlike an in-memory Pandas DataFrame, SQLite can query tables with millions of rows without loading them all into memory.
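
The create, query, and delete lifecycle can be sketched as a context manager. This is a sketch under stated assumptions: the project ingests CSVs with Pandas, whereas this example swaps in the standard `csv` module to stay dependency-free, and the name `temporary_database` and the table name `data` are hypothetical.

```python
import contextlib
import csv
import io
import os
import sqlite3
import tempfile


@contextlib.contextmanager
def temporary_database(csv_text: str, table: str = "data"):
    """Load CSV text into an on-disk SQLite file that is removed on exit."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)  # sqlite3 opens the file itself
    conn = sqlite3.connect(path)
    try:
        reader = csv.reader(io.StringIO(csv_text))
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        conn.commit()
        yield conn, path
    finally:
        # Runs even if the query fails, so no database file is left behind.
        conn.close()
        os.remove(path)


with temporary_database("gender,churned\nF,1\nM,0\n") as (conn, path):
    row_count = conn.execute('SELECT COUNT(*) FROM "data"').fetchone()[0]
    existed_during_request = os.path.exists(path)

print(row_count, existed_during_request, os.path.exists(path))
```

Putting the cleanup in `finally` is what makes the "no data persists" guarantee hold on error paths as well as on success.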

🛠️ Technology Stack

| Domain         | Tech Used                                 |
|----------------|-------------------------------------------|
| Frontend       | React 19 (Vite), Tailwind CSS v4          |
| Charts         | Recharts                                  |
| Icons          | Lucide React                              |
| Routing        | React Router DOM                          |
| Backend        | Python FastAPI                            |
| Database       | SQLite (on-disk, temporary)               |
| AI / LLM       | Google Gemini 2.5 Flash (via LangChain)   |
| Data Ingestion | Pandas (CSV → SQLite only)                |
| Export         | OpenPyXL (Excel), ReportLab (PDF)         |
| Tools          | Git, GitHub, Vite, Uvicorn                |

🏗️ System Architecture

  1. Frontend: React SPA with landing page (/) and dashboard (/app). Communicates with the backend via REST APIs.
  2. Backend: FastAPI server handles file uploads, SQL generation, query execution, and export.
  3. AI Pipeline (2 LLM Calls per Query):
    • Call #1: Gemini receives the database schema and generates a raw SQLite SELECT query.
    • Call #2: Gemini receives the SQL results and produces a structured JSON response (answer + explanation + chart data).
  4. Data Flow:
    • CSV → Pandas (ingestion only) → SQLite on disk → DataFrame deleted from RAM.
    • SQL executed via Python's built-in sqlite3 → results formatted by Gemini → response sent to frontend.
    • SQLite DB file deleted after every request.
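
The two-call flow above can be sketched with the LLM calls injected as plain functions, which keeps the example runnable offline. Everything named here (`answer_question`, the two fake functions, the JSON keys) is a hypothetical stand-in; the project makes the real calls to Gemini via LangChain, and validation of the generated SQL is omitted for brevity.

```python
import json
import sqlite3
from typing import Callable


def answer_question(
    question: str,
    conn: sqlite3.Connection,
    generate_sql: Callable[[str, str], str],
    summarize: Callable[[str, list], str],
) -> dict:
    """Run the pipeline: schema -> SQL (call 1) -> rows -> JSON answer (call 2)."""
    schema = "\n".join(
        row[0]
        for row in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    )
    sql = generate_sql(question, schema)  # LLM call #1: sees schema only
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    response = json.loads(summarize(question, rows))  # LLM call #2: sees results
    response["sql"] = sql
    return response


# Deterministic stand-ins for the two model calls (illustrative only).
def fake_generate_sql(question: str, schema: str) -> str:
    return "SELECT gender, COUNT(*) AS customers FROM data GROUP BY gender ORDER BY gender"


def fake_summarize(question: str, rows: list) -> str:
    return json.dumps(
        {
            "answer": f"{len(rows)} groups found",
            "explanation": "Rows were grouped by gender and counted.",
            "chart": {"type": "bar", "data": rows},
        }
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (gender TEXT)")
conn.executemany("INSERT INTO data VALUES (?)", [("F",), ("M",), ("M",)])

result = answer_question(
    "How many customers per gender?", conn, fake_generate_sql, fake_summarize
)
print(result)
```

Injecting the model calls as parameters also makes the pipeline easy to unit test without spending API quota.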

📁 Project Structure

DataIntel-AI/
├── backend/                        # Python FastAPI backend
│   ├── main.py                     # API endpoints & AI pipeline
│   └── requirements.txt            # Python dependencies
│
├── frontend/                       # React frontend
│   ├── src/
│   │   ├── components/
│   │   │   ├── LandingPage.jsx     # Landing page with hero & features
│   │   │   ├── Sidebar.jsx         # File upload, data quality, suggestions
│   │   │   ├── ChatArea.jsx        # Message thread, input, export
│   │   │   ├── AIMessage.jsx       # AI response with charts
│   │   │   ├── UserMessage.jsx     # User message bubble
│   │   │   └── FileUpload.jsx      # Drag & drop CSV upload
│   │   ├── App.jsx                 # Dashboard (main app)
│   │   ├── main.jsx                # Router setup
│   │   └── index.css               # Global styles & animations
│   ├── package.json
│   └── vite.config.js
│
├── WORKFLOW.md                     # Platform workflow for judges
├── .gitignore
└── README.md

🚀 Getting Started

Prerequisites

  • Node.js (v18 or higher)
  • Python 3.10+
  • Google Gemini API key (available from Google AI Studio)

Installation

  1. Clone the Repository

    git clone https://github.com/thisisAtharv/DataIntel-AI.git
    cd DataIntel-AI
  2. Backend Setup

    cd backend
    pip install -r requirements.txt

    Update the GOOGLE_API_KEY in main.py with your Gemini API key.

  3. Frontend Setup

    cd ../frontend
    npm install
  4. Run the Application

    # Terminal 1: Backend (from the repository root)
    cd backend
    python main.py
    
    # Terminal 2: Frontend (from the repository root)
    cd frontend
    npm run dev
  5. Open http://localhost:5173 in your browser.


🔄 API Endpoints

| Method | Endpoint     | Description                                                    |
|--------|--------------|----------------------------------------------------------------|
| POST   | /api/chat    | Send a query + CSV, get AI response with chart data            |
| POST   | /api/analyze | Upload CSV, get data quality info (missing, duplicates, types) |
| POST   | /api/suggest | Upload CSV, get 5 AI-generated suggested questions             |
| POST   | /api/export  | Export chat results as CSV, XLSX, or PDF                       |
| GET    | /            | Health check                                                   |

🧠 Learning Outcomes

  • Building a Text-to-SQL pipeline whose answers come from real query execution rather than model recall.
  • Designing a two-call LLM pipeline that keeps API usage per query low.
  • Implementing scalable data processing with SQLite on disk instead of in-memory Pandas.
  • Creating dynamic data visualizations with conditional chart rendering.
  • Building a production-grade React frontend with skeleton loaders, error handling, and export functionality.

🔮 Future Enhancements

  • 🔗 Multi-Table Support: Upload multiple CSVs and query across joined tables.
  • 🔐 User Authentication: Login system with saved datasets and query history.
  • ☁️ Cloud Deployment: Deploy backend on AWS/GCP with persistent database.
  • 📱 Mobile Responsiveness: Optimized UI for tablets and smartphones.
