Numerical Reasoning Assistant
An AI-powered assistant designed to solve complex mathematical reasoning problems using RAG (Retrieval-Augmented Generation). It leverages the GSM8K dataset, Google Gemini for both embeddings and reasoning, and provides a Python execution environment to verify results.
Features
- RAG-based Reasoning: Uses relevant examples from the GSM8K dataset to improve solution accuracy.
- Python Execution: Automatically extracts and runs Python code generated by the LLM to verify numerical answers.
- Quota Management: Automatically falls back to direct solving if embedding API limits are reached.
- Interactive UI: Built with Streamlit for a seamless user experience.
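To make the Python Execution feature concrete, here is a minimal sketch of extracting a fenced Python block from a model response and running it in a subprocess. It is an illustration under assumptions, not the code in `rag_chain.py`: the helper names `extract_python` and `run_python`, the 10-second timeout, and the use of a subprocess are all hypothetical choices.

```python
import re
import subprocess
import sys

# Three backticks, built programmatically so the pattern stays easy to read.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_python(response):
    """Return the first fenced code block in an LLM response, or None."""
    match = CODE_BLOCK.search(response)
    return match.group(1).strip() if match else None


def run_python(code, timeout=10.0):
    """Run code in a separate interpreter and return (stdout, exit code).

    Assumption: a subprocess with a timeout is enough isolation for a demo.
    It is NOT a security sandbox for untrusted code.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout.strip(), result.returncode


if __name__ == "__main__":
    reply = "The total is computed below.\n" + FENCE + "python\nprint(3 * 4 + 6)\n" + FENCE
    code = extract_python(reply)
    if code is not None:
        print(run_python(code))
```

Comparing the printed value with the answer stated in the model's prose is one way the numerical result could be verified.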
Prerequisites
- Python 3.9+
- A Google API Key (for Gemini)
Installation
- Clone the repository:
- Set up a virtual environment:
    python -m venv .venv
    .\.venv\Scripts\Activate.ps1  # Windows (PowerShell)
- Install dependencies:
    pip install -r requirements.txt
- Configure environment variables: create a .env file in the root directory and add your Google API key:
    GOOGLE_API_KEY=your_api_key_here
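A quick way to confirm the key is visible to Python before starting the app is sketched below. It assumes the variable has already been loaded into the process environment (for example by `load_dotenv()` from `python-dotenv`, if the project uses it, or by exporting it in your shell); the helper name `get_api_key` is hypothetical.

```python
import os


def get_api_key(env=None):
    """Return GOOGLE_API_KEY, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    # Treat the placeholder from the setup instructions as "not configured".
    if not key or key == "your_api_key_here":
        raise RuntimeError(
            "GOOGLE_API_KEY is not set. Add it to .env or export it in your shell."
        )
    return key


if __name__ == "__main__":
    try:
        get_api_key()
        print("GOOGLE_API_KEY found.")
    except RuntimeError as err:
        print(err)
```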
Data Ingestion
Before running the app, populate the ChromaDB vector database with the GSM8K dataset. You can download the GSM8K dataset from here.
- Ensure the dataset (train-00000-of-00001.parquet) is available at the path specified in data_ingestion.py.
- Run the ingestion script:
    python data_ingestion.py
Note: This process uses Gemini embeddings and includes sleeps to respect rate limits.
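The rate-limit handling mentioned above can be pictured as batching rows and pausing between batches. The sketch below is an assumption about the general pattern, not the contents of `data_ingestion.py`: `ingest`, `store_batch`, the batch size of 50, and the 2-second pause are all hypothetical.

```python
import time


def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def ingest(rows, store_batch, batch_size=50, pause_seconds=2.0, sleep=time.sleep):
    """Send rows to `store_batch` in batches, sleeping between batches.

    `store_batch` is a placeholder for whatever embeds a batch and writes it
    to the vector store. `sleep` is injectable so the pacing can be tested.
    """
    total = 0
    for index, batch in enumerate(batched(rows, batch_size)):
        if index > 0:
            sleep(pause_seconds)  # pause before every batch except the first
        store_batch(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    stored = []
    count = ingest(list(range(5)), stored.append, batch_size=2, pause_seconds=0.1)
    print(count, stored)
```

Smaller batches with longer pauses trade ingestion time for fewer rate-limit errors; suitable values depend on the quota attached to your API key.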
Running the Application
Start the Streamlit dashboard:
    streamlit run app.py
Open the provided URL in your browser to start solving math problems!
Project Structure
- app.py: The main Streamlit application.
- rag_chain.py: Core logic for RAG and LLM integration.
- data_ingestion.py: Script to load and persist data to ChromaDB.
- requirements.txt: List of Python dependencies.
- chroma_db/: Directory containing the persisted vector database.
- .env: Environment variables (API keys).
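As a rough picture of the flow described for rag_chain.py, the sketch below builds a few-shot prompt from retrieved GSM8K examples and falls back to a direct prompt when retrieval hits a quota limit. Everything named here is hypothetical (`QuotaExceededError`, `build_prompt`, `prepare_prompt`, the prompt layout); the real module may be organized quite differently.

```python
class QuotaExceededError(Exception):
    """Hypothetical stand-in for an embedding-API quota or rate-limit error."""


def build_prompt(question, examples):
    """Prepend retrieved (question, answer) pairs, if any, to the new question."""
    parts = []
    for number, (example_q, example_a) in enumerate(examples, start=1):
        parts.append(f"Example {number}:\nQ: {example_q}\nA: {example_a}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def prepare_prompt(question, retrieve):
    """Return (prompt, used_rag).

    `retrieve` is any callable that maps a question to similar solved examples.
    If it raises QuotaExceededError, the question is sent without examples.
    """
    try:
        examples = retrieve(question)
        used_rag = True
    except QuotaExceededError:
        examples = []
        used_rag = False
    return build_prompt(question, examples), used_rag


if __name__ == "__main__":
    def fake_retrieve(_question):
        return [("What is 2 + 3?", "5")]

    prompt, used_rag = prepare_prompt("What is 4 * 6?", fake_retrieve)
    print(used_rag)
    print(prompt)
```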