Smart command-line AI assistant with caching, multi-provider support, and context memory.
Repeated questions are answered instantly from the cache, while similar questions get responses enhanced with RAG context drawn from past conversations. Each question is automatically routed to the AI provider best suited to its task type.
# Clone and set up
git clone <repository-url>
cd ai-cli-tool
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Start vector database
cd docker
docker compose up -d
cd .. # Return to the project root
# Configure API keys
cp .env.example .env
# Edit .env with your API keys (need at least one):
# OPENAI_API_KEY=your_key
# CLAUDE_API_KEY=your_key
# GEMINI_API_KEY=your_key

# Always activate virtual environment first
source venv/bin/activate
# Ask any question - automatic provider selection
python main.py ask "What is machine learning?"
# Coding questions → routes to coding provider
python main.py ask "Write a Python function to reverse a string"
# Analysis questions → routes to reasoning provider
python main.py ask "Compare Python vs JavaScript for web development"
# General conversation → routes to general provider
python main.py ask "Hello! How are you today?"

# First time - calls AI provider
$ python main.py ask "What is artificial intelligence?"
🤖 AI RESPONSE
Provider: gemini/gemini-1.5-flash | Category: general
💰 API Call: Made fresh request to gemini/gemini-1.5-flash
# Exact same question - instant cache hit
$ python main.py ask "What is artificial intelligence?"
CACHE HIT (PERFECT) - Similarity: 1.000
📦 Original Source: gemini/gemini-1.5-flash | Category: general
⚡ Response Time: Instant (0 API calls)

# Similar but different question gets RAG context
$ python main.py ask "Explain AI and machine learning differences"
🤖 AI RESPONSE + RAG Context
Provider: openai/gpt-4o | Category: reasoning
💰 API Call: Made fresh request to openai/gpt-4o
Enhanced: Using context from previous interactions

# Coding → Claude (if configured) or OpenAI
python main.py ask "Debug this Python code: def hello(): print('hi'"
# Analysis → OpenAI GPT-4o
python main.py ask "Why is quantum computing important?"
# General chat → Gemini (fast and cost-effective)
python main.py ask "Good morning! Nice weather today"

python main.py ask "Write a function" --provider openai
python main.py ask "Explain quantum physics" --provider gemini --model gemini-1.5-pro

python main.py ask "What's in this image?" --image photo.jpg
python main.py ask "Summarize this document" --file report.pdf

python main.py ask "Current weather" --no-cache # Skip cache
python main.py ask "Question" --threshold 0.95 # Higher similarity needed

python main.py providers # List available AI providers
python main.py stats # Usage statistics
python main.py configure # Check configuration

CACHE HIT (PERFECT) - Similarity: 1.000
📦 Original Source: openai/gpt-4o | Category: reasoning
⚡ Response Time: Instant (0 API calls)

🤖 AI RESPONSE + RAG Context
Provider: anthropic/claude-3-5-sonnet | Category: coding
💰 API Call: Made fresh request to anthropic/claude-3-5-sonnet
Enhanced: Using context from previous interactions

- CACHE HIT / 🤖 AI RESPONSE = Cache hit vs. fresh AI response
- Provider Used = Which AI service answered
- Category = How the question was classified (reasoning/coding/general)
- RAG Context = Whether similar past conversations enhanced the response
- Similarity Score = How closely it matched cached responses (0-1)
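The relationship between the similarity score and the kind of response can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the use of cosine similarity, and both thresholds (0.95 borrows the value from the `--threshold` example; 0.70 is arbitrary). The tool's actual metric and defaults are not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def decide(similarity, cache_threshold=0.95, rag_threshold=0.70):
    """Pick a response strategy from a similarity score in [0, 1].

    Both thresholds are hypothetical values chosen for this sketch.
    """
    if similarity >= cache_threshold:
        return "cache_hit"    # Reuse the stored answer, no API call
    if similarity >= rag_threshold:
        return "rag_context"  # Fresh API call, enriched with past answers
    return "fresh"            # Fresh API call with no extra context
```

Under these assumptions, an identical question scores 1.000 and is served from the cache, while a merely related question falls between the two thresholds and triggers a fresh request with RAG context.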
# At least one API key required:
OPENAI_API_KEY=your_openai_key
CLAUDE_API_KEY=your_claude_key
GEMINI_API_KEY=your_gemini_key
# Vector database (auto-configured):
QDRANT_HOST=127.0.0.1
QDRANT_PORT=6333

# Override default provider assignments:
REASONING_PROVIDER=openai # For analysis questions
CODING_PROVIDER=anthropic # For programming questions
GENERAL_PROVIDER=gemini # For general conversation
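
As an illustration of how provider overrides like these might be resolved, here is a minimal Python sketch. The `provider_for` helper and the `DEFAULT_PROVIDERS` table are hypothetical names, not the tool's actual code; the fallback values simply mirror the routing described in this README.

```python
import os

# Assumed defaults, taken from the routing described in this README.
DEFAULT_PROVIDERS = {
    "reasoning": "openai",
    "coding": "anthropic",
    "general": "gemini",
}


def provider_for(category, env=None):
    """Return the provider for a category, honoring <CATEGORY>_PROVIDER overrides.

    `env` defaults to the process environment; pass a dict to test in isolation.
    """
    env = os.environ if env is None else env
    # Unknown categories fall back to the general provider (an assumption).
    default = DEFAULT_PROVIDERS.get(category, DEFAULT_PROVIDERS["general"])
    return env.get(f"{category.upper()}_PROVIDER", default)
```

With `CODING_PROVIDER=openai` set, `provider_for("coding")` would return `openai`; with the variable unset, it falls back to `anthropic`.
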
# Specify models:
REASONING_MODEL=gpt-4o
CODING_MODEL=claude-3-5-sonnet-20241022
GENERAL_MODEL=gemini-1.5-flash

# Code help with context memory
python main.py ask "Write a Python REST API using FastAPI"
python main.py ask "Add authentication to that API" # Uses previous context
python main.py ask "Write tests for the auth endpoints" # Enhanced with context

# Build knowledge progressively
python main.py ask "What is machine learning?"
python main.py ask "What are neural networks?"
python main.py ask "How do neural networks relate to machine learning?" # Gets context

# Frequently asked questions get cached
python main.py ask "How to install Python packages?" # First time: calls AI
python main.py ask "How to install Python packages?" # Second time: instant cache

- ⚡ Instant Responses - Cached answers in milliseconds
- 💰 Cost Savings - Avoid duplicate API calls
- 🧠 Context Memory - Enhanced responses using conversation history
- 🎯 Smart Routing - Best AI for each question type
- Usage Analytics - Track your AI usage patterns
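
To make the smart-routing idea concrete, the sketch below classifies a question into one of the three categories named in this README using simple keyword matching. This is only an assumed heuristic for illustration: the keyword sets and the `classify` function are hypothetical, and the tool's real classifier may work quite differently.

```python
import re

# Hypothetical keyword sets; the real tool's classification rules are not documented here.
CODING_WORDS = {"code", "function", "debug", "bug", "api"}
REASONING_WORDS = {"why", "compare", "explain", "analyze"}


def classify(question):
    """Return "coding", "reasoning", or "general" for a question string."""
    # Split into lowercase alphabetic words so punctuation does not interfere.
    words = set(re.findall(r"[a-z]+", question.lower()))
    if words & CODING_WORDS:
        return "coding"
    if words & REASONING_WORDS:
        return "reasoning"
    return "general"
```

Applied to the usage examples, "Write a Python function to reverse a string" would be classified as coding, "Compare Python vs JavaScript for web development" as reasoning, and "Hello! How are you today?" as general.
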
# Qdrant not running?
docker run -d -p 6333:6333 qdrant/qdrant
# Missing dependencies?
source venv/bin/activate
pip install -r requirements.txt
# API key issues?
python main.py configure # Check configuration status
# Clear cache?
rm -rf qdrant_storage/ # Reset all cached data

python main.py --help # Main help
python main.py ask --help # Ask command options
python main.py stats # Usage statistics

- Complete Documentation - Comprehensive guide with all features
- Architecture Details - How it works under the hood
- Development Guide - Contributing and extending
- Docker Deployment - Production setup
✅ Saves Time - Instant answers for repeated questions
✅ Saves Money - Cached responses reduce API costs
✅ Context Aware - Remembers and uses conversation history
✅ Multi-Provider - Best AI for each task automatically
✅ Developer Friendly - CLI tool that integrates into workflows
✅ Open Source - Fully customizable and extensible

Get started in 5 minutes and experience AI with memory!