Agentic RAG Proof of Concept

This Proof of Concept (PoC) demonstrates an advanced Agentic Retrieval-Augmented Generation (RAG) system using OpenWebUI, LangGraph, LangChain, and LlamaIndex.

Instead of a traditional linear RAG pipeline that just searches a vector database, this system utilizes a LangGraph Agent. The agent acts as an intelligent reasoning engine that can dynamically route user queries and invoke functions (Tools) to query multiple separate data sources to synthesize a final answer.

🏗️ Architecture

The system is composed of four containerized services running via Docker Compose:

  1. Agent API (agent-api): A custom FastAPI backend running the LangGraph Agent. It exposes an OpenAI-compatible /chat/completions endpoint.
  2. OpenWebUI (open-webui): The user-facing web interface. It connects to the Agent API as its backend LLM provider.
  3. Qdrant (qdrant): The Vector Database used to store and retrieve dense vector embeddings of product descriptions and documentation.
  4. PostgreSQL (postgres): The SQL Database used to query relational, real-time data such as product stock availability and store locations.
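Once the stack is running, the Agent API's OpenAI-compatible contract can be exercised directly. A minimal sketch, assuming the agent-api container publishes port 8000 on the host (check docker-compose.yml for the actual mapping) and using the langgraph-agent model name exposed in the UI:

```python
# Build an OpenAI-style chat completion request for the Agent API.
# The host port (8000) is an assumption -- see docker-compose.yml for the real mapping.
import json
import urllib.request

payload = {
    "model": "langgraph-agent",
    "messages": [
        {"role": "user", "content": "What smartwatches have heart rate monitoring?"}
    ],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:8000/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment to send the request against a running stack:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```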

🧠 How the Agent Works

When a user asks a question in OpenWebUI, the query is sent to the Agent API. The LangGraph agent (powered by a local LLM served via LM Studio or Ollama) evaluates the query and decides which of its available tools to use:

  • search_products(query): Uses LlamaIndex to query Qdrant for semantic similarity matches in product documentation.
  • check_availability(product_name): Queries the PostgreSQL database to check real-time stock levels across different store locations.
  • get_store_policies(topic): Retrieves internal store rules regarding warranties, returns, and discounts.

The agent can use these tools sequentially or combine their outputs to provide a comprehensive, friendly answer to the user.
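In plain Python, the routing contract behind these tools looks roughly like the sketch below (illustrative stubs only; the real implementations in src/agent/tools.py are LangChain tools backed by Qdrant and PostgreSQL):

```python
# Illustrative stubs: each tool is a plain function and the agent selects one by name.
def search_products(query: str) -> str:
    """Semantic search over product docs in Qdrant (stubbed here)."""
    return f"Top matches for {query!r}"

def check_availability(product_name: str) -> str:
    """Real-time stock lookup in PostgreSQL (stubbed here)."""
    return f"Stock levels for {product_name!r} by store"

def get_store_policies(topic: str) -> str:
    """Internal policy text for warranties, returns, and discounts (stubbed here)."""
    return f"Policy text for {topic!r}"

# The agent maps an LLM-emitted tool name to the matching callable.
TOOLS = {fn.__name__: fn for fn in (search_products, check_availability, get_store_policies)}

def dispatch(name: str, **kwargs) -> str:
    return TOOLS[name](**kwargs)
```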

🔄 Execution Flow (UML Sequence)

The following diagram illustrates the interaction between the user, the LangGraph safety guardrails, the reasoning agent, and the dual-database architecture.

sequenceDiagram
    actor Human
    participant UI as OpenWebUI
    participant API as Agent API (Server)
    participant IG as Input Guardrail
    participant Agent as LangGraph Agent
    participant LLM as Local LLM
    participant Tools as Tool Executor
    participant Vect as Qdrant (Vector DB)
    participant SQL as PostgreSQL (SQL DB)
    participant OG as Output Guardrail

    Human->>UI: Asks product question
    UI->>API: HTTP POST /chat/completions (History + Query)
    
    API->>IG: Routes to Input Guardrail
    IG->>LLM: Classify intent (Safe/Unsafe)
    LLM-->>IG: SAFE
    IG->>Agent: Route to Main Agent
    
    loop Dynamic Reasoning Cycle
        Agent->>LLM: Generate strategy using tool docstrings
        LLM-->>Agent: Emits Tool Call (e.g., search_products)
        Agent->>Tools: Invokes Tool Request
        
        alt RAG Search
            Tools->>Vect: Query Similarity Embeddings (Threshold > 0.7)
            Vect-->>Tools: Top K Document Matches + Policy Footer
        else SQL Stock Search
            Tools->>SQL: Query ProductStoreAvailability (ILIKE)
            SQL-->>Tools: Real-Time Inventory Quantities
        else Policy Retrieval
            Tools->>Tools: get_store_policies(topic)
            Tools-->>Agent: Internal Store Policy Rules
        end
        
        Tools-->>Agent: Returns Tool Results
    end
    
    Agent->>LLM: Synthesize final answer given tool results
    LLM-->>Agent: Generates initial response string
    Agent->>OG: Passes to Output Guardrail
    OG->>OG: Strip unsolicited conversational follow-ups
    OG-->>API: Returns safe, sanitized final string
    API-->>UI: Streaming HTTP Response
    UI-->>Human: Displays the final answer

✨ Advanced Features

  • Robust Tool Extraction: Native support for local LLMs (like Qwen or Llama 3) that embed JSON tool calls within conversational text. The custom parser safely extracts the tool call while preserving the conversational preamble.
  • Dynamic Inventory: The system prompt is category-agnostic. It dynamically supports any retail product queries and gracefully handles items not in stock without hardcoding inventory categories.
  • Semantic Guardrails: The Qdrant retriever enforces a strict minimum similarity score threshold (>0.7). This prevents unrelated queries (e.g., "Bosch screwdriver") from mistakenly matching dissimilar products (e.g., "Espresso Machine").
  • Conversation Memory: src/api/server.py natively parses full ChatML histories from OpenWebUI into LangChain sequences, ensuring seamless pronoun resolution (e.g., "What is its price?") without requiring external Redis cache.
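The tool-extraction behavior can be sketched as follows (a simplified illustration, not the repo's actual parser; it assumes at most one JSON object per reply):

```python
import json
import re

def extract_tool_call(text: str):
    """Split a model reply into (preamble, tool_call) -- a simplified sketch.

    Assumes at most one embedded JSON object per reply; returns (text, None)
    when no valid tool call is found.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return text, None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return text, None
    # Preserve any conversational preamble preceding the embedded JSON.
    return text[: match.start()].strip(), call

reply = 'Let me check stock for you. {"name": "check_availability", "arguments": {"product_name": "TechPro X1"}}'
preamble, call = extract_tool_call(reply)
# preamble -> 'Let me check stock for you.'
# call["name"] -> 'check_availability'
```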

⚙️ Prerequisites

  1. Docker & Docker Compose: Ensure Docker is installed and running on your machine.
  2. Local LLM: An OpenAI-compatible LLM endpoint running locally (e.g., LM Studio or Ollama).
    • By default, the system expects an LLM to be exposed at http://host.docker.internal:1234/v1 (LM Studio's default).
    • Ensure your local LLM supports Tool Calling / Function Calling (e.g., meta-llama-3.1-8b-instruct or similar).

🚀 Getting Started

(Optional) Configure Observability

To trace and debug LLM calls with LangSmith:

  1. Copy the example environment file: cp .env.example .env
  2. Add your LangSmith API Key to the .env file. The docker-compose setup will automatically pass these variables to the backend agent.

1. Ingest Data

Before starting the containers, you need to populate the databases with the sample data. Set up a Python virtual environment and run the ingestion script.

# Create a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
pip install llama-index-embeddings-huggingface

# Start only Postgres & Qdrant for ingestion
docker-compose up -d postgres qdrant

# Run the ingestion script
export PYTHONPATH=src
python src/ingest.py

Note: The ingestion script uses the local BAAI/bge-small-en-v1.5 HuggingFace embedding model, so no OpenAI API keys are required for embeddings.

2. Start the Application

Once the data is ingested, start the entire stack:

docker-compose up -d --build

3. Open the UI

Navigate to http://localhost:3000 in your web browser.

  1. You do not need to log in (authentication is disabled for the PoC).
  2. Look for the model dropdown menu and select langgraph-agent.
  3. Start chatting!

⚙️ Developer Environment

The Docker environment uses a live volume mount (./src:/app/src) together with uvicorn --reload to hot-reload the Agent API during development.

🧪 Example Queries to Try

Try asking the agent questions that require it to dynamically route between the unstructured Vector DB and the structured SQL DB:

  • "What smartwatches have heart rate monitoring?" (Should trigger Vector DB)
  • "Does the TechPro X1 have a warranty?" (Should trigger Vector DB + Policy Footer)
  • "Are there any TechPro X1 watches available in the Austin store?" (Should trigger SQL DB)
  • "What is your detailed return policy?" (Should trigger explicit Policy Tool)
  • "I want to buy a TechPro X1." (Should trigger SQL DB to check physical stock instead of hallucinating checkout)

🛠️ Project Structure

├── docker-compose.yml       # Infrastructure orchestration
├── Dockerfile               # Agent API container build instructions
├── .gitignore               # Safely ignores vector/SQL data volumes
├── requirements.txt         # Python dependencies
└── src/
    ├── ingest.py            # Script to seed Postgres and Qdrant
    ├── api/
    │   └── server.py        # FastAPI server exposing the OpenAI-compatible endpoint
    ├── agent/
    │   ├── graph.py         # LangGraph StateGraph, Tool Nodes, and LLM Binding
    │   └── tools.py         # LangChain Tools (search_products, check_availability, policies)
    ├── core/
    │   ├── config.py        # Environment variables and configuration
    │   └── models.py        # SQLAlchemy ORM models
    ├── tests/               # Pytest automated test suites
    │   ├── conftest.py      # Global fixtures and test setup
    │   ├── test_agent.py    # Policy awareness, guardrails, and hallucination tests
    │   ├── test_api.py      # FastAPI routing and purchase intent redirection
    │   └── test_retrieval.py  # Qdrant and SQL tool logic tests
    └── data/
        └── sample_data.json # The raw sample data for ingestion

Running Automated Tests

A pytest suite of 19 assertion-based tests validates agent logic, API routing, and database retrieval. Each test uses a randomly generated UUID session id, so conversation memory is isolated between tests and runs are independent and repeatable.
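The isolation idea behind those UUID session ids can be sketched in a few lines (illustrative only; the actual fixtures live in src/tests/conftest.py):

```python
import uuid

def make_session_id() -> str:
    """Give each test its own thread id so no two tests share agent memory."""
    return f"test-{uuid.uuid4()}"

def test_sessions_are_isolated():
    # Fresh UUIDs mean conversation state can never leak between tests.
    assert make_session_id() != make_session_id()
```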

To execute the test suite, ensure the API container is running and use the following command:

docker exec airag_agent_api pytest src/tests/ -v

The suite guards against regressions in anti-hallucination rules, session timeouts, and store-policy awareness across future releases.
