This Proof of Concept (PoC) demonstrates an advanced Agentic Retrieval-Augmented Generation (RAG) system using OpenWebUI, LangGraph, LangChain, and LlamaIndex.
Instead of a traditional linear RAG pipeline that simply searches a vector database, this system uses a LangGraph agent. The agent acts as an intelligent reasoning engine that can dynamically route user queries and invoke functions (tools) against multiple separate data sources to synthesize a final answer.
The system is composed of four containerized services running via Docker Compose:
- Agent API (`agent-api`): A custom FastAPI backend running the LangGraph agent. It exposes an OpenAI-compatible `/chat/completions` endpoint.
- OpenWebUI (`open-webui`): The user-facing web interface. It connects to the Agent API as its backend LLM provider.
- Qdrant (`qdrant`): The vector database used to store and retrieve dense vector embeddings of product descriptions and documentation.
- PostgreSQL (`postgres`): The SQL database used to query relational, real-time data such as product stock availability and store locations.
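A minimal `docker-compose.yml` for this topology might look like the sketch below. Service names, images, ports, and the OpenWebUI wiring are illustrative assumptions; the actual file in the repository is authoritative.

```yaml
# Illustrative sketch only -- see the repo's docker-compose.yml for the real configuration.
services:
  agent-api:
    build: .
    ports:
      - "8000:8000"            # OpenAI-compatible /chat/completions endpoint (port assumed)
    depends_on: [postgres, qdrant]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"            # UI served at http://localhost:3000
    environment:
      - OPENAI_API_BASE_URL=http://agent-api:8000  # assumed wiring to the Agent API
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
  postgres:
    image: postgres:16
    environment:
      - POSTGRES_PASSWORD=postgres   # placeholder credentials
```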
When a user asks a question in OpenWebUI, the query is sent to the Agent API. The LangGraph agent (powered by a local LLM served via LM Studio or Ollama) evaluates the query and decides which of its available tools to use:
- `search_products(query)`: Uses LlamaIndex to query Qdrant for semantic similarity matches in product documentation.
- `check_availability(product_name)`: Queries the PostgreSQL database to check real-time stock levels across different store locations.
- `get_store_policies(topic)`: Retrieves internal store rules regarding warranties, returns, and discounts.
The agent can use these tools sequentially or combine their outputs to provide a comprehensive, friendly answer to the user.
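As an illustration, the three tools could be declared with LangChain's `@tool` decorator and handed to a prebuilt LangGraph ReAct agent, roughly as sketched below. The function bodies, placeholder return values, and wiring are simplified assumptions, not the repo's exact code in `src/agent/`.

```python
# Illustrative sketch of the tools + agent wiring -- not the exact repo code.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def search_products(query: str) -> str:
    """Semantic search over product documentation stored in Qdrant (via LlamaIndex)."""
    # The real code filters retrieved nodes by a 0.7 similarity threshold.
    return "placeholder: top-k matches above the similarity threshold"

@tool
def check_availability(product_name: str) -> str:
    """Check real-time stock levels for a product across store locations."""
    # Backed by a PostgreSQL query, e.g. an ILIKE match on the product name.
    return "placeholder: per-store inventory quantities"

@tool
def get_store_policies(topic: str) -> str:
    """Return internal store rules about warranties, returns, and discounts."""
    return "placeholder: policy text for the requested topic"

# Local OpenAI-compatible endpoint (LM Studio / Ollama); URL is the documented default.
llm = ChatOpenAI(base_url="http://host.docker.internal:1234/v1", api_key="not-needed")
agent = create_react_agent(llm, [search_products, check_availability, get_store_policies])
```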
The following diagram illustrates the interaction between the user, the LangGraph safety guardrails, the reasoning agent, and the dual-database architecture.
```mermaid
sequenceDiagram
actor Human
participant UI as OpenWebUI
participant API as Agent API (Server)
participant IG as Input Guardrail
participant Agent as LangGraph Agent
participant LLM as Local LLM
participant Tools as Tool Executor
participant Vect as Qdrant (Vector DB)
participant SQL as PostgreSQL (SQL DB)
participant OG as Output Guardrail
Human->>UI: Asks product question
UI->>API: HTTP POST /chat/completions (History + Query)
API->>IG: Routes to Input Guardrail
IG->>LLM: Classify intent (Safe/Unsafe)
LLM-->>IG: SAFE
IG->>Agent: Route to Main Agent
loop Dynamic Reasoning Cycle
Agent->>LLM: Generate strategy using tool docstrings
LLM-->>Agent: Emits Tool Call (e.g., search_products)
Agent->>Tools: Invokes Tool Request
alt RAG Search
Tools->>Vect: Query Similarity Embeddings (Threshold > 0.7)
Vect-->>Tools: Top K Document Matches + Policy Footer
else SQL Stock Search
Tools->>SQL: Query ProductStoreAvailability (ILIKE)
SQL-->>Tools: Real-Time Inventory Quantities
else Policy Retrieval
Tools->>Tools: get_store_policies(topic)
Tools-->>Agent: Internal Store Policy Rules
end
Tools-->>Agent: Returns Tool Results
end
Agent->>LLM: Synthesize final answer given tool results
LLM-->>Agent: Generates initial response string
Agent->>OG: Passes to Output Guardrail
OG->>OG: Strip unsolicited conversational follow-ups
OG-->>API: Returns safe, sanitized final string
API-->>UI: Streaming HTTP Response
UI-->>Human: Displays final answer
```
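The Agent API can also be exercised directly. The sketch below assumes the API is published on port 8000 (the actual port is set in `docker-compose.yml`) and uses the `/chat/completions` path shown in the diagram:

```bash
# Hypothetical direct call to the OpenAI-compatible endpoint (port assumed).
curl -s http://localhost:8000/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "langgraph-agent",
        "messages": [
          {"role": "user", "content": "Are there any TechPro X1 watches available in Austin?"}
        ]
      }'
```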
- Robust Tool Extraction: Native support for local LLMs (like Qwen or Llama 3) that embed JSON tool calls within conversational text. The custom parser safely extracts the tool call while preserving the conversational preamble (see the sketch after this list).
- Dynamic Inventory: The system prompt is category-agnostic. It dynamically supports any retail product queries and gracefully handles items not in stock without hardcoding inventory categories.
- Semantic Guardrails: The Qdrant retriever enforces a strict minimum similarity score threshold (> 0.7). This prevents unrelated queries (e.g., "Bosch screwdriver") from mistakenly matching dissimilar products (e.g., "Espresso Machine").
- Conversation Memory: `src/api/server.py` natively parses full ChatML histories from OpenWebUI into LangChain message sequences, ensuring seamless pronoun resolution (e.g., "What is its price?") without requiring an external Redis cache.
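As a rough illustration of the tool-extraction idea (the repo's parser is authoritative; the regex and function below are hypothetical):

```python
import json
import re

# Hypothetical sketch: extract an embedded JSON tool call from chatty LLM output.
TOOL_CALL_RE = re.compile(r"\{.*\}", re.DOTALL)

def extract_tool_call(text: str) -> tuple[str, dict | None]:
    """Return (conversational preamble, parsed tool call or None)."""
    match = TOOL_CALL_RE.search(text)
    if not match:
        return text, None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return text, None          # malformed JSON: treat as plain conversation
    preamble = text[: match.start()].strip()
    return preamble, call

# Example: a local model wraps its tool call in prose.
reply = 'Sure, let me check! {"name": "check_availability", "arguments": {"product_name": "TechPro X1"}}'
preamble, call = extract_tool_call(reply)
print(preamble)       # Sure, let me check!
print(call["name"])   # check_availability
```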
- Docker & Docker Compose: Ensure Docker is installed and running on your machine.
- Local LLM: An OpenAI-compatible LLM endpoint running locally (e.g., LM Studio or Ollama).
- By default, the system expects an LLM to be exposed at `http://host.docker.internal:1234/v1` (LM Studio's default).
- Ensure your local LLM supports Tool Calling / Function Calling (e.g., `meta-llama-3.1-8b-instruct` or similar).
To trace and debug LLM calls with LangSmith:
- Copy the example environment file: `cp .env.example .env`
- Add your LangSmith API key to the `.env` file. The `docker-compose` setup will automatically pass these variables to the backend agent.
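The relevant LangSmith variables typically look like the following; confirm the exact keys against the repo's `.env.example`:

```bash
# Typical LangSmith tracing variables -- verify against .env.example.
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=ls-...            # your LangSmith API key (placeholder)
LANGCHAIN_PROJECT=agentic-rag-poc   # hypothetical project name
```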
Before starting the containers, you need to populate the databases with the sample data. Set up a Python virtual environment and run the ingestion script.
```bash
# Create a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
pip install llama-index-embeddings-huggingface

# Start Postgres & Qdrant momentarily
docker-compose up -d postgres qdrant

# Run the ingestion script
export PYTHONPATH=src
python src/ingest.py
```

Note: The ingestion script uses the local `BAAI/bge-small-en-v1.5` HuggingFace embedding model, so no OpenAI API keys are required for embeddings.
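For orientation, the vector half of the ingestion conceptually resembles the minimal sketch below. It assumes llama-index 0.10-style imports; the collection name and JSON fields are hypothetical, and `src/ingest.py` holds the real logic:

```python
# Minimal vector-ingestion sketch -- collection name and record fields are hypothetical.
import json

from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

# Local embedding model documented above; no OpenAI key needed.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

client = QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="products")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

with open("src/data/sample_data.json") as f:
    records = json.load(f)

docs = [Document(text=r["description"], metadata={"name": r["name"]}) for r in records]
VectorStoreIndex.from_documents(docs, storage_context=storage_context, embed_model=embed_model)
```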
Once the data is ingested, start the entire stack:
```bash
docker-compose up -d --build
```

Navigate to http://localhost:3000 in your web browser.
- You do not need to log in (authentication is disabled for the PoC).
- Look for the model dropdown menu and select `langgraph-agent`.
- Start chatting!
The Docker environment utilizes live volume mounting (`- ./src:/app/src`) and `uvicorn --reload` to enable hot-reloading of the Agent API during development.
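In `docker-compose.yml` terms, that combination looks roughly like the fragment below; the uvicorn module path is an assumption based on the project tree:

```yaml
# Illustrative fragment -- the real docker-compose.yml is authoritative.
services:
  agent-api:
    volumes:
      - ./src:/app/src        # live-mount source for hot reloading
    command: uvicorn api.server:app --host 0.0.0.0 --port 8000 --reload
```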
Try asking the agent questions that require it to dynamically route between the unstructured Vector DB and the structured SQL DB:
- "What smartwatches have heart rate monitoring?" (Should trigger Vector DB)
- "Does the TechPro X1 have a warranty?" (Should trigger Vector DB + Policy Footer)
- "Are there any TechPro X1 watches available in the Austin store?" (Should trigger SQL DB)
- "What is your detailed return policy?" (Should trigger explicit Policy Tool)
- "I want to buy a TechPro X1." (Should trigger SQL DB to check physical stock instead of hallucinating checkout)
```
├── docker-compose.yml      # Infrastructure orchestration
├── Dockerfile               # Agent API container build instructions
├── .gitignore                # Safely ignores vector/SQL data volumes
├── requirements.txt          # Python dependencies
└── src/
    ├── ingest.py             # Script to seed Postgres and Qdrant
    ├── api/
    │   └── server.py         # FastAPI server exposing an OpenAI-compatible endpoint
    ├── agent/
    │   ├── graph.py          # LangGraph StateGraph, Tool Nodes, and LLM Binding
    │   └── tools.py          # LangChain Tools (search_products, check_availability, policies)
    ├── core/
    │   ├── config.py         # Environment variables and configuration
    │   └── models.py         # SQLAlchemy ORM models
    ├── tests/                # Pytest automated test suites
    │   ├── conftest.py       # Global fixtures and test setup
    │   ├── test_agent.py     # Policy awareness, guardrails, and hallucination tests
    │   ├── test_api.py       # FastAPI routing and purchase intent redirection
    │   └── test_retrieval.py # Qdrant and SQL tool logic tests
    └── data/
        └── sample_data.json  # The raw sample data for ingestion
```
A comprehensive pytest test suite consisting of 19 dedicated assertion tests is available to validate agent logic, API routing, and database retrieval mechanisms. Memory states are randomized with fresh UUIDs to guarantee isolated, idempotent test executions.
To execute the test suite, ensure the API container is running and use the following command:
```bash
docker exec airag_agent_api pytest src/tests/ -v
```

This ensures the agent strictly enforces anti-hallucination rules, session timeouts, and store policy awareness across future releases.
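For illustration, a test in this style might isolate conversation state with a fresh UUID per run. This is a hypothetical sketch, not a test copied from the suite, and the request shape mirrors the endpoint documented above:

```python
# Hypothetical test sketch illustrating UUID-isolated session state.
import uuid

from fastapi.testclient import TestClient

from api.server import app  # assumes PYTHONPATH=src, as in the ingestion step

def test_policy_question_routes_to_policy_tool():
    client = TestClient(app)
    session_id = str(uuid.uuid4())  # fresh state per run -> idempotent tests
    resp = client.post(
        "/chat/completions",
        json={
            "model": "langgraph-agent",
            "user": session_id,  # hypothetical session key
            "messages": [{"role": "user", "content": "What is your detailed return policy?"}],
        },
    )
    assert resp.status_code == 200
```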