An intelligent document search system powered by Agentic RAG: it doesn't just retrieve, it thinks before answering.
*(Screenshots: Accurate Policy Answer · Out-of-Scope Rejection · MCP Inspector Validation)*
Most RAG systems follow a simple pattern: take a user's question, fetch some documents, and generate an answer. The problem? They always answer, even when the retrieved documents have nothing to do with the question. This leads to hallucinations that look convincing but are completely wrong.
I built this project to tackle that exact issue. Instead of a basic retrieve-and-generate pipeline, this system uses an agentic approach: it evaluates whether the retrieved documents are actually relevant before deciding to answer. If they're not relevant, it says so honestly instead of making something up.
The knowledge base I'm using here is a set of 6 GitLab Security & Technology Policy documents (Access Management, Audit Logging, Change Management, Penetration Testing, SDLC, and Policy Governance).
The whole system is built as a LangGraph state graph with 4 nodes that work together:
```
User asks a question
        │
        ▼
  ┌───────────┐
  │ Retrieve  │ ← Searches ChromaDB for the top-6 most relevant chunks
  └─────┬─────┘
        │
        ▼
 ┌───────────────┐
 │  Grade Docs   │ ← LLM checks: "Are these documents actually relevant?"
 └───────┬───────┘
         │
    ┌────┴────┐
    │         │
Relevant  Not Relevant
    │         │
    ▼         ▼
┌────────┐ ┌───────────┐
│Generate│ │ No Answer │ ← Politely declines instead of hallucinating
└────┬───┘ └─────┬─────┘
     │           │
     └─────┬─────┘
           ▼
         Done
```
Generated using `graph.get_graph().draw_mermaid_png()`: this is the real compiled graph, not a hand-drawn diagram.
Here are some of the decisions I made while building this and why:
| What | Choice | Reasoning |
|---|---|---|
| Framework | LangGraph StateGraph | Needed conditional branching; a regular LangChain chain can't skip nodes or route dynamically |
| Relevance check | Separate LLM grading step | The grading happens before generation, so irrelevant docs never make it to the answer step |
| Grading format | Plain yes/no text | Llama 3.1 8B doesn't handle structured JSON output reliably; simple text works consistently |
| When docs aren't relevant | Dedicated `no_answer` node | Instead of looping or retrying, it gives an honest "I don't know", which prevents hallucination |
| How citations work | Each chunk gets a `[Source: Policy \| File:]` header | The LLM sees the source info right in its context, so it always knows where facts come from |
| Search strategy | Cosine similarity (not MMR) | MMR was pulling in chunks from unrelated policies for diversity; pure similarity is more accurate here |
| Conversation memory | LangGraph MemorySaver | Keeps chat history per session using `thread_id`; supports multi-turn conversations |
| Embeddings | all-MiniLM-L6-v2 (local) | Runs offline, no API costs, no rate limits; good enough for this corpus size |
| LLM | Groq Llama 3.1 8B Instant | Sub-second responses, free tier works for development |
| MCP integration | FastMCP server | Makes the RAG pipeline callable by external AI agents like Claude Desktop or VS Code Copilot |
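The citation row above describes prepending a source header to each chunk before it reaches the LLM. A minimal sketch of what that formatting could look like; `format_context` and its field names are hypothetical helpers mirroring the chunk metadata described in this README, not the project's exact code:

```python
def format_context(chunks: list[dict]) -> str:
    """Prepend a [Source: ... | File: ...] header to each chunk.

    Hypothetical helper: assumes each chunk dict carries the
    policy_title / filename metadata attached during ingestion.
    """
    blocks = []
    for chunk in chunks:
        header = f"[Source: {chunk['policy_title']} | File: {chunk['filename']}]"
        blocks.append(f"{header}\n{chunk['text']}")
    # Separate chunks with blank lines so the LLM sees clear boundaries
    return "\n\n".join(blocks)
```

Because every chunk arrives already labeled, the generation prompt can simply instruct the model to cite the headers it sees, rather than reconstructing provenance afterwards.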
Policy documents (`.md` files) go through this pipeline:
- Load → `UnstructuredMarkdownLoader` reads each file while preserving its structure
- Clean → Strip out markdown formatting artifacts and extra whitespace
- Split → `RecursiveCharacterTextSplitter` with `chunk_size=800` and `overlap=150` so information at chunk boundaries isn't lost
- Tag → Each chunk gets metadata like `{"policy_title": "Audit Logging Policy", "filename": "audit-logging-policy.md"}`
- Store → Chunks are embedded using `all-MiniLM-L6-v2` and saved to ChromaDB
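To see why the overlap matters, here is a deliberately simplified character-window splitter. The project itself uses LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers paragraph and sentence boundaries; this sketch only illustrates the overlap idea, that text near a chunk boundary appears whole in at least one chunk:

```python
def split_with_overlap(text: str, chunk_size: int = 800, overlap: int = 150) -> list[str]:
    """Naive sliding-window splitter for illustration only.

    Consecutive chunks share `overlap` characters, so a sentence that
    straddles one chunk's end is still intact at the next chunk's start.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

With `chunk_size=800` and `overlap=150`, each chunk repeats the last 150 characters of the previous one, which is the safety margin that keeps boundary-spanning facts retrievable.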
All nodes share a typed state:

```python
from typing import Annotated, Literal, Optional, TypedDict
from langchain_core.documents import Document
from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list[AnyMessage], add_messages]
    documents: Optional[list[Document]]
    doc_metadata: Optional[list[dict]]
    next_action: Optional[Literal["generate", "no_answer"]]
```

The `doc_metadata` field tracks which policy each chunk came from; this is what makes accurate citations possible.
- `retrieve_node` → Embeds the query, fetches the 6 closest chunks from ChromaDB, stores documents + metadata in state
- `grade_documents_node` → Asks the LLM "are these docs relevant to the question?" and sets `next_action` to either `generate` or `no_answer`
- `generate_node` → Builds a formatted context with source headers for each chunk, then generates a cited answer
- `no_answer_node` → Returns a polite message saying it couldn't find relevant info
```python
graph.add_conditional_edges(
    "grade_documents",
    lambda state: state["next_action"],
    {"generate": "generate", "no_answer": "no_answer"}
)
```

The app streams graph execution in real time using `stream_mode="updates"`, so users can see each step as it happens:
- Live agent path display (`retrieve → grade_documents → generate`)
- Inference time tracking
- Session metrics in the sidebar (query count, average response time, last path taken)
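With `stream_mode="updates"`, each event the graph yields is a dict keyed by the node that just ran, so the live path display falls out of the event keys. A small sketch of that consumer; the `sample` events below are illustrative stand-ins for `graph.stream(...)` output, not captured logs:

```python
def agent_path(updates) -> str:
    """Collect node names from a stream of LangGraph 'updates' events.

    Each event maps node name -> that node's state update, so the keys
    in arrival order trace the path the agent took through the graph.
    """
    steps = []
    for event in updates:
        steps.extend(event.keys())
    return " → ".join(steps)

# Illustrative stand-in for graph.stream(inputs, stream_mode="updates"):
sample = [
    {"retrieve": {"documents": ["..."]}},
    {"grade_documents": {"next_action": "generate"}},
    {"generate": {"messages": ["cited answer"]}},
]
```

Here `agent_path(sample)` yields `"retrieve → grade_documents → generate"`, the same path string shown in the sidebar.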
The RAG agent is also exposed as an MCP (Model Context Protocol) server, so external AI tools can query the policy documents directly.
What's exposed:
| Type | Name | What it does |
|---|---|---|
| Tool | `search_security_policies` | Accepts a question, runs the full agent graph, returns the answer |
| Resource | `policies://list` | Returns names of all indexed policy documents |
Run it:

```shell
cd mcp-server
python mcp_server.py
```

Test with MCP Inspector:

```shell
npx @modelcontextprotocol/inspector@0.14.3 python mcp_server.py
```

Project structure:

```
Agentic-Doc-Search-RAG/
├── app.py                 # Streamlit chat interface
├── save_graph.py          # Generates the agent graph diagram
├── agent_graph.png        # Visual representation of the agent flow
├── pyproject.toml         # Dependencies
├── .env                   # API keys (not committed)
│
├── assets/                # Screenshots
│
├── data/
│   └── security-and-technology-policies/
│       ├── access-management-policy.md
│       ├── audit-logging-policy.md
│       ├── change-management-policy.md
│       ├── penetration-testing-policy.md
│       ├── software-development-lifecycle-policy.md
│       └── security-and-technology-policies-management.md
│
├── mcp-server/
│   └── mcp_server.py      # FastMCP server
│
└── src/
    ├── config.py          # Environment variables and settings
    ├── state.py           # AgentState definition
    ├── graph.py           # Node implementations + graph compilation
    ├── prompts.py         # System prompts for the LLM
    ├── data_ingestion.py  # Document loading and chunking
    ├── vector_store.py    # ChromaDB setup and retriever
    ├── schema.py          # Data schemas
    └── utils.py           # Helper functions
```
```shell
git clone https://github.com/tiirth22/Agentic-Doc-Search-RAG.git
cd Agentic-Doc-Search-RAG
```

```shell
python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS / Linux
source .venv/bin/activate
```

```shell
pip install -e .
```

Create a `.env` file:

```
GROQ_API_KEY=your_key_here
```

You can get a free key from console.groq.com.

```shell
python -c "
from src.data_ingestion import DataIngestor
from src.vector_store import VectorStoreManager
ingestor = DataIngestor()
chunks = ingestor.load_and_split()
VectorStoreManager().create_vector_store(chunks)
"
```

```shell
streamlit run app.py
```

| Query | What happens | Result |
|---|---|---|
| "Who is responsible for implementing the Audit Logging Policy?" | `retrieve → grade → generate` | Correctly identifies the Security Team |
| "What system tiers are in scope for Change Management?" | `retrieve → grade → generate` | Lists Tiers 1-3 as in-scope; notes Tier 4 is excluded |
| "How often must penetration tests be conducted?" | `retrieve → grade → generate` | At minimum annually, plus after significant system changes |
| "How do I reset my GitLab password?" | `retrieve → grade → no_answer` | Recognizes this isn't in the documents; declines gracefully |
| "What is the company's vacation policy?" | `retrieve → grade → no_answer` | Completely out of scope; handled without any hallucination |
| Traditional RAG | This Project |
|---|---|
| Always produces an answer | Only answers when documents are relevant |
| Can silently hallucinate | Routes to an honest "I don't know" |
| No visibility into the process | Shows the full agent path in real time |
| No source tracking | Every chunk is tagged with its source document |
| One-shot queries only | Multi-turn chat with conversation memory |
| Only works through one UI | Also available as an MCP tool for AI agents |
A few issues I hit during development that might help if you're building something similar:

- **Structured output failures**: Llama 3.1 8B kept failing when I asked for JSON responses during grading. Switched to plain `yes`/`no` text and it worked reliably.
- **Environment variables not loading**: Had to use `override=True` in `load_dotenv()` because without it, the library skips variables that are already set in the system environment.
- **State key mismatch**: My grading node was returning `{"generate": "yes"}` but the router expected `state["next_action"]`. Took a while to debug that `KeyError`.
- **MMR returning wrong documents**: The diversity-first approach of MMR was pulling chunks from unrelated policies. Switching to standard similarity search fixed the accuracy.
- **Metadata access in generation**: Instead of digging into LangChain `Document.metadata` during generation, I store `doc_metadata` as its own state field during retrieval. Cleaner and less error-prone.
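The `override=True` pitfall comes down to precedence: by default, `load_dotenv()` never clobbers a variable that already exists in the process environment. A tiny pure-Python sketch of that rule for illustration; this mimics python-dotenv's behavior rather than reproducing its implementation:

```python
import os

def apply_env(pairs: dict[str, str], override: bool = False) -> None:
    """Mimic python-dotenv's precedence rule (illustration only).

    With override=False (the default), keys already set in os.environ
    win over the .env file; with override=True, the .env values win.
    """
    for key, value in pairs.items():
        if override or key not in os.environ:
            os.environ[key] = value
```

So if `GROQ_API_KEY` was ever exported in your shell (even to a stale value), the default behavior silently keeps the stale one, which is exactly the symptom described above.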
MIT License; see LICENSE for the full text.
Built by Tirth Jignesh Dalal
GitHub



