🐯 TigerGraph RAG Benchmark Arena

A benchmarking platform that evaluates the reasoning quality, latency, and accuracy of three LLM inference paradigms: LLM-only, Vector RAG, and GraphRAG. Built with TigerGraph, Groq, and FastAPI, it runs every query through all three pipelines against a corpus of medical research papers on diabetes.

Architecture

graph TD
    %% Styling
    classDef frontend fill:#1E293B,stroke:#4ADE80,stroke-width:2px,color:white;
    classDef backend fill:#1E293B,stroke:#F472B6,stroke-width:2px,color:white;
    classDef database fill:#0F172A,stroke:#38BDF8,stroke-width:2px,color:white;
    classDef llm fill:#451A03,stroke:#FBBF24,stroke-width:2px,color:white;
    classDef evaluator fill:#14532D,stroke:#A3E635,stroke-width:2px,color:white;

    %% Components
    UI["🖥️ React / Vite Dashboard"]:::frontend
    API["⚡ FastAPI Backend"]:::backend
    
    %% Databases
    VectorDB[("📚 ChromaDB Vector Store")]:::database
    TG[("🐯 TigerGraph Cloud")]:::database
    
    %% LLMs
    Groq1["🧠 Groq Llama 3.1"]:::llm
    Groq2["🧠 Groq Llama 3.1"]:::llm
    Groq3["🧠 Groq Llama 3.1"]:::llm

    %% Evaluators
    Judge["⚖️ Gemini 2.5 Flash Judge"]:::evaluator
    Bert["📊 BERTScore Evaluator"]:::evaluator

    %% Flow: Request
    UI -- "User Query" --> API
    
    %% Flow: Parallel Pipelines
    API -- "Pipeline 1: No Context" ---> Groq1
    
    API -- "Pipeline 2: Semantic Search" --> VectorDB
    VectorDB -- "Context Chunks" --> Groq2
    
    API -- "Pipeline 3: Graph Traversal" --> TG
    TG -- "Multi-hop Graph Context" --> Groq3

    %% Flow: Evaluation
    Groq1 -- "Answer 1" --> Judge
    Groq2 -- "Answer 2" --> Judge
    Groq3 -- "Answer 3" --> Judge
    
    Groq1 -- "Answer 1" --> Bert
    Groq2 -- "Answer 2" --> Bert
    Groq3 -- "Answer 3" --> Bert

    %% Flow: Response
    Judge -- "Pass/Fail" --> API
    Bert -- "Score" --> API
    API -- "Aggregated Metrics & Results" --> UI
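
The fan-out in the diagram can be sketched with `asyncio.gather`. This is a minimal illustration, not the code in `benchmark_api.py`: the three pipeline functions are hypothetical stubs that sleep instead of calling Groq, ChromaDB, or TigerGraph, and the result dictionary shape is an assumption.

```python
import asyncio
import time


async def llm_only(query: str) -> str:
    # Stub: stands in for a direct Groq call with no retrieved context.
    await asyncio.sleep(0.01)
    return f"[llm-only] {query}"


async def vector_rag(query: str) -> str:
    # Stub: stands in for vector search followed by LLM synthesis.
    await asyncio.sleep(0.02)
    return f"[vector-rag] {query}"


async def graph_rag(query: str) -> str:
    # Stub: stands in for graph traversal followed by LLM synthesis.
    await asyncio.sleep(0.03)
    return f"[graph-rag] {query}"


async def timed(name, pipeline, query):
    """Run one pipeline and record its wall-clock latency."""
    start = time.perf_counter()
    answer = await pipeline(query)
    return {
        "pipeline": name,
        "answer": answer,
        "latency_s": time.perf_counter() - start,
    }


async def run_benchmark(query: str):
    """Start all three pipelines concurrently; results keep this order."""
    pipelines = {
        "llm_only": llm_only,
        "vector_rag": vector_rag,
        "graph_rag": graph_rag,
    }
    tasks = [timed(name, fn, query) for name, fn in pipelines.items()]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    question = "How does insulin resistance relate to obesity?"
    for row in asyncio.run(run_benchmark(question)):
        print(row["pipeline"], f"{row['latency_s']:.3f}s", row["answer"])
```

Because each pipeline is timed inside its own task, the per-pipeline latencies stay independent even though the total request time is bounded by the slowest pipeline.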

🌟 Key Features

Three Reasoning Engines Head-to-Head:
- LLM-Only: Parametric memory only, no retrieval.
- Basic RAG: Standard vector retrieval using sentence embeddings over the document corpus.
- GraphRAG: TigerGraph-powered multi-hop retrieval that traverses entities, relationships, and their surrounding context before generation.
Fast Inference: All three pipelines generate answers with llama-3.1-8b-instant on Groq's LPU inference engine.
Live LLM Judge: Every answer is graded by an automated LLM judge (Gemini 2.5 Flash) for factual consistency and hallucinations.
BERTScore Evaluation: Measures each answer's semantic similarity to a grounded reference answer.
Dashboard: A React/Vite UI that renders answers, latencies, token counts, and judge evaluations in real time.
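
To make "multi-hop retrieval" concrete, here is a small sketch that collects relationship triples within a fixed number of hops of a seed entity. It uses an in-memory dictionary as a stand-in for TigerGraph; the toy graph, the relation names, and the `multi_hop_context` helper are all hypothetical and do not reflect the project's actual schema or queries.

```python
from collections import deque

# Hypothetical toy graph: entity -> list of (relation, neighbour).
GRAPH = {
    "obesity": [("increases_risk_of", "insulin resistance")],
    "insulin resistance": [("leads_to", "type 2 diabetes")],
    "type 2 diabetes": [("treated_with", "metformin")],
    "metformin": [],
}


def multi_hop_context(graph, seed, max_hops=2):
    """Breadth-first walk returning (source, relation, target) triples
    whose source is fewer than max_hops edges away from the seed."""
    triples = []
    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbour in graph.get(node, []):
            triples.append((node, relation, neighbour))
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))
    return triples


if __name__ == "__main__":
    for source, relation, target in multi_hop_context(GRAPH, "obesity"):
        print(f"{source} -[{relation}]-> {target}")
```

With two hops, a question about obesity also pulls in the link from insulin resistance to type 2 diabetes, which a single similarity lookup on the word "obesity" would not necessarily surface.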

🛠️ Tech Stack

Graph Database: TigerGraph Cloud
LLM Engine: Groq (Llama 3.1)
Evaluator Judge: Google Gemini (2.5 Flash)
Backend API: FastAPI / Python
Frontend UI: React + Vite + TailwindCSS + Framer Motion
Embeddings: SentenceTransformers / ChromaDB (for basic RAG)

🚀 Getting Started

1. Prerequisites

You will need API keys for Groq and Gemini, as well as a TigerGraph Cloud instance.

2. Environment Setup

Create a .env file in the root directory with the following variables:

# TigerGraph Configuration
TG_HOST=https://your-tigergraph-domain.i.tgcloud.io
TG_USERNAME=your_email@domain.com
TG_TGCLOUD=true
TG_SECRET=your_secret_key
TG_GRAPH=your_graph_name

# LLM Providers
GROQ_API_KEY=gsk_your_groq_key_here
GEMINI_API_KEY=your_gemini_key_here

# Model Selection (Defaults provided)
LLM_MODEL=llama-3.1-8b-instant
JUDGE_MODEL=gemini-2.5-flash
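
A startup check along these lines can catch a missing key before the first request. It is a sketch only: it assumes the variables above are already present in the process environment (exported in the shell or loaded by a dotenv-style loader), and the `Settings` class and `load_settings` function are hypothetical names, not part of the project.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    tg_host: str
    tg_username: str
    tg_secret: str
    tg_graph: str
    tg_tgcloud: bool
    groq_api_key: str
    gemini_api_key: str
    llm_model: str
    judge_model: str


# Variables without a sensible default; the model names fall back to
# the defaults listed in the .env example.
REQUIRED = (
    "TG_HOST", "TG_USERNAME", "TG_SECRET", "TG_GRAPH",
    "GROQ_API_KEY", "GEMINI_API_KEY",
)


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        tg_host=env["TG_HOST"],
        tg_username=env["TG_USERNAME"],
        tg_secret=env["TG_SECRET"],
        tg_graph=env["TG_GRAPH"],
        tg_tgcloud=env.get("TG_TGCLOUD", "false").strip().lower() == "true",
        groq_api_key=env["GROQ_API_KEY"],
        gemini_api_key=env["GEMINI_API_KEY"],
        llm_model=env.get("LLM_MODEL", "llama-3.1-8b-instant"),
        judge_model=env.get("JUDGE_MODEL", "gemini-2.5-flash"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching real credentials.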

3. Backend Setup

Set up a virtual environment and run the backend API.

# Create and activate virtual environment
python -m venv .venv
.\.venv\Scripts\activate  # Windows
# source .venv/bin/activate # Mac/Linux

# Install dependencies (ensure fastapi, uvicorn, groq, google-genai, etc. are installed)
pip install -r requirements.txt

# Start the FastAPI benchmark server
python -m uvicorn benchmark_api:app --host 127.0.0.1 --port 8010
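
Once uvicorn is running, a quick smoke check can confirm the server is reachable. The sketch below requests the interactive docs page that FastAPI serves at `/docs` by default; it assumes `benchmark_api.py` has not disabled or moved that route, and the helper names are hypothetical.

```python
import urllib.request


def docs_url(host: str = "127.0.0.1", port: int = 8010) -> str:
    """URL of FastAPI's default interactive docs page."""
    return f"http://{host}:{port}/docs"


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any
    connection, timeout, or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False


if __name__ == "__main__":
    url = docs_url()
    print(url, "is reachable" if is_up(url) else "is not reachable")
```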

4. Frontend Setup

Navigate to the UI folder, install dependencies, and start the development server.

cd graphrag/graphrag-ui
npm install
npm run dev

Visit http://localhost:5173 to access the Benchmark Dashboard and run your first comparison!

🧪 How the Benchmarking Works

User Query: The user asks a complex question (e.g., "How does insulin resistance relate to obesity?").
Parallel Processing: The FastAPI backend routes the query to three pipelines simultaneously:
- Direct LLM inference.
- Vector Search -> LLM synthesis.
- TigerGraph searchDocuments -> LLM synthesis.
Reference Generation: A grounded reference answer is generated to serve as the comparison target for BERTScore.
Evaluation: Gemini 2.5 Flash acts as a judge, assigning a strict PASS/FAIL based on material correctness. BERTScore measures semantic similarity to the reference.
Results Rendering: The UI displays latency, token cost, text output, and the judge's verdict side-by-side.
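
The similarity step can be illustrated with the greedy matching that BERTScore is built on: each candidate token is matched to its most similar reference token (precision), each reference token to its most similar candidate token (recall), and the two are combined into an F1. The sketch below uses hand-made two-dimensional vectors in place of contextual BERT embeddings and omits the optional weighting and rescaling of the real metric, so it shows the arithmetic only and is not the project's evaluator.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_match_f1(candidate, reference):
    """Greedy-matching precision, recall, and F1 over token vectors.

    candidate and reference are non-empty lists of token embeddings.
    """
    precision = sum(
        max(cosine(c, r) for r in reference) for c in candidate
    ) / len(candidate)
    recall = sum(
        max(cosine(r, c) for c in candidate) for r in reference
    ) / len(reference)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy embeddings: the candidate covers one of two reference tokens.
    candidate = [[1.0, 0.0]]
    reference = [[1.0, 0.0], [0.0, 1.0]]
    p, r, f1 = greedy_match_f1(candidate, reference)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In the toy example the candidate says nothing wrong (precision 1.0) but misses half of the reference (recall 0.5), which is the kind of gap the score is meant to expose alongside the judge's PASS/FAIL verdict.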

