ChampionNinja/tigergraph

🐯 TigerGraph RAG Benchmark Arena

A benchmarking platform that compares three LLM inference pipelines (LLM-only, Vector RAG, and GraphRAG) on reasoning quality, latency, and accuracy. Built with TigerGraph, Groq, and FastAPI, it runs all three pipelines against a corpus of medical research papers on diabetes.

Benchmark UI

Architecture

graph TD
    %% Styling
    classDef frontend fill:#1E293B,stroke:#4ADE80,stroke-width:2px,color:white;
    classDef backend fill:#1E293B,stroke:#F472B6,stroke-width:2px,color:white;
    classDef database fill:#0F172A,stroke:#38BDF8,stroke-width:2px,color:white;
    classDef llm fill:#451A03,stroke:#FBBF24,stroke-width:2px,color:white;
    classDef evaluator fill:#14532D,stroke:#A3E635,stroke-width:2px,color:white;

    %% Components
    UI["🖥️ React / Vite Dashboard"]:::frontend
    API["⚡ FastAPI Backend"]:::backend
    
    %% Databases
    VectorDB[("📚 ChromaDB Vector Store")]:::database
    TG[("🐯 TigerGraph Cloud")]:::database
    
    %% LLMs
    Groq1["🧠 Groq Llama 3.1"]:::llm
    Groq2["🧠 Groq Llama 3.1"]:::llm
    Groq3["🧠 Groq Llama 3.1"]:::llm

    %% Evaluators
    Judge["⚖️ Gemini 2.5 Flash Judge"]:::evaluator
    Bert["📊 BERTScore Evaluator"]:::evaluator

    %% Flow: Request
    UI -- "User Query" --> API
    
    %% Flow: Parallel Pipelines
    API -- "Pipeline 1: No Context" ---> Groq1
    
    API -- "Pipeline 2: Semantic Search" --> VectorDB
    VectorDB -- "Context Chunks" --> Groq2
    
    API -- "Pipeline 3: Graph Traversal" --> TG
    TG -- "Multi-hop Graph Context" --> Groq3

    %% Flow: Evaluation
    Groq1 -- "Answer 1" --> Judge
    Groq2 -- "Answer 2" --> Judge
    Groq3 -- "Answer 3" --> Judge
    
    Groq1 -- "Answer 1" --> Bert
    Groq2 -- "Answer 2" --> Bert
    Groq3 -- "Answer 3" --> Bert

    %% Flow: Response
    Judge -- "Pass/Fail" --> API
    Bert -- "Score" --> API
    API -- "Aggregated Metrics & Results" --> UI
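The diagram shows the judge returning a Pass/Fail signal to the API. As a minimal sketch of how a free-text judge reply could be reduced to that strict verdict, the hypothetical helper below takes the first PASS or FAIL token it finds and treats anything unparseable as FAIL. The function name and the reply format are assumptions for illustration; they are not taken from this repository's code.

```python
import re

# Matches a standalone PASS or FAIL token, case-insensitively.
_VERDICT = re.compile(r"\b(PASS|FAIL)\b", re.IGNORECASE)


def parse_verdict(reply: str) -> str:
    """Reduce a judge reply to 'PASS' or 'FAIL' (hypothetical helper).

    The first verdict token wins. A reply with no recognizable token is
    treated as FAIL, so a malformed judge response never counts as a pass.
    """
    match = _VERDICT.search(reply)
    if match is None:
        return "FAIL"
    return match.group(1).upper()


if __name__ == "__main__":
    print(parse_verdict("Verdict: pass. The answer matches the reference."))
    print(parse_verdict("I could not evaluate this answer."))
```

Because the first token wins, a judge prompt used with this sketch would need to ask for the verdict at the start of the reply.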

🌟 Key Features

  • Three Reasoning Engines Head-to-Head:
    • LLM-Only: Parametric memory only, no retrieval.
    • Basic RAG (Vector RAG): Standard vector retrieval using sentence embeddings over the document corpus.
    • GraphRAG: TigerGraph-powered multi-hop retrieval that traverses entities, relationships, and their surrounding context.
  • Blazing Fast Inference: Powered by Groq's LPU inference engine running llama-3.1-8b-instant.
  • Live LLM Judge: Every answer is strictly graded by an automated LLM judge (Gemini 2.5 Flash) for factual consistency and hallucinations.
  • BERTScore Evaluation: Measures the semantic similarity between each answer and a generated reference answer.
  • Beautiful Dashboard: A modern React/Vite UI that renders answers, latencies, token counts, and judge evaluations in real time.
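To make the BERTScore bullet concrete, here is a minimal sketch of the greedy matching at the core of the metric: each candidate token is matched to its most similar reference token (precision), each reference token to its most similar candidate token (recall), and the two are combined into an F1. The toy 2-D vectors stand in for contextual BERT embeddings, the function names are illustrative, and IDF weighting and baseline rescaling are omitted, so this shows the idea rather than the evaluator this project actually runs.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_match_f1(candidate: Sequence[Vector], reference: Sequence[Vector]) -> float:
    """F1 from greedy token matching, without IDF weighting."""
    if not candidate or not reference:
        return 0.0
    # Precision: average best match for each candidate token.
    precision = sum(max(cosine(c, r) for r in reference) for c in candidate) / len(candidate)
    # Recall: average best match for each reference token.
    recall = sum(max(cosine(r, c) for c in candidate) for r in reference) / len(reference)
    if precision + recall <= 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = [(1.0, 0.0), (0.0, 1.0)]
    print(greedy_match_f1([(1.0, 0.0), (0.0, 1.0)], reference))  # identical token sets
    print(greedy_match_f1([(1.0, 0.0)], reference))              # candidate covers half the reference
```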

🛠️ Tech Stack

  • Graph Database: TigerGraph Cloud
  • LLM Engine: Groq (Llama 3.1)
  • Evaluator Judge: Google Gemini (2.5 Flash)
  • Backend API: FastAPI / Python
  • Frontend UI: React + Vite + TailwindCSS + Framer Motion
  • Embeddings: SentenceTransformers / ChromaDB (for basic RAG)

🚀 Getting Started

1. Prerequisites

You will need API keys for Groq and Gemini, as well as a TigerGraph Cloud instance.

2. Environment Setup

Create a .env file in the root directory with the following variables:

# TigerGraph Configuration
TG_HOST=https://your-tigergraph-domain.i.tgcloud.io
TG_USERNAME=your_email@domain.com
TG_TGCLOUD=true
TG_SECRET=your_secret_key
TG_GRAPH=your_graph_name

# LLM Providers
GROQ_API_KEY=gsk_your_groq_key_here
GEMINI_API_KEY=your_gemini_key_here

# Model Selection (Defaults provided)
LLM_MODEL=llama-3.1-8b-instant
JUDGE_MODEL=gemini-2.5-flash
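As a rough sketch of how a backend might consume these variables, the snippet below reads them from the process environment, fails early when a required value is missing, and falls back to the two model defaults listed above. The `Settings` class and `load_settings` function are hypothetical names, and the sketch assumes the `.env` file has already been loaded into the environment (for example by a dotenv loader); the project's own configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variables with no documented default; treated here as required (an assumption).
REQUIRED = ("TG_HOST", "TG_USERNAME", "TG_SECRET", "TG_GRAPH",
            "GROQ_API_KEY", "GEMINI_API_KEY")


@dataclass(frozen=True)
class Settings:
    tg_host: str
    tg_username: str
    tg_secret: str
    tg_graph: str
    tg_tgcloud: bool
    groq_api_key: str
    gemini_api_key: str
    llm_model: str
    judge_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        tg_host=env["TG_HOST"],
        tg_username=env["TG_USERNAME"],
        tg_secret=env["TG_SECRET"],
        tg_graph=env["TG_GRAPH"],
        # TG_TGCLOUD is a string flag; only "true" (any case) enables it.
        tg_tgcloud=env.get("TG_TGCLOUD", "false").strip().lower() == "true",
        groq_api_key=env["GROQ_API_KEY"],
        gemini_api_key=env["GEMINI_API_KEY"],
        llm_model=env.get("LLM_MODEL", "llama-3.1-8b-instant"),
        judge_model=env.get("JUDGE_MODEL", "gemini-2.5-flash"),
    )
```

Failing at startup on a missing key tends to be easier to diagnose than an authentication error surfacing later inside one of the three pipelines.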

3. Backend Setup

Set up a virtual environment and run the backend API.

# Create and activate virtual environment
python -m venv .venv
.\.venv\Scripts\activate  # Windows
# source .venv/bin/activate # Mac/Linux

# Install dependencies (ensure fastapi, uvicorn, groq, google-genai, etc. are installed)
pip install -r requirements.txt

# Start the FastAPI benchmark server
python -m uvicorn benchmark_api:app --host 127.0.0.1 --port 8010

4. Frontend Setup

Navigate to the UI folder, install dependencies, and start the development server.

cd graphrag/graphrag-ui
npm install
npm run dev

Visit http://localhost:5173 to access the Benchmark Dashboard and run your first comparison!

🧪 How the Benchmarking Works

  1. User Query: The user asks a complex question (e.g., "How does insulin resistance relate to obesity?").
  2. Parallel Processing: The FastAPI backend routes the query to three pipelines simultaneously:
    • Direct LLM inference.
    • Vector Search -> LLM synthesis.
    • TigerGraph searchDocuments -> LLM synthesis.
  3. Reference Generation: A grounded reference answer is generated to serve as the comparison target for BERTScore.
  4. Evaluation: Gemini 2.5 Flash acts as a judge, assigning a strict PASS/FAIL based on material correctness. BERTScore measures semantic similarity to the reference.
  5. Results Rendering: The UI displays latency, token cost, text output, and the judge's verdict side-by-side.
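The parallel step above can be sketched with `asyncio.gather`, which runs the three pipelines concurrently and returns results in submission order. The pipeline functions below are stubs that only sleep and echo the query; in the real backend they would call Groq, ChromaDB, and TigerGraph. All names and the result fields here are illustrative assumptions, not the repository's actual API.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, List

Pipeline = Callable[[str], Awaitable[str]]


async def timed(name: str, pipeline: Pipeline, query: str) -> Dict[str, Any]:
    """Run one pipeline, recording wall-clock latency and any error."""
    start = time.perf_counter()
    answer, error = "", None
    try:
        answer = await pipeline(query)
    except Exception as exc:  # one failing pipeline should not sink the others
        error = str(exc)
    latency_ms = (time.perf_counter() - start) * 1000
    return {"pipeline": name, "answer": answer, "error": error, "latency_ms": latency_ms}


async def run_benchmark(query: str, pipelines: Dict[str, Pipeline]) -> List[Dict[str, Any]]:
    """Run all pipelines concurrently; results keep the dict's order."""
    tasks = [timed(name, fn, query) for name, fn in pipelines.items()]
    return await asyncio.gather(*tasks)


# Stub pipelines standing in for the real LLM and retrieval calls.
async def llm_only(query: str) -> str:
    await asyncio.sleep(0.01)
    return f"[llm-only] {query}"


async def vector_rag(query: str) -> str:
    await asyncio.sleep(0.02)
    return f"[vector-rag] {query}"


async def graph_rag(query: str) -> str:
    await asyncio.sleep(0.03)
    return f"[graph-rag] {query}"


PIPELINES: Dict[str, Pipeline] = {
    "llm_only": llm_only,
    "vector_rag": vector_rag,
    "graph_rag": graph_rag,
}

if __name__ == "__main__":
    question = "How does insulin resistance relate to obesity?"
    for row in asyncio.run(run_benchmark(question, PIPELINES)):
        print(f'{row["pipeline"]}: {row["latency_ms"]:.1f} ms')
```

Timing each pipeline inside its own wrapper keeps the per-pipeline latency independent of how long the slowest pipeline takes.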
