feat(rag): add hybrid search using RRF score fusion by nancysangani · Pull Request #492 · param20h/PDF-Assistant-RAG

nancysangani · 2026-06-06T06:40:20Z

🔗 Related Issue

Closes #440

📝 What does this PR do?

Replaces the fake RRF approximation in retriever.py with a correct
Reciprocal Rank Fusion implementation and removes the EnsembleRetriever
dependency.

backend/app/rag/retriever.py:

- Adds rrf_merge(vector_results, bm25_results, k), which implements the
  standard RRF formula score(d) = Σ_i 1 / (k + rank_i(d)), summed over the
  ranked lists in which chunk d appears. It deduplicates by content key and
  returns chunks sorted by descending RRF score.
- Removes the EnsembleRetriever / CustomVectorRetriever / CustomBM25Retriever
  LangChain wrapper classes. query_chunks and query_bm25 are now called
  directly, giving full control over each ranked list before fusion.
- retrieve() now calls embed_query → query_chunks → query_bm25 →
  rrf_merge per query variant, then promotes rrf_score → score before
  passing candidates to the cross-encoder reranker. Existing reranking and
  confidence normalisation logic is unchanged.
- Falls back to vector-only retrieval when USE_HYBRID_SEARCH=False or when
  the BM25 query raises.
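
For reviewers who want the fusion step in isolation, here is a minimal sketch of RRF over two ranked lists. It is not the code in this PR: it assumes each chunk is a dict whose "content" field serves as the deduplication key, and that a chunk repeated within one list counts only at its first (best) rank. Both are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_merge(vector_results, bm25_results, k=60):
    """Fuse two ranked chunk lists with Reciprocal Rank Fusion.

    Each chunk contributes 1 / (k + rank) for every list it appears in,
    with ranks starting at 1.
    """
    scores = defaultdict(float)
    chunks = {}  # first-seen chunk per content key

    for results in (vector_results, bm25_results):
        seen_in_list = set()
        for rank, chunk in enumerate(results, start=1):
            key = chunk["content"]  # assumed dedup key
            if key in seen_in_list:
                continue  # count a chunk once per list, at its best rank
            seen_in_list.add(key)
            scores[key] += 1.0 / (k + rank)
            chunks.setdefault(key, chunk)

    merged = [{**chunks[key], "rrf_score": score} for key, score in scores.items()]
    merged.sort(key=lambda c: c["rrf_score"], reverse=True)
    return merged


vector_hits = [{"content": "a"}, {"content": "b"}]
bm25_hits = [{"content": "b"}, {"content": "c"}]
fused = rrf_merge(vector_hits, bm25_hits)
# "b" appears in both lists, so it receives two contributions and ranks first.
```

A chunk found by both retrievers accumulates two terms, which is why it outranks a chunk that tops only one list.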

backend/app/config.py:

- Adds USE_HYBRID_SEARCH: bool = True, which toggles hybrid search without
  redeploying.
- Adds RRF_K: int = 60, which exposes the RRF smoothing constant. 60 is the
  value used in the original RRF paper and a widely used default.
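
As a rough illustration of how the two settings could be read from the environment, the sketch below uses only the standard library. The Settings class and the _env_bool helper are hypothetical; the project's actual config.py may use a different mechanism (for example a settings library), which this PR description does not show.

```python
import os
from dataclasses import dataclass, field


def _env_bool(name, default):
    """Parse a boolean environment variable (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    # Field names and defaults mirror the ones listed in this PR.
    USE_HYBRID_SEARCH: bool = field(
        default_factory=lambda: _env_bool("USE_HYBRID_SEARCH", True)
    )
    RRF_K: int = field(
        default_factory=lambda: int(os.environ.get("RRF_K", "60"))
    )


settings = Settings()  # values are read once, when the object is created
```

Because the values are read at construction time in this sketch, a changed environment variable only takes effect when a new Settings object is created.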

🗂️ Type of Change

✨ New feature
🔧 Refactor / code cleanup

🧪 How was this tested?

- Ran the backend locally (uvicorn app.main:app --reload).
- Queried a multi-document collection; confirmed that RRF scores are present
  on returned chunks and that chunks appearing in both lists score higher
  than single-list results.
- Set USE_HYBRID_SEARCH=False; confirmed that the vector-only path runs and
  query_bm25 is never called.
- Uninstalled rank_bm25 from the environment; confirmed graceful fallback to
  vector-only retrieval via the except guard.
- Confirmed that the reranker and confidence normalisation are unaffected.
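
The two fallback checks above can be reproduced without a running backend using a sketch like the following. gather_candidates is a hypothetical stand-in for the relevant part of retrieve(); the search and merge functions are passed in as plain callables so the behaviour can be exercised with stubs. The broad except is an assumption about how the guard is written.

```python
import logging

logger = logging.getLogger(__name__)


def gather_candidates(query, vector_search, bm25_search, merge, use_hybrid=True):
    """Return fused candidates, or vector-only results when BM25 is off or fails."""
    vector_results = vector_search(query)
    if not use_hybrid:
        return vector_results  # BM25 is never called on this path
    try:
        bm25_results = bm25_search(query)
    except Exception as exc:  # assumed guard: any BM25 failure degrades gracefully
        logger.warning("BM25 unavailable, using vector-only results: %s", exc)
        return vector_results
    return merge(vector_results, bm25_results)


def stub_vector_search(query):
    return [{"content": "from vector index"}]


def broken_bm25_search(query):
    raise ImportError("rank_bm25 is not installed")


results = gather_candidates(
    "what is RRF?",
    stub_vector_search,
    broken_bm25_search,
    merge=lambda vector, bm25: vector + bm25,
)
# results holds only the vector hits, because the BM25 stub raised.
```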

✅ Self-Review Checklist

My branch is based on dev, not main
I have not added any secrets / API keys
I have not modified main branch or any HuggingFace deployment config
My code follows the existing style (no unnecessary formatting changes)
I have updated relevant docs / comments if needed

nancysangani · 2026-06-06T06:42:37Z

Hi @param20h, I have opened this PR to fix issue #440. Please review it when you get a chance. Thanks!

feat(rag): add hybrid search using RRF score fusion

f3269ed

nancysangani requested a review from param20h as a code owner June 6, 2026 06:40

nancysangani wants to merge 1 commit into param20h:dev from nancysangani:feat/hybrid-search-rrf
