feat(rag): implement Multi-Query Expansion for BM25 search by Xenon010101 · Pull Request #331 · param20h/PDF-Assistant-RAG

Xenon010101 · 2026-06-01T11:04:17Z

Summary

Adds Multi-Query Expansion to improve BM25 retrieval in the RAG pipeline. The module generates paraphrased query variants via the LLM, runs BM25 search for each, and merges results using Reciprocal Rank Fusion (RRF).

Changes

`backend/app/rag/multi_query.py` (new)

- `BM25` class: pure-Python BM25Okapi implementation (no new dependencies)
- `generate_query_variations()`: uses `InferenceClient` to generate 4 paraphrased query variants
- `reciprocal_rank_fusion()`: merges multiple ranked result lists with RRF (k=60)
- `multi_query_retrieve()`: end-to-end pipeline that fetches chunks from ChromaDB, builds the BM25 index, generates variants, searches with each one, merges the results with RRF, and returns the top-K
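
For reviewers who want a feel for the scoring, here is a minimal sketch of an Okapi-style BM25 index over pre-tokenized chunks. It is not the code in this PR: the `k1`/`b` defaults, the non-negative "+1" IDF variant, the whitespace `tokenize` helper, and the `top_k` method are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class BM25:
    """Minimal Okapi-style BM25 index over pre-tokenized documents."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.term_freqs = [Counter(doc) for doc in corpus]
        self.doc_lens = [len(doc) for doc in corpus]
        self.n_docs = len(corpus)
        self.avgdl = sum(self.doc_lens) / self.n_docs if self.n_docs else 0.0
        # Document frequency: how many documents contain each term.
        df = Counter()
        for freqs in self.term_freqs:
            df.update(freqs.keys())
        # Assumed IDF variant: the "+1" form, which never goes negative.
        self.idf = {
            term: math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1.0)
            for term, n in df.items()
        }

    def get_scores(self, query_tokens):
        scores = []
        for freqs, dl in zip(self.term_freqs, self.doc_lens):
            # Length normalization term shared by every query term.
            ratio = dl / self.avgdl if self.avgdl else 0.0
            norm = self.k1 * (1 - self.b + self.b * ratio)
            score = 0.0
            for term in query_tokens:
                tf = freqs.get(term, 0)
                if tf:
                    score += self.idf[term] * tf * (self.k1 + 1) / (tf + norm)
            scores.append(score)
        return scores

    def top_k(self, query_tokens, k=10):
        # Indices of the k best-scoring documents, skipping zero scores.
        scores = self.get_scores(query_tokens)
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [i for i in ranked[:k] if scores[i] > 0]
```

A possible usage: `BM25([tokenize(c) for c in chunks]).top_k(tokenize(query), k=10)` returns chunk indices, which can then be mapped back to chunk IDs before fusion.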

`backend/app/rag/retriever.py`

- When `MULTI_QUERY_ENABLED=True` (default), Stage 1 uses `multi_query_retrieve` instead of the ChromaDB embedding search
- Falls back to the existing embedding search when multi-query is disabled
- Cross-encoder reranking (Stage 2) is unchanged; it now runs on the fused BM25 results
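
The stage selection described above can be sketched roughly as follows. This is an illustration only: the `retrieve` signature and the injected callables (`multi_query_search`, `embedding_search`, `rerank`) are hypothetical stand-ins, not the actual functions in `retriever.py`. The candidate and final sizes (10 and 5) come from the "How it works" section.

```python
def retrieve(query, *, multi_query_enabled, multi_query_search,
             embedding_search, rerank, stage1_k=10, final_k=5):
    """Two-stage retrieval: candidate generation, then reranking."""
    # Stage 1: pick the candidate generator based on the feature flag.
    if multi_query_enabled:
        candidates = multi_query_search(query, stage1_k)
    else:
        candidates = embedding_search(query, stage1_k)

    # Nothing to rerank (for example, the user has no indexed chunks).
    if not candidates:
        return []

    # Stage 2: the reranker is the same regardless of how Stage 1 ran.
    return rerank(query, candidates)[:final_k]
```

Passing the stages in as callables keeps the sketch testable without ChromaDB or a cross-encoder; the real module presumably calls them directly.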

`backend/app/config.py`

- Added `MULTI_QUERY_ENABLED: bool = True` setting
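
As a rough illustration of how such a flag can be read from the environment, here is a standard-library sketch. The project's `config.py` may well use a settings library instead; the `env_bool` helper, the accepted truthy strings, and the `Settings` dataclass shown here are assumptions.

```python
import os
from dataclasses import dataclass, field


def env_bool(name, default):
    """Read a boolean environment variable, falling back to a default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    # Assumed convention: these strings (case-insensitive) mean "on".
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class Settings:
    # Evaluated at instantiation time, so tests can override the env var.
    MULTI_QUERY_ENABLED: bool = field(
        default_factory=lambda: env_bool("MULTI_QUERY_ENABLED", True)
    )
```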

How it works

1. The user sends a query
2. The LLM generates 4 paraphrased variants
3. Each variant is searched against a BM25 index built from the user's document chunks
4. The per-variant results are merged via RRF (deduplicated and re-scored by fused rank)
5. The top 10 fused results proceed to the cross-encoder reranker
6. The final top 5 are passed to the LLM for answer generation
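
The merge step can be sketched as below. RRF gives each chunk the score `sum(1 / (k + rank))` over every list it appears in, with ranks starting at 1 and k=60 as stated in this PR. The function signature, the `top_k` parameter, and the assumption that each input list contains unique chunk IDs are illustrative choices, not necessarily those of `multi_query.py`.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, top_k=10):
    """Fuse several ranked lists of chunk IDs into one scored ranking."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based; each list is assumed to hold unique IDs.
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Summing per ID deduplicates chunks found by multiple variants.
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_k]
```

For example, fusing `[["a", "b", "c"], ["b", "a", "d"], ["b", "c"]]` ranks `b` first, because it appears in all three lists with score 1/62 + 1/61 + 1/61, followed by `a`, `c`, and `d`.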

Design decisions

- No new dependencies: BM25 is implemented inline (~30 lines); uses the existing `huggingface_hub` and `chromadb` packages
- Graceful degradation: if the LLM call fails, only the original query is used; if no chunks exist, an empty result is returned
- Configurable: set `MULTI_QUERY_ENABLED=false` to restore the original ChromaDB-only retrieval
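
The graceful-degradation behavior could look roughly like the sketch below. To keep it runnable without network access, the LLM is passed in as a plain callable (`llm`, hypothetical) rather than an `InferenceClient`; the prompt wording, the list-marker cleanup, and the choice to always keep the original query first are likewise assumptions.

```python
import re

# Strips leading list markers such as "1.", "2)", "-" or "*".
_MARKER = re.compile(r"^\s*(?:[-*]|\d+[.)])\s*")


def generate_query_variations(query, llm, n=4):
    """Return [query, *variants]; fall back to [query] if the LLM fails."""
    prompt = (
        f"Rewrite the following search query in {n} different ways, "
        f"one per line:\n{query}"
    )
    try:
        raw = llm(prompt)
    except Exception:
        # Degrade to single-query search instead of failing the request.
        return [query]

    queries = [query]
    seen = {query.strip().lower()}
    for line in raw.splitlines():
        text = _MARKER.sub("", line).strip()
        # Skip blank lines and case-insensitive duplicates.
        if text and text.lower() not in seen:
            seen.add(text.lower())
            queries.append(text)
    # Original query plus at most n variants.
    return queries[: n + 1]
```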

Closes #283

Add a multi-query expansion module that:
- Generates 3-5 paraphrased query variants via LLM (InferenceClient)
- Runs BM25 search for each variant using a pure-Python BM25Okapi
- Merges results with Reciprocal Rank Fusion (RRF)
- Returns top-K deduplicated results

Integrated into retriever.py as the first retrieval stage when MULTI_QUERY_ENABLED is True (default). Falls back to the existing ChromaDB embedding search when disabled.

Closes param20h#283

Xenon010101 requested a review from param20h as a code owner June 1, 2026 11:04

Xenon010101 wants to merge 1 commit into param20h:main from Xenon010101:feat/multi-query-expansion
