
feat(rag): implement Multi-Query Expansion for BM25 search#331

Open
Xenon010101 wants to merge 1 commit into param20h:main from Xenon010101:feat/multi-query-expansion


@Xenon010101 (Contributor)

Summary

Adds Multi-Query Expansion to improve BM25 retrieval in the RAG pipeline. The module generates paraphrased query variants via the LLM, runs BM25 search for each, and merges results using Reciprocal Rank Fusion (RRF).

Changes

backend/app/rag/multi_query.py (new)

  • BM25 class: pure-Python BM25Okapi implementation (no new dependencies)
  • generate_query_variations(): uses InferenceClient to generate 4 paraphrased query variants
  • reciprocal_rank_fusion(): merges multiple ranked result lists with RRF (k=60)
  • multi_query_retrieve(): end-to-end pipeline that fetches the user's chunks from ChromaDB, builds a BM25 index over them, generates query variants, runs a BM25 search for each variant, merges the result lists with RRF, and returns the top-K
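For reviewers less familiar with BM25Okapi, here is a minimal pure-Python sketch of the kind of scoring the `BM25` class performs. It is not the code in `multi_query.py`: the `tokenize` helper, the `top_k` method, and the defaults `k1=1.5`, `b=0.75` are assumptions, and it uses the non-negative IDF variant log((N - n + 0.5) / (n + 0.5) + 1), which may differ from the PR's choice.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase word tokens, no stemming or stopwords.
    return re.findall(r"\w+", text.lower())


class BM25:
    """Minimal BM25 scorer over pre-tokenized documents (Okapi term weighting)."""

    def __init__(self, corpus: list[list[str]], k1: float = 1.5, b: float = 0.75):
        self.k1 = k1
        self.b = b
        self.doc_lens = [len(doc) for doc in corpus]
        n_docs = len(corpus)
        self.avgdl = sum(self.doc_lens) / n_docs if n_docs else 0.0
        # Per-document term frequencies.
        self.term_freqs = [Counter(doc) for doc in corpus]
        # Document frequency: how many documents contain each term.
        df: Counter = Counter()
        for doc in corpus:
            df.update(set(doc))
        # Non-negative IDF variant: log((N - n + 0.5) / (n + 0.5) + 1).
        self.idf = {
            term: math.log((n_docs - n + 0.5) / (n + 0.5) + 1.0)
            for term, n in df.items()
        }

    def get_scores(self, query: list[str]) -> list[float]:
        scores = []
        for tf, dl in zip(self.term_freqs, self.doc_lens):
            score = 0.0
            for term in query:
                f = tf[term]  # Counter returns 0 for missing terms
                if f == 0:
                    continue
                # Length normalization: b controls how much long docs are penalized.
                norm = self.k1 * (1.0 - self.b + self.b * dl / self.avgdl)
                score += self.idf[term] * f * (self.k1 + 1.0) / (f + norm)
            scores.append(score)
        return scores

    def top_k(self, query: list[str], k: int) -> list[int]:
        # Indices of the k best-scoring documents, dropping zero-score entries.
        scores = self.get_scores(query)
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [i for i in ranked[:k] if scores[i] > 0.0]
```

In this sketch, documents with a zero score are dropped from `top_k`, so a variant that shares no terms with any chunk contributes nothing to the later fusion step.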

backend/app/rag/retriever.py

  • When MULTI_QUERY_ENABLED=True (default), Stage 1 uses multi_query_retrieve instead of the ChromaDB embedding search
  • Falls back to the existing embedding search when multi-query is disabled
  • Cross-encoder reranking (Stage 2) is unchanged; when multi-query is enabled, it runs on the RRF-merged BM25 results instead of the embedding-search results
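A sketch of the Stage 1 switch, assuming each stage can be expressed as a plain callable. The function `retrieve` and its parameters are hypothetical: `multi_query_search` stands in for `multi_query_retrieve`, `embedding_search` for the existing ChromaDB search, and `rerank` for the cross-encoder. The defaults of 10 candidates and 5 final results follow the "How it works" steps.

```python
from typing import Callable, List, Sequence

# Hypothetical signatures: (query, k) -> chunks, and (query, candidates, k) -> chunks.
Search = Callable[[str, int], List[str]]
Rerank = Callable[[str, Sequence[str], int], List[str]]


def retrieve(
    query: str,
    *,
    multi_query_enabled: bool,
    multi_query_search: Search,
    embedding_search: Search,
    rerank: Rerank,
    candidate_k: int = 10,
    final_k: int = 5,
) -> List[str]:
    # Stage 1: pick the candidate generator based on the feature flag.
    if multi_query_enabled:
        candidates = multi_query_search(query, candidate_k)
    else:
        candidates = embedding_search(query, candidate_k)
    # Stage 2: cross-encoder reranking is identical for both paths.
    return rerank(query, candidates, final_k)
```

Passing the stages in keeps the flag check in one place and lets both paths be tested with stubs.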

backend/app/config.py

  • Added MULTI_QUERY_ENABLED: bool = True setting
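If `config.py` uses a pydantic-style settings class, parsing `MULTI_QUERY_ENABLED=false` from the environment is already handled. The stdlib sketch below only illustrates the intended semantics of the flag; `env_bool` and this `Settings` dataclass are hypothetical stand-ins, not the project's config.

```python
import os
from dataclasses import dataclass, field

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name: str, default: bool) -> bool:
    """Read a boolean flag from the environment, falling back to `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean, got {raw!r}")


@dataclass(frozen=True)
class Settings:
    # Multi-query BM25 retrieval stays on unless explicitly disabled.
    MULTI_QUERY_ENABLED: bool = field(
        default_factory=lambda: env_bool("MULTI_QUERY_ENABLED", True)
    )
```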

How it works

  1. User sends a query
  2. LLM generates 4 paraphrased variants
  3. Each variant searches the BM25 index built from the user's document chunks
  4. Results are merged via RRF, which deduplicates chunks returned by several variants and orders them by their fused score
  5. Top-10 results proceed to the cross-encoder reranker
  6. Final top-5 are returned to the LLM for answer generation
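Steps 4 and 5 rely on Reciprocal Rank Fusion: each chunk scores the sum of 1 / (k + rank) over every list it appears in, with 1-based ranks and k = 60 as in this PR. The sketch below is illustrative; its signature and the `top_n` parameter are assumptions, and the PR's `reciprocal_rank_fusion()` may differ in details such as tie-breaking.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence


def reciprocal_rank_fusion(
    ranked_lists: Sequence[Sequence[Hashable]],
    k: int = 60,
    top_n: Optional[int] = None,
) -> List[Hashable]:
    """Fuse several ranked lists of IDs into one ranking by RRF score."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based: the top hit in a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Each ID appears once in the output (deduplication); sorted() is stable,
    # so ties keep the order in which IDs were first seen.
    fused = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    return fused if top_n is None else fused[:top_n]
```

With `top_n=10`, the output corresponds to the candidate set handed to the cross-encoder in step 5. A chunk ranked moderately well by several variants can outrank one ranked first by a single variant, which is the point of fusing paraphrases.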

Design decisions

  • No new dependencies: BM25 is implemented inline (~30 lines); the module otherwise uses the existing huggingface_hub and chromadb packages
  • Graceful degradation: if the LLM call fails, only the original query is used; if no chunks exist, an empty result list is returned
  • Configurable: set MULTI_QUERY_ENABLED=false to restore the original ChromaDB-only retrieval
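The graceful-degradation path for query expansion can be sketched as follows. `complete` is a hypothetical callable standing in for the `InferenceClient` call (for example, a thin wrapper around `InferenceClient.text_generation`); the prompt wording, the line parsing, and keeping the original query first are assumptions, not the PR's code.

```python
import re
from typing import Callable, List

# Strips a leading "-", "*", "1." or "2)" style list marker from an LLM output line.
_LIST_MARKER = re.compile(r"^\s*(?:[-*]|\d+[.)])\s*")


def generate_query_variations(
    query: str, complete: Callable[[str], str], n: int = 4
) -> List[str]:
    """Return the original query followed by up to n paraphrases."""
    prompt = (
        f"Rewrite the following search query in {n} different ways. "
        "Return one rewrite per line.\n\n"
        f"Query: {query}"
    )
    try:
        raw = complete(prompt)
    except Exception:
        # Graceful degradation: any LLM failure leaves just the original query.
        return [query]
    seen = {query.strip().lower()}
    variants: List[str] = []
    for line in raw.splitlines():
        cleaned = _LIST_MARKER.sub("", line).strip()
        key = cleaned.lower()
        # Skip blank lines and duplicates (including echoes of the original).
        if cleaned and key not in seen:
            seen.add(key)
            variants.append(cleaned)
    return [query] + variants[:n]
```

Catching a broad `Exception` mirrors the stated behavior (any LLM failure falls back to the original query); a production version might log the error first.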

Closes #283

Add a multi-query expansion module that:
- Generates 3-5 paraphrased query variants via LLM (InferenceClient)
- Runs BM25 search for each variant using a pure-Python BM25Okapi
- Merges results with Reciprocal Rank Fusion (RRF)
- Returns top-K deduplicated results

Integrated into retriever.py as the first retrieval stage when
MULTI_QUERY_ENABLED is True (default). Falls back to the existing
ChromaDB embedding search when disabled.

Closes param20h#283
Xenon010101 requested a review from param20h as a code owner on June 1, 2026, 11:04