RAG Series · Day 12

Multi Query Retriever

One vague query misses too much. Multi Query Retriever generates multiple search variants to dramatically broaden coverage.

The Problem

One Vague Query Is Not Enough

Real users ask broad, vague questions. "How to stay healthy?" could mean diet, exercise, sleep, or mental health. A single vector search on that phrase captures only one interpretation — and misses everything else. The result is an incomplete, narrow answer.

💡 One vague query = limited retrieval = incomplete context = mediocre answer. Multi Query Retriever breaks this chain.
What Multi Query Does

Generate Multiple Queries, Search Each One

An LLM automatically rewrites the original question into 3–5 different formulations, each from a different angle. The system searches the vector store with each query separately and merges the results.

Example transformation
Original: How to stay healthy?
Query 1: What foods should I eat for good long-term health?
Query 2: How often should I exercise to maintain physical fitness?
Query 3: What lifestyle habits improve mental and physical wellbeing?
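The rewrite step is a single LLM call. Below is a minimal sketch in Python, assuming an OpenAI-compatible chat client; the model name, prompt wording, and the generate_query_variants helper are illustrative assumptions, not a fixed part of the technique.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_query_variants(question: str, n: int = 3) -> list[str]:
    """Ask the LLM to rewrite one vague question into n diverse search queries."""
    prompt = (
        f"Rewrite the following question into {n} different search queries, "
        "each covering a distinct aspect of the topic. "
        "Return one query per line with no numbering.\n\n"
        f"Question: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content
    # One query per line, blank lines dropped
    return [line.strip() for line in text.splitlines() if line.strip()]

print(generate_query_variants("How to stay healthy?"))
```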
Step by Step

How It Works Internally

1. Original query sent to LLM: the LLM generates 3–5 diverse reformulations of the question.
2. Each query searches the vector store: every variant retrieves its own set of relevant documents.
3. Results merged and deduplicated: all retrieved documents are combined and exact duplicates removed.
4. Rich, diverse context for the LLM: multiple perspectives yield a much more complete final answer.
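Putting the four steps together, an end-to-end sketch could look like the following. It assumes a vector_store object with a similarity_search(query, k) method and documents exposing page_content (LangChain/Chroma-style names), plus the generate_query_variants helper sketched above; all of these names are assumptions, not part of the article.

```python
def multi_query_retrieve(question: str, vector_store, k: int = 4) -> list:
    # Step 1: the LLM rewrites the vague question into several variants
    queries = [question] + generate_query_variants(question, n=3)

    # Step 2: each variant searches the vector store independently
    all_docs = []
    for q in queries:
        all_docs.extend(vector_store.similarity_search(q, k=k))

    # Step 3: merge and deduplicate (exact match on document text here)
    seen, unique_docs = set(), []
    for doc in all_docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique_docs.append(doc)

    # Step 4: the merged, deduplicated set becomes the LLM's context
    return unique_docs
```

Deduplication here is exact-match on the document text; in practice you might key on a document ID instead if your store provides one.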
When to Use It

Ideal Use Cases

Your users tend to ask broad, open-ended questions
A topic has multiple sub-aspects that should all be covered
Standard retrieval keeps missing relevant documents you know exist
Answer quality is mediocre despite having the right documents indexed
⚠️ Trade-off: requires one extra LLM call to generate query variants. Slightly higher latency and cost — worth it for complex queries.
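If you build on a framework rather than from scratch, this pattern is often available off the shelf. A sketch assuming LangChain with the langchain-openai package and an existing vectordb vector store (none of which the article prescribes):

```python
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

# Wraps the generate-search-merge loop behind a single retriever interface.
retriever = MultiQueryRetriever.from_llm(
    retriever=vectordb.as_retriever(),                   # vectordb assumed to exist
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0),  # model choice is illustrative
)
docs = retriever.invoke("How to stay healthy?")
```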

Multi Query Retriever attacks the single biggest weakness of vector search: vague queries that only capture one interpretation. By generating multiple query variants and searching with all of them, it dramatically broadens retrieval coverage — giving your LLM a richer, more complete context to work from.