The Problem
🌫️
One Vague Query Is Not Enough
Real users ask broad, vague questions. "How to stay healthy?" could mean diet, exercise, sleep, or mental health. A single vector search on that phrase captures only one interpretation — and misses everything else. The result is an incomplete, narrow answer.
💡 One vague query = limited retrieval = incomplete context = mediocre answer. Multi Query Retriever breaks this chain.
What Multi Query Does
🔍
Generate Multiple Queries, Search Each One
An LLM automatically rewrites the original question into 3–5 different formulations, each from a different angle. The system searches the vector store with each query separately and merges the results.
Example transformation
→ Original: How to stay healthy?
✓ Query 1: What foods should I eat for good long-term health?
✓ Query 2: How often should I exercise to maintain physical fitness?
✓ Query 3: What lifestyle habits improve mental and physical wellbeing?
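The generation step above can be sketched in a few lines. This is a minimal, offline sketch: `call_llm` is a placeholder stub standing in for any real chat-completion API call, and the prompt wording is illustrative, not a fixed template.

```python
# Sketch of the query-variant generation step.
# `call_llm` is a stub standing in for a real LLM API call, so this runs offline.

VARIANT_PROMPT = (
    "You are a search assistant. Rewrite the user question into {n} "
    "different search queries, one per line, each covering a different "
    "angle of the topic.\n\nQuestion: {question}"
)

def call_llm(prompt: str) -> str:
    # Stub: a real implementation would send `prompt` to an LLM here.
    return (
        "What foods should I eat for good long-term health?\n"
        "How often should I exercise to maintain physical fitness?\n"
        "What lifestyle habits improve mental and physical wellbeing?"
    )

def generate_query_variants(question: str, n: int = 3) -> list[str]:
    """Ask the LLM for n reformulations and parse one query per line."""
    raw = call_llm(VARIANT_PROMPT.format(n=n, question=question))
    variants = [line.strip() for line in raw.splitlines() if line.strip()]
    return variants[:n]

queries = generate_query_variants("How to stay healthy?")
```

In practice the parsing step matters: LLMs sometimes number their output or add commentary, so production code should strip numbering and drop empty lines before searching.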
Step by Step
⚙️
How It Works Internally
1
Original query sent to the LLM: the LLM generates 3–5 diverse reformulations of the question
2
Each query searches the vector store: every variant retrieves its own set of relevant documents
3
Results merged and deduplicated: all retrieved documents are combined and exact duplicates removed
4
Rich, diverse context for the LLM: multiple perspectives produce a much more complete final answer
When to Use It
✨
Ideal Use Cases
✓ Your users tend to ask broad, open-ended questions
✓ A topic has multiple sub-aspects that should all be covered
✓ Standard retrieval keeps missing relevant documents you know exist
✓ Answer quality is mediocre despite having the right documents indexed
⚠️ Trade-off: requires one extra LLM call to generate query variants. Slightly higher latency and cost — worth it for complex queries.
✦
Multi Query Retriever attacks the single biggest weakness of vector search: vague queries that only capture one interpretation. By generating multiple query variants and searching with all of them, it dramatically broadens retrieval coverage — giving your LLM a richer, more complete context to work from.