The Problem
🔁
When All Your Results Say the Same Thing
Standard similarity search has one predictable failure: it returns documents that are very similar to the query — and also very similar to each other. You get five variations of the same fact, wasting your context window and missing broader perspectives entirely.
Example — all three say the same thing
✕"Climate change is causing glaciers to melt rapidly in the Arctic"
✕"Arctic glaciers are melting at an alarming rate due to rising temperatures"
✕"Global warming is accelerating glacier loss across Arctic regions"
What MMR Does
🎯
Maximum Marginal Relevance
MMR optimizes for two goals simultaneously: relevance to the query AND diversity among selected results. Every document it picks must be relevant to the question and meaningfully different from documents already selected.
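💡 MMR formula: Score = λ × relevance_to_query − (1−λ) × max_similarity_to_already_selected_docs. The penalty uses the highest similarity to any single document already selected, so one near-duplicate is enough to push a candidate down.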
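A minimal sketch of the score for a single candidate, to make the arithmetic concrete. The function name mmr_score and the similarity values are hypothetical, chosen only for illustration. The sketch assumes the redundancy term is the highest similarity to any already-selected document, as in the usual formulation.

```python
def mmr_score(relevance, sims_to_selected, lam=0.5):
    """Score one candidate: reward relevance, penalize redundancy."""
    # Redundancy is the highest similarity to any already-selected document.
    # With nothing selected yet, there is no penalty.
    redundancy = max(sims_to_selected, default=0.0)
    return lam * relevance - (1 - lam) * redundancy


# Hypothetical values: a near-duplicate of an already-selected document
# versus a slightly less relevant document that covers a different angle.
near_duplicate = mmr_score(relevance=0.90, sims_to_selected=[0.95])
fresh_angle = mmr_score(relevance=0.80, sims_to_selected=[0.30])

print(near_duplicate < fresh_angle)  # the less redundant document scores higher
```

Even though the near-duplicate is more relevant to the query, its overlap with what was already picked outweighs that advantage at λ = 0.5.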
Selection Process
⚙️
How MMR Picks Documents
1. First document: select the one with the highest similarity to the query.
2. Second document: must be relevant to the query AND meaningfully different from document 1.
3. Each subsequent document: balance relevance to the query against similarity to all already-selected docs.
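The steps above can be sketched as a greedy loop. This is an illustrative pure-Python version under stated assumptions, not any particular library's implementation: cosine similarity stands in for the similarity measure, and the 2-D vectors are toy "embeddings" invented for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def mmr_select(query, docs, k=2, lam=0.5):
    """Return the indices of k documents chosen greedily by MMR score."""
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query, docs[i])
            # Highest similarity to anything already picked (0 on the first pass).
            redundancy = max(
                (cosine(docs[i], docs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: docs 0 and 1 are near-duplicates, doc 2 points in another direction.
query = [1.0, 0.2]
docs = [[1.0, 0.1], [1.0, 0.15], [0.6, 0.8]]

print(mmr_select(query, docs, k=2, lam=0.5))  # → [1, 2]
print(mmr_select(query, docs, k=2, lam=1.0))  # → [1, 0]
```

With λ = 0.5 the second slot goes to the document that differs from the first pick. With λ = 1.0 the loop reduces to plain similarity ranking and returns both near-duplicates.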
Lambda Parameter
⚖️
Controlling the Balance
·lambda = 1.0: Pure similarity search; diversity is completely ignored
·lambda = 0.5: Balanced; relevance and diversity are weighted equally
·lambda = 0.0: Maximum diversity; after the first pick, relevance no longer affects the score
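To see the three settings side by side, here is a small sketch that chooses the second document after a first document "A" has already been selected. The candidate names and all similarity numbers are hypothetical: B is a near-duplicate of A, C is relevant but different, and D is barely relevant but very different.

```python
# Hypothetical similarities for three remaining candidates.
relevance = {"B": 0.93, "C": 0.70, "D": 0.30}  # similarity to the query
sim_to_a = {"B": 0.97, "C": 0.40, "D": 0.10}   # similarity to selected doc A


def second_pick(lam):
    """Return the candidate with the highest MMR score for a given lambda."""
    return max(
        relevance,
        key=lambda d: lam * relevance[d] - (1 - lam) * sim_to_a[d],
    )


for lam in (1.0, 0.5, 0.0):
    print(lam, second_pick(lam))
# 1.0 B  (near-duplicate wins on relevance alone)
# 0.5 C  (relevant and different)
# 0.0 D  (most different, even though barely relevant)
```

The extremes show why a middle value is usually the safer starting point: 1.0 brings back the redundancy problem, while 0.0 can surface documents that have little to do with the question.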
💡 Good default: lambda_mult=0.5 (lambda_mult is the name LangChain retrievers use for lambda). One line to activate: vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 4, "lambda_mult": 0.5})
✦
MMR addresses one of the most common RAG quality problems: redundant retrieval. By optimizing for both relevance and diversity, it gives the LLM several distinct perspectives rather than restatements of the same fact, and enabling it takes only one parameter change.