The Noise Problem
📜
Retrieved Documents Contain Irrelevant Content
Text splitting is imperfect. A single chunk often covers more than one topic because chunking by character count ignores semantic boundaries. You ask about photosynthesis and the retrieved document also mentions the Grand Canyon. That noise fills your context window and confuses the LLM.
⚠️ If 60% of your retrieved context is irrelevant, you are wasting 60% of your LLM's attention — degrading answer quality significantly.
The Solution
🗜️
Contextual Compression Retriever
This retriever runs standard retrieval first, then passes every retrieved document through an LLM compressor that extracts only the sentences relevant to the query. Everything irrelevant is discarded. The LLM receives clean, focused context.
✕Before: Full 1000-character document — 70% about unrelated topics
✓After: 2–3 sentences — only what directly answers the question
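The compression step can be sketched without any LLM at all. The toy function below keeps only sentences that share a content word with the query — a crude keyword-overlap stand-in for the LLM judgment a real compressor would make, but it shows the before/after effect. The example text and query are illustrative, not from any real corpus.

```python
import re

def compress(document: str, query: str) -> str:
    # Keep only sentences sharing a content word (>3 chars) with the query.
    # A real compressor replaces this overlap check with an LLM judgment.
    query_terms = {w for w in re.findall(r"[a-z]+", query.lower()) if len(w) > 3}
    sentences = re.split(r"(?<=[.!?])\s+", document)
    kept = [s for s in sentences
            if query_terms & set(re.findall(r"[a-z]+", s.lower()))]
    return " ".join(kept)

doc = ("Photosynthesis converts light energy into chemical energy. "
       "Plants use chlorophyll to absorb red and blue light. "
       "The Grand Canyon was carved by the Colorado River over millions of years. "
       "It attracts visitors from around the world.")

result = compress(doc, "How does photosynthesis work in plants?")
print(result)
```

Only the two photosynthesis sentences survive; the Grand Canyon noise is discarded before it ever reaches the LLM's context window.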
Architecture
⚙️
Two Components Working Together
1
Base Retriever — any retriever (similarity, MMR, multi-query) fetches the initial documents
2
Compressor (LLM) — reviews each document against the query and keeps only the relevant sentences
💡 The base retriever can be any retriever you already use. Contextual Compression wraps around it as a post-processing layer.
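The wrapper pattern above can be sketched as a small class that composes any base retriever with any compressor. In LangChain this role is played by `ContextualCompressionRetriever` combined with a compressor such as `LLMChainExtractor`; the version here is a self-contained stand-in with a hypothetical keyword compressor and a fake base retriever, so the two-step flow is visible without an API key.

```python
import re

def keyword_compress(document: str, query: str) -> str:
    # Toy compressor: keep sentences sharing a word (>3 chars) with the query.
    # LangChain's LLMChainExtractor performs this step with an LLM instead.
    terms = {w for w in re.findall(r"[a-z]+", query.lower()) if len(w) > 3}
    sentences = re.split(r"(?<=[.!?])\s+", document)
    return " ".join(s for s in sentences
                    if terms & set(re.findall(r"[a-z]+", s.lower())))

class CompressionRetriever:
    """Post-processing wrapper: any base retriever + any compressor."""
    def __init__(self, base_retrieve, compress):
        self.base_retrieve = base_retrieve  # e.g. similarity, MMR, multi-query
        self.compress = compress

    def retrieve(self, query: str) -> list:
        docs = self.base_retrieve(query)               # 1. base retrieval
        out = [self.compress(d, query) for d in docs]  # 2. per-doc compression
        return [d for d in out if d]                   # drop fully-irrelevant docs

# Hypothetical base retriever that always returns the same two chunks.
def fake_similarity_search(query):
    return [
        "Photosynthesis happens in chloroplasts. The Grand Canyon is in Arizona.",
        "Tourism statistics are updated yearly. Hotels near the canyon are busy.",
    ]

retriever = CompressionRetriever(fake_similarity_search, keyword_compress)
hits = retriever.retrieve("Where does photosynthesis happen?")
print(hits)
```

Because the wrapper only calls `base_retrieve(query)`, any retriever with that shape drops in unchanged — which is exactly why compression layers cleanly over whatever retrieval you already use.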
When to Use It
🎯
Best Situations for Compression
✓Documents are long and cover multiple topics per chunk
✓Context window is getting full — you need more relevant info in less space
✓LLM answers are getting confused or off-topic despite good retrieval
✓You want to dramatically improve answer precision
✕Requires an extra LLM call per retrieved document — adds latency and cost
✓Significantly better answer accuracy — almost always worth it for complex queries
✦
Contextual Compression Retriever strips retrieved documents down to only the content that actually matters for the question. The LLM receives clean, high-signal context — and produces more accurate, grounded answers as a direct result. It is one of the most impactful quality improvements you can make to a RAG system.