RAG Series · Day 13

Contextual Compression

Retrieved documents contain noise. This retriever extracts only the sentences that actually answer the question.

The Noise Problem

Retrieved Documents Contain Irrelevant Content

Text splitting is imperfect. A single chunk often covers more than one topic because chunking by character count ignores semantic boundaries. You ask about photosynthesis and the retrieved document also mentions the Grand Canyon. That noise fills your context window and confuses the LLM.

⚠️ If 60% of your retrieved context is irrelevant, you are wasting 60% of your LLM's attention — degrading answer quality significantly.
The Solution

Contextual Compression Retriever

This retriever runs standard retrieval first, then passes every retrieved document through an LLM compressor that extracts only the sentences relevant to the query. Everything irrelevant is discarded. The LLM receives clean, focused context.

Before: Full 1000-character document — 70% about unrelated topics
After: 2–3 sentences — only what directly answers the question
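
A minimal sketch of this flow using LangChain's ContextualCompressionRetriever. The index path, embedding model, LLM choice, and k value are assumptions for illustration; the API shown matches recent LangChain releases:

```python
# pip install langchain langchain-openai langchain-community faiss-cpu
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Base retriever: plain similarity search over an existing index
# ("my_index" is a placeholder path for an index built earlier).
vectorstore = FAISS.load_local(
    "my_index", OpenAIEmbeddings(), allow_dangerous_deserialization=True
)
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 6})

# Compressor: an LLM that extracts only the query-relevant sentences
# from each retrieved document and drops documents with nothing relevant.
compressor = LLMChainExtractor.from_llm(
    ChatOpenAI(model="gpt-4o-mini", temperature=0)
)

retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=base_retriever,
)

docs = retriever.invoke("How does photosynthesis convert light into energy?")
for d in docs:
    print(d.page_content)  # only the sentences that answer the question
```

Each returned document has been rewritten by the compressor to contain only the query-relevant sentences; documents with nothing relevant are dropped entirely.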
Architecture

Two Components Working Together

1. Base Retriever — any retriever (similarity, MMR, multi-query) fetches the initial documents
2. Compressor (LLM) — reviews each document against the query and keeps only the relevant sentences
💡 The base retriever can be any retriever you already use. Contextual Compression wraps around it as a post-processing layer.
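To make the wrapping concrete, here is a short sketch that swaps the base retriever to MMR while reusing the vectorstore and compressor from the example above (the k and fetch_k values are assumptions):

```python
# The compression layer is independent of the retrieval strategy:
# swap the base retriever (here MMR, for result diversity) and reuse
# the same compressor unchanged.
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 6, "fetch_k": 20},
)

retriever = ContextualCompressionRetriever(
    base_compressor=compressor,    # same LLMChainExtractor as above
    base_retriever=mmr_retriever,  # any retriever can be wrapped
)
```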
When to Use It

Best Situations for Compression

Documents are long and cover multiple topics per chunk
Context window is getting full — you need more relevant info in less space
LLM answers are getting confused or off-topic despite good retrieval
You want to dramatically improve answer precision
Trade-off: requires an extra LLM call per retrieved document — adds latency and cost (a cheaper embeddings-based alternative is sketched after this list)
Payoff: significantly better answer accuracy — almost always worth it for complex queries
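
On the latency point: if one LLM call per document is too costly, a cheaper variation replaces the LLM compressor with LangChain's EmbeddingsFilter, which drops whole documents below a similarity threshold instead of extracting sentences. A sketch, reusing base_retriever from the first example; the threshold is an assumed starting value to tune on your own data:

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import EmbeddingsFilter
from langchain_openai import OpenAIEmbeddings

# Filters out whole documents whose embedding similarity to the query
# falls below the threshold. No LLM call per document, so it is much
# cheaper and faster, but less precise than sentence-level extraction.
embeddings_filter = EmbeddingsFilter(
    embeddings=OpenAIEmbeddings(),
    similarity_threshold=0.76,  # assumed starting point; tune on your data
)

cheap_retriever = ContextualCompressionRetriever(
    base_compressor=embeddings_filter,
    base_retriever=base_retriever,  # same base retriever as the first sketch
)
```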

Contextual Compression Retriever strips retrieved documents down to only the content that actually matters for the question. The LLM receives clean, high-signal context — and produces more accurate, grounded answers as a direct result. It is one of the most impactful quality improvements you can make to a RAG system.