What Is a Retriever
The Bridge Between Question and Documents
A retriever is the component that connects the user's question to your stored documents. Its contract is simple: take a query as input and return a list of relevant documents as output. Everything else, such as how it searches and which strategy it uses, is an implementation detail hidden behind that interface.
💡 Retriever = Input: User Query → Output: List of Relevant Document Objects. That is the entire contract.
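To make the contract concrete, here is a minimal sketch in plain Python. It does not use LangChain: the `Document` dataclass below is a toy stand-in that borrows the `page_content` and `metadata` field names, and `keyword_retriever` with its word-overlap scoring is a hypothetical example, not a real library function.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Toy stand-in for a document object: text plus optional metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def keyword_retriever(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Return up to k documents that share the most words with the query."""
    query_words = set(query.lower().split())
    scored = []
    for doc in corpus:
        overlap = len(query_words & set(doc.page_content.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc))
    # Highest overlap first; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


corpus = [
    Document("Retrievers return relevant documents for a query", {"source": "notes"}),
    Document("Bananas are rich in potassium", {"source": "food"}),
    Document("A query goes in and documents come out", {"source": "notes"}),
]

# Query in, list of documents out: that is the whole contract.
results = keyword_retriever("what documents match my query", corpus)
for doc in results:
    print(doc.metadata["source"], "|", doc.page_content)
```

The caller only sees a query going in and a list of documents coming out. The scoring logic could be swapped for anything else without changing how the function is called.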
Retrievers Are Runnables
Why They Fit Perfectly into Chains
In LangChain, every retriever implements the Runnable interface, so it exposes .invoke() like any other chain component. This single property lets you plug a retriever directly into a chain with the pipe operator (|), without any adapter code.
✓ Call it directly with .invoke('your question')
✓ Plug it into LangChain chains using the pipe operator
✓ Use the built-in async support (.ainvoke()) for concurrent production workloads
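The three properties above can be imitated in a few lines of plain Python. This is only a sketch of the idea: `ToyRunnable`, `search_notes`, and `NOTES` are hypothetical names, and LangChain's own Runnable implementation is considerably more involved. The sketch shows how an object with `invoke`, `ainvoke`, and `|` support composes into a chain.

```python
import asyncio


class ToyRunnable:
    """Hypothetical, minimal imitation of a pipeable component."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    async def ainvoke(self, value):
        # Run the synchronous function in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.func, value)

    def __or__(self, other):
        # a | b builds a new step that feeds a's output into b.
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))


NOTES = ["retrievers return documents", "chains connect steps", "documents feed the prompt"]


def search_notes(query):
    """Toy retrieval: keep notes that contain any word of the query."""
    words = query.lower().split()
    return [note for note in NOTES if any(w in note.split() for w in words)]


retriever = ToyRunnable(search_notes)
format_docs = ToyRunnable(lambda docs: "\n".join(docs))

# The retriever is piped straight into the next step, with no adapter in between.
chain = retriever | format_docs
context = chain.invoke("which documents")
async_context = asyncio.run(chain.ainvoke("which documents"))
print(context)
```

Because every step shares the same calling convention, the chain itself is also a `ToyRunnable` and can be piped further.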
By Data Source
Retrievers Categorized by Where They Search
· Vector Store Retriever: the most common choice. Runs semantic (embedding similarity) search against your own vector store.
· Wikipedia Retriever: queries the Wikipedia API and returns relevant articles.
· ArXiv Retriever: retrieves research papers directly from arXiv.
· Web Research Retriever: performs real-time internet searches.
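As an illustration of the most common case, the sketch below imitates a vector store retriever in plain Python. It makes two loud assumptions: the "embedding" is just a bag-of-words count vector (real systems use learned embedding models), and `ToyVectorStoreRetriever` is a hypothetical class that returns raw strings rather than document objects to keep the example short.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (an assumption for illustration)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStoreRetriever:
    """Hypothetical in-memory store: embeds texts once, ranks them per query."""

    def __init__(self, texts: list[str], k: int = 2):
        self.k = k
        self.index = [(text, embed(text)) for text in texts]

    def invoke(self, query: str) -> list[str]:
        query_vec = embed(query)
        ranked = sorted(self.index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
        return [text for text, _ in ranked[: self.k]]


retriever = ToyVectorStoreRetriever([
    "vector stores index document embeddings",
    "the weather is sunny today",
    "embeddings map text to vectors",
])
top = retriever.invoke("document embeddings")
print(top)
```

A retriever backed by Wikipedia, arXiv, or web search would keep the same `invoke(query)` shape and only replace what happens inside it.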
By Search Strategy
Retrievers Categorized by How They Search
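· MMR (Maximal Marginal Relevance) Retriever: returns diverse results by penalizing documents that are too similar to ones already selected.
· Multi Query Retriever: uses an LLM to generate multiple variants of the query, then merges the results for broader coverage.
· Contextual Compression Retriever: wraps another retriever and keeps only the parts of each document that are relevant to the query.
· Self Query Retriever: converts a natural-language question into a semantic query plus structured metadata filters.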
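Of the strategies above, MMR is the easiest to show end to end. The sketch below is a plain-Python version of the standard greedy MMR selection, not LangChain's code: `mmr_select`, the `lambda_mult` parameter name, and the two-dimensional toy vectors are all assumptions chosen for illustration. At each step it picks the candidate that maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return indices of up to k documents chosen greedily by Maximal Marginal Relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest document already selected.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.3]
docs = [
    [1.0, 0.0],   # 0: highly relevant
    [0.99, 0.1],  # 1: highly relevant, near-duplicate of 0
    [0.6, 0.8],   # 2: less relevant, but different from 0 and 1
]

# Plain similarity ranking returns the two near-duplicates.
plain_top2 = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)[:2]
# MMR keeps the best match, then prefers the document that adds something new.
mmr_top2 = mmr_select(query, docs, k=2, lambda_mult=0.5)
print(plain_top2, mmr_top2)  # [1, 0] [1, 2]
```

Setting `lambda_mult=1.0` removes the redundancy penalty, and the selection falls back to plain relevance ranking.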
Retrievers are the intelligence layer of RAG (retrieval-augmented generation): they decide what context the LLM receives. As Runnables, they slot cleanly into any LangChain chain. Start with the basic Vector Store Retriever, then move to MMR or Multi Query when you need more diverse or more complete results. The interface never changes; only the strategy does.