RAG Series · Day 4
RAG 4-Step Architecture
RAG is a complete 4-step system: Indexing, Retrieval, Augmentation, and Generation. Every step explained in detail.
Big Picture
Two Phases of RAG
RAG has two distinct phases. Indexing happens once, when you set up the system. Query time happens repeatedly: every time a user asks a question. Understanding this separation is key to building efficient RAG systems.
Indexing is a one-time setup cost. Query time is where the real-time magic happens, on every single request.
Step 1: Indexing
Building the Knowledge Base
This phase runs once when you set up your system. It prepares your documents for efficient search later.
1a
Document Loading: Load from any source, such as PDFs, websites, CSVs, or databases. LangChain has hundreds of loaders.
1b
Text Splitting: Break documents into smaller chunks so the LLM can process them within its context window.
1c
Embed and Store: Convert each chunk into a vector. Store the vectors and their text in a vector store like FAISS or Pinecone.
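The indexing phase can be sketched in a few lines of Python. The fixed-size character splitter and word-count "embedding" below are toy stand-ins for a real text splitter and embedding model, and a plain list stands in for a vector store like FAISS or Pinecone:

```python
# Indexing sketch: split loaded documents, "embed" each chunk, store both.
# The splitter and embedding are toy stand-ins, not production components.
from collections import Counter

def split_text(text, chunk_size=200):
    # 1b: fixed-size character chunks (real splitters respect sentence
    # boundaries and add overlap between chunks)
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(chunk):
    # 1c: toy "embedding" -- a word-frequency vector, NOT a real model
    return Counter(chunk.lower().split())

def build_index(documents):
    # 1a: `documents` are already-loaded strings (loaders are out of scope)
    store = []
    for doc in documents:
        for chunk in split_text(doc):
            store.append({"text": chunk, "vector": embed(chunk)})
    return store
```

In a real system, each of these three lines of responsibility maps to a swappable component, which is exactly why the pipeline is easy to tune.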
Step 2: Retrieval
Finding Relevant Chunks
This runs every time a user submits a question. The system finds the most relevant chunks from your indexed knowledge base.
·The user's query is converted into an embedding vector
·A semantic search finds the closest vectors in the store
·The top 3-5 most relevant chunks are extracted
·These chunks become the context for your LLM prompt
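A minimal version of this retrieval step, using toy word-count vectors in place of real embeddings; cosine similarity is the usual distance for semantic search:

```python
# Retrieval sketch: rank stored chunks by cosine similarity to the query.
# `embed` is a toy word-count vector standing in for a real embedding model.
from collections import Counter
import math

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, store, k=3):
    # Embed the query, then return the text of the k closest chunks
    qv = embed(query)
    ranked = sorted(store, key=lambda e: cosine(qv, e["vector"]), reverse=True)
    return [e["text"] for e in ranked[:k]]
```

A real system would delegate this search to the vector store's own index, which is what keeps retrieval fast over millions of chunks.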
Step 3: Augmentation
Building the Prompt
Combine the retrieved chunks with the user's question into a structured prompt for the LLM.
·System instruction: "You are a helpful assistant. Answer ONLY from the context below."
·Context: the retrieved chunks as plain text
·Question: the user's original query
·Constraint: "If the context is insufficient, say I don't know."
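Assembling these four parts is plain string formatting. `build_prompt` below is an illustrative helper, not a library API:

```python
# Augmentation sketch: combine the four prompt parts into one string.
def build_prompt(chunks, question):
    context = "\n\n".join(chunks)                    # retrieved chunks as text
    return (
        "You are a helpful assistant. "
        "Answer ONLY from the context below.\n\n"    # system instruction
        f"Context:\n{context}\n\n"                   # context
        f"Question: {question}\n\n"                  # user's original query
        'If the context is insufficient, say "I don\'t know."'  # constraint
    )
```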
Step 4: Generation
The Final Answer
The complete prompt is sent to the LLM. It reads the context and question together and returns a grounded, accurate answer using its language capability.
✓Context-grounded answers: grounding in retrieved text sharply curbs hallucination
✓Natural language quality: the LLM structures the answer clearly
✓Real-time: fresh retrieval happens on every single query
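Wiring the steps together might look like the sketch below. `call_llm` is a hypothetical stub that answers from the prompt's Context block instead of calling a real model, so the grounding behaviour is visible without API credentials; in production it would be a call to your chat-model client:

```python
# End-to-end wiring sketch: augment the question with retrieved chunks,
# then "generate". `call_llm` is a stub, not a real model call.
def call_llm(prompt):
    # Stub: echo the Context block; a real implementation sends `prompt`
    # to an LLM API and returns its completion.
    context = prompt.split("Context:\n", 1)[1].split("\n\nQuestion:")[0]
    if not context.strip():
        return "I don't know."        # the constraint fires: nothing retrieved
    return "Based on the context: " + context

def answer(question, chunks):
    context = "\n\n".join(chunks)
    prompt = (
        "Answer ONLY from the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return call_llm(prompt)
```

The key design point survives even in the stub: when retrieval returns nothing, the constraint produces "I don't know" instead of a fabricated answer.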
RAG's four-step pipeline (Load, Split, Embed, and Store during Indexing; then Retrieve, Augment, and Generate at Query Time) transforms a generic LLM into a precise, private, always-current assistant. Each step is modular and independently improvable.