RAG Series · Day 4
RAG 4-Step Architecture
RAG is a complete 4-step system: Indexing, Retrieval, Augmentation, and Generation. Every step explained in detail.
Big Picture
Two Phases of RAG
RAG has two distinct phases. Indexing happens once, when you set up the system. Query time happens repeatedly: every time a user asks a question. Understanding this separation is key to building efficient RAG systems.
Indexing is a one-time setup cost. Query time is where the real-time magic happens, on every single request.
Step 1: Indexing
Building the Knowledge Base
This phase runs once when you set up your system. It prepares your documents for efficient search later.
1a
Document Loading: Load from any source, such as PDFs, websites, CSVs, or databases. LangChain has hundreds of loaders.
1b
Text Splitting: Break documents into smaller chunks so the LLM can process them within its context window.
1c
Embed and Store: Convert each chunk into a vector. Store the vectors and their text in a vector store like FAISS or Pinecone.
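The indexing phase can be sketched in a few lines of Python. The fixed-size character splitter and word-count "embedding" below are toy stand-ins for a real text splitter and embedding model, and a plain list stands in for a vector store like FAISS or Pinecone:

```python
# Indexing sketch: split loaded documents, "embed" each chunk, store both.
# The splitter and embedding are toy stand-ins, not production components.
from collections import Counter

def split_text(text, chunk_size=200):
    # 1b: fixed-size character chunks (real splitters respect sentence
    # boundaries and add overlap between chunks)
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(chunk):
    # 1c: toy "embedding" -- a word-frequency vector, NOT a real model
    return Counter(chunk.lower().split())

def build_index(documents):
    # 1a: `documents` are already-loaded strings (loaders are out of scope)
    store = []
    for doc in documents:
        for chunk in split_text(doc):
            store.append({"text": chunk, "vector": embed(chunk)})
    return store
```

In a real system, each of these three lines of responsibility maps to a swappable component, which is exactly why the pipeline is easy to tune.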
Step 2: Retrieval
Finding Relevant Chunks
This runs every time a user submits a question. The system finds the most relevant chunks from your indexed knowledge base.
·The user's query is converted into an embedding vector
·A semantic search finds the closest vectors in the store
·The top 3-5 most relevant chunks are extracted
·These chunks become the context for your LLM prompt
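A minimal version of this retrieval step, using toy word-count vectors in place of real embeddings; cosine similarity is the usual distance for semantic search:

```python
# Retrieval sketch: rank stored chunks by cosine similarity to the query.
# `embed` is a toy word-count vector standing in for a real embedding model.
from collections import Counter
import math

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, store, k=3):
    # Embed the query, then return the text of the k closest chunks
    qv = embed(query)
    ranked = sorted(store, key=lambda e: cosine(qv, e["vector"]), reverse=True)
    return [e["text"] for e in ranked[:k]]
```

A real system would delegate this search to the vector store's own index, which is what keeps retrieval fast over millions of chunks.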
Step 3: Augmentation
Building the Prompt
Combine the retrieved chunks with the user's question into a structured prompt for the LLM.
·System instruction: "You are a helpful assistant. Answer ONLY from the context below."
·Context: the retrieved chunks as plain text
·Question: the user's original query
·Constraint: "If the context is insufficient, say I don't know."
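Assembling these four parts is plain string formatting. `build_prompt` below is an illustrative helper, not a library API:

```python
# Augmentation sketch: combine the four prompt parts into one string.
def build_prompt(chunks, question):
    context = "\n\n".join(chunks)                    # retrieved chunks as text
    return (
        "You are a helpful assistant. "
        "Answer ONLY from the context below.\n\n"    # system instruction
        f"Context:\n{context}\n\n"                   # context
        f"Question: {question}\n\n"                  # user's original query
        'If the context is insufficient, say "I don\'t know."'  # constraint
    )
```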
Step 4: Generation
The Final Answer
The complete prompt is sent to the LLM. It reads the context and question together and returns a grounded, accurate answer using its language capability.
✓Context-grounded answers: grounding in retrieved text sharply curbs hallucination
✓Natural language quality: the LLM structures the answer clearly
✓Real-time: fresh retrieval happens on every single query
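Wiring the steps together might look like the sketch below. `call_llm` is a hypothetical stub that answers from the prompt's Context block instead of calling a real model, so the grounding behaviour is visible without API credentials; in production it would be a call to your chat-model client:

```python
# End-to-end wiring sketch: augment the question with retrieved chunks,
# then "generate". `call_llm` is a stub, not a real model call.
def call_llm(prompt):
    # Stub: echo the Context block; a real implementation sends `prompt`
    # to an LLM API and returns its completion.
    context = prompt.split("Context:\n", 1)[1].split("\n\nQuestion:")[0]
    if not context.strip():
        return "I don't know."        # the constraint fires: nothing retrieved
    return "Based on the context: " + context

def answer(question, chunks):
    context = "\n\n".join(chunks)
    prompt = (
        "Answer ONLY from the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return call_llm(prompt)
```

The key design point survives even in the stub: when retrieval returns nothing, the constraint produces "I don't know" instead of a fabricated answer.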
RAG's four-step pipeline (Load, Split, Embed, and Store during Indexing; then Retrieve, Augment, and Generate at Query Time) transforms a generic LLM into a precise, private, always-current assistant. Each step is modular and independently improvable.