Learning RAG Systems: My Journey from Beginner to Real Understanding


When I first started exploring AI, I kept seeing words like RAG, embeddings, and vector databases. Everyone seemed to throw these terms around as if we all magically understood them. The tutorials I found would show a simple code snippet, maybe a quick chatbot demo, and call it a day.

But no one really explained how RAG systems actually work under the hood.

I didn’t want to just use AI. I wanted to actually understand it. And if you’re anything like me, you probably feel the same way.

Why Most AI Articles Felt Empty

Most AI articles today either:

  • Show one tiny example without explaining how the pieces connect
  • Talk in buzzwords that don’t actually teach you anything real

I realized very quickly that understanding AI (especially RAG) means going deeper. Not just copying code.

So I decided to slow down and ask simple but real questions:

  • What is chunking?
  • What is an embedding?
  • How do vector databases actually work?
  • How do all these fit together to make a RAG system?

What I Learned About RAG (Retrieval-Augmented Generation)

Here’s the real flow, now that I understand it:

Knowledge Base

Start with your documents (Markdown files in my case).

import fs from 'fs/promises';
import path from 'path';
import axios from 'axios';

const EMBED_MODEL = 'nomic-embed-text';
const CHAT_MODEL = 'mistral';
const DOCS_DIR = './docs';

// Read every file in the docs folder, chunk it, embed each chunk,
// and keep everything in a simple in-memory array.
async function loadDocs() {
  const files = await fs.readdir(DOCS_DIR);
  const db: { chunk: string; embedding: number[] }[] = [];

  for (const file of files) {
    const content = await fs.readFile(path.join(DOCS_DIR, file), 'utf-8');
    const chunks = chunkText(content, 500);

    for (const chunk of chunks) {
      const embedding = await getEmbedding(chunk);
      db.push({ chunk, embedding });
    }
  }
  console.log(`Embedded ${db.length} text chunks.`);
  return db;
}

Chunking

Break those documents into small, meaningful parts.

// Naive chunking: cut the text into fixed-size slices of `size` characters.
export function chunkText(text: string, size = 500): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

Embeddings

Use an embedding model to turn each chunk into a vector (a list of numbers that captures the meaning).

// Ask the local Ollama server to embed a piece of text and return the vector.
async function getEmbedding(text: string): Promise<number[]> {
  const res = await axios.post('http://localhost:11434/api/embeddings', {
    model: EMBED_MODEL,
    prompt: text,
  });
  return res.data.embedding;
}

How Embedding Works

An embedding model reads a piece of text and converts it into a list of numbers that capture the meaning of that text. In simple terms:

| Text | Vector (Example Numbers) |
| --- | --- |
| Reset password | [0.24, 0.51, 0.13, …] |
| Recover account password | [0.23, 0.50, 0.15, …] |
| Launch a rocket | [-0.64, 0.19, 0.93, …] |

  • Similar meanings (like “Reset password” and “Recover account password”) produce similar vectors.
  • Different meanings (like “Launch a rocket”) produce different vectors.

This way, we can find meaning-based matches, not just exact word matches.
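
If you want to convince yourself of this, a quick experiment is to embed a few phrases and compare them with cosine similarity. This sketch uses the getEmbedding function above and the cosineSimilarity helper shown later in this post; the exact scores depend on the embedding model:

// Quick sanity check: phrases with similar meanings should score closer
// to 1.0 than unrelated phrases.
async function compareMeanings() {
  const reset = await getEmbedding('Reset password');
  const recover = await getEmbedding('Recover account password');
  const rocket = await getEmbedding('Launch a rocket');

  console.log('reset vs recover:', cosineSimilarity(reset, recover)); // higher score
  console.log('reset vs rocket: ', cosineSimilarity(reset, rocket));  // lower score
}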

Store Embeddings

Save these vectors into a vector database (like ChromaDB or even a local JSON file for now). In my case, I’m storing the embeddings in memory.
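
If you want the index to survive restarts without reaching for a real vector database yet, one minimal option is to dump the array to a JSON file and read it back on startup. This is a rough sketch; the ./db.json path is just an example:

import fs from 'fs/promises';

// Persist the in-memory index so we don't re-embed every document on each run.
// Any writable file path works; ./db.json is arbitrary.
async function saveDb(db: { chunk: string; embedding: number[] }[], file = './db.json') {
  await fs.writeFile(file, JSON.stringify(db));
}

async function loadDb(file = './db.json'): Promise<{ chunk: string; embedding: number[] }[]> {
  const raw = await fs.readFile(file, 'utf-8');
  return JSON.parse(raw);
}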

Querying

Now, we can search for similar chunks based on the vectors.

// Embed the question, score every stored chunk against it with cosine
// similarity, and return the three closest chunks.
async function findRelevantChunks(db: { chunk: string; embedding: number[] }[], query: string) {
  const queryEmbedding = await getEmbedding(query);
  const scored = db.map(entry => ({
    chunk: entry.chunk,
    score: cosineSimilarity(queryEmbedding, entry.embedding),
  }));
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, 3).map(x => x.chunk);
}

User Question

When a user asks something, embed the question too.


// Build a prompt from the retrieved chunks and ask the chat model for an answer.
async function askMistral(context: string[], question: string) {
  const prompt = `Use the following context to answer:\n\n${context.join('\n\n')}\n\nQuestion: ${question}`;
  const res = await axios.post('http://localhost:11434/api/generate', {
    model: CHAT_MODEL,
    prompt: prompt,
    stream: false,
  });
  return res.data.response.trim();
}

const context = await findRelevantChunks(db, question);
const answer = await askMistral(context, question);

Find the most similar chunks from the vector database (JSON file in my case).

// Cosine similarity: the dot product of the two vectors divided by the
// product of their magnitudes. Closer to 1 means closer in meaning.
function cosineSimilarity(vecA: number[], vecB: number[]): number {
  const dot = vecA.reduce((sum, a, idx) => sum + a * vecB[idx], 0);
  const normA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
  const normB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
  return dot / (normA * normB);
}

In this simple code, we’re manually calculating the cosine similarity between vectors to find the best matches. This is exactly what a real vector database like ChromaDB, Pinecone, or Milvus would do internally. The only difference is:

| What we’re doing manually | What a vector database does |
| --- | --- |
| Calculate cosine similarity between vectors | Same (but optimized, super fast) |
| Sort by score and pick top results | Same |
| Works for small projects | Scales to millions of vectors |

In short:

cosineSimilarity = Brain of vector search.
Manual search = Fine for small apps.
Vector DB = Needed for big, fast production apps.
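
For comparison, here is roughly what the same store-and-query step could look like with the chromadb JavaScript client. This is a sketch based on its documented API rather than code from my project; method names and options may differ between versions, and it assumes a Chroma server is running locally:

import { ChromaClient } from 'chromadb';

// Sketch: let ChromaDB store the embeddings and do the similarity search,
// instead of our manual cosineSimilarity + sort.
async function queryWithChroma(db: { chunk: string; embedding: number[] }[], query: string) {
  const client = new ChromaClient(); // assumes a local Chroma server is running
  const collection = await client.getOrCreateCollection({ name: 'docs' });

  // Store our pre-computed embeddings along with the raw chunk text.
  await collection.add({
    ids: db.map((_, i) => String(i)),
    embeddings: db.map(entry => entry.embedding),
    documents: db.map(entry => entry.chunk),
  });

  // Embed the question ourselves and let Chroma find the closest chunks.
  const queryEmbedding = await getEmbedding(query);
  const results = await collection.query({
    queryEmbeddings: [queryEmbedding],
    nResults: 3,
  });
  return results.documents[0]; // top 3 chunks for the first (and only) query
}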

Answer

Finally, hand the retrieved chunks and the question to the chat model and return its response.

// Same idea as askMistral above: put the retrieved chunks into the prompt
// and let the chat model answer from that context.
async function generateAnswer(context: string[], question: string): Promise<string> {
  const prompt = `Use the following context to answer:\n${context.join('\n')}\n\nQuestion: ${question}`;
  const res = await axios.post('http://localhost:11434/api/generate', {
    model: CHAT_MODEL,
    prompt,
    stream: false,
  });
  return res.data.response.trim();
}
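
To tie it all together, here is a minimal sketch of the whole flow end to end, using the functions above. It assumes Ollama is running locally with nomic-embed-text and mistral pulled, and the question string is just an example:

// End-to-end: build the index once, then answer a question against it.
async function main() {
  const db = await loadDocs();                             // chunk + embed the Markdown docs
  const question = 'How do I reset my password?';          // example question
  const context = await findRelevantChunks(db, question);  // retrieve the top chunks
  const answer = await generateAnswer(context, question);  // ask the chat model
  console.log(answer);
}

main().catch(console.error);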

In simple words

Markdown Docs --> Chunking --> Embedding --> Save to Vector DB

User asks Question --> Embed Question --> Search DB --> Retrieve Chunks --> Give to Model --> Get Answer

This flow is now so clear to me that it’s honestly shocking how badly most tutorials explain it.

Why Chunking and Embeddings Matter More Than People Think

What I also realized:

  • If your chunking is bad (random cuts, huge blocks), your retrieval will be bad.

  • If your embeddings are low quality (bad models), even good chunks won’t match properly.

Garbage chunks = Garbage retrieval = Garbage answers.

  • Good chunking + good embeddings = sharp, precise answers, even without a fancy model (see the chunking sketch after this list).

  • Most of the real “magic” in RAG systems isn’t in the model. It’s in how you prepare and retrieve the right knowledge.
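
As a concrete example of better chunking, here is a rough sketch of splitting Markdown on heading lines instead of fixed character counts, so each chunk stays one coherent section. It's a simplification; a real splitter would also handle very long sections and overlap:

// Split a Markdown document on heading lines so each chunk is one section,
// instead of cutting blindly every N characters.
export function chunkByHeadings(markdown: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];

  for (const line of markdown.split('\n')) {
    // A new heading starts a new chunk (unless we haven't collected anything yet).
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      chunks.push(current.join('\n').trim());
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join('\n').trim());

  return chunks.filter(chunk => chunk.length > 0);
}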

Where I’m Heading Next

Now that I understand the pieces properly, I’m focusing on building:

  • Smart document RAG systems using Markdown files as the knowledge base.
  • Clean chunking strategies (splitting by sections, logical grouping).
  • Embedding everything into a fast vector database like ChromaDB.

I’m also working on a simple tool around all of this, which I’ll share in an upcoming post.

Final Thoughts

If you’re learning AI today, my advice is simple:

Slow down. Understand the basics first.
Don't get distracted by shiny new tools.
Understand what **chunking** and **embedding** really mean.
Build small systems, validate your learning, and THEN scale.

(This was a personal journey into RAG systems. I’ll be writing more soon about the small real-world apps I’m building around this knowledge.)


Sai Umesh

Engineering leader with 10+ years building scalable, secure platforms. Specializing in cloud infrastructure, AI-powered tools (RAG, MCP), and full-stack development. Expert in AWS, Node.js, TypeScript, React, Go, and Rust.