Module Review: RAG

Key Takeaways

  • RAG = Retrieval-Augmented Generation: It mitigates LLM hallucinations and knowledge cutoffs by grounding answers in retrieved external context.
  • Embeddings: Vectors that represent semantic meaning. Similar concepts are close in vector space.
  • Vector Databases: Specialized stores (Pinecone, ChromaDB) optimized for high-dimensional similarity search using approximate nearest neighbor (ANN) algorithms such as HNSW.
  • Chunking Matters: How you split text affects retrieval quality. Recursive chunking generally outperforms fixed-size splitting because it respects natural text boundaries.
  • Hybrid Search: Combining keyword search (BM25) with vector search typically improves recall over either method alone.
  • Re-ranking: A second pass with a cross-encoder substantially improves precision.
  • Production RAG: Not a linear pipeline but a system that adds query expansion, routing, and self-correction.
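The retrieve-then-augment flow behind these takeaways can be sketched in a few lines. This is a toy illustration only: the word-overlap `score` function is a stand-in for real embedding similarity, and the corpus and function names are hypothetical.

```python
import re

# Toy corpus. In a real system each document would be chunked and embedded.
CORPUS = [
    "RAG grounds LLM answers in retrieved documents.",
    "HNSW is an approximate nearest neighbor algorithm.",
    "Chunk overlap preserves context across splits.",
]

def words(text: str) -> set[str]:
    """Lowercase word set, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query: str, doc: str) -> int:
    """Toy relevance score: shared word count (embedding-similarity stand-in)."""
    return len(words(query) & words(doc))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the top-k most relevant documents."""
    return sorted(CORPUS, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Augment the user query with retrieved context before generation."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is HNSW?"))
```

The key idea is that the LLM never sees the whole corpus; it sees only the retrieved context stitched into the prompt, which is what grounds the answer.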

Interactive Flashcards

Test your knowledge with the questions and answers below.

What are the two main problems RAG solves?


1. Hallucinations (making up facts)

2. Knowledge Cutoffs (outdated data)

What is an Embedding?

A vector (list of numbers) representing the semantic meaning of text.

Which distance metric is most common for text similarity?

Cosine Similarity (measures the angle between vectors).

What is the trade-off of Re-ranking?

It improves accuracy (precision) but increases latency (slower) and cost.
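The trade-off comes from running two stages: a cheap pass over the whole corpus, then an expensive scorer over only the survivors. In this toy sketch, both scoring functions are illustrative stand-ins (word overlap for a bi-encoder, phrase matching for a cross-encoder), not real models.

```python
def cheap_score(query: str, doc: str) -> int:
    """Fast but coarse: shared word count (bi-encoder stand-in)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def expensive_score(query: str, doc: str) -> int:
    """Slower but finer: rewards an exact phrase match (cross-encoder stand-in)."""
    return 100 if query.lower() in doc.lower() else cheap_score(query, doc)

def search(query: str, corpus: list[str], k: int = 10, top_n: int = 3) -> list[str]:
    # Stage 1: rank the whole corpus with the cheap scorer, keep k candidates.
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:k]
    # Stage 2: re-rank only those k candidates with the expensive scorer.
    return sorted(candidates, key=lambda d: expensive_score(query, d), reverse=True)[:top_n]

corpus = [
    "vector search is fast",
    "hybrid vector search combines methods",
    "bm25 is keyword search",
]
print(search("keyword search", corpus))
```

Because the expensive scorer runs on only `k` documents rather than the full corpus, the extra latency and cost stay bounded while precision improves.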

What does HNSW stand for?

Hierarchical Navigable Small World (an algorithm for fast approximate nearest neighbor search).

RAG Cheat Sheet

Common Hyperparameters

Parameter     | Recommended Start | Description
------------- | ----------------- | -----------
Chunk Size    | 512 - 1024 tokens | Size of each text block.
Chunk Overlap | 10% - 20%         | Tokens shared between adjacent chunks to preserve context.
Top K         | 3 - 5             | Number of documents to retrieve.
Temperature   | 0.0 - 0.3         | Lower temperatures reduce hallucinations in RAG.
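The chunk size and overlap parameters can be illustrated with a simple fixed-size chunker. This is a sketch only: real recursive splitters also respect paragraph and sentence boundaries, and the token list here is a hypothetical stand-in for tokenizer output.

```python
def chunk(tokens: list[str], size: int = 8, overlap: int = 2) -> list[list[str]]:
    """Fixed-size chunking: each chunk repeats the last `overlap`
    tokens of the previous chunk to preserve context across splits."""
    step = size - overlap  # advance by size minus overlap each time
    return [tokens[i:i + size] for i in range(0, len(tokens), step)]

tokens = [f"t{i}" for i in range(20)]
chunks = chunk(tokens, size=8, overlap=2)
print(len(chunks))  # 4 chunks: starts at t0, t6, t12, t18
```

Here overlap 2 of size 8 is 25%, slightly above the 10%-20% guideline; the same ratio logic applies at realistic sizes like 512 tokens with 64-100 token overlap.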

RAG Components

Component     | Popular Tools
------------- | -------------
Orchestration | LangChain, LlamaIndex
Vector DB     | Pinecone, ChromaDB, Weaviate, pgvector
Embeddings    | OpenAI text-embedding-3, HuggingFace all-MiniLM-L6-v2
Evaluation    | RAGAS, TruLens

Next Steps

Now that you understand how to augment LLMs with external data, let's learn how to permanently teach them new skills.

Module 04: Fine-Tuning (Coming Soon)
