# Gen AI Glossary

A glossary of common terms used in Generative AI and Large Language Models.

## A

### Attention Mechanism
The core innovation of the Transformer architecture. It allows the model to weigh the importance of different words in a sentence relative to each other, enabling it to understand context and relationships regardless of distance.

* **Self-Attention**: The mechanism by which words in a sentence attend to other words in the same sentence to resolve ambiguity (e.g., "bank" attending to "river").

## C

### Chunking
The process of splitting a large document into smaller, manageable pieces (chunks) before embedding them. This is crucial for RAG: it keeps the context window from being exceeded and makes the retrieved information precise.

### Context Window
The maximum amount of text (measured in tokens) that an LLM can process at once, including both the prompt and the generated response.

* **Example**: GPT-4 Turbo has a context window of 128k tokens (roughly 300 pages of text).

### Cosine Similarity
A metric for measuring the similarity between two vectors: the cosine of the angle between them. Vectors pointing in the same direction score 1, orthogonal vectors score 0, and opposite vectors score -1. It is the standard metric for semantic search.

## E

### Embedding
A vector (list of numbers) representation of a token or concept. Embeddings capture semantic meaning, so that words with similar meanings lie close together in vector space.

* **Example**: The vector for "King" minus "Man" plus "Woman" results in a vector close to "Queen".

## F

### Fine-Tuning
The process of taking a pre-trained model and training it further on a specific dataset to improve its performance on a particular task or domain.

* **RLHF (Reinforcement Learning from Human Feedback)**: A fine-tuning technique in which the model is rewarded for generating responses that align with human preferences.

## H

### Hallucination
When an LLM generates text that is grammatically correct and confidently stated but factually incorrect or nonsensical.
This happens because the model is predicting the next statistically probable word, not retrieving facts from a database.

### Hybrid Search
A retrieval strategy that combines keyword search (e.g., BM25) with vector search, typically using Reciprocal Rank Fusion (RRF) to merge the two result lists. This provides the best of both worlds: exact matches and semantic understanding.

## I

### Inference
The process of using a trained model to generate predictions (text). In LLMs, this means running the input prompt through the model to produce the output.

## L

### LLM (Large Language Model)
A deep learning model trained on massive amounts of text data to generate human-like text. It predicts the next token in a sequence based on the context of the previous tokens.

* **Parameters**: The internal variables (weights) learned during training. GPT-4 is estimated to have about 1.8 trillion parameters.

## P

### Prompt
The input text provided to an LLM to guide its output.

* **Prompt Engineering**: The craft of designing prompts to get the best possible output from an LLM.

## R

### RAG (Retrieval-Augmented Generation)
A technique that improves LLM accuracy by retrieving facts from an external knowledge base and providing them as context. It mitigates hallucinations and compensates for outdated training data.

### Re-ranking
The process of re-scoring the top results returned by a vector database with a more accurate (but slower) model, typically a Cross-Encoder. This improves the precision of the context passed to the LLM.

## T

### Temperature
A hyperparameter that controls the randomness of the model's output.

* **Low temperature (e.g., 0.1)**: Deterministic, focused, and conservative. Good for coding or factual answers.
* **High temperature (e.g., 0.8+)**: Creative, diverse, and unpredictable. Good for creative writing.

### Token
The basic unit of text for an LLM. A token can be a word, part of a word, or a single character. On average, 1 token ≈ 0.75 words of English text.
* **Tokenization**: The process of converting raw text into a sequence of tokens (integers).

### Transformer
The neural network architecture introduced by Google researchers in 2017 ("Attention Is All You Need") that serves as the foundation for modern LLMs. It relies entirely on the attention mechanism and, unlike recurrent architectures, processes input sequences in parallel.

## V

### Vector Database
A database optimized for storing and searching embeddings. It enables efficient similarity search rather than just exact keyword matching. Examples include Pinecone, ChromaDB, and Weaviate.
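
---

The Chunking entry above describes splitting documents before embedding. A minimal sketch in Python of fixed-size character chunking with overlap (the sizes here are arbitrary illustrations; real pipelines often chunk by tokens or by document structure instead):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks.

    Consecutive chunks overlap by `overlap` characters so that
    information falling on a chunk boundary still appears whole
    in at least one chunk.
    """
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("word " * 100)  # a 500-character toy document
print(len(chunks))                  # → 4 chunks of up to 200 characters
```

Each chunk would then be embedded and stored in a vector database for retrieval.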
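The Embedding, Cosine Similarity, and Vector Database entries come together in semantic search. A minimal sketch in plain Python; the three-dimensional vectors below are hand-made stand-ins for illustration, not output from any real embedding model:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": similar meanings point in similar directions.
documents = {
    "river bank":     [0.9, 0.1, 0.0],
    "savings bank":   [0.1, 0.9, 0.0],
    "mountain trail": [0.0, 0.1, 0.9],
}

query = [0.8, 0.2, 0.1]  # imagine: the embedding of "water's edge"

# Rank documents by similarity to the query -- what a vector
# database does efficiently at scale with approximate indexes.
ranked = sorted(documents.items(),
                key=lambda kv: cosine_similarity(query, kv[1]),
                reverse=True)
print(ranked[0][0])  # → "river bank"
```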
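The Hybrid Search entry mentions Reciprocal Rank Fusion. A minimal sketch of the standard RRF formula, score(d) = Σᵢ 1/(k + rankᵢ(d)), with the conventional constant k = 60; the document IDs and rankings below are invented for illustration:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists: score(d) = sum over lists of 1/(k + rank)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["doc3", "doc1", "doc7"]  # e.g. a BM25 ranking
vector_results  = ["doc1", "doc4", "doc3"]  # e.g. a cosine-similarity ranking

fused = reciprocal_rank_fusion([keyword_results, vector_results])
print(fused[0])  # → "doc1", which ranks highly in both lists
```

Because RRF uses only rank positions, it needs no score normalization between the keyword and vector retrievers.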