Fine-Tuning Basics

Fine-tuning takes a pre-trained Large Language Model (LLM), which has already learned broad general knowledge from massive text corpora, and trains it further on a smaller domain-specific or task-specific dataset.

Why Fine-Tune?

Pre-trained foundation models are excellent text completers, but they are not inherently helpful assistants. Without fine-tuning, an LLM prompted with a question might simply generate a similar question rather than an answer.

Fine-tuning bridges this gap by adapting the model to specific formats (like Q&A or chat), domains (like medical or legal text), and behavioral guidelines (safety, harmlessness).

Supervised Fine-Tuning (SFT)

The most common initial step is Supervised Fine-Tuning. This involves training the model on high-quality (prompt, response) pairs. The model is trained to minimize the cross-entropy loss on the reference response tokens, conditioned on the prompt tokens; the prompt tokens themselves are typically excluded from the loss.
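To make the loss concrete, here is a minimal sketch in plain Python. It assumes the common convention that logits[t] scores the token at position t + 1 and that positions belonging to the prompt are masked out; the function name sft_loss and its arguments are illustrative, not taken from any particular library.

```python
import math

def sft_loss(logits, token_ids, prompt_len):
    """Mean next-token cross-entropy over response positions only.

    logits[t] is the unnormalized score vector used to predict token_ids[t + 1].
    token_ids is the prompt followed by the response; the first prompt_len
    entries are prompt tokens and contribute nothing to the loss.
    """
    total, count = 0.0, 0
    for t in range(len(token_ids) - 1):
        target_pos = t + 1
        if target_pos < prompt_len:
            continue  # the target is a prompt token, so it is masked out
        row = logits[t]
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        # Negative log-probability of the reference token.
        total += log_z - row[token_ids[target_pos]]
        count += 1
    if count == 0:
        raise ValueError("no response tokens to train on")
    return total / count

# Toy usage: vocabulary of 4, two prompt tokens, two response tokens.
uniform = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
loss = sft_loss(uniform, token_ids=[0, 1, 2, 3], prompt_len=2)
```

With uniform logits the model assigns probability 1/4 to every token, so the loss equals log(4). In a real training loop the same masking is usually applied to batched tensors rather than Python lists.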

Instruction Tuning

A specific flavor of SFT is Instruction Tuning. Here, the dataset consists of various tasks phrased as instructions (e.g., “Summarize the following text:…”, “Translate to French:…”). This enables the model to generalize to unseen tasks simply by following instructions, rather than needing task-specific heads.
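As an illustration, an instruction-tuning record can be produced by rendering each task into one shared text template. The template below, with its "### Instruction:" style section markers, is an assumption made for this sketch and not a format the text prescribes; format_example is a hypothetical helper.

```python
def format_example(instruction, input_text, response):
    """Render one task as a (prompt, response) pair using a shared template."""
    prompt = f"### Instruction:\n{instruction}\n\n"
    if input_text:
        # Tasks that operate on a passage carry it in a separate section.
        prompt += f"### Input:\n{input_text}\n\n"
    prompt += "### Response:\n"
    return {"prompt": prompt, "response": response}

# Different tasks share one format, so no task-specific head is needed.
examples = [
    format_example("Translate to French:", "Hello", "Bonjour"),
    format_example("Name a primary color.", "", "Red"),
]
```

Because every task arrives in the same textual shape, the model only has to learn to read the instruction and continue after the response marker.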

The Risk: Catastrophic Forgetting

A major challenge in fine-tuning is Catastrophic Forgetting. When a neural network is heavily trained on a new, narrow task, it aggressively updates its weights to minimize loss on that specific data. This can disrupt the delicate balance of weights that encode previously learned general knowledge.
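The effect can be reproduced in a deliberately tiny setting. The sketch below is a toy, not an LLM: a one-parameter model y = w * x is fit to a "general" task A and then trained only on a conflicting "narrow" task B. The tasks, learning rate, and step count are all assumptions chosen for illustration.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, lr=0.05, steps=200):
    """Plain gradient descent on the mean squared error."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

task_a = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]   # "general" task: y = 2x
task_b = [(x, -1.0 * x) for x in (1.0, 2.0, 3.0)]  # "narrow" task: y = -x

w = train(0.0, task_a)
loss_a_before = mse(w, task_a)  # near zero: task A has been learned

w = train(w, task_b)            # continue training on task B only
loss_a_after = mse(w, task_a)   # much larger: task A has been overwritten
```

A real network has far more parameters, so the damage is usually partial rather than total, but the mechanism is the same: updates that lower the loss on the new data move weights that the old behavior depended on.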

If you fine-tune an LLM exclusively on a specialized corpus (like Python code) without regularizing or mixing in general data, it might become excellent at Python but lose its ability to write coherent essays or answer basic history questions.
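One simple way to mix in general data is to build the training set from both sources at a fixed ratio. The helper below is a minimal sketch; the name mix_datasets, the general_fraction parameter, and the 10% ratio in the usage example are illustrative assumptions, not recommended values.

```python
import random

def mix_datasets(specialized, general, general_fraction, seed=0):
    """Return a shuffled list in which about general_fraction of items are general.

    All specialized examples are kept; general examples are sampled without
    replacement, capped at the number available.
    """
    if not 0.0 <= general_fraction < 1.0:
        raise ValueError("general_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_general / (n_specialized + n_general) = general_fraction.
    n_general = round(len(specialized) * general_fraction / (1.0 - general_fraction))
    n_general = min(n_general, len(general))
    mixed = list(specialized) + rng.sample(list(general), n_general)
    rng.shuffle(mixed)
    return mixed

# Toy usage: 90 code examples plus enough general text to make up 10%.
code_examples = [("code", i) for i in range(90)]
general_examples = [("general", i) for i in range(100)]
train_set = mix_datasets(code_examples, general_examples, general_fraction=0.1)
```

The right ratio depends on the model and the data, so it is normally tuned by checking performance on both the specialized task and a held-out general benchmark.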

Interactive Visualization: Catastrophic Forgetting