Common Stages
While there are dozens of aggregation stages, you will spend most of your time using just five of them. Mastering these “Big 5” is the key to becoming proficient.
Analogy: The Factory Assembly Line. Think of the Aggregation Pipeline as a car factory assembly line. Your raw data (raw materials) enters the factory. At each “stage” (workstation), a specific operation is performed: filtering out bad parts ($match), grouping components together ($group), or reshaping the final product ($project). The output of one stage becomes the input for the next, until the final result is ready.
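The assembly-line idea can be made concrete with a minimal sketch in plain JavaScript. This is not MongoDB code and not how the server executes pipelines. Each hypothetical stage is a function that takes an array of documents and returns a new array. The made-up helper runPipeline feeds each stage's output into the next stage.

```javascript
// Plain-JavaScript model of the assembly line (not the MongoDB API):
// every stage maps an array of documents to a new array of documents.
function runPipeline(docs, stages) {
  // The output of each stage becomes the input of the next one.
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical stages, loosely mirroring a filter and a computed field.
const keepShipped = (docs) => docs.filter((d) => d.status === "shipped");
const addTotal = (docs) => docs.map((d) => ({ ...d, total: d.qty * d.price }));

const orders = [
  { item: "bolt", qty: 10, price: 2, status: "shipped" },
  { item: "nut", qty: 5, price: 1, status: "pending" },
];

// Logs only the shipped order, which now carries total: 20.
console.log(runPipeline(orders, [keepShipped, addTotal]));
```

Swapping the order of the stages in the array swaps the order of work. In a real pipeline, stage order matters in the same way.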
1. $match (Filter)
Analogy: Think of $match as a strict bouncer at a club—it checks IDs at the door and only lets the right documents into the rest of the pipeline.
The $match stage filters the document stream, passing along only the documents that satisfy the specified condition(s). It is the aggregation equivalent of the query filter you pass to find(), or of a SQL WHERE clause.
// Filter for active users over 21
{
  $match: {
    status: "active",
    age: { $gt: 21 }
  }
}
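The filter above combines two conditions, and both must hold (an implicit AND). The minimal plain-JavaScript sketch below models that behaviour. It assumes only top-level fields, plain equality, and the $gt/$gte/$lt/$lte operators. The helpers matches and match are hypothetical and not part of any MongoDB driver. Real $match supports far more (nested paths, arrays, $or, $in) and, unlike this loop, can use an index.

```javascript
// Plain-JavaScript model of $match (not MongoDB's implementation).
// Assumes top-level fields, plain equality, and four comparison operators.
const comparators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

// True if cond looks like { $gt: 21 } rather than a literal value.
function isOperatorObject(cond) {
  return (
    cond !== null &&
    typeof cond === "object" &&
    Object.keys(cond).length > 0 &&
    Object.keys(cond).every((key) => Object.prototype.hasOwnProperty.call(comparators, key))
  );
}

function matches(doc, filter) {
  // Every field condition must hold: an implicit AND.
  return Object.entries(filter).every(([field, cond]) => {
    const value = doc[field];
    if (isOperatorObject(cond)) {
      // In this sketch, a missing field never satisfies a comparison.
      return (
        value !== undefined &&
        Object.entries(cond).every(([op, operand]) => comparators[op](value, operand))
      );
    }
    return value === cond; // plain equality, e.g. status: "active"
  });
}

function match(docs, filter) {
  return docs.filter((doc) => matches(doc, filter));
}

const users = [
  { name: "Ana", status: "active", age: 30 },
  { name: "Ben", status: "active", age: 21 },
  { name: "Cy", status: "inactive", age: 40 },
];

console.log(match(users, { status: "active", age: { $gt: 21 } }).map((u) => u.name));
// [ 'Ana' ]
```

Ben is excluded because $gt is strict (21 is not greater than 21). Cy is excluded by the status condition.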
[!IMPORTANT] Performance Rule #1: Always place $match as early as possible (ideally first).
- At the start of the pipeline, it can use indexes to find documents efficiently instead of scanning the whole collection.
- It reduces the number of documents subsequent stages have to process.
Figure: Index Scan (Good) vs. Collection Scan (Bad)
2. $group (Aggregate)
Analogy: Think of $group as sorting a massive pile of loose coins into separate buckets for quarters, dimes, and nickels, and then counting the total value in each bucket.
The $group stage groups input documents by a specified _id expression and applies accumulators to each group. This is your SQL GROUP BY.
The _id Field
The _id field is mandatory. It determines the “bucket” that documents fall into.
- _id: "$category": Group by the category field.
- _id: { region: "$region", year: "$year" }: Group by region AND year.
- _id: null: Group all documents into one single bucket (useful for global totals).
Accumulators
You can calculate values for each group using accumulators:
- $sum: Adds numeric values (or counts documents if you use $sum: 1).
- $avg: Calculates the average.
- $min / $max: Finds the minimum / maximum value.
- $push: Creates an array of values from the group.
- $addToSet: Creates an array of unique values.
{
  $group: {
    _id: "$department",                  // Group by department
    totalBudget: { $sum: "$budget" },    // Sum budget
    avgSalary: { $avg: "$salary" },      // Average salary
    employees: { $push: "$name" }        // List of employee names
  }
}
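The bucket-and-accumulate behaviour of $group can be modelled in a short plain-JavaScript sketch. It is not MongoDB's implementation and makes these assumptions:
- A hypothetical keyOf function stands in for the _id expression.
- Each accumulator is written as a made-up [operator, field] pair, or [operator, number] for constants such as $sum: 1.
- Only the accumulators listed above are supported, over top-level fields.

```javascript
// In-memory model of $group (not MongoDB's implementation).
// keyOf stands in for the _id expression; each accumulator is written as
// [operator, field], or [operator, number] for constants such as $sum: 1.
function group(docs, keyOf, accumulators) {
  // Bucket documents by their grouping key. JSON.stringify lets compound
  // keys like { region, year } and the null key land in the right bucket.
  const buckets = new Map();
  for (const doc of docs) {
    const id = keyOf(doc);
    const bucketKey = JSON.stringify(id);
    if (!buckets.has(bucketKey)) buckets.set(bucketKey, { _id: id, members: [] });
    buckets.get(bucketKey).members.push(doc);
  }

  // Apply every accumulator to every bucket.
  return [...buckets.values()].map(({ _id, members }) => {
    const out = { _id };
    for (const [name, [op, arg]] of Object.entries(accumulators)) {
      const values = members.map((d) => (typeof arg === "number" ? arg : d[arg]));
      switch (op) {
        case "$sum": out[name] = values.reduce((a, b) => a + b, 0); break;
        case "$avg": out[name] = values.reduce((a, b) => a + b, 0) / values.length; break;
        case "$min": out[name] = values.reduce((a, b) => (b < a ? b : a)); break;
        case "$max": out[name] = values.reduce((a, b) => (b > a ? b : a)); break;
        case "$push": out[name] = values; break;
        case "$addToSet": out[name] = [...new Set(values)]; break;
        default: throw new Error(`unsupported accumulator: ${op}`);
      }
    }
    return out;
  });
}

const staff = [
  { name: "Ana", department: "eng", salary: 100 },
  { name: "Ben", department: "eng", salary: 80 },
  { name: "Cy", department: "ops", salary: 60 },
];

// Roughly { $group: { _id: "$department", avgSalary: { $avg: "$salary" },
//                     headcount: { $sum: 1 }, employees: { $push: "$name" } } }
console.log(group(staff, (d) => d.department, {
  avgSalary: ["$avg", "salary"],
  headcount: ["$sum", 1],
  employees: ["$push", "name"],
}));

// _id: null puts everything into one bucket, e.g. a global payroll total.
console.log(group(staff, () => null, { payroll: ["$sum", "salary"] }));
```

This sketch emits buckets in first-seen order. MongoDB does not guarantee the order of $group output, so add a $sort when order matters. The element order inside $addToSet arrays is not guaranteed either.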
3. Interactive: $group Bucket Visualizer
[Interactive demo: raw items from an input stream are sorted into buckets based on the grouping key.]
4. $project (Reshape)
Analogy: Think of $project as a packaging department. It takes the raw product, removes unnecessary wrapping, slaps a new label on it, and sends out a polished final item.
The $project stage passes along the documents with the requested fields to the next stage. It can:
- Select fields (like SQL SELECT).
- Rename fields.
- Compute new fields using expressions.
- Hide sensitive fields (e.g., exclude password).
{
  $project: {
    _id: 0,                            // Exclude _id
    fullName: "$name",                 // Rename 'name' to 'fullName'
    status: 1,                         // Include 'status'
    isAdult: { $gte: ["$age", 18] }    // Compute boolean field
  }
}
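The projection above can be modelled in a minimal plain-JavaScript sketch of an inclusion-style $project, assuming only three kinds of rule:
- 1 keeps a field.
- A "$field" string copies or renames another field.
- A function is a hypothetical stand-in for an aggregation expression such as $gte.

The project helper is made up for illustration. It ignores exclusion-style projections and nested paths.

```javascript
// Plain-JavaScript model of an inclusion-style $project (not MongoDB's
// implementation). Supported rules: 1 (keep a field), "$field" (copy or
// rename from another field), and a function standing in for an expression.
function project(docs, spec) {
  return docs.map((doc) => {
    const out = {};
    // _id is kept by default unless explicitly excluded with _id: 0.
    const excludeId = spec._id === 0 || spec._id === false;
    if (!excludeId && "_id" in doc) out._id = doc._id;

    for (const [field, rule] of Object.entries(spec)) {
      if (field === "_id") continue;
      if (rule === 1 || rule === true) {
        if (field in doc) out[field] = doc[field]; // keep the field if present
      } else if (typeof rule === "string" && rule.startsWith("$")) {
        const source = rule.slice(1);
        if (source in doc) out[field] = doc[source]; // e.g. fullName: "$name"
      } else if (typeof rule === "function") {
        out[field] = rule(doc); // computed field
      }
    }
    // Fields not named in spec (such as password) are dropped.
    return out;
  });
}

const profiles = [
  { _id: 1, name: "Ana", status: "active", age: 30, password: "s3cret" },
  { _id: 2, name: "Ben", status: "pending", age: 16, password: "hunter2" },
];

console.log(
  project(profiles, {
    _id: 0,
    fullName: "$name",
    status: 1,
    // Hypothetical stand-in for { $gte: ["$age", 18] }.
    isAdult: (doc) => doc.age >= 18,
  }),
);
```

Because only the named fields are copied, password never reaches the next stage. The same effect is why $project is a convenient place to hide sensitive fields.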
5. $unwind (Expand)
Analogy: Think of $unwind as unboxing a multi-pack. If a document is a 6-pack of soda (an array), $unwind breaks it open and sends 6 individual cans down the conveyor belt.
$unwind is specific to working with arrays, which document databases store directly inside records. It “deconstructs” an array field from each input document and outputs one document per array element.
Example:
Input: { id: 1, tags: ["A", "B"] }
Output:
{ id: 1, tags: "A" }
{ id: 1, tags: "B" }
This is crucial when you want to group or filter by individual array elements.
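A minimal plain-JavaScript sketch of $unwind on a top-level field follows. It is not MongoDB's implementation, and the unwind helper and sample posts are hypothetical. It follows the stage's default behaviour:
- A missing or null field produces no output.
- An empty array produces no output.
- A non-array value is treated as a one-element array.

It then counts tags to show why unwinding makes per-element grouping straightforward.

```javascript
// Plain-JavaScript model of $unwind on a top-level field (not MongoDB's
// implementation), following its default behaviour.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    // Missing or null fields produce no output documents.
    if (value === undefined || value === null) return [];
    // A non-array value is treated like a one-element array.
    if (!Array.isArray(value)) return [doc];
    // One copy of the document per element; an empty array yields nothing.
    return value.map((element) => ({ ...doc, [field]: element }));
  });
}

const posts = [
  { id: 1, tags: ["A", "B"] },
  { id: 2, tags: ["B"] },
  { id: 3, tags: [] },
];

const unwound = unwind(posts, "tags");
console.log(unwound); // three documents: (1, "A"), (1, "B"), (2, "B")

// Once each tag is its own document, counting per tag is easy, much like
// { $group: { _id: "$tags", count: { $sum: 1 } } } after $unwind.
const tagCounts = {};
for (const doc of unwound) {
  tagCounts[doc.tags] = (tagCounts[doc.tags] ?? 0) + 1;
}
console.log(tagCounts); // { A: 1, B: 2 }
```

Post 3 disappears entirely because its array is empty. MongoDB's preserveNullAndEmptyArrays option exists to keep such documents.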
6. $sort (Order)
Analogy: Think of $sort as a mailroom clerk organizing packages by zip code before they go out for delivery.
The $sort stage reorders the document stream.
- 1: Ascending (A-Z, 0-9)
- -1: Descending (Z-A, 9-0)
{ $sort: { age: -1, name: 1 } } // Sort by age desc, then name asc
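A compound sort such as { age: -1, name: 1 } compares on the first key and uses later keys only to break ties. The minimal plain-JavaScript sketch below, built around a hypothetical sortDocs helper, shows that comparison. It assumes each field holds values of a single type; MongoDB additionally defines an ordering across different BSON types.

```javascript
// Plain-JavaScript model of a multi-key $sort (not MongoDB's implementation).
// Assumes every compared field holds values of the same type.
function sortDocs(docs, spec) {
  // Key order matters: earlier keys win, later keys only break ties.
  const keys = Object.entries(spec); // e.g. [["age", -1], ["name", 1]]
  return [...docs].sort((a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every key
  });
}

const people = [
  { name: "Cy", age: 30 },
  { name: "Ana", age: 30 },
  { name: "Ben", age: 25 },
];

// Same spec as { $sort: { age: -1, name: 1 } }
console.log(sortDocs(people, { age: -1, name: 1 }).map((p) => p.name));
// [ 'Ana', 'Cy', 'Ben' ]
```

The sketch copies the input before sorting, so the original array is left untouched.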
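[!WARNING] Memory Limit Alert: when it cannot use an index, $sort is a blocking stage. It must read every input document before it can output the first one. If such a sort needs more than 100 MB of memory and disk use is not allowed, the query fails. To avoid this:
- Pass { allowDiskUse: true } as an option to aggregate() (slower, because it writes temporary files to disk). Starting in MongoDB 6.0, disk use is allowed by default.
- Ensure the sort can use an index: place it at the start of the pipeline or right after $match, before any $project, $unwind, or $group (preferred).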