Advanced Patterns
Once you’ve mastered the basics, the Aggregation Framework opens up a world of complex data analysis capabilities. To truly understand these patterns, we need a concrete engineering scenario.
The Scenario: “E-Commerce Analytics Dashboard” Imagine you are a Senior Engineer tasked with building the backend for an E-Commerce seller dashboard. The dashboard needs to display a list of recent orders (with the customer’s full name attached), a demographic breakdown of the customers, and a paginated list of products alongside aggregate stats (like average price). Doing this in application code would require dozens of round-trips to the database. Instead, we’ll push this complex computation down to the database using advanced aggregation patterns.
We’ll approach this using elements of the PEDALS framework—specifically focusing on the Data Model and Architecture of our queries.
1. $lookup: The Power of Joins (Architecture & Data Model)
MongoDB is a document database, so we usually encourage embedding data (denormalization) to optimize read performance. However, in a real-world e-commerce system, embedding every order inside a User document leads to unbounded document growth (the 16MB limit anti-pattern).
Therefore, we keep Users and Orders in separate collections. But when the dashboard requests data, we need to stitch them back together. $lookup performs a Left Outer Join to another collection in the same database.
{
$lookup: {
from: "orders", // Target collection
localField: "_id", // Field in THIS collection (users)
foreignField: "userId", // Field in TARGET collection (orders)
as: "orderHistory" // Output array field name
}
}
[!NOTE] The result of
$lookupis always an array, even if only one document matches. You often need to$unwindit if you want to merge the fields.
Users Collection (from)
Orders Collection (target)
"_id": 1,
"name": "Alice",
"orderHistory": [
{ "id": 101, "user_id": 1, "total": 50 },
{ "id": 102, "user_id": 1, "total": 120 }
]
}
2. $bucket (Histograms)
Grouping by exact values is great, but sometimes you want to group by ranges. $bucket automatically categorizes data into ranges, perfect for histograms.
{
$bucket: {
groupBy: "$age", // Field to group by
boundaries: [0, 18, 30, 50, 80], // Define ranges: 0-17, 18-29, 30-49, 50-79
default: "Other", // Where to put outliers (80+)
output: {
count: { $sum: 1 },
names: { $push: "$name" }
}
}
}
Customer Age Demographics ($bucket)
3. $facet (Multi-Pipeline)
$facet is a game-changer for dashboards. It allows you to run multiple parallel aggregations on the same input dataset within a single query.
Imagine loading a product search page. You need:
- The list of products (paginated).
- The total count of products.
- A breakdown of products by category (for sidebar filters).
With $facet, you do this in one go:
{
$facet: {
// Pipeline 1: Get actual data
"products": [
{ $match: { price: { $lt: 100 } } },
{ $skip: 0 },
{ $limit: 10 }
],
// Pipeline 2: Get stats
"stats": [
{ $match: { price: { $lt: 100 } } },
{ $group: { _id: null, avgPrice: { $avg: "$price" } } }
]
}
}
4. Conditional Logic ($cond)
In our e-commerce dashboard, we don’t just want to show raw numbers; we want to provide actionable insights. If a product’s inventory drops below 10, the UI needs to display a red “Low Stock” warning.
You can use conditional logic inside $project or $addFields to create dynamic, derived fields directly in the database. It functions exactly like a programming ternary operator: (if ? then : else).
{
$addFields: {
// Dynamically calculate the inventory status string
inventoryStatus: {
$cond: {
if: { $gte: ["$stockQuantity", 10] },
then: "Healthy",
else: "Critical Low Stock" // Triggers UI warning
}
}
}
}
Advanced: Nested Conditions
You aren’t limited to a single if/else. You can nest $cond or use $switch for complex logic (e.g., categorizing stock into “High”, “Medium”, “Low”, and “Out of Stock”).
5. Performance Pitfalls
With great power comes great responsibility. Watch out for these common issues:
- The Cartesian Product: If you
$unwinda large array, you multiply the number of documents in your pipeline. 100 documents with an array of 100 items each becomes 10,000 documents! - $lookup on Unindexed Fields: Always ensure the
foreignFieldin your$lookupis indexed. Otherwise, MongoDB has to scan the entire target collection for every input document. - Memory Limits: Remember the 100MB limit for blocking stages. Use indexes to avoid sorting in memory whenever possible.