Module Review: Scaling & Operations

[!NOTE] This review recaps the core ideas of the Scaling & Operations module: the three autoscalers (HPA, VPA, Cluster Autoscaler), the Metrics Server, and DaemonSets.

You’ve mastered the art of Kubernetes Scaling. From pod-level resizing to cluster-wide expansion, let’s solidify these concepts.

Key Takeaways

HPA Scales Out: It adds replicas based on utilization = current usage / request. Utilization targets need resources.requests to be set.
VPA Scales Up: It changes pod requests based on historical usage. Applying a new request normally means recreating the pod; with updateMode: Off it only publishes recommendations.
CA Scales Nodes: It reacts to Pending Pods, not high CPU. It adds nodes when pods can’t schedule.
Metrics Server is Critical: It supplies the current CPU and memory usage that HPA and VPA read. It holds no history (only the most recent sample, roughly the last 60s).
DaemonSets are Unique: They have no replica count. The DaemonSet controller creates one pod on every eligible node.
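
The HPA takeaway can be made concrete with a small calculation. The sketch below follows the replica formula described in the Kubernetes HPA documentation (desired = ceil(current replicas × current metric / target metric), with a default tolerance of 0.1). The function names and the sample numbers are illustrative assumptions, and the real controller also handles missing metrics, pods that are not ready, and stabilization windows, none of which are modeled here.

```python
import math


def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """Utilization as HPA sees it: usage relative to the request, not to node capacity."""
    if request_millicores <= 0:
        raise ValueError("a positive CPU request is required to compute utilization")
    return usage_millicores * 100 / request_millicores


def desired_replicas(current_replicas: int, current_percent: float,
                     target_percent: float, tolerance: float = 0.1) -> int:
    """Simplified HPA decision: scale by the ratio of current to target utilization."""
    ratio = current_percent / target_percent
    # Inside the tolerance band the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical workload: each pod uses 450m against a 500m request, target is 60%.
percent = cpu_utilization_percent(450, 500)   # 90.0
print(desired_replicas(4, percent, 60.0))     # 6
```

Because the ratio is taken against the request, the same 450m of usage would look very different to HPA if the request were changed.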

1. Flashcards

Test your recall.

What triggers the Cluster Autoscaler?

Pending Pods

CA only scales up when a pod cannot be scheduled due to insufficient resources. It does NOT scale on high CPU usage alone.
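
To illustrate why the trigger is unschedulable pods and not CPU load, here is a deliberately simplified sketch. It only compares CPU requests against what each node has left to allocate; the `Node` class, its fields, and the numbers are hypothetical, and the real scheduler and Cluster Autoscaler also weigh memory, affinity, taints, and node group limits.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int        # millicores the node can hand out to pods
    requested_cpu_m: int          # sum of CPU requests of pods already placed
    cpu_usage_percent: int = 0    # live usage; deliberately ignored below


def fits(pod_request_m: int, node: Node) -> bool:
    """A pod fits if its request is covered by the node's unreserved CPU."""
    return node.allocatable_cpu_m - node.requested_cpu_m >= pod_request_m


def needs_scale_up(pending_pod_requests_m: list[int], nodes: list[Node]) -> bool:
    """Scale up only if some pending pod fits on no existing node."""
    return any(
        not any(fits(request, node) for node in nodes)
        for request in pending_pod_requests_m
    )


hot_node = Node("node-a", allocatable_cpu_m=4000, requested_cpu_m=1000, cpu_usage_percent=95)
full_node = Node("node-b", allocatable_cpu_m=4000, requested_cpu_m=3500, cpu_usage_percent=10)

print(needs_scale_up([], [hot_node]))        # False: high CPU, but nothing is pending
print(needs_scale_up([1000], [full_node]))   # True: the pod cannot be placed anywhere
```

The first call shows a node running hot without triggering anything, while the second shows a nearly idle node whose capacity is already reserved by requests.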

Why is HPA + VPA on CPU dangerous?

Feedback Loop

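HPA adds pods when CPU utilization is high. VPA raises requests when CPU usage is high, which in turn changes the utilization that HPA computes. Both controllers react to the same signal, so you can end up with larger pods AND more replicas, wasting resources.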
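
A few lines of arithmetic show how the two controllers interfere. This is a sketch under stated assumptions: the usage, request, and target values are invented, the 0.1 tolerance mirrors the HPA default, and `hpa_direction` is a hypothetical helper, not part of any Kubernetes client.

```python
def hpa_direction(usage_m: float, request_m: float, target_percent: float,
                  tolerance: float = 0.1) -> str:
    """Which way a utilization-based HPA would lean for one pod's usage and request."""
    utilization_percent = usage_m * 100 / request_m
    ratio = utilization_percent / target_percent
    if abs(ratio - 1.0) <= tolerance:
        return "hold"
    return "scale out" if ratio > 1.0 else "scale in"


# The pod's real usage stays at 400m throughout; only the request changes.
print(hpa_direction(400, 500, 60))    # scale out: 80% utilization against a 60% target
# VPA reacts to the same high usage and doubles the request to 1000m.
print(hpa_direction(400, 1000, 60))   # scale in: utilization drops to 40%
```

The workload did not change between the two calls, yet HPA's decision flipped, which is why the usual advice is to avoid driving both autoscalers from the same CPU metric.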

What happens if `resources.requests` is missing?

HPA Fails

HPA cannot calculate a utilization percentage without a request value, so utilization-based CPU scaling will not work.
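
One way to catch this before deploying is to lint the pod spec. The helper below is a hypothetical sketch that works on a pod spec already loaded into a Python dict (for example from YAML). It treats a CPU limit as sufficient, on the assumption that Kubernetes copies a limit into the request when no request is given.

```python
def containers_missing_cpu_request(pod_spec: dict) -> list[str]:
    """Return names of containers that would leave CPU utilization undefined for HPA."""
    missing = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        requests = resources.get("requests") or {}
        limits = resources.get("limits") or {}
        # Assumption: a CPU limit without a request is defaulted to request == limit.
        if "cpu" not in requests and "cpu" not in limits:
            missing.append(container["name"])
    return missing


# Illustrative pod spec; container names and values are made up.
pod_spec = {
    "containers": [
        {"name": "web", "resources": {"requests": {"cpu": "250m"}}},
        {"name": "sidecar", "resources": {"requests": {"memory": "64Mi"}}},
        {"name": "logger"},
    ]
}

print(containers_missing_cpu_request(pod_spec))  # ['sidecar', 'logger']
```

A single container without a CPU request is enough to leave the pod's utilization undefined, so the check reports every offender instead of stopping at the first.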

Does Metrics Server store history?

No

It only stores the latest scrape (window of ~60s). For history, you need Prometheus.

How do you run a pod on the Master (control-plane) node?

Tolerations

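You must add a `toleration` for the node's `NoSchedule` taint: `node-role.kubernetes.io/control-plane:NoSchedule` on current clusters, or `node-role.kubernetes.io/master:NoSchedule` on older ones.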
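
The matching rule behind tolerations can be sketched in a few lines. The function below is a simplified, hypothetical re-implementation of how a toleration is compared with a taint (key, operator, value, effect); it is not the scheduler's code. The example uses the `node-role.kubernetes.io/control-plane` key, which recent kubeadm-based clusters apply in place of the older `master` key, so check the taints on your own nodes (for example with `kubectl describe node`) before copying it.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified check: does this toleration match this taint?"""
    # An empty effect on the toleration matches every effect.
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect"):
        return False

    key = toleration.get("key", "")
    operator = toleration.get("operator", "Equal")

    # An empty key is only meaningful with Exists, where it matches any taint key.
    if not key:
        return operator == "Exists"
    if key != taint.get("key"):
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


control_plane_taint = {"key": "node-role.kubernetes.io/control-plane", "effect": "NoSchedule"}

# The toleration as it would appear under spec.tolerations in a pod manifest.
toleration = {
    "key": "node-role.kubernetes.io/control-plane",
    "operator": "Exists",
    "effect": "NoSchedule",
}

print(tolerates(toleration, control_plane_taint))                            # True
print(tolerates({"key": "dedicated", "operator": "Exists"}, control_plane_taint))  # False
```

A toleration only permits scheduling onto the tainted node. It does not force the pod there, so a node selector or node affinity is still needed if the pod must land on that node.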

2. Cheat Sheet: The Scaling Triad

| Feature | HPA | VPA | Cluster Autoscaler |
| --- | --- | --- | --- |
| Direction | Horizontal (More Replicas) | Vertical (Larger Replicas) | Infrastructure (More Nodes) |
| Trigger | CPU/Mem Utilization > Target | Historical Usage > Request | Pending Pods |
| Action | Updates `replicas` in Deployment | Updates `requests` in Pod Spec | Calls Cloud Provider API |
| Downtime | None (Zero downtime) | Yes (Pod Restart) | None (for existing pods) |
| Best For | Stateless Microservices | Java Apps / Databases / Monoliths | Any Cluster |

3. Quick Revision

Horizontal Pod Autoscaler (HPA) scales applications outwards (more replicas) by tracking metrics like CPU, Memory, or custom values. Utilization-based targets require resource requests to be set.
Vertical Pod Autoscaler (VPA) scales applications upwards (larger resources) by observing historical data and modifying requests/limits. It can restart pods unless it runs in Off mode, where it only recommends values.
Cluster Autoscaler (CA) manages infrastructure elasticity, expanding node counts when pods are stuck in Pending state and shrinking them when nodes are under-utilized.
Metrics Server acts as an aggregator of current resource usage across the cluster; it holds no historical data and provides the foundation for HPA and VPA operations.
DaemonSets differ from Deployments by guaranteeing that one pod instance runs on every target node. They are useful for deploying monitoring agents, log collectors, or storage daemons.

4. Next Steps

Now that your cluster scales perfectly, how do you secure it? Moving forward, you will explore Security in Kubernetes.

Next Chapter: RBAC

5. Glossary Link

Review all essential terms related to Kubernetes scaling in the Kubernetes Glossary.
