Module 17 Review: Ops Excellence
Congratulations! You have mastered the “Day 2” operations that keep systems alive. Before moving to the Final Assessment, let’s review.
1. Interactive Flashcards
High Cardinality
Why it kills metrics:
Tagging metrics with unique IDs (e.g., `user_id`) creates one time series per distinct value, potentially millions, which can exhaust the TSDB's memory (OOM).
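To see why, remember that a TSDB stores one series per unique combination of label values. The sketch below builds hypothetical request events (3 endpoints, 4 status codes, 10,000 users) and counts distinct label tuples with and without `user_id`. It models only the counting, not the storage internals of any particular TSDB.

```python
import random


def series_keys(events, labels):
    """Distinct time series a TSDB would create: one per unique label-value tuple."""
    return {tuple(event[label] for label in labels) for event in events}


random.seed(0)
# Hypothetical events: 3 endpoints x 4 status codes, and one unique user per event.
events = [
    {
        "endpoint": random.choice(["/pay", "/refund", "/status"]),
        "status": random.choice(["200", "400", "500", "503"]),
        "user_id": f"user-{i}",
    }
    for i in range(10_000)
]

low = series_keys(events, ["endpoint", "status"])              # at most 3 x 4 = 12 series
high = series_keys(events, ["endpoint", "status", "user_id"])  # one series per user
print(len(low), len(high))
```

The bounded labels cap the series count no matter how much traffic arrives, while `user_id` makes it grow with the user base.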
Circuit Breaker
Fail Fast:
Prevents cascading failure by stopping requests to a failing service. States: Closed (OK), Open (Blocked), Half-Open (Testing).
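The three states map onto a small state machine. Below is a minimal, single-threaded sketch. The `CircuitBreaker` class, the "N consecutive failures" trip rule, and the timeout values are illustrative assumptions, not the API of any particular resilience library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is Open: the call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay Open before testing
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # timeout elapsed: allow a trial request
            else:
                raise CircuitOpenError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial in Half-Open, or too many failures in Closed, (re)opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

    def _on_success(self):
        # Any success (including a Half-Open trial) closes the circuit and resets the count.
        self.failures = 0
        self.state = "CLOSED"
```

A production breaker would usually also be thread-safe and cap how many trial calls run concurrently in Half-Open.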
mTLS
Zero Trust:
Mutual TLS. Both client and server present certificates and authenticate each other, so a host inside the network cannot call or impersonate a service without a valid certificate.
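In Python's standard `ssl` module, the difference between plain TLS and mTLS comes down to the server setting `verify_mode = CERT_REQUIRED` and the client loading its own certificate. The function names below are hypothetical, and the certificate/key paths are placeholders for files issued by an internal CA. In many deployments a service-mesh sidecar handles this instead of application code.

```python
import ssl


def mtls_server_context(cafile=None, certfile=None, keyfile=None):
    """Server side: present our certificate AND demand one from every client."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails if the client has no valid cert
    if cafile:
        ctx.load_verify_locations(cafile=cafile)  # CA that signs client certificates
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)    # server's own identity
    return ctx


def mtls_client_context(cafile=None, certfile=None, keyfile=None):
    """Client side: verify the server as usual AND present our own certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)    # client's identity
    return ctx
```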
Canary Deployment
Low Risk Release:
Rolling out a new version to a small % of users (e.g., 1%) to test stability before full rollout.
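One common way to pick "1% of users" is to hash a stable user ID into buckets, so each user sees the same version on every request. The sketch below assumes a hypothetical `in_canary` helper and salt string. Because a user in the first N buckets is also in the first M > N buckets, ramping from 1% to 5% to 100% only ever adds users to the canary.

```python
import hashlib


def in_canary(user_id, percent, salt="release-42"):
    """Deterministically decide whether user_id falls in the canary cohort.

    `salt` is a hypothetical per-release string so different rollouts pick
    different users. Buckets run 0..9999, i.e. 0.01% granularity.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100


# Route a request: canary build for ~1% of users, stable build for the rest.
version = "v2-canary" if in_canary("user-123", 1) else "v1-stable"
```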
Trace Context
Propagation:
Propagating the W3C `traceparent` header (which carries the `trace-id` and the caller's span ID as `parent-id`) to downstream services, so logs and spans across microservices can be linked into one trace.
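A `traceparent` value has the form `version-traceid-parentid-flags`. A service keeps the incoming trace ID and mints a new span ID for each outgoing call. The sketch below is simplified: it accepts only version `00`, and the helper names are hypothetical. OpenTelemetry SDKs normally do this propagation automatically.

```python
import re
import secrets

# "00" - 32 hex trace-id - 16 hex parent-id (span id) - 2 hex trace-flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Start a new trace, e.g. at the API gateway."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def child_traceparent(incoming):
    """Header for an outgoing call: same trace-id, fresh span id, same flags."""
    m = TRACEPARENT_RE.match(incoming or "")
    # The spec treats all-zero IDs as invalid; start a new trace in that case too.
    if not m or m.group(1) == "0" * 32 or m.group(2) == "0" * 16:
        return new_traceparent()
    trace_id, _parent_span_id, flags = m.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


# Usage: forward the header on every downstream request.
incoming = new_traceparent()
outgoing_headers = {"traceparent": child_traceparent(incoming)}
```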
Idempotency Key
Safe Retries:
A unique ID the client sends with a request (e.g., a payment) so that, if a network retry delivers it twice, the server returns the original result instead of processing it again.
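Server-side, the core idea is a lookup table from key to stored response, checked before doing any work. The `PaymentService` below is hypothetical and keeps the table in memory. A real system would typically store keys in a database with a unique constraint, expire them after some window, and reject a reused key whose request body differs.

```python
import threading
import uuid


class PaymentService:
    """Hypothetical payment handler that de-duplicates on an Idempotency-Key."""

    def __init__(self):
        self._results = {}             # idempotency key -> stored response
        self._lock = threading.Lock()  # so two concurrent retries can't both charge
        self.charges = 0               # how many times money actually moved

    def charge(self, idempotency_key, amount_cents):
        with self._lock:
            if idempotency_key in self._results:
                # Duplicate (e.g. a client retry after a timeout): replay, don't re-charge.
                return self._results[idempotency_key]
            self.charges += 1
            response = {"status": "charged", "amount_cents": amount_cents}
            self._results[idempotency_key] = response
            return response


# The client generates the key once and reuses it for every retry of this payment.
svc = PaymentService()
key = str(uuid.uuid4())
first = svc.charge(key, 500)
retry = svc.charge(key, 500)  # same key: returns the stored response
```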
2. Interactive Scenario: The Panic Button
It’s 3 AM. You are on-call. The system is down. What do you do?
⚠️ PAGERDUTY ALERT ⚠️
"High Latency Detected on Payment Service (p99 > 5s)"
3. System Design Cheat Sheet
| Category | Concept | Key Takeaway |
|:--------|:--------|:--------|
| **Observability** | **Logs** | Structured (JSON) for querying specific events. |
| | **Metrics** | Aggregates for trends. Avoid cardinality explosion: no unbounded labels like `user_id`. |
| | **Tracing** | Follow a request across microservices. Use sampling (head-based or tail-based). |
| | **OTel** | Vendor-neutral standard. Use SDK + Collector. |
| **Reliability** | **Circuit Breaker** | Stop cascading failures. States: Closed, Open, Half-Open. |
| | **Retry** | Only for transient errors. Always use **Exponential Backoff + Jitter**. |
| | **Idempotency** | Ensure `f(f(x)) = f(x)`. Use `Idempotency-Key` header. |
| **Security** | **TLS 1.3** | Encrypts data in transit. 1-RTT handshake. Forward secrecy. |
| | **OAuth 2.0** | Authorization (Valet Key). Flows: Auth Code (User), Client Creds (Service). |
| | **mTLS** | Mutual TLS. Zero Trust for service-to-service calls. |
| | **JWT** | Stateless token. `Header.Payload.Signature`. |
| **Deployment** | **Rolling** | Low cost, K8s default. Slow rollback. |
| | **Blue/Green** | Safe, instant rollback, 2x cost. |
| | **Canary** | Test in production with real users (1% -> 100%). Lowest risk. |
| | **GitOps** | Infrastructure as Code + automated sync from Git (e.g., Argo CD). |
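The cheat sheet's retry rule ("only for transient errors, exponential backoff + jitter") fits in a few lines. This sketch uses the "full jitter" variant, which sleeps a random time between 0 and the exponential ceiling. `TransientError` and `retry_with_backoff` are hypothetical names, and the default delays are illustrative. Pair it with an idempotency key whenever the retried call has side effects.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, HTTP 503, etc.)."""


def retry_with_backoff(fn, max_attempts=5, base=0.1, cap=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call fn(); retry only on TransientError, with exponential backoff + full jitter.

    Any other exception (e.g. a 400-style validation error) propagates immediately.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Ceiling doubles each attempt (base, 2*base, 4*base, ...) up to `cap`;
            # the random factor spreads clients out so they don't retry in lockstep.
            ceiling = min(cap, base * 2 ** attempt)
            sleep(rng() * ceiling)
```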
4. What’s Next?
You have completed the core technical modules! You are now ready for the Final Boss.
The next module is Module 18: Final Assessment, where we will simulate a real System Design Interview with a full Mock Scenario.