Functional vs Non-Functional Requirements
The “Golf Cart” Problem
You walk into a car dealership and ask for a car that “has 4 wheels, an engine, and drives forward.” The dealer hands you a golf cart. Technically, it meets your requirements. But you can’t drive it on the highway, it’s not safe in a crash, and it has zero storage.
In System Design:
- Functional Requirements (FRs): “It has 4 wheels” (The Verb - What it does).
- Non-Functional Requirements (NFRs): “It drives at 70mph” (The Adjective - How well it does it).
If you build a system that works (Features) but is slow, insecure, or fragile (NFRs), you have built a golf cart for a highway.
1. The Requirement Hierarchy
To design a world-class system, you must understand how high-level business goals trickle down into technical constraints.
Gold Standard: The Architecture Traceability Map
This diagram shows how a single “Business Need” creates a chain of requirements.
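One way to make this chain concrete is a small tree in which every technical constraint can be walked back to the business need that produced it. Below is a minimal Python sketch. The `Requirement` class, the `trace` helper, and the ticket-sale example are all hypothetical and are not taken from the map itself:

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    text: str
    kind: str  # "business", "FR", "NFR", or "constraint"
    children: list["Requirement"] = field(default_factory=list)


def trace(node: Requirement, path: tuple = ()):
    """Yield every root-to-leaf chain: business need -> ... -> technical constraint."""
    path = path + (node,)
    if not node.children:
        yield path
    for child in node.children:
        yield from trace(child, path)


# Hypothetical example: one business need fanning out into concrete constraints.
root = Requirement("Sell concert tickets the moment they go on sale", "business", [
    Requirement("Users can reserve a seat", "FR", [
        Requirement("Never sell the same seat twice", "NFR", [
            Requirement("Seat inventory lives in a strongly consistent (CP) store", "constraint"),
        ]),
        Requirement("Survive 50x normal traffic at on-sale time", "NFR", [
            Requirement("Stateless API tier that scales horizontally", "constraint"),
        ]),
    ]),
])

for chain in trace(root):
    print(" -> ".join(f"[{r.kind}] {r.text}" for r in chain))
```

Every printed line starts at the business need. So if someone asks "why do we need a CP store here?", the answer is one chain away.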
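2. The Requirement Sorter: FR or NFR?
Can you distinguish between Functional (FR) and Non-Functional (NFR) requirements? A quick test: if the requirement names something the user can do, it is an FR; if it constrains how well the system does it (speed, uptime, security), it is an NFR.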
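A tiny answer key and scorer turn the sorting exercise into something you can run. The sample requirements, the `Kind` enum, and the `score` function are hypothetical and were written only for this exercise:

```python
from enum import Enum


class Kind(Enum):
    FR = "Functional (feature)"
    NFR = "Non-functional (quality)"


# Hypothetical sample requirements with their correct buckets.
ANSWER_KEY = {
    "Users can upload a profile photo": Kind.FR,
    "Uploads complete in under 2 seconds at p99": Kind.NFR,
    "Users can reset their password by email": Kind.FR,
    "The service stays up 99.9% of each month": Kind.NFR,
    "All traffic is encrypted in transit": Kind.NFR,
    "Admins can export a CSV of all orders": Kind.FR,
}


def score(guesses: dict[str, Kind]) -> tuple[int, list[str]]:
    """Return (number of correct guesses, requirements that were put in the wrong bucket)."""
    wrong = [req for req, kind in guesses.items() if ANSWER_KEY.get(req) != kind]
    return len(guesses) - len(wrong), wrong


correct, wrong = score({
    "Users can upload a profile photo": Kind.FR,
    "All traffic is encrypted in transit": Kind.FR,  # a common mistake: security is a quality
})
print(correct, wrong)
```

Security requirements are the usual trap. They sound like features, but they constrain *how* a feature behaves rather than adding a new one.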
3. Measuring Reliability: SLO vs SLA vs SLI
Google’s SRE book defines these three distinct terms. Understanding the difference is the hallmark of a Senior Engineer.
- SLI: The raw metric (e.g., “Latency is 142 ms”).
- SLO: The internal threshold (e.g., “99.9% of requests must be < 200 ms”).
- SLA: The contract (e.g., “If availability drops below 99.9%, we pay you back”).
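In code, the SLI is something you compute from raw measurements, and the SLO is a comparison against it. Below is a minimal Python sketch that assumes you already collect per-request latencies. The function names and the sample window are hypothetical:

```python
SLO_THRESHOLD_MS = 200.0  # "99.9% of requests must be < 200 ms"
SLO_TARGET = 0.999


def latency_sli(latencies_ms: list[float], threshold_ms: float = SLO_THRESHOLD_MS) -> float:
    """SLI: the fraction of requests that finished faster than the threshold."""
    if not latencies_ms:
        raise ValueError("need at least one request to compute an SLI")
    good = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return good / len(latencies_ms)


def meets_slo(sli: float, target: float = SLO_TARGET) -> bool:
    """SLO: the internal threshold the SLI must stay at or above."""
    return sli >= target


# Hypothetical window of 1,000 requests: 999 fast, 1 slow.
window = [142.0] * 999 + [450.0]
sli = latency_sli(window)
print(f"SLI = {sli:.3%}, SLO met: {meets_slo(sli)}")
```

A second slow request in the same window would push the SLI to 99.8% and breach the SLO. The SLA is not involved yet, which is exactly the early warning the SLO exists to give.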
For a deep dive into Observability and Metrics, see Module 17: Observability. Also critical is Security, another major NFR.
Critical Distinction: Availability vs. Reliability
Many candidates confuse these terms.
| Metric | Definition | Example |
|---|---|---|
| Availability | Is the system reachable? (Uptime) | “The site loads.” |
| Reliability | Does the system work correctly? (Success Rate) | “The site loads AND the payment processes correctly.” |
[!TIP] A system can be Available (it answers every request instantly, with HTTP 500 errors) but not Reliable (every one of those requests fails).
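The two metrics can be computed from the same request log. Below is a minimal Python sketch that assumes each request is recorded as an HTTP status code, with `None` for requests that got no response at all. It also simplifies "worked correctly" to "got a non-5xx response"; a real reliability check would also verify the payload. The function names are hypothetical:

```python
from typing import Optional


def availability(statuses: list[Optional[int]]) -> float:
    """Share of requests that got any response at all (None = timeout / unreachable)."""
    return sum(s is not None for s in statuses) / len(statuses)


def reliability(statuses: list[Optional[int]]) -> float:
    """Share of requests that got a non-5xx response (simplified stand-in for 'worked correctly')."""
    return sum(s is not None and s < 500 for s in statuses) / len(statuses)


# The TIP's scenario: every request is answered instantly, but with HTTP 500.
all_errors = [500] * 10
print(availability(all_errors), reliability(all_errors))  # 1.0 0.0
```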
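The Error Budget
If your SLO is 99.9%, your error budget is the other 0.1%: about 43.2 minutes of downtime in a 30-day window (roughly 8.8 hours per year). Once that budget is spent, you stop shipping features and spend the time on reliability work.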
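The arithmetic behind the error budget fits in a few lines: the budget is `(1 - SLO)` multiplied by the length of the window. The Python sketch below uses hypothetical function names and assumes a budget measured in minutes of full downtime:

```python
def error_budget_minutes(slo: float, window_days: float = 30.0) -> float:
    """Downtime (in minutes) the SLO allows over the window."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("SLO must be a fraction in (0, 1], e.g. 0.999")
    return (1.0 - slo) * window_days * 24 * 60


def must_freeze_features(slo: float, downtime_minutes: float, window_days: float = 30.0) -> bool:
    """True once the window's downtime has used up the whole error budget."""
    return downtime_minutes >= error_budget_minutes(slo, window_days)


print(round(error_budget_minutes(0.999), 1))       # 43.2
print(round(error_budget_minutes(0.999, 365), 1))  # 525.6
```

Each extra "nine" divides the budget by ten. For example, 99.99% over 30 days leaves only about 4.3 minutes.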
The CAP & PACELC Trade-off Matrix
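Standard System Design interviews center on the CAP Theorem, but PACELC extends it: if there is a Partition (P), choose Availability (A) or Consistency (C); Else (E), when the network is healthy, choose Latency (L) or Consistency (C). Reasoning about that second trade-off is what separates Senior from Intermediate engineers.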
The Decision Tree
Answer two questions to find your ideal database architecture.
1. The Partition State (P)
When the network breaks, do you prefer Availability (return possibly stale data) or Consistency (return an error)?
- AP: DynamoDB, Cassandra, CouchDB.
- CP: HBase, MongoDB, Redis (Strong Consistency config).
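The two partition behaviors differ only in what a cut-off replica does when asked for data. Below is a toy Python model of one replica's read path. `Unavailable` and `read_during_partition` are hypothetical. The model also assumes the majority side of the split always holds the latest write; real systems have to earn that guarantee with quorum reads:

```python
class Unavailable(Exception):
    """Raised when a CP replica refuses to answer rather than risk returning stale data."""


def read_during_partition(mode: str, local_value, reaches_majority: bool):
    """Toy model of one replica's read path while the network is split.

    Assumes (hypothetically) that the majority side always holds the latest write.
    """
    if reaches_majority:
        return local_value  # majority side: assumed up to date in this model
    if mode == "AP":
        return local_value  # answer anyway; the value may be stale
    if mode == "CP":
        raise Unavailable("cannot reach a majority of replicas")
    raise ValueError(f"unknown mode: {mode!r}")


print(read_during_partition("AP", "old cart", reaches_majority=False))  # old cart
try:
    read_during_partition("CP", "old cart", reaches_majority=False)
except Unavailable as err:
    print("error:", err)
```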
2. The Normal State (E - “Else”)
When the network is fine, do you prefer Consistency (Wait for all replicas) or Latency (Return fast)?
- EC: SQL databases configured for synchronous replication.
- EL: DynamoDB, Redis (Asynchronous Replication).
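The two answers combine into a single PACELC label. Below is a minimal Python sketch; the `pacelc` function is hypothetical. As one example, DynamoDB appears in both the AP and EL lists above, which makes it PA/EL:

```python
def pacelc(serve_stale_on_partition: bool, reply_before_all_replicas: bool) -> str:
    """Map the two decision-tree answers to a PACELC label such as 'PA/EL'."""
    partition = "PA" if serve_stale_on_partition else "PC"
    normal = "EL" if reply_before_all_replicas else "EC"
    return f"{partition}/{normal}"


print(pacelc(True, True))    # PA/EL
print(pacelc(False, False))  # PC/EC
```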
Visual: Consistency Models Explained
Not all Consistency is created equal.
Strong Consistency
Every read receives the most recent write or an error (e.g., banking).
After Write(X=1): Read(X) -> 1
Eventual Consistency
Reads might be stale for a moment, but once new writes stop, all replicas are guaranteed to converge (e.g., DNS, like counts).
After Write(X=1): Read(X) -> 0 ... 1
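Both read results can be reproduced with a toy leader/follower store that replicates asynchronously. The sketch below is hypothetical and does not mirror any real database's API. In this model, reading from the leader behaves like strong consistency, and reading from the lagging follower behaves like eventual consistency:

```python
class ToyReplicatedStore:
    """One leader and one follower with an asynchronous replication log (hypothetical model)."""

    def __init__(self) -> None:
        self.leader: dict[str, int] = {}
        self.follower: dict[str, int] = {}
        self.pending: list[tuple[str, int]] = []  # writes the follower has not applied yet

    def write(self, key: str, value: int) -> None:
        self.leader[key] = value
        self.pending.append((key, value))

    def replicate(self) -> None:
        """Ship all pending writes to the follower, in order."""
        for key, value in self.pending:
            self.follower[key] = value
        self.pending.clear()

    def read_from_leader(self, key: str) -> int:
        return self.leader.get(key, 0)  # always sees the latest write in this model

    def read_from_follower(self, key: str) -> int:
        return self.follower.get(key, 0)  # may lag behind the leader


store = ToyReplicatedStore()
store.write("X", 1)
print(store.read_from_leader("X"))    # 1 (strong)
print(store.read_from_follower("X"))  # 0 (stale)
store.replicate()
print(store.read_from_follower("X"))  # 1 (converged)
```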
Causal Consistency
Operations that are causally related are seen in the same order by every reader (e.g., comment threads).
If comment A replies to comment B, you always see B then A.
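One way to provide this guarantee is to tag each operation with the operations it depends on, then hold it back until those dependencies are visible. This is explicit dependency tracking; vector clocks are another common technique. Below is a minimal Python sketch under that assumption, in which `CausalReplica` is hypothetical:

```python
class CausalReplica:
    """Makes an operation visible only after every operation it depends on (hypothetical model)."""

    def __init__(self) -> None:
        self.visible: list[str] = []                   # order in which a reader sees operations
        self.waiting: list[tuple[str, set[str]]] = []  # operations whose dependencies are missing

    def receive(self, op_id: str, depends_on: tuple[str, ...] = ()) -> None:
        self.waiting.append((op_id, set(depends_on)))
        self._deliver_ready()

    def _deliver_ready(self) -> None:
        progressed = True
        while progressed:
            progressed = False
            for op_id, deps in list(self.waiting):
                if deps.issubset(self.visible):
                    self.visible.append(op_id)
                    self.waiting.remove((op_id, deps))
                    progressed = True


replica = CausalReplica()
replica.receive("A", depends_on=("B",))  # the reply arrives first over the network...
print(replica.visible)                   # [] ...so it is held back
replica.receive("B")                     # the original comment arrives
print(replica.visible)                   # ['B', 'A']
```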
[!IMPORTANT] Summary: In an interview, don’t just pick “the best” NFR. Pick the NFR that matches the Business Problem. High-frequency trading (Consistent & Low Latency) vs. Global Social Media (Available & Partition Tolerant).
The Interview Gauntlet
- “Can a system be both CA (Consistent & Available)?”
- Ans: Only in worlds where the network never fails. On the internet, P is mandatory, so you must choose CP or AP.
- “Why is the SLO usually stricter than the SLA?”
- Ans: The SLO is your internal target. You set it higher than the SLA so you can detect and fix issues before you break the legal contract.
- “What happens to Latency in a CP system?”
- Ans: Latency increases because the system must wait for a quorum/majority of nodes to agree before confirming a write.
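The CP latency cost is easy to quantify if you know how long each replica takes to acknowledge a write. Below is a minimal Python sketch that assumes Dynamo-style quorums, where a write waits for W acknowledgements and a read contacts R of the N replicas. The function names and the latency numbers are hypothetical:

```python
def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def write_latency(ack_latencies_ms: list[float], quorum: int) -> float:
    """A write is confirmed once `quorum` replicas acknowledge it, i.e. at the quorum-th fastest ack."""
    if not 1 <= quorum <= len(ack_latencies_ms):
        raise ValueError("quorum must be between 1 and the number of replicas")
    return sorted(ack_latencies_ms)[quorum - 1]


def read_overlaps_latest_write(n: int, r: int, w: int) -> bool:
    """R + W > N forces every read set to share at least one replica with the last write set."""
    return r + w > n


acks = [5.0, 12.0, 80.0, 95.0, 300.0]  # hypothetical ack times from 5 replicas
print(write_latency(acks, majority(len(acks))))  # 80.0 (wait for 3 of 5)
print(write_latency(acks, 1))                    # 5.0 (return after the first ack)
print(read_overlaps_latest_write(5, 3, 3))       # True
print(read_overlaps_latest_write(5, 1, 1))       # False
```

The majority write is 16x slower than the single-ack write in this example. Its latency is set by the third-fastest replica, so one slow node in the quorum drags every confirmed write with it.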