Design Typeahead (Autocomplete)
1. What is Typeahead?
Typeahead (or Autocomplete) is the feature where a system suggests completions for a query as the user types.
- Examples: Google Search, Amazon Product Search, VS Code IntelliSense.
- Goal: Save user keystrokes and guide them to popular queries.
Try it yourself: Go to Google. Type ‘sys’. See ‘system of a down’, ‘system design’. Type ‘xyz’. See how the suggestions change instantly. Open the Network tab to see the JSON response.
2. Requirements & Goals
Functional Requirements
- Suggest: Return top 5 terms starting with the prefix.
- Ranking: Suggestions must be ranked by popularity (frequency) or personalization.
Non-Functional Requirements
- Ultra Low Latency: < 100ms. The user is typing fast; results must appear instantly.
- High Availability: Critical for UX.
- Eventual Consistency: It’s okay if a new trending term takes 5 minutes to appear.
3. Capacity Estimation
- DAU: 500 Million.
- Searches: 10 searches/user/day = 5 Billion searches.
- Keystrokes: ~4 suggestion requests per search (the user types a few characters before picking a suggestion).
- Requests: 5 Billion searches × 4 keystrokes each ≈ 20 Billion requests/day.
- QPS: 20 Billion / 86,400 s ≈ 230,000 average; Peak QPS: ~400,000+ (roughly 2× average).
- This is a Read-Heavy system (Ratio 20:1 or higher).
RAM Estimation (Trie Size)
- 100 Million unique terms.
- Avg term length: 10 chars.
- Raw Data: 100 Million × 10 bytes ≈ 1 GB.
- Trie Overhead: Nodes, Pointers, Top-K Lists.
- Estimate: ~15 GB to 30 GB. This fits in the RAM of a single modern server!
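To sanity-check the arithmetic above, here is a tiny back-of-envelope script (the 2× peak factor and the ~20× Trie overhead factor are assumptions consistent with the estimates above):

```python
# Back-of-envelope check of the capacity numbers above (rounded, order-of-magnitude only).
DAU = 500_000_000
SEARCHES_PER_USER = 10
REQUESTS_PER_SEARCH = 4          # ~one suggestion request per keystroke

requests_per_day = DAU * SEARCHES_PER_USER * REQUESTS_PER_SEARCH
avg_qps = requests_per_day / 86_400
peak_qps = 2 * avg_qps           # assumption: peak is ~2x average

UNIQUE_TERMS = 100_000_000
AVG_TERM_BYTES = 10
raw_gb = UNIQUE_TERMS * AVG_TERM_BYTES / 1e9
trie_gb = raw_gb * 20            # assumption: ~15-30x overhead for nodes, pointers, Top-K lists

print(f"{requests_per_day:,} requests/day, ~{avg_qps:,.0f} avg QPS, ~{peak_qps:,.0f} peak QPS")
print(f"~{raw_gb:.0f} GB raw terms, ~{trie_gb:.0f} GB in-memory Trie")
```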
4. System APIs
GET /suggestions?prefix=ama&limit=5
Response:
{
"prefix": "ama",
"results": [
{ "term": "amazon", "score": 98 },
{ "term": "amazon prime", "score": 95 },
{ "term": "amazing spiderman", "score": 88 },
{ "term": "amazon music", "score": 85 },
{ "term": "amanda seyfried", "score": 80 }
]
}
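A minimal client call against this endpoint, assuming a placeholder host (api.example.com) and the JSON shape shown above:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def get_suggestions(prefix: str, limit: int = 5) -> list[dict]:
    """Fetch ranked completions for a prefix from the (placeholder) suggestions endpoint."""
    query = urlencode({"prefix": prefix, "limit": limit})
    with urlopen(f"https://api.example.com/suggestions?{query}") as resp:
        return json.load(resp)["results"]

# get_suggestions("ama") -> [{"term": "amazon", "score": 98}, ...]
```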
5. Database Design
We cannot serve this from a standard SQL DB (SELECT * FROM queries WHERE text LIKE 'ama%' ORDER BY freq DESC): scanning and sorting every matching row on each keystroke is effectively O(N) and far too slow at ~400K QPS.
We need a specialized data structure: The Trie (Prefix Tree).
Storage Hierarchy
- L1 Cache (Browser): Cache the results for each prefix (e.g., “ama”) client-side for ~1 hour.
- L2 Cache (In-Memory Trie): The Trie held in RAM on the Trie servers themselves; a generic key-value cache (Redis/Memcached) is a poor fit because it cannot do ranked prefix lookups (see the Interview Gauntlet).
- L3 Persistence (Cassandra / DynamoDB): Stores the raw search logs and aggregated frequencies for offline processing.
6. High-Level Design
Diagram: Typeahead System Architecture — the synchronous read path plus the async data pipeline (frequency counting, weekly rebuild job).
- Client: Sends keystrokes to API Gateway.
- Trie Service Cluster: In-memory Trie nodes return Top-K results.
- Spell Checker: Fallback if no prefix match found.
- Async Data Pipeline:
- Kafka Logs: Buffers query usage.
- Spark Aggregator: Calculates frequencies.
- Trie Builder: Serializes Trie to Trie DB (S3).
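To make the read path concrete, here is a minimal sketch of a request handler; the trie, spell_checker, and log_producer objects are assumed interfaces, not real library APIs:

```python
def handle_suggest(prefix: str, trie, spell_checker, log_producer, limit: int = 5) -> dict:
    """Synchronous read path: Trie lookup, spell-check fallback, fire-and-forget logging."""
    prefix = prefix.strip().lower()

    # O(L) walk down the in-memory Trie, then O(1) read of the cached Top-K list.
    results = trie.top_k(prefix)[:limit]

    # Fallback: no prefix match -> ask the Spell Checker for a corrected prefix.
    if not results:
        corrected = spell_checker.correct(prefix)
        if corrected and corrected != prefix:
            results = trie.top_k(corrected)[:limit]

    # Async pipeline: log the keystroke to Kafka without blocking the response.
    log_producer.send("search_keystrokes", {"prefix": prefix})

    return {"prefix": prefix, "results": results}
```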
7. Component Design (Deep Dive)
The Trie Data Structure
- Path: Root -> A -> M -> A.
- The node at “A-M-A” stores:
  - Children: ‘z’ (leading to “amazon”, “amazing”), and so on.
  - Cached Top-K: A list of the top 5 most popular queries that start with “AMA”.
Why Cache Top-K at Every Node?
If we don’t cache, we have to traverse the entire subtree of “AMA” to find the most popular ones. This is too slow.
- Write Time: Expensive (Updating frequency means bubbling up changes).
- Read Time: O(1). Just read the list at the node.
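A minimal in-memory Trie sketch with a cached Top-K list at every node, showing the expensive write path and the O(1) read path (class names, scores, and K=5 are illustrative assumptions; a production version would add score decay and more compact node layouts):

```python
class TrieNode:
    __slots__ = ("children", "top_k")

    def __init__(self):
        self.children: dict[str, "TrieNode"] = {}
        self.top_k: list[tuple[int, str]] = []   # [(score, term)], sorted descending, len <= K

class Trie:
    def __init__(self, k: int = 5):
        self.root = TrieNode()
        self.k = k

    def insert(self, term: str, score: int) -> None:
        """Write path (expensive): update the cached Top-K of every node along the term."""
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
            entries = [(s, t) for s, t in node.top_k if t != term]  # drop a stale entry
            entries.append((score, term))
            entries.sort(reverse=True)
            node.top_k = entries[: self.k]

    def top_k(self, prefix: str) -> list[str]:
        """Read path: O(L) walk to the prefix node, then O(1) read of its cached list."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [term for _, term in node.top_k]

# trie = Trie(); trie.insert("amazon", 98); trie.insert("amazing spiderman", 88)
# trie.top_k("ama") -> ["amazon", "amazing spiderman"]
```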
Trie Serialization
How do we save the Trie to disk or send it over the network?
- DFS Traversal: Save each node as Node(Char, List<Children>, TopK).
- Protocol Buffers: Very compact binary format.
- Optimization: Since the Trie fits in RAM, we can serialize the whole object graph to a file and load it on startup.
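A sketch of the DFS serialization idea using pickle on a plain dict (a stand-in for a Protocol Buffers schema; the node shape follows the Trie sketch above):

```python
import pickle

def to_dict(node) -> dict:
    """DFS traversal: each node becomes its Top-K plus a map of child edges."""
    return {
        "top_k": node.top_k,
        "children": {ch: to_dict(child) for ch, child in node.children.items()},
    }

def save_snapshot(trie, path: str) -> None:
    # The whole Trie fits in RAM, so we can serialize the entire object graph in one pass.
    with open(path, "wb") as f:
        pickle.dump(to_dict(trie.root), f)
```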
Trie Optimization: Radix Tree
A standard Trie can be memory inefficient if we have long branches with no branching (e.g., b->e->s->t).
- Radix Tree (Compressed Trie): We merge nodes with single children.
- Instead of b -> e -> s -> t, we store “best” in a single node.
- This significantly reduces the number of nodes and pointer overhead.
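A minimal compression pass over a dict-based trie, assuming a '$' end-of-word marker (a sketch of the idea, not a production radix tree):

```python
def compress(node: dict) -> dict:
    """Merge single-child chains: b -> e -> s -> t becomes a single 'best' edge."""
    out = {}
    for edge, child in node.items():
        if edge == "$":                       # end-of-word marker, copy as-is
            out[edge] = child
            continue
        child = compress(child)
        # Fuse this edge with its child while the child has exactly one non-terminal edge.
        while len(child) == 1 and "$" not in child:
            (next_edge, next_child), = child.items()
            edge += next_edge
            child = next_child
        out[edge] = child
    return out

# compress({"b": {"e": {"s": {"t": {"$": True}}}}}) -> {"best": {"$": True}}
```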
Fuzzy Search
What if the user types “amzon” (missing ‘a’)?
- We can use Levenshtein Distance (Edit Distance) to find prefixes that are “1 edit away” from the user’s input.
- However, this is expensive (O(L×M) per candidate). In practice, we usually rely on a separate Spell Check Service to correct the query before sending it to the Typeahead service.
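For reference, the standard dynamic-programming edit distance, which is the O(L×M) cost mentioned above:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein DP: O(len(a) * len(b)) time, O(len(b)) memory."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]

# edit_distance("amzon", "amazon") == 1, so "amazon" is a plausible correction
```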
Data Collection Pipeline
Visualized in the Async Data Pipeline zone of the diagram. How do we update the Trie?
- Log Service: Buffers search logs to Kafka.
- Aggregator (Spark): Every hour, calculate frequencies of all terms.
- Trie Builder: Rebuilds the Trie from the aggregated data.
- Deployment: Push new Trie to Trie Servers (Load Snapshot).
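A toy stand-in for the aggregation and rebuild steps (a Counter instead of a real Spark job; trie_cls is assumed to be something like the Trie sketched in the deep dive above):

```python
from collections import Counter

def aggregate_and_build(raw_queries, trie_cls, min_freq: int = 2):
    """Offline job: count query frequencies, drop one-off noise, rebuild the Trie from scratch."""
    freq = Counter(q.strip().lower() for q in raw_queries if q.strip())
    trie = trie_cls()
    for term, count in freq.items():
        if count >= min_freq:
            trie.insert(term, count)
    return trie   # next step: serialize the snapshot and push it to the Trie servers
```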
8. Requirements Traceability
| Requirement | Design Decision | Justification |
|---|---|---|
| Ultra Low Latency | In-Memory Trie | RAM access (100ns) vs Disk (10ms). Ensures < 20ms processing time. |
| Availability | Replication | Replicate the full Trie on 100+ servers. No sharding complexity needed. |
| Scalability | L1 Browser Cache | Offloads 30-50% of requests (backspaces/repeats) to the client. |
| Freshness | Async Pipeline | Decouples write path (Logs) from read path (Trie). Eventual consistency is fine. |
9. Observability & Metrics
We optimize for p99 Latency.
Key Metrics
- End-to-End Latency: Time from keypress to UI render. Target < 100ms.
- Server Processing Time: Time to traverse Trie. Target < 5ms.
- Cache Hit Ratio: L1 (Browser) and L2 (Trie).
- Trie Memory Usage: Alert if RAM > 80% (Need more servers or compression).
Logs
- Sample 1% of queries for analysis. Do not log PII (Personally Identifiable Information) unless it is sanitized.
10. Deployment Strategy
Snapshot Rollout (Atomic Switch)
We rebuild the Trie offline every week/day.
- Build: The Trie Builder creates trie_v2.bin and uploads it to S3.
- Download: All Typeahead servers download trie_v2.bin to a temp folder.
- Hot Swap: Each server loads v2 into RAM (temporary double memory spike). Once loaded, it flips a pointer (current_trie = trie_v2) and frees trie_v1.
- No Downtime: Requests are never dropped.
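A sketch of the double-buffered swap on a single server; load_snapshot and the snapshot path are assumptions:

```python
import threading

class TrieHolder:
    """Double buffering: serve reads from `current` while a new snapshot loads in the background."""

    def __init__(self, initial_trie):
        self.current = initial_trie

    def hot_swap(self, snapshot_path: str, load_snapshot) -> None:
        def _load():
            new_trie = load_snapshot(snapshot_path)  # both tries in RAM -> temporary memory spike
            self.current = new_trie                  # single reference flip; old trie is GC'd
        threading.Thread(target=_load, daemon=True).start()

    def suggest(self, prefix: str):
        return self.current.top_k(prefix)            # readers always see a fully loaded trie
```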
11. Interview Gauntlet
Rapid Fire Questions
- Why not use a database like Redis? Redis key lookups are O(1), but we need prefix matching, and Redis SCAN is too slow. We need a custom Trie in process memory to avoid a network RTT for every keystroke.
- How do you support ‘Personalization’? We can’t store a Trie per user. We store a small “User History List” in a separate DB (or cookie) and merge User History + Global Top-K at runtime (a merge sketch follows this list).
- What if the Trie grows too big for RAM? 1. Radix Tree (compress chains). 2. Sharding (A-M, N-Z): complex but necessary for billions of terms. 3. Quantization (store Top-K scores as 1 byte instead of 4).
- How do you handle ‘profanity’? A blocklist filter during the Trie Builder phase. We drop terms that match the blocklist so they never enter the production Trie.
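A sketch of merging per-user history with the global Top-K at request time; the history boost value is an arbitrary assumption:

```python
def merge_suggestions(global_top_k, user_history, limit=5, history_boost=1000):
    """Personalization: terms from the user's own history outrank the global Top-K."""
    scores = {term: score for term, score in global_top_k}
    for term in user_history:
        scores[term] = scores.get(term, 0) + history_boost
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:limit]]

# merge_suggestions([("amazon", 98), ("amazon prime", 95)], ["amanda seyfried"])
# -> ["amanda seyfried", "amazon", "amazon prime"]
```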
12. Interactive Decision Visualizer: The Trie & Top-K
Visualize how the Trie stores paths. As you type, the system traverses the nodes. Notice how each node already “knows” its most popular completions (Cached Top-K), enabling O(1) retrieval.
13. Summary (Whiteboard)
Scale & Latency
- < 100ms End-to-End Latency
- In-Memory Trie (RAM) is mandatory
- Replication (Not Sharding) fits in RAM
Data Structures
- Trie: Prefix Tree O(L)
- Radix Tree: Compressed Trie
- Top-K Cache: Store results at each node
Architecture
- Write Path: Async (Logs -> Spark -> Trie)
- Read Path: Sync (Browser -> Trie)
- Deployment: Hot Swap (Double Buffering)
Challenges
- Memory: 32GB RAM limit? (Use Sharding)
- Updates: Real-time trends (Use separate 'Trending' Trie)
- Typo: Spell check fallback