Design Typeahead (Autocomplete)
1. What is Typeahead?
Typeahead (or Autocomplete) is the feature where a system suggests completions for a query as the user types.
- Examples: Google Search, Amazon Product Search, VS Code IntelliSense.
- Goal: Save user keystrokes and guide them to popular queries.
Try it yourself: Go to Google. Type ‘sys’. See ‘system of a down’, ‘system design’. Type ‘xyz’. See how the suggestions change instantly. Open the Network tab to see the JSON response.
2. Requirements & Goals
Functional Requirements
- Suggest: Return top 5 terms starting with the prefix.
- Ranking: Suggestions must be ranked by popularity (frequency) or personalization.
Non-Functional Requirements
- Ultra Low Latency: < 100ms. The user is typing fast; results must appear instantly.
- High Availability: Critical for UX.
- Eventual Consistency: It’s okay if a new trending term takes 5 minutes to appear.
3. Capacity Estimation
- DAU: 500 Million.
- Searches: 10 searches/user/day = 5 Billion searches.
- Keystrokes: ~4 suggestion requests per search (the user types a few characters before picking a suggestion).
- Requests: 5 Billion searches × 4 keystrokes each ≈ 20 Billion requests/day.
- QPS: 20 Billion / 86,400 s ≈ 230,000 average; Peak QPS: ~400,000+ (roughly 2× average).
- This is a Read-Heavy system (Ratio 20:1 or higher).
RAM Estimation (Trie Size)
- 100 Million unique terms.
- Avg term length: 10 chars.
- Raw Data: 100 Million × 10 bytes ≈ 1 GB.
- Trie Overhead: Nodes, Pointers, Top-K Lists.
- Estimate: ~15 GB to 30 GB. This fits in the RAM of a single modern server!
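To sanity-check the arithmetic above, here is a tiny back-of-envelope script (the 2× peak factor and the ~20× Trie overhead factor are assumptions consistent with the estimates above):

```python
# Back-of-envelope check of the capacity numbers above (rounded, order-of-magnitude only).
DAU = 500_000_000
SEARCHES_PER_USER = 10
REQUESTS_PER_SEARCH = 4          # ~one suggestion request per keystroke

requests_per_day = DAU * SEARCHES_PER_USER * REQUESTS_PER_SEARCH
avg_qps = requests_per_day / 86_400
peak_qps = 2 * avg_qps           # assumption: peak is ~2x average

UNIQUE_TERMS = 100_000_000
AVG_TERM_BYTES = 10
raw_gb = UNIQUE_TERMS * AVG_TERM_BYTES / 1e9
trie_gb = raw_gb * 20            # assumption: ~15-30x overhead for nodes, pointers, Top-K lists

print(f"{requests_per_day:,} requests/day, ~{avg_qps:,.0f} avg QPS, ~{peak_qps:,.0f} peak QPS")
print(f"~{raw_gb:.0f} GB raw terms, ~{trie_gb:.0f} GB in-memory Trie")
```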
4. System APIs
GET /suggestions?prefix=ama&limit=5
Response:
{
"prefix": "ama",
"results": [
{ "term": "amazon", "score": 98 },
{ "term": "amazon prime", "score": 95 },
{ "term": "amazing spiderman", "score": 88 },
{ "term": "amazon music", "score": 85 },
{ "term": "amanda seyfried", "score": 80 }
]
}
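A minimal client call against this endpoint, assuming a placeholder host (api.example.com) and the JSON shape shown above:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def get_suggestions(prefix: str, limit: int = 5) -> list[dict]:
    """Fetch ranked completions for a prefix from the (placeholder) suggestions endpoint."""
    query = urlencode({"prefix": prefix, "limit": limit})
    with urlopen(f"https://api.example.com/suggestions?{query}") as resp:
        return json.load(resp)["results"]

# get_suggestions("ama") -> [{"term": "amazon", "score": 98}, ...]
```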
5. Database Design
We cannot serve this from a standard SQL DB (SELECT * FROM queries WHERE text LIKE 'ama%' ORDER BY freq DESC): scanning and sorting every matching row on each keystroke is effectively O(N) and far too slow at ~400K QPS.
We need a specialized data structure: The Trie (Prefix Tree).
Storage Hierarchy
- L1 Cache (Browser): Cache the results for each prefix (e.g., “ama”) client-side for ~1 hour.
- L2 Cache (In-Memory Trie): The Trie held in RAM on the Trie servers themselves; a generic key-value cache (Redis/Memcached) is a poor fit because it cannot do ranked prefix lookups (see the Interview Gauntlet).
- L3 Persistence (Cassandra / DynamoDB): Stores the raw search logs and aggregated frequencies for offline processing.
6. High-Level Design
Diagram: Typeahead System Architecture — the synchronous read path plus the async data pipeline (frequency counting, weekly rebuild job).
- Client: Sends keystrokes to API Gateway.
- Trie Service Cluster: In-memory Trie nodes return Top-K results.
- Spell Checker: Fallback if no prefix match found.
- Async Data Pipeline:
- Kafka Logs: Buffers query usage.
- Spark Aggregator: Calculates frequencies.
- Trie Builder: Serializes Trie to Trie DB (S3).
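To make the read path concrete, here is a minimal sketch of a request handler; the trie, spell_checker, and log_producer objects are assumed interfaces, not real library APIs:

```python
def handle_suggest(prefix: str, trie, spell_checker, log_producer, limit: int = 5) -> dict:
    """Synchronous read path: Trie lookup, spell-check fallback, fire-and-forget logging."""
    prefix = prefix.strip().lower()

    # O(L) walk down the in-memory Trie, then O(1) read of the cached Top-K list.
    results = trie.top_k(prefix)[:limit]

    # Fallback: no prefix match -> ask the Spell Checker for a corrected prefix.
    if not results:
        corrected = spell_checker.correct(prefix)
        if corrected and corrected != prefix:
            results = trie.top_k(corrected)[:limit]

    # Async pipeline: log the keystroke to Kafka without blocking the response.
    log_producer.send("search_keystrokes", {"prefix": prefix})

    return {"prefix": prefix, "results": results}
```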
7. Component Design (Deep Dive)
The Trie Data Structure
- Path: Root -> A -> M -> A.
- The node at “A-M-A” stores:
  - Children: ‘z’ (leading to “amazon”, “amazing”), and so on.
  - Cached Top-K: A list of the top 5 most popular queries that start with “AMA”.
Why Cache Top-K at Every Node?
If we don’t cache, we have to traverse the entire subtree of “AMA” to find the most popular ones. This is too slow.
- Write Time: Expensive (Updating frequency means bubbling up changes).
- Read Time: O(1). Just read the list at the node.
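A minimal in-memory Trie sketch with a cached Top-K list at every node, showing the expensive write path and the O(1) read path (class names, scores, and K=5 are illustrative assumptions; a production version would add score decay and more compact node layouts):

```python
class TrieNode:
    __slots__ = ("children", "top_k")

    def __init__(self):
        self.children: dict[str, "TrieNode"] = {}
        self.top_k: list[tuple[int, str]] = []   # [(score, term)], sorted descending, len <= K

class Trie:
    def __init__(self, k: int = 5):
        self.root = TrieNode()
        self.k = k

    def insert(self, term: str, score: int) -> None:
        """Write path (expensive): update the cached Top-K of every node along the term."""
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
            entries = [(s, t) for s, t in node.top_k if t != term]  # drop a stale entry
            entries.append((score, term))
            entries.sort(reverse=True)
            node.top_k = entries[: self.k]

    def top_k(self, prefix: str) -> list[str]:
        """Read path: O(L) walk to the prefix node, then O(1) read of its cached list."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [term for _, term in node.top_k]

# trie = Trie(); trie.insert("amazon", 98); trie.insert("amazing spiderman", 88)
# trie.top_k("ama") -> ["amazon", "amazing spiderman"]
```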
Trie Serialization
How do we save the Trie to disk or send it over the network?
- DFS Traversal: Save each node as Node(Char, List<Children>, TopK).
- Protocol Buffers: Very compact binary format.
- Optimization: Since the Trie fits in RAM, we can serialize the whole object graph to a file and load it on startup.
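A sketch of the DFS serialization idea using pickle on a plain dict (a stand-in for a Protocol Buffers schema; the node shape follows the Trie sketch above):

```python
import pickle

def to_dict(node) -> dict:
    """DFS traversal: each node becomes its Top-K plus a map of child edges."""
    return {
        "top_k": node.top_k,
        "children": {ch: to_dict(child) for ch, child in node.children.items()},
    }

def save_snapshot(trie, path: str) -> None:
    # The whole Trie fits in RAM, so we can serialize the entire object graph in one pass.
    with open(path, "wb") as f:
        pickle.dump(to_dict(trie.root), f)
```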
Trie Optimization: Radix Tree
A standard Trie can be memory inefficient if we have long branches with no branching (e.g., b->e->s->t).
- Radix Tree (Compressed Trie): We merge nodes with single children.
- Instead of b -> e -> s -> t, we store “best” in a single node.
- This significantly reduces the number of nodes and pointer overhead.
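A minimal compression pass over a dict-based trie, assuming a '$' end-of-word marker (a sketch of the idea, not a production radix tree):

```python
def compress(node: dict) -> dict:
    """Merge single-child chains: b -> e -> s -> t becomes a single 'best' edge."""
    out = {}
    for edge, child in node.items():
        if edge == "$":                       # end-of-word marker, copy as-is
            out[edge] = child
            continue
        child = compress(child)
        # Fuse this edge with its child while the child has exactly one non-terminal edge.
        while len(child) == 1 and "$" not in child:
            (next_edge, next_child), = child.items()
            edge += next_edge
            child = next_child
        out[edge] = child
    return out

# compress({"b": {"e": {"s": {"t": {"$": True}}}}}) -> {"best": {"$": True}}
```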
Fuzzy Search
What if the user types “amzon” (missing ‘a’)?
- We can use Levenshtein Distance (Edit Distance) to find prefixes that are “1 edit away” from the user’s input.
- However, this is expensive (O(L×M) per candidate). In practice, we usually rely on a separate Spell Check Service to correct the query before sending it to the Typeahead service.
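For reference, the standard dynamic-programming edit distance, which is the O(L×M) cost mentioned above:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein DP: O(len(a) * len(b)) time, O(len(b)) memory."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]

# edit_distance("amzon", "amazon") == 1, so "amazon" is a plausible correction
```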
Data Collection Pipeline
Visualized in the Async Data Pipeline zone of the diagram. How do we update the Trie?
- Log Service: Buffers search logs to Kafka.
- Aggregator (Spark): Every hour, calculate frequencies of all terms.
- Trie Builder: Rebuilds the Trie from the aggregated data.
- Deployment: Push new Trie to Trie Servers (Load Snapshot).
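A toy stand-in for the aggregation and rebuild steps (a Counter instead of a real Spark job; trie_cls is assumed to be something like the Trie sketched in the deep dive above):

```python
from collections import Counter

def aggregate_and_build(raw_queries, trie_cls, min_freq: int = 2):
    """Offline job: count query frequencies, drop one-off noise, rebuild the Trie from scratch."""
    freq = Counter(q.strip().lower() for q in raw_queries if q.strip())
    trie = trie_cls()
    for term, count in freq.items():
        if count >= min_freq:
            trie.insert(term, count)
    return trie   # next step: serialize the snapshot and push it to the Trie servers
```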
8. Requirements Traceability
| Requirement | Design Decision | Justification |
|---|---|---|
| Ultra Low Latency | In-Memory Trie | RAM access (100ns) vs Disk (10ms). Ensures < 20ms processing time. |
| Availability | Replication | Replicate the full Trie on 100+ servers. No sharding complexity needed. |
| Scalability | L1 Browser Cache | Offloads 30-50% of requests (backspaces/repeats) to the client. |
| Freshness | Async Pipeline | Decouples write path (Logs) from read path (Trie). Eventual consistency is fine. |
9. Observability & Metrics
We optimize for p99 Latency.
Key Metrics
- End-to-End Latency: Time from keypress to UI render. Target < 100ms.
- Server Processing Time: Time to traverse Trie. Target < 5ms.
- Cache Hit Ratio: L1 (Browser) and L2 (Trie).
- Trie Memory Usage: Alert if RAM > 80% (Need more servers or compression).
Logs
- Sample 1% of queries for analysis. Do not log PII (Personally Identifiable Information) unless it is sanitized.
10. Deployment Strategy
Snapshot Rollout (Atomic Switch)
We rebuild the Trie offline every week/day.
- Build: The Trie Builder creates trie_v2.bin and uploads it to S3.
- Download: All Typeahead servers download trie_v2.bin to a temp folder.
- Hot Swap: Each server loads v2 into RAM (temporary double memory spike). Once loaded, it flips a pointer (current_trie = trie_v2) and frees trie_v1.
- No Downtime: Requests are never dropped.
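A sketch of the double-buffered swap on a single server; load_snapshot and the snapshot path are assumptions:

```python
import threading

class TrieHolder:
    """Double buffering: serve reads from `current` while a new snapshot loads in the background."""

    def __init__(self, initial_trie):
        self.current = initial_trie

    def hot_swap(self, snapshot_path: str, load_snapshot) -> None:
        def _load():
            new_trie = load_snapshot(snapshot_path)  # both tries in RAM -> temporary memory spike
            self.current = new_trie                  # single reference flip; old trie is GC'd
        threading.Thread(target=_load, daemon=True).start()

    def suggest(self, prefix: str):
        return self.current.top_k(prefix)            # readers always see a fully loaded trie
```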
11. Interview Gauntlet
Rapid Fire Questions
- Why not use a database like Redis? Redis key lookups are O(1), but we need prefix matching, and Redis SCAN is too slow. We need a custom Trie in process memory to avoid a network RTT for every keystroke.
- How do you support ‘Personalization’? We can’t store a Trie per user. We store a small “User History List” in a separate DB (or cookie) and merge User History + Global Top-K at runtime (a merge sketch follows this list).
- What if the Trie grows too big for RAM? 1. Radix Tree (compress chains). 2. Sharding (A-M, N-Z): complex but necessary for billions of terms. 3. Quantization (store Top-K scores as 1 byte instead of 4).
- How do you handle ‘profanity’? A blocklist filter during the Trie Builder phase. We drop terms that match the blocklist so they never enter the production Trie.
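A sketch of merging per-user history with the global Top-K at request time; the history boost value is an arbitrary assumption:

```python
def merge_suggestions(global_top_k, user_history, limit=5, history_boost=1000):
    """Personalization: terms from the user's own history outrank the global Top-K."""
    scores = {term: score for term, score in global_top_k}
    for term in user_history:
        scores[term] = scores.get(term, 0) + history_boost
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:limit]]

# merge_suggestions([("amazon", 98), ("amazon prime", 95)], ["amanda seyfried"])
# -> ["amanda seyfried", "amazon", "amazon prime"]
```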
12. Interactive Decision Visualizer: The Trie & Top-K
Visualize how the Trie stores paths. As you type, the system traverses the nodes. Notice how each node already “knows” its most popular completions (Cached Top-K), enabling O(1) retrieval.
13. Summary (Whiteboard)
Scale & Latency
- < 100ms End-to-End Latency
- In-Memory Trie (RAM) is mandatory
- Replication (Not Sharding) fits in RAM
Data Structures
- Trie: Prefix Tree O(L)
- Radix Tree: Compressed Trie
- Top-K Cache: Store results at each node
Architecture
- Write Path: Async (Logs -> Spark -> Trie)
- Read Path: Sync (Browser -> Trie)
- Deployment: Hot Swap (Double Buffering)
Challenges
- Memory: 32GB RAM limit? (Use Sharding)
- Updates: Real-time trends (Use separate 'Trending' Trie)
- Typo: Spell check fallback