Design a URL Shortener (TinyURL)
[!IMPORTANT] In this lesson, you will master:
- 1. What is a URL Shortener?: Building intuition behind 1. what is a url shortener?.
- 2. Requirements & Goals: Building intuition behind 2. requirements & goals.
- 3. Capacity Estimation: Building intuition behind 3. capacity estimation.
1. What is a URL Shortener?
A URL Shortener is a service that creates shorter aliases for long URLs. When a user clicks these short aliases, they are redirected to the original long URL. This service is a staple of system design interviews because it touches on all the fundamentals: hashing, databases, caching, and concurrency.
Real-World Examples:
- TinyURL: The classic example.
- Bit.ly: Focuses on analytics and enterprise features.
- t.co: Twitter’s internal shortener to save character count.
[!TIP] Try it yourself: Imagine you are sending an SMS. Would you rather type
https://www.google.com/search?q=system+design+interview+guide...orhttps://tiny.url/x7z9? The latter is cleaner, cheaper (SMS cost), and easier to type.
2. Requirements & Goals (P - Problem & Requirements)
Functional Requirements
- Shorten: Given a long URL, generate a unique short alias (e.g.,
http://tiny.url/j8s9). - Redirect: Given a short alias, redirect the user to the original long URL.
- Custom Alias (Optional): Users should be able to specify a custom alias (e.g.,
tiny.url/my-blog). - Expiry: Links should expire after a default timespan (e.g., 5 years) to save space.
Non-Functional Requirements
- Availability: The system must be highly available. If the service is down, all URL redirections fail (Global Impact).
- Latency: URL redirection happens in real-time. Minimizing latency (< 100ms) is critical.
- Read-Heavy: There will be many more redirections (reads) than new link creations (writes). A 100:1 ratio is common.
Extended Requirements
- Analytics: Track click counts, location, and user agent.
- REST API: Allow other services to generate links programmatically.
3. Capacity Estimation (E - Estimation)
Traffic Estimates
Let’s assume 100 Million new URLs are generated per month.
- Write QPS: 100,000,000 / (30 days \times 24 hours \times 3600 seconds) = 100,000,000 / 2,592,000 \approx 40 writes/sec
- Read QPS (100:1 ratio): 40 \times 100 = 4,000 reads/sec
- Peak Traffic: Assume 5x multiplier for spikes. Write: 200 QPS, Read: 20k QPS.
Storage Estimates
Let’s assume we store the data for 5 years.
- Total URLs: 100M \times 12 \text{ months} \times 5 \text{ years} = 6 \text{ Billion records}
- Average URL size: 500 bytes.
- Total Storage: 6 \text{ Billion records} \times 500 \text{ bytes} = 3,000 \text{ Billion bytes} = 3 \text{ TB}
Bandwidth Estimates
- Ingress (Write): 40 \text{ writes/sec} \times 500 \text{ bytes} \approx 20 \text{ KB/s} (Negligible).
- Egress (Read): 4,000 \text{ reads/sec} \times 500 \text{ bytes} \approx 2 \text{ MB/s}.
- Note: This is just for the payload. If we include 302 redirects, the browser handles the payload of the target site, but our server just sends headers.
[!WARNING] While 3TB fits on a single modern hard drive, we distribute it across multiple nodes for Throughput (IOPS) and Reliability (Replication), not just capacity.
4. System APIs (D - Data Model)
We will use REST-style APIs for simplicity and compatibility.
Create Short URL
POST /api/v1/shorten
Request Body:
{
"longUrl": "https://www.very-long-website.com/article/123",
"customAlias": "my-article",
"expireDate": "2025-12-31"
}
Response:
- 200 OK: Success.
- 400 Bad Request: Invalid URL or Custom Alias already exists.
- 429 Too Many Requests: Rate limit exceeded.
{
"shortUrl": "https://tiny.url/my-article",
"error": null
}
Redirect
GET /{short_alias}
Response:
- Status: 301 or 302 (See below).
- Location:
https://www.very-long-website.com/article/123 - Cache-Control:
public, max-age=3600(If 301).
5. Database Design (D - Data Model)
We need to store 6 Billion records. The data is simple: ShortKey → LongURL.
SQL Schema (MySQL/PostgreSQL)
CREATE TABLE urls (
id BIGINT AUTO_INCREMENT PRIMARY KEY, -- 8 bytes, used for Base62
short_key VARCHAR(7) NOT NULL, -- 7 bytes (Indexed)
long_url VARCHAR(2048) NOT NULL, -- 2 KB max
user_id BIGINT, -- Foreign Key
created_at TIMESTAMP DEFAULT NOW(),
expiration TIMESTAMP
);
CREATE INDEX idx_short_key ON urls(short_key);
NoSQL Schema (DynamoDB/Cassandra)
Since we don’t need complex joins, NoSQL is a great fit for horizontal scaling.
{
"pk": "short_key:x7z9", // Partition Key
"long_url": "https://...",
"created_at": 1672531200
}
Trade-off: SQL provides ACID guarantees (useful for uniqueness checks). NoSQL provides easier scaling. For this problem, both work well, but SQL is often preferred for the AUTO_INCREMENT feature which aids Base62 encoding.
6. High-Level Design (A - Architecture)
Interview-Friendly High-Level Diagram
This is the simplified version of the architecture you should draw on the whiteboard.
graph TD
User([User]) --> LB[Load Balancer]
LB --> API[API Gateway]
subgraph Service Layer
API --> ShortenSvc[Shortener Service]
API --> RedirectSvc[Redirect Service]
ShortenSvc --> KGS[Key Generation Service]
end
subgraph Storage Layer
RedirectSvc --> Cache[(Redis Cache)]
ShortenSvc --> DB[(Relational DB)]
RedirectSvc --> DB
KGS --> KeyDB[(KGS DB)]
end
The architecture separates the Write Path (Shortening) from the Read Path (Redirecting) to optimize for the 100:1 read ratio.
7. Component Design: Generating Keys (L - Localized Details)
This is the core problem. How do we generate a short, unique string like x7z9?
Approach A: Hashing (MD5/SHA256)
Hash the Long URL: MD5(long_url).
- Result is 128-bit string (32 hex chars).
- Truncate to 7 characters.
- Problem: Collisions. Two different URLs might hash to the same first 7 characters.
- Resolution: If collision, append a salt (e.g., current time) and re-hash. This requires checking the DB for every write, which is slow (Bloom Filters can help, but it’s still probabilistic).
Approach B: Base64 Encoding
Base64 uses A-Z, a-z, 0-9, +, /.
- Problem: The characters
+and/are reserved in URLs. They might be interpreted as spaces or directory separators. - Resolution: Use “Base64URL” variant (
-and_instead). However, Base62 is standard for this use case.
Approach C: Base62 Conversion (The Winner)
We use a Counter (Database ID or Distributed ID) and convert it to Base62. This is a Bijective (one-to-one) mapping, guaranteeing no collisions.
- Base62 Characters:
0-9(10) +a-z(26) +A-Z(26) = 62. - Why 7 characters? 627 \approx 3.5 \text{ Trillion} combinations. Enough for 100 years.
Algorithm:
- Get Unique ID:
123456789 - Convert to Base62:
8M0kX - Short URL:
tiny.url/8M0kX
Key Generation Service (KGS)
To scale, we don’t want the App Server to hit the Database for a new ID every time.
- KGS: A dedicated service (see “KGS” in the diagram) that pre-generates keys (e.g.,
abc1,abc2) and stores them in a Redis Set (labeled “Redis (Keys)”). - Concurrency: App servers
POPa key from Redis. This is atomic and fast. - Two Tables:
unused_keys: Keys ready to be assigned.used_keys: Keys already assigned.
- Trade-off: If the KGS crashes, we lose the pre-generated keys in memory. This is acceptable as we have trillions of keys available.
8. Data Partitioning & Sharding (S - Scale)
To handle 6 Billion records and 4000 reads/sec (per node), we must shard the database.
Strategy 1: Range-Based Sharding
- Method: Partition by the first character of the Short Key.
- Shard A: Keys starting with
A-C - Shard B: Keys starting with
D-F - Pros: Simple to implement.
- Cons: Unbalanced Load. If many URLs start with ‘A’, Shard A becomes a hotspot.
Strategy 2: Hash-Based Sharding (Consistent Hashing)
- Method:
ShardID = hash(short_key) % NumShards. - Pros: Even Distribution. Traffic is spread uniformly across all nodes.
- Cons: Re-sharding (adding nodes) is complex, requiring data migration. We use Consistent Hashing to minimize movement.
Verdict: Use Hash-Based Sharding on the short_key.
9. Reliability, Caching, & Load Balancing
301 vs 302 Redirects
- 301 Moved Permanently: Browser caches the redirect. Subsequent requests bypass our server. Fast, but we lose analytics.
- 302 Found: Browser hits our server every time. Slower, but allows accurate tracking.
- Choice: Use 302 if analytics are required.
Caching Strategy
- Cache-Aside: App checks Redis → Miss → DB → Write to Redis.
- Eviction: Use LRU (Least Recently Used).
- Pareto Principle: 20% of URLs generate 80% of traffic. Caching just the top 20% fits easily in memory.
Reliability
- Replication: Use Master-Slave replication for the DB. If Master fails, promote Slave.
- Geo-DNS: Use a CDN or Geo-DNS to route users to the nearest data center.
10. Interactive Decision Visualizer: Hash vs Base62
Explore why we prefer Base62 over simple Hashing. Try the Collision Mode to see why hashing isn’t perfect for uniqueness.
[!TIP] Try it yourself: Switch to “Collision Mode” in the Hash tab and type slightly different text to see if they collide (simulated).
**The Gold Standard**: Convert a unique Integer ID (from Database or Snowflake) into a Base62 string. Bijective Zero Collisions
11. System Walkthrough (Dry Run)
Let’s trace a request through the system.
Scenario: User creates a short link
1. Request
User sends POST /shorten to the Load Balancer.
{
"longUrl": "https://www.google.com"
}
2. Key Generation The Shortener Service contacts Redis (KGS) to get a unique key.
- Redis Command:
SPOP unused_keys 1 - Result:
"8M0kX"
3. Persistence The service writes to the Primary DB.
INSERT INTO urls (short_key, long_url) VALUES ('8M0kX', 'https://www.google.com');
4. Response Service returns the short URL.
{
"shortUrl": "https://tiny.url/8M0kX"
}
Scenario: User clicks a short link
1. Request
User visits https://tiny.url/8M0kX. Request hits Redirect Service.
2. Cache Check Service checks Redis Cache.
- Redis Command:
GET url:8M0kX - Result (Hit):
"https://www.google.com" - Result (Miss):
nil→ Read from DB → Write to Redis.
3. Analytics (Async) Service pushes an event to Kafka.
{
"topic": "clicks_topic",
"key": "8M0kX",
"payload": { "ts": 1672531200, "ua": "Mozilla/5.0", "ip": "1.2.3.4" }
}
4. Redirect Service returns HTTP 302.
- Header:
Location: https://www.google.com
12. Interview Gauntlet
Be ready for these follow-up questions:
- What if two users shorten the same long URL?
- Ans: You can return the same short key (deduplication) to save space, or generate a new one. Deduplication requires checking the DB for every write, which is slow (Bloom Filters can help, but it’s still probabilistic).
- How do you prevent users from guessing IDs?
- Ans: Base62 is sequential. Add a random salt before encoding, or shuffle the alphabet.
- What if the Redis KGS runs out of keys?
- Ans: A background worker monitors the size of the
unused_keysset. When it drops below 10k, it fetches another batch from the KGS database.
- Ans: A background worker monitors the size of the
- How do you handle DB sharding?
- Ans: Shard by
short_key(hash-based). This ensures evenly distributed traffic.
- Ans: Shard by
- How do you implement custom aliases?
- Ans: Check if the alias exists in the DB. If not, insert it. This requires a unique constraint on the
short_keycolumn.
- Ans: Check if the alias exists in the DB. If not, insert it. This requires a unique constraint on the
- What happens if the cache is full?
- Ans: Use an LRU (Least Recently Used) eviction policy.
- How do you filter malicious URLs?
- Ans: Run the long URL against a blacklist (e.g., Google Safe Browsing API) before shortening.
- Can we use Zookeeper for ID generation?
- Ans: Yes, ZK can manage ranges (e.g., Server A gets 1-1000, Server B gets 1001-2000).
- Why not use UUIDs?
- Ans: UUIDs are 36 characters. Base62 of a UUID is still too long (~22 chars). We want short URLs (7 chars).
- How to handle heavy analytics writes?
- Ans: Never write to DB on the read path. Use Kafka to buffer events and write in batches to a Columnar DB (ClickHouse).
- How do you implement URL expiration without impacting read latency?
- Ans: Run a background job (e.g., cron) that periodically deletes expired keys from the DB. In the read path, if the key is present but
expireDate < now(), treat it as invalid and delete it lazily (Lazy Eviction).
- Ans: Run a background job (e.g., cron) that periodically deletes expired keys from the DB. In the read path, if the key is present but
- What if the KGS single database goes down?
- Ans: The KGS DB should be replicated (Master-Slave). If Master dies, promote Slave. Since KGS pre-allocates keys to workers in large chunks, the workers can still generate URLs while the DB is recovering.
13. Whiteboard Summary (4 Quadrant)
1. P - Core Requirements
- Shorten (Write): 40 QPS
- Redirect (Read): 4k QPS (100:1)
- Unique 7-char Key (Base62)
- 301 vs 302 Redirect
2. A - Key Architecture
- Write Path: KGS → Redis → DB
- Read Path: Cache → DB
- Async: Kafka for Analytics
- Sharding: Hash(short_key)
3. D - Data Models
- URLs Table: `id`, `short_key`, `long_url`
- Redis: `unused_keys` (Set), `url:{key}` (String)
- Analytics: Columnar DB (ClickHouse)
4. S - Scale & Bottlenecks
- Collision: Use Pre-generated Keys (KGS).
- Latency: Cache 20% hot URLs (Pareto).
- Analytics Lag: Use Kafka (Async).
- DB Space: TTL or cleanup job.