Module 10 Review: Simple Services
Congratulations on completing Module 10! These “simple” services are building blocks that recur across large distributed systems. TinyURL is essentially a distributed hash map; Pastebin is essentially a wrapper around an object store. Mastering them gives you patterns that carry over to much larger problems.
1. Key Takeaways
- URL Shortener: The core challenge is generating short, unique keys (Base62 conversion). Use a KGS (Key Generation Service) to pre-generate keys, which gives high throughput and avoids collisions at write time. Partition by Hash(short_key) for even distribution.
- Pastebin: The core challenge is Data Size. Never store large blobs in a relational DB (Buffer Pool pollution). Use the Split Architecture: Metadata in SQL, Content in Object Store (S3), and Presigned URLs for direct uploads.
- Rate Limiter: The core challenge is Counting. Token Bucket allows bursts; Leaky Bucket smooths traffic. In distributed systems, use Redis + Lua to prevent Race Conditions (Read-Modify-Write gaps).
- Unique ID (Snowflake): The core challenge is generating IDs that are unique across nodes and roughly time-sortable. Random UUIDs hurt B-Tree index performance (Page Splits). Use Timestamp-based IDs (Snowflake) so inserts stay close to sequential.
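To make the Snowflake idea concrete, here is a minimal sketch of a timestamp-prefixed ID generator. It assumes the commonly cited 41/10/12 bit layout (timestamp, worker, sequence); the custom epoch, the class name `SnowflakeGenerator`, and the injectable `clock` parameter are illustrative choices, not something the module prescribes.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01 00:00:00 UTC
WORKER_BITS = 10              # assumed layout: up to 1024 workers
SEQ_BITS = 12                 # assumed layout: up to 4096 IDs per ms per worker
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQ = (1 << SEQ_BITS) - 1


class SnowflakeGenerator:
    """Hypothetical generator: IDs sort by time, then worker, then sequence."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock            # returns current time in milliseconds
        self._lock = threading.Lock()  # one ID at a time per process
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQ_BITS))
                | (self.worker_id << SEQ_BITS)
                | self._seq
            )
```

Because the timestamp occupies the high bits, IDs produced later compare greater, which is what keeps index inserts near the right-hand edge of the B-Tree. No database or coordinator is involved; each node only needs a unique `worker_id`.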
2. Interactive Flashcards
Test your knowledge.
Base62 vs Base64?
Why do we use Base62 for URL Shorteners instead of Base64?
URL Safety
Standard Base64 contains `+` and `/` (plus `=` padding), which have special meaning in URLs and must be percent-encoded. Base62 (`0-9, a-z, A-Z`) uses only alphanumerics, so it is URL-safe and needs no encoding.
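As a quick illustration, Base62 conversion of a numeric ID can be sketched as below. The alphabet order (digits, then lowercase, then uppercase) and the function names are assumptions; any fixed ordering works as long as the encoder and decoder agree.

```python
import string

# Assumed ordering: 0-9, then a-z, then A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def base62_encode(n: int) -> str:
    """Convert a non-negative integer ID into a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first


def base62_decode(s: str) -> int:
    """Convert a Base62 string back into the integer ID."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)  # raises ValueError on invalid characters
    return n
```

With this alphabet, a 7-character key covers 62^7 (about 3.5 trillion) distinct values.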
301 vs 302 Redirect?
Which one should you use for a URL Shortener if Analytics is a priority?
302 Found
A 302 is a temporary redirect that browsers do not cache by default, so every click reaches your server and can be counted. A 301 is a permanent redirect; browsers may cache it and skip your server on repeat visits.
Why Presigned URLs?
Why should clients upload large files directly to S3 using a Presigned URL instead of via the App Server?
Bandwidth & CPU
Proxying a 10MB upload through the App Server ties up its bandwidth, memory, and a connection for the duration of the transfer. A direct upload offloads that heavy lifting to S3, keeping the App Server lightweight.
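The idea behind a presigned URL can be sketched with a plain HMAC: the App Server signs "this key may be uploaded until this time", and the storage side verifies the signature without contacting the App Server. This is a simplified, hypothetical scheme for illustration only. Real S3 presigned URLs use AWS Signature Version 4, which is more involved; the host `storage.example.com`, the secret, and both function names are made up.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key shared by the signer and the store


def _signature(key: str, expires_at: int) -> str:
    payload = f"PUT\n{key}\n{expires_at}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def presign_put(key: str, expires_at: int) -> str:
    """App Server side: grant upload rights for one object key until expires_at."""
    query = urlencode({"expires": expires_at, "sig": _signature(key, expires_at)})
    return f"https://storage.example.com/{key}?{query}"


def verify_put(url: str, now: int) -> bool:
    """Storage side: accept the upload only if the URL is unexpired and untampered."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    key = parsed.path.lstrip("/")
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(key, expires_at)
    return now <= expires_at and hmac.compare_digest(sig, expected)
```

The App Server only does a cheap signing step; the file bytes travel straight from the client to the object store.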
Why NOT UUID in SQL?
Why are random UUIDs bad for Primary Keys in MySQL/Postgres?
B-Tree Fragmentation
Random UUIDs land at arbitrary positions in the primary-key B-Tree. Inserting them causes frequent **Page Splits** and random disk I/O, hurting write performance and cache locality.
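A rough way to see the difference is to count how often a new key lands at the tail of an already-sorted sequence. The sketch below uses a sorted Python list as a stand-in for a B-Tree's key order. It is a proxy rather than a real index, and `tail_insert_fraction` is a hypothetical helper.

```python
import bisect
import random


def tail_insert_fraction(keys):
    """Fraction of inserts that append at the end of the sorted order.

    Appends are the cheap case for a B-Tree (only the rightmost page fills up);
    every other insert lands in the middle of existing keys. Expects a
    non-empty list of keys.
    """
    sorted_keys = []
    tail_inserts = 0
    for key in keys:
        pos = bisect.bisect_right(sorted_keys, key)
        if pos == len(sorted_keys):
            tail_inserts += 1
        sorted_keys.insert(pos, key)
    return tail_inserts / len(keys)


# Sequential keys (auto-increment or timestamp-prefixed) always append.
sequential = list(range(2000))

# Random 128-bit keys stand in for UUIDv4 values; seeded for repeatability.
rng = random.Random(42)
random_keys = [rng.getrandbits(128) for _ in range(2000)]

sequential_fraction = tail_insert_fraction(sequential)
random_fraction = tail_insert_fraction(random_keys)
```

Sequential keys append every time, while random keys almost never do, which is the access pattern that triggers splits across the whole tree.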
Buffer Pool Pollution
What happens if you store large 10MB Blobs inside a MySQL row along with metadata?
Cache Eviction
Reading rows that include the 10MB blob pulls its pages into RAM (Buffer Pool), evicting valuable hot data such as indexes and frequently read rows. This lowers the Cache Hit Ratio.
Redis Race Condition
How do you solve the race condition when two servers try to decrement a Rate Limit counter simultaneously?
Lua Scripts
Redis executes Lua scripts atomically: no other command runs while the script is executing. Reading the value, checking the limit, and decrementing it therefore happen as one indivisible step, which closes the read-modify-write gap.
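The read-check-decrement step can be sketched in-process as a Token Bucket. Here a `threading.Lock` stands in for the atomicity that a Lua script provides inside Redis, so this is an analogy for the pattern rather than a distributed implementation. The class name and the explicit `now` parameter are assumptions made to keep the example deterministic.

```python
import threading


class TokenBucket:
    """Hypothetical in-memory token bucket; the lock plays the role of Redis + Lua."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._tokens = capacity  # start full, so an initial burst is allowed
        self._last = now
        self._lock = threading.Lock()

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Everything inside the lock is one indivisible read-modify-write.
        with self._lock:
            elapsed = max(0.0, now - self._last)
            self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
            self._last = max(self._last, now)
            if self._tokens >= cost:
                self._tokens -= cost
                return True
            return False
```

With `capacity=3` and `refill_per_sec=1.0`, three requests at the same instant pass and the fourth is rejected until a token has been refilled. Without the lock (or, in Redis, without the script), two callers could both read the same token count and both decrement it.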
3. System Design Cheat Sheet
| Service | Primary Challenge | Database Pattern | Key Algorithm |
|---|---|---|---|
| URL Shortener | Unique short-key generation at high write throughput | K-V Store (DynamoDB/Redis) or SQL | Base62 Encoding + KGS |
| Pastebin | Large Data Storage | Split Architecture (SQL Metadata + S3 Object) | Presigned URLs (Direct Upload) |
| Rate Limiter | Accurate Counting at Scale | In-Memory Cache (Redis) | Token Bucket + Lua Scripts |
| Unique ID | Distributed Sorting & Uniqueness | None (Self-contained nodes) | Twitter Snowflake |
[!TIP] Final thought: In an interview, always start simple: “I’ll use a Database ID.” Then evolve: “At 100M users, a single database generating IDs becomes a bottleneck, so I need a distributed ID like Snowflake.” Walking through that evolution shows the progression from Junior to Senior thinking.