Design LeetCode (Remote Code Execution)
1. What is a Code Execution Service?
A Remote Code Execution (RCE) service allows users to submit code in various languages (Python, C++, Java), executes it against a set of test cases, and returns the results (Pass/Fail, Runtime, Memory Usage).
[!WARNING] The Danger Zone: This is one of the most dangerous systems to build. You are literally inviting strangers to run arbitrary code on your servers. Without proper Sandboxing, a user could
rm -rf /, mine crypto, or scan your internal network.
Real-World Examples
- LeetCode / HackerRank: Competitive programming platforms.
- Judge0: Open-source RCE API.
- Replit / CodeSandbox: Online IDEs (stateful environments).
2. Requirements & Goals
Functional Requirements
- Code Submission: User submits source code and language.
- Execution: System runs code against test cases.
- Feedback: Returns Standard Output (STDOUT), Standard Error (STDERR), and Verdict (Accepted, Wrong Answer, TLE).
- Limits: Enforce strict Time Limits (e.g., 2s) and Memory Limits (e.g., 256MB).
Non-Functional Requirements
- Security (Critical): Zero Trust. Code must run in total isolation.
- Performance: Low overhead for container startup. Users expect results in seconds.
- Concurrency: Handle thousands of simultaneous submissions.
3. Capacity Estimation
- Daily Submissions: 1 Million.
- Peak Traffic: 100 submissions/sec (during contests).
- Execution Time: Average 2 seconds per task.
- Compute Resources: Since tasks are CPU-bound, we need scalable worker nodes.
- 100 tasks/sec $\times$ 2 sec/task = 200 concurrent containers running.
- If each core handles 1 container, we need ~200 vCPUs (e.g., 25-50 large instances).
4. System APIs
Submit Code
POST /v1/submissions
{
"code": "print('hello')",
"language": "python3",
"problem_id": "123"
}
Response: { "submission_id": "sub_abc123" }
Get Status (Polling)
GET /v1/submissions/sub_abc123
Response:
{
"status": "PROCESSING" // or COMPLETED
"result": { "verdict": "Accepted", "runtime_ms": 45 }
}
5. Database Design
We need to store submission history and problem data. For more on relational schema design, see Module 04: Database Basics.
1. Submissions Table (Postgres)
| Column | Type | Description |
|:——–|:——–|:——–|
| id | UUID | Primary Key |
| user_id | UUID | The submitter |
| problem_id | UUID | The problem solved |
| code | TEXT | The source code (Stored in S3 if large) |
| status | ENUM | PENDING, PROCESSING, AC, WA, TLE, RE |
| runtime | INT | Execution time in ms |
| memory | INT | Memory usage in KB |
2. Test Cases (Object Store / S3)
- Test cases are often large files.
- Structure:
s3://problems/{problem_id}/input/1.txt,s3://problems/{problem_id}/output/1.txt. - Worker nodes download these during execution.
6. High-Level Design
High-Level Architecture: The Secure Execution Pipeline.
The system follows an Asynchronous Worker Pattern to handle long-running code execution without blocking the API:
- API Gateway / Submissions Svc: Validates the request, saves initial metadata to PostgreSQL, and pushes the job into a Redis Job Queue (See Module 08: Messaging for more).
- Job Queue: Decouples the API from execution. This buffers traffic bursts and allows for easy scaling of worker nodes.
- Worker Nodes: Independent compute nodes that poll the queue via
RPOP. - Sandbox Runtime (The Isolation Zone):
- Ephemeral Sandbox: The worker creates a secure environment (e.g., gVisor or Firecracker).
- Test Case Mounting: Mounts test cases from S3 as Read-Only.
- Execution: Runs the untrusted code while intercepting syscalls.
- Result Store: Once execution finishes, the worker updates the submission status in PostgreSQL (e.g., AC, WA, TLE).
[!TIP] Why Async? Code execution takes time (1-10s). Keeping an HTTP connection open is brittle. Polling or WebSockets is better for the client.
7. Deep Dive: Isolation Strategy (The Sandbox)
How do we prevent system("rm -rf /") or kernel exploits? This is handled by the Security Boundary shown in our architecture.
Level 1: Standard Containers (Docker)
- Tool: Docker, LXC.
- Pros: Fast boot (milliseconds).
- Cons: Shared Kernel. All containers share the host OS kernel. If a hacker finds a kernel vulnerability (e.g., Dirty COW), they can escape the container and take over the host.
- Verdict: Not secure enough for untrusted public code.
Level 2: Virtual Machines (VMs)
- Tool: AWS EC2, VMWare.
- Pros: Hardware-level virtualization (Hypervisor). Very secure.
- Cons: Slow boot time (minutes). Too heavy for a 2-second script.
Level 3: MicroVMs / User-Space Kernels (The Gold Standard)
This bridges the gap between VM security and Container speed.
- gVisor (Google):
- Intercepts syscalls in User Space.
- The “Guest” application talks to gVisor (Sentry), not the Host Kernel.
- Acts as a “security proxy” for syscalls.
- Firecracker (AWS Lambda):
- Lightweight KVM-based microVMs.
- Boots in < 125ms.
- Used by AWS Lambda and Fargate.
8. Defense in Depth (Specific Mitigations)
Even with MicroVMs, apply these Linux primitives:
A. Preventing “Fork Bombs” (cgroups)
A Fork Bomb (while(1) fork()) crashes a server by exhausting the Process ID (PID) table.
- Solution: Control Groups (cgroups).
- Configure
pids.max = 64. If the code tries to spawn the 65th process, the kernel blocks it. - Also limit CPU shares and Memory (OOM Killer).
B. Preventing Network Scans (Namespaces)
Users shouldn’t scan your internal AWS VPC.
- Solution: Network Namespaces.
- Run the container with No Network access (
--network none). - Only map STDIN/STDOUT.
C. Preventing File System Damage (Seccomp)
Users shouldn’t read /etc/passwd.
- Solution:
- Mount Root FS as Read-Only.
- Use Seccomp (Secure Computing Mode) to whitelist only necessary syscalls (read, write, exit). Block
socket,execve(except strictly controlled paths).
9. Data Partitioning & Sharding
We generate millions of submissions. A single DB won’t hold up.
Sharding Strategy: Shard by submission_id
- Shard by
user_id: Good for “Show me all my submissions”. Bad for global analytics or if one user spams. - Shard by
submission_id: Even distribution. But “Show me my submissions” requires Scatter-Gather. - Decision: LeetCode is Write-Heavy during contests. We likely prioritize Write Throughput, so Sharding by
submission_id(or using a dedicated high-write store like Cassandra/DynamoDB) is preferred. For user history, we can maintain a secondary index.
10. Interactive Decision Visualizer: The Secure Pipeline
This demo visualizes how different layers of defense block different types of attacks. Select an Attack Vector and see which layer catches it.
[!TIP] Try it yourself: Click “Fork Bomb” or “File Deletion” to see how Linux cgroups and Seccomp filters block these attacks in real-time.
11. System Walkthrough: The Life of a Submission
Let’s trace a user submitting Python code that tries to access the network.
Step 1: Submission
- User sends code via
POST /submissions.{ "code": "import socket; s = socket.socket(); s.connect(('google.com', 80))", "lang": "python3" } - API Gateway generates
submission_id: "abc-123"and pushes to Redis:RPUSH submission_queue "{\"id\":\"abc-123\", \"lang\":\"python3\", ...}"
Step 2: Worker Processing
- Worker Node (Golang) pulls the job:
BLPOP submission_queue 0. - It launches a Firecracker MicroVM with restricted arguments:
# Conceptual command firecracker-run \ --kernel vmlinux \ --rootfs python3-rootfs.ext4 \ --network none \ # <--- Network Isolation --cpu-template T2 \ --memory 128M
Step 3: Execution & Interception
- The Python code runs inside the MicroVM.
- It tries to call the
connect()syscall. - The Kernel (inside MicroVM) checks the network namespace. It sees no network interfaces (only loopback).
- The syscall fails with
ENETUNREACH(Network is unreachable).
Step 4: Result Collection
- The worker captures
STDERR:OSError: [Errno 101] Network is unreachable. - It writes the result to Postgres:
UPDATE submissions SET status='RUNTIME_ERROR', stderr='Network unreachable...' WHERE id='abc-123';
12. Requirements Traceability Matrix
| Requirement | Architectural Solution |
|---|---|
| Code Isolation | gVisor / Firecracker (MicroVMs) prevent kernel sharing. |
| Resource Limits | cgroups enforce CPU, Memory, and PID limits. |
| Network Security | Network Namespaces (--network none) block internet access. |
| File System Security | Read-Only RootFS + Seccomp whitelist prevents rm -rf. |
| Scalability | Redis Job Queue decouples API from Workers. Auto-scaling workers. |
| Concurrency | Firecracker boots in <125ms, allowing high density (thousands per node). |
13. Follow-Up Questions: The Interview Gauntlet
I. Security & Isolation
- Why is Docker not enough? Docker shares the Host Kernel. A kernel exploit (e.g., Dirty Pipe) allows root access to the host.
- Explain Seccomp. It stands for Secure Computing. It’s a BPF filter that whitelists syscalls. If a process calls
socket()and it’s not whitelisted, the kernel kills the process. - How to prevent Infinite Loops? Use
setrlimit(RLIMIT_CPU)in the runner code + a hard timeout (SIGKILL) from the worker supervisor after 2 seconds. - How to prevent memory exhaustion?
cgroupsmemory limit. The OOM Killer will kill the specific container, not the worker node.
II. Scalability
- What if the Queue backs up? Auto-scale the Worker Group based on
LLEN(submission_queue). If queue > 1000, add 10 nodes. - Handling Large Outputs: If a user prints 1GB of text, it fills the disk. Limit
STDOUTcapture to 100KB. Truncate the rest. - Shard Strategy: Shard DB by
submission_id. No need for complex cross-shard joins.
III. Operational Excellence
- How to update the runtime (e.g., Python 3.9 -> 3.10)? Build a new RootFS image. Rolling update the workers to use the new image.
- Malicious Users: Rate limit by User ID. If a user triggers Security Violations repeatedly, ban the account.
14. Summary: The Whiteboard Strategy
If asked to design LeetCode, draw this 4-Quadrant Layout:
1. Requirements
- Func: Execute Code, Feedback, Limits.
- Non-Func: Security (Sandbox), Speed (<2s).
- Scale: 100 QPS (Burst).
2. Architecture
↓
[Worker Group]
[Firecracker VM (Seccomp/NS)]
↓
[DB]
* Async Worker: Decouples execution.
* Firecracker: MicroVM Isolation.
3. Data & API
DB: Submissions(id, user, status, result)
S3: Test Cases (Read-Only)
4. Security Layers
- Network: Namespace (`--net none`).
- FS: Read-Only RootFS.
- Syscalls: Seccomp Whitelist.
- Kernel: gVisor / MicroVM.