Design LeetCode (Remote Code Execution)

1. What is a Code Execution Service?

A Remote Code Execution (RCE) service accepts code submissions in various languages (Python, C++, Java), executes them against a set of test cases, and returns the results (Pass/Fail, Runtime, Memory Usage).

[!WARNING] The Danger Zone: This is one of the most dangerous systems to build. You are literally inviting strangers to run arbitrary code on your servers. Without proper Sandboxing, a user could rm -rf /, mine crypto, or scan your internal network.

Real-World Examples

  • LeetCode / HackerRank: Competitive programming platforms.
  • Judge0: Open-source RCE API.
  • Replit / CodeSandbox: Online IDEs (stateful environments).

2. Requirements & Goals

Functional Requirements

  1. Code Submission: User submits source code and language.
  2. Execution: System runs code against test cases.
  3. Feedback: Returns Standard Output (STDOUT), Standard Error (STDERR), and Verdict (Accepted, Wrong Answer, TLE).
  4. Limits: Enforce strict Time Limits (e.g., 2s) and Memory Limits (e.g., 256MB).

Non-Functional Requirements

  1. Security (Critical): Zero Trust. Code must run in total isolation.
  2. Performance: Low overhead for container startup. Users expect results in seconds.
  3. Concurrency: Handle thousands of simultaneous submissions.

3. Capacity Estimation

  • Daily Submissions: 1 Million.
  • Peak Traffic: 100 submissions/sec (during contests).
  • Execution Time: Average 2 seconds per task.
  • Compute Resources: Since tasks are CPU-bound, we need scalable worker nodes.
    • 100 tasks/sec $\times$ 2 sec/task = 200 concurrent containers running.
    • If each core handles 1 container, we need ~200 vCPUs (e.g., 25-50 large instances).

4. System APIs

Submit Code

POST /v1/submissions
{
  "code": "print('hello')",
  "language": "python3",
  "problem_id": "123"
}
Response: { "submission_id": "sub_abc123" }

Get Status (Polling)

GET /v1/submissions/sub_abc123
Response:
{
  "status": "PROCESSING" // or COMPLETED
  "result": { "verdict": "Accepted", "runtime_ms": 45 }
}
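
A minimal client-side sketch of this submit-then-poll flow, assuming the endpoints above and the `requests` library (the base URL, poll interval, and timeout are illustrative):

```python
import time
import requests

BASE = "https://api.example.com/v1"  # hypothetical base URL

def submit_and_wait(code: str, language: str, problem_id: str, timeout_s: int = 30):
    # Submit the code and receive a submission_id immediately.
    resp = requests.post(f"{BASE}/submissions",
                         json={"code": code, "language": language, "problem_id": problem_id})
    resp.raise_for_status()
    sub_id = resp.json()["submission_id"]

    # Poll until the verdict is ready (or give up).
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = requests.get(f"{BASE}/submissions/{sub_id}").json()
        if status["status"] == "COMPLETED":
            return status["result"]   # e.g. {"verdict": "Accepted", "runtime_ms": 45}
        time.sleep(1)                 # simple fixed poll interval
    raise TimeoutError(f"submission {sub_id} still processing")

# submit_and_wait("print('hello')", "python3", "123")
```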

5. Database Design

We need to store submission history and problem data. For more on relational schema design, see Module 04: Database Basics.

1. Submissions Table (Postgres)

| Column | Type | Description |
|:-----------|:------|:------------|
| id | UUID | Primary Key |
| user_id | UUID | The submitter |
| problem_id | UUID | The problem solved |
| code | TEXT | The source code (stored in S3 if large) |
| status | ENUM | PENDING, PROCESSING, AC, WA, TLE, RE |
| runtime | INT | Execution time in ms |
| memory | INT | Memory usage in KB |

2. Test Cases (Object Store / S3)

  • Test cases are often large files.
  • Structure: s3://problems/{problem_id}/input/1.txt, s3://problems/{problem_id}/output/1.txt.
  • Worker nodes download these during execution.
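
A sketch of how a worker could stage test cases before execution, assuming the S3 layout above and boto3 (bucket name and local paths are illustrative):

```python
import os
import boto3

s3 = boto3.client("s3")
BUCKET = "problems"  # matches s3://problems/{problem_id}/... above

def fetch_test_cases(problem_id: str, dest: str = "/tmp/testcases") -> str:
    """Download every input/output file for a problem into a local staging directory."""
    os.makedirs(dest, exist_ok=True)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=f"{problem_id}/"):
        for obj in page.get("Contents", []):
            key = obj["Key"]  # e.g. "123/input/1.txt"
            local_path = os.path.join(dest, key.replace("/", "_"))
            s3.download_file(BUCKET, key, local_path)
    return dest  # the sandbox later mounts this directory read-only
```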

6. High-Level Design

High-Level Architecture: The Secure Execution Pipeline.

(Architecture diagram: "System Architecture: LeetCode Execution Engine" (Async Job Queue | Multilingual Workers | gVisor & Firecracker Isolation). The submission path runs from the API Layer (Auth Svc, Submission Svc) into a Redis JOB_QUEUE (RPUSH / BLPOP) and on to vCPU-heavy Worker Nodes. Inside the security boundary, a gVisor / Firecracker sandbox executes ./compiled_binary < input.txt with a read-only RootFS, cgroups for vCPU/RAM, an empty network namespace, and syscalls intercepted by the gVisor Sentry. Test cases are mounted read-only from S3, and results flow back to PostgreSQL as the submission status is updated.)

The system follows an Asynchronous Worker Pattern to handle long-running code execution without blocking the API:

  1. API Gateway / Submissions Svc: Validates the request, saves initial metadata to PostgreSQL, and pushes the job into a Redis Job Queue (See Module 08: Messaging for more).
  2. Job Queue: Decouples the API from execution. This buffers traffic bursts and allows for easy scaling of worker nodes.
  3. Worker Nodes: Independent compute nodes that pull jobs from the queue with a blocking BLPOP.
  4. Sandbox Runtime (The Isolation Zone):
    • Ephemeral Sandbox: The worker creates a secure environment (e.g., gVisor or Firecracker).
    • Test Case Mounting: Mounts test cases from S3 as Read-Only.
    • Execution: Runs the untrusted code while intercepting syscalls.
  5. Result Store: Once execution finishes, the worker updates the submission status in PostgreSQL (e.g., AC, WA, TLE).

[!TIP] Why Async? Code execution takes time (1-10s). Keeping an HTTP connection open is brittle. Polling or WebSockets is better for the client.
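
A minimal sketch of the queue hand-off described above, assuming redis-py and the queue name used in the walkthrough below; persistence to PostgreSQL and the sandbox launch (`run_in_sandbox`) are placeholders:

```python
import json
import uuid
import redis

r = redis.Redis(host="localhost", port=6379)  # connection details are illustrative
QUEUE = "submission_queue"

# --- API side: enqueue and return immediately ---
def enqueue_submission(code: str, lang: str, problem_id: str) -> str:
    sub_id = str(uuid.uuid4())
    # 1. Save initial metadata (status=PENDING) to PostgreSQL here (omitted).
    # 2. Push the job so any idle worker can pick it up.
    r.rpush(QUEUE, json.dumps(
        {"id": sub_id, "code": code, "lang": lang, "problem_id": problem_id}))
    return sub_id

# --- Worker side: blocking poll loop ---
def worker_loop():
    while True:
        _, raw = r.blpop(QUEUE)        # blocks until a job is available
        job = json.loads(raw)
        verdict = run_in_sandbox(job)  # hypothetical helper: gVisor/Firecracker launch
        # Update the submissions row with the verdict (omitted).
```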


7. Deep Dive: Isolation Strategy (The Sandbox)

How do we prevent system("rm -rf /") or kernel exploits? This is handled by the Security Boundary shown in our architecture.

Level 1: Standard Containers (Docker)

  • Tool: Docker, LXC.
  • Pros: Fast boot (milliseconds).
  • Cons: Shared Kernel. All containers share the host OS kernel. If a hacker finds a kernel vulnerability (e.g., Dirty COW), they can escape the container and take over the host.
  • Verdict: Not secure enough for untrusted public code.

Level 2: Virtual Machines (VMs)

  • Tool: AWS EC2, VMware.
  • Pros: Hardware-level virtualization (Hypervisor). Very secure.
  • Cons: Slow boot time (minutes). Too heavy for a 2-second script.

Level 3: MicroVMs / User-Space Kernels (The Gold Standard)

This bridges the gap between VM security and Container speed.

  1. gVisor (Google):
    • Intercepts syscalls in User Space.
    • The “Guest” application talks to gVisor (Sentry), not the Host Kernel.
    • Acts as a “security proxy” for syscalls.
  2. Firecracker (AWS Lambda):
    • Lightweight KVM-based microVMs.
    • Boots in < 125ms.
    • Used by AWS Lambda and Fargate.
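
As a concrete example of Level 3 in practice, gVisor ships as an OCI runtime (runsc) that Docker can use in place of runc. A hedged sketch of launching a submission under it; the image name, limits, and mount paths are illustrative:

```python
import subprocess

def run_under_gvisor(workdir: str, cmd: list[str], time_limit_s: int = 2):
    """Launch untrusted code inside a gVisor-sandboxed container (sketch)."""
    docker_cmd = [
        "docker", "run", "--rm",
        "--runtime=runsc",           # gVisor's user-space kernel instead of runc
        "--network", "none",         # no network interfaces inside the sandbox
        "--read-only",               # root filesystem mounted read-only
        "--memory", "256m",          # cgroup memory limit
        "--pids-limit", "64",        # cgroup PID limit (fork-bomb defense)
        "-v", f"{workdir}:/box:ro",  # submission + test cases mounted read-only
        "python:3.11-slim",          # illustrative runtime image
    ] + cmd
    return subprocess.run(docker_cmd, capture_output=True, text=True,
                          timeout=time_limit_s + 5)  # wall-clock backstop
```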

8. Defense in Depth (Specific Mitigations)

Even with MicroVMs, apply these Linux primitives:

A. Preventing “Fork Bombs” (cgroups)

A Fork Bomb (while(1) fork()) crashes a server by exhausting the Process ID (PID) table.

  • Solution: Control Groups (cgroups).
  • Configure pids.max = 64. If the code tries to spawn the 65th process, the kernel blocks it.
  • Also limit CPU shares and Memory (OOM Killer).
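
At the cgroup v2 level these limits are just files; a sketch of a worker creating a per-submission group and writing the limits before moving the sandboxed process into it (the group name and values are illustrative, and the parent cgroup must have the controllers enabled):

```python
import os

def make_cgroup(name: str, pid: int) -> None:
    """Create a cgroup v2 group with PID / memory / CPU limits and enroll a process."""
    path = f"/sys/fs/cgroup/{name}"
    os.makedirs(path, exist_ok=True)
    with open(f"{path}/pids.max", "w") as f:      # fork-bomb defense: the 65th fork fails
        f.write("64")
    with open(f"{path}/memory.max", "w") as f:    # OOM-kill the submission, not the worker
        f.write(str(256 * 1024 * 1024))
    with open(f"{path}/cpu.max", "w") as f:       # 100ms CPU per 100ms period (~1 core)
        f.write("100000 100000")
    with open(f"{path}/cgroup.procs", "w") as f:  # move the sandboxed process into the group
        f.write(str(pid))
```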

B. Preventing Network Scans (Namespaces)

Users shouldn’t scan your internal AWS VPC.

  • Solution: Network Namespaces.
  • Run the container with No Network access (--network none).
  • Only map STDIN/STDOUT.
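
The same idea without Docker: start the child in a fresh, empty network namespace with unshare --net (requires sufficient privileges; the command and paths are illustrative):

```python
import subprocess

def run_without_network(cmd: list[str], stdin_path: str):
    """Run untrusted code with no usable network interfaces, mapping only STDIN/STDOUT."""
    with open(stdin_path) as stdin:
        return subprocess.run(
            ["unshare", "--net", "--"] + cmd,  # new net namespace: only an unconfigured loopback
            stdin=stdin,
            capture_output=True, text=True, timeout=10,
        )

# run_without_network(["python3", "solution.py"], "input.txt")
```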

C. Preventing File System Damage (Seccomp)

Users shouldn’t read /etc/passwd.

  • Solution:
    1. Mount Root FS as Read-Only.
    2. Use Seccomp (Secure Computing Mode) to whitelist only necessary syscalls (read, write, exit). Block socket, execve (except strictly controlled paths).
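
A minimal sketch of the syscall whitelist idea using the kernel's strict seccomp mode via prctl, which permits exactly read, write, _exit, and sigreturn; a production judge would use a BPF filter (e.g. via libseccomp) to allow a slightly larger, language-specific set:

```python
import ctypes

PR_SET_SECCOMP = 22       # prctl option number
SECCOMP_MODE_STRICT = 1   # afterwards only read/write/_exit/sigreturn are allowed

def enter_strict_seccomp() -> None:
    """Lock the calling process into seccomp strict mode (any other syscall is SIGKILLed)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_SECCOMP) failed")

# Note: this must be called by an in-process runner just before handing control to the
# untrusted code. It cannot be a preexec_fn, because the subsequent execve() would itself
# be blocked by the strict whitelist.
```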

9. Data Partitioning & Sharding

We generate millions of submissions. A single DB won’t hold up.

Sharding Strategy: Shard by submission_id

  • Shard by user_id: Good for “Show me all my submissions”. Bad for global analytics, and a single heavy user creates a hot shard.
  • Shard by submission_id: Even distribution. But “Show me my submissions” requires Scatter-Gather.
  • Decision: LeetCode is Write-Heavy during contests. We likely prioritize Write Throughput, so Sharding by submission_id (or using a dedicated high-write store like Cassandra/DynamoDB) is preferred. For user history, we can maintain a secondary index.
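
A small sketch of routing by submission_id with simple hash-mod placement (consistent hashing would be the next step if shards must be added without resharding everything):

```python
import hashlib

NUM_SHARDS = 8  # illustrative

def shard_for(submission_id: str) -> int:
    """Map a submission_id to a shard deterministically and evenly."""
    digest = hashlib.md5(submission_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# shard_for("abc-123") picks the shard to write to; "show my submissions" queries either
# scatter-gather across all shards or hit a secondary index keyed by user_id.
```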

10. Interactive Decision Visualizer: The Secure Pipeline

This demo visualizes how different layers of defense block different types of attacks. Select an Attack Vector and see which layer catches it.

[!TIP] Try it yourself: Click “Fork Bomb” or “File Deletion” to see how Linux cgroups and Seccomp filters block these attacks in real-time.

(Interactive attack simulator: a sample submission, print("Hello World"), passes through five stages: 1. Job Queue (pending), 2. cgroups (pids.max) checking CPU/RAM/PIDs, 3. Network Namespace checking connectivity, 4. Seccomp Filter checking syscalls (rm, exec), 5. Verdict. Selecting an attack vector highlights the layer that blocks it.)

11. System Walkthrough: The Life of a Submission

Let’s trace a user submitting Python code that tries to access the network.

Step 1: Submission

  • User sends code via POST /submissions.
    {
      "code": "import socket; s = socket.socket(); s.connect(('google.com', 80))",
      "lang": "python3"
    }
    
  • API Gateway generates submission_id: "abc-123" and pushes to Redis:
    RPUSH submission_queue "{\"id\":\"abc-123\", \"lang\":\"python3\", ...}"
    

Step 2: Worker Processing

  • Worker Node (Golang) pulls the job: BLPOP submission_queue 0.
  • It launches a Firecracker MicroVM with restricted arguments (`--network none` provides the network isolation):
    # Conceptual command
    firecracker-run \
      --kernel vmlinux \
      --rootfs python3-rootfs.ext4 \
      --network none \
      --cpu-template T2 \
      --memory 128M
    

Step 3: Execution & Interception

  • The Python code runs inside the MicroVM.
  • It tries to call the connect() syscall.
  • The Kernel (inside MicroVM) checks the network namespace. It sees no network interfaces (only loopback).
  • The syscall fails with ENETUNREACH (Network is unreachable).

Step 4: Result Collection

  • The worker captures STDERR: OSError: [Errno 101] Network is unreachable.
  • It writes the result to Postgres:
    UPDATE submissions SET status='RUNTIME_ERROR', stderr='Network unreachable...' WHERE id='abc-123';
    

12. Requirements Traceability Matrix

| Requirement | Architectural Solution |
|:------------|:-----------------------|
| Code Isolation | gVisor / Firecracker (MicroVMs) prevent kernel sharing. |
| Resource Limits | cgroups enforce CPU, Memory, and PID limits. |
| Network Security | Network Namespaces (`--network none`) block internet access. |
| File System Security | Read-Only RootFS + Seccomp whitelist prevents `rm -rf`. |
| Scalability | Redis Job Queue decouples API from Workers. Auto-scaling workers. |
| Concurrency | Firecracker boots in <125ms, allowing high density (thousands per node). |

13. Follow-Up Questions: The Interview Gauntlet

I. Security & Isolation

  • Why is Docker not enough? Docker shares the Host Kernel. A kernel exploit (e.g., Dirty Pipe) allows root access to the host.
  • Explain Seccomp. It stands for Secure Computing. It’s a BPF filter that whitelists syscalls. If a process calls socket() and it’s not whitelisted, the kernel kills the process.
  • How to prevent Infinite Loops? Use setrlimit(RLIMIT_CPU) in the runner code + a hard timeout (SIGKILL) from the worker supervisor after 2 seconds.
  • How to prevent memory exhaustion? cgroups memory limit. The OOM Killer will kill the specific container, not the worker node.
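
A sketch combining the last two answers: setrlimit inside the child (applied between fork and exec) plus a hard wall-clock timeout enforced by the supervising worker; the exact limits are illustrative:

```python
import resource
import subprocess

def limit_child() -> None:
    # Runs in the child after fork(), before exec(); the limits survive the exec.
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))                    # CPU seconds (SIGXCPU/SIGKILL on breach)
    resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20, 256 * 2**20)) # 256MB address space

def run_with_limits(cmd, input_path):
    with open(input_path) as stdin:
        try:
            return subprocess.run(cmd, stdin=stdin, capture_output=True, text=True,
                                  preexec_fn=limit_child,
                                  timeout=5)  # hard wall-clock backstop (SIGKILL on expiry)
        except subprocess.TimeoutExpired:
            return "TLE"
```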

II. Scalability

  • What if the Queue backs up? Auto-scale the Worker Group based on LLEN(submission_queue). If queue > 1000, add 10 nodes.
  • Handling Large Outputs: If a user prints 1GB of text, it fills the disk. Limit STDOUT capture to 100KB. Truncate the rest.
  • Shard Strategy: Shard DB by submission_id. No need for complex cross-shard joins.
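
For the large-output point above, a sketch of capping STDOUT at the pipe rather than buffering whatever the program prints (the 100KB cap matches the bullet; everything else is illustrative):

```python
import subprocess

MAX_STDOUT = 100 * 1024  # 100KB capture limit

def run_with_capped_output(cmd):
    """Return (captured_stdout, was_truncated) without ever buffering unbounded output."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    captured = proc.stdout.read(MAX_STDOUT + 1)  # read at most cap+1 bytes from the pipe
    truncated = len(captured) > MAX_STDOUT
    proc.kill()                                  # stop the process once the cap is reached
    proc.wait()
    return captured[:MAX_STDOUT].decode(errors="replace"), truncated
```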

III. Operational Excellence

  • How to update the runtime (e.g., Python 3.9 -> 3.10)? Build a new RootFS image. Rolling update the workers to use the new image.
  • Malicious Users: Rate limit by User ID. If a user triggers Security Violations repeatedly, ban the account.

14. Summary: The Whiteboard Strategy

If asked to design LeetCode, draw this 4-Quadrant Layout:

1. Requirements

  • Func: Execute Code, Feedback, Limits.
  • Non-Func: Security (Sandbox), Speed (<2s).
  • Scale: 100 QPS (Burst).

2. Architecture

[Client] -> [API] -> [Redis Queue]
                         |
                  [Worker Group]
                         |
        [Firecracker VM (Seccomp/NS)]
                         |
                       [DB]

* Async Worker: Decouples execution.
* Firecracker: MicroVM Isolation.

3. Data & API

POST /submit {code, lang}
DB: Submissions(id, user, status, result)
S3: Test Cases (Read-Only)

4. Security Layers

  • Network: Namespace (`--net none`).
  • FS: Read-Only RootFS.
  • Syscalls: Seccomp Whitelist.
  • Kernel: gVisor / MicroVM.
