Design LeetCode (Remote Code Execution)

1. What is a Code Execution Service?

A Remote Code Execution (RCE) service allows users to submit code in various languages (Python, C++, Java), executes it against a set of test cases, and returns the results (Pass/Fail, Runtime, Memory Usage).

[!WARNING] The Danger Zone: This is one of the most dangerous systems to build. You are literally inviting strangers to run arbitrary code on your servers. Without proper Sandboxing, a user could rm -rf /, mine crypto, or scan your internal network.

Real-World Examples

LeetCode / HackerRank: Competitive programming platforms.
Judge0: Open-source RCE API.
Replit / CodeSandbox: Online IDEs (stateful environments).

2. Requirements & Goals

Functional Requirements

Code Submission: User submits source code and language.
Execution: System runs code against test cases.
Feedback: Returns Standard Output (STDOUT), Standard Error (STDERR), and Verdict (Accepted, Wrong Answer, TLE).
Limits: Enforce strict Time Limits (e.g., 2s) and Memory Limits (e.g., 256MB).

Non-Functional Requirements

Security (Critical): Zero Trust. Code must run in total isolation.
Performance: Low overhead for container startup. Users expect results in seconds.
Concurrency: Handle thousands of simultaneous submissions.

3. Capacity Estimation

Daily Submissions: 1 Million.
Peak Traffic: 100 submissions/sec (during contests).
Execution Time: Average 2 seconds per task.
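Compute Resources: Since tasks are CPU-bound, we size the worker fleet for peak load rather than the daily average (1 Million/day is only about 12 submissions/sec).
- 100 tasks/sec $\times$ 2 sec/task = 200 concurrent containers running.
- If each core handles 1 container, we need ~200 vCPUs (e.g., 25 instances with 8 vCPUs each, or 50 instances with 4 vCPUs each).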
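The arithmetic above is an application of Little's Law (concurrency = arrival rate × time in system). A minimal sketch, assuming one container per vCPU; the function name and the 8-vCPU instance size are illustrative choices, not part of the original design:

```python
import math
from typing import Dict


def estimate_capacity(peak_rps: float, avg_exec_seconds: float, vcpus_per_instance: int) -> Dict[str, int]:
    """Estimate worker capacity, assuming one container occupies one vCPU."""
    # Little's Law: jobs in flight = arrival rate x average time per job.
    concurrent = math.ceil(peak_rps * avg_exec_seconds)
    instances = math.ceil(concurrent / vcpus_per_instance)
    return {"concurrent_containers": concurrent, "vcpus": concurrent, "instances": instances}


# Daily average, for comparison with the contest peak.
avg_rps = 1_000_000 / 86_400

estimate = estimate_capacity(peak_rps=100, avg_exec_seconds=2, vcpus_per_instance=8)
print(round(avg_rps, 1), estimate)
```

Sizing for the 100/sec peak instead of the ~12/sec average is what makes auto-scaling attractive: most of the fleet is idle outside contests.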

4. System APIs

Submit Code

POST /v1/submissions
{
  "code": "print('hello')",
  "language": "python3",
  "problem_id": "123"
}
Response: { "submission_id": "sub_abc123" }

Get Status (Polling)

GET /v1/submissions/sub_abc123
Response (status is PROCESSING until the run finishes, then COMPLETED with a result):
{
  "status": "COMPLETED",
  "result": { "verdict": "Accepted", "runtime_ms": 45 }
}

5. Database Design

We need to store submission history and problem data. For more on relational schema design, see Module 04: Database Basics.

1. Submissions Table (Postgres)

One row per submission: id, user, status, result.

2. Test Cases (Object Store / S3)

Test cases are often large files.
Structure: s3://problems/{problem_id}/input/1.txt, s3://problems/{problem_id}/output/1.txt.
Worker nodes download these during execution.

6. High-Level Design

[Figure: High-Level Architecture: The Secure Execution Pipeline. System Architecture: LeetCode Execution Engine (Async Job Queue | Multilingual Workers | gVisor & Firecracker Isolation). Submission path: API Layer (Auth Svc, Submission Svc) → Redis JOB_QUEUE (LPUSH / RPOP) → Worker Node (vCPU heavy). Inside the security boundary, a gVisor / Firecracker sandbox with a read-only RootFS, cgroups (vCPU/RAM), and no network namespace runs `./compiled_binary < input.txt`, with syscalls intercepted by the gVisor Sentry. Test cases from S3 are mounted read-only. Result feedback: submission results are written to PostgreSQL.]

The system follows an Asynchronous Worker Pattern to handle long-running code execution without blocking the API:

API Gateway / Submissions Svc: Validates the request, saves initial metadata to PostgreSQL, and pushes the job into a Redis Job Queue (See Module 08: Messaging for more).
Job Queue: Decouples the API from execution. This buffers traffic bursts and allows for easy scaling of worker nodes.
Worker Nodes: Independent compute nodes that poll the queue via RPOP.
Sandbox Runtime (The Isolation Zone):
- Ephemeral Sandbox: The worker creates a secure environment (e.g., gVisor or Firecracker).
- Test Case Mounting: Mounts test cases from S3 as Read-Only.
- Execution: Runs the untrusted code while intercepting syscalls.
Result Store: Once execution finishes, the worker updates the submission status in PostgreSQL (e.g., AC, WA, TLE).
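The pipeline above can be sketched as a single worker step. This is an illustration under stated assumptions, not the production worker: `InMemoryQueue` is a stand-in for the Redis list, `run_in_sandbox` is a hypothetical callable representing the gVisor/Firecracker execution, and the `results` dict stands in for the PostgreSQL status column.

```python
import json
from collections import deque
from typing import Callable, Dict, Optional


class InMemoryQueue:
    """Stand-in for a Redis list: LPUSH at the head, RPOP from the tail (FIFO)."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def lpush(self, item: str) -> None:
        self._items.appendleft(item)

    def rpop(self) -> Optional[str]:
        return self._items.pop() if self._items else None


def process_next_job(
    queue: InMemoryQueue,
    run_in_sandbox: Callable[[str, str], str],
    expected_output: Dict[str, str],
    results: Dict[str, str],
) -> bool:
    """Pop one job, execute it, and record the verdict. Returns False if the queue was empty."""
    raw = queue.rpop()
    if raw is None:
        return False
    job = json.loads(raw)
    results[job["id"]] = "PROCESSING"
    try:
        stdout = run_in_sandbox(job["code"], job["language"])
    except TimeoutError:
        results[job["id"]] = "TLE"
        return True
    expected = expected_output[job["problem_id"]]
    results[job["id"]] = "AC" if stdout.strip() == expected.strip() else "WA"
    return True


def fake_sandbox(code: str, language: str) -> str:
    # Hypothetical sandbox: pretends that code containing "loop" never finishes.
    if "loop" in code:
        raise TimeoutError
    return "hello\n"


queue = InMemoryQueue()
queue.lpush(json.dumps({"id": "sub_1", "code": "print('hello')", "language": "python3", "problem_id": "123"}))
queue.lpush(json.dumps({"id": "sub_2", "code": "loop forever", "language": "python3", "problem_id": "123"}))

results: Dict[str, str] = {}
while process_next_job(queue, fake_sandbox, {"123": "hello"}, results):
    pass
print(results)
```

Because the worker only talks to the queue and the result store, adding capacity is a matter of starting more workers; the API never needs to know how many exist.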

[!TIP] Why Async? Code execution takes time (1-10s), and holding an HTTP request open for that long is brittle (proxy timeouts, dropped connections). Having the client poll for the result, or receive it over a WebSocket, is more robust.

7. Deep Dive: Isolation Strategy (The Sandbox)

How do we prevent system("rm -rf /") or kernel exploits? This is handled by the Security Boundary shown in our architecture.

Level 1: Standard Containers (Docker)

Tool: Docker, LXC.
Pros: Fast boot (milliseconds).
Cons: Shared Kernel. All containers share the host OS kernel. If a hacker finds a kernel vulnerability (e.g., Dirty COW), they can escape the container and take over the host.
Verdict: Not secure enough for untrusted public code.

Level 2: Virtual Machines (VMs)

Tool: AWS EC2, VMware.
Pros: Hardware-level virtualization (Hypervisor). Very secure.
Cons: Slow boot time (tens of seconds to minutes). Too heavy for a 2-second script.

Level 3: MicroVMs / User-Space Kernels (The Gold Standard)

This bridges the gap between VM security and Container speed.

gVisor (Google):
- Intercepts syscalls in User Space.
- The “Guest” application talks to gVisor (Sentry), not the Host Kernel.
- Acts as a “security proxy” for syscalls.
Firecracker (AWS):
- Lightweight KVM-based microVMs.
- Boots in as little as 125 ms.
- Used by AWS Lambda and Fargate.

8. Defense in Depth (Specific Mitigations)

Even with MicroVMs, apply these Linux primitives:

A. Preventing “Fork Bombs” (cgroups)

A Fork Bomb (while(1) fork()) crashes a server by exhausting the Process ID (PID) table.

Solution: Control Groups (cgroups).
Configure pids.max = 64. If the code tries to create a 65th process, fork() fails with EAGAIN instead of consuming more PIDs.
Also cap CPU time and memory; a process that exceeds the memory limit is killed by the OOM Killer.
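With cgroup v2, these limits are plain files inside the sandbox's cgroup directory. A minimal sketch, assuming cgroup v2 and its `pids.max`, `memory.max`, and `cpu.max` interface files; the helper names are hypothetical, and the demo writes to a temporary directory because writing under `/sys/fs/cgroup` requires root:

```python
import tempfile
from pathlib import Path
from typing import Dict


def cgroup_v2_limits(
    max_pids: int = 64,
    memory_bytes: int = 256 * 1024 * 1024,
    cpu_quota_us: int = 100_000,
    cpu_period_us: int = 100_000,
) -> Dict[str, str]:
    """Map cgroup v2 interface files to the values to write into them."""
    return {
        "pids.max": str(max_pids),                    # fork() fails beyond this many tasks
        "memory.max": str(memory_bytes),              # exceeding this triggers the OOM killer
        "cpu.max": f"{cpu_quota_us} {cpu_period_us}",  # quota per period; equal values = one full CPU
    }


def apply_limits(cgroup_dir: Path, limits: Dict[str, str]) -> None:
    """Write each limit into its file inside the given cgroup directory."""
    for filename, value in limits.items():
        (cgroup_dir / filename).write_text(value)


limits = cgroup_v2_limits()
with tempfile.TemporaryDirectory() as tmp:
    # A real worker would target something like /sys/fs/cgroup/<sandbox-name>.
    apply_limits(Path(tmp), limits)
    written_pids_max = (Path(tmp) / "pids.max").read_text()
print(limits, written_pids_max)
```

In practice a container runtime sets these values for you; the sketch only shows what the runtime is doing underneath.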

B. Preventing Network Scans (Namespaces)

Users shouldn’t scan your internal AWS VPC.

Solution: Network Namespaces.
Run the container with No Network access (--network none).
Only map STDIN/STDOUT.
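As an illustration, a worker could assemble the container command and feed the test input over STDIN. This is a sketch under assumptions: the image name `judge-python3` is hypothetical, and the demo runs a local Python process instead of Docker so that it works without a container runtime. The `--rm`, `-i`, and `--network none` flags are standard `docker run` options.

```python
import subprocess
import sys
from typing import List


def sandbox_command(image: str, program: List[str]) -> List[str]:
    """Build a docker run command with no network and STDIN attached."""
    return [
        "docker", "run",
        "--rm",               # remove the container when it exits
        "-i",                 # keep STDIN open so test input can be piped in
        "--network", "none",  # no interfaces except loopback
        image,
        *program,
    ]


def run_with_stdin(argv: List[str], stdin_text: str, timeout_s: float = 5.0) -> str:
    """Run a command, feed it stdin_text, and return its captured STDOUT."""
    proc = subprocess.run(argv, input=stdin_text, capture_output=True, text=True, timeout=timeout_s)
    return proc.stdout


cmd = sandbox_command("judge-python3", ["python3", "/code/main.py"])
print(" ".join(cmd))

# Local stand-in for the container: the same STDIN/STDOUT contract, without Docker.
echoed = run_with_stdin([sys.executable, "-c", "print(input().upper())"], "hi\n")
print(echoed)
```

Passing input and output only through pipes means the sandbox needs no network socket and no shared writable volume to talk to the worker.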

C. Preventing File System Damage (Seccomp)

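Users shouldn’t be able to modify or delete system files, or read sensitive ones such as /etc/passwd.

Solution:
1. Mount a minimal Root FS as Read-Only, so writes fail and the image contains nothing worth reading.
2. Use Seccomp (Secure Computing Mode) to whitelist only the syscalls the program needs (e.g., read, write, exit). Block socket and execve (except for strictly controlled paths).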
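A whitelist of this kind can be written in the JSON profile format that Docker accepts via `--security-opt seccomp=<file>`. The sketch below generates such a profile; the four-syscall list mirrors the example in the text and is deliberately unrealistic, since a real interpreter or compiled binary needs many more syscalls (memory mapping, file opening, and so on) just to start.

```python
import json
from typing import Dict, List


def build_seccomp_profile(allowed_syscalls: List[str]) -> Dict:
    """Deny every syscall by default, then allow only the listed ones."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",  # unlisted syscalls fail with an error
        "syscalls": [
            {"names": sorted(set(allowed_syscalls)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


# Illustrative whitelist only; socket and execve are intentionally absent.
ALLOWED = ["read", "write", "exit", "exit_group"]

profile = build_seccomp_profile(ALLOWED)
profile_json = json.dumps(profile, indent=2)
print(profile_json)
```

Starting from deny-by-default and adding syscalls as test programs need them tends to be safer than starting from a permissive profile and trying to enumerate everything dangerous.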

9. Data Partitioning & Sharding

We generate millions of submissions. A single DB won’t hold up.

Sharding Strategy: Shard by `submission_id`

Shard by user_id: Good for “Show me all my submissions”. Bad for global analytics or if one user spams.
Shard by submission_id: Even distribution. But “Show me my submissions” requires Scatter-Gather.
Decision: LeetCode is Write-Heavy during contests. We likely prioritize Write Throughput, so Sharding by submission_id (or using a dedicated high-write store like Cassandra/DynamoDB) is preferred. For user history, we can maintain a secondary index.
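To make the trade-off concrete, here is a minimal sketch of hash-based routing by submission_id. The function name, the choice of SHA-256, and the shard count of 8 are assumptions for illustration; a production system would more likely use consistent hashing or the database's native partitioning so that shards can be added without remapping most keys.

```python
import hashlib
from collections import Counter


def shard_for(submission_id: str, num_shards: int) -> int:
    """Map a submission_id to a shard index using a stable hash."""
    # Python's built-in hash() is randomized per process, so use a stable digest instead.
    digest = hashlib.sha256(submission_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 8
counts = Counter(shard_for(f"sub_{i}", NUM_SHARDS) for i in range(10_000))
print(sorted(counts.items()))
```

The same key always lands on the same shard, and a burst of submissions from one user is spread across all shards, which is the write-throughput property the decision above relies on.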

10. Interactive Decision Visualizer: The Secure Pipeline

This demo visualizes how different layers of defense block different types of attacks.

[Interactive demo: Attack Simulator. A submission such as print("Hello World") moves through the stages Job Queue (pending) → cgroups, including pids.max (checks CPU/RAM/PIDs) → Network Namespace (checks connectivity) → Seccomp Filter (checks syscalls) → Verdict. Selecting an attack vector such as “Fork Bomb” or “File Deletion” shows which layer catches it, for example Linux cgroups or the Seccomp filter.]

11. System Walkthrough: The Life of a Submission

Let’s trace a user submitting Python code that tries to access the network.

Step 1: Submission

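User sends code via POST /v1/submissions. The snippet connects to a literal IP address, so the failure occurs in connect() itself rather than during DNS resolution.

{
  "code": "import socket; s = socket.socket(); s.connect(('8.8.8.8', 53))",
  "language": "python3"
}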

API Gateway generates submission_id: "abc-123" and pushes to Redis:

LPUSH submission_queue "{\"id\":\"abc-123\", \"lang\":\"python3\", ...}"

Step 2: Worker Processing

Worker Node (Golang) pulls the job: BRPOP submission_queue 0 (a blocking pop; the timeout of 0 means wait indefinitely).

It launches a Firecracker MicroVM with restricted arguments:

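# Conceptual command: firecracker-run is an illustrative wrapper, not a real CLI.
# (Firecracker itself is configured through its API socket or a JSON config file.)
# No network device is attached to the VM; that is the network isolation.
firecracker-run \
  --kernel vmlinux \
  --rootfs python3-rootfs.ext4 \
  --network none \
  --cpu-template T2 \
  --memory 128M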

Step 3: Execution & Interception

The Python code runs inside the MicroVM.
It tries to call the connect() syscall.
The guest kernel (inside the MicroVM) has no network interface other than loopback, so it has no route to the destination.
The syscall fails with ENETUNREACH (Network is unreachable).

Step 4: Result Collection

The worker captures STDERR: OSError: [Errno 101] Network is unreachable.

It writes the result to Postgres:

UPDATE submissions SET status='RUNTIME_ERROR', stderr='Network unreachable...' WHERE id='abc-123';

12. Requirements Traceability Matrix

| Requirement | Architectural Solution |
| --- | --- |
| Code Isolation | gVisor (user-space kernel) or Firecracker (MicroVM) keeps untrusted code off the host kernel. |
| Resource Limits | cgroups enforce CPU, Memory, and PID limits. |
| Network Security | Network Namespaces (`--network none`) block internet access. |
| File System Security | Read-Only RootFS + Seccomp whitelist prevents `rm -rf`. |
| Scalability | Redis Job Queue decouples API from Workers. Auto-scaling workers. |
| Concurrency | Firecracker boots in <125ms, allowing high density (thousands per node). |

13. Follow-Up Questions: The Interview Gauntlet

I. Security & Isolation

Why is Docker not enough? Docker shares the Host Kernel. A kernel exploit (e.g., Dirty Pipe) allows root access to the host.
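Explain Seccomp. It stands for Secure Computing Mode. In filter mode (seccomp-bpf), a BPF program whitelists syscalls. If a process calls socket() and it’s not whitelisted, the kernel kills the process or fails the call with an error, depending on the action the filter specifies.
How to prevent Infinite Loops? Use setrlimit(RLIMIT_CPU) in the runner code + a hard wall-clock timeout (SIGKILL) from the worker supervisor after 2 seconds. Both are needed: RLIMIT_CPU counts only CPU time, so a process that sleeps or blocks on I/O never reaches it.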
How to prevent memory exhaustion? cgroups memory limit. The OOM Killer will kill the specific container, not the worker node.
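The two time limits can be combined in a few lines on Linux. This is a sketch of the idea rather than a hardened runner: `run_limited` is a hypothetical helper, it relies on the Unix-only `resource` module, and it runs the child directly on the host, whereas the real system would apply the same limits inside the sandbox.

```python
import resource
import signal
import subprocess
import sys
from typing import Callable, Dict, List


def _limit_cpu(cpu_seconds: int) -> Callable[[], None]:
    def apply() -> None:
        # Runs in the child just before exec. Never try to raise an existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        limit = cpu_seconds if hard == resource.RLIM_INFINITY else min(cpu_seconds, hard)
        resource.setrlimit(resource.RLIMIT_CPU, (limit, limit))
    return apply


def run_limited(argv: List[str], cpu_seconds: int = 2, wall_seconds: float = 2.0) -> Dict[str, str]:
    """Run argv with a CPU-time limit (kernel-enforced) and a wall-clock limit (supervisor-enforced)."""
    try:
        proc = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=wall_seconds,  # on expiry, subprocess.run kills the child
            preexec_fn=_limit_cpu(cpu_seconds),
        )
    except subprocess.TimeoutExpired:
        return {"verdict": "TLE", "reason": "wall-clock timeout"}
    if proc.returncode in (-signal.SIGXCPU, -signal.SIGKILL):
        return {"verdict": "TLE", "reason": "CPU limit"}
    if proc.returncode != 0:
        return {"verdict": "RUNTIME_ERROR", "stderr": proc.stderr}
    return {"verdict": "OK", "stdout": proc.stdout}


ok = run_limited([sys.executable, "-c", "print('ok')"])
# Sleeping uses almost no CPU time, so only the wall-clock timeout catches it.
sleeper = run_limited([sys.executable, "-c", "import time; time.sleep(10)"], wall_seconds=0.5)
print(ok["verdict"], sleeper["verdict"])
```

A busy loop is stopped by the CPU limit even if the supervisor is slow to react, while a sleeping or blocked process is stopped by the wall-clock timeout.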

II. Scalability

What if the Queue backs up? Auto-scale the Worker Group based on LLEN(submission_queue). If queue > 1000, add 10 nodes.
Handling Large Outputs: If a user prints 1GB of text, it fills the disk. Limit STDOUT capture to 100KB. Truncate the rest.
Shard Strategy: Shard DB by submission_id. No need for complex cross-shard joins.
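For the large-output case, the worker can read at most the cap plus one byte, which is enough to know whether truncation happened without ever buffering the full output. A minimal sketch; `read_capped` is a hypothetical helper, and a real worker would also kill the process once the cap is exceeded so it does not keep producing output.

```python
import io
from typing import BinaryIO, Tuple


def read_capped(stream: BinaryIO, limit_bytes: int = 100 * 1024) -> Tuple[bytes, bool]:
    """Read up to limit_bytes from stream. Returns (data, truncated)."""
    chunks = []
    remaining = limit_bytes + 1  # one extra byte tells us whether more output existed
    while remaining > 0:
        chunk = stream.read(min(remaining, 64 * 1024))
        if not chunk:  # end of stream
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    data = b"".join(chunks)
    return data[:limit_bytes], len(data) > limit_bytes


small, small_truncated = read_capped(io.BytesIO(b"hello\n"))
big, big_truncated = read_capped(io.BytesIO(b"a" * 500_000))
print(len(small), small_truncated, len(big), big_truncated)
```

Memory use is bounded by the cap regardless of how much the submission prints.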

III. Operational Excellence

How to update the runtime (e.g., Python 3.9 -> 3.10)? Build a new RootFS image. Rolling update the workers to use the new image.
Malicious Users: Rate limit by User ID. If a user triggers Security Violations repeatedly, ban the account.
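Per-user rate limiting can be sketched with a sliding window. This is a single-process illustration with hypothetical names and limits; a real deployment would keep the counters in a shared store such as Redis so that every API node sees the same state.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SubmissionRateLimiter:
    """Allow at most max_requests per user within any window of window_s seconds."""

    def __init__(self, max_requests: int, window_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._max_requests = max_requests
        self._window_s = window_s
        self._clock = clock
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self._clock()
        events = self._events[user_id]
        # Drop timestamps that have fallen out of the window.
        while events and now - events[0] >= self._window_s:
            events.popleft()
        if len(events) >= self._max_requests:
            return False
        events.append(now)
        return True


# A fake clock makes the behaviour deterministic for the demo.
fake_now = [0.0]
limiter = SubmissionRateLimiter(max_requests=2, window_s=10.0, clock=lambda: fake_now[0])
decisions = [limiter.allow("user_1"), limiter.allow("user_1"), limiter.allow("user_1")]
fake_now[0] = 10.0
decisions.append(limiter.allow("user_1"))
print(decisions)
```

Rejected requests are not recorded, so a user who keeps retrying while blocked does not extend their own lockout.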

14. Summary: The Whiteboard Strategy

If asked to design LeetCode, draw this 4-Quadrant Layout:

1. Requirements

Func: Execute Code, Feedback, Limits.
Non-Func: Security (Sandbox), Speed (<2s).
Scale: 100 QPS (Burst).

2. Architecture

[Client] -> [API] -> [Redis Queue]
↓
[Worker Group]
[Firecracker VM (Seccomp/NS)]
↓
[DB]

* Async Worker: Decouples execution.
* Firecracker: MicroVM Isolation.

3. Data & API

POST /v1/submissions {code, language, problem_id}
DB: Submissions(id, user, status, result)
S3: Test Cases (Read-Only)

4. Security Layers

Network: Namespace (`--network none`).
FS: Read-Only RootFS.
Syscalls: Seccomp Whitelist.
Kernel: gVisor / MicroVM.
