Design LeetCode (Remote Code Execution)
[!IMPORTANT] In this lesson, you will master:
- What is a Code Execution Service? Building intuition for running untrusted code safely.
- Requirements & Goals: The functional and non-functional requirements of an online judge.
- Capacity Estimation: Back-of-the-envelope sizing for contest-scale traffic.
1. What is a Code Execution Service?
A Remote Code Execution (RCE) service allows users to submit code in various languages (Python, C++, Java), executes it against a set of test cases, and returns the results (Pass/Fail, Runtime, Memory Usage).
[!WARNING] The Danger Zone: This is one of the most dangerous systems to build. You are literally inviting strangers to run arbitrary code on your servers. Without proper Sandboxing, a user could `rm -rf /`, mine crypto, or scan your internal network.
Real-World Examples
- LeetCode / HackerRank: Competitive programming platforms.
- Judge0: Open-source RCE API.
- Replit / CodeSandbox: Online IDEs (stateful environments).
2. Requirements & Goals
Functional Requirements
- Code Submission: User submits source code and language.
- Execution: System runs code against test cases.
- Feedback: Returns Standard Output (STDOUT), Standard Error (STDERR), and Verdict (Accepted, Wrong Answer, TLE).
- Limits: Enforce strict Time Limits (e.g., 2s) and Memory Limits (e.g., 256MB).
Non-Functional Requirements
- Security (Critical): Zero Trust. Code must run in total isolation.
- Performance: Low overhead for container startup. Users expect results in seconds.
- Concurrency: Handle thousands of simultaneous submissions.
3. Capacity Estimation
- Daily Submissions: 1 Million.
- Peak Traffic: 100 submissions/sec (during contests).
- Execution Time: Average 2 seconds per task.
- Compute Resources: Since tasks are CPU-bound, we need scalable worker nodes.
- 100 tasks/sec × 2 sec/task = 200 concurrent containers running.
- If each core handles 1 container, we need ~200 vCPUs (e.g., 25-50 large instances).
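The arithmetic above is just Little's law (in-flight work = arrival rate × service time). A minimal sanity check of the numbers:

```python
# Back-of-the-envelope check of the capacity numbers (Little's law: L = lambda * W).
peak_qps = 100        # peak submissions per second (contest burst)
avg_exec_sec = 2      # average execution time per task

concurrent_containers = peak_qps * avg_exec_sec   # sandboxes in flight at once

# Assuming 1 container per core, vCPUs needed equals concurrent containers.
vcpus_needed = concurrent_containers

# Instance count depends on machine size; e.g. 8-vCPU vs 4-vCPU workers:
instances_8vcpu = vcpus_needed // 8
instances_4vcpu = vcpus_needed // 4

print(concurrent_containers, instances_8vcpu, instances_4vcpu)
```

This is where the "~200 vCPUs (25-50 large instances)" range comes from: 200/8 = 25 eight-core machines, 200/4 = 50 four-core machines.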
4. System APIs
Submit Code
POST /v1/submissions

```json
{
  "code": "print('hello')",
  "language": "python3",
  "problem_id": "123"
}
```

Response: `{ "submission_id": "sub_abc123" }`
Get Status (Polling)
GET /v1/submissions/sub_abc123
Response:

```json
{
  "status": "PROCESSING",  // or COMPLETED
  "result": { "verdict": "Accepted", "runtime_ms": 45 }
}
```
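The polling pattern above can be sketched as a small client loop. This is a minimal sketch: `fetch` is a stand-in for a real HTTP GET against the endpoints above, injected here so the logic is testable without a server.

```python
import time

def poll_submission(submission_id, fetch, interval_s=0.5, max_attempts=20):
    """Poll GET /v1/submissions/{id} until the verdict is ready.

    `fetch(url) -> dict` stands in for a real HTTP client call that
    returns the decoded JSON body shown above.
    """
    for _ in range(max_attempts):
        body = fetch(f"/v1/submissions/{submission_id}")
        if body["status"] == "COMPLETED":
            return body["result"]
        time.sleep(interval_s)  # back off between polls
    raise TimeoutError(f"submission {submission_id} still processing")

# Example with a fake server that completes on the second poll:
responses = iter([
    {"status": "PROCESSING"},
    {"status": "COMPLETED", "result": {"verdict": "Accepted", "runtime_ms": 45}},
])
result = poll_submission("sub_abc123", lambda url: next(responses), interval_s=0)
print(result["verdict"])  # Accepted
```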
5. Database Design
We need to store submission history and problem data. For more on relational schema design, see Module 04: Database Basics.
1. Submissions Table (Postgres)
| Column | Type | Description |
|---|---|---|
| `id` | UUID | Primary Key |
| `user_id` | UUID | The submitter |
| `problem_id` | UUID | The problem solved |
| `code` | TEXT | The source code (stored in S3 if large) |
| `status` | ENUM | PENDING, PROCESSING, AC, WA, TLE, RE |
| `runtime` | INT | Execution time in ms |
| `memory` | INT | Memory usage in KB |
2. Test Cases (Object Store / S3)
- Test cases are often large files.
- Structure: `s3://problems/{problem_id}/input/1.txt`, `s3://problems/{problem_id}/output/1.txt`
- Worker nodes download these during execution.
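A worker's test-case fetch can be sketched against that key layout. This is a sketch only: `s3_get` is a stand-in for a real object-store client call (e.g. boto3's `get_object`), injected so the key construction is testable locally.

```python
def fetch_test_case(s3_get, problem_id, case_no):
    """Download one (input, expected_output) pair for a problem.

    `s3_get(key) -> bytes` stands in for a real S3 client; keys follow
    the problems/{problem_id}/input|output/{n}.txt layout above.
    """
    base = f"problems/{problem_id}"
    test_input = s3_get(f"{base}/input/{case_no}.txt")
    expected = s3_get(f"{base}/output/{case_no}.txt")
    return test_input, expected

# In-memory fake store for illustration:
store = {
    "problems/123/input/1.txt": b"1 2\n",
    "problems/123/output/1.txt": b"3\n",
}
test_input, expected = fetch_test_case(store.__getitem__, "123", 1)
```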
6. High-Level Design
High-Level Architecture: The Secure Execution Pipeline.
The system follows an Asynchronous Worker Pattern to handle long-running code execution without blocking the API:
- API Gateway / Submissions Svc: Validates the request, saves initial metadata to PostgreSQL, and pushes the job into a Redis Job Queue (See Module 08: Messaging for more).
- Job Queue: Decouples the API from execution. This buffers traffic bursts and allows for easy scaling of worker nodes.
- Worker Nodes: Independent compute nodes that pull jobs from the queue via `BLPOP` (a blocking pop).
- Sandbox Runtime (The Isolation Zone):
- Ephemeral Sandbox: The worker creates a secure environment (e.g., gVisor or Firecracker).
- Test Case Mounting: Mounts test cases from S3 as Read-Only.
- Execution: Runs the untrusted code while intercepting syscalls.
- Result Store: Once execution finishes, the worker updates the submission status in PostgreSQL (e.g., AC, WA, TLE).
[!TIP] Why Async? Code execution takes time (1-10s). Keeping an HTTP connection open is brittle. Polling or WebSockets is better for the client.
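The pipeline above (queue → sandbox → result store) reduces to a simple worker loop. This is a skeleton sketch, not a production worker: `queue_pop`, `run_in_sandbox`, and `save_result` are injected stand-ins for Redis `BLPOP`, the Firecracker/gVisor execution step, and the PostgreSQL update, respectively.

```python
import json

def worker_loop(queue_pop, run_in_sandbox, save_result, max_jobs=None):
    """Skeleton of the asynchronous worker pattern described above.

    queue_pop()         -> raw JSON job string, or None when empty (BLPOP stand-in)
    run_in_sandbox(job) -> verdict string (AC / WA / TLE / RE)
    save_result(id, v)  -> persists the verdict (UPDATE ... stand-in)
    """
    handled = 0
    while max_jobs is None or handled < max_jobs:
        raw = queue_pop()            # blocks in the real system
        if raw is None:
            break
        job = json.loads(raw)
        verdict = run_in_sandbox(job)
        save_result(job["id"], verdict)
        handled += 1
    return handled

# Wiring it up with in-memory fakes:
jobs = ['{"id": "abc-123", "lang": "python3"}']
results = {}
worker_loop(
    queue_pop=lambda: jobs.pop(0) if jobs else None,
    run_in_sandbox=lambda job: "AC",
    save_result=results.__setitem__,
)
```

The key design point: the loop never talks to the API tier directly, so workers can be scaled (or crash) independently of the submission path.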
7. Deep Dive: Isolation Strategy (The Sandbox)
How do we prevent system("rm -rf /") or kernel exploits? This is handled by the Security Boundary shown in our architecture.
Level 1: Standard Containers (Docker)
- Tool: Docker, LXC.
- Pros: Fast boot (milliseconds).
- Cons: Shared Kernel. All containers share the host OS kernel. If a hacker finds a kernel vulnerability (e.g., Dirty COW), they can escape the container and take over the host.
- Verdict: Not secure enough for untrusted public code.
Level 2: Virtual Machines (VMs)
- Tool: AWS EC2, VMWare.
- Pros: Hardware-level virtualization (Hypervisor). Very secure.
- Cons: Slow boot time (minutes). Too heavy for a 2-second script.
Level 3: MicroVMs / User-Space Kernels (The Gold Standard)
This bridges the gap between VM security and Container speed.
- gVisor (Google):
- Intercepts syscalls in User Space.
- The “Guest” application talks to gVisor (Sentry), not the Host Kernel.
- Acts as a “security proxy” for syscalls.
- Firecracker (AWS Lambda):
- Lightweight KVM-based microVMs.
- Boots in < 125ms.
- Used by AWS Lambda and Fargate.
8. Defense in Depth (Specific Mitigations)
Even with MicroVMs, apply these Linux primitives:
A. Preventing “Fork Bombs” (cgroups)
[!NOTE] War Story: The “Infinite Loop” Bill In the early days of a popular competitive programming platform, a user accidentally submitted a C++ program with an infinite loop that allocated memory on each iteration. Because there were no strict memory cgroups configured on the worker node, the process eventually triggered the Linux OOM Killer, which indiscriminately killed the main task supervisor instead of the user’s process. The node stayed up but stopped reporting status, leaving the infinite loop to burn CPU for hours until an engineer manually intervened. This incident led to the immediate adoption of strict
`pids.max` and memory cgroup limits.
A Fork Bomb (while(1) fork()) crashes a server by exhausting the Process ID (PID) table.
- Solution: Control Groups (cgroups).
- Configure `pids.max = 64`. If the code tries to spawn the 65th process, the kernel blocks it.
- Also limit CPU shares and memory (OOM Killer).
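A supervisor would apply these limits by writing to cgroup-v2 control files before moving the sandbox process into the group. The sketch below only builds the file/value pairs (the `/sys/fs/cgroup/sandbox-<id>/` path layout is an assumption for illustration); a real supervisor would write them to disk.

```python
def cgroup_limits(pids_max=64, memory_bytes=256 * 1024 * 1024, cpu_quota_us=100_000):
    """Return the cgroup-v2 writes a supervisor would perform for one sandbox.

    In a real worker these would be written under a per-submission group,
    e.g. /sys/fs/cgroup/sandbox-<id>/ (path layout assumed here).
    """
    return {
        "pids.max": str(pids_max),            # fork bomb dies at process #65
        "memory.max": str(memory_bytes),      # OOM-kills the sandbox, not the host
        "cpu.max": f"{cpu_quota_us} 100000",  # quota/period: 100% of one core
    }

limits = cgroup_limits()
```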
B. Preventing Network Scans (Namespaces)
Users shouldn’t scan your internal AWS VPC.
- Solution: Network Namespaces.
- Run the container with no network access (`--network none`).
- Only map STDIN/STDOUT.
C. Preventing File System Damage (Seccomp)
Users shouldn’t read /etc/passwd.
- Solution:
- Mount Root FS as Read-Only.
- Use Seccomp (Secure Computing Mode) to whitelist only necessary syscalls (`read`, `write`, `exit`). Block `socket` and `execve` (except via strictly controlled paths).
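A Docker-style seccomp profile for this whitelist can be sketched as data. The syscall list below is deliberately minimal and illustrative; a real interpreter needs many more (`mmap`, `brk`, `futex`, ...), so treat this as a sketch of the deny-by-default shape, not a drop-in profile.

```python
# Docker-style seccomp profile expressed as a Python dict.
# Illustrative only: a real Python runtime needs a much longer whitelist.
SECCOMP_PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",  # deny-by-default: unlisted syscalls fail
    "syscalls": [
        {
            "names": ["read", "write", "exit", "exit_group"],
            "action": "SCMP_ACT_ALLOW",
        },
        # note: no "socket", no unrestricted "execve"
    ],
}

def is_allowed(profile, syscall):
    """Check whether a syscall would pass the whitelist."""
    return any(
        syscall in rule["names"] and rule["action"] == "SCMP_ACT_ALLOW"
        for rule in profile["syscalls"]
    )
```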
9. Data Partitioning & Sharding
We generate millions of submissions. A single DB won’t hold up.
Sharding Strategy: Shard by submission_id
- Shard by `user_id`: Good for "Show me all my submissions". Bad for global analytics or if one user spams.
- Shard by `submission_id`: Even distribution, but "Show me my submissions" requires Scatter-Gather.
- Decision: LeetCode is Write-Heavy during contests, so we prioritize Write Throughput. Sharding by `submission_id` (or using a dedicated high-write store like Cassandra/DynamoDB) is preferred. For user history, we can maintain a secondary index.
10. Interactive Decision Visualizer: The Secure Pipeline
This demo visualizes how different layers of defense block different types of attacks. Select an Attack Vector and see which layer catches it.
[!TIP] Try it yourself: Click “Fork Bomb” or “File Deletion” to see how Linux cgroups and Seccomp filters block these attacks in real-time.
11. System Walkthrough: The Life of a Submission
Let’s trace a user submitting Python code that tries to access the network.
Step 1: Submission
- User sends code via `POST /submissions`:

```json
{ "code": "import socket; s = socket.socket(); s.connect(('google.com', 80))", "lang": "python3" }
```

- API Gateway generates `submission_id: "abc-123"` and pushes to Redis: `RPUSH submission_queue "{\"id\":\"abc-123\", \"lang\":\"python3\", ...}"`
Step 2: Worker Processing
- Worker Node (Golang) pulls the job: `BLPOP submission_queue 0`.
- It launches a Firecracker MicroVM with restricted arguments:

```shell
# Conceptual command
firecracker-run \
  --kernel vmlinux \
  --rootfs python3-rootfs.ext4 \
  --network none \      # <-- network isolation
  --cpu-template T2 \
  --memory 128M
```
Step 3: Execution & Interception
- The Python code runs inside the MicroVM.
- It tries to call the `connect()` syscall.
- The kernel (inside the MicroVM) checks the network namespace and sees no network interfaces (only loopback).
- The syscall fails with `ENETUNREACH` (Network is unreachable).
Step 4: Result Collection
- The worker captures `STDERR`: `OSError: [Errno 101] Network is unreachable`.
- It writes the result to Postgres: `UPDATE submissions SET status='RUNTIME_ERROR', stderr='Network unreachable...' WHERE id='abc-123';`
12. Requirements Traceability Matrix
| Requirement | Architectural Solution |
|---|---|
| Code Isolation | gVisor / Firecracker (MicroVMs) prevent kernel sharing. |
| Resource Limits | cgroups enforce CPU, Memory, and PID limits. |
| Network Security | Network Namespaces (--network none) block internet access. |
| File System Security | Read-Only RootFS + Seccomp whitelist prevents rm -rf. |
| Scalability | Redis Job Queue decouples API from Workers. Auto-scaling workers. |
| Concurrency | Firecracker boots in <125ms, allowing high density (thousands per node). |
13. Follow-Up Questions: The Interview Gauntlet
I. Security & Isolation
- Why is Docker not enough? Docker shares the Host Kernel. A kernel exploit (e.g., Dirty Pipe) allows root access to the host.
- Explain Seccomp. It stands for Secure Computing. It's a BPF filter that whitelists syscalls. If a process calls `socket()` and it's not whitelisted, the kernel kills the process.
- How to prevent Infinite Loops? Use `setrlimit(RLIMIT_CPU)` in the runner code plus a hard timeout (SIGKILL) from the worker supervisor after 2 seconds.
- How to prevent memory exhaustion? A `cgroups` memory limit. The OOM Killer will kill the specific container, not the worker node.
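The `setrlimit` + supervisor-timeout combination can be sketched with the standard library. A minimal POSIX-only sketch: the `limit_child` hook runs in the child between `fork()` and `exec()`; the kernel sends SIGXCPU at the soft CPU limit and SIGKILL at the hard limit, while the `timeout=` argument is the supervisor's wall-clock backstop.

```python
import resource
import subprocess
import sys

def limit_child():
    """Runs in the child between fork() and exec() (POSIX only).

    A hard RLIMIT_CPU means an infinite loop is killed by the kernel
    even if the supervisor's wall-clock timeout somehow misfires.
    """
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))  # 2s of CPU time

# Run an (here: harmless) untrusted script under both limits.
proc = subprocess.run(
    [sys.executable, "-c", "print('ok')"],
    preexec_fn=limit_child,
    capture_output=True,
    text=True,
    timeout=5,  # belt-and-braces wall-clock kill from the supervisor
)
```

CPU limits alone don't catch code that sleeps forever (it burns no CPU), which is exactly why the wall-clock timeout is also needed.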
II. Scalability
- What if the Queue backs up? Auto-scale the Worker Group based on `LLEN submission_queue`. If the queue exceeds 1000, add 10 nodes.
- Handling Large Outputs: If a user prints 1GB of text, it fills the disk. Limit `STDOUT` capture to 100KB and truncate the rest.
- Shard Strategy: Shard the DB by `submission_id`. No need for complex cross-shard joins.
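The queue-depth scaling rule above can be sketched as a pure function. The "scale the batch with backlog depth" refinement is an assumption layered on the stated rule (queue > 1000 → add 10 nodes); the exact thresholds are illustrative.

```python
import math

def workers_to_add(queue_len, threshold=1000, batch=10):
    """Scaling rule from above: if LLEN(submission_queue) exceeds the
    threshold, add workers in batches of 10, scaled by how deep the
    backlog is. Thresholds are illustrative, not tuned values.
    """
    if queue_len <= threshold:
        return 0
    return batch * math.ceil((queue_len - threshold) / threshold)

print(workers_to_add(500))    # 0  (queue healthy)
print(workers_to_add(1500))   # 10 (one batch)
print(workers_to_add(3500))   # 30 (deep backlog -> three batches)
```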
III. Operational Excellence
- How to update the runtime (e.g., Python 3.9 → 3.10)? Build a new RootFS image. Rolling update the workers to use the new image.
- Malicious Users: Rate limit by User ID. If a user triggers Security Violations repeatedly, ban the account.
14. Summary: The Whiteboard Strategy
If asked to design LeetCode, draw this 4-Quadrant Layout:
1. Requirements
- Func: Execute Code, Feedback, Limits.
- Non-Func: Security (Sandbox), Speed (<2s).
- Scale: 100 QPS (Burst).
2. Architecture
```
[API] → [Redis Queue]
          ↓
   [Worker Group]
[Firecracker VM (Seccomp/NS)]
          ↓
        [DB]
```
* Async Worker: Decouples execution.
* Firecracker: MicroVM Isolation.
3. Data & API
DB: Submissions(id, user, status, result)
S3: Test Cases (Read-Only)
4. Security Layers
- Network: Namespace (`--net none`).
- FS: Read-Only RootFS.
- Syscalls: Seccomp Whitelist.
- Kernel: gVisor / MicroVM.