Low Level Design: Code Execution Sandbox

What Is a Code Execution Sandbox?

A code execution sandbox is an isolated environment that compiles and runs arbitrary user-submitted code safely. It must enforce strict resource limits, prevent host system access, capture output, and return results quickly. The design appears frequently in online judges, coding interview platforms, and educational tools.

Core Requirements

Functional

  • Accept source code and language identifier
  • Compile (if needed) and execute the code
  • Provide stdin input to the process
  • Capture stdout, stderr, and exit code
  • Enforce time and memory limits
  • Return execution result with status (Accepted, TLE, MLE, Runtime Error, Compile Error)

Non-Functional

  • Strong isolation: no network, no filesystem escape, no process spawning
  • Low latency: results in under 3 seconds for typical submissions
  • High throughput: handle hundreds of concurrent executions
  • Language extensibility: add new runtimes without redesign

High-Level Architecture

The system has three main tiers: an API layer that receives submissions, a queue that decouples intake from execution, and a pool of worker nodes that run isolated containers.

Client
  |
  v
API Service  --->  Submission Queue (Redis / SQS)
                         |
              +----------+----------+
              |          |          |
           Worker     Worker     Worker
           (Docker    (Docker    (Docker
           /gVisor)   /gVisor)   /gVisor)

Isolation Strategy

Each submission runs inside a fresh container (Docker + gVisor or Firecracker microVM). The container is created from a pre-warmed language image, given the source file and stdin, then destroyed after execution. Key security controls:

  • No network: container launched with --network none
  • Read-only filesystem: only a tmpfs scratch directory is writable
  • seccomp profile: only an explicit allowlist of syscalls is permitted; everything else is blocked
  • Non-root user: code runs as an unprivileged uid inside the container
  • PID limit: cgroup restricts fork bombs

Resource Limits

Limits are enforced at two levels: by the kernel via cgroups, and by an application-level watchdog.

Resource         Mechanism                            Typical Limit
CPU time         cgroup cpu.max + SIGKILL watchdog    5 s
Wall clock time  outer timeout goroutine              10 s
Memory           cgroup memory.max                    256 MB
Disk writes      tmpfs size limit                     64 MB
Output size      pipe read cap                        1 MB
Processes        cgroup pids.max                      64

Execution Flow

  1. Client POSTs {language, source, stdin, time_limit_ms, memory_limit_mb} to the API.
  2. API validates the request, generates a submission ID, and pushes a job onto the queue.
  3. A worker dequeues the job, pulls (or reuses) the language image, and starts a container.
  4. Source is written to a tmpfs path; compile step runs if the language requires it.
  5. On compile error, the worker returns the compiler output immediately.
  6. The run step executes the binary/interpreter with stdin piped in.
  7. The watchdog monitors wall-clock time; cgroups enforce CPU and memory.
  8. After termination, stdout, stderr, exit code, and resource usage are collected.
  9. The container is removed; the result is written to the result store (Redis / DB).
  10. The client polls or receives a webhook with the result.

Multi-Language Support

Each language is defined by a small configuration object rather than hard-coded logic:

type LanguageConfig struct {
    Image       string   // Docker image tag
    CompileCmd  []string // nil for interpreted languages
    RunCmd      []string // template: {binary}, {source}
    Extension   string
}

Adding Python 3.12 means adding one config entry and a Docker image — no worker code changes needed.

Result Caching

Identical submissions (same language + source + stdin + limits) can be cached. The cache key is SHA256(language || source || stdin || time_limit || memory_limit). Cache hits skip the queue entirely and return in milliseconds. This is especially effective for popular problem sets where many users submit identical correct solutions.

Worker Autoscaling

Workers run on a Kubernetes Deployment or EC2 Auto Scaling Group. The queue depth metric (jobs waiting / workers active) drives horizontal scaling. Pre-warming containers for common languages reduces cold-start latency: keep a pool of paused containers ready to accept a job.

API Design

POST /execute
{
  "language": "python3",
  "source": "print(input())",
  "stdin": "hello",
  "time_limit_ms": 2000,
  "memory_limit_mb": 128
}

Response 202:
{ "submission_id": "abc123" }

GET /result/abc123
{
  "status": "Accepted",
  "stdout": "hello",
  "stderr": "",
  "exit_code": 0,
  "time_ms": 42,
  "memory_kb": 8192
}

Failure Modes and Mitigations

Failure                       Mitigation
Worker crash mid-execution    Job visibility timeout; requeued automatically
Container escape attempt      gVisor kernel intercepts; seccomp blocks unknown syscalls
Memory bomb                   cgroup OOM killer terminates container; status = MLE
Infinite loop                 Watchdog kills container after wall-clock limit; status = TLE
Malicious file writes         tmpfs size cap + read-only root FS

Interview Tips

  • Clarify whether multi-language is required upfront — it drives the image strategy.
  • Distinguish CPU time (billable) from wall-clock time (user experience).
  • Mention gVisor or Firecracker to show awareness of the security layer beyond plain Docker.
  • Discuss the pre-warming pool trade-off: memory cost vs. latency reduction.
  • Cache keying is a common follow-up — explain what makes two submissions equivalent.
