Low Level Design: Code Execution Sandbox

What Is a Code Execution Sandbox?

A code execution sandbox is an isolated environment that compiles and runs arbitrary user-submitted code safely. It must enforce strict resource limits, prevent host system access, capture output, and return results quickly. The design appears frequently in online judges, coding interview platforms, and educational tools.

Core Requirements

Functional

  • Accept source code and language identifier
  • Compile (if needed) and execute the code
  • Provide stdin input to the process
  • Capture stdout, stderr, and exit code
  • Enforce time and memory limits
  • Return execution result with status (Accepted, TLE, MLE, Runtime Error, Compile Error)

Non-Functional

  • Strong isolation: no network, no filesystem escape, no process spawning
  • Low latency: results in under 3 seconds for typical submissions
  • High throughput: handle hundreds of concurrent executions
  • Language extensibility: add new runtimes without redesign

High-Level Architecture

The system has three main tiers: an API layer that receives submissions, a queue that decouples intake from execution, and a pool of worker nodes that run isolated containers.

Client
  |
  v
API Service  --->  Submission Queue (Redis / SQS)
                         |
              +----------+----------+
              |          |          |
           Worker     Worker     Worker
           (Docker    (Docker    (Docker
           /gVisor)   /gVisor)   /gVisor)

Isolation Strategy

Each submission runs inside a fresh container (Docker + gVisor or Firecracker microVM). The container is created from a pre-warmed language image, given the source file and stdin, then destroyed after execution. Key security controls:

  • No network: container launched with --network none
  • Read-only filesystem: only a tmpfs scratch directory is writable
  • seccomp profile: only an explicit allowlist of syscalls is permitted; everything else is blocked
  • Non-root user: code runs as an unprivileged uid inside the container
  • PID limit: cgroup restricts fork bombs

Resource Limits

Limits are enforced at two levels: by the kernel via cgroups, and by an application-level watchdog.

Resource         Mechanism                            Typical Limit
CPU time         cgroup cpu.max + SIGKILL watchdog    5 s
Wall clock time  outer timeout goroutine              10 s
Memory           cgroup memory.max                    256 MB
Disk writes      tmpfs size limit                     64 MB
Output size      pipe read cap                        1 MB
Processes        cgroup pids.max                      64

Execution Flow

  1. Client POSTs {language, source, stdin, time_limit_ms, memory_limit_mb} to the API.
  2. API validates the request, generates a submission ID, and pushes a job onto the queue.
  3. A worker dequeues the job, pulls (or reuses) the language image, and starts a container.
  4. Source is written to a tmpfs path; compile step runs if the language requires it.
  5. On compile error, the worker returns the compiler output immediately.
  6. The run step executes the binary/interpreter with stdin piped in.
  7. The watchdog monitors wall-clock time; cgroups enforce CPU and memory.
  8. After termination, stdout, stderr, exit code, and resource usage are collected.
  9. The container is removed; the result is written to the result store (Redis / DB).
  10. The client polls or receives a webhook with the result.

Multi-Language Support

Each language is defined by a small configuration object rather than hard-coded logic:

type LanguageConfig struct {
    Image       string   // Docker image tag
    CompileCmd  []string // nil for interpreted languages
    RunCmd      []string // template: {binary}, {source}
    Extension   string
}

Adding Python 3.12 means adding one config entry and a Docker image — no worker code changes needed.

Result Caching

Identical submissions (same language + source + stdin + limits) can be cached. The cache key is SHA256(language || source || stdin || time_limit || memory_limit). Cache hits skip the queue entirely and return in milliseconds. This is especially effective for popular problem sets where many users submit identical correct solutions.

Worker Autoscaling

Workers run on a Kubernetes Deployment or EC2 Auto Scaling Group. The queue depth metric (jobs waiting / workers active) drives horizontal scaling. Pre-warming containers for common languages reduces cold-start latency: keep a pool of paused containers ready to accept a job.

API Design

POST /execute
{
  "language": "python3",
  "source": "print(input())",
  "stdin": "hello",
  "time_limit_ms": 2000,
  "memory_limit_mb": 128
}

Response 202:
{ "submission_id": "abc123" }

GET /result/abc123
{
  "status": "Accepted",
  "stdout": "hello",
  "stderr": "",
  "exit_code": 0,
  "time_ms": 42,
  "memory_kb": 8192
}

Failure Modes and Mitigations

Failure                       Mitigation
Worker crash mid-execution    Job visibility timeout; requeued automatically
Container escape attempt      gVisor kernel intercepts; seccomp blocks unknown syscalls
Memory bomb                   cgroup OOM killer terminates container; status = MLE
Infinite loop                 Watchdog kills container after wall-clock limit; status = TLE
Malicious file writes         tmpfs size cap + read-only root FS

Interview Tips

  • Clarify whether multi-language is required upfront — it drives the image strategy.
  • Distinguish CPU time (billable) from wall-clock time (user experience).
  • Mention gVisor or Firecracker to show awareness of the security layer beyond plain Docker.
  • Discuss the pre-warming pool trade-off: memory cost vs. latency reduction.
  • Cache keying is a common follow-up — explain what makes two submissions equivalent.
