# What Is a Code Execution Sandbox?
A code execution sandbox is an isolated environment that compiles and runs arbitrary user-submitted code safely. It must enforce strict resource limits, prevent host system access, capture output, and return results quickly. The design appears frequently in online judges, coding interview platforms, and educational tools.
## Core Requirements

### Functional
- Accept source code and language identifier
- Compile (if needed) and execute the code
- Provide stdin input to the process
- Capture stdout, stderr, and exit code
- Enforce time and memory limits
- Return execution result with status (Accepted, TLE, MLE, Runtime Error, Compile Error)
### Non-Functional
- Strong isolation: no network, no filesystem escape, no process spawning
- Low latency: results in under 3 seconds for typical submissions
- High throughput: handle hundreds of concurrent executions
- Language extensibility: add new runtimes without redesign
## High-Level Architecture
The system has three main tiers: an API layer that receives submissions, a queue that decouples intake from execution, and a pool of worker nodes that run isolated containers.
```
            Client
              |
              v
        API Service ---> Submission Queue (Redis / SQS)
                                |
                     +----------+----------+
                     |          |          |
                  Worker     Worker     Worker
                 (Docker    (Docker    (Docker
                 /gVisor)   /gVisor)   /gVisor)
```
## Isolation Strategy
Each submission runs inside a fresh container (Docker + gVisor or Firecracker microVM). The container is created from a pre-warmed language image, given the source file and stdin, then destroyed after execution. Key security controls:
- No network: container launched with `--network none`
- Read-only filesystem: only a tmpfs scratch directory is writable
- seccomp profile: whitelist of allowed syscalls only
- Non-root user: code runs as an unprivileged uid inside the container
- PID limit: cgroup `pids.max` restricts fork bombs
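As a sketch, the worker could assemble these controls into a `docker run` argument list. The image tag, tmpfs mount path, uid, and limit values here are illustrative assumptions, not fixed by the design:

```go
package main

import (
	"fmt"
	"strconv"
)

// dockerRunArgs builds the argument list for launching a sandboxed
// container with the security controls listed above. A seccomp profile
// would be added via --security-opt seccomp=<profile.json>.
func dockerRunArgs(image string, memoryMB, pidLimit int) []string {
	return []string{
		"run", "--rm",
		"--network", "none", // no network access
		"--read-only",       // read-only root filesystem
		"--tmpfs", "/sandbox:rw,size=64m", // writable scratch space only
		"--memory", strconv.Itoa(memoryMB) + "m",
		"--pids-limit", strconv.Itoa(pidLimit), // blocks fork bombs
		"--security-opt", "no-new-privileges",
		"--user", "65534:65534", // unprivileged uid (nobody)
		image,
	}
}

func main() {
	fmt.Println(dockerRunArgs("sandbox-python3", 256, 64))
}
```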
## Resource Limits
Limits are enforced at two levels: cgroup (kernel) and application watchdog.
| Resource | Mechanism | Typical Limit |
|---|---|---|
| CPU time | cgroup cpu.max + SIGKILL watchdog | 5 s |
| Wall clock time | outer timeout goroutine | 10 s |
| Memory | cgroup memory.max | 256 MB |
| Disk writes | tmpfs size limit | 64 MB |
| Output size | pipe read cap | 1 MB |
| Processes | cgroup pids.max | 64 |
## Execution Flow
1. The client POSTs `{language, source, stdin, time_limit_ms, memory_limit_mb}` to the API.
2. The API validates the request, generates a submission ID, and pushes a job onto the queue.
3. A worker dequeues the job, pulls (or reuses) the language image, and starts a container.
4. The source is written to a tmpfs path; the compile step runs if the language requires it.
5. On compile error, the worker returns the compiler output immediately.
6. The run step executes the binary/interpreter with stdin piped in.
7. The watchdog monitors wall-clock time; cgroups enforce CPU and memory.
8. After termination, stdout, stderr, exit code, and resource usage are collected.
9. The container is removed; the result is written to the result store (Redis / DB).
10. The client polls or receives a webhook with the result.
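The dequeue-execute-store cycle reduces to a small worker loop. In this sketch an in-memory channel stands in for the real queue and result store, and the `Job`/`Result` shapes are illustrative; the execute function is injected so the sandbox details stay out of the loop:

```go
package main

import "fmt"

// Job and Result mirror the flow above in simplified form.
type Job struct {
	ID, Language, Source, Stdin string
}
type Result struct {
	ID, Status, Stdout string
}

// workerLoop dequeues jobs until the queue closes, runs each one through
// the injected execute function, and writes the result downstream.
func workerLoop(jobs <-chan Job, results chan<- Result, execute func(Job) Result) {
	for job := range jobs {
		results <- execute(job)
	}
}

func main() {
	jobs := make(chan Job, 1)
	results := make(chan Result, 1)
	jobs <- Job{ID: "abc123", Language: "python3", Source: "print(input())", Stdin: "hello"}
	close(jobs)
	// Stub executor: a real worker would launch the sandbox container here.
	workerLoop(jobs, results, func(j Job) Result {
		return Result{ID: j.ID, Status: "Accepted", Stdout: j.Stdin}
	})
	fmt.Println(<-results)
}
```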
## Multi-Language Support
Each language is defined by a small configuration object rather than hard-coded logic:
```go
type LanguageConfig struct {
    Image      string   // Docker image tag
    CompileCmd []string // nil for interpreted languages
    RunCmd     []string // template: {binary}, {source}
    Extension  string
}
```
Adding Python 3.12 means adding one config entry and a Docker image — no worker code changes needed.
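A minimal sketch of how the registry and placeholder substitution might look, repeating the struct above for self-containment; the registry entries, image tags, and the `expand` helper are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

type LanguageConfig struct {
	Image      string   // Docker image tag
	CompileCmd []string // nil for interpreted languages
	RunCmd     []string // template: {binary}, {source}
	Extension  string
}

// languages maps the request's language identifier to its config.
var languages = map[string]LanguageConfig{
	"python3": {Image: "sandbox-python:3.12", RunCmd: []string{"python3", "{source}"}, Extension: ".py"},
	"go":      {Image: "sandbox-go:1.22", CompileCmd: []string{"go", "build", "-o", "{binary}", "{source}"}, RunCmd: []string{"{binary}"}, Extension: ".go"},
}

// expand substitutes the {binary} and {source} placeholders in a
// command template with concrete sandbox paths.
func expand(tmpl []string, binary, source string) []string {
	out := make([]string, len(tmpl))
	for i, s := range tmpl {
		s = strings.ReplaceAll(s, "{binary}", binary)
		s = strings.ReplaceAll(s, "{source}", source)
		out[i] = s
	}
	return out
}

func main() {
	cfg := languages["python3"]
	fmt.Println(expand(cfg.RunCmd, "", "/sandbox/main.py"))
}
```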
## Result Caching
Identical submissions (same language + source + stdin + limits) can be cached. The cache key is `SHA256(language || source || stdin || time_limit || memory_limit)`. Cache hits skip the queue entirely and return in milliseconds. This is especially effective for popular problem sets where many users submit identical correct solutions.
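A sketch of the key computation; length-prefixing each field is one way to make the `||` concatenation unambiguous (so `"ab"+"c"` and `"a"+"bc"` hash differently), which is an implementation choice rather than part of the design above:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// cacheKey hashes everything that defines an equivalent submission.
// Each string field is prefixed with its length so field boundaries
// cannot collide.
func cacheKey(language, source, stdin string, timeLimitMs, memoryLimitMB uint32) string {
	h := sha256.New()
	for _, s := range []string{language, source, stdin} {
		binary.Write(h, binary.BigEndian, uint32(len(s))) // length prefix
		h.Write([]byte(s))
	}
	binary.Write(h, binary.BigEndian, timeLimitMs)
	binary.Write(h, binary.BigEndian, memoryLimitMB)
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	fmt.Println(cacheKey("python3", "print(input())", "hello", 2000, 128))
}
```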
## Worker Autoscaling
Workers run on a Kubernetes Deployment or EC2 Auto Scaling Group. The queue depth metric (jobs waiting / workers active) drives horizontal scaling. Pre-warming containers for common languages reduces cold-start latency: keep a pool of paused containers ready to accept a job.
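The scaling decision itself is a small calculation over the queue-depth metric. In this sketch, `targetPerWorker` (how many queued jobs one worker should absorb) and the replica bounds are assumed tuning parameters:

```go
package main

import "fmt"

// desiredWorkers computes a replica target from queue depth, the same
// signal a Kubernetes HPA external metric or an ASG policy would use,
// clamped to [min, max].
func desiredWorkers(queueDepth, targetPerWorker, min, max int) int {
	n := (queueDepth + targetPerWorker - 1) / targetPerWorker // ceiling division
	if n < min {
		n = min
	}
	if n > max {
		n = max
	}
	return n
}

func main() {
	fmt.Println(desiredWorkers(250, 10, 2, 100)) // 25 workers for 250 queued jobs
}
```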
## API Design
`POST /execute`

```json
{
  "language": "python3",
  "source": "print(input())",
  "stdin": "hello",
  "time_limit_ms": 2000,
  "memory_limit_mb": 128
}
```

Response `202 Accepted`:

```json
{ "submission_id": "abc123" }
```

`GET /result/abc123`

```json
{
  "status": "Accepted",
  "stdout": "hello",
  "stderr": "",
  "exit_code": 0,
  "time_ms": 42,
  "memory_kb": 8192
}
```
## Failure Modes and Mitigations
| Failure | Mitigation |
|---|---|
| Worker crash mid-execution | Job visibility timeout; requeued automatically |
| Container escape attempt | gVisor's user-space kernel intercepts syscalls; seccomp blocks anything outside the whitelist |
| Memory bomb | cgroup OOM killer terminates container; status = MLE |
| Infinite loop | Watchdog kills container after wall-clock limit; status = TLE |
| Malicious file writes | tmpfs size cap + read-only root FS |
## Interview Tips
- Clarify whether multi-language is required upfront — it drives the image strategy.
- Distinguish CPU time (billable) from wall-clock time (user experience).
- Mention gVisor or Firecracker to show awareness of the security layer beyond plain Docker.
- Discuss the pre-warming pool trade-off: memory cost vs. latency reduction.
- Cache keying is a common follow-up — explain what makes two submissions equivalent.