System Design: Online Judge — Code Execution, Sandboxing, Test Cases, and Scalable Evaluation

What Is an Online Judge?

An online judge (LeetCode, HackerRank, Codeforces) accepts code submissions in multiple languages, executes them against test cases, and returns results: Accepted, Wrong Answer, Time Limit Exceeded, Memory Limit Exceeded, Runtime Error, or Compilation Error. The core challenges: safe execution of untrusted code (sandboxing), scalability under submission spikes, and accurate verdict computation.

Code Execution and Sandboxing

Untrusted user code can: fork bomb the system, read sensitive files, make network calls, or consume unbounded memory. Isolation layers: (1) Container isolation: run each submission in a Docker container with no network access, read-only filesystem, and resource limits (CPU: 1 core, memory: 256MB). (2) Seccomp (Secure Computing Mode): whitelist only the system calls needed for computation (read, write, exit) — block fork, exec, socket, open. (3) Namespace isolation: separate PID, network, and mount namespaces. (4) Time limit enforcement: use SIGALRM or cgroups CPU quota to kill the process after the time limit (e.g., 2 seconds). (5) Memory limit: cgroups memory.limit_in_bytes kills the process on OOM. Multiple layers provide defense in depth — even if the container is compromised, seccomp prevents dangerous syscalls.
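
Layers (4) and (5) can be approximated with plain POSIX resource limits even before adding containers and seccomp. A minimal sketch (Linux/POSIX only; the limits and command are illustrative, and a production judge would stack this inside a container):

```python
import resource
import subprocess
import sys

def run_with_limits(cmd, cpu_seconds=2, memory_bytes=256 * 1024 * 1024):
    """Run cmd with kernel-enforced CPU-time and address-space limits.

    RLIMIT_CPU terminates the child once it has consumed the given CPU
    seconds; RLIMIT_AS makes oversized allocations fail instead of
    exhausting host memory. This is one layer only, not a full sandbox.
    """
    def apply_limits():  # runs in the child after fork, before exec
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    # Wall-clock timeout as a backstop for programs that sleep instead of spin.
    return subprocess.run(cmd, preexec_fn=apply_limits,
                          capture_output=True, timeout=cpu_seconds + 5)
```

An infinite loop submitted to this runner is killed by the kernel once it exceeds its CPU allotment; the returned object's negative `returncode` records the terminating signal.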

Execution Pipeline

Submission flow: user submits code -> API server validates the request (language supported, code length within limits) -> enqueue to a message queue (Kafka or SQS) per language -> judge worker picks up the job -> worker spins up a Docker container -> compiles (if compiled language) -> runs against each test case -> collects results -> sends verdict back via a result queue -> API server stores result and updates the user's submission history.
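
The worker side of this pipeline can be sketched in-process, with `queue.Queue` standing in for Kafka/SQS and `run_one` as a hypothetical callable that compiles and executes the submission against a single test case:

```python
import queue

def judge_worker(jobs, results, run_one):
    """Single judge worker loop: pull a job, judge each test case in order,
    stop at the first failure, push the verdict to the result queue.

    run_one(language, code, test_case) -> per-test verdict string
    (hypothetical stand-in for the container compile+run step).
    """
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut the worker down
            break
        verdict = "Accepted"
        for case in job["test_cases"]:
            v = run_one(job["language"], job["code"], case)
            if v != "Accepted":  # first failing test case decides the verdict
                verdict = v
                break
        results.put({"submission_id": job["submission_id"], "verdict": verdict})
```

Because the worker holds no state between jobs, any worker can serve any submission, which is what makes horizontal scaling straightforward.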

Test cases: each problem has N test cases (typically 50-200). Run them in order and stop at the first failure; return Accepted only if all pass. For efficiency: run a few lightweight test cases first (fast feedback) and heavy test cases last. Test cases are stored in object storage (S3); workers download them at job start. Cache popular problems' test cases in worker local storage.

Language Support

Interpreted languages (Python, JavaScript): compile step is skipped. Execute directly. Compiled languages (C++, Java, Go): compile first, report Compilation Error if it fails, then run the binary. Per-language containers: each language has a dedicated base Docker image with the compiler/runtime pre-installed (warm start). Container pooling: pre-warm N containers per language to avoid cold start overhead on each submission. Return containers to the pool after execution (reset the filesystem). Language-specific time limit adjustments: Python is 3x slower than C++ for the same algorithm — set per-language time limits (C++ 1s, Python 3s).
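
This per-language configuration is naturally a small table. A sketch (commands and the 3x Python factor are illustrative, matching the text, not measured values):

```python
# compile_cmd of None marks an interpreted language: skip the compile stage.
LANGUAGES = {
    "cpp":    {"compile_cmd": ["g++", "-O2", "-o", "main", "main.cpp"],
               "run_cmd": ["./main"], "time_factor": 1},
    "java":   {"compile_cmd": ["javac", "Main.java"],
               "run_cmd": ["java", "Main"], "time_factor": 2},
    "python": {"compile_cmd": None,
               "run_cmd": ["python3", "main.py"], "time_factor": 3},
}

def time_limit_seconds(language, base_seconds):
    """Scale the problem's base (C++) limit by the language's slowdown factor."""
    return base_seconds * LANGUAGES[language]["time_factor"]

def needs_compile(language):
    return LANGUAGES[language]["compile_cmd"] is not None
```

With a base limit of 1 second, `time_limit_seconds("python", 1)` yields the 3-second Python limit from the text, and `needs_compile` tells the worker whether to run the compile stage at all.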

Scalability

Contest mode: thousands of simultaneous submissions (start of a contest). Scale judge workers horizontally: auto-scale the worker pool based on queue depth. Separate queues per language — prevents a Python submission spike from delaying C++ submissions. Priority queue: submissions for paid users or during contests get higher priority. Judge worker isolation: each worker can only run one submission at a time (CPU-bound) — over-scheduling degrades performance for all. Typical sizing: 1 worker core = 10 submissions/minute. For 1000 submissions/minute: 100 worker cores minimum. Use spot instances for judge workers (70% cheaper, acceptable eviction rate with job re-enqueue).
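
The sizing arithmetic above generalizes to a one-line capacity formula (a back-of-envelope sketch; the headroom factor is an assumption to absorb bursts and spot evictions):

```python
import math

def required_worker_cores(submissions_per_minute, avg_judge_seconds, headroom=1.5):
    """Each single-core worker judges one submission at a time, so its
    throughput is 60 / avg_judge_seconds submissions per minute."""
    per_core_per_minute = 60 / avg_judge_seconds
    return math.ceil(submissions_per_minute * headroom / per_core_per_minute)
```

At the article's figure of 10 submissions/minute per core (a 6-second average judge time), 1000 submissions/minute with no headroom gives the quoted 100-core minimum; with 1.5x headroom, 150 cores.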

Result Delivery

Async results: submissions are processed asynchronously. The frontend polls or uses WebSocket to receive the verdict when ready. Client-side: show “Judging…” with progress updates. Server push: when the verdict is ready, push via WebSocket to the client’s browser. Store all submissions in a database: (submission_id, user_id, problem_id, language, code, verdict, runtime_ms, memory_mb, submitted_at). User can view their submission history and replay any submission.
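
The stored submission row and the polling path can be sketched as follows (field names follow the schema in the text; the in-memory `store` dict stands in for the database):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Submission:
    """Stored submission row. verdict starts as Pending and is updated by
    the judge worker, so the polling endpoint always has something to serve."""
    submission_id: int
    user_id: int
    problem_id: int
    language: str
    code: str
    verdict: str = "Pending"            # Pending -> Judging -> final verdict
    runtime_ms: Optional[int] = None
    memory_mb: Optional[float] = None
    submitted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

def poll_verdict(store, submission_id):
    """What a polling client sees until the WebSocket push arrives."""
    s = store[submission_id]
    return {"verdict": s.verdict, "runtime_ms": s.runtime_ms}
```

A WebSocket push sends the same payload; polling is simply the fallback transport.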

Interview Tips

  • Sandboxing is the core design challenge. Mention at least two isolation layers (Docker + seccomp) — single-layer isolation is insufficient for truly untrusted code.
  • The job queue + worker pool pattern is standard. Emphasize that workers are stateless (any worker can handle any submission) — this enables horizontal scaling.
  • Test case management: test cases are the intellectual property of the platform. Store encrypted in S3; workers decrypt locally. Do not transmit test case outputs to clients (prevents reverse-engineering).
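
For problems with several equally valid answers, a custom checker program replaces exact output comparison. A toy sketch, assuming (hypothetically) that the first token of the output is the claimed path length:

```python
def check_shortest_path(input_data, expected_output, user_output):
    """Toy checker: accept any answer whose claimed path length matches the
    reference length. A real checker would also parse input_data and verify
    the submitted path itself is valid; this sketch compares lengths only.
    """
    try:
        return int(user_output.split()[0]) == int(expected_output.split()[0])
    except (ValueError, IndexError):
        return False
```

The judge invokes the checker in place of a string compare; its OK/WRONG result feeds directly into the Accepted / Wrong Answer verdict.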

Frequently Asked Questions

How do you sandbox untrusted code execution safely?

Multiple isolation layers are required because any single layer can be bypassed. (1) Container: run code inside a Docker container with no network access (--network none), a read-only root filesystem, and a tmpfs /tmp; limit CPU and memory via cgroups. (2) Seccomp: whitelist only the system calls needed for computation (read, write, mmap, brk, futex, exit); block fork (prevents fork bombs), socket (prevents network access), execve (prevents spawning new processes), and open with write flags (prevents file modification). (3) Namespaces: a separate PID namespace hides host processes; a separate mount namespace hides the host filesystem. (4) User namespace: run as an unprivileged user (uid 1000) so that a process which appears as root inside the namespace holds no host privileges. Each layer independently blocks a different class of attack.

How do you implement time and memory limits for code execution?

Time limit: set RLIMIT_CPU so the kernel terminates the process after the allotted CPU seconds, or have the runner monitor the cgroup's CPU accounting and kill on overrun. Alternatively, set a SIGALRM timer in the runner (fires after, say, 2 real seconds); a wall-clock limit is simpler but unfair if the machine is under load, so a CPU-time limit is the fairer basis. Memory limit: cgroups memory.limit_in_bytes, set to the problem's limit (e.g., 256MB); on overrun the OOM killer delivers SIGKILL, and the runner maps that exit to Memory Limit Exceeded. Stack size: setrlimit RLIMIT_STACK (typically an 8MB default, which can be reduced). Combine the CPU-time limit with a wall-clock limit (whichever fires first) so an I/O-heavy program that waits without consuming CPU still terminates within the real-time expectation.

How do you handle test case management and prevent cheating?

Test cases are the platform's intellectual property. Storage: encrypted in S3 (AES-256); workers decrypt using a key held in AWS Secrets Manager, to which only judge workers (not API servers) have access. Transmission: test cases are never sent to clients, so even intercepted traffic reveals only ciphertext. Output checking: instead of comparing raw output strings, use a custom checker for problems with multiple valid outputs (e.g., shortest-path problems where several paths share the minimum length); the checker is a program that takes (input, expected_output, user_output) and returns OK or WRONG. Anti-leak: never include test case data in error messages. On Wrong Answer, show the first failing test case's input (not the expected output) to aid debugging without revealing answers; showing the expected output can be a premium-tier benefit.

How does an online judge handle concurrent submissions during a contest?

A contest with 10,000 participants submitting simultaneously creates a sharp load spike. Architecture: submissions land in a durable queue (Kafka or SQS) that absorbs the burst. Judge workers auto-scale on queue depth (e.g., the SQS ApproximateNumberOfMessages metric, or Kafka consumer lag). Each worker processes one submission at a time; judging is CPU-bound, so per-worker concurrency adds no value. Target: clear the backlog within 30 seconds for a good user experience. Separate queues by priority (contest submissions high, regular practice lower) and by language (Python, C++, Java), so a flood of slow Python submissions cannot delay fast C++ submissions. Worker sizing: a C++ submission takes roughly 2 seconds and a Python submission roughly 5; plan worker count accordingly, and pre-warm workers by scaling up 30 minutes before the contest.

How do you compute verdicts correctly for edge cases?

(1) Time Limit Exceeded vs. Accepted: measure CPU time, not wall clock, so a loaded machine does not hand TLE to a correct solution; read CPU time via getrusage() or /proc/<pid>/stat. (2) Memory Limit Exceeded: read peak RSS (Resident Set Size) from /proc/<pid>/status or cgroups memory.max_usage_in_bytes after execution. (3) Runtime Error: catch all non-zero exit codes and signal terminations (SIGSEGV, SIGABRT) and map them to Runtime Error. (4) Wrong Answer: compare output exactly or via a checker, stripping trailing whitespace and newlines first, since many correct solutions emit a trailing newline. (5) Output Limit Exceeded: cap output at 4MB and kill the process beyond the cap, so a program cannot write a terabyte to stdout. (6) Compilation Error: compile under its own timeout (30 seconds) and return the compiler's error message to the user.
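
The verdict precedence and the whitespace-tolerant output comparison can be sketched together (a simplification: the mapping attributes SIGKILL to the cgroup OOM killer, which a real runner would confirm from cgroup events):

```python
import signal

def outputs_match(expected, actual):
    """Compare line by line, ignoring trailing whitespace on each line and
    trailing blank lines, so a correct solution is not failed over a final
    newline."""
    norm = lambda s: [line.rstrip() for line in s.rstrip().splitlines()]
    return norm(expected) == norm(actual)

def compute_verdict(cpu_ms, peak_mb, exit_code, term_signal,
                    expected, actual, time_limit_ms, memory_limit_mb):
    """Resource verdicts outrank Runtime Error, which outranks the output
    check. term_signal is the signal that killed the process, or None."""
    if cpu_ms > time_limit_ms:
        return "Time Limit Exceeded"
    if peak_mb > memory_limit_mb or term_signal == signal.SIGKILL:
        return "Memory Limit Exceeded"   # simplification: SIGKILL == OOM kill
    if term_signal is not None or exit_code != 0:
        return "Runtime Error"
    return "Accepted" if outputs_match(expected, actual) else "Wrong Answer"
```

The precedence order matters: a program can both segfault and exceed the time limit, and the resource verdict should win so users see the actionable failure first.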
