Modal Interview Process: Complete 2026 Guide
Overview
Modal is a serverless compute platform purpose-built for AI and data-intensive Python workloads: GPU inference, distributed training, batch data processing, LLM fine-tuning, and increasingly agent-based pipelines. Founded in 2021 by Erik Bernhardsson (ex-Spotify, creator of Luigi and Annoy) and Akshat Bubna, Modal remains private, with a Series B in 2024 and continued funding through 2025. At roughly 100 employees in 2026, it is distinctively small relative to the sophistication of its product. It is headquartered in New York City, with a San Francisco presence and remote hiring across the US and Europe. The product is a Python-first serverless platform that abstracts containers, GPUs, and scheduling into decorators and function calls. The underlying infrastructure is C++ and Rust heavy (custom container orchestration, GPU scheduling, fast cold starts on serverless GPU instances), with Python bindings and an SDK on top. Interviews reflect the reality of running a modern infrastructure company: expect real systems depth, GPU and distributed-compute expertise, and a bar closer to Stripe or HashiCorp than to the typical startup.
Interview Structure
Recruiter screen (30 min): background, why Modal, team preference. The engineering surface is small but specialized: container / scheduler infrastructure, GPU orchestration, storage systems, Python SDK, developer platform, billing / observability, and emerging AI-product integrations. Team fit matters — scheduler and GPU teams have different profiles.
Technical phone screen (60 min): one coding problem, medium-hard. Python for SDK and product work; Rust for infrastructure; Go for some services. Problems tilt systems-y — implement a scheduling primitive, build a small container-runtime interface, process a stream with backpressure.
Take-home (for most senior / staff roles): 4–8 hours on a realistic infrastructure engineering problem. Modal's take-home is known for being substantive and heavy on design documentation.
Onsite / virtual onsite (4–5 rounds):
- Coding (1–2 rounds): typically one algorithms round and one applied systems round. The applied round often involves container and scheduler primitives: queue management, resource allocation under constraints, GPU-pool scheduling.
- System design (1 round): compute-platform prompts. “Design a serverless GPU platform with sub-second cold starts on H100s.” “Design a distributed filesystem for user code and datasets across compute nodes.” “Design the billing / metering system for fractional GPU-second usage.”
- Infrastructure / systems deep-dive (1 round): container runtimes (containerd, runc, gVisor), Linux primitives (cgroups, namespaces), Kubernetes internals or deliberate alternatives, network virtualization, storage systems (snapshot, COW, dedup).
- Behavioral / hiring manager: past projects, early-stage comfort, technical depth across specialties, customer empathy.
Technical Focus Areas
Coding: Python fluency (modern idioms, type hints, async / await, testing); Rust for infrastructure roles (ownership, async with tokio, systems-level patterns). Clean code with production-grade error handling.
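A phone-screen-grade warm-up in that idiom, as a minimal sketch: type hints, async / await for concurrency, and explicit retry-based error handling. The "flaky call" here is simulated, not a real network operation:

```python
import asyncio


async def fetch_with_retry(
    task_id: int,
    attempts: int = 3,
    base_delay: float = 0.01,
) -> str:
    """Retry a flaky operation with exponential backoff."""
    for attempt in range(attempts):
        try:
            # Stand-in for a real network call; even-numbered tasks
            # fail transiently on the first try.
            if attempt == 0 and task_id % 2 == 0:
                raise ConnectionError("transient failure")
            return f"task-{task_id}: ok"
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2**attempt)
    raise RuntimeError("unreachable")


async def main() -> list[str]:
    # Fan out concurrently; gather preserves input order.
    return await asyncio.gather(*(fetch_with_retry(i) for i in range(4)))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The point interviewers look for is not the retry loop itself but the surrounding discipline: bounded retries, backoff between attempts, and re-raising once the budget is exhausted instead of swallowing the error.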
Container runtime / scheduling: Linux namespaces and cgroups, containerd / runc / crun, OCI image format, layer caching, container snapshotting for fast cold start, cgroup v2 for resource isolation, gVisor and Kata for sandboxing untrusted code.
GPU orchestration: GPU scheduling (MIG partitioning on H100 / Blackwell), CUDA context management, multi-tenancy on shared GPUs, GPU memory management, NVLink / NVSwitch topology awareness, GPU-driver version compatibility.
Distributed systems: task scheduling at scale, consensus for coordination (Modal runs a custom scheduler), backpressure in multi-tenant environments, fair-share resource allocation, preemption.
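To make fair-share allocation concrete, here is a minimal max-min fairness sketch: small demands are satisfied fully, and the leftover capacity is split evenly among tenants who want more than their fair share. Tenant names and capacity units are illustrative:

```python
def max_min_fair(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Max-min fair allocation of a shared capacity across tenants."""
    alloc: dict[str, float] = {}
    remaining = capacity
    # Serve tenants in increasing order of demand; each gets
    # min(demand, fair share of what remains).
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        share = remaining / len(pending)
        tenant, demand = pending.pop(0)
        grant = min(demand, share)
        alloc[tenant] = grant
        remaining -= grant
    return alloc


# With 10 GPUs and demands {a: 2, b: 6, c: 8}, tenant a is satisfied
# fully and b and c split the remainder evenly: {a: 2, b: 4, c: 4}.
```

Preemption then becomes the dynamic version of the same question: when a new tenant arrives, which running grants now exceed the recomputed fair share, and in what order do you reclaim them?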
Storage: fast container-image distribution (Modal loads container images faster than typical Kubernetes), networked filesystems for user code, volume management, snapshot / checkpoint for long-running workloads.
Python SDK design: the user-facing Python API is a key part of the product — decorator-based function invocation, stubs-with-types, data-mounting, secrets management. Clean API design matters.
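To make the decorator-based model concrete, here is a toy, Modal-flavored registration decorator. This is an illustration of the API style only, not Modal's actual SDK: a real platform would serialize the call and run it in a remote container, while this sketch just registers the function and runs it locally:

```python
from __future__ import annotations

import functools
from typing import Any, Callable


class App:
    """Toy sketch of a Modal-style app object (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.registry: dict[str, Callable[..., Any]] = {}

    def function(self, gpu: str | None = None) -> Callable:
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            # Register so a runtime could later ship this function
            # (and its `gpu` requirement) to a remote container.
            self.registry[fn.__name__] = fn

            @functools.wraps(fn)
            def remote(*args: Any, **kwargs: Any) -> Any:
                # Stand-in for remote execution: run locally.
                return fn(*args, **kwargs)

            remote.remote = remote  # type: ignore[attr-defined]
            return remote

        return decorator


app = App("demo")


@app.function(gpu="H100")
def square(x: int) -> int:
    return x * x
```

The design questions the SDK team cares about live in exactly this layer: does the decorator preserve the wrapped function's signature and types, what happens to closures and imports when the function is shipped remotely, and how do secrets and mounted data flow through the call.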
Observability / billing: metering at sub-second granularity, usage-based billing with predictable cost envelopes, log / trace aggregation, incident detection.
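A minimal sketch of sub-second metering with fractional GPU-second billing. The millisecond granularity, the interval format, and the price are all hypothetical; the transferable point is using exact decimal arithmetic and rounding only the final charge:

```python
from decimal import ROUND_HALF_UP, Decimal


def bill_gpu_seconds(
    intervals_ms: list[tuple[int, int]],
    price_per_gpu_second: Decimal,
) -> Decimal:
    """Meter usage intervals (start_ms, end_ms) and bill fractional
    GPU-seconds with audit-friendly exact arithmetic."""
    total_ms = sum(end - start for start, end in intervals_ms)
    gpu_seconds = Decimal(total_ms) / Decimal(1000)
    # Round the final charge, not intermediate values, to avoid drift
    # across millions of small invocations.
    return (gpu_seconds * price_per_gpu_second).quantize(
        Decimal("0.000001"), rounding=ROUND_HALF_UP
    )
```

Floating-point accumulation is the classic failure mode here: summing tiny per-invocation charges in binary floats produces discrepancies that audits will find.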
Coding Interview Details
One to two coding rounds, 60 minutes each. Difficulty is medium-hard, comparable to Stripe or HashiCorp: below Google L5 on pure algorithms but higher on systems-applied problems. Interviewers push on correctness under concurrency and realistic failure modes.
Typical problem shapes:
- Scheduling primitive: implement a task queue with priority, fair-share, and preemption
- Resource allocator: given GPU / CPU / memory constraints, allocate to incoming tasks with acceptance / rejection logic
- Streaming processor with backpressure and bounded-memory constraints
- Container / image primitive (implement a layer-cache with eviction, manage a small OCI-like image format)
- Classic algorithm problems (graphs, intervals, DP) with systems-applied twists (dependency DAG execution, resource-constraint satisfaction)
System Design Interview
One round, 60 minutes. Prompts focus on compute-platform engineering:
- “Design a serverless GPU platform with sub-second cold-start on H100 / Blackwell instances.”
- “Design a distributed filesystem for user code and datasets with consistency across nodes.”
- “Design the billing / metering system for fractional GPU-second usage with audit-grade correctness.”
- “Design a fair-share scheduler for multi-tenant GPU workloads preventing noisy-neighbor issues.”
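One building block that often comes up in the noisy-neighbor discussion is per-tenant admission control. A minimal token-bucket sketch, with the clock injected so behavior is deterministic and testable (rate and burst parameters are illustrative):

```python
class TokenBucket:
    """Per-tenant token bucket: admit work while tokens remain,
    refilling at a steady rate up to a burst cap."""

    def __init__(self, rate_per_s: float, burst: float, now: float = 0.0) -> None:
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = burst
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill proportionally to elapsed time, capped at burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In a design discussion this is only the admission edge; the interesting part is what happens behind it: queueing rejected work, weighting buckets by tenant tier, and deciding whether GPU time or request count is the resource being metered.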
What works: specific numbers (H100 FP8 throughput is X, container pull overhead is Y, cold-start target is Z), explicit treatment of failure modes (GPU OOMs, container startup failures, billing-discrepancy recovery), and real systems mechanisms. What doesn't: generic Kubernetes-everywhere designs that never engage with what Modal actually differentiates on.
Infrastructure Deep-Dive
Distinctive to Modal. Sample topics:
- Walk through what happens when you invoke a Modal function — from SDK call through scheduler to container creation to execution.
- Explain why Modal doesn’t use Kubernetes directly, and what Modal does instead.
- Discuss how you’d handle GPU OOM mid-training in a fair-share multi-tenant environment.
- Reason about container-image pull optimization for large AI images (multi-GB with CUDA / PyTorch).
- Describe your approach to checkpoint / resume for long-running training jobs.
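For the checkpoint / resume question, one standard building block is a crash-safe checkpoint write: write to a temp file in the same directory, fsync, then atomically rename over the previous checkpoint so a reader never observes a partial file. A minimal sketch (the JSON state format is illustrative; real training checkpoints are large binary tensors):

```python
import json
import os
import tempfile


def write_checkpoint(path: str, state: dict) -> None:
    """Atomically replace the checkpoint at `path` with `state`."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # durable before the rename
        os.replace(tmp, path)     # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise


def resume(path: str, default: dict) -> dict:
    """Resume from the last complete checkpoint, or start fresh."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

The deep-dive follow-ups build on this: checkpoint frequency versus recomputation cost, shipping checkpoints off the node before preemption, and coordinating checkpoints across workers in distributed training.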
Strong candidates demonstrate real systems-internals knowledge. Weak candidates rely on vague “we’d use Kubernetes” answers. Modal has explicitly chosen a non-Kubernetes architecture; engaging with that choice thoughtfully matters.
Behavioral Interview
Key themes:
- Early-stage comfort: “How do you handle ambiguity and wearing multiple hats?”
- Technical depth + breadth: “Describe a hard technical problem that required you to learn a new area.”
- Customer empathy: “Have you engaged directly with customers or users? How did it shape your engineering?”
- Ownership: “Tell me about a production incident you owned.”
Preparation Strategy
Weeks 4–8 out: Python and/or Rust practice at LeetCode medium to medium-hard. If targeting infrastructure roles, spend time with Rust: the Rust book plus a small infrastructure project (implement a simple container runtime, a task scheduler) pays off.
Weeks 2–4 out: read about container internals (Julia Evans's zines are accessible entry points), GPU infrastructure (Nvidia's CUDA and MIG documentation), and distributed scheduling. Modal has published technical posts; read them. Erik Bernhardsson's blog (including old posts about Spotify recommendation engineering) is worth reading for engineering-philosophy context.
Weeks 1–2 out: use Modal for a real project. Deploy a small GPU inference endpoint or train a toy model, and form opinions about the developer experience.
Day before: review container / cgroup fundamentals; refresh async Rust if applicable; prepare 3 behavioral stories with infrastructure angles.
Difficulty: 8/10
Hard. The infrastructure specialization means candidates without relevant background struggle. The coding bar is solidly medium-hard, the system design bar matches Stripe / HashiCorp, and the infrastructure deep-dive is genuinely rigorous. Modal hires experienced engineers with real systems backgrounds; a headcount of roughly 100 means every hire carries a high bar.
Compensation (2025 data, US engineering roles)
- Software Engineer: $185k–$230k base, $180k–$320k equity (4 years), modest bonus. Total: ~$300k–$470k / year.
- Senior Software Engineer: $235k–$295k base, $350k–$650k equity. Total: ~$420k–$670k / year.
- Staff Engineer: $300k–$370k base, $700k–$1.3M equity. Total: ~$610k–$1M / year.
Private-company equity is valued at recent Series B / secondary marks, on a 4-year vest with a 1-year cliff. Expected value is meaningful given AI-compute tailwinds and strong product-market fit among AI developers, but treat equity as upper-mid upside with real illiquidity risk. Cash comp is competitive with top private-company bands.
Culture & Work Environment
Craft-focused, technically serious culture, with Erik Bernhardsson's influence visible: he has a substantial public writing presence on engineering topics, and the culture reflects those thoughtful-engineer values. NYC-headquartered with an SF presence, and remote-friendly across the US and Europe. The pace is fast but not frantic; AI-compute tailwinds have brought growth, but the company has resisted hyper-scaling headcount. On-call for infrastructure teams is real: when customer GPU jobs fail, Modal engineers own the diagnosis.
Things That Surprise People
- Modal deliberately doesn’t use Kubernetes. The custom scheduler is a strategic technical choice, not a default.
- The engineering depth per headcount is high: roughly 100 people ship what many 500-person companies do in infrastructure.
- The Python SDK is treated as product-level craft — API design matters.
- Cold-start optimization is a real competitive moat. Modal invests heavily here.
Red Flags to Watch
- “I’d use Kubernetes” answers to system design. Engage with why Modal doesn’t.
- Weak container / cgroup fundamentals for infrastructure roles.
- Not having used Modal before interviewing. Authentic product knowledge helps.
- Dismissing the GPU-infrastructure specialty as “just cloud scheduling.”
Tips for Success
- Use Modal for a real project. Deploy a small LLM inference endpoint or a batch data job. Understand the developer experience.
- Know container / Linux internals. cgroups, namespaces, and OCI are the working vocabulary of infrastructure interviews.
- Engage with why-not-Kubernetes. Have a thoughtful position, not a rote one.
- Know GPU infrastructure concepts. MIG, CUDA contexts, multi-tenancy trade-offs.
- Read Erik Bernhardsson’s blog. The engineering-philosophy context helps in craft rounds.
Resources That Help
- Modal engineering blog (posts on scheduler, cold-start, GPU infrastructure)
- Erik Bernhardsson’s blog (especially older posts on Spotify recommender engineering)
- Julia Evans’s container internals zines
- Nvidia CUDA documentation and MIG technical overview
- Designing Data-Intensive Applications (Kleppmann) for general systems context
- The Rust Programming Language if targeting Rust infrastructure roles
- Modal itself — deploy a small project before interviewing
Frequently Asked Questions
Do I need GPU / ML infrastructure background to get hired?
For GPU-scheduler and ML-infrastructure-specific roles, yes — real depth is required. For general backend, SDK, billing, and developer-experience roles, strong infrastructure generalists transition well. What’s required is willingness to engage with the systems-depth culture and openness to learning GPU / container specifics on the job.
Why doesn’t Modal use Kubernetes?
Kubernetes was designed for long-running services with predictable resource patterns, not sub-second-cold-start serverless with per-request scaling. Modal's workload is different enough that a custom scheduler with tight integration to container-image caching, GPU allocation, and user-function lifecycle delivers better cold-start latency, resource utilization, and developer experience. This isn't not-invented-here syndrome; it's a considered architectural choice explained in their engineering content. Candidates should engage with this choice thoughtfully during interviews.
How does Modal compare to other serverless GPU platforms?
Competitors include RunPod, Replicate, Baseten, Together AI (for inference), and managed offerings from AWS / GCP / Azure. Modal’s differentiation is developer experience (Python-first SDK with decorators), cold-start speed, and focus on Python-centric workflows. Replicate is more model-zoo-focused; Baseten targets production model serving specifically; RunPod is closer to raw GPU rentals. Compensation at Modal is competitive with the broader AI-infrastructure cohort.
Is Modal financially stable given competition?
The company has raised substantial private capital and has strong product-market fit among AI developers. Revenue traction appears healthy based on hiring pace and public signals. Treat equity as real but not bankable — the AI-infrastructure space is competitive, and long-term outcomes depend on execution. Cash comp alone is competitive enough that many engineers don’t optimize primarily for equity upside.
What’s the NYC headquarters reality?
Real but not exclusive. NYC has the largest engineering concentration and most senior leadership. SF presence is growing. Remote hiring across US and Europe happens meaningfully. Some roles prefer NYC in-office collaboration (especially senior infrastructure leadership); others are fully remote. Check the JD carefully.
See also: Nvidia Interview Guide • Cloudflare Interview Guide • Anthropic Interview Guide