What Modal Labs actually asks backend and infra engineers

Updated July 1, 2026 · techinterview.org

Modal rewrote its core infrastructure from Python to Rust over roughly eighteen months, built its own container runtime, filesystem, image builder, and scheduler, and runs the whole thing on gVisor with GPUs attached. The interview is shaped by that bet. If you walk in expecting a tidy LeetCode loop, you’ll be off balance by the second round.

The company is small. Erik Bernhardsson, who led machine learning at Spotify and was CTO at Better, founded it in 2021 with Akshat Bubna, who came out of Scale. That shapes how they hire. There’s no army of interviewers reading from a shared rubric. The people across the table built the thing you’d be working on, and they can usually tell inside twenty minutes whether you actually reason about systems or have just memorized that you’re supposed to.

What the loop looks like

For a backend or infrastructure role, expect a recruiter call, one or two technical screens, then a virtual onsite of four to five rounds. The exact shape varies with the team and your background, but the pieces are consistent: a coding deep-dive, a system design round, a conversation about Modal’s own architecture, and a behavioral round focused on ownership.

Round	What they’re checking	Where people lose it
Coding deep-dive	Whether you write real code under a system constraint, rather than only passing tests	Treating it as a puzzle instead of reasoning about the machine underneath
System design	Designing a piece of infra against concrete SLOs and failure modes	Hand-waving past cold starts, scheduling, and resource contention
Architecture conversation	Whether you’ve read their engineering writing and can push back on it	Showing up cold with no opinion on the tradeoffs they’ve published
Behavioral and ownership	High agency, owning failures, shipping under ambiguity	Generic STAR answers with no real stakes attached

The coding round is about the machine, not the trick

The coding bar sits at a solid medium-to-hard, but the framing is different from a typical big-company screen. You’re less likely to get “invert this binary tree” and more likely to get a problem that maps onto something Modal actually deals with. A common pattern: implement a caching layer for a filesystem that serves chunks of a container image over the network, where the same chunk gets requested by hundreds of containers within a few seconds.

That sounds like a plain LRU cache, and the first version usually is. The interesting part is what they ask next. What happens when two containers miss on the same chunk at the same moment and both go fetch it? Now you’re talking about a single-flight pattern so you don’t stampede the backend. What if the chunk is 200 MB and you’re memory constrained? Now you’re talking eviction policy and whether you pin hot chunks. The code stays small. The reasoning gets deep.

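```python
import asyncio
from collections import OrderedDict

class ChunkCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.store = OrderedDict()  # key -> bytes, least recently used first
        self.inflight = {}          # key -> Task, so we fetch once

    async def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)          # refresh LRU position on a hit
            return self.store[key]
        if key in self.inflight:
            return await self.inflight[key]      # coalesce the stampede
        task = self.inflight[key] = asyncio.ensure_future(self._fetch(key))
        try:
            data = await task
        finally:
            self.inflight.pop(key, None)         # clear even if the fetch failed
        self._admit(key, data)
        return data

    async def _fetch(self, key):
        # Placeholder for the network read of one image chunk (hypothetical).
        raise NotImplementedError

    def _admit(self, key, data):
        if len(data) > self.capacity or key in self.store:
            return                               # too large to cache, or already cached
        while self.used + len(data) > self.capacity:
            _, old = self.store.popitem(last=False)  # evict least recently used
            self.used -= len(old)
        self.store[key] = data
        self.used += len(data)
```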
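What the `inflight` map buys you is easy to demonstrate: fire 100 concurrent requests for one chunk at a fake backend that counts how many fetches reach it. This is an illustrative sketch; `Backend`, `naive_get`, `single_flight_get`, and the `"layer-7"` key are hypothetical names, and `asyncio.sleep` stands in for network latency.

```python
import asyncio

class Backend:
    """Fake chunk store that counts how many fetches actually reach it."""
    def __init__(self):
        self.calls = 0

    async def fetch(self, key):
        self.calls += 1
        await asyncio.sleep(0.01)               # stand-in for network latency
        return f"chunk:{key}".encode()

async def naive_get(cache, backend, key):
    # Check, then await, then write: every caller that misses before the
    # first fetch finishes issues its own fetch.
    if key not in cache:
        cache[key] = await backend.fetch(key)
    return cache[key]

async def single_flight_get(cache, inflight, backend, key):
    if key in cache:
        return cache[key]
    if key in inflight:
        return await inflight[key]              # join the fetch already in progress
    task = inflight[key] = asyncio.ensure_future(backend.fetch(key))
    try:
        data = await task
    finally:
        inflight.pop(key, None)
    cache[key] = data
    return data

async def stampede(getter, n=100):
    backend = Backend()
    results = await asyncio.gather(*(getter(backend) for _ in range(n)))
    assert len(set(results)) == 1               # every caller saw the same bytes
    return backend.calls

async def main():
    naive_cache = {}
    sf_cache, inflight = {}, {}
    naive_calls = await stampede(lambda b: naive_get(naive_cache, b, "layer-7"))
    sf_calls = await stampede(lambda b: single_flight_get(sf_cache, inflight, b, "layer-7"))
    return naive_calls, sf_calls

naive_calls, sf_calls = asyncio.run(main())
print(naive_calls, sf_calls)                    # 100 1
```

The naive version hits the backend once per caller because the membership check and the cache write are separated by an `await`; the single-flight version makes every later caller wait on the one task already in flight.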

If you write the naive version and then narrate the failure modes before they ask, you’re already passing. They want to see you think about contention, not just correctness on the happy path. A candidate who pauses and says “wait, two callers will race here” earns more than one who produces a clever one-liner and can’t explain when it breaks.
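For the 200 MB, memory-constrained follow-up, one possible answer is a size-aware LRU in which pinned chunks are exempt from eviction. This is a minimal sketch assuming chunks are plain `bytes` and that pinning simply means "never evict this key"; `PinnedLRU`, `pin`, and `put` are hypothetical names, not Modal’s implementation. If a new chunk can’t fit without evicting pinned data, it declines to cache rather than blowing the memory budget.

```python
from collections import OrderedDict

class PinnedLRU:
    """Size-aware LRU cache; pinned keys are never evicted (hypothetical sketch)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()   # key -> bytes, least recently used first
        self.pinned = set()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # refresh recency on every hit
        return self.items[key]

    def pin(self, key):
        self.pinned.add(key)

    def unpin(self, key):
        self.pinned.discard(key)

    def put(self, key, data):
        if key in self.items:
            self.used -= len(self.items.pop(key))
        # Collect unpinned victims, oldest first, until the new chunk would fit.
        victims, freed = [], 0
        for k, v in self.items.items():
            if self.used - freed + len(data) <= self.capacity:
                break
            if k not in self.pinned:
                victims.append(k)
                freed += len(v)
        if self.used - freed + len(data) > self.capacity:
            return False             # only pinned data stands in the way: don't cache
        for k in victims:
            del self.items[k]
        self.used -= freed
        self.items[key] = data
        self.used += len(data)
        return True

cache = PinnedLRU(capacity_bytes=10)
cache.put("base-layer", b"xxxx")
cache.put("app-layer", b"yyyy")
cache.pin("base-layer")              # a hot chunk every container touches
cache.put("config", b"zzzz")         # must evict: app-layer goes, base-layer stays
print(sorted(cache.items))           # ['base-layer', 'config']
```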

The design round is where the company’s whole thesis shows up

The prompts are barely disguised versions of problems the team has solved or is still solving. A few that come up, phrased close to how they’re actually asked:

“Design the part of the system that keeps a buffer of warm GPU containers ready, so a function call doesn’t sit for thirty seconds waiting on a cold start.”
“A customer’s job loads a 40 GB model file on every cold start. How do you make the second start fast without copying 40 GB around?”
“Ten thousand short tasks land in a single second. Walk me through how the scheduler places them.”

The cold-start question is the one to internalize, because it’s the company’s core engineering problem. A weak answer reaches for “just keep instances always on,” which torches the economics of a serverless GPU platform where idle A100s are expensive. A strong answer separates the layers. There’s the container sandbox itself, where gVisor gives you isolation but adds boot overhead, and Modal famously made gVisor work with GPUs when almost nobody else had. There’s the filesystem, where you don’t ship a 40 GB image; you back the root filesystem with the network and lazily pull only the chunks the process actually touches, which is exactly why the cache problem from the coding round matters. And there’s GPU memory snapshotting, where you checkpoint the loaded model state so a restart restores it instead of recomputing from zero.
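The lazy-pull idea can be modeled as a read-only file that maps each read onto fixed-size chunks and fetches only the chunks it hasn’t seen yet. This is a toy under stated assumptions: `LazyFile`, `fetch_chunk`, and the 4 MiB default chunk size are hypothetical, and a real system would sit behind an actual filesystem interface and share a chunk cache across containers.

```python
CHUNK_SIZE = 4 * 1024 * 1024   # assumed chunk size for illustration

class LazyFile:
    """Read-only file whose bytes live remotely and are pulled one chunk at a time."""

    def __init__(self, size, fetch_chunk, chunk_size=CHUNK_SIZE):
        self.size = size
        self.chunk_size = chunk_size
        self.fetch_chunk = fetch_chunk   # hypothetical: chunk index -> bytes over the network
        self.local = {}                  # chunk index -> bytes already pulled

    def _chunk(self, index):
        if index not in self.local:
            self.local[index] = self.fetch_chunk(index)
        return self.local[index]

    def read(self, offset, length):
        end = min(offset + length, self.size)
        out = bytearray()
        pos = offset
        while pos < end:
            index, start = divmod(pos, self.chunk_size)
            take = min(self.chunk_size - start, end - pos)
            out += self._chunk(index)[start:start + take]
            pos += take
        return bytes(out)

# Demo: a "40 GiB" file with tiny 16-byte chunks; only touched chunks are fetched.
fetched = []

def fake_fetch(index):
    fetched.append(index)
    return bytes([index % 256]) * 16

weights = LazyFile(size=40 * 2**30, fetch_chunk=fake_fetch, chunk_size=16)
header = weights.read(0, 10)     # touches chunk 0 only
span = weights.read(10, 10)      # straddles chunks 0 and 1; chunk 0 is already local
print(fetched)                   # [0, 1]
```

Nothing close to 40 GiB moves: the process pays only for the byte ranges it reads, which is why hit rate and stampede behavior in the chunk cache matter so much.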

For the warm-buffer question, talk about it as a control loop. You predict demand per function, hold a pool of pre-initialized sandboxes sized against a latency SLO, and let the pool shrink when traffic dies so you’re not paying for idle hardware. Then the interviewer pushes: what’s your signal for scaling the pool, how do you avoid oscillation, what happens during a thundering herd when a popular function gets hammered at once. If you’ve ever run an autoscaler in anger, lean on that. If you haven’t, reason it out loud and they’ll follow you.
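One way to make that control loop concrete, under illustrative assumptions rather than Modal’s actual policy: demand is measured as arrivals per second and smoothed with an exponentially weighted moving average (EWMA); the pool holds roughly the number of requests expected to arrive while one replacement sandbox boots; and scale-down waits for several consecutive low readings so the pool doesn’t oscillate. `WarmPoolController` and its parameters are hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class WarmPoolController:
    boot_seconds: float             # assumed time to bring a cold sandbox to ready
    alpha: float = 0.3              # EWMA weight on the newest demand sample
    scale_down_after: int = 5       # consecutive low readings required before shrinking
    rate_ewma: float = 0.0          # smoothed arrivals per second
    pool_size: int = 0              # warm sandboxes to hold ready
    _low_ticks: int = field(default=0, init=False)

    def tick(self, arrivals_per_sec: float) -> int:
        # Smooth the raw signal so a single spike or lull doesn't whipsaw the pool.
        self.rate_ewma = self.alpha * arrivals_per_sec + (1 - self.alpha) * self.rate_ewma
        # Hold enough warm sandboxes to absorb what arrives while a replacement boots.
        target = math.ceil(self.rate_ewma * self.boot_seconds)
        if target >= self.pool_size:
            self.pool_size = target     # scale up at once: the latency SLO is at stake
            self._low_ticks = 0
        else:
            self._low_ticks += 1        # scale down only after sustained low demand
            if self._low_ticks >= self.scale_down_after:
                self.pool_size = target
                self._low_ticks = 0
        return self.pool_size

ctl = WarmPoolController(boot_seconds=2.0, alpha=1.0)   # alpha=1.0 disables smoothing for the demo
sizes = [ctl.tick(rate) for rate in [10, 0, 0, 0, 0, 0]]
print(sizes)                                             # [20, 20, 20, 20, 20, 0]
```

The asymmetry is deliberate: a late scale-up puts a cold start on someone’s request path, while a late scale-down only costs a few ticks of idle hardware.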

The architecture conversation, and why reading their blog pays off

Modal publishes unusually detailed engineering writing. There are posts on how they cut container launch times, how the filesystem streams images, how snapshotting works. One round is essentially a conversation about that material, and the interviewer expects you to have read some of it and formed a view. Not a recap, a view. They’ve made specific tradeoffs, and they want to know if you can engage with them as a peer rather than nod along.

Concretely, that means going in able to say something like “I read how you snapshot GPU memory to skip model reload, and I’d want to know how you handle a CUDA driver version mismatch between snapshot and restore host,” or “gVisor’s syscall interception adds overhead on I/O-heavy workloads, so I’m curious where you draw the line versus a lighter sandbox.” You don’t need to be right. You need to have actually thought about it. Candidates who show up cold here read as people who want a job at a hot AI-infra company without caring about the infra, and that’s the fastest way out of the process.

The behavioral round screens for agency, not polish

Two questions show up in some form almost every time. One is “tell me about a time you owned an infrastructure failure that hit a customer.” The other is “tell me about a bold systems decision you made that could have failed badly.” Both are probing the same trait. Modal hires builders who take responsibility for production and make calls without a committee.

The answers that land are specific and a little uncomfortable. Name the outage, the blast radius, what you did at 2 a.m., and what you changed so it couldn’t recur. Talk about the migration you pushed through over objections, including the part where it nearly went sideways. The answers that flop are the rehearsed ones where nothing was ever really at risk and everything worked out cleanly. Real ownership has scar tissue, and they’re looking for it.

How to actually prepare

Spend real time with Rust if you’re aiming at infrastructure. You don’t need to be an expert, but you should be comfortable reading it and able to talk about ownership, lifetimes, and why a team would accept the borrow checker’s friction for memory safety in a runtime that can’t afford to crash. The single best prep is to build something small and concrete: a toy container runtime using Linux namespaces and cgroups, or a basic task scheduler that places jobs across workers. Even a weekend version teaches you the vocabulary the interviewers speak.
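For the scheduler project, a reasonable first version is greedy least-loaded placement with a min-heap, so each placement in something like the ten-thousand-tasks-per-second prompt costs O(log W) for W workers. This is a hypothetical sketch: `Worker`, `place`, and the fixed per-worker `capacity` are assumptions, and a real scheduler would also weigh GPU type, memory, and which workers already hold the needed image chunks.

```python
import heapq
from collections import Counter
from dataclasses import dataclass

@dataclass(order=True)
class Worker:
    load: int   # tasks currently placed here; compared first, so the heap top is least loaded
    name: str   # tie-breaker, which keeps placement deterministic

def place(tasks, worker_names, capacity):
    """Greedy least-loaded placement. Returns (task -> worker name, tasks left queued)."""
    heap = [Worker(0, name) for name in worker_names]
    heapq.heapify(heap)
    placement = {}
    for i, task in enumerate(tasks):
        worker = heapq.heappop(heap)            # least-loaded worker in O(log W)
        if worker.load >= capacity:
            heapq.heappush(heap, worker)
            return placement, list(tasks[i:])   # even the emptiest worker is full: backpressure
        placement[task] = worker.name
        worker.load += 1
        heapq.heappush(heap, worker)
    return placement, []

tasks = [f"t{i}" for i in range(10)]
placement, queued = place(tasks, ["w0", "w1", "w2"], capacity=3)
print(Counter(placement.values()))
print(queued)                                   # ['t9']
```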

Read the gVisor and Firecracker design docs so sandboxing isn’t a black box. Read Modal’s own posts so the architecture round isn’t a surprise. And don’t pour weeks into grinding three hundred algorithm problems; the coding round rewards systems reasoning far more than pattern-matching, and the time is better spent understanding cold starts than memorizing dynamic programming templates.

The people who do well here aren’t the ones with the cleanest two-pointer solution. They’re the ones who, when a GPU container takes four seconds to boot, get genuinely annoyed and want to know exactly where the time went. Bring that, and most of the loop turns into a conversation rather than an exam.