Replicate is an API-first platform for running open-source ML models, built around the Cog packaging format, and is backed by Andreessen Horowitz. The interview process emphasizes ML inference infrastructure, GPU autoscaling, and the developer experience of model deployment.
Process
Recruiter screen → 60-minute coding phone screen (Python or Go) → virtual onsite: two coding rounds, one system design, one craft deep-dive, one behavioral. Full cycle: 3–4 weeks. Some senior roles include a take-home (containerizing a model with Cog).
What they actually ask
- Design a model registry with versioning, weights, and Cog packaging
- Design a GPU inference scheduler with autoscaling and warm pools
- Design a usage/billing pipeline for variable-duration ML calls
- Coding: systems- and pipeline-flavored problems, usually in Python
- Behavioral: ownership, customer empathy, working in a small distributed team
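For the scheduler question above, the core idea worth being able to whiteboard is a warm pool: keep a few model workers booted and idle so requests skip cold-start latency, and re-warm proactively as workers go busy. A minimal single-process sketch is below; the `WarmPoolScheduler` class, its worker IDs, and the cold-start figure are all hypothetical illustration, not Replicate's actual API.

```python
from collections import deque


class WarmPoolScheduler:
    """Toy warm-pool autoscaler: keep `min_warm` idle workers booted so
    requests avoid cold starts, capped at `max_workers` total."""

    def __init__(self, min_warm=2, max_workers=8, cold_start_s=30.0):
        self.min_warm = min_warm
        self.max_workers = max_workers
        self.cold_start_s = cold_start_s  # hypothetical boot cost
        self.idle = deque()               # warm, ready worker IDs
        self.busy = set()
        self._next_id = 0
        self._scale_to_min()

    def _boot_worker(self):
        wid = self._next_id
        self._next_id += 1
        return wid

    def _scale_to_min(self):
        # Refill the warm pool up to min_warm, within the worker cap.
        while len(self.idle) < self.min_warm and self.total() < self.max_workers:
            self.idle.append(self._boot_worker())

    def total(self):
        return len(self.idle) + len(self.busy)

    def acquire(self):
        """Return (worker_id, startup_latency_s). A warm hit costs ~0."""
        if self.idle:
            wid, latency = self.idle.popleft(), 0.0
        elif self.total() < self.max_workers:
            wid, latency = self._boot_worker(), self.cold_start_s  # cold start
        else:
            raise RuntimeError("at capacity; request should queue")
        self.busy.add(wid)
        self._scale_to_min()  # proactively re-warm behind the request
        return wid, latency

    def release(self, wid):
        self.busy.remove(wid)
        self.idle.append(wid)
```

In an interview you would extend this with queueing at capacity and a scale-down policy for idle workers; the sketch just makes the warm/cold trade-off concrete.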
Levels and comp (2026)
- SE: $180K–$235K total (cash + early-stage equity)
- Senior SE: $245K–$320K total
- Staff: $320K–$430K total
Prep priorities
- Be fluent in Python (Cog, SDK) and Go (control plane)
- Understand container internals and GPU scheduling
- Brush up on common ML inference patterns (batching, KV cache, quantization)
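Of the inference patterns listed, dynamic batching is the easiest to demonstrate in code: buffer incoming requests briefly so the GPU runs one forward pass per batch rather than one per request. A minimal sketch under stated assumptions follows; `DynamicBatcher` and `run_batch` are hypothetical names, with `run_batch` standing in for a real model call.

```python
import queue
import threading


class DynamicBatcher:
    """Toy dynamic batching: group requests up to `max_batch_size`,
    waiting at most `max_wait_s` between arrivals, then run them as
    a single batched call."""

    def __init__(self, run_batch, max_batch_size=8, max_wait_s=0.01):
        self.run_batch = run_batch          # stand-in for a model forward pass
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.requests = queue.Queue()

    def submit(self, item):
        # Each request carries an Event plus a slot for its result.
        done, box = threading.Event(), {}
        self.requests.put((item, done, box))
        done.wait()
        return box["result"]

    def _collect(self):
        batch = [self.requests.get()]       # block for the first request
        while len(batch) < self.max_batch_size:
            try:
                batch.append(self.requests.get(timeout=self.max_wait_s))
            except queue.Empty:
                break                       # waited long enough; ship the batch
        return batch

    def serve_one_batch(self):
        batch = self._collect()
        inputs = [item for item, _, _ in batch]
        outputs = self.run_batch(inputs)    # one "GPU" call for the whole batch
        for (_, done, box), out in zip(batch, outputs):
            box["result"] = out
            done.set()
```

The design point to articulate is the latency/throughput trade: a larger `max_wait_s` fills batches better but adds tail latency to every request.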
Frequently Asked Questions
Is Replicate remote-friendly?
Fully distributed since founding. Most engineers are remote across the Americas and Europe.
How does Replicate compare to Modal or Hugging Face Inference Endpoints?
Replicate is the opinionated, ML-first option built around the Cog format. Modal is general-purpose compute; Hugging Face Inference Endpoints are tied to the Hugging Face model hub. Replicate pays competitively for a distributed-first, early-stage company.
What is the engineering culture?
Small, ship-focused, opinionated about developer experience. Strong async/written-first culture.