CoreWeave is the largest specialty GPU cloud, operating 250,000+ GPUs across multiple data centers, with Microsoft and OpenAI among its largest customers; it went public in 2025. Its interviews emphasize deep infrastructure engineering, InfiniBand topology, and the operational reality of running tens of thousands of H100/B200 GPUs reliably.
Process
Recruiter screen → 60-minute coding screen (Go or Python) → virtual onsite: two coding rounds, one system design (often distributed-systems flavored), one craft deep-dive, one behavioral. Senior+ infra candidates often get an additional Linux/networking deep-dive. Typical cycle: 3–5 weeks.
What they actually ask
- Design a multi-tenant GPU cluster scheduler with topology awareness
- Design a Kubernetes-on-GPU control plane with NCCL and InfiniBand support
- Design a high-throughput object store for ML datasets
- Coding: medium-difficulty DSA, often framed around networking, scheduling, or distributed systems
- Behavioral: ownership, on-call discipline, customer empathy for AI labs
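For the topology-aware scheduler question, the core idea interviewers look for is placing a job's GPUs under as few InfiniBand leaf switches as possible so collective traffic avoids the spine. A minimal greedy sketch, assuming whole-node allocation and a flat node→leaf mapping (`Node` and `placeJob` are hypothetical names, not CoreWeave's actual scheduler):

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a simplified scheduler view of a GPU node: its free GPU
// count plus the InfiniBand leaf switch it hangs off in the fat tree.
type Node struct {
	Name     string
	Leaf     string
	FreeGPUs int
}

// free sums the free GPUs under one leaf.
func free(ns []Node) int {
	t := 0
	for _, n := range ns {
		t += n.FreeGPUs
	}
	return t
}

// placeJob greedily allocates whole nodes for `want` GPUs, preferring
// the leaf with the most free capacity so the job fits under a single
// switch when possible. Returns nil if the cluster cannot fit the job.
func placeJob(nodes []Node, want int) []string {
	byLeaf := map[string][]Node{}
	for _, n := range nodes {
		byLeaf[n.Leaf] = append(byLeaf[n.Leaf], n)
	}
	leaves := make([]string, 0, len(byLeaf))
	for l := range byLeaf {
		leaves = append(leaves, l)
	}
	// Fullest leaf first: best chance the job packs under one switch.
	sort.Slice(leaves, func(i, j int) bool {
		return free(byLeaf[leaves[i]]) > free(byLeaf[leaves[j]])
	})
	var picked []string
	for _, l := range leaves {
		for _, n := range byLeaf[l] {
			if want <= 0 {
				return picked
			}
			picked = append(picked, n.Name)
			want -= n.FreeGPUs
		}
	}
	if want > 0 {
		return nil
	}
	return picked
}

func main() {
	nodes := []Node{
		{"n1", "leaf-a", 8}, {"n2", "leaf-a", 8},
		{"n3", "leaf-b", 8},
	}
	// A 16-GPU job fits entirely under leaf-a, so leaf-b is untouched.
	fmt.Println(placeJob(nodes, 16))
}
```

In the interview, the follow-ups are usually about what this sketch omits: fragmentation over time, gang scheduling, preemption, and multi-tenant fairness.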
Levels and comp (2026)
- SE: $185K–$245K total
- Senior SE: $260K–$360K total
- Staff: $370K–$510K total
- Principal: $510K–$700K total
Prep priorities
- Be fluent in Go (control plane), Python (orchestration), and Linux/networking fundamentals
- Understand InfiniBand, NVLink, NCCL, and GPU topology deeply
- Brush up on Kubernetes device plugins, Slurm, and HPC scheduling
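The topology point above is concrete: a ring all-reduce moves roughly 2(n−1)/n of the buffer size over every link, so the slowest hop in the ring bounds step time, which is why schedulers fight to keep jobs under one leaf switch. A toy bandwidth-only estimate (`ringAllReduceSeconds` is a hypothetical helper; it ignores latency terms and NCCL's tree/CollNet algorithms):

```go
package main

import "fmt"

// ringAllReduceSeconds estimates the time for a ring all-reduce of
// `bytes` across n GPUs with per-link bandwidth `gbps` (gigabits/s),
// using the standard 2*(n-1)/n traffic factor. Bandwidth-only model.
func ringAllReduceSeconds(bytes float64, n int, gbps float64) float64 {
	traffic := 2 * float64(n-1) / float64(n) * bytes // bytes each rank sends
	return traffic / (gbps * 1e9 / 8)                // convert Gb/s to bytes/s
}

func main() {
	// The same 1 GiB gradient bucket across 8 GPUs: a single 100 Gb/s
	// hop in the ring makes the whole collective ~4x slower than at
	// 400 Gb/s, regardless of how fast the other links are.
	fmt.Printf("%.3fs\n", ringAllReduceSeconds(1<<30, 8, 400))
	fmt.Printf("%.3fs\n", ringAllReduceSeconds(1<<30, 8, 100))
}
```

Being able to do this kind of back-of-envelope math out loud is exactly what the craft deep-dive probes.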
Frequently Asked Questions
Is CoreWeave remote-friendly?
The HQ hub is in Roseland, NJ, with presence in its data-center regions. Some engineering roles are fully remote within the US; many require proximity to a data center or the NYC area.
How does CoreWeave compare to Lambda Labs or RunPod?
CoreWeave is the largest by GPU count and skews enterprise (Microsoft, OpenAI). Lambda is mid-sized and stronger on-prem. RunPod is developer- and spot-friendly. Comp is mid-to-high tier for infra; senior+ bands are competitive with FAANG.
What is the engineering culture?
Hardware- and network-aware, customer-driven (it sells to AI labs and hyperscalers). Expect a strong on-call commitment: customer workloads are large and expensive, so outages are highly visible.