Together AI is a leading inference platform for open-source LLMs, serving Llama, DeepSeek, Qwen, and many other models at high throughput. Founded by Vipul Ved Prakash and team; raised its Series B in 2024. The interview emphasizes inference engineering (vLLM, TGI, custom kernels), GPU economics, and developer experience for the LLM stack.
Process
Recruiter screen → 60-minute coding phone screen (Python/CUDA fluency helpful) → virtual onsite: two coding rounds, one ML system design, one craft deep-dive, one behavioral. ML systems candidates may get a research deep-dive. Full cycle: 3–4 weeks.
What they actually ask
- Design a high-throughput inference server (continuous batching, paged KV cache)
- Design a multi-tenant GPU pool with token-level billing
- Design fine-tuning infrastructure for LoRA at scale
- Coding: systems-flavored, often with ML throughput or memory framing
- Behavioral: ownership, customer empathy, fast-moving startup
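The inference-server question above usually turns on continuous batching: rather than waiting for a fixed batch to drain, the scheduler admits and retires requests at every decode step, so GPU batch slots never sit idle. A minimal scheduling sketch (class and field names are hypothetical, not any real framework's API):

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    prompt_len: int
    max_new_tokens: int
    generated: int = 0

class ContinuousBatcher:
    """Toy continuous-batching loop: admit waiting requests into free
    slots and retire finished ones on every single decode step."""

    def __init__(self, max_batch: int):
        self.max_batch = max_batch
        self.waiting: deque[Request] = deque()
        self.active: list[Request] = []
        self.completed: list[Request] = []

    def submit(self, req: Request) -> None:
        self.waiting.append(req)

    def step(self) -> int:
        # Admit waiting requests into any free batch slots.
        while self.waiting and len(self.active) < self.max_batch:
            self.active.append(self.waiting.popleft())
        # One decode step: every active request emits one token.
        for req in self.active:
            req.generated += 1
        # Retire finished requests immediately, freeing their slots
        # for the next step instead of at batch boundaries.
        still_running = []
        for req in self.active:
            if req.generated >= req.max_new_tokens:
                self.completed.append(req)
            else:
                still_running.append(req)
        self.active = still_running
        return len(self.active)
```

A real server layers paged KV-cache allocation and preemption on top of this loop, but the interview answer starts with exactly this admit/decode/retire cycle.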
Levels and comp (2026)
- SE: $190K–$260K total (cash + early-stage equity)
- Senior SE: $260K–$355K total
- Staff / ML Systems: $360K–$520K total
- Principal: $500K–$750K+ total
Prep priorities
- Be fluent in Python (control plane) and CUDA/Triton (kernels)
- Understand transformer inference (KV cache, paged attention, speculative decoding)
- Brush up on GPU memory hierarchy and kernel optimization
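Interviewers expect the KV-cache arithmetic behind these topics cold. A worked back-of-envelope sketch, assuming a Llama-2-7B-like config (32 layers, 32 KV heads, head dim 128, fp16) with plain multi-head attention; with grouped-query attention the KV-head count, and hence the footprint, shrinks:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    """KV-cache bytes for one sequence: a K and a V tensor per layer,
    each of shape (n_kv_heads, seq_len, head_dim)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Per token: 2 * 32 * 32 * 128 * 2 bytes = 512 KiB
per_token = kv_cache_bytes(32, 32, 128, seq_len=1)

# A full 4096-token context: 2 GiB of KV cache for ONE sequence,
# which is why paged allocation (vLLM-style block tables) matters.
full_ctx = kv_cache_bytes(32, 32, 128, seq_len=4096)
print(per_token)          # 524288 bytes = 512 KiB
print(full_ctx / 2**30)   # 2.0 GiB
```

Being able to derive this in your head, then explain how paged attention avoids reserving the full 2 GiB up front, is a common screen for ML systems roles here.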
Frequently Asked Questions
Is Together AI remote-friendly?
Yes. The company is headquartered in San Francisco, with remote engineers across the US and Europe; most engineering roles are fully remote.
How does Together compare to Modal or Replicate?
Together is open-source-LLM-first inference. Modal is general serverless compute; Replicate is an opinionated model hub. Together pays competitively for ML systems roles and offers strong equity upside.
What is the engineering culture?
Technically dense, research-engineering hybrid, fast iteration. Strong open-source contributor culture.