Sourcegraph Interview Guide 2026: Code Intelligence, Cody AI Coding, and Fully-Remote Culture

Sourcegraph Interview Process: Complete 2026 Guide

Overview

Sourcegraph is a code intelligence and AI coding platform serving enterprise engineering teams with code search, navigation, batch refactoring, and the Cody AI coding assistant. Founded in 2013 by Quinn Slack and Beyang Liu, the company remains private, with valuations buoyed by AI-coding tailwinds (Series D in 2021; secondary tenders since). It employs roughly 350 people in 2026, after post-2022 rightsizing narrowed focus back to core code intelligence plus Cody AI. The product's lineage is deep code understanding across millions of repositories; Cody layers LLM capabilities on this context-rich foundation, which Sourcegraph argues (plausibly) is durable differentiation versus generic AI coding tools. Sourcegraph is fully remote with no offices and was one of the earliest fully-distributed tech companies. Engineering is Go-heavy on the backend and TypeScript on the client, with a growing Python presence for ML / AI work. Interviews reflect both the deep-tech developer-tools heritage and the AI-product reality.

Interview Structure

Recruiter screen (30 min): background, why Sourcegraph, team preference. The engineering surface spans code intelligence (search, navigation, symbol resolution), platform infrastructure, Cody AI, enterprise product, and developer-experience (DX) tooling. Remote-work effectiveness is probed early — the company hires remote-only and needs candidates who genuinely thrive async.

Technical phone screen (60 min): one coding problem, medium-hard. Go for backend; TypeScript for client; Python for ML. Problems tilt applied — implement a tree-walk for a parsed AST, handle a streaming search, implement context-ranking logic.

Take-home (many senior / staff roles): 4–6 hours on a realistic engineering problem. Historically involves code-intelligence primitives — build a small language-aware search, implement a symbol-resolution system for a tiny language, or extend a Cody-style context retrieval pipeline.

Onsite / virtual onsite (4–5 rounds):

  • Coding (1–2 rounds): one algorithms round, one applied round. The applied round often involves code-intelligence primitives — AST traversal, trie-based identifier search, language-server-like symbol lookup, or context-window management for LLM prompts.
  • System design (1 round): code-intelligence-flavored prompts. “Design the code-search index for 50K repositories with billions of symbols.” “Design Cody’s context-retrieval pipeline producing 8K tokens of relevant code within 500ms.” “Design the batch-change system that applies code modifications across thousands of repositories.”
  • Domain / code-intelligence deep-dive (1 round): distinctive to Sourcegraph. Discussion of language servers, AST-based search, LSIF / SCIP (code-intelligence formats), symbol resolution across languages, and the specific challenges of code-at-scale.
  • Product / craft round (1 round): engagement with developer-experience and AI-coding-tool quality. Candidates are expected to have opinions about Cody, Copilot, Cursor, Windsurf, and the broader AI-coding landscape.
  • Behavioral / hiring manager: past projects, remote-work practices, async collaboration effectiveness.

Technical Focus Areas

Coding: Go idioms (channels, context propagation, graceful shutdown), TypeScript fluency for client work, Python for ML / AI. Clean code with clear error handling. Candidates are evaluated on code-review friendliness as much as correctness.

Code intelligence: parsers and abstract syntax trees, language-server protocol (LSP), code-indexing formats (LSIF, the newer SCIP), symbol tables and name resolution, cross-repository navigation, incremental indexing as code changes.
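To make the "find references" idea concrete, here is a minimal sketch of a reference-finding walk over a toy AST. The `Node` shape and `findReferences` function are illustrative assumptions, not Sourcegraph's actual data model; real code intelligence also has to resolve scopes and shadowing, not just match names.

```go
package main

import "fmt"

// Node is a toy AST node: an identifier reference or a scope
// containing children. This shape is illustrative only.
type Node struct {
	Kind     string // "ident" or "scope"
	Name     string // set when Kind == "ident"
	Children []*Node
}

// findReferences walks the tree and collects every identifier
// node whose name matches target.
func findReferences(n *Node, target string) []*Node {
	var refs []*Node
	if n == nil {
		return refs
	}
	if n.Kind == "ident" && n.Name == target {
		refs = append(refs, n)
	}
	for _, c := range n.Children {
		refs = append(refs, findReferences(c, target)...)
	}
	return refs
}

func main() {
	tree := &Node{Kind: "scope", Children: []*Node{
		{Kind: "ident", Name: "x"},
		{Kind: "scope", Children: []*Node{
			{Kind: "ident", Name: "x"},
			{Kind: "ident", Name: "y"},
		}},
	}}
	fmt.Println(len(findReferences(tree, "x"))) // prints 2
}
```

Interview versions of this problem usually layer in scoping rules; the recursive-walk skeleton stays the same.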

Search: trigram-based code search, Zoekt (Sourcegraph’s underlying search engine), regex search at scale, structural search via tree-sitter-based queries, semantic search with embeddings.
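A toy sketch of the trigram-index idea helps here: index every 3-byte substring of each document, then answer a query by finding documents containing all of the query's trigrams. This is an assumption-laden simplification; Zoekt's real implementation handles unicode, case folding, posting-list compression, and verifies candidates against the raw text.

```go
package main

import "fmt"

// trigrams returns the set of 3-byte substrings of s.
func trigrams(s string) map[string]bool {
	set := make(map[string]bool)
	for i := 0; i+3 <= len(s); i++ {
		set[s[i:i+3]] = true
	}
	return set
}

// index maps each trigram to the IDs of documents containing it.
func index(docs []string) map[string][]int {
	idx := make(map[string][]int)
	for id, d := range docs {
		for t := range trigrams(d) {
			idx[t] = append(idx[t], id)
		}
	}
	return idx
}

// candidates returns IDs of documents containing every trigram of
// the query; a real engine would then verify each candidate against
// the raw text, since trigram containment is necessary but not
// sufficient for a substring match.
func candidates(idx map[string][]int, docs []string, query string) []int {
	counts := make(map[int]int)
	need := 0
	for t := range trigrams(query) {
		need++
		for _, id := range idx[t] {
			counts[id]++
		}
	}
	var out []int
	for id := range docs {
		if need > 0 && counts[id] == need {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	docs := []string{"func main()", "type Node struct", "func helper()"}
	idx := index(docs)
	fmt.Println(candidates(idx, docs, "func")) // prints [0 2]
}
```

The key design insight to articulate in interviews: the trigram index shrinks the candidate set so that expensive regex or substring verification only runs over a few documents.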

Cody / AI coding: context retrieval for LLM prompts (combining lexical search, embeddings, symbol resolution), prompt engineering for coding tasks, evaluation methodology for code-generation quality, latency budgets (users expect fast responses), fallback handling when the LLM produces bad code.
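The context-window side of this can be sketched as a packing problem: given scored candidate snippets and a token budget, pick what to send to the LLM. The greedy-by-score baseline below is an illustrative assumption (the `Snippet` fields are invented); production retrieval pipelines also deduplicate, diversify across files, and blend multiple retrieval signals.

```go
package main

import (
	"fmt"
	"sort"
)

// Snippet is a candidate piece of context with a retrieval score
// and a token cost. Field names are illustrative.
type Snippet struct {
	ID     string
	Score  float64
	Tokens int
}

// selectContext greedily packs the highest-scoring snippets that
// fit within the remaining token budget.
func selectContext(cands []Snippet, budget int) []Snippet {
	sorted := append([]Snippet(nil), cands...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Score > sorted[j].Score
	})
	var picked []Snippet
	for _, s := range sorted {
		if s.Tokens <= budget {
			picked = append(picked, s)
			budget -= s.Tokens
		}
	}
	return picked
}

func main() {
	cands := []Snippet{
		{"a.go", 0.9, 500}, {"b.go", 0.8, 700}, {"c.go", 0.5, 300},
	}
	// With a 1000-token budget: a.go fits (500 left), b.go does not
	// (700 > 500), c.go fits (200 left).
	for _, s := range selectContext(cands, 1000) {
		fmt.Println(s.ID)
	}
}
```

Worth noting in an interview: pure greedy-by-score can be suboptimal (it is a knapsack relaxation), but it is fast and predictable, which matters under a strict latency budget.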

Infrastructure: Kubernetes-based deployment (both cloud and customer-managed), multi-tenant isolation for cloud customers, enterprise deployment patterns for air-gapped environments, data ingestion for customer-code corpora.

Batch changes: large-scale refactoring across thousands of repositories, dependency management, change-review workflows, integration with version-control hosts (GitHub, GitLab, Bitbucket).

Coding Interview Details

Two coding rounds, 60 minutes each. Difficulty is medium-hard — comparable to Datadog or HashiCorp. Interviewers actively engage and push on clarity and edge cases. Go is strongly preferred for backend rounds; TypeScript for client-adjacent work.

Typical problem shapes:

  • AST-walk: given a tree of language tokens, implement queries like “find all references to this identifier” or “find all matches of this pattern”
  • Trie-based prefix search with ranking (find identifiers starting with X, ordered by relevance)
  • Context-window management: given a set of candidate code snippets and a token budget, select the most relevant subset
  • Streaming search: process a large document corpus with bounded memory
  • Classic algorithm problems (graphs, trees) with code-intelligence twists (dependency graph traversal, call-graph analysis)

System Design Interview

One round, 60 minutes. Prompts focus on code-intelligence scale:

  • “Design the code-search index for 100K repositories with incremental updates and 200ms query latency.”
  • “Design Cody’s context-retrieval system producing 8K tokens of relevant context within 500ms.”
  • “Design the batch-change system applying a refactor across 5K repositories with failure recovery.”
  • “Design the enterprise deployment model for air-gapped customers with self-managed Kubernetes clusters.”

What works: explicit treatment of code-at-scale realities (many repos, incremental updates, language-specific indexing), LLM-specific operational concerns (latency budgets, context-window limits, fallback), and enterprise constraints (air-gap, data residency). What doesn’t: generic search designs that ignore code’s specific structure.

Code-Intelligence Deep-Dive

Distinctive round. Sample topics:

  • Walk through how you’d build “find references” for a specific language.
  • Explain the trade-offs of LSIF vs SCIP as indexing formats.
  • Discuss how tree-sitter compares to language-specific parsers.
  • Reason about how to handle incremental indexing when code changes.
  • Describe approaches for cross-language symbol resolution.

Candidates with language-server or compiler background have an edge. Strong generalists close the gap with focused prep.

Product / Craft Round

Sample prompts:

  • “What AI coding tools have you used? What’s strong and weak about each?”
  • “If you were designing Cody, how would you prioritize features for the next quarter?”
  • “Describe a time an AI coding tool failed you. What would you have fixed?”
  • “How do you measure the quality of generated code beyond ‘it compiles’?”

Candidates with authentic engagement with AI coding tools (having used Copilot, Cody, Cursor, Windsurf deeply) do better. Those without real usage often fall into generic takes.

Behavioral Interview

Key themes:

  • Remote effectiveness: “Describe your practices for working effectively in a fully-distributed team.”
  • Written communication: “Tell me about a time written documentation changed a project outcome.”
  • Ownership: “Describe a production incident you owned end-to-end.”
  • Customer focus: “How have you directly engaged with developer / customer feedback?”

Preparation Strategy

Weeks 4-6 out: solve LeetCode medium and medium-hard problems in Go. Emphasize trie / tree / AST-walk problems. Practice idiomatic Go patterns.

Weeks 2-4 out: use Cody for a real project — compare with Copilot, Cursor, Windsurf. Read about tree-sitter, LSP, and code-intelligence formats. Sourcegraph has published content about their architecture — especially Zoekt and their approach to embeddings for code.

Weeks 1-2 out: set up Sourcegraph on a sample codebase (the free tier / public-code version works). Form opinions about the product. Prepare remote-work stories with specific practices.

Day before: review AST / trie fundamentals; prepare 3 behavioral stories; form 3 concrete opinions about AI coding tools.

Difficulty: 7/10

Medium-hard. Below Google L5 on pure algorithms; the code-intelligence specialty and AI-coding depth filter meaningfully. Candidates with compiler / language-server background or production AI-coding experience have clear edges. Remote-work effectiveness is a genuine filter — candidates who haven’t worked fully distributed sometimes struggle with the culture fit.

Compensation (2025 data, engineering roles)

  • Software Engineer: $170k–$215k base (location-adjusted), $100k–$180k equity/yr, modest bonus. Total: ~$260k–$410k / year (US).
  • Senior Software Engineer: $220k–$280k base, $180k–$340k equity/yr. Total: ~$360k–$560k / year.
  • Staff Engineer: $285k–$345k base, $330k–$600k equity/yr. Total: ~$520k–$800k / year.

Private-company equity valued at recent tender prices. 4-year vest with 1-year cliff. Compensation is competitive with mid-tier private-company tech. Location-factored comp is transparent — Sourcegraph publishes its pay methodology publicly. Non-US comp runs proportionally to local-market bands.

Culture & Work Environment

Fully remote, written-first culture. Sourcegraph has been fully distributed since founding; the Handbook (inspired by GitLab’s approach) documents engineering processes publicly. Decisions happen in written RFCs and async discussions. Pace is deliberate rather than frantic; post-2022 rightsizing brought tighter focus and cost discipline. The engineering culture is deeply technical — most engineers care about compiler / language / AI systems beyond just shipping features. Cody is the fastest-growing part of the company.

Things That Surprise People

  • The technical heritage is real. Sourcegraph engineers are among the most knowledgeable people about code intelligence anywhere.
  • Cody is a significant engineering investment, not a wrapper over third-party LLMs. The context-retrieval work is distinctive.
  • The Handbook is real — read it before interviewing.
  • Remote work is not “remote-friendly,” it’s the company’s structural default with no offices anywhere.

Red Flags to Watch

  • Not having used Cody or competitor AI coding tools seriously before interviewing.
  • Treating code search as “just grep at scale.” The indexing depth is real engineering.
  • Weak async communication practices in past roles.
  • No opinions about the AI-coding landscape. Sourcegraph operates in it daily.

Tips for Success

  • Use Cody and competitors. Authentic opinions win in the product round.
  • Read the Handbook. At minimum the engineering and values sections.
  • Engage with code intelligence. Tree-sitter, LSP, AST — even a weekend exploring these pays off.
  • Prepare remote-work stories with specifics. “We wrote a 3-page RFC before the engineering design session” beats generic “I communicate well.”
  • Ask about Cody’s roadmap. Signals you’re aware of strategic priorities.

Resources That Help

  • Sourcegraph engineering blog (posts on Zoekt, Cody, context retrieval, embeddings for code)
  • The Sourcegraph Handbook (engineering, values, and operating practices)
  • Crafting Interpreters by Robert Nystrom for AST / compiler fundamentals
  • Tree-sitter documentation and examples
  • LSP and LSIF / SCIP specifications
  • Cody itself — use it for a real project over 1–2 weeks before interviewing

Frequently Asked Questions

Do I need compiler / language-server background?

Helpful but not strictly required. For core code-intelligence roles, real compiler / AST / language-server depth is valued and gives candidates a significant edge. For platform, infrastructure, and Cody AI roles, strong backend generalists transition well. What’s required is curiosity about code as a structured domain rather than a black box.

How does Cody compare to GitHub Copilot / Cursor / Windsurf?

Cody’s differentiation is context depth — the argument is that Sourcegraph’s existing code intelligence provides richer context to LLMs than generic code-completion tools can. In practice, Copilot has broader brand awareness; Cursor has shipped faster at the IDE-UX level; Cody has deeper code-context capabilities for enterprise codebases. The competitive dynamic shifts rapidly; candidates should form their own current opinion based on recent usage.

How real is the fully-remote culture?

Genuinely real. Sourcegraph has never had offices and operates entirely remotely. The Handbook documents practices, written RFCs drive most decisions, and meetings are optional for most work. Candidates who’ve only worked in-office or hybrid environments should expect a real cultural adjustment; the written-first rhythm is different from even “remote-friendly” hybrid companies.

Is Sourcegraph financially stable given the 2022 rightsizing?

Post-rightsizing, Sourcegraph has been focused on core code intelligence and Cody AI with tighter cost discipline. The company has strong enterprise customers (Uber, Dropbox, Yelp, large banks) with recurring revenue. Runway is healthy based on public signals. New-hire equity is valued at recent tender prices, which discount from 2021 peaks. The AI-coding tailwinds support continued growth; treat equity as mid-upside with illiquidity risk.

What’s the team structure like?

Small-team ownership is a Sourcegraph cultural value. Teams typically own specific product areas (search, Cody, batch changes, enterprise product, developer platform) with relatively flat reporting. Staff+ engineers have significant scope given company size. The Handbook documents how teams form and dissolve; the structure evolves based on product priorities more than traditional reorgs.

See also: GitLab Interview Guide · HashiCorp Interview Guide · Anthropic Interview Guide
