How does LinkedIn’s People You May Know algorithm work?

PYMK recommends 2nd-degree connections (friends of friends). For a user U with 500 connections, each averaging 500 connections, the candidate pool is up to 250,000 people. Candidates are ranked by number of mutual connections (the strongest signal), shared companies, shared schools, shared skills, and geographic proximity. The computation is a fan-out: enumerate connections of connections, deduplicate, filter out existing connections, and rank. It runs as an offline Spark job, and the top-50 suggestions are cached per user in Redis and refreshed daily. PYMK is LinkedIn’s most important growth feature: each accepted suggestion creates two new connection opportunities (both users now have a new 2nd-degree network to discover).

How does LinkedIn job matching use embeddings?

LinkedIn encodes both jobs and candidates into the same vector space using embedding models, so similar jobs and matching candidates have nearby embeddings. Features: title similarity, skill match, experience level, location compatibility, company preferences, and salary expectations. The model learns from applications (positive), views without application (weak negative), and “not interested” feedback (strong negative). ANN (approximate nearest neighbor) search retrieves matching candidates for a job (or matching jobs for a candidate) in milliseconds from 900M+ profiles. The pipeline runs continuously: new job postings and profile updates trigger recomputation. On the recruiter side, candidates are ranked by fit, responsiveness (InMail response rate), recency of profile update, and the “Open to Work” signal.

System Design: Design LinkedIn — Professional Network, Connection Graph, Feed Ranking, Job Matching, People Search


LinkedIn connects 900+ million professionals with a rich social graph, news feed, job marketplace, and messaging system. Designing LinkedIn tests your understanding of graph-based systems (connections, degrees of separation), content ranking, job recommendation, and search across structured professional profiles. This guide covers the key components for a system design interview.

Connection Graph

LinkedIn connections form an undirected graph (if A connects with B, B is also connected to A). Key queries: (1) Direct connections: list all 1st-degree connections of user U. (2) 2nd-degree connections: friends of friends (people you may know). (3) Degrees of separation: the shortest path between two users (LinkedIn shows “2nd,” “3rd,” or “3rd+”). (4) Mutual connections: the intersection of two users’ connection lists. Storage: an adjacency list in a graph database or wide-column store, where each user has a row holding their connection list. With 900M users averaging 500 connections, that is 450 billion adjacency entries. Each undirected connection is stored twice (once per endpoint), so this corresponds to 225 billion unique connections.

“People You May Know” (PYMK) is the most important growth feature. Algorithm: for each of U’s 1st-degree connections, enumerate their connections (U’s 2nd-degree network). Rank the 2nd-degree candidates by number of mutual connections (more mutual connections = higher score), shared companies, shared schools, shared skills, and geographic proximity. This is a fan-out computation: for a user with 500 connections, each averaging 500 connections, the candidate pool is up to 250,000 before deduplication and filtering of existing connections. Pre-compute PYMK scores offline (Spark job), cache the top-50 suggestions per user in Redis, and refresh daily.
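The PYMK fan-out can be sketched in a few lines of Python. This is a minimal single-machine illustration, not LinkedIn's actual job: the `graph` dictionary, the `people_you_may_know` function, and the toy user ids are all hypothetical, and only the mutual-connection signal is scored (shared companies, schools, skills, and location are left out).

```python
from collections import Counter

def people_you_may_know(graph, user, top_k=50):
    """Rank 2nd-degree candidates for `user` by mutual-connection count."""
    direct = graph.get(user, set())
    mutuals = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Filter out the user and anyone who is already a 1st-degree connection.
            if candidate != user and candidate not in direct:
                # Counting per candidate also deduplicates the fan-out.
                mutuals[candidate] += 1
    # Highest mutual count first; ties broken by id for a stable order.
    ranked = sorted(mutuals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]

# Toy undirected graph: every edge appears in both adjacency sets.
graph = {
    "u": {"a", "b", "c"},
    "a": {"u", "x", "y"},
    "b": {"u", "x", "y", "c"},
    "c": {"u", "x", "b"},
    "x": {"a", "b", "c"},
    "y": {"a", "b"},
}
suggestions = people_you_may_know(graph, "u")
print(suggestions)  # [('x', 3), ('y', 2)]
```

Here `x` shares three connections with `u` and `y` shares two, so `x` ranks first. In an offline batch setting the same logic would typically be expressed as a join of the edge list with itself, grouped by candidate pair.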

Feed Ranking

The LinkedIn feed shows posts from connections, followed companies, and influencers, ranked by predicted engagement. Ranking has two stages. (1) Candidate generation: collect recent posts from 1st-degree connections (direct posts and reshares), 2nd-degree connections (posts liked or commented on by your connections, shown as “John Doe commented on this”), followed companies and influencers, and trending content in your industry. (2) Ranking: an ML model scores each candidate by predicted engagement probability. Features: author-viewer relationship (connection strength, interaction history), post content (text quality, topic relevance, media type), engagement velocity (early likes and comments signal quality), and freshness. LinkedIn specifically optimizes for “meaningful professional conversations”: posts that generate comments rank higher than posts with only likes, because comments indicate deeper engagement. Viral dampening: LinkedIn intentionally limits viral spread to prevent low-quality content from dominating. A post’s initial reach is limited to 1st-degree connections. If engagement exceeds a threshold, the post is shown to 2nd-degree connections. This staged rollout prevents clickbait from going viral instantly. Creator-side signals: posts from users who regularly create quality content (high engagement rate) get a boost.
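The comment-over-like weighting and the staged rollout can be illustrated with a small heuristic scorer. This is a hand-weighted stand-in for the learned ranking model, written only to show the shape of the logic: the weights, the `ROLLOUT_THRESHOLD` value, the 24-hour half-life, and every name below are assumptions, not LinkedIn's values.

```python
from dataclasses import dataclass

# Hypothetical weights: a comment counts four times as much as a like.
LIKE_WEIGHT = 1.0
COMMENT_WEIGHT = 4.0
ROLLOUT_THRESHOLD = 0.05   # assumed weighted-engagement rate needed to widen reach
HALF_LIFE_HOURS = 24.0     # assumed freshness half-life

@dataclass
class Post:
    post_id: str
    likes: int
    comments: int
    impressions: int
    age_hours: float

def engagement_rate(post):
    """Weighted engagement per impression (0.0 for unseen posts)."""
    if post.impressions == 0:
        return 0.0
    weighted = LIKE_WEIGHT * post.likes + COMMENT_WEIGHT * post.comments
    return weighted / post.impressions

def audience(post):
    """Staged rollout: start at 1st-degree, widen only past the threshold."""
    if engagement_rate(post) >= ROLLOUT_THRESHOLD:
        return "2nd-degree"
    return "1st-degree"

def score(post):
    """Engagement rate discounted by an exponential freshness decay."""
    return engagement_rate(post) * 0.5 ** (post.age_hours / HALF_LIFE_HOURS)

posts = [
    Post("likes_only", likes=40, comments=0, impressions=1000, age_hours=2),
    Post("discussion", likes=10, comments=20, impressions=1000, age_hours=2),
]
ranked = sorted(posts, key=score, reverse=True)
print([p.post_id for p in ranked])        # ['discussion', 'likes_only']
print([audience(p) for p in posts])       # ['1st-degree', '2nd-degree']
```

The "discussion" post has fewer total interactions (30 vs 40) but outranks the likes-only post and is the only one promoted beyond 1st-degree connections, which is the behavior the section describes.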

Job Matching and Recommendation

LinkedIn job matching connects 60+ million job listings with 900 million profiles. There are two sides. (1) Jobs for candidates: recommend relevant jobs to users. Features: title similarity (the user’s current and past titles vs. the job title), skill match (the user’s skills vs. the job’s required skills), experience level match, location/remote compatibility, company preferences (the user has applied to similar companies), and salary expectations. The model learns from job applications (positive signal), job views without application (weak negative), and explicit “not interested” feedback (strong negative). (2) Candidates for recruiters: given a job posting, rank candidates by fit. The features are similar but reversed, with additional signals: candidate responsiveness (do they respond to InMail?), recency of profile update (recently updated = more likely to be job seeking), and the “Open to Work” signal. Both sides use embedding models that encode jobs and candidates into the same vector space, so similar jobs and matching candidates have nearby embeddings. ANN search retrieves candidates for a job (or jobs for a candidate) in milliseconds. The matching pipeline runs continuously: as new jobs are posted and profiles are updated, matches are recomputed.
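To make the shared vector space concrete, here is a minimal sketch of embedding retrieval. It uses exact cosine similarity over a tiny in-memory dictionary as a stand-in for an ANN index, and the three-dimensional vectors are hand-made for illustration. In practice the embeddings would come from a trained model and have hundreds of dimensions. All names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_matches(query_vec, index, k=2):
    """Exact nearest-neighbor scan; an ANN index would approximate this at scale."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # most similar first
    return [name for _, name in scored[:k]]

# Toy candidate embeddings (made-up values, same space as the job vector).
candidates = {
    "backend_engineer": [0.9, 0.1, 0.0],
    "data_scientist": [0.2, 0.9, 0.1],
    "product_designer": [0.0, 0.1, 0.9],
}
job_vec = [0.8, 0.3, 0.0]  # e.g. a "Senior Backend Engineer" posting

matches = top_matches(job_vec, candidates)
print(matches)  # ['backend_engineer', 'data_scientist']
```

Because jobs and candidates live in the same space, the same `top_matches` call works in the other direction: pass a candidate vector as the query and an index of job vectors to get job recommendations.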

People and Company Search

LinkedIn search indexes structured professional profiles: name, headline, current company, past companies, skills, education, location, and industry. Unlike web search (unstructured text), LinkedIn search operates on structured fields with faceted filtering. Architecture: a Lucene-based engine such as Elasticsearch (LinkedIn itself uses Galene, its custom search stack built on Lucene). Each profile is a document with typed fields. Query: text match on name/headline plus filters (company, location, industry, skills, school). Results are ranked by text relevance, connection proximity (1st-degree connections rank highest, which is the key differentiator from generic search), profile completeness, activity recency, and recruiter-specific signals (responsiveness, “Open to Work”).

Faceted search: users filter by company, location, industry, school, and connection degree. Each facet shows the count of matching results, computed from the search result set (in Elasticsearch, with aggregations). Typeahead: as the user types, suggest names, companies, skills, and titles from pre-computed suggestion lists per entity type, prioritizing suggestions the user is likely to search for (connections, companies in their industry). Privacy: search visibility is controlled by profile settings, and “Private mode” hides your identity from the people whose profiles you view. Premium features gate advanced search filters (company size, years of experience, seniority level).
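The combination of text match, field filters, connection-aware ranking, and facet counts can be sketched without a search engine. The in-memory version below is only an illustration of the query flow: the profile records, field names, and the `search` and `facet_counts` helpers are hypothetical, and ranking uses connection degree alone rather than a full relevance score.

```python
from collections import Counter

# Hypothetical profile documents; "degree" is the viewer's connection distance.
PROFILES = [
    {"name": "Ana Silva", "headline": "Data Engineer", "company": "Acme",
     "location": "Lisbon", "degree": 2},
    {"name": "Ben Okoro", "headline": "Data Engineer", "company": "Globex",
     "location": "Lisbon", "degree": 1},
    {"name": "Chen Li", "headline": "Data Scientist", "company": "Acme",
     "location": "Berlin", "degree": 3},
    {"name": "Dee Park", "headline": "Product Manager", "company": "Acme",
     "location": "Lisbon", "degree": 1},
]

def search(profiles, text, filters=None):
    """Substring match on name/headline, exact match on every filter field."""
    filters = filters or {}
    needle = text.lower()
    hits = [
        p for p in profiles
        if (needle in p["name"].lower() or needle in p["headline"].lower())
        and all(p[field] == value for field, value in filters.items())
    ]
    # Closer connections rank first; name breaks ties for a stable order.
    hits.sort(key=lambda p: (p["degree"], p["name"]))
    return hits

def facet_counts(hits, field):
    """Count result-set values for one facet, as shown next to each filter."""
    return Counter(p[field] for p in hits)

hits = search(PROFILES, "data")
print([p["name"] for p in hits])          # ['Ben Okoro', 'Ana Silva', 'Chen Li']
print(dict(facet_counts(hits, "company")))  # {'Globex': 1, 'Acme': 2}

lisbon = search(PROFILES, "data", {"location": "Lisbon"})
print([p["name"] for p in lisbon])        # ['Ben Okoro', 'Ana Silva']
```

Note that the facet counts are derived from the hits of the current query, not from the whole index, which is why they change as the user adds filters.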

Messaging (InMail)

LinkedIn messaging supports free messages between connected users, InMail (paid messages to non-connections, with a limited monthly quota for premium users), and group conversations. Architecture: similar to WhatsApp (see our Chat Application guide) but with professional context. Messages are stored partitioned by conversation_id. Real-time delivery uses WebSocket with a push notification fallback. Key differences from consumer chat: (1) Read receipts and typing indicators are optional (in a professional context there is less pressure to respond immediately). (2) Smart replies: ML-generated short response suggestions (“Thanks for reaching out!”, “I would be happy to connect”) based on the message content. (3) InMail delivery optimization: LinkedIn’s ML models predict whether the recipient is likely to respond, and low-response-probability InMails may show a warning to the sender. This protects recipients from spam while improving the experience for senders (their limited InMails go to responsive recipients). (4) Integration with job applications: a message thread can be linked to a job posting, providing context for both recruiter and candidate.
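Partitioning by conversation_id keeps every message of a thread on the same shard, so reading a conversation is a single-partition query. The sketch below shows one way this could look for two-person threads. The id scheme (sorted participant pair), the partition count, and the `MessageStore` class are assumptions made for illustration, and group conversations would need a separately generated id.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 8  # assumed shard count

def conversation_id(user_a, user_b):
    """Deterministic id for a 1:1 thread, independent of who writes first."""
    first, second = sorted([user_a, user_b])
    return f"{first}:{second}"

def partition_for(conv_id):
    """Stable hash of the conversation id mapped onto a partition number."""
    digest = hashlib.sha256(conv_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

class MessageStore:
    """In-memory stand-in for a store partitioned by conversation_id."""

    def __init__(self):
        # partition number -> conversation id -> ordered list of messages
        self.partitions = defaultdict(lambda: defaultdict(list))

    def append(self, sender, recipient, body):
        conv_id = conversation_id(sender, recipient)
        self.partitions[partition_for(conv_id)][conv_id].append((sender, body))
        return conv_id

    def history(self, conv_id):
        # Only one partition is touched to read the whole thread.
        return list(self.partitions[partition_for(conv_id)][conv_id])

store = MessageStore()
cid = store.append("alice", "bob", "Hi Bob, are you open to a chat?")
store.append("bob", "alice", "Thanks for reaching out!")
print(cid)                 # alice:bob
print(store.history(cid))
```

Both messages land in the same partition because the id is the same in either direction, and appending preserves send order within the thread.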