Core Entities
Company: company_id, name, description, logo_url, website, size, industry, verified (boolean).
Job: job_id, company_id, title, description, location, is_remote, employment_type (FULL_TIME, PART_TIME, CONTRACT, INTERNSHIP), experience_level (ENTRY, MID, SENIOR, STAFF), salary_min, salary_max, currency, status (DRAFT, ACTIVE, PAUSED, CLOSED), posted_at, expires_at, applicant_count.
Application: application_id, job_id, applicant_id, status (APPLIED, REVIEWING, PHONE_SCREEN, INTERVIEW, OFFER, REJECTED, WITHDRAWN), applied_at, resume_url, cover_letter, recruiter_notes.
SavedJob: user_id, job_id, saved_at.
Alert: alert_id, user_id, query (JSON: title, location, salary_min), frequency (DAILY, WEEKLY), last_sent_at.
JobView: view_id, job_id, user_id (nullable), session_id, viewed_at, source (SEARCH, DIRECT, EMAIL, RECOMMENDATION).
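As a rough sketch, two of these entities might be modeled as follows. Field names and enum values come from the list above; the types, defaults, and the dataclass representation itself are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class EmploymentType(Enum):
    FULL_TIME = "FULL_TIME"
    PART_TIME = "PART_TIME"
    CONTRACT = "CONTRACT"
    INTERNSHIP = "INTERNSHIP"

class JobStatus(Enum):
    DRAFT = "DRAFT"
    ACTIVE = "ACTIVE"
    PAUSED = "PAUSED"
    CLOSED = "CLOSED"

@dataclass
class Job:
    job_id: str
    company_id: str
    title: str
    description: str
    location: str
    is_remote: bool
    employment_type: EmploymentType
    experience_level: str            # ENTRY | MID | SENIOR | STAFF
    salary_min: Optional[int]
    salary_max: Optional[int]
    currency: str                    # ISO 4217 code, e.g. "USD"
    status: JobStatus = JobStatus.DRAFT  # assumed default: jobs start as drafts
    posted_at: Optional[datetime] = None
    expires_at: Optional[datetime] = None
    applicant_count: int = 0
```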
Job Search
Job search has two requirements: fast full-text search (job title, description, company name) and structured filters (location, salary range, experience level, employment type). Elasticsearch is the standard choice: index job documents on post/update, query with a combination of full-text and filter clauses.
GET /jobs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "senior python engineer",
            "fields": ["title^3", "description", "company_name"]
          }
        }
      ],
      "filter": [
        {"term": {"status": "ACTIVE"}},
        {"term": {"is_remote": true}},
        {"range": {"salary_min": {"gte": 100000}}},
        {"term": {"experience_level": "SENIOR"}}
      ]
    }
  },
  "sort": [{"_score": "desc"}, {"posted_at": "desc"}]
}
The title^3 boost makes title matches 3x more valuable than description matches. Geo search: for location-based search, store lat/lng and use Elasticsearch geo_distance filter. Salary normalization: normalize all salaries to annual (hourly * 2080, monthly * 12) for range filtering. Index freshness: sync database changes to Elasticsearch via Debezium CDC with < 5 second lag.
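The annualization rule can be captured in a small helper. The function and unit names below are illustrative, not part of the schema above.

```python
# Multipliers to convert a salary figure to an annual amount.
# 2080 = 40 hours/week * 52 weeks. Units outside this table are rejected.
ANNUALIZATION = {
    "HOURLY": 2080,
    "WEEKLY": 52,
    "MONTHLY": 12,
    "ANNUAL": 1,
}

def normalize_to_annual(amount: float, unit: str) -> float:
    """Convert a salary to an annual figure so range filters compare like with like."""
    try:
        return amount * ANNUALIZATION[unit]
    except KeyError:
        raise ValueError(f"unknown salary unit: {unit!r}")
```

Storing both the original (value, unit) pair and the normalized annual figure keeps display faithful to what the poster entered while making filtering uniform.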
Application State Machine
Application lifecycle: APPLIED → REVIEWING (recruiter has seen it) → PHONE_SCREEN → INTERVIEW → OFFER → (hired, outside system) or REJECTED at any stage. WITHDRAWN: applicant withdraws. State transitions are enforced in ApplicationService. Each transition is logged in ApplicationHistory for the recruiter’s timeline view. Automated transitions: when an applicant applies, status=APPLIED. When a recruiter opens the application: status=REVIEWING (first open triggers the transition, idempotent). Rejection emails: on REJECTED transition, trigger an email notification to the applicant (via async job). Template: personalized with the job title and company name. Option to suppress (some companies prefer not to send rejection emails). Bulk operations: recruiters often need to move 50 applications from PHONE_SCREEN to INTERVIEW after a batch of calls. Support batch state transitions with confirmation dialog showing count.
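A minimal sketch of the transition guard the ApplicationService would enforce. The exact allowed-transition set is an assumption drawn from the lifecycle above; whether stages can be skipped (e.g. REVIEWING straight to INTERVIEW) is a product decision.

```python
# Allowed status transitions (illustrative). REJECTED is reachable from any
# active stage; WITHDRAWN from any non-terminal stage; "hired" after OFFER
# happens outside the system, so OFFER has no success transition here.
TRANSITIONS = {
    "APPLIED":      {"REVIEWING", "REJECTED", "WITHDRAWN"},
    "REVIEWING":    {"PHONE_SCREEN", "REJECTED", "WITHDRAWN"},
    "PHONE_SCREEN": {"INTERVIEW", "REJECTED", "WITHDRAWN"},
    "INTERVIEW":    {"OFFER", "REJECTED", "WITHDRAWN"},
    "OFFER":        {"REJECTED", "WITHDRAWN"},
    "REJECTED":     set(),
    "WITHDRAWN":    set(),
}

def validate_transition(current: str, new: str) -> None:
    """Raise ValueError if the requested status change is not allowed."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
```

The same guard runs once per application in a bulk update, so a batch of 50 moves either records each success or reports each per-item failure.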
Job Alerts and Recommendations
Job alerts: user saves a search query (title keywords, location, salary, employment type). Stored as a JSON blob in the Alert table. A daily batch job: for each active alert, run the saved query against Elasticsearch filtered to jobs posted since last_sent_at. If new matching jobs: compose a digest email, send via SES/SendGrid, update last_sent_at. Alert matching is a fan-in query: for 1M active alerts, running each as a separate Elasticsearch query is too slow. Optimization: reverse-index alerts by key attributes. When a new job is posted, find matching alerts (query alerts by category/location/salary range). Notify only matching alert owners. This push model scales better than pulling all alerts daily. Job recommendations: collaborative filtering (users who viewed job X also viewed Y), content-based filtering (user’s past applications are similar to job Y), and skills matching (extract skills from resume, match to job requirements). Display as a “Recommended for you” section on the homepage.
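The push model can be sketched roughly as below. The alert-query shape matches the JSON blob described above, but the naive keyword containment is an assumption; production matching would go through the search index's analyzers rather than string operations.

```python
def job_matches_alert(job: dict, alert_query: dict) -> bool:
    """Check whether a newly posted job satisfies a saved alert query.

    alert_query is the JSON blob from the Alert table, e.g.
    {"title": "python engineer", "location": "Berlin", "salary_min": 90000}.
    """
    title_kw = alert_query.get("title")
    if title_kw and not all(
        word in job["title"].lower() for word in title_kw.lower().split()
    ):
        return False
    location = alert_query.get("location")
    if location and not job["is_remote"] and job["location"] != location:
        return False
    salary_min = alert_query.get("salary_min")
    if salary_min and (job.get("salary_max") or 0) < salary_min:
        return False
    return True

def alerts_to_notify(job: dict, candidate_alerts: list[dict]) -> list[str]:
    """Fan out: return alert_ids whose saved query matches this job.

    candidate_alerts would come from a pre-filtered index lookup
    (by location/category/salary bucket), not a scan of all 1M alerts.
    """
    return [a["alert_id"] for a in candidate_alerts
            if job_matches_alert(job, a["query"])]
```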
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do you design job alert matching to scale with millions of active alerts?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Naive approach: when a new job is posted, run a search query for each of 1M active alerts against Elasticsearch — 1M queries per job post is prohibitive. Two better approaches: (1) Reverse fan-out at post time: when a job is posted, extract its key attributes (title keywords, location, salary range, employment type, industry). Query alerts that match these attributes: SELECT alert_id FROM alerts WHERE … (matching on stored JSON attributes). Notify only those alert owners. This inverts the query: instead of \"does this job match alert X?\", ask \"which alerts match this job?\" This is still O(matching alerts) per job but avoids scanning all 1M alerts. Optimize with a pre-indexed attributes column or a secondary table. (2) Batch daily digests: run alerts as a daily batch job during low-traffic hours. For each alert: run the saved Elasticsearch query filtered to jobs posted since last_sent_at. Aggregate into a digest email. More efficient than real-time but less timely. Combine both: real-time alerts for premium users (reverse fan-out), daily digests for free users (batch). This segments complexity by user tier."
      }
    },
    {
      "@type": "Question",
      "name": "How do you prevent job posting spam and fake companies?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Job board integrity requires preventing fraudulent postings that deceive job seekers. Verification layers: (1) Email domain verification: the company must have a corporate email domain (not gmail.com or yahoo.com). Verify via DKIM/SPF record lookup or by requiring email confirmation from a corporate domain. (2) Manual verification for new companies: first-time companies go through a manual review queue. Check: does the company website exist? Does the about page list the company? Are there employee profiles on LinkedIn? Flag suspiciously new domains. (3) Credit card requirement: requiring a payment method (even for free postings) adds friction for spam accounts. Free postings can be offered but require a verified payment method on file. (4) ML classifier: train a classifier on job description text to detect spam patterns (excessive salary promises, vague \"work from home\" schemes, MLM structures, crypto/investment scams). (5) Rate limiting: new accounts can post at most N jobs per week. Flag accounts posting many jobs with low views or high \"report\" rates. (6) Community reporting: job seekers can report fraudulent postings. Reports above a threshold trigger manual review."
      }
    },
    {
      "@type": "Question",
      "name": "How do you implement applicant tracking to give recruiters a clear pipeline view?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Recruiter pipeline view: all applications for a job, organized by stage. Each stage shows a count and a list of applicants. Requirements: real-time counts, drag-to-move between stages, bulk operations. Schema: applications table with (job_id, status, …). Status = current pipeline stage. Pipeline view query: SELECT status, COUNT(*), array_agg(application_id ORDER BY applied_at DESC) FROM applications WHERE job_id = :id GROUP BY status. Cache the counts per (job_id, status) in Redis for fast page loads. Update Redis on each status change: HINCRBY pipeline:{job_id} {old_status} -1; HINCRBY pipeline:{job_id} {new_status} 1. Drag-to-move: the client sends PATCH /applications/:id {status: \"INTERVIEW\"}. The server validates the transition, updates the database, updates the Redis counters, and logs the transition in application_history. Bulk operations: POST /applications/bulk-update {application_ids: […], new_status: \"REJECTED\"}. Process each update in a loop with the same validation and logging. Cap bulk updates at 100 per request to prevent timeouts. Return a summary: {succeeded: 95, failed: 5, errors: […]}."
      }
    },
    {
      "@type": "Question",
      "name": "How do you handle resume storage and parsing for job applications?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Resume upload: accept PDF, DOCX, and DOC formats (max 5 MB). Upload directly to S3 via a pre-signed URL (the client uploads to S3 directly, not through your server — this avoids server bandwidth cost). Store the S3 key and metadata (filename, size, upload_time) in the Resume table. Pre-signed URL flow: the client requests a pre-signed URL from your server. The server generates one, e.g. with boto3: s3_client.generate_presigned_url(\"put_object\", Params={\"Bucket\": bucket, \"Key\": key}, ExpiresIn=300). The client uploads the file directly to the S3 URL, then notifies your server that the upload is complete (PUT /resumes/:id/confirm). The server moves the file from the temp location to the permanent bucket (or just marks it as confirmed). Resume parsing: use a parsing service (Textract for PDF, python-docx for DOCX) to extract structured data: contact info, work experience, education, skills. Store the parsed JSON alongside the original file. Index skills and titles in Elasticsearch for skills-based matching. Privacy: apply data retention policies — delete resumes X months after an application closes (configurable per company and per legal jurisdiction). Anonymize during blind review (hide name, photo, and address to reduce bias)."
      }
    },
    {
      "@type": "Question",
      "name": "How do you design salary data to ensure accuracy and usefulness?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Salary transparency: many job postings include salary ranges to attract candidates and comply with transparency laws (Colorado, New York, California). Data challenges: (1) Normalization: convert all salaries to annual figures (hourly * 2080, monthly * 12, weekly * 52). Store the original value + unit + the normalized annual amount. (2) Currency: store as amount + currency code (ISO 4217). For search/comparison: normalize to a base currency (USD) using a daily exchange rate cache. (3) Equity: separate base_salary_min/max from equity_percentage and equity_cliff/vesting. Total compensation (TC) = base + expected equity. (4) Range validation: if salary_min > salary_max: reject. If salary_min < $1/hour: reject (likely a data entry error). If the salary range is implausibly wide (min=$50k, max=$500k for the same role): flag for review. Market data for job seekers: aggregate salary data from confirmed applications (with user consent). Show \"typical salary for this role in this location\" ranges. Anonymize: show P25, P50, P75 — never expose individual salaries. Source compensation data from anonymous surveys (like levels.fyi for tech)."
      }
    }
  ]
}
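The per-(job_id, status) counter bookkeeping described in the applicant-tracking answer can be illustrated with an in-memory stand-in. A real deployment would issue the paired HINCRBY commands against Redis; this dict-based class (a sketch, not the production data structure) only shows the counter moves and the invariant that a status change decrements one bucket and increments another.

```python
from collections import defaultdict

class PipelineCounts:
    """In-memory stand-in for the Redis hash pipeline:{job_id}.

    Real implementation per status change:
        HINCRBY pipeline:{job_id} {old_status} -1
        HINCRBY pipeline:{job_id} {new_status} 1
    """

    def __init__(self) -> None:
        self._counts: dict = defaultdict(lambda: defaultdict(int))

    def record_application(self, job_id: str) -> None:
        """A new application always enters the pipeline at APPLIED."""
        self._counts[job_id]["APPLIED"] += 1

    def move(self, job_id: str, old_status: str, new_status: str) -> None:
        """Mirror a validated status transition into the cached counts."""
        self._counts[job_id][old_status] -= 1
        self._counts[job_id][new_status] += 1

    def counts(self, job_id: str) -> dict:
        """Counts per stage, as the pipeline view would render them."""
        return dict(self._counts[job_id])
```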