Core Entities
Company: company_id, name, description, logo_url, website, size, industry, verified (boolean).
Job: job_id, company_id, title, description, location, is_remote, employment_type (FULL_TIME, PART_TIME, CONTRACT, INTERNSHIP), experience_level (ENTRY, MID, SENIOR, STAFF), salary_min, salary_max, currency, status (DRAFT, ACTIVE, PAUSED, CLOSED), posted_at, expires_at, applicant_count.
Application: application_id, job_id, applicant_id, status (APPLIED, REVIEWING, PHONE_SCREEN, INTERVIEW, OFFER, REJECTED, WITHDRAWN), applied_at, resume_url, cover_letter, recruiter_notes.
SavedJob: user_id, job_id, saved_at.
Alert: alert_id, user_id, query (JSON: title, location, salary_min), frequency (DAILY, WEEKLY), last_sent_at.
JobView: view_id, job_id, user_id (nullable), session_id, viewed_at, source (SEARCH, DIRECT, EMAIL, RECOMMENDATION).
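As a rough sketch, two of these entities might be modeled as follows. Field names and enum values come from the list above; the types, defaults, and the dataclass representation itself are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class EmploymentType(Enum):
    FULL_TIME = "FULL_TIME"
    PART_TIME = "PART_TIME"
    CONTRACT = "CONTRACT"
    INTERNSHIP = "INTERNSHIP"

class JobStatus(Enum):
    DRAFT = "DRAFT"
    ACTIVE = "ACTIVE"
    PAUSED = "PAUSED"
    CLOSED = "CLOSED"

@dataclass
class Job:
    job_id: str
    company_id: str
    title: str
    description: str
    location: str
    is_remote: bool
    employment_type: EmploymentType
    experience_level: str            # ENTRY | MID | SENIOR | STAFF
    salary_min: Optional[int]
    salary_max: Optional[int]
    currency: str                    # ISO 4217 code, e.g. "USD"
    status: JobStatus = JobStatus.DRAFT  # assumed default: jobs start as drafts
    posted_at: Optional[datetime] = None
    expires_at: Optional[datetime] = None
    applicant_count: int = 0
```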
Job Search
Job search has two requirements: fast full-text search (job title, description, company name) and structured filters (location, salary range, experience level, employment type). Elasticsearch is the standard choice: index job documents on post/update, query with a combination of full-text and filter clauses.
GET /jobs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "senior python engineer",
            "fields": ["title^3", "description", "company_name"]
          }
        }
      ],
      "filter": [
        {"term": {"status": "ACTIVE"}},
        {"term": {"is_remote": true}},
        {"range": {"salary_min": {"gte": 100000}}},
        {"term": {"experience_level": "SENIOR"}}
      ]
    }
  },
  "sort": [{"_score": "desc"}, {"posted_at": "desc"}]
}
The title^3 boost makes title matches 3x more valuable than description matches. Geo search: for location-based search, store lat/lng and use Elasticsearch geo_distance filter. Salary normalization: normalize all salaries to annual (hourly * 2080, monthly * 12) for range filtering. Index freshness: sync database changes to Elasticsearch via Debezium CDC with < 5 second lag.
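The annualization rule can be captured in a small helper. The function and unit names below are illustrative, not part of the schema above.

```python
# Multipliers to convert a salary figure to an annual amount.
# 2080 = 40 hours/week * 52 weeks. Units outside this table are rejected.
ANNUALIZATION = {
    "HOURLY": 2080,
    "WEEKLY": 52,
    "MONTHLY": 12,
    "ANNUAL": 1,
}

def normalize_to_annual(amount: float, unit: str) -> float:
    """Convert a salary to an annual figure so range filters compare like with like."""
    try:
        return amount * ANNUALIZATION[unit]
    except KeyError:
        raise ValueError(f"unknown salary unit: {unit!r}")
```

Storing both the original (value, unit) pair and the normalized annual figure keeps display faithful to what the poster entered while making filtering uniform.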
Application State Machine
Application lifecycle: APPLIED → REVIEWING (recruiter has seen it) → PHONE_SCREEN → INTERVIEW → OFFER → (hired, outside system) or REJECTED at any stage. WITHDRAWN: applicant withdraws. State transitions are enforced in ApplicationService. Each transition is logged in ApplicationHistory for the recruiter’s timeline view. Automated transitions: when an applicant applies, status=APPLIED. When a recruiter opens the application: status=REVIEWING (first open triggers the transition, idempotent). Rejection emails: on REJECTED transition, trigger an email notification to the applicant (via async job). Template: personalized with the job title and company name. Option to suppress (some companies prefer not to send rejection emails). Bulk operations: recruiters often need to move 50 applications from PHONE_SCREEN to INTERVIEW after a batch of calls. Support batch state transitions with confirmation dialog showing count.
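A minimal sketch of the transition guard the ApplicationService would enforce. The exact allowed-transition set is an assumption drawn from the lifecycle above; whether stages can be skipped (e.g. REVIEWING straight to INTERVIEW) is a product decision.

```python
# Allowed status transitions (illustrative). REJECTED is reachable from any
# active stage; WITHDRAWN from any non-terminal stage; "hired" after OFFER
# happens outside the system, so OFFER has no success transition here.
TRANSITIONS = {
    "APPLIED":      {"REVIEWING", "REJECTED", "WITHDRAWN"},
    "REVIEWING":    {"PHONE_SCREEN", "REJECTED", "WITHDRAWN"},
    "PHONE_SCREEN": {"INTERVIEW", "REJECTED", "WITHDRAWN"},
    "INTERVIEW":    {"OFFER", "REJECTED", "WITHDRAWN"},
    "OFFER":        {"REJECTED", "WITHDRAWN"},
    "REJECTED":     set(),
    "WITHDRAWN":    set(),
}

def validate_transition(current: str, new: str) -> None:
    """Raise ValueError if the requested status change is not allowed."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
```

The same guard runs once per application in a bulk update, so a batch of 50 moves either records each success or reports each per-item failure.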
Job Alerts and Recommendations
Job alerts: user saves a search query (title keywords, location, salary, employment type). Stored as a JSON blob in the Alert table. A daily batch job: for each active alert, run the saved query against Elasticsearch filtered to jobs posted since last_sent_at. If new matching jobs: compose a digest email, send via SES/SendGrid, update last_sent_at. Alert matching is a fan-in query: for 1M active alerts, running each as a separate Elasticsearch query is too slow. Optimization: reverse-index alerts by key attributes. When a new job is posted, find matching alerts (query alerts by category/location/salary range). Notify only matching alert owners. This push model scales better than pulling all alerts daily. Job recommendations: collaborative filtering (users who viewed job X also viewed Y), content-based filtering (user’s past applications are similar to job Y), and skills matching (extract skills from resume, match to job requirements). Display as a “Recommended for you” section on the homepage.
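The push model can be sketched roughly as below. The alert-query shape matches the JSON blob described above, but the naive keyword containment is an assumption; production matching would go through the search index's analyzers rather than string operations.

```python
def job_matches_alert(job: dict, alert_query: dict) -> bool:
    """Check whether a newly posted job satisfies a saved alert query.

    alert_query is the JSON blob from the Alert table, e.g.
    {"title": "python engineer", "location": "Berlin", "salary_min": 90000}.
    """
    title_kw = alert_query.get("title")
    if title_kw and not all(
        word in job["title"].lower() for word in title_kw.lower().split()
    ):
        return False
    location = alert_query.get("location")
    if location and not job["is_remote"] and job["location"] != location:
        return False
    salary_min = alert_query.get("salary_min")
    if salary_min and (job.get("salary_max") or 0) < salary_min:
        return False
    return True

def alerts_to_notify(job: dict, candidate_alerts: list[dict]) -> list[str]:
    """Fan out: return alert_ids whose saved query matches this job.

    candidate_alerts would come from a pre-filtered index lookup
    (by location/category/salary bucket), not a scan of all 1M alerts.
    """
    return [a["alert_id"] for a in candidate_alerts
            if job_matches_alert(job, a["query"])]
```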
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do you design job alert matching to scale with millions of active alerts?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Naive approach: when a new job is posted, run a search query for each of 1M active alerts against Elasticsearch — 1M queries per job post is prohibitive. Two better approaches: (1) Reverse fan-out at post time: when a job is posted, extract its key attributes (title keywords, location, salary range, employment type, industry). Query alerts that match these attributes: SELECT alert_id FROM alerts WHERE … (matching on stored JSON attributes). Notify only those alert owners. This inverts the query: instead of \"does this job match alert X?\", ask \"which alerts match this job?\" This is still O(matching alerts) per job but avoids scanning all 1M alerts. Optimize with a pre-indexed attributes column or a secondary table. (2) Batch daily digests: run alerts as a daily batch job during low-traffic hours. For each alert: run the saved Elasticsearch query filtered to jobs posted since last_sent_at. Aggregate into a digest email. More efficient than real-time but less timely. Combine both: real-time alerts for premium users (reverse fan-out), daily digests for free users (batch). This segments complexity by user tier."
      }
    },
    {
      "@type": "Question",
      "name": "How do you prevent job posting spam and fake companies?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Job board integrity requires preventing fraudulent postings that deceive job seekers. Verification layers: (1) Email domain verification: the company must have a corporate email domain (not gmail.com or yahoo.com). Verify via DKIM/SPF record lookup or by requiring email confirmation from a corporate domain. (2) Manual verification for new companies: first-time companies go through a manual review queue. Check: does the company website exist? Does the about page list the company? Are there employee profiles on LinkedIn? Flag suspiciously new domains. (3) Credit card requirement: requiring a payment method (even for free postings) adds friction for spam accounts. Free postings can be offered but require a verified payment method on file. (4) ML classifier: train a classifier on job description text to detect spam patterns (excessive salary promises, vague \"work from home\" schemes, MLM structures, crypto/investment scams). (5) Rate limiting: new accounts can post at most N jobs per week. Flag accounts posting many jobs with low views or high \"report\" rates. (6) Community reporting: job seekers can report fraudulent postings. Reports above a threshold trigger manual review."
      }
    },
    {
      "@type": "Question",
      "name": "How do you implement applicant tracking to give recruiters a clear pipeline view?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Recruiter pipeline view: all applications for a job, organized by stage. Each stage shows a count and a list of applicants. Requirements: real-time counts, drag-to-move between stages, bulk operations. Schema: applications table with (job_id, status, …). Status = current pipeline stage. Pipeline view query: SELECT status, COUNT(*), array_agg(application_id ORDER BY applied_at DESC) FROM applications WHERE job_id = :id GROUP BY status. Cache the counts per (job_id, status) in Redis for fast page loads. Update Redis on each status change: HINCRBY pipeline:{job_id} {old_status} -1; HINCRBY pipeline:{job_id} {new_status} 1. Drag-to-move: the client sends PATCH /applications/:id {status: \"INTERVIEW\"}. The server validates the transition, updates the database, updates the Redis counters, and logs the transition in application_history. Bulk operations: POST /applications/bulk-update {application_ids: […], new_status: \"REJECTED\"}. Process each update in a loop with the same validation and logging. Cap bulk updates at 100 per request to prevent timeouts. Return a summary: {succeeded: 95, failed: 5, errors: […]}."
      }
    },
    {
      "@type": "Question",
      "name": "How do you handle resume storage and parsing for job applications?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Resume upload: accept PDF, DOCX, and DOC formats (max 5 MB). Upload directly to S3 via a pre-signed URL (the client uploads to S3 directly, not through your server — this avoids server bandwidth cost). Store the S3 key and metadata (filename, size, upload_time) in the Resume table. Pre-signed URL flow: the client requests a pre-signed URL from your server. The server generates one, e.g. with boto3: s3_client.generate_presigned_url(\"put_object\", Params={\"Bucket\": bucket, \"Key\": key}, ExpiresIn=300). The client uploads the file directly to the S3 URL, then notifies your server that the upload is complete (PUT /resumes/:id/confirm). The server moves the file from the temp location to the permanent bucket (or just marks it as confirmed). Resume parsing: use a parsing service (Textract for PDF, python-docx for DOCX) to extract structured data: contact info, work experience, education, skills. Store the parsed JSON alongside the original file. Index skills and titles in Elasticsearch for skills-based matching. Privacy: apply data retention policies — delete resumes X months after an application closes (configurable per company and per legal jurisdiction). Anonymize during blind review (hide name, photo, and address to reduce bias)."
      }
    },
    {
      "@type": "Question",
      "name": "How do you design salary data to ensure accuracy and usefulness?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Salary transparency: many job postings include salary ranges to attract candidates and comply with transparency laws (Colorado, New York, California). Data challenges: (1) Normalization: convert all salaries to annual figures (hourly * 2080, monthly * 12, weekly * 52). Store the original value + unit + the normalized annual amount. (2) Currency: store as amount + currency code (ISO 4217). For search/comparison: normalize to a base currency (USD) using a daily exchange rate cache. (3) Equity: separate base_salary_min/max from equity_percentage and equity_cliff/vesting. Total compensation (TC) = base + expected equity. (4) Range validation: if salary_min > salary_max: reject. If salary_min < $1/hour: reject (likely a data entry error). If the salary range is implausibly wide (min=$50k, max=$500k for the same role): flag for review. Market data for job seekers: aggregate salary data from confirmed applications (with user consent). Show \"typical salary for this role in this location\" ranges. Anonymize: show P25, P50, P75 — never expose individual salaries. Source compensation data from anonymous surveys (like levels.fyi for tech)."
      }
    }
  ]
}
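The per-(job_id, status) counter bookkeeping described in the applicant-tracking answer can be illustrated with an in-memory stand-in. A real deployment would issue the paired HINCRBY commands against Redis; this dict-based class (a sketch, not the production data structure) only shows the counter moves and the invariant that a status change decrements one bucket and increments another.

```python
from collections import defaultdict

class PipelineCounts:
    """In-memory stand-in for the Redis hash pipeline:{job_id}.

    Real implementation per status change:
        HINCRBY pipeline:{job_id} {old_status} -1
        HINCRBY pipeline:{job_id} {new_status} 1
    """

    def __init__(self) -> None:
        self._counts: dict = defaultdict(lambda: defaultdict(int))

    def record_application(self, job_id: str) -> None:
        """A new application always enters the pipeline at APPLIED."""
        self._counts[job_id]["APPLIED"] += 1

    def move(self, job_id: str, old_status: str, new_status: str) -> None:
        """Mirror a validated status transition into the cached counts."""
        self._counts[job_id][old_status] -= 1
        self._counts[job_id][new_status] += 1

    def counts(self, job_id: str) -> dict:
        """Counts per stage, as the pipeline view would render them."""
        return dict(self._counts[job_id])
```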