Question 1

What is SSRF and how does it specifically threaten a link preview service?

Accepted Answer

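Server-Side Request Forgery (SSRF) occurs when an attacker supplies a crafted URL that causes your server to make requests to targets it was never meant to reach. A link preview service is a natural SSRF target because its whole job is to fetch arbitrary user-supplied URLs. Typical attacks: (1) http://169.254.169.254/latest/meta-data/iam/security-credentials/ (the AWS instance metadata endpoint, which can return temporary IAM credentials and give the attacker whatever cloud access the instance role has); (2) http://redis:6379/ (an internal Redis instance that answers raw HTTP with error messages, confirming the service exists and revealing topology); (3) http://10.0.0.1/admin (an internal admin panel that is unreachable from the internet but reachable from your server); (4) file:///etc/passwd (the file:// scheme reads local files if the HTTP client supports it). Defense: allow only http and https schemes, resolve the URL's hostname to IP addresses before fetching, and reject the request if any address falls in a private or special-purpose range (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, 169.254.0.0/16, plus IPv6 equivalents such as ::1, fc00::/7, and fe80::/10). Connect to the address you validated rather than resolving the hostname a second time; otherwise a DNS rebinding attack can return a public IP for the check and a private IP for the fetch. Repeat the check after every redirect, because a redirect from a public URL to a private IP is a common bypass.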
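The validation step can be sketched with the Python standard library alone. This is a minimal sketch, not a complete defense: the names is_ip_allowed, is_url_allowed, and the injectable resolver parameter are illustrative assumptions, and the extra IPv6 and special-purpose checks go beyond the five ranges listed in the answer.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# The private ranges named in the answer above.
BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "127.0.0.0/8", "169.254.0.0/16",
)]


def resolve_host(host):
    """Return every IP address the hostname resolves to."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_ip_allowed(ip_text):
    """True only for addresses outside private and special-purpose space."""
    ip = ipaddress.ip_address(ip_text.split("%", 1)[0])  # drop IPv6 scope id
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # treat ::ffff:10.0.0.1 as 10.0.0.1
    if any(ip in net for net in BLOCKED_NETWORKS):
        return False
    # Assumption: also reject IPv6 and other special ranges via ipaddress flags.
    return not (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def is_url_allowed(url, resolver=resolve_host):
    """Scheme check plus IP check on every address the host resolves to."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = resolver(parts.hostname)
    except (socket.gaierror, UnicodeError):
        return False
    return bool(addresses) and all(is_ip_allowed(a) for a in addresses)
```

To cover redirects, one option is to disable automatic redirect following in the HTTP client and call is_url_allowed on each Location target before following it. This sketch does not pin the connection to the validated IP, so on its own it does not close the DNS rebinding gap.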

Question 2

How does URL normalization increase cache hit rate?

Accepted Answer

The same article is often shared under different URLs: https://Example.com/post?id=1&utm_source=twitter vs https://example.com/post?id=1&utm_medium=email. Without normalization, these are two separate cache keys, so each one misses and triggers its own outbound HTTP fetch. Normalized, both reduce to https://example.com/post?id=1, the same cache key. Normalization steps: lowercase the scheme and hostname, remove the fragment (#section is never sent to the server), strip known tracking parameters (utm_*, fbclid, gclid), sort the remaining query parameters alphabetically, and treat paths that differ only by a trailing slash as the same path. Strip only parameters known to be tracking-only; removing a parameter the origin actually uses would merge distinct pages under one key. The gain depends on traffic mix, but it can be large for links shared via social media and email campaigns, which frequently append utm_* parameters. Hash the normalized URL with SHA-256 to get a fixed-length primary key.
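The normalization steps can be sketched with urllib.parse and hashlib. The function names normalize_url and cache_key are illustrative, and two choices here are assumptions rather than part of the answer: default ports are dropped, and a trailing slash is always removed (which assumes the origin serves /post and /post/ identically).

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid", "gclid"}
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Reduce equivalent URLs to one canonical string."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # hostname is already bracket-free
    if ":" in host:
        host = f"[{host}]"  # restore brackets for IPv6 literals
    # Assumption: keep the port only when it is not the scheme default.
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    # Drop tracking parameters, then sort what remains.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_PARAMS
    ]
    pairs.sort()
    # Assumption: /post and /post/ name the same resource.
    path = parts.path.rstrip("/") or "/"
    # Passing "" as the last element drops the fragment.
    return urlunsplit((scheme, host, path, urlencode(pairs), ""))


def cache_key(url):
    """Fixed-length key: SHA-256 hex digest of the normalized URL."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()
```

With this sketch, the two example URLs from the answer produce the same normalized string and therefore the same 64-character key.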

Question 3

How do you handle slow or hanging origin servers?

Accepted Answer

An origin server that accepts the TCP connection but never sends data will hold a worker for as long as the timeout allows, and indefinitely if no timeout is set (the requests library applies none by default). Mitigations: (1) Set a connect timeout (5 seconds) and a read timeout (5 seconds) separately. With the requests library: requests.get(url, timeout=(5, 5)). The read timeout limits the wait between bytes, not the whole download, so also enforce an overall deadline against servers that trickle data slowly. (2) Use streaming mode (stream=True) and read only the first 1 MiB; the metadata needed for a preview sits near the top of the HTML, and large pages and binary files are not useful for a preview. (3) Run fetches in a worker pool so that if one fetch hangs, other workers continue. (4) Use a circuit breaker per domain: if example.com fails 5 times in 60 seconds, stop fetching from it for 10 minutes. Store blocked domains in Redis with a TTL: SET blocked:domain:example.com 1 EX 600. (5) Cap the number of redirects (3) to prevent redirect loops.
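Mitigations (2) and (4) can be sketched in pure Python. This is an illustrative sketch under stated assumptions: read_capped and DomainCircuitBreaker are hypothetical names, the breaker keeps its state in process memory as a stand-in for the Redis key described above, and the 10-second overall deadline is an assumed value.

```python
import time
from collections import defaultdict, deque

MAX_BODY_BYTES = 1024 * 1024  # 1 MiB cap from mitigation (2)


def read_capped(chunks, max_bytes=MAX_BODY_BYTES, total_timeout=10.0,
                clock=time.monotonic):
    """Collect chunks until the size cap or the overall deadline is hit."""
    deadline = clock() + total_timeout
    body = bytearray()
    for chunk in chunks:
        body.extend(chunk)
        if len(body) >= max_bytes or clock() > deadline:
            break
    return bytes(body[:max_bytes])


class DomainCircuitBreaker:
    """Block a domain after max_failures within window_seconds."""

    def __init__(self, max_failures=5, window_seconds=60, block_seconds=600,
                 clock=time.monotonic):
        self._max_failures = max_failures
        self._window = window_seconds
        self._block = block_seconds
        self._clock = clock
        self._failures = defaultdict(deque)  # domain -> failure timestamps
        self._blocked_until = {}             # domain -> unblock time

    def is_blocked(self, domain):
        until = self._blocked_until.get(domain)
        if until is None:
            return False
        if self._clock() >= until:
            del self._blocked_until[domain]  # block expired
            return False
        return True

    def record_failure(self, domain):
        now = self._clock()
        failures = self._failures[domain]
        failures.append(now)
        # Forget failures that fell out of the sliding window.
        while failures and now - failures[0] > self._window:
            failures.popleft()
        if len(failures) >= self._max_failures:
            self._blocked_until[domain] = now + self._block
            failures.clear()

    def record_success(self, domain):
        self._failures.pop(domain, None)
```

With the requests library, the chunks argument could be response.iter_content(chunk_size=8192) from a request made with stream=True and timeout=(5, 5). Note that the deadline is only checked when a chunk arrives, so the per-read timeout is still needed to bound the wait for each chunk.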

Question 4

What is the async fetch pattern and why does it improve user experience?

Accepted Answer

Synchronous fetch: user sends message with URL → server fetches URL (up to the full fetch timeout, several seconds) → message delivered. The user waits that long before their message appears. Asynchronous fetch: user sends message with URL → message delivered immediately with preview status='pending' → background job fetches URL → server pushes preview data via WebSocket when ready. The user sees their message instantly; the preview populates within 1–3 seconds on fast origins without blocking the send flow. If the fetch fails or times out, push a failed status so the client can drop the placeholder. Prefetch optimization: trigger the fetch when the user pastes the URL in the composer (before send). By the time they click Send, the preview is often already cached, which hides the latency entirely at a typical typing cadence. Store previews keyed by normalized URL hash so that any future message with the same URL gets an instant cached preview.
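The asynchronous flow can be sketched with asyncio. Everything here is illustrative: send_message and fetch_and_push are hypothetical names, fetch_preview and push_update are caller-supplied coroutines standing in for the real fetcher and the WebSocket push, and a plain dict stands in for the preview cache.

```python
import asyncio


async def fetch_and_push(message, fetch_preview, push_update, cache):
    """Background job: fetch the preview, then push the result to clients."""
    try:
        preview = await fetch_preview(message["url"])
    except Exception:
        message["preview_status"] = "failed"
    else:
        cache[message["url"]] = preview
        message["preview"] = preview
        message["preview_status"] = "ready"
    await push_update(message)  # e.g. a WebSocket broadcast


def send_message(text, url, fetch_preview, push_update, cache):
    """Return the message at once; never wait on the origin server.

    Must be called from inside a running event loop. Returns the message
    and the background task (None when the preview was already cached).
    """
    message = {"text": text, "url": url,
               "preview": None, "preview_status": "pending"}
    cached = cache.get(url)
    if cached is not None:
        message.update(preview=cached, preview_status="ready")
        return message, None
    task = asyncio.create_task(
        fetch_and_push(message, fetch_preview, push_update, cache))
    return message, task
```

The prefetch optimization fits the same shape: calling the fetch when the URL is pasted fills the cache, so the later send_message call takes the cached branch and returns a ready preview.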

Question 5

Why proxy og:image URLs instead of embedding third-party image URLs directly?

Accepted Answer

Embedding a third-party image URL directly creates four problems: (1) Mixed content: if the og:image is http:// and your page is https://, browsers either show a mixed-content warning or, in current Chrome, upgrade the request to https:// and drop the image when the origin does not serve it over HTTPS. (2) Privacy: the third-party server logs your users' IP addresses when their browsers load the image. (3) Broken images: if the origin deletes or moves the image (common), your preview shows a broken image icon indefinitely. (4) Tracking pixels: the image URL can carry tracking parameters, so every view of the preview reports back to the third party. Proxy solution: fetch the image from your server during preview generation, store it in S3 or your CDN, and embed your CDN URL in the preview. Now the image is always served over https://, your users' IPs are not exposed, the image persists in your cache even if the origin deletes it, and the third party sees a single request from your server instead of one per viewer. Apply the same SSRF validation to og:image URLs, since they are attacker-controlled too.
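The proxy step can be sketched as a small function. This is a sketch under assumptions: proxy_og_image is a hypothetical name, fetch_image and store are caller-supplied callables (the fetcher is expected to apply the SSRF checks, and the store would wrap S3 or similar), https://cdn.example.com is a placeholder base URL, and restricting to a short allow-list of image types is an added precaution not stated in the answer.

```python
import hashlib

# Assumption: only these content types are accepted for previews.
ALLOWED_IMAGE_TYPES = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def proxy_og_image(image_url, fetch_image, store,
                   cdn_base="https://cdn.example.com"):
    """Copy a third-party image into our storage and return our own URL.

    fetch_image(url) -> (content_type, body_bytes)
    store(key, body_bytes, content_type) -> None
    Returns None when the response is not an allowed image type.
    """
    content_type, body = fetch_image(image_url)
    media_type = content_type.split(";", 1)[0].strip().lower()
    extension = ALLOWED_IMAGE_TYPES.get(media_type)
    if extension is None:
        return None
    # Key by a hash of the source URL so the same image is stored once
    # and the original URL (with any tracking parameters) is not exposed.
    digest = hashlib.sha256(image_url.encode("utf-8")).hexdigest()
    key = f"previews/{digest}{extension}"
    store(key, body, media_type)
    return f"{cdn_base}/{key}"
```

The returned URL is what goes into the stored preview in place of the original og:image value; when the function returns None, the preview can simply be rendered without an image.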

Link Preview Service Low-Level Design: SSRF Prevention, Open Graph Parsing, and Caching

Core Data Model

SSRF Prevention and URL Validation

Fetching and Parsing Metadata

Key Interview Points