Question 1

What are the four types of DNS servers and what does each do?

Accepted Answer

DNS resolution involves four types of servers:
(1) DNS Resolver (Recursive Resolver): provided by the ISP or a public service (8.8.8.8, 1.1.1.1). It receives queries from clients, does the work of querying the root, TLD, and authoritative servers in turn, and caches the results. The client only talks to this server.
(2) Root Name Server: 13 root server identities (A through M), each served by many anycast instances. Knows where the TLD name servers are (.com, .org, .io). Does not know the IP addresses of individual hosts.
(3) TLD (Top-Level Domain) Name Server: manages a top-level domain (.com, .net, .io). Knows the authoritative name servers for each registered domain under the TLD.
(4) Authoritative Name Server: holds the actual DNS records for a domain (A, CNAME, MX, etc.). Managed by the domain owner or their DNS provider (Cloudflare, Route 53). Returns the final answer.
On a cache miss, the recursive resolver walks this chain from root to TLD to authoritative server; on a cache hit, it answers directly from its cache.
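The walk on a cache miss can be illustrated with a toy in-memory model. This is only a sketch: the tables, server names (tld-com, ns-example), and the RecursiveResolver class are hypothetical, and real resolvers speak the DNS wire protocol and honor TTLs rather than looking things up in dictionaries.

```python
# Toy delegation data. Every name and address here is made up for illustration.
ROOT = {"com": "tld-com"}                               # root: TLD -> TLD server
TLD = {"tld-com": {"example.com": "ns-example"}}        # TLD server: domain -> authoritative server
AUTH = {"ns-example": {"www.example.com": "1.2.3.4"}}   # authoritative server: host -> IP


class RecursiveResolver:
    """Walks root -> TLD -> authoritative on a miss and caches the answer."""

    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0

    def resolve(self, hostname):
        if hostname in self.cache:          # cache hit: no upstream traffic
            return self.cache[hostname]

        labels = hostname.split(".")
        tld = labels[-1]
        domain = ".".join(labels[-2:])

        tld_server = ROOT[tld]              # 1. ask a root server for the TLD servers
        self.upstream_queries += 1
        auth_server = TLD[tld_server][domain]   # 2. ask the TLD server for the domain's name server
        self.upstream_queries += 1
        ip = AUTH[auth_server][hostname]    # 3. ask the authoritative server for the record
        self.upstream_queries += 1

        self.cache[hostname] = ip           # simplification: no TTL on this cache
        return ip


resolver = RecursiveResolver()
first = resolver.resolve("www.example.com")    # miss: three upstream queries
second = resolver.resolve("www.example.com")   # hit: served from cache
```

The second call returns the same address without touching the root, TLD, or authoritative tables, which is the reason the client only ever needs to talk to the resolver.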

Question 2

What is the difference between an A record, CNAME, and ALIAS record?

Accepted Answer

A record: maps a hostname directly to an IPv4 address (AAAA is the IPv6 equivalent). Example: api.example.com → 1.2.3.4. Simple and direct. A name can have multiple A records, which gives basic round-robin load balancing. CNAME (Canonical Name): maps a hostname to another hostname. Example: www.example.com → example.com. The resolver follows the chain until it reaches an A/AAAA record. Restriction: a CNAME cannot coexist with other record types for the same name. Because the zone apex (example.com itself) must carry SOA and NS records, you cannot put a CNAME there, only on subdomains (www.example.com). ALIAS (or ANAME): a non-standard, provider-specific record. Route 53 offers it as alias records, and Cloudflare offers similar behavior as CNAME flattening. It behaves like a CNAME but can exist at the zone apex: the provider resolves the alias target itself and returns the target's IP addresses as ordinary A/AAAA answers, so it is transparent to the resolver. Use case: pointing example.com to an ELB, which only has a hostname, not a fixed IP.
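A small sketch of how CNAME chasing and the coexistence rule work, using an in-memory record table. The RECORDS data and the resolve_a and cname_conflicts helpers are hypothetical names chosen for illustration; they are not part of any DNS library.

```python
# Hypothetical zone data keyed by (name, record type).
RECORDS = {
    ("api.example.com", "A"): ["1.2.3.4", "1.2.3.5"],   # two A records: round-robin
    ("www.example.com", "CNAME"): ["example.com"],
    ("example.com", "A"): ["5.6.7.8"],
}


def resolve_a(name, records=RECORDS, max_hops=8):
    """Follow CNAMEs until an A record set is found; return [] if the name has none."""
    for _ in range(max_hops):
        if (name, "A") in records:
            return records[(name, "A")]
        cname = records.get((name, "CNAME"))
        if cname is None:
            return []
        name = cname[0]                 # a CNAME has exactly one target
    raise RuntimeError("CNAME chain too long or looping")


def cname_conflicts(records):
    """Return names that have a CNAME alongside any other record type."""
    with_cname = {name for (name, rtype) in records if rtype == "CNAME"}
    return sorted({name for (name, rtype) in records
                   if rtype != "CNAME" and name in with_cname})
```

Here resolve_a("www.example.com") follows the CNAME to example.com and returns that name's A records. An ALIAS-style record amounts to the provider running this same lookup on its side and publishing the result as plain A records at the apex.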

Question 3

How does DNS TTL affect failover and system reliability?

Accepted Answer

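TTL (Time-to-Live) specifies how many seconds a DNS record can be cached. Impact on failover: if you change a DNS record (old IP → new IP), clients and resolvers that have the old record cached won't see the change until their cached copy expires. With TTL=3600 (1 hour), failover to a new IP can take up to 1 hour to propagate globally, and longer for resolvers that ignore or clamp TTLs. Lower TTL = faster propagation, but more DNS queries (cost, latency). Best practice: before a planned migration or during incident preparation, lower the TTL to 60s well in advance. The minimum lead time is the old TTL (here 1 hour), so that existing 3600s caches expire; 24 hours is a common safety margin. After the change propagates, restore the TTL to 3600s. Automatic failover with health checks: Route 53 health checks can automatically switch a record from an unhealthy IP to a backup IP. With TTL=60s, clients re-resolve within about 60s of the switch, so total failover time is roughly the health-check detection time plus the TTL. For microservices internal DNS: TTL=30s or lower to enable fast instance rotation. Java applications: the JVM keeps its own DNS cache, controlled by the networkaddress.cache.ttl security property, which sets a fixed cache duration rather than following each record's TTL. With a security manager installed, the default is to cache successful lookups forever; without one, the default is implementation-specific (30 seconds in OpenJDK). Setting networkaddress.cache.ttl=30 explicitly bounds staleness in either case.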
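The effect of TTL on how long a stale address survives can be sketched with a minimal cache whose clock is injectable. TtlCache and the fake clock below are hypothetical illustrations, not the behavior of any particular resolver.

```python
import time


class TtlCache:
    """Caches one IP per name until its TTL expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}              # name -> (ip, expires_at)

    def put(self, name, ip, ttl_seconds):
        self._entries[name] = (ip, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]     # expired: caller must re-resolve
            return None
        return ip


# Simulated time, so the example does not need to wait an hour.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("api.example.com", "1.2.3.4", ttl_seconds=3600)

now[0] = 3599.0
still_old = cache.get("api.example.com")    # old IP is still served
now[0] = 3600.0
after_expiry = cache.get("api.example.com") # entry expired, lookup needed
```

If the record changed at time 0, this client keeps using the old address for the full 3600 seconds. The same code with ttl_seconds=60 would bound that window to one minute, at the price of sixty times as many upstream lookups.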

Question 4

How does GeoDNS enable multi-region routing and CDN acceleration?

Accepted Answer

GeoDNS returns different DNS answers based on the geographic location the DNS query comes from. The authoritative name server estimates the client's approximate location (via the EDNS0 Client Subnet extension, or failing that the recursive resolver's IP) and returns the IP of the nearest datacenter or CDN PoP. Example: US client → 1.2.3.4 (US-East), EU client → 5.6.7.8 (EU-West). Benefits:
(1) Latency reduction: the user connects to the nearest server (for example, 50ms instead of 200ms).
(2) Traffic distribution: organic geo-based load balancing.
(3) Compliance: EU traffic can be routed to EU datacenters to help satisfy GDPR data residency. DNS steering alone is not an enforcement mechanism, since a client behind a resolver in another region may be misrouted.
(4) CDN acceleration: CDNs use GeoDNS-style mapping (Akamai), anycast routing (Cloudflare), or a mix of both to send users to the nearest edge PoP.
Implementation: major DNS providers (Route 53, Cloudflare, NS1) support geographic routing natively. Pair it with health checks: if the nearest region is down, route to the next nearest.
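The selection logic of nearest healthy region can be sketched as follows. The REGIONS table, its coordinates, and the pick_ip helper are assumptions made for illustration; production GeoDNS works from IP geolocation databases and provider-specific routing policies rather than raw coordinates.

```python
import math

# Hypothetical regions; the IPs reuse the example addresses, coordinates are approximate.
REGIONS = {
    "us-east": {"ip": "1.2.3.4", "lat": 39.0, "lon": -77.5},
    "eu-west": {"ip": "5.6.7.8", "lat": 53.3, "lon": -6.3},
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def pick_ip(client_lat, client_lon, healthy, regions=REGIONS):
    """Return the IP of the nearest healthy region, or None if none are healthy."""
    candidates = [name for name in regions if name in healthy]
    if not candidates:
        return None
    nearest = min(
        candidates,
        key=lambda name: haversine_km(
            client_lat, client_lon, regions[name]["lat"], regions[name]["lon"]
        ),
    )
    return regions[nearest]["ip"]
```

A client near Paris (48.9, 2.35) gets the eu-west address while both regions are healthy; passing healthy={"us-east"} models an eu-west outage and the same client falls back to us-east.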

Question 5

How do you design a system to handle DNS failures and cache misses gracefully?

Accepted Answer

DNS failures can cascade: if your application makes DNS lookups on every connection (e.g., re-resolving a microservice hostname on each request), a DNS outage takes the service down. Resilience patterns:
(1) Application-level DNS caching: cache resolved IPs in memory with a TTL (respect the DNS TTL where available, otherwise default to 30-60s). Runtime behavior differs: Go's net package does not cache lookups itself and relies on the OS or a local caching resolver, while Java caches inside the JVM and is tuned via networkaddress.cache.ttl.
(2) Connection pooling: maintain persistent TCP connections (HTTP/2, gRPC), so there is no DNS lookup per request.
(3) Retry with backoff: on DNS resolution failure, retry (for example, 3 times) with exponential backoff before failing.
(4) Fallback to stale cache: if DNS is unreachable, serve the last cached IP. A stale IP is usually better than no IP.
(5) Avoid DNS in hot paths: resolve service IPs at startup and on periodic refresh, not on each request.
(6) Internal service mesh (Envoy, Istio): sidecar proxies route using endpoints pushed from a service registry rather than per-request DNS lookups, which largely removes DNS from the microservice call path.
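Patterns (1), (3), and (4) can be combined in one small wrapper. This is a sketch under stated assumptions: ResilientResolver is a hypothetical class, it uses a fixed local TTL because socket.gethostbyname does not expose the record's TTL, and the retry count and delays are illustrative defaults.

```python
import socket
import time


class ResilientResolver:
    """In-memory DNS cache with retry, exponential backoff, and stale fallback."""

    def __init__(self, lookup=socket.gethostbyname, ttl=30.0, retries=3,
                 base_delay=0.1, clock=time.monotonic, sleep=time.sleep):
        self._lookup = lookup
        self._ttl = ttl
        self._retries = retries
        self._base_delay = base_delay
        self._clock = clock
        self._sleep = sleep
        self._cache = {}                # hostname -> (ip, fetched_at)

    def resolve(self, hostname):
        cached = self._cache.get(hostname)
        if cached is not None and self._clock() - cached[1] < self._ttl:
            return cached[0]            # fresh cache hit: no DNS traffic

        for attempt in range(self._retries):
            try:
                ip = self._lookup(hostname)
            except OSError:             # socket.gaierror is a subclass of OSError
                if attempt < self._retries - 1:
                    self._sleep(self._base_delay * 2 ** attempt)
                continue
            self._cache[hostname] = (ip, self._clock())
            return ip

        if cached is not None:
            return cached[0]            # DNS unreachable: serve the stale IP
        raise OSError(f"DNS resolution failed for {hostname}")
```

A caller would create one ResilientResolver per process and call resolve(hostname) wherever it previously resolved the name directly. Injecting lookup, clock, and sleep keeps the class testable without real DNS or real waiting.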

DNS Resolver and DNS System Design

What Does a DNS Resolver Do?

DNS Resolution Hierarchy

DNS Record Types

TTL and Caching Strategy

DNS-Based Load Balancing

GeoDNS

DNS Caching Design (Recursive Resolver)

Key Design Decisions