System Design: DNS Architecture — Recursive Resolver, Authoritative Nameserver, Anycast, DNSSEC, DoH, Failover

DNS (Domain Name System) is the internet's phone book: it translates human-readable domain names into IP addresses. Nearly every web request begins with a DNS lookup or a cached answer from an earlier one. Understanding DNS architecture is essential for system design (GeoDNS for multi-region routing, DNS failover for high availability) and for troubleshooting production issues (DNS resolution failures are a common cause of outages). This guide covers DNS internals, from the resolution process to production patterns.

DNS Resolution Process

When a user types www.techinterview.org in their browser, the lookup proceeds through these steps:
(1) Browser cache: check whether the domain was recently resolved. If it is cached and the TTL has not expired, use the cached IP.
(2) OS cache: the operating system maintains its own DNS cache. On Linux: systemd-resolved or nscd. On macOS: mDNSResponder.
(3) Recursive resolver: if the name is not cached locally, the OS sends the query to a recursive resolver (typically the ISP's DNS server, or a public resolver such as 8.8.8.8 or 1.1.1.1). The recursive resolver does the heavy lifting in steps 4 to 6.
(4) Root nameserver: the resolver asks a root nameserver (13 named root servers, a through m, run by operators such as ICANN and Verisign) and receives a referral to the .org TLD nameservers.
(5) TLD nameserver: the .org TLD nameserver returns a referral to the authoritative nameserver for techinterview.org (e.g., ns1.digitalocean.com).
(6) Authoritative nameserver: the authoritative nameserver for techinterview.org returns the IP address (e.g., 67.207.83.166).
(7) The recursive resolver caches the result (respecting its TTL) and returns the IP to the client.
Total time for a cold lookup is typically 50-200 ms, because it takes several network round-trips. A cached answer is near-instant.
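The walk from root to TLD to authoritative server, plus TTL-based caching, can be sketched with a toy in-memory model. Everything here is an illustrative assumption: the dictionaries stand in for real nameservers, the name `org-tld-server` and the 300-second TTL are made up, and the zone is taken to be the last two labels of the name, which real resolvers do not assume.

```python
from dataclasses import dataclass, field

# Hypothetical zone data mirroring the example in the text (not real DNS answers).
ROOT_ZONE = {"org.": "org-tld-server"}
TLD_ZONES = {"org-tld-server": {"techinterview.org.": "ns1.digitalocean.com."}}
AUTH_ZONES = {
    "ns1.digitalocean.com.": {"www.techinterview.org.": ("67.207.83.166", 300)}
}


@dataclass
class RecursiveResolver:
    cache: dict = field(default_factory=dict)  # name -> (ip, expires_at)
    upstream_queries: int = 0                  # queries sent beyond the cache

    def resolve(self, name: str, now: float) -> str:
        cached = self.cache.get(name)
        if cached and cached[1] > now:
            return cached[0]  # cache hit: no upstream traffic

        labels = name.rstrip(".").split(".")
        tld = labels[-1] + "."
        zone = ".".join(labels[-2:]) + "."  # simplification: zone = last two labels

        tld_server = ROOT_ZONE[tld]                 # step 4: root referral
        self.upstream_queries += 1
        auth_server = TLD_ZONES[tld_server][zone]   # step 5: TLD referral
        self.upstream_queries += 1
        ip, ttl = AUTH_ZONES[auth_server][name]     # step 6: authoritative answer
        self.upstream_queries += 1

        self.cache[name] = (ip, now + ttl)          # step 7: cache until TTL expires
        return ip


resolver = RecursiveResolver()
print(resolver.resolve("www.techinterview.org.", now=0.0))
print(resolver.resolve("www.techinterview.org.", now=100.0))
print("upstream queries so far:", resolver.upstream_queries)
```

Passing `now` explicitly instead of reading the clock keeps the sketch deterministic: the second call is served from cache, and a call made after the TTL has elapsed repeats all three upstream queries.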

DNS Record Types

Essential record types:
(1) A record: maps a domain to an IPv4 address. techinterview.org -> 67.207.83.166.
(2) AAAA record: maps a domain to an IPv6 address.
(3) CNAME record: an alias to another domain. www.techinterview.org CNAME techinterview.org. The resolver follows the chain to the final A record. A CNAME cannot be placed at the zone apex (bare domain), because a CNAME cannot coexist with other records at the same name and the apex always carries SOA and NS records.
(4) MX record: the mail server for the domain, given as a priority and a hostname. Lower priority number = higher preference.
(5) TXT record: arbitrary text. Used for SPF (email sender verification), DKIM (email signing), and domain ownership verification (Google Search Console, TLS certificate issuance).
(6) NS record: names the authoritative nameservers for a zone. In a parent zone, NS records delegate a subdomain to those nameservers.
(7) SRV record: specifies host, port, priority, and weight for a service. Used by some protocols (SIP, XMPP) and by Consul for service discovery.
(8) CAA record: specifies which Certificate Authorities may issue TLS certificates for the domain, which helps prevent unauthorized certificate issuance.

TTL (Time To Live) is how long resolvers may cache a record. A short TTL (60s) gives fast failover at the cost of more DNS queries. A long TTL (3600s) means fewer queries but slower propagation of changes. Most production sites use 300-3600 seconds.
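Two of the rules above, following a CNAME chain to its final A record and preferring the MX record with the lowest priority number, can be illustrated with a small lookup table. This is a minimal sketch: the `RECORDS` table is a stand-in for a zone, the mail hostnames under `example.net` are hypothetical, and the `max_hops` limit is an assumed safeguard against alias loops.

```python
# Hypothetical record table keyed by (name, type); values are lists of record data.
RECORDS = {
    ("techinterview.org.", "A"): ["67.207.83.166"],
    ("www.techinterview.org.", "CNAME"): ["techinterview.org."],
    ("techinterview.org.", "MX"): [
        (20, "mail2.example.net."),  # hypothetical backup mail host
        (10, "mail1.example.net."),  # hypothetical primary mail host
    ],
}


def resolve_a(name: str, max_hops: int = 8) -> list:
    """Return the A records for a name, following CNAME aliases if needed."""
    for _ in range(max_hops):
        addresses = RECORDS.get((name, "A"))
        if addresses:
            return addresses
        alias = RECORDS.get((name, "CNAME"))
        if not alias:
            return []          # no address and no alias: nothing to return
        name = alias[0]        # follow the alias and try again
    raise RuntimeError("CNAME chain too long or looping")


def mx_in_preference_order(domain: str) -> list:
    """Return mail hosts sorted so the lowest priority number comes first."""
    return [host for _, host in sorted(RECORDS.get((domain, "MX"), []))]


print(resolve_a("www.techinterview.org."))
print(mx_in_preference_order("techinterview.org."))
```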

Anycast DNS and High Availability

Anycast: multiple DNS servers share the same IP address, and BGP routing directs each query to the nearest server in terms of network topology. If a server goes down, BGP reconverges and traffic routes to the next nearest server, which gives automatic failover. All major DNS providers (Cloudflare, Route 53, Google Cloud DNS) use Anycast. The 13 root server addresses are each served from many physical sites via Anycast, well over a thousand instances in total.

DNS failover: health-check the primary server and, if it fails, update the DNS record to point to the backup. Route 53 health checks probe the endpoint every 10 or 30 seconds. If 3 consecutive checks fail (the default threshold), Route 53 stops returning the unhealthy IP. Failover time = detection time + time for cached records to expire (up to one TTL). With a 30-second check interval and a threshold of 3, detection takes roughly 60-90 seconds. Adding up to 60 seconds for a 60-second TTL gives a worst case of about 150 seconds.

Round-robin DNS: return multiple A records. Clients typically try the first address and may fall back to the next one if it is down. This is simple but unreliable: some clients always use the first record, and clients do not health-check the addresses.

Weighted routing: return different IPs with different weights. For example, send 90% of traffic to the primary and 10% to the canary, then adjust the weights for a gradual migration.
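The failover arithmetic and the weighted split can both be worked through in a few lines. This is a rough sketch under stated assumptions: `failover_window` models a single health checker with instant probes (real services use many checkers and probe timeouts), and `pick_target` imitates a weighted answer with a seeded random generator rather than any provider's actual algorithm. Both function names are made up for illustration.

```python
import random


def failover_window(check_interval_s: int, failure_threshold: int, ttl_s: int):
    """Return (best_case_s, worst_case_s) until clients stop using a dead IP.

    Best case: the endpoint dies just before a check and the client's cached
    record expires immediately. Worst case: it dies just after a check and the
    client cached the record right before it was withdrawn.
    """
    best = (failure_threshold - 1) * check_interval_s
    worst = failure_threshold * check_interval_s + ttl_s
    return best, worst


def pick_target(weights: dict, rng: random.Random) -> str:
    """Pick one target with probability proportional to its weight."""
    targets = list(weights)
    return rng.choices(targets, weights=[weights[t] for t in targets], k=1)[0]


# 30-second checks, 3 consecutive failures, 60-second TTL (values from the text).
print(failover_window(30, 3, 60))

# 90/10 split between a primary and a canary.
rng = random.Random(0)  # seeded so the run is repeatable
weights = {"primary": 90, "canary": 10}
picks = [pick_target(weights, rng) for _ in range(10_000)]
print("primary share:", picks.count("primary") / len(picks))
```

Under this model the window for the text's example runs from about 60 seconds to about 150 seconds, which shows why lowering the TTL alone does not help much when detection dominates.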

GeoDNS for Multi-Region Routing

GeoDNS returns different IP addresses based on the querier's geographic location. A user in Europe resolves api.example.com to the EU datacenter IP, while a user in Asia resolves it to the APAC datacenter IP.

Implementation: the authoritative nameserver sees the recursive resolver's IP address, which reveals an approximate location (ISPs have known IP ranges per region). This is the resolver's location rather than the user's, so it can be misleading when the user relies on a distant public resolver. The EDNS Client Subnet (ECS) extension passes part of the client's address to the authoritative nameserver for more precise geolocation. Route 53 geolocation routing: define records per region. US users get the us-east-1 IP, EU users get eu-west-1, and a default record catches everything else. Latency-based routing: Route 53 uses latency measurements between networks and AWS regions and routes to the lowest-latency endpoint. This is more accurate than geography alone (a user near a country border may be closer to a datacenter in the neighboring country).

Use cases: (1) Multi-region active-active: route users to the nearest datacenter for low latency. (2) Data residency: direct EU users to EU servers so their data stays in the EU (DNS steers traffic but does not by itself enforce residency). (3) CDN origin selection: route to the nearest origin server. (4) Disaster recovery failover: when the primary region is down, DNS routes all traffic to the backup region.
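The geolocation decision described above can be sketched as a prefix lookup that prefers the ECS subnet when one is present and falls back to the resolver's own address. This is an illustration under assumptions: the prefix-to-region table and the endpoint addresses use reserved documentation ranges and are entirely hypothetical, and `answer_for` is not any provider's API.

```python
import ipaddress

# Hypothetical mapping of source prefixes to regions (documentation ranges).
GEO_TABLE = [
    (ipaddress.ip_network("198.51.100.0/24"), "us-east-1"),
    (ipaddress.ip_network("203.0.113.0/24"), "eu-west-1"),
]

# Hypothetical endpoint addresses per region, plus a catch-all default.
ENDPOINTS = {
    "us-east-1": "192.0.2.10",
    "eu-west-1": "192.0.2.20",
    "default": "192.0.2.30",
}


def answer_for(resolver_ip: str, ecs_subnet: str = None) -> str:
    """Choose the endpoint IP to return for a query.

    If the query carries an EDNS Client Subnet, geolocate on that subnet;
    otherwise fall back to the recursive resolver's own address.
    """
    if ecs_subnet:
        source = ipaddress.ip_network(ecs_subnet, strict=False).network_address
    else:
        source = ipaddress.ip_address(resolver_ip)

    for prefix, region in GEO_TABLE:
        if source in prefix:
            return ENDPOINTS[region]
    return ENDPOINTS["default"]  # the default record catches everything else


# Resolver sits in the "US" range, but ECS says the client is in the "EU" range.
print(answer_for("198.51.100.7"))
print(answer_for("198.51.100.7", ecs_subnet="203.0.113.0/24"))
```

The second call shows why ECS matters: the same resolver address produces a different answer once the client's subnet is known.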

DNS Security: DNSSEC and DoH

DNS was designed without security. Attacks: (1) DNS spoofing/cache poisoning: an attacker injects a fake DNS response, redirecting users to a malicious IP, and the resolver caches the fake response. (2) DNS hijacking: the attacker modifies DNS records at the registrar or nameserver, usually through compromised credentials. (3) Man-in-the-middle: intercepting or altering DNS queries on the network.

DNSSEC (DNS Security Extensions) adds cryptographic signatures to DNS records. The authoritative nameserver's zone is signed with a private key, and the matching public key is published in the zone's DNSKEY record. The parent zone's DS record holds a hash of that key, which forms a chain of trust up to the root. Validating resolvers check the signatures against this chain and reject any response whose signature does not verify. This lets resolvers detect spoofed and poisoned responses. Limitation: DNSSEC does not encrypt queries, so an observer can still see which domains you are resolving.

DNS over HTTPS (DoH) encrypts DNS queries inside HTTPS. The client sends its query as an HTTPS GET or POST to a DoH server (e.g., https://1.1.1.1/dns-query). Benefits: it prevents network observers from seeing DNS queries (privacy) and prevents on-path parties such as ISPs from intercepting or modifying DNS responses. It is supported by Firefox, Chrome, and recent versions of the major operating systems. DNS over TLS (DoT) is similar to DoH but uses a dedicated TLS connection on port 853. That dedicated port makes DoT easier to block at a firewall than DoH, which blends in with regular HTTPS traffic on port 443.
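To make the DoH request concrete, the sketch below builds a DNS query in wire format and encodes it into the GET form of a DoH URL (base64url without padding, in a `dns` query parameter, as described in RFC 8484). It only constructs the URL and sends nothing over the network. The helper names `build_query` and `doh_get_url` are made up for illustration, and the endpoint is the one mentioned in the text.

```python
import base64
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query message (one question, class IN).

    qtype 1 is an A record. The message ID is 0, which RFC 8484 recommends
    for DoH so that identical queries are friendlier to HTTP caches.
    """
    # Header: ID=0, flags=0x0100 (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)

    # QNAME: each label is prefixed by its length, ending with a zero byte.
    qname = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        qname += bytes([len(encoded)]) + encoded
    qname += b"\x00"

    # Question trailer: QTYPE and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def doh_get_url(name: str, endpoint: str = "https://1.1.1.1/dns-query") -> str:
    """Return the DoH GET URL for an A-record query on `name`."""
    wire = build_query(name)
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


print(doh_get_url("www.example.com"))
```

To actually send the request, an HTTP client would fetch this URL with an `Accept: application/dns-message` header and parse the binary response, which is left out here to keep the sketch offline.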
