DNS (Domain Name System) is the distributed, hierarchical database that translates human-readable domain names (techinterview.org) into IP addresses (67.207.83.166). Nearly every connection made by hostname begins with a DNS lookup, answered either from a cache or over the network. Understanding DNS internals (resolution chain, caching, TTLs, record types, and failure modes) is essential for system design interviews and for debugging production issues, where “it works locally but not in production” is often a DNS or TTL problem.
Resolution Chain: Recursive and Iterative Queries
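A DNS query for api.techinterview.org follows this chain: (1) The client checks /etc/hosts and the local OS cache. (2) On a miss, it sends a recursive query to the configured recursive resolver (usually provided by the ISP, or a public one such as 8.8.8.8 or 1.1.1.1); "recursive" means the client asks the resolver to return the final answer. (3) The recursive resolver checks its cache; a cache hit returns immediately. (4) On a cache miss, the resolver works iteratively, following referrals itself. It first queries a root nameserver (one of 13 named root server identities, a.root-servers.net through m.root-servers.net, each served by many anycast instances), which returns a referral: the NS records for .org. (5) It queries a .org TLD nameserver, which returns the NS records for techinterview.org. (6) It queries an authoritative nameserver for techinterview.org, which returns the A record (IPv4 address). The recursive resolver caches the answer for its TTL and returns it to the client.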
# Trace DNS resolution with dig
dig +trace api.techinterview.org
# Response shows the full chain:
# . (root) -> .org NS -> techinterview.org NS -> A record
# Key DNS record types:
# A : IPv4 address (api.example.com -> 1.2.3.4)
# AAAA : IPv6 address
# CNAME : canonical name alias (www -> apex domain)
# MX : mail exchange (for email routing)
# NS : nameserver delegation
# TXT : arbitrary text (SPF, DKIM, domain verification)
# SOA : start of authority (zone metadata: primary NS, serial, refresh timers, negative-caching TTL)
# SRV : service location (host:port for service discovery)
# Check TTL remaining on a record:
dig api.techinterview.org +noall +answer
# api.techinterview.org. 299 IN A 67.207.83.166
#                        ^^^ TTL remaining in seconds (in the cache of the resolver that answered)
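The referral-following part of the chain (steps 4 to 6) can be sketched in a few lines of Python. This is a toy model, not a real resolver: the `ZONES` table, the server labels (`root`, `org-tld`, `techinterview-auth`), and the function name `resolve_iteratively` are all hypothetical stand-ins, and no network traffic is sent.

```python
from typing import Dict, List, Tuple

# Hypothetical, hard-coded data standing in for real nameservers.
# Each "server" maps a name suffix to either an NS referral or an A answer.
ROOT = "root"
ZONES: Dict[str, Dict[str, Tuple[str, str]]] = {
    "root": {"org.": ("NS", "org-tld")},
    "org-tld": {"techinterview.org.": ("NS", "techinterview-auth")},
    "techinterview-auth": {"api.techinterview.org.": ("A", "67.207.83.166")},
}


def resolve_iteratively(name: str) -> Tuple[str, List[str]]:
    """Follow referrals from the root until a server returns an A record.

    Returns the address and the list of servers that were queried.
    """
    server = ROOT
    path = [server]
    for _ in range(10):  # guard against referral loops
        # Find the suffixes of `name` that this server knows about.
        matches = [s for s in ZONES[server] if name == s or name.endswith("." + s)]
        if not matches:
            raise LookupError(f"NXDOMAIN: {name}")
        # Use the most specific (longest) matching suffix.
        rtype, value = ZONES[server][max(matches, key=len)]
        if rtype == "A":
            return value, path
        server = value  # NS referral: ask the next server down the hierarchy
        path.append(server)
    raise RuntimeError("too many referrals")


address, path = resolve_iteratively("api.techinterview.org.")
print(address, " -> ".join(path))
# 67.207.83.166 root -> org-tld -> techinterview-auth
```

The printed path mirrors what `dig +trace` shows: one hop per level of the hierarchy. A real resolver would also cache each referral, so later lookups under .org skip the root entirely.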
TTL and Propagation Delay
Each DNS record has a Time-To-Live (TTL) in seconds. Recursive resolvers cache records for the TTL duration and return cached answers without re-querying the authoritative server. "Propagation" is therefore not a push: a change becomes visible to each resolver only when its cached copy expires. Low TTL (60-300s): changes become visible quickly, but authoritative servers receive more queries. High TTL (3600-86400s): fewer queries, but a change can take up to the full TTL to be seen everywhere (longer where a resolver enforces its own minimum TTL). Best practice for planned IP changes: lower the TTL to 60s at least one full old-TTL period before the change (48 hours ahead is a common conservative margin), change the IP, then raise the TTL back once the cutover is verified. Negative TTL (NXDOMAIN caching): how long resolvers cache “domain does not exist” responses; it is the lesser of the SOA record's own TTL and the SOA MINIMUM field (RFC 2308).
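The caching behavior described above can be illustrated with a minimal TTL cache. This is a sketch under simplifying assumptions: the class name `TtlCache` and its methods are hypothetical, and the clock is injectable so that expiry can be shown without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy resolver cache: an entry is served until its TTL runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # name -> (address or None, absolute expiry time)
        self._entries: Dict[str, Tuple[Optional[str], float]] = {}

    def put(self, name: str, address: Optional[str], ttl: int) -> None:
        # address=None models a negative (NXDOMAIN) cache entry.
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name: str) -> Tuple[bool, Optional[str], int]:
        """Return (hit, address, remaining_ttl_seconds)."""
        entry = self._entries.get(name)
        if entry is None:
            return False, None, 0
        address, expires_at = entry
        remaining = expires_at - self._clock()
        if remaining <= 0:
            del self._entries[name]  # expired: the caller must re-query upstream
            return False, None, 0
        return True, address, int(remaining)


# Drive the cache with a fake clock to show the TTL counting down.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("api.techinterview.org.", "67.207.83.166", ttl=300)

now[0] = 1.0
print(cache.get("api.techinterview.org."))  # (True, '67.207.83.166', 299)

now[0] = 300.0
print(cache.get("api.techinterview.org."))  # (False, None, 0)
```

The remaining TTL of 299 after one second is the same countdown that appears in the answer section of `dig` when a caching resolver responds. Until the entry expires, a changed record at the authoritative server is invisible to this cache.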
DNS for Load Balancing and High Availability
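DNS can distribute traffic across multiple servers by publishing multiple A records for the same name. Authoritative servers and resolvers typically rotate the order of those records from one response to the next, and many clients connect to the first address listed, so load spreads roughly evenly (DNS round-robin load balancing). Limitations: no health checking, so if one server is down, plain DNS still returns its IP; and no session stickiness, so clients may get different IPs on each resolution. Anycast DNS: multiple servers announce the same IP address, and BGP routing directs each client to the nearest instance in network terms, which is usually but not always the geographically closest. Used by Cloudflare (1.1.1.1), Google (8.8.8.8), and major CDNs. GeoDNS: the authoritative server returns different IPs based on the resolver's location (or the client's subnet, when the resolver forwards it via EDNS Client Subnet), so US queries get US-east IPs and EU queries get EU-west IPs.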
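A minimal sketch of round-robin rotation, assuming a server that simply rotates its record list after every response. The class name `RoundRobinRecordSet` is hypothetical and the addresses come from the 192.0.2.0/24 documentation range.

```python
from collections import deque
from typing import Deque, List


class RoundRobinRecordSet:
    """Rotates the order of A records on every response.

    Note what is missing: there is no health check, so a dead
    server's address keeps being handed out in its turn.
    """

    def __init__(self, addresses: List[str]) -> None:
        self._addresses: Deque[str] = deque(addresses)

    def answer(self) -> List[str]:
        response = list(self._addresses)
        self._addresses.rotate(-1)  # the next response starts with the next address
        return response


records = RoundRobinRecordSet(["192.0.2.10", "192.0.2.11", "192.0.2.12"])

# Clients that connect to the first address in each response
# end up spread across all three servers.
first_choices = [records.answer()[0] for _ in range(4)]
print(first_choices)
# ['192.0.2.10', '192.0.2.11', '192.0.2.12', '192.0.2.10']
```

Every response still contains all three addresses; only the order changes. Caching weakens the effect in practice, because every client behind one resolver sees the same cached answer until the TTL expires.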
Service Discovery via DNS
Kubernetes uses DNS for internal service discovery. Each Service gets a DNS name: service-name.namespace.svc.cluster.local (cluster.local is the default cluster domain). CoreDNS, the default cluster DNS server in current Kubernetes, resolves this name to the Service's ClusterIP. SRV records enable port-aware service discovery: for a named port, _port-name._protocol.service-name.namespace.svc.cluster.local (for example _http._tcp.service-name.namespace.svc.cluster.local) returns the port number and target host. Consul exposes its service registry through a DNS interface: web.service.consul returns the healthy instances of the web service. In both cases applications use ordinary DNS lookups for service discovery; the clients stay unchanged, and only the server that answers the queries is specialized.
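The naming scheme above is mechanical enough to sketch as string builders, together with a parser for SRV record data in its standard text form (priority, weight, port, target). The function names and the sample record values below are hypothetical, and `cluster.local` is assumed as the cluster domain.

```python
from typing import NamedTuple

CLUSTER_DOMAIN = "cluster.local"  # assumed default; a cluster may use another domain


def service_fqdn(service: str, namespace: str,
                 cluster_domain: str = CLUSTER_DOMAIN) -> str:
    """DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def srv_name(port_name: str, protocol: str, service: str, namespace: str,
             cluster_domain: str = CLUSTER_DOMAIN) -> str:
    """SRV query name for a named port of a Service."""
    return f"_{port_name}._{protocol}.{service_fqdn(service, namespace, cluster_domain)}"


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_rdata(rdata: str) -> SrvRecord:
    """Parse SRV record data in presentation form: 'priority weight port target'."""
    priority, weight, port, target = rdata.split()
    return SrvRecord(int(priority), int(weight), int(port), target)


print(service_fqdn("web", "default"))
# web.default.svc.cluster.local
print(srv_name("http", "tcp", "web", "default"))
# _http._tcp.web.default.svc.cluster.local

# Illustrative record data, not taken from a real cluster.
record = parse_srv_rdata("0 100 8080 web.default.svc.cluster.local.")
print(record.target, record.port)
# web.default.svc.cluster.local. 8080
```

An A lookup on the first name yields only an address, so the client has to know the port in advance. The SRV lookup returns the port as well, which is why it suits services whose ports are not fixed by convention.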
Key Interview Discussion Points
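- DNS caching layers: browser cache, OS cache, recursive resolver cache. A record change stays invisible to a client until the longest-lived cached copy expires, roughly max(browser TTL, OS TTL, resolver TTL), and longer if a layer ignores the record's TTL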
- DNSSEC: cryptographic signatures on DNS records prevent cache poisoning attacks (Kaminsky attack); adds overhead to resolution but prevents spoofed DNS responses
- Split-horizon DNS: return different DNS records based on whether the query comes from inside or outside the corporate network — internal services resolve to private IPs, external clients get public IPs
- DNS failover: health check + TTL reduction + failover IP update — widely used for cross-region DR (AWS Route53 health checks, Cloudflare health checks)
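- DoH/DoT (DNS over HTTPS/TLS): encrypts DNS queries between the client and the recursive resolver, so on-path observers such as ISPs cannot read or alter them. Chrome, Firefox, and iOS all support DoH; whether it is enabled by default depends on the platform, region, and configured resolver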
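Of the points above, split-horizon DNS is the easiest to make concrete. The sketch below picks a "view" from the client's source address and answers from that view's records. Everything in it is an illustrative assumption: the network range, the record data, and the names `select_view` and `answer` are hypothetical, and real servers implement this with their own view or policy configuration.

```python
import ipaddress
from typing import Dict, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Assumed corporate address space (an RFC 1918 private range).
INTERNAL_NETWORKS: List[Network] = [ipaddress.ip_network("10.0.0.0/8")]

# Two views of the same name: a private IP inside, a public IP outside.
VIEWS: Dict[str, Dict[str, str]] = {
    "internal": {"app.example.com.": "10.1.2.3"},
    "external": {"app.example.com.": "203.0.113.20"},
}


def select_view(client_ip: str) -> str:
    """Choose the view based on where the query came from."""
    address = ipaddress.ip_address(client_ip)
    is_internal = any(address in network for network in INTERNAL_NETWORKS)
    return "internal" if is_internal else "external"


def answer(name: str, client_ip: str) -> str:
    """Answer a query for `name` from the view that matches the client."""
    return VIEWS[select_view(client_ip)][name]


print(answer("app.example.com.", "10.4.5.6"))      # 10.1.2.3
print(answer("app.example.com.", "198.51.100.7"))  # 203.0.113.20
```

The same name resolves differently depending on who asks, which is also a common source of "works on the office network, fails from home" reports when the two views drift apart.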