Low Level Design: DNS Resolution System

DNS Hierarchy: Root, TLD, and Authoritative Nameservers

The Domain Name System is a globally distributed hierarchical database organized as an inverted tree. At the top sit 13 logical root nameserver clusters (A through M), operated by different organizations and replicated via anycast to over 1,500 physical locations worldwide. Below the root are Top-Level Domain (TLD) nameservers, one set per TLD (.com, .org, .net, country-code TLDs). Below TLD servers sit authoritative nameservers, which hold the actual DNS records for registered domains. When a resolver needs to resolve example.com it starts at the root, which delegates to the .com TLD servers, which in turn delegate to the nameservers listed in example.com’s NS records, which finally answer with the A or AAAA record. This delegation chain is encoded in NS records at each level and is the mechanism by which DNS distributes authority across millions of independently operated domains without any central coordination. The root zone itself is managed by ICANN and the root zone maintainer (currently Verisign for the root zone file), and changes propagate via zone transfer to all root server operators.
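The delegation chain can be sketched as a toy walk in which each level answers the same question with either a referral or the final record. The zone data below is illustrative, and referrals name the next zone down rather than real nameserver IPs:

```python
# Toy model of DNS delegation: every level either refers the resolver
# one zone down (NS) or answers authoritatively (A). Illustrative data.
ZONE_DATA = {
    # zone -> {queried name: (record type, value)}
    ".":            {"example.com.": ("NS", "com.")},          # root refers to .com
    "com.":         {"example.com.": ("NS", "example.com.")},  # TLD refers to the domain
    "example.com.": {"example.com.": ("A", "93.184.216.34")},  # authoritative answer
}

def resolve(qname: str) -> str:
    """Follow referrals from the root down to an authoritative answer."""
    zone = "."
    while True:
        rtype, value = ZONE_DATA[zone][qname]
        if rtype == "A":
            return value   # authoritative answer: stop walking
        zone = value       # NS referral: ask the next zone down

print(resolve("example.com."))  # walks root -> com. -> example.com.
```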

Recursive Resolver: Stub Resolvers, Cache-First Lookup, and Root Priming

Most DNS queries originate from a stub resolver — a minimal DNS client built into the operating system — that knows only one thing: the IP address of a recursive resolver (typically provided by DHCP or manually configured). The stub sends all queries to the recursive resolver and expects a complete answer. The recursive resolver does the actual work: it checks its cache first, and if the answer is present and the TTL has not expired, it returns it immediately without any upstream queries. On a cache miss the resolver must walk the hierarchy. It starts with the root nameservers, whose IP addresses ship with the resolver software in a root hints file (root.hints); on startup the resolver sends a priming query (RFC 8109) to one of these addresses to fetch the current root NS set. The resolver sends a query to a root server and receives a referral to the TLD nameservers. It queries a TLD server and receives a referral to the authoritative nameservers. It queries the authoritative nameserver and receives the final answer. All intermediate results (NS records, glue A records for nameservers) are also cached with their respective TTLs, so subsequent queries for the same zone skip earlier steps. A single recursive resolver serving many clients benefits from a warm cache: the more queries it handles, the higher its cache hit rate and the lower the average latency across all users.
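The cache-first step can be sketched as a TTL-aware dictionary, roughly what a recursive resolver keeps per (name, record type); names and the address below are illustrative:

```python
import time

class DnsCache:
    """Minimal TTL-aware cache as used by a recursive resolver (sketch)."""

    def __init__(self):
        self._entries = {}  # (name, rtype) -> (value, expiry timestamp)

    def get(self, name, rtype):
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None                        # cache miss
        value, expiry = entry
        if time.monotonic() >= expiry:         # TTL expired: evict and miss
            del self._entries[(name, rtype)]
            return None
        return value                           # cache hit

    def put(self, name, rtype, value, ttl):
        self._entries[(name, rtype)] = (value, time.monotonic() + ttl)

cache = DnsCache()
cache.put("example.com", "A", "93.184.216.34", ttl=300)
print(cache.get("example.com", "A"))  # hit while the TTL is unexpired
```

A real resolver also caches the intermediate NS and glue records the same way, which is what lets it skip the root and TLD steps on later queries for the same zone.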

Authoritative Nameserver: Zone Files, SOA Records, and Zone Transfer

The authoritative nameserver holds the ground truth for a zone. Zone data is stored in a zone file — a text file using the RFC 1035 master file format — containing all resource records (RRs) for the zone: A, AAAA, CNAME, MX, TXT, NS, and others. Every zone has a Start of Authority (SOA) record that carries administrative metadata: the primary nameserver hostname, the responsible party’s email address (encoded with a dot instead of @), a serial number, and timing parameters (refresh, retry, expire, and minimum TTL). The serial number is critical for zone transfer: secondary nameservers periodically poll the primary’s SOA record and compare the serial number. If the serial has incremented, the secondary initiates a zone transfer (AXFR for full transfer, IXFR for incremental transfer of only the changed records). TSIG (Transaction Signature) or DNSSEC signatures authenticate zone transfers to prevent a rogue server from impersonating the primary. DNS hosting providers have moved beyond static zone files to database-backed systems with API-driven record management, where the zone file format is a rendering detail rather than the authoritative data store.
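The secondary's refresh decision can be sketched as below. Real implementations compare serials with RFC 1982 serial-number arithmetic, which handles wraparound; a plain integer comparison is used here for clarity, and the serial values are illustrative:

```python
def needs_transfer(local_serial: int, primary_serial: int) -> bool:
    # Simplification: real DNS uses RFC 1982 serial arithmetic here.
    return primary_serial > local_serial

def refresh(local_serial: int, primary_serial: int, supports_ixfr: bool = True) -> str:
    """Poll result: no-op, incremental transfer, or full transfer."""
    if not needs_transfer(local_serial, primary_serial):
        return "up-to-date"
    return "IXFR" if supports_ixfr else "AXFR"

print(refresh(2024010101, 2024010102))  # serial advanced: incremental transfer
print(refresh(2024010101, 2024010101))  # serials match: nothing to do
```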

Caching and TTL: Negative Caching, TTL Tuning, and Propagation Tradeoffs

Caching is the mechanism that makes DNS scalable: without it, every DNS query in the world would ultimately reach the authoritative servers. Recursive resolvers cache every resource record with its TTL countdown. When the TTL reaches zero the record is evicted and the next query triggers a fresh lookup. TTL values are a deliberate tradeoff between propagation speed and origin load. A TTL of 300 seconds means a DNS change (pointing a domain to a new IP after a migration) propagates globally in at most 5 minutes — any cached copies will expire and resolvers will fetch the new value. A TTL of 86400 seconds (one day) means changes take up to 24 hours to propagate but the authoritative server receives far fewer queries. Best practice for planned migrations is to lower TTLs to 60–300 seconds well before the cutover, perform the switch, and then raise TTLs back after confirming the change is correct. Negative caching (RFC 2308) applies to NXDOMAIN responses: a resolver caches the fact that a name does not exist for the minimum of the SOA record's MINIMUM field and the SOA's own TTL, preventing repeated queries for nonexistent names from hammering authoritative servers.
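The RFC 2308 negative-cache rule can be sketched as follows; the names and SOA timing values are illustrative:

```python
import time

def negative_ttl(soa_minimum: int, soa_ttl: int) -> int:
    """RFC 2308: cache NXDOMAIN for min(SOA MINIMUM, TTL of the SOA record)."""
    return min(soa_minimum, soa_ttl)

negative_cache = {}  # name -> expiry timestamp

def record_nxdomain(name: str, soa_minimum: int, soa_ttl: int) -> None:
    negative_cache[name] = time.monotonic() + negative_ttl(soa_minimum, soa_ttl)

def is_cached_nxdomain(name: str) -> bool:
    expiry = negative_cache.get(name)
    return expiry is not None and time.monotonic() < expiry

record_nxdomain("no-such.example.com", soa_minimum=3600, soa_ttl=900)
print(is_cached_nxdomain("no-such.example.com"))  # cached negative for 900s
```

The flip side noted in the text applies here too: if the name is created while a negative entry is live, clients keep seeing NXDOMAIN until the entry expires.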

Anycast for Authoritative Nameservers: BGP Announcement and Stateless Protocol Benefits

DNS runs primarily over UDP (with TCP fallback for large responses and zone transfers), and UDP is stateless — each query-response pair is independent with no connection state. This property makes DNS uniquely well-suited to anycast routing. The operator announces the same IP address from multiple physical locations via BGP. Internet routers forward DNS queries toward the topologically nearest announcement, so a query from a user in Asia reaches the Asian PoP while a query from Europe reaches the European PoP. Each location handles queries for the same zone independently, with no coordination required between locations because there is no session state to synchronize. If a PoP goes offline, BGP withdraws its announcement and queries reroute to the next-nearest location within seconds, providing automatic failover without any application-level health checks. Authoritative DNS operators like NS1, Route 53, and Cloudflare operate dozens to hundreds of anycast PoPs, keeping authoritative query latency well under 10ms for most users globally. The main anycast hazard — a BGP routing change landing packets of an in-flight TCP fallback connection at a different PoP — is handled by ensuring all PoPs serve identical zone data, typically through a distributed database with synchronous replication or a central zone publishing system.

DNSSEC Validation: Signatures, Key Hierarchy, and Authenticated Denial

DNSSEC adds cryptographic authentication to DNS responses, protecting against cache poisoning attacks in which an attacker injects forged records into a resolver's cache. Each zone signs its resource record sets (RRsets) with its Zone Signing Key (ZSK), producing RRSIG records that accompany the RRset in responses. The zone also publishes its DNSKEY records (the public key material) so validators can verify the signatures. The chain of trust works through DS (Delegation Signer) records: a parent zone publishes a DS record containing a hash of the child zone’s Key Signing Key (KSK). A validator resolving example.com starts by trusting the root zone’s DNSKEY (the trust anchor, distributed with resolver software and updated infrequently). It verifies the root’s signature over the .com DS record, then the .com zone’s signature over example.com’s DS record, then example.com’s KSK’s signature over its ZSK, and finally the ZSK’s signature over the queried RRset. Authenticated denial of existence uses NSEC or NSEC3 records to prove that a name does not exist without exposing the full zone contents to enumeration (NSEC3 adds hashing to prevent zone walking). DNSSEC validation is computationally expensive relative to plain DNS, which is why it typically runs at the recursive resolver rather than the stub.
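The DS link in the chain of trust can be modeled as a toy: each parent publishes a hash of its child's KSK, and validation checks that every link matches. The keys here are stand-in byte strings, not real cryptographic key material, and real validation additionally verifies RRSIG signatures at every step:

```python
import hashlib

# Toy zones: each holds a KSK; parents will also publish DS records.
ZONES = {
    ".":            {"ksk": b"root-ksk"},      # trust anchor lives here
    "com.":         {"ksk": b"com-ksk"},
    "example.com.": {"ksk": b"example-ksk"},
}

def ds_digest(ksk: bytes) -> str:
    """A DS record is (conceptually) a hash of the child zone's KSK."""
    return hashlib.sha256(ksk).hexdigest()

# Each parent publishes a DS record for its delegated child.
ZONES["."]["ds"] = {"com.": ds_digest(ZONES["com."]["ksk"])}
ZONES["com."]["ds"] = {"example.com.": ds_digest(ZONES["example.com."]["ksk"])}

def validate_chain(chain) -> bool:
    """Check that each parent's published DS matches the child's actual KSK."""
    for parent, child in zip(chain, chain[1:]):
        if ZONES[parent]["ds"].get(child) != ds_digest(ZONES[child]["ksk"]):
            return False  # broken link: a real validator returns SERVFAIL
    return True

print(validate_chain([".", "com.", "example.com."]))  # intact chain
```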

DNS-Based Load Balancing: Round-Robin, Weighted Records, GeoDNS, and Health Checks

DNS is a natural load-balancing primitive because the authoritative server controls which IP addresses a name resolves to and can vary that answer per query. The simplest form is round-robin: return multiple A records and rotate the order with each response, relying on the client to use the first record. Weighted records assign different weights to each IP, so a server with twice the capacity receives twice the traffic. GeoDNS (or latency-based routing in Route 53 terms) inspects the source IP of the recursive resolver and returns different record sets based on the resolver’s inferred geographic region — European resolvers get European IPs, Asian resolvers get Asian IPs. Health-check-driven record removal closes the loop: a health-checking system continuously probes each backend and removes its A record from DNS responses when it fails, typically within one TTL window. The combination of GeoDNS and health checks is how global services like AWS Route 53 Latency Routing and Cloudflare Load Balancing implement active-active multi-region deployments. The fundamental limitation of DNS-based load balancing is that clients and intermediate resolvers cache responses for the TTL duration, so you cannot react faster than your TTL when removing a failed server — low TTLs are essential for health-check-driven routing to be effective.
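Weighted answers with health-check-driven removal can be sketched as below; the backend IPs (drawn from the 203.0.113.0/24 documentation range) and weights are illustrative:

```python
import random

# Backend pool as an authoritative server might hold it: per-IP weight
# plus the latest health-probe result. All values are illustrative.
BACKENDS = [
    {"ip": "203.0.113.10", "weight": 2, "healthy": True},   # 2x capacity
    {"ip": "203.0.113.11", "weight": 1, "healthy": True},
    {"ip": "203.0.113.12", "weight": 1, "healthy": False},  # failed probe
]

def answer(rng=random) -> str:
    """Pick one A record among healthy backends, proportional to weight."""
    healthy = [b for b in BACKENDS if b["healthy"]]
    weights = [b["weight"] for b in healthy]
    return rng.choices(healthy, weights=weights, k=1)[0]["ip"]

print(answer())  # never the unhealthy 203.0.113.12
```

Note that removal only takes effect for clients once their cached copy expires, which is exactly the TTL limitation the paragraph describes.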

DNS over HTTPS and DNS over TLS: Encrypted Resolution and Privacy

Traditional DNS sends queries and responses in plaintext over UDP port 53, making them visible to any observer on the network path — ISPs, corporate network administrators, and adversaries who can see traffic. DNS over TLS (DoT, RFC 7858) wraps DNS in a standard TLS connection on port 853, encrypting the query and response from the stub resolver to the recursive resolver. DNS over HTTPS (DoH, RFC 8484) encodes DNS queries as HTTPS requests to a well-known URL (typically https://resolver/dns-query), making DNS traffic indistinguishable from ordinary web traffic and bypassing firewalls that block port 853. Both protocols provide confidentiality (an observer sees only that you contacted a DNS resolver, not what you queried) and integrity (TLS prevents tampering). From an infrastructure perspective, DoH and DoT require the recursive resolver to terminate TLS, which adds CPU cost and connection overhead relative to UDP. Large resolver operators (Cloudflare 1.1.1.1, Google 8.8.8.8) have absorbed this cost. Resolver discovery — how a stub learns the encrypted endpoint of its assigned resolver — is addressed by Discovery of Designated Resolvers (DDR, RFC 9462), which bootstraps over the classic resolver connection via special-use SVCB queries, and by DHCP and Router Advertisement options for network-designated resolvers (DNR, RFC 9463); adoption of both remains uneven across operating systems and network equipment.
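Under DoH (RFC 8484), the POST body is an ordinary RFC 1035 wire-format DNS message sent with content type application/dns-message. A minimal sketch of building that message — no network I/O is performed here:

```python
import struct

def build_query(name: str, qtype: int = 1, qid: int = 0) -> bytes:
    """Build an RFC 1035 wire-format query (qtype 1 = A record).

    RFC 8484 suggests a query ID of 0 so DoH responses cache well.
    """
    # Header: ID, flags (RD=1), QDCOUNT=1, zero AN/NS/AR counts.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    return header + question

# This 29-byte message is what a DoH client would POST to the
# resolver's /dns-query endpoint over HTTPS.
msg = build_query("example.com")
print(len(msg), msg[:2])
```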

