DNS Resolver and DNS System Design

What Does a DNS Resolver Do?

The Domain Name System (DNS) translates human-readable domain names (www.example.com) into IP addresses (93.184.216.34). A DNS resolver performs this translation on the client's behalf by querying a hierarchy of DNS servers. On the client itself, a stub resolver in the OS forwards queries to a recursive resolver, which walks the hierarchy. Understanding DNS is essential for system design: nearly every internet connection starts with a DNS lookup, lookup time adds directly to request latency, and DNS is the mechanism behind DNS-based load balancing, CDN routing, and failover.

DNS Resolution Hierarchy

Client query: "What is the IP of www.example.com?"

1. Check local DNS cache (OS + browser) → HIT: return immediately
2. Query the recursive resolver (run by the ISP, or a public resolver such as Google's 8.8.8.8)
   a. Recursive resolver checks its cache → MISS
   b. Query a root name server ("Who handles .com?") → returns a referral to the .com TLD name servers
   c. Query a .com TLD name server ("Who handles example.com?") → returns a referral to example.com's authoritative name servers
   d. Query Authoritative Name Server for example.com → returns A record: 93.184.216.34
   e. Recursive resolver caches result (TTL from record)
3. Return IP to client; client caches (TTL)

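Full resolution (cache miss): up to 4 round trips (client to recursive resolver, then resolver to root, TLD, and authoritative servers), typically 50-200ms. In practice the root and TLD referrals are usually already cached, so most misses need only the authoritative query. Cached resolution: <1ms from the local cache, or a single round trip on a recursive-resolver cache hit.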
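The referral walk in steps 2b-2d can be sketched as a loop that keeps asking the next server down until it gets an answer. This is a minimal simulation, assuming hypothetical in-memory server tables (SERVERS, with made-up server names such as "tld-com") in place of real network queries. It counts the resolver's three round trips; adding the client's own query to the resolver gives the four round trips of a full cache miss.

```python
# Hypothetical server tables standing in for real name servers.
# Each entry is either a referral ("NS", next_server) or an answer ("A", ip).
SERVERS = {
    "root": {"com": ("NS", "tld-com")},
    "tld-com": {"example.com": ("NS", "auth-example")},
    "auth-example": {"www.example.com": ("A", "93.184.216.34")},
}

def zone_suffixes(name):
    """Yield the name, then each parent: www.example.com, example.com, com."""
    labels = name.split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])

def query(server, name):
    """One simulated round trip: return the most specific entry the server has."""
    table = SERVERS[server]
    for suffix in zone_suffixes(name):
        if suffix in table:
            return table[suffix]
    raise LookupError(f"{server} has no data for {name}")

def resolve_iteratively(name, start="root"):
    """Follow referrals from the root until an A record is returned."""
    server, round_trips = start, 0
    while True:
        rtype, value = query(server, name)
        round_trips += 1
        if rtype == "A":
            return value, round_trips
        server = value  # referral: ask the next server down the hierarchy

print(resolve_iteratively("www.example.com"))  # ('93.184.216.34', 3)
```

A real recursive resolver would also cache each referral it receives, which is why later lookups under .com usually skip the root query.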

DNS Record Types

  • A: maps hostname to IPv4 address. api.example.com → 1.2.3.4
  • AAAA: maps hostname to IPv6 address
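  • CNAME: canonical name (alias). www.example.com → example.com; the resolver follows the chain until it reaches the requested record type
  • MX: mail exchange; lists the servers that accept email for the domain, each with a preference value (lower is tried first)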
  • TXT: arbitrary text — used for SPF, DKIM, domain verification
  • NS: authoritative name servers for the domain
  • SOA: Start of Authority — primary NS, admin email, serial number, refresh intervals
  • SRV: service location — host + port for a specific service (used by SIP, XMPP)
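Following a CNAME chain is the one record type above that needs extra logic at lookup time. A minimal sketch, assuming a hypothetical in-memory record table keyed by (name, type); the hop limit and loop check are illustrative safeguards, not values taken from any particular resolver.

```python
# Hypothetical record table: {(name, type): [values]}
RECORDS = {
    ("www.example.com", "CNAME"): ["example.com"],
    ("example.com", "A"): ["93.184.216.34"],
}

def lookup(records, name, rtype, max_hops=8):
    """Return (values, chain): the records found and the names visited."""
    chain = [name]
    for _ in range(max_hops):
        if (name, rtype) in records:
            return records[(name, rtype)], chain
        if (name, "CNAME") not in records:
            return [], chain                    # no record of this type
        name = records[(name, "CNAME")][0]      # a name has at most one CNAME
        if name in chain:
            raise ValueError("CNAME loop: " + " -> ".join(chain + [name]))
        chain.append(name)
    raise ValueError(f"CNAME chain longer than {max_hops} hops")

print(lookup(RECORDS, "www.example.com", "A"))
# (['93.184.216.34'], ['www.example.com', 'example.com'])
```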

TTL and Caching Strategy

TTL (Time-to-Live) controls how long a DNS record is cached. Tradeoffs: short TTL (30s–5min) enables fast failover and DNS-based load balancing updates. Long TTL (1h–24h) reduces DNS lookup latency and load on name servers. Production guidelines:

  • Stable records (CDN origins, mail servers): TTL=3600 (1 hour)
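  • Before planned changes: lower the TTL to 60s at least one old-TTL period in advance (e.g., a day ahead if the current TTL is 24h), so every cache holds the short TTL by the time the change happens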
  • After changes propagate: restore TTL to 3600
  • Internal service discovery: TTL=30s for fast failover
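The "lower the TTL in advance" rule comes down to simple arithmetic on the TTLs. A sketch, assuming every cache honors the TTL exactly (real resolvers may clamp very low or very high TTLs); plan_record_change is a hypothetical helper name.

```python
from datetime import datetime, timedelta

def plan_record_change(change_at, old_ttl_s, low_ttl_s=60):
    """Hypothetical helper: when to lower the TTL and when it is safe to restore it."""
    # A cache that fetched the record just before the TTL was lowered keeps it
    # for up to old_ttl_s, so lower the TTL at least that long before the change.
    lower_ttl_by = change_at - timedelta(seconds=old_ttl_s)
    # After the change, caches may serve the old value for at most low_ttl_s more.
    safe_to_restore = change_at + timedelta(seconds=low_ttl_s)
    return lower_ttl_by, safe_to_restore

lower, restore = plan_record_change(datetime(2024, 6, 1, 12, 0), old_ttl_s=86400)
print(lower, restore)  # 2024-05-31 12:00:00 2024-06-01 12:01:00
```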

DNS-Based Load Balancing

Multiple A records for the same hostname with round-robin resolution: api.example.com → [1.2.3.4, 5.6.7.8]. The name server rotates the order of the records in each response, and most clients connect to the first address, so successive queries spread traffic across the servers. Simple but limited: plain round-robin DNS has no health checks (it keeps handing out a dead server's IP), no session affinity, and updates propagate only as fast as the TTL allows. Use it for broad traffic distribution. For production, combine it with a load balancer (DNS points to the LB's VIP; the LB does health checking and routing).
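The rotation itself is easy to sketch. A minimal version, assuming a hypothetical RoundRobinZone class that rotates a name's record list by one position per query. It never checks whether 5.6.7.8 is alive, which is exactly the health-check gap described above.

```python
from collections import defaultdict
from itertools import count

class RoundRobinZone:
    """Hypothetical authoritative zone that rotates A records per query."""

    def __init__(self, records):
        self.records = records             # {name: [ip, ...]}
        self.queries = defaultdict(count)  # per-name query counter: 0, 1, 2, ...

    def answer(self, name):
        ips = self.records[name]
        k = next(self.queries[name]) % len(ips)
        return ips[k:] + ips[:k]           # same addresses, rotated order

zone = RoundRobinZone({"api.example.com": ["1.2.3.4", "5.6.7.8"]})
for _ in range(3):
    print(zone.answer("api.example.com"))
# ['1.2.3.4', '5.6.7.8']
# ['5.6.7.8', '1.2.3.4']
# ['1.2.3.4', '5.6.7.8']
```

Because every response carries the full set, a client that fails to reach the first address can try the next one, but whether it does so depends on the client.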

GeoDNS

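Return different DNS answers based on client location: a client in the US gets 1.2.3.4 (US datacenter), a client in the EU gets 5.6.7.8 (EU datacenter). The authoritative name server infers location from the source IP of the querying recursive resolver or, when the resolver includes it, the EDNS Client Subnet option (ECS, RFC 7871), which carries a truncated prefix of the client's own address. Used by CDNs and multi-region architectures. Keep the TTL short (around 60s) so re-routing on failover takes effect quickly.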
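A sketch of the selection logic at the authoritative server, assuming a hypothetical prefix-to-answer table (GEO_TABLE) that uses RFC 5737 documentation prefixes in place of real client networks, plus a made-up DEFAULT_ANSWER for unmatched clients. It prefers the ECS subnet when present; otherwise a user of a distant public resolver would be routed by the resolver's location rather than their own.

```python
import ipaddress

# Hypothetical routing table: documentation prefixes stand in for real regions.
GEO_TABLE = [
    (ipaddress.ip_network("198.51.100.0/24"), "1.2.3.4"),  # "US" clients
    (ipaddress.ip_network("203.0.113.0/24"), "5.6.7.8"),   # "EU" clients
]
DEFAULT_ANSWER = "1.2.3.4"  # hypothetical fallback when no prefix matches

def geo_answer(resolver_ip, ecs_subnet=None):
    """Pick an answer from the ECS subnet if present, else the resolver's IP."""
    if ecs_subnet is not None:
        client = ipaddress.ip_network(ecs_subnet, strict=False)
    else:
        client = ipaddress.ip_network(resolver_ip)  # single address: /32 or /128
    for prefix, answer in GEO_TABLE:
        # subnet_of raises TypeError across IPv4/IPv6, so compare versions first
        if client.version == prefix.version and client.subnet_of(prefix):
            return answer
    return DEFAULT_ANSWER

print(geo_answer("198.51.100.53"))                    # 1.2.3.4
print(geo_answer("198.51.100.53", "203.0.113.0/24"))  # 5.6.7.8
```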

DNS Caching Design (Recursive Resolver)

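import time

class DNSCache:
    """Minimal TTL-based cache for DNS answers (no size limit, no eviction)."""

    def __init__(self):
        self.cache = {}  # {(name, rtype): (records, expiry)}

    def get(self, name, rtype):
        key = (name, rtype)
        if key in self.cache:
            records, expiry = self.cache[key]
            if time.monotonic() < expiry:
                return records  # cache hit: TTL has not elapsed
            del self.cache[key]  # expired: drop the stale entry
        return None  # cache miss

    def put(self, name, rtype, records, ttl):
        # Store an absolute expiry time; monotonic() is unaffected by wall-clock jumps.
        self.cache[(name, rtype)] = (records, time.monotonic() + ttl)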

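Production DNS caches typically bound the hash table's size with LRU (or similar) eviction. They also use negative caching: when the authoritative server answers NXDOMAIN (the name does not exist) or NODATA (no record of that type), the resolver caches that negative answer for the smaller of the SOA record's TTL and its MINIMUM field (RFC 2308). This prevents repeated queries for non-existent names from reaching the authoritative servers.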
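The cache above can be extended with both features. A minimal sketch, assuming collections.OrderedDict as the LRU ordering and an injectable clock for testing; LRUDNSCache, NEGATIVE, and put_negative are hypothetical names. It keys negative entries by (name, type), which matches NODATA caching; a real resolver caches NXDOMAIN for the whole name regardless of type.

```python
import time
from collections import OrderedDict

NEGATIVE = object()  # hypothetical sentinel: "cached answer says no such record"

class LRUDNSCache:
    def __init__(self, capacity=10_000, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock
        self.entries = OrderedDict()  # (name, rtype) -> (records, expiry), LRU first

    def get(self, name, rtype):
        """Return records, NEGATIVE, or None on a miss."""
        key = (name, rtype)
        entry = self.entries.get(key)
        if entry is None:
            return None
        records, expiry = entry
        if self.clock() >= expiry:
            del self.entries[key]        # expired
            return None
        self.entries.move_to_end(key)    # mark as most recently used
        return records

    def put(self, name, rtype, records, ttl):
        key = (name, rtype)
        self.entries[key] = (records, self.clock() + ttl)
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def put_negative(self, name, rtype, soa_ttl, soa_minimum):
        # RFC 2308: cache a negative answer for min(SOA TTL, SOA MINIMUM).
        self.put(name, rtype, NEGATIVE, min(soa_ttl, soa_minimum))
```

Returning a distinct NEGATIVE value lets callers tell "known not to exist" apart from a plain miss, so only a miss triggers an upstream query.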

Key Design Decisions

  • Hierarchical resolution with caching at each level: with a warm cache, a lookup is an O(1) average hash-table hit with no queries to the hierarchy
  • Short TTL before planned changes: enables fast cutover and failover
  • GeoDNS for location-based routing: users reach their nearest datacenter
  • DNS is not for session affinity: use a load balancer for sticky sessions
  • Negative caching: keeps clients that repeatedly look up non-existent names from sending a stream of queries to authoritative servers

