Why Hash Tables?
Hash tables provide O(1) average-case lookup, insertion, and deletion — making them the most versatile data structure in coding interviews. Python’s dict, Java’s HashMap, JavaScript’s Object/Map, and Go’s map all use hash tables. Understanding internals (hash functions, collision handling, resizing) distinguishes strong candidates. Pattern recognition (two-sum, anagram grouping, sliding window with frequency map) is equally important.
How Hash Tables Work Internally
Hash Function
A hash function maps a key to an integer bucket index. Properties of a good hash function: (1) deterministic — same key always gives same hash; (2) uniform distribution — keys spread evenly across buckets to minimize collisions; (3) fast — O(1) to compute. For strings: djb2, FNV, or MurmurHash. Python’s built-in hash() uses SipHash for str and bytes (with a per-process random seed to prevent hash-flooding DoS attacks). The bucket index: index = hash(key) % num_buckets.
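The three properties can be illustrated with a minimal djb2 sketch (function names here are illustrative, not a library API):

```python
def djb2(s: str) -> int:
    """djb2 string hash: deterministic, fast, reasonably uniform."""
    h = 5381
    for ch in s:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF  # multiply-add, kept to 32 bits
    return h

def bucket_index(key: str, num_buckets: int) -> int:
    # index = hash(key) % num_buckets, as described above
    return djb2(key) % num_buckets
```

Note the trade-off: unlike Python's built-in hash(), djb2 is unseeded, so the same key hashes identically across processes, but that also makes it unsuitable where hash flooding is a concern.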
Collision Resolution: Separate Chaining
Each bucket holds a linked list (or array) of all entries that hash to that bucket. On collision, append to the list. On lookup, hash to the bucket, then scan the list for the key. Java’s HashMap converts a bucket’s list into a red-black tree once it reaches 8 entries (and the table has at least 64 buckets), bounding lookups in a pathological bucket at O(log N) instead of O(N).
class HashMap:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0

    def _bucket(self, key):
        return hash(key) % self.capacity

    def put(self, key, value):
        b = self._bucket(key)
        for i, (k, v) in enumerate(self.buckets[b]):
            if k == key:
                self.buckets[b][i] = (key, value)  # update
                return
        self.buckets[b].append((key, value))
        self.size += 1
        if self.size / self.capacity > 0.75:
            self._resize()

    def get(self, key):
        for k, v in self.buckets[self._bucket(key)]:
            if k == key:
                return v
        return None  # not found

    def _resize(self):
        old_buckets = self.buckets
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        self.size = 0
        for bucket in old_buckets:
            for k, v in bucket:
                self.put(k, v)
Collision Resolution: Open Addressing
Instead of chaining, open addressing stores all entries in the same array. On collision, probe for the next empty slot. Linear probing: try index+1, index+2, etc. Quadratic probing: try index+1², index+2², etc. Double hashing: try index + i×hash2(key). CPython’s dict uses open addressing with a pseudo-random probe sequence derived from the hash (a “perturb” shift). Open addressing is cache-friendly (no pointer chasing) but degrades badly at high load factors — never exceed 70-80% full.
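A minimal linear-probing sketch, under two simplifying assumptions: no deletion (which would require tombstone markers) and a resize threshold of 0.7. The class name is illustrative:

```python
class LinearProbingMap:
    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.slots = [self._EMPTY] * capacity
        self.size = 0

    def put(self, key, value):
        if (self.size + 1) / self.capacity > 0.7:  # stay below ~70% full
            self._resize()
        i = hash(key) % self.capacity
        while self.slots[i] is not self._EMPTY and self.slots[i][0] != key:
            i = (i + 1) % self.capacity  # linear probe: next slot
        if self.slots[i] is self._EMPTY:
            self.size += 1
        self.slots[i] = (key, value)

    def get(self, key):
        i = hash(key) % self.capacity
        while self.slots[i] is not self._EMPTY:
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.capacity  # keep probing until an empty slot
        return None

    def _resize(self):
        old = [s for s in self.slots if s is not self._EMPTY]
        self.capacity *= 2
        self.slots = [self._EMPTY] * self.capacity
        self.size = 0
        for k, v in old:
            self.put(k, v)
```

The get loop can stop at the first empty slot precisely because there are no deletions: an empty slot proves the key was never inserted along this probe path.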
Load Factor and Resizing
Load factor = entries / capacity. Java’s HashMap default: 0.75 — resize when 75% full. Resize: double capacity, rehash all entries. Rehashing is O(N) but amortized O(1) per insertion (doubling means each element is rehashed at most O(log N) times across its lifetime). Python’s dict resizes when load factor exceeds 2/3 (≈0.67).
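The “rehashed at most O(log N) times” claim is easy to check by simulation; a sketch assuming capacity 16 and load factor 0.75 as in Java (the function name is illustrative):

```python
def resize_count(n: int, capacity: int = 16, load: float = 0.75) -> int:
    """Count how many table doublings n insertions trigger (no deletions)."""
    resizes = size = 0
    for _ in range(n):
        size += 1
        if size / capacity > load:  # same trigger as the HashMap above
            capacity *= 2
            resizes += 1
    return resizes
```

For example, resize_count(10_000) returns 10: ten thousand insertions cause only ten doublings, so the O(N) rehash cost stays amortized O(1) per insertion.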
Interview Coding Patterns
Two Sum (Classic)
def two_sum(nums, target):
    seen = {}  # value → index
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []
# O(N) time, O(N) space — single pass
Longest Substring Without Repeating Characters
def length_of_longest_substring(s: str) -> int:
    last_seen = {}  # char → last index
    max_len = left = 0
    for right, c in enumerate(s):
        if c in last_seen and last_seen[c] >= left:
            left = last_seen[c] + 1  # shrink window past last occurrence
        last_seen[c] = right
        max_len = max(max_len, right - left + 1)
    return max_len
# O(N) time, O(min(N, charset)) space
# LeetCode 3
Top K Frequent Elements
import heapq
from collections import Counter

def top_k_frequent(nums, k):
    freq = Counter(nums)
    # Min-heap of size k internally: O(N log k)
    return heapq.nlargest(k, freq.keys(), key=freq.get)
# Alternative: bucket sort O(N)
# buckets[i] = list of elements with frequency i
# Iterate from high to low, collect until k elements
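The bucket-sort alternative described in the comments can be written out as follows (a sketch; the helper name is illustrative):

```python
from collections import Counter

def top_k_frequent_bucket(nums, k):
    """O(N) top-k: index buckets by frequency instead of sorting."""
    freq = Counter(nums)
    # A value can appear at most len(nums) times, so frequencies fit 0..len(nums)
    buckets = [[] for _ in range(len(nums) + 1)]
    for val, f in freq.items():
        buckets[f].append(val)
    result = []
    for f in range(len(nums), 0, -1):  # highest frequency first
        for val in buckets[f]:
            result.append(val)
            if len(result) == k:
                return result
    return result
```

This trades the heap’s O(N log k) for O(N) time and O(N) extra space, exploiting the bounded frequency range.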
Group Anagrams
from collections import defaultdict

def group_anagrams(strs):
    groups = defaultdict(list)
    for s in strs:
        groups[tuple(sorted(s))].append(s)
    return list(groups.values())
# O(N × K log K) where K = max string length
# LeetCode 49
Subarray Sum Equals K
def subarray_sum(nums, k) -> int:
    count = prefix = 0
    freq = {0: 1}  # the empty prefix has sum 0, seen once
    for n in nums:
        prefix += n
        count += freq.get(prefix - k, 0)
        freq[prefix] = freq.get(prefix, 0) + 1
    return count
# The key insight: prefix[j] - prefix[i] = k ↔ prefix[i] = prefix[j] - k
# LeetCode 560
LRU Cache (LinkedHashMap)
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.cap = capacity
        self.cache = OrderedDict()  # maintains insertion order

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # mark as recently used
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.cap:
            self.cache.popitem(last=False)  # evict LRU (front)
# LeetCode 146
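If the interviewer disallows OrderedDict, the same structure can be hand-rolled as a hash map plus a doubly linked list with sentinel nodes, which is roughly what OrderedDict does internally. A sketch (Node and LRUCacheManual are illustrative names):

```python
class Node:
    __slots__ = ("key", "val", "prev", "next")
    def __init__(self, key=0, val=0):
        self.key, self.val = key, val
        self.prev = self.next = None

class LRUCacheManual:
    def __init__(self, capacity: int):
        self.cap = capacity
        self.map = {}  # key → Node, O(1) lookup
        self.head, self.tail = Node(), Node()  # sentinels: head side = LRU
        self.head.next, self.tail.prev = self.tail, self.head

    def _remove(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _append(self, node):  # insert just before tail (MRU end)
        node.prev, node.next = self.tail.prev, self.tail
        self.tail.prev.next = node
        self.tail.prev = node

    def get(self, key: int) -> int:
        if key not in self.map:
            return -1
        node = self.map[key]
        self._remove(node)
        self._append(node)  # mark as recently used
        return node.val

    def put(self, key: int, value: int) -> None:
        if key in self.map:
            self._remove(self.map[key])
        node = Node(key, value)
        self.map[key] = node
        self._append(node)
        if len(self.map) > self.cap:
            lru = self.head.next  # least recently used
            self._remove(lru)
            del self.map[lru.key]
```

Storing the key on each node is what makes O(1) eviction possible: when the list gives up its LRU node, the key is right there for the dict deletion.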
HashMap vs HashSet vs Counter
- HashMap/dict: key-value pairs — use when you need to associate a value with each key
- HashSet/set: unique keys only, O(1) membership test — use when you need “have I seen this before?”
- Counter (Python) / frequency map: specialized HashMap counting occurrences — use for anagram/frequency problems
- defaultdict(list): HashMap with default value — avoids KeyError on first access; ideal for grouping
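A quick side-by-side of the four, on toy data (illustrative only):

```python
from collections import Counter, defaultdict

nums = [1, 2, 2, 3, 3, 3]

seen = set(nums)        # HashSet: "have I seen this before?"
counts = Counter(nums)  # frequency map: occurrences per value
groups = defaultdict(list)
for n in nums:
    groups[n % 2].append(n)  # grouping by parity, no KeyError on first access
```

Here `2 in seen` is True, `counts[3]` is 3, and `groups[0]` collects the even values without any setdefault boilerplate.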
Key Interview Takeaways
- Hash tables are O(1) average for get/put/delete; O(N) worst case (all keys in one bucket)
- Java HashMap: chaining → red-black tree (8+ entries per bucket); Python dict: open addressing with pseudo-random probing
- Load factor 0.75 triggers resize (double capacity, rehash all) — amortized O(1) insertion
- Two-sum: store complement in map as you scan; one pass, O(N)
- Sliding window with character map: shrink from left when a duplicate enters the window
- LRU Cache = HashMap + doubly-linked list (OrderedDict wraps both in Python)