Why Hash Tables?
Hash tables provide O(1) average-case lookup, insertion, and deletion — making them the most versatile data structure in coding interviews. Python’s dict, Java’s HashMap, JavaScript’s Object/Map, and Go’s map all use hash tables. Understanding internals (hash functions, collision handling, resizing) distinguishes strong candidates. Pattern recognition (two-sum, anagram grouping, sliding window with frequency map) is equally important.
How Hash Tables Work Internally
Hash Function
A hash function maps a key to an integer bucket index. Properties of a good hash function: (1) deterministic — same key always gives same hash; (2) uniform distribution — keys spread evenly across buckets to minimize collisions; (3) fast — O(1) to compute. For strings: djb2, FNV, or MurmurHash. Python’s built-in hash() uses SipHash for str and bytes (with a per-process random seed to prevent hash-flooding DoS attacks). The bucket index: index = hash(key) % num_buckets.
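The three properties can be illustrated with a minimal djb2 sketch (function names here are illustrative, not a library API):

```python
def djb2(s: str) -> int:
    """djb2 string hash: deterministic, fast, reasonably uniform."""
    h = 5381
    for ch in s:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF  # multiply-add, kept to 32 bits
    return h

def bucket_index(key: str, num_buckets: int) -> int:
    # index = hash(key) % num_buckets, as described above
    return djb2(key) % num_buckets
```

Note the trade-off: unlike Python's built-in hash(), djb2 is unseeded, so the same key hashes identically across processes, but that also makes it unsuitable where hash flooding is a concern.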
Collision Resolution: Separate Chaining
Each bucket holds a linked list (or array) of all entries that hash to that bucket. On collision, append to the list. On lookup, hash to the bucket, then scan the list for the key. Java’s HashMap converts a bucket’s list into a red-black tree once it reaches 8 entries (and the table has at least 64 buckets), bounding lookups in a pathological bucket at O(log N) instead of O(N).
class HashMap:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0

    def _bucket(self, key):
        return hash(key) % self.capacity

    def put(self, key, value):
        b = self._bucket(key)
        for i, (k, v) in enumerate(self.buckets[b]):
            if k == key:
                self.buckets[b][i] = (key, value)  # update
                return
        self.buckets[b].append((key, value))
        self.size += 1
        if self.size / self.capacity > 0.75:
            self._resize()

    def get(self, key):
        for k, v in self.buckets[self._bucket(key)]:
            if k == key:
                return v
        return None  # not found

    def _resize(self):
        old_buckets = self.buckets
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        self.size = 0
        for bucket in old_buckets:
            for k, v in bucket:
                self.put(k, v)
Collision Resolution: Open Addressing
Instead of chaining, open addressing stores all entries in the same array. On collision, probe for the next empty slot. Linear probing: try index+1, index+2, etc. Quadratic probing: try index+1², index+2², etc. Double hashing: try index + i×hash2(key). CPython’s dict uses open addressing with a pseudo-random probe sequence derived from the hash (a “perturb” shift). Open addressing is cache-friendly (no pointer chasing) but degrades badly at high load factors — never exceed 70-80% full.
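A minimal linear-probing sketch, under two simplifying assumptions: no deletion (which would require tombstone markers) and a resize threshold of 0.7. The class name is illustrative:

```python
class LinearProbingMap:
    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.slots = [self._EMPTY] * capacity
        self.size = 0

    def put(self, key, value):
        if (self.size + 1) / self.capacity > 0.7:  # stay below ~70% full
            self._resize()
        i = hash(key) % self.capacity
        while self.slots[i] is not self._EMPTY and self.slots[i][0] != key:
            i = (i + 1) % self.capacity  # linear probe: next slot
        if self.slots[i] is self._EMPTY:
            self.size += 1
        self.slots[i] = (key, value)

    def get(self, key):
        i = hash(key) % self.capacity
        while self.slots[i] is not self._EMPTY:
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.capacity  # keep probing until an empty slot
        return None

    def _resize(self):
        old = [s for s in self.slots if s is not self._EMPTY]
        self.capacity *= 2
        self.slots = [self._EMPTY] * self.capacity
        self.size = 0
        for k, v in old:
            self.put(k, v)
```

The get loop can stop at the first empty slot precisely because there are no deletions: an empty slot proves the key was never inserted along this probe path.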
Load Factor and Resizing
Load factor = entries / capacity. Java’s HashMap default: 0.75 — resize when 75% full. Resize: double capacity, rehash all entries. Rehashing is O(N) but amortized O(1) per insertion (doubling means each element is rehashed at most O(log N) times across its lifetime). Python’s dict resizes when load factor exceeds 2/3 (≈0.67).
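The “rehashed at most O(log N) times” claim is easy to check by simulation; a sketch assuming capacity 16 and load factor 0.75 as in Java (the function name is illustrative):

```python
def resize_count(n: int, capacity: int = 16, load: float = 0.75) -> int:
    """Count how many table doublings n insertions trigger (no deletions)."""
    resizes = size = 0
    for _ in range(n):
        size += 1
        if size / capacity > load:  # same trigger as the HashMap above
            capacity *= 2
            resizes += 1
    return resizes
```

For example, resize_count(10_000) returns 10: ten thousand insertions cause only ten doublings, so the O(N) rehash cost stays amortized O(1) per insertion.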
Interview Coding Patterns
Two Sum (Classic)
def two_sum(nums, target):
    seen = {}  # value → index
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []
# O(N) time, O(N) space — single pass
Longest Substring Without Repeating Characters
def length_of_longest_substring(s: str) -> int:
    last_seen = {}  # char → last index
    max_len = left = 0
    for right, c in enumerate(s):
        if c in last_seen and last_seen[c] >= left:
            left = last_seen[c] + 1  # shrink window past last occurrence
        last_seen[c] = right
        max_len = max(max_len, right - left + 1)
    return max_len
# O(N) time, O(min(N, charset)) space
# LeetCode 3
Top K Frequent Elements
import heapq
from collections import Counter

def top_k_frequent(nums, k):
    freq = Counter(nums)
    # Min-heap of size k internally: O(N log k)
    return heapq.nlargest(k, freq.keys(), key=freq.get)
# Alternative: bucket sort O(N)
# buckets[i] = list of elements with frequency i
# Iterate from high to low, collect until k elements
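The bucket-sort alternative described in the comments can be written out as follows (a sketch; the helper name is illustrative):

```python
from collections import Counter

def top_k_frequent_bucket(nums, k):
    """O(N) top-k: index buckets by frequency instead of sorting."""
    freq = Counter(nums)
    # A value can appear at most len(nums) times, so frequencies fit 0..len(nums)
    buckets = [[] for _ in range(len(nums) + 1)]
    for val, f in freq.items():
        buckets[f].append(val)
    result = []
    for f in range(len(nums), 0, -1):  # highest frequency first
        for val in buckets[f]:
            result.append(val)
            if len(result) == k:
                return result
    return result
```

This trades the heap’s O(N log k) for O(N) time and O(N) extra space, exploiting the bounded frequency range.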
Group Anagrams
from collections import defaultdict

def group_anagrams(strs):
    groups = defaultdict(list)
    for s in strs:
        groups[tuple(sorted(s))].append(s)
    return list(groups.values())
# O(N × K log K) where K = max string length
# LeetCode 49
Subarray Sum Equals K
def subarray_sum(nums, k) -> int:
    count = prefix = 0
    freq = {0: 1}  # the empty prefix has sum 0, seen once
    for n in nums:
        prefix += n
        count += freq.get(prefix - k, 0)
        freq[prefix] = freq.get(prefix, 0) + 1
    return count
# The key insight: prefix[j] - prefix[i] = k ↔ prefix[i] = prefix[j] - k
# LeetCode 560
LRU Cache (LinkedHashMap)
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.cap = capacity
        self.cache = OrderedDict()  # maintains insertion order

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # mark as recently used
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.cap:
            self.cache.popitem(last=False)  # evict LRU (front)
# LeetCode 146
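If the interviewer disallows OrderedDict, the same structure can be hand-rolled as a hash map plus a doubly linked list with sentinel nodes, which is roughly what OrderedDict does internally. A sketch (Node and LRUCacheManual are illustrative names):

```python
class Node:
    __slots__ = ("key", "val", "prev", "next")
    def __init__(self, key=0, val=0):
        self.key, self.val = key, val
        self.prev = self.next = None

class LRUCacheManual:
    def __init__(self, capacity: int):
        self.cap = capacity
        self.map = {}  # key → Node, O(1) lookup
        self.head, self.tail = Node(), Node()  # sentinels: head side = LRU
        self.head.next, self.tail.prev = self.tail, self.head

    def _remove(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _append(self, node):  # insert just before tail (MRU end)
        node.prev, node.next = self.tail.prev, self.tail
        self.tail.prev.next = node
        self.tail.prev = node

    def get(self, key: int) -> int:
        if key not in self.map:
            return -1
        node = self.map[key]
        self._remove(node)
        self._append(node)  # mark as recently used
        return node.val

    def put(self, key: int, value: int) -> None:
        if key in self.map:
            self._remove(self.map[key])
        node = Node(key, value)
        self.map[key] = node
        self._append(node)
        if len(self.map) > self.cap:
            lru = self.head.next  # least recently used
            self._remove(lru)
            del self.map[lru.key]
```

Storing the key on each node is what makes O(1) eviction possible: when the list gives up its LRU node, the key is right there for the dict deletion.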
HashMap vs HashSet vs Counter
- HashMap/dict: key-value pairs — use when you need to associate a value with each key
- HashSet/set: unique keys only, O(1) membership test — use when you need “have I seen this before?”
- Counter (Python) / frequency map: specialized HashMap counting occurrences — use for anagram/frequency problems
- defaultdict(list): HashMap with default value — avoids KeyError on first access; ideal for grouping
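A quick side-by-side of the four, on toy data (illustrative only):

```python
from collections import Counter, defaultdict

nums = [1, 2, 2, 3, 3, 3]

seen = set(nums)        # HashSet: "have I seen this before?"
counts = Counter(nums)  # frequency map: occurrences per value
groups = defaultdict(list)
for n in nums:
    groups[n % 2].append(n)  # grouping by parity, no KeyError on first access
```

Here `2 in seen` is True, `counts[3]` is 3, and `groups[0]` collects the even values without any setdefault boilerplate.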
Key Interview Takeaways
- Hash tables are O(1) average for get/put/delete; O(N) worst case (all keys in one bucket)
- Java HashMap: chaining → red-black tree (8+ entries per bucket); Python dict: open addressing with pseudo-random probing
- Load factor 0.75 triggers resize (double capacity, rehash all) — amortized O(1) insertion
- Two-sum: store complement in map as you scan; one pass, O(N)
- Sliding window with character map: shrink from left when a duplicate enters the window
- LRU Cache = HashMap + doubly-linked list (OrderedDict wraps both in Python)