Question 1

How does a hash table achieve O(1) average lookup time?

Accepted Answer

A hash table stores key-value pairs in an array. To look up a key: (1) compute hash(key) to get an integer, (2) map it to an array index: index = hash % array_size, (3) retrieve the value at that index. With a good hash function that distributes keys uniformly and a load factor below 0.75, most array slots hold 0 or 1 entries, so a lookup is usually a single array access: O(1) on average. When collisions occur (two keys map to the same index), resolution adds a small cost that depends on the strategy.

Chaining: each slot holds a linked list of the entries that hash to it, and a lookup traverses that list (average length = load_factor, typically below 1). Pros: simple, tolerates high load factors. Cons: pointer overhead, cache-unfriendly (list nodes scattered in memory). Used by: Java HashMap, and Go's map before Go 1.24 (buckets with overflow chains).

Open addressing: all entries are stored in the array itself. On collision, probe for the next available slot (linear probing: check i, i+1, i+2; quadratic: i, i+1, i+4, i+9; double hashing: i, i+h2(key), i+2*h2(key)). Lookup probes until it finds the key or an empty slot. Deletion uses tombstones (mark the slot as deleted so later lookups continue probing past it). Pros: cache-friendly (contiguous memory), no pointer overhead. Cons: the load factor must stay well below 1 (commonly around 0.7), and linear probing suffers from clustering. Used by: Python dict, Rust HashMap.

Choice: open addressing is faster for small-to-medium tables (cache efficiency). Chaining is simpler and handles high load factors better.
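The open-addressing scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not how Python's dict or any other production table is implemented: the class name LinearProbingTable and its methods are hypothetical, the capacity is fixed (no resizing), and it relies on Python's built-in hash().

```python
_EMPTY = object()      # marks a slot that has never held an entry
_TOMBSTONE = object()  # marks a slot whose entry was deleted


class LinearProbingTable:
    """Fixed-capacity open-addressing table (illustrative; never resizes)."""

    def __init__(self, capacity=8):
        self._capacity = capacity
        self._keys = [_EMPTY] * capacity
        self._values = [None] * capacity

    def _probe(self, key):
        """Yield slots i, i+1, i+2, ... (wrapping around), each at most once."""
        start = hash(key) % self._capacity
        for step in range(self._capacity):
            yield (start + step) % self._capacity

    def put(self, key, value):
        first_free = None  # first tombstone or empty slot seen while probing
        for i in self._probe(key):
            slot = self._keys[i]
            if slot is _EMPTY:
                if first_free is None:
                    first_free = i
                break  # an empty slot proves the key is not in the table
            if slot is _TOMBSTONE:
                if first_free is None:
                    first_free = i
            elif slot == key:
                self._values[i] = value  # key already present: overwrite
                return
        if first_free is None:
            raise RuntimeError("table is full")
        self._keys[first_free] = key
        self._values[first_free] = value

    def get(self, key, default=None):
        for i in self._probe(key):
            slot = self._keys[i]
            if slot is _EMPTY:
                return default  # reached an empty slot: key is absent
            if slot is not _TOMBSTONE and slot == key:
                return self._values[i]
        return default

    def delete(self, key):
        for i in self._probe(key):
            slot = self._keys[i]
            if slot is _EMPTY:
                return False
            if slot is not _TOMBSTONE and slot == key:
                # Leave a tombstone so lookups keep probing past this slot.
                self._keys[i] = _TOMBSTONE
                self._values[i] = None
                return True
        return False
```

With a capacity of 8, the integer keys 1, 9, and 17 all start probing at the same slot, so deleting 9 and then looking up 17 exercises the tombstone path: the lookup must step over the deleted slot rather than stop at it.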

Question 2

What happens when a hash table needs to resize?

Accepted Answer

When the load factor (entries / array_size) exceeds a threshold (0.75 for Java, 2/3 for Python), the table resizes to maintain O(1) performance. Process: (1) allocate a new array with double the capacity, (2) rehash every entry (new index = hash % new_capacity), (3) insert each entry into the new array. A single resize is O(N) because every entry is moved. However, since the capacity doubles each time, the resizes together copy about N + N/2 + N/4 + ... < 2N entries over N inserts, so the amortized cost per insert remains O(1). (An entry inserted early may be moved O(log N) times, but the average number of moves per entry is a constant.)

Important for interviews: insert is O(1) amortized but O(N) worst case (during a rehash). For real-time systems, this spike is problematic. Solutions: incremental rehashing (Redis does this -- it moves a few entries per operation instead of all at once), or pre-allocating for a known maximum size. Some implementations also shrink when the load factor drops below 0.25 to reclaim memory.
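The amortized argument can be checked empirically. The sketch below is a hypothetical chained table (the names ChainedTable and moves are made up for illustration) that doubles its capacity when the load factor exceeds 0.75 and counts how many entries all resizes have moved in total. It assumes Python's built-in hash() and is not modeled on any particular library.

```python
class ChainedTable:
    """Chained hash table that doubles on resize and counts moved entries."""

    def __init__(self, capacity=8, max_load=0.75):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0
        self._max_load = max_load
        self.moves = 0  # total entries moved by all resizes so far

    @property
    def capacity(self):
        return len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[hash(key) % self.capacity]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value  # existing key: overwrite, no growth
                return
        bucket.append([key, value])
        self._size += 1
        if self._size / self.capacity > self._max_load:
            self._resize(2 * self.capacity)

    def get(self, key, default=None):
        for k, v in self._buckets[hash(key) % self.capacity]:
            if k == key:
                return v
        return default

    def _resize(self, new_capacity):
        # O(N) step: every entry is rehashed into the larger array.
        new_buckets = [[] for _ in range(new_capacity)]
        for bucket in self._buckets:
            for pair in bucket:
                new_buckets[hash(pair[0]) % new_capacity].append(pair)
                self.moves += 1
        self._buckets = new_buckets


table = ChainedTable()
n = 10_000
for i in range(n):
    table.put(i, i * i)

# Total rehash work stays proportional to n, not n * log(n).
print(table.capacity, table.moves, table.moves / n)
```

The ratio of moves to inserts stays near a small constant as n grows, which is what O(1) amortized insertion means in practice. Individual put calls that trigger a resize are still O(N).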

Question 3

What makes a good hash function for a hash table?

Accepted Answer

Properties: (1) Uniform distribution -- keys spread evenly across buckets. A function that clusters outputs wastes buckets and creates long chains. (2) Deterministic -- the same input always produces the same hash. (3) Fast -- O(1) for fixed-size keys, O(L) for variable-length keys of length L (strings). Hashing should not be the bottleneck. (4) Avalanche effect -- small input changes produce vastly different hashes. This prevents similar keys from clustering.

Common approaches: for integers -- the multiplication method, or modular arithmetic with a prime table size. For strings -- a polynomial rolling hash (h = h * 31 + char, used by Java String.hashCode). For general use -- MurmurHash3 or xxHash (fast, excellent distribution).

Security consideration: predictable hash functions enable hash flooding DoS attacks (an attacker crafts keys that all collide). Python randomizes the hash seed for str and bytes per process. Java 8+ HashMap converts long chains to balanced trees, which caps lookups in a flooded bucket at O(log N). In interviews, know that hash function quality directly affects performance -- a bad hash function turns O(1) into O(N).
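The polynomial rolling hash and the collision problem can both be seen in a short sketch. The function below reimplements the h = h * 31 + char recurrence in Python with 32-bit signed wraparound. The name java_string_hash is made up for this example, and it matches Java's String.hashCode only for strings whose characters each fit in one UTF-16 code unit.

```python
from itertools import product


def java_string_hash(s: str) -> int:
    """Polynomial rolling hash, h = 31 * h + char, wrapped to a signed 32-bit int.

    Assumption: each character fits in one UTF-16 code unit, so ord(ch)
    equals the value Java would use.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep the low 32 bits
    # Reinterpret the unsigned 32-bit value as signed, as Java's int would.
    return h - 0x100000000 if h >= 0x80000000 else h


print(java_string_hash("abc"))  # 96354

# "Aa" and "BB" hash to the same value, so every concatenation of these
# two blocks collides as well. This is the basis of a hash flooding attack.
colliding_keys = ["".join(parts) for parts in product(["Aa", "BB"], repeat=3)]
print(len(colliding_keys), {java_string_hash(k) for k in colliding_keys})
```

Each extra block doubles the number of colliding keys, so an attacker can cheaply generate thousands of strings that land in one bucket. Seed randomization and tree-based buckets are the mitigations mentioned above.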

Coding Interview: Hash Table Internals — Hash Functions, Collision Resolution, Chaining, Open Addressing, Rehashing

How Hash Tables Work

Hash Functions

Collision Resolution: Chaining

Collision Resolution: Open Addressing

Rehashing and Dynamic Resizing