Question 1

How does a polynomial rolling hash work for strings?

Accepted Answer

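A polynomial rolling hash maps a string of length L to an integer: hash(s) = s[0]*p^(L-1) + s[1]*p^(L-2) + ... + s[L-1]*p^0 (mod M). Common choices: p=31 for lowercase letters, p=53 for uppercase+lowercase (p is usually a prime at least as large as the alphabet size), M=10^9+7 (prime). Each character maps to its position in the alphabet (a=1, b=2, ...); starting at 1 rather than 0 keeps "a", "aa", and "aaa" from all hashing to 0. Equal strings always hash to the same value, and different strings map to different values with high probability. For substring hashing in O(1): build a prefix hash array where prefix[0] = 0 and prefix[i+1] = (prefix[i]*p + s[i]) mod M, so prefix[i] is the hash of the first i characters, together with a power table p_pow[k] = p^k mod M. Query hash(s[l..r]) = (prefix[r+1] - prefix[l] * p_pow[r-l+1]) mod M; in languages where % can return a negative result, add M before the final reduction. This enables comparing any two substrings in O(1) after O(n) preprocessing.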
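The prefix-array construction and the O(1) query can be sketched as follows. This is a minimal illustration, assuming lowercase input and the p=31, M=10^9+7 parameters mentioned above; the function names build_prefix_hash and substring_hash are made up for this example.

```python
P = 31
M = 10**9 + 7


def build_prefix_hash(s):
    """Return (prefix, p_pow) for a lowercase string s.

    prefix[i] is the hash of the first i characters of s,
    p_pow[k] is P**k mod M.
    """
    n = len(s)
    prefix = [0] * (n + 1)
    p_pow = [1] * (n + 1)
    for i, ch in enumerate(s):
        val = ord(ch) - ord("a") + 1  # a=1, b=2, ...
        prefix[i + 1] = (prefix[i] * P + val) % M
        p_pow[i + 1] = (p_pow[i] * P) % M
    return prefix, p_pow


def substring_hash(prefix, p_pow, l, r):
    """Hash of s[l..r], both ends inclusive, in O(1)."""
    # Python's % already returns a non-negative result for a positive modulus.
    return (prefix[r + 1] - prefix[l] * p_pow[r - l + 1]) % M


prefix, p_pow = build_prefix_hash("abcabc")
same = substring_hash(prefix, p_pow, 0, 2) == substring_hash(prefix, p_pow, 3, 5)
```

Here same is True because both ranges spell "abc". Equal hashes only suggest equal substrings; a direct comparison is still needed when a false positive is unacceptable.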

Question 2

How does Rabin-Karp use rolling hash for pattern matching?

Accepted Answer

Rabin-Karp computes the hash of the pattern (length m) and the hash of each window of length m in the text (length n). If the hashes match, verify with a direct string comparison (to handle hash collisions). The key optimization: computing the hash of the next window from the current window takes O(1) using the rolling hash formula: remove the contribution of the leftmost character, multiply by p to shift the remaining characters, and add the new rightmost character. Without rolling: recomputing each window hash from scratch takes O(m) per window → O(nm) total. With rolling: O(1) per window → O(n+m) expected total; the worst case is still O(nm) if many windows collide or genuinely match and each one is verified. Use case: finding all occurrences of pattern P in text T, or finding if any of k patterns of the same length appears in T (compute all k pattern hashes, check each window against the set; patterns of different lengths need one pass per distinct length).
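The rolling update described above might look like this in practice. It is a sketch under stated assumptions: lowercase input, the a=1 character mapping, and default parameters p=31 and mod=10^9+7; the name rabin_karp and its signature are illustrative only.

```python
def rabin_karp(text, pattern, p=31, mod=10**9 + 7):
    """Return the start indices of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    def val(ch):
        return ord(ch) - ord("a") + 1  # a=1, b=2, ...

    high = pow(p, m - 1, mod)  # weight of the leftmost character in a window

    target = 0  # hash of the pattern
    window = 0  # hash of the current text window
    for i in range(m):
        target = (target * p + val(pattern[i])) % mod
        window = (window * p + val(text[i])) % mod

    matches = []
    for start in range(n - m + 1):
        # Verify on a hash match so that collisions cannot produce false hits.
        if window == target and text[start:start + m] == pattern:
            matches.append(start)
        if start + m < n:
            # Roll: drop the leftmost character, shift, append the next one.
            window = (window - val(text[start]) * high) % mod
            window = (window * p + val(text[start + m])) % mod
    return matches
```

For example, rabin_karp("abracadabra", "abra") returns [0, 7].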

Question 3

How do you find the longest duplicate substring using binary search and hashing?

Accepted Answer

Binary search on the length L: if there exists a duplicate substring of length L, then there also exists one of length L-1 (dropping the last character of a duplicated substring leaves a shorter substring that is still duplicated). This monotonic property enables binary search. For each candidate length L: use a prefix hash array to compute the hash of every substring of length L in O(n) total. Store the hashes in a set; if any hash is seen twice, a duplicate (probably) exists. Binary search finds the maximum valid L in O(log n) iterations, each O(n) expected → O(n log n) expected total. Handle hash collisions by double-hashing or by comparing the actual substrings whenever two hashes match. This is LC 1044. Suffix array approach: O(n log n) build + O(n) LCP → also O(n log n) but collision-free.
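One way to put the binary search and the hash set together is sketched below. It assumes lowercase input and verifies every hash match by direct comparison instead of double hashing; the function name and the dictionary of start indices are choices made for this illustration, not the only way to structure it.

```python
def longest_duplicate_substring(s, p=31, mod=10**9 + 7):
    """Return one longest substring that occurs at least twice in s."""
    n = len(s)
    prefix = [0] * (n + 1)
    p_pow = [1] * (n + 1)
    for i, ch in enumerate(s):
        prefix[i + 1] = (prefix[i] * p + ord(ch) - ord("a") + 1) % mod
        p_pow[i + 1] = (p_pow[i] * p) % mod

    def find_duplicate(length):
        """Start index of a repeated substring of this length, or -1."""
        seen = {}  # hash -> start indices that produced it
        for start in range(n - length + 1):
            h = (prefix[start + length] - prefix[start] * p_pow[length]) % mod
            for prev in seen.get(h, ()):
                # Compare the real substrings to rule out a collision.
                if s[prev:prev + length] == s[start:start + length]:
                    return start
            seen.setdefault(h, []).append(start)
        return -1

    lo, hi = 1, n - 1
    best_start, best_len = 0, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        idx = find_duplicate(mid)
        if idx != -1:
            best_start, best_len = idx, mid
            lo = mid + 1  # a duplicate of length mid exists, try longer
        else:
            hi = mid - 1  # none of length mid, so none longer either
    return s[best_start:best_start + best_len]
```

For "banana" this returns "ana". The verification step costs O(L) per hash match, so inputs with many repeated substrings run slower than the O(n log n) estimate; storing (h1, h2) pairs and skipping verification trades that cost for a small collision risk.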

Question 4

What is double hashing and why is it used?

Accepted Answer

Double hashing uses two independent hash functions and stores (hash1, hash2) pairs as the effective hash. If M1 and M2 are both ~10^9, the collision probability of double hashing is ~1/(M1*M2) ≈ 10^-18 per comparison. Single hashing has collision probability ~1/M ≈ 10^-9 per comparison. With roughly n^2 pairwise comparisons, the expected number of collisions is about n^2/M, which already reaches about 1 for n ≈ 30,000 (n^2 ≈ 9*10^8). Adversarial inputs can also be crafted to force collisions against a single hash with known parameters, causing wrong answers, or O(n^2) behavior in hash-set solutions that verify every match. Double hashing makes both problems far less likely, although it does not rule out collisions entirely. Implementation: maintain two prefix hash arrays with different (p, M) pairs; store (h1, h2) tuples in the set; compare tuples.
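The implementation notes above can be sketched as a small class. The second parameter pair (p=37, M=10^9+9) is an assumption made for this example, as are the class name DoubleHash and the method name get; the source only requires two different (p, M) pairs.

```python
class DoubleHash:
    """Two prefix hash arrays with different (p, M) pairs over a lowercase string."""

    def __init__(self, s, p1=31, m1=10**9 + 7, p2=37, m2=10**9 + 9):
        self.params = ((p1, m1), (p2, m2))
        self.prefix = []
        self.p_pow = []
        for p, m in self.params:
            pre = [0] * (len(s) + 1)
            pw = [1] * (len(s) + 1)
            for i, ch in enumerate(s):
                pre[i + 1] = (pre[i] * p + ord(ch) - ord("a") + 1) % m
                pw[i + 1] = (pw[i] * p) % m
            self.prefix.append(pre)
            self.p_pow.append(pw)

    def get(self, l, r):
        """Return the (h1, h2) pair for s[l..r], both ends inclusive."""
        parts = []
        for (_, m), pre, pw in zip(self.params, self.prefix, self.p_pow):
            parts.append((pre[r + 1] - pre[l] * pw[r - l + 1]) % m)
        return tuple(parts)


dh = DoubleHash("abcabc")
seen = {dh.get(i, i + 2) for i in range(4)}  # all substrings of length 3
```

The tuples are hashable, so they can go straight into a set or be used as dictionary keys. In this example seen holds three pairs, one each for "abc", "bca", and "cab".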

Question 5

When should you use string hashing vs KMP vs Z-algorithm for pattern matching?

Accepted Answer

KMP (Knuth-Morris-Pratt) and Z-algorithm: O(n+m) worst case, no collision risk, easy to implement, but handle only one pattern at a time. Use when matching a single fixed pattern, especially in production code where correctness matters more than generality. Rabin-Karp (rolling hash): O(n+m) average, O(nm) worst case (rare with a good hash), handles multiple patterns of the same length efficiently (store all pattern hashes in a set, check each window against the set in O(1)). Use for multi-pattern matching or substring duplicate problems. Suffix array: O(n log n) build, O(m log n) search per query, handles all substring queries. Use for: longest common substring of multiple strings, number of distinct substrings, lexicographically sorted suffixes. For interviews: rolling hash is usually the right choice because it is simple, flexible, and fast enough.
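For comparison with the hash-based approach, here is a sketch of the collision-free single-pattern option using the Z-algorithm. It assumes the separator character (a NUL byte by default) appears in neither string; the names z_function and z_search are illustrative.

```python
def z_function(s):
    """z[i] = length of the longest common prefix of s and s[i:], with z[0] = 0."""
    n = len(s)
    z = [0] * n
    l = r = 0  # [l, r) is the rightmost prefix-matching segment found so far
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z


def z_search(text, pattern, sep="\x00"):
    """Return the start indices of all occurrences of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # The separator caps every z-value at m, so z[i] == m marks a full match.
    z = z_function(pattern + sep + text)
    return [i - m - 1 for i in range(m + 1, len(z)) if z[i] == m]
```

z_search("abracadabra", "abra") returns [0, 7], the same result a rolling-hash search would give, with no hashing and therefore no verification step.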

String Hashing Interview Patterns

Why String Hashing?

Polynomial Rolling Hash

Rabin-Karp Pattern Matching

Longest Duplicate Substring (LC 1044)

Double Hashing (Reduce Collision Probability)

String Hashing vs Suffix Array vs Z-Algorithm

When to Use String Hashing