Strings appear in roughly 25-30% of coding interviews. Unlike arrays or trees, string problems have a concentrated set of patterns – master these and most string problems become recognizable variants.
Sliding Window for Substring Problems
The sliding window pattern solves most “find a substring satisfying condition X” problems in O(n).
from collections import defaultdict
def min_window_substring(s: str, t: str) -> str:
"""LC 76 - Minimum Window Substring"""
need = defaultdict(int)
for c in t:
need[c] += 1
have, required = 0, len(need)
window = defaultdict(int)
left = 0
result = ""
min_len = float('inf')
for right, c in enumerate(s):
window[c] += 1
if c in need and window[c] == need[c]:
have += 1
while have == required:
# Update result
if right - left + 1 < min_len:
min_len = right - left + 1
result = s[left:right+1]
# Shrink window
left_c = s[left]
window[left_c] -= 1
if left_c in need and window[left_c] int:
"""LC 3 - Longest Substring Without Repeating Characters"""
seen = {}
left = 0
best = 0
for right, c in enumerate(s):
if c in seen and seen[c] >= left:
left = seen[c] + 1
seen[c] = right
best = max(best, right - left + 1)
return best
def find_anagrams(s: str, p: str) -> list:
"""LC 438 - Find All Anagrams in a String"""
if len(p) > len(s):
return []
p_count = [0] * 26
w_count = [0] * 26
for c in p:
p_count[ord(c) - ord('a')] += 1
result = []
for i, c in enumerate(s):
w_count[ord(c) - ord('a')] += 1
if i >= len(p):
out = s[i - len(p)]
w_count[ord(out) - ord('a')] -= 1
if w_count == p_count:
result.append(i - len(p) + 1)
return result
Template recognition: If the problem asks for a substring (contiguous) satisfying some count/frequency constraint, reach for sliding window first.
KMP (Knuth-Morris-Pratt) Pattern Matching
Brute-force string search is O(n*m). KMP achieves O(n+m) by precomputing a failure function that tells us how much to advance the pattern pointer on mismatch.
def build_failure_function(pattern: str) -> list:
"""
failure[i] = length of longest proper prefix of pattern[0..i]
that is also a suffix.
"""
m = len(pattern)
failure = [0] * m
j = 0 # length of previous longest prefix-suffix
for i in range(1, m):
while j > 0 and pattern[i] != pattern[j]:
j = failure[j - 1] # fall back
if pattern[i] == pattern[j]:
j += 1
failure[i] = j
return failure
def kmp_search(text: str, pattern: str) -> list:
"""Returns all start indices where pattern occurs in text."""
if not pattern:
return list(range(len(text) + 1))
failure = build_failure_function(pattern)
matches = []
j = 0 # number of chars matched in pattern
for i, c in enumerate(text):
while j > 0 and c != pattern[j]:
j = failure[j - 1]
if c == pattern[j]:
j += 1
if j == len(pattern):
matches.append(i - j + 1)
j = failure[j - 1]
return matches
# Example
print(kmp_search("ababcababcabc", "ababc")) # [0, 5, 7]
When KMP matters in interviews: Most interviewers accept Python’s str.find() or in operator. Knowing KMP signals depth. Bring it up proactively: “I’d use KMP here for O(n+m) – want me to implement the failure function?”
Rabin-Karp Rolling Hash
Rolling hash enables O(1) hash recomputation as you slide a window, making O(n) substring search possible.
def rabin_karp(text: str, pattern: str) -> list:
"""Find all occurrences of pattern in text using rolling hash."""
n, m = len(text), len(pattern)
if m > n:
return []
BASE = 31
MOD = 10**9 + 7
matches = []
def char_val(c):
return ord(c) - ord('a') + 1
# Precompute BASE^m mod MOD
base_m = pow(BASE, m, MOD)
# Initial hash for pattern and first window
p_hash = 0
t_hash = 0
for i in range(m):
p_hash = (p_hash * BASE + char_val(pattern[i])) % MOD
t_hash = (t_hash * BASE + char_val(text[i])) % MOD
for i in range(n - m + 1):
if t_hash == p_hash:
# Verify to handle hash collisions
if text[i:i+m] == pattern:
matches.append(i)
if i str:
"""LC 1044 - binary search on length + Rabin-Karp"""
def has_duplicate(length: int) -> str:
BASE, MOD = 31, 10**9 + 7
base_l = pow(BASE, length, MOD)
h = 0
for c in s[:length]:
h = (h * BASE + ord(c)) % MOD
seen = {h: [0]}
for i in range(1, len(s) - length + 1):
h = (h * BASE - ord(s[i-1]) * base_l + ord(s[i+length-1])) % MOD
for start in seen.get(h, []):
if s[start:start+length] == s[i:i+length]:
return s[i:i+length]
seen.setdefault(h, []).append(i)
return ""
lo, hi, result = 1, len(s) - 1, ""
while lo <= hi:
mid = (lo + hi) // 2
found = has_duplicate(mid)
if found:
result = found
lo = mid + 1
else:
hi = mid - 1
return result
Anagram and Character Frequency Patterns
from collections import Counter
def group_anagrams(strs: list) -> list:
"""LC 49 - Group Anagrams"""
groups = defaultdict(list)
for s in strs:
key = tuple(sorted(s)) # or Counter(s) as frozenset
groups[key].append(s)
return list(groups.values())
def is_anagram(s: str, t: str) -> bool:
"""LC 242 - Valid Anagram - O(n) with Counter"""
return Counter(s) == Counter(t)
def find_anagrams_freq(s: str, p: str) -> list:
"""LC 438 using Counter diff - cleaner but slightly slower"""
result = []
p_count = Counter(p)
w_count = Counter(s[:len(p)])
if w_count == p_count:
result.append(0)
for i in range(len(p), len(s)):
# Add new character
w_count[s[i]] += 1
# Remove old character
old = s[i - len(p)]
w_count[old] -= 1
if w_count[old] == 0:
del w_count[old]
if w_count == p_count:
result.append(i - len(p) + 1)
return result
Interview tip: Counter comparison is O(k) where k = unique characters. For fixed alphabets (26 letters) this is O(1) effectively. Mention this.
Palindrome Patterns
def longest_palindrome_expand(s: str) -> str:
"""LC 5 - expand around center, O(n^2) time O(1) space"""
def expand(left, right):
while left >= 0 and right int:
"""LC 647 - count all palindromic substrings"""
count = 0
for i in range(len(s)):
for left, right in [(i, i), (i, i+1)]:
while left >= 0 and right < len(s) and s[left] == s[right]:
count += 1
left -= 1
right += 1
return count
Manacher’s algorithm solves LC 5 in O(n) time by reusing previously computed palindrome lengths. The key insight: if we are inside a larger palindrome, the mirror position gives us a lower bound on the current position’s palindrome length. In practice, expand-around-center O(n^2) is accepted in most interviews – bring up Manacher’s as an optimization if asked.
String Encoding and Decoding
def encode(strs: list) -> str:
"""LC 271 - encode list of strings for transmission"""
# Format: length#string for each entry
# "4#lint6#coding4#is3#fun" -> ["lint","coding","is","fun"]
return ''.join(f"{len(s)}#{s}" for s in strs)
def decode(s: str) -> list:
result = []
i = 0
while i str:
"""Classic RLE: 'aaabbc' -> '3a2b1c'"""
if not s:
return ""
result = []
count = 1
for i in range(1, len(s)):
if s[i] == s[i-1]:
count += 1
else:
result.append(f"{count}{s[i-1]}")
count = 1
result.append(f"{count}{s[-1]}")
return ''.join(result)
Two-Pointer on Strings
def reverse_vowels(s: str) -> str:
"""LC 345"""
vowels = set('aeiouAEIOU')
chars = list(s)
left, right = 0, len(chars) - 1
while left < right:
while left < right and chars[left] not in vowels:
left += 1
while left str:
"""LC 151 - Reverse Words in a String"""
return ' '.join(s.split()[::-1])
def is_palindrome(s: str) -> bool:
"""LC 125 - Valid Palindrome (alphanumeric only)"""
cleaned = [c.lower() for c in s if c.isalnum()]
return cleaned == cleaned[::-1]
# Two-pointer version for O(1) space:
# left, right = 0, len(s)-1
# while left < right:
# while left<right and not s[left].isalnum(): left+=1
# while left<right and not s[right].isalnum(): right-=1
# if s[left].lower() != s[right].lower(): return False
# left+=1; right-=1
# return True
Pattern Recognition Table
| Problem Type | Technique | Key Examples |
|---|---|---|
| Find substring with constraint | Sliding window | LC 3, 76, 438 |
| Find pattern in text (one-off) | str.find() or KMP | LC 28 |
| Find repeated/duplicate substrings | Rabin-Karp + binary search | LC 1044, 718 |
| Are two strings anagrams | Counter / frequency array | LC 49, 242, 438 |
| Find longest palindromic substring | Expand around center | LC 5, 647 |
| Serialize/deserialize strings | Length-prefix encoding | LC 271 |
| In-place string reversal | Two pointers | LC 125, 151, 345 |
| Edit distance / alignment | DP (2D table) | LC 72, 1143 |
| Multiple pattern search | Aho-Corasick (rare in interviews) | LC 212 (use Trie instead) |
Interview approach: State the pattern aloud before coding. “This looks like a sliding window with frequency tracking – similar to minimum window substring.” This signals pattern recognition and gives the interviewer confidence before you write a single line of code.
{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”What is the sliding window template for substring problems?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Use two pointers (left, right) and a window state (typically a dict/counter). Expand right pointer one step at a time, adding the new character to the window state. When the window violates a constraint, shrink from the left until valid again. Record the answer when the window is valid. Template: left=0, for right in range(len(s)): update_window(s[right]); while invalid: undo_window(s[left]); left+=1; update_answer(). For minimum window substring (LC 76): invalid = any required character count not satisfied. For longest without repeating (LC 3): invalid = any character appears more than once. For find all anagrams (LC 438): valid = window counter equals target counter.”}},{“@type”:”Question”,”name”:”How does KMP (Knuth-Morris-Pratt) pattern matching work?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”KMP avoids re-examining characters after a mismatch by precomputing a failure function. The failure function fail[i] = length of the longest proper prefix of pattern[0..i] that is also a suffix. When a mismatch occurs at pattern position j, instead of restarting at j=0, jump to j=fail[j-1]. This amortizes the work: each character in the text is processed at most twice. Build failure function: O(m) where m is pattern length. Search: O(n) where n is text length. Total: O(n+m) vs O(n*m) naive. Key use case: substring search in a large document, or finding a pattern in a circular string (concatenate string with itself).”}},{“@type”:”Question”,”name”:”What is the rolling hash technique and when do you use it?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Rolling hash computes a hash of a sliding window in O(1) by adding the new character and removing the old: hash = hash * base + ord(new_char) – ord(old_char) * base^window_len. Use modular arithmetic (mod a large prime) to prevent overflow. When the rolling hash matches the target, do a full string comparison to confirm (avoid false positives). Use cases: find repeated DNA sequences (LC 187), longest duplicate substring (LC 1044, binary search on length + rolling hash), Rabin-Karp string matching when KMP is overkill for average-case. Rolling hash is O(n) average vs O(n^2) naive substring comparison.”}},{“@type”:”Question”,”name”:”How do you check if two strings are anagrams and group anagrams efficiently?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Two strings are anagrams if they have the same character frequency. Check: Counter(s) == Counter(t) or sorted(s) == sorted(t). O(n) vs O(n log n). Group anagrams (LC 49): use sorted characters as the hash key – all anagrams of a word sort to the same key. group = defaultdict(list); for word in words: group[tuple(sorted(word))].append(word). Alternative key: tuple of 26 character counts – avoids sorting, always O(n) per word. For large alphabets or Unicode strings, use the Counter/tuple of character counts approach.”}},{“@type”:”Question”,”name”:”How do you find the longest palindromic substring efficiently?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Expand around center: for each index i, expand outward as long as characters match. Handle both odd-length (center at i) and even-length (center between i and i+1) palindromes. Track the longest palindrome found. O(n^2) time, O(1) space. Better: Manacher's algorithm achieves O(n) by reusing palindrome information from previous centers. Transform the string to "#a#b#a#" to unify odd/even cases. Maintain a rightmost palindrome boundary and its center; use the mirror palindrome to initialize the expansion. For interviews, expand-around-center is expected; mention Manacher's for bonus points. LC 5 (Longest Palindromic Substring), LC 647 (Count Palindromic Substrings).”}}]}
Meta coding interviews heavily test string patterns. See common questions for Meta interview: string and substring problems.
Apple coding rounds test string algorithms including KMP and sliding window. See patterns for Apple interview: string manipulation and matching.
Atlassian coding rounds include string manipulation and pattern matching. See patterns for Atlassian interview: string and text processing problems.