What is the most efficient way, memory-wise, to store 1 million phone numbers? Apparently this is an interview question at Google, although this seems like its a bit too easy.
2026 Update: Storing Phone Numbers at Scale — Compression and Trie
Storing 1 million 10-digit US phone numbers efficiently tests knowledge of data compression and data structure selection.
import array
import bisect
def phone_to_int(phone: str) -> int:
"""Convert formatted phone string to integer."""
digits = ''.join(c for c in phone if c.isdigit())
return int(digits)
def int_to_phone(n: int) -> str:
s = f"{n:010d}"
return f"({s[:3]}) {s[3:6]}-{s[6:]}"
# Memory comparison
def memory_analysis():
n = 1_000_000
string_bytes = n * 12 # "8005551234" ~12 bytes per string
int64_bytes = n * 8 # 8 bytes per uint64
packed_bytes = n * 5 # 5 bytes per 34-bit number (ceil(34/8)=5)
print(f"String storage: {string_bytes/1e6:.0f} MB") # 12 MB
print(f"int64 storage: {int64_bytes/1e6:.0f} MB") # 8 MB
print(f"Packed storage: {packed_bytes/1e6:.0f} MB") # 5 MB
memory_analysis()
# Approach 1: Sorted array + binary search
class PhoneBook:
def __init__(self):
self.numbers = array.array('Q') # unsigned long long, 8 bytes each
def add(self, phone: str) -> None:
n = phone_to_int(phone)
pos = bisect.bisect_left(self.numbers, n)
if pos == len(self.numbers) or self.numbers[pos] != n:
self.numbers.insert(pos, n)
def contains(self, phone: str) -> bool:
n = phone_to_int(phone)
pos = bisect.bisect_left(self.numbers, n)
return pos None:
digits = ''.join(c for c in phone if c.isdigit())
node = self.root
for d in digits:
if d not in node.children:
node.children[d] = TrieNode()
node = node.children[d]
node.is_end = True
def search(self, phone: str) -> bool:
digits = ''.join(c for c in phone if c.isdigit())
node = self.root
for d in digits:
if d not in node.children:
return False
node = node.children[d]
return node.is_end
def starts_with(self, prefix: str) -> bool:
digits = ''.join(c for c in prefix if c.isdigit())
node = self.root
for d in digits:
if d not in node.children:
return False
node = node.children[d]
return True # Prefix exists
# Test
pb = PhoneBook()
pb.add("(800) 555-1234")
pb.add("(212) 867-5309")
print(pb.contains("(800) 555-1234")) # True
print(pb.contains("(999) 999-9999")) # False
trie = PhoneTrie()
trie.insert("8005551234")
print(trie.search("8005551234")) # True
print(trie.starts_with("800")) # True (area code autocomplete)
2026 context: This problem type appears in mobile contacts apps, telecom caller-ID lookups, and ad-tech phone number matching. Modern solutions use delta encoding (store sorted differences) + variable-length encoding to achieve ~2-3 bytes per phone number. Redis handles phone lookups via sorted sets (O(log n)); Cassandra uses bloom filters to avoid disk reads for non-existent numbers.
💡Strategies for Solving This Problem
System Design Question
Got this at Google. It's not about writing code - it's about discussing data structures, storage, and trade-offs. This is testing your systems thinking.
What They're Really Asking
- How would you store them?
- How would you search them?
- What about memory constraints?
- Need to support prefix search? ("Find all numbers starting with 415")
Approach 1: Hash Table
If you just need fast lookup by exact number: O(1) search, O(n) space.
Pros: Super fast exact matches
Cons: Can't do prefix search, can't get sorted order
Approach 2: Trie (Prefix Tree)
If you need prefix search ("all numbers starting with 650"):
Pros: Efficient prefix search, O(k) where k is number length
Cons: More memory than hash table
This is what my interviewer wanted to hear about.
Approach 3: Database
In reality, you'd use a database with indexes. Discuss:
- B-tree index for range queries
- Partitioning by area code
- Replication for availability
Phone Number Specifics
- Fixed length (10 digits in US)
- Can represent as integer (but watch out for leading zeros!)
- Area code provides natural partitioning
My Interview Discussion
I talked about Trie for prefix search. Interviewer asked:
- "What's the space complexity?" (O(n × k) where k=10)
- "Can you compress it?" (Yes, path compression)
- "What if numbers don't fit in memory?" (Disk-based B-tree)
Be ready to dive deep into whichever structure you choose.