Storing 1 million phone numbers

What is the most efficient way, memory-wise, to store 1 million phone numbers?  Apparently this is an interview question at Google, although this seems like its a bit too easy.

2026 Update: Storing Phone Numbers at Scale — Compression and Trie

Storing 1 million 10-digit US phone numbers efficiently tests knowledge of data compression and data structure selection.

import array
import bisect

def phone_to_int(phone: str) -> int:
    """Convert formatted phone string to integer."""
    digits = ''.join(c for c in phone if c.isdigit())
    return int(digits)

def int_to_phone(n: int) -> str:
    s = f"{n:010d}"
    return f"({s[:3]}) {s[3:6]}-{s[6:]}"

# Memory comparison
def memory_analysis():
    n = 1_000_000
    string_bytes = n * 12        # "8005551234" ~12 bytes per string
    int64_bytes = n * 8          # 8 bytes per uint64
    packed_bytes = n * 5         # 5 bytes per 34-bit number (ceil(34/8)=5)

    print(f"String storage:   {string_bytes/1e6:.0f} MB")   # 12 MB
    print(f"int64 storage:    {int64_bytes/1e6:.0f} MB")    #  8 MB
    print(f"Packed storage:   {packed_bytes/1e6:.0f} MB")   #  5 MB

memory_analysis()

# Approach 1: Sorted array + binary search
class PhoneBook:
    def __init__(self):
        self.numbers = array.array('Q')  # unsigned long long, 8 bytes each

    def add(self, phone: str) -> None:
        n = phone_to_int(phone)
        pos = bisect.bisect_left(self.numbers, n)
        if pos == len(self.numbers) or self.numbers[pos] != n:
            self.numbers.insert(pos, n)

    def contains(self, phone: str) -> bool:
        n = phone_to_int(phone)
        pos = bisect.bisect_left(self.numbers, n)
        return pos  None:
        digits = ''.join(c for c in phone if c.isdigit())
        node = self.root
        for d in digits:
            if d not in node.children:
                node.children[d] = TrieNode()
            node = node.children[d]
        node.is_end = True

    def search(self, phone: str) -> bool:
        digits = ''.join(c for c in phone if c.isdigit())
        node = self.root
        for d in digits:
            if d not in node.children:
                return False
            node = node.children[d]
        return node.is_end

    def starts_with(self, prefix: str) -> bool:
        digits = ''.join(c for c in prefix if c.isdigit())
        node = self.root
        for d in digits:
            if d not in node.children:
                return False
            node = node.children[d]
        return True  # Prefix exists

# Test
pb = PhoneBook()
pb.add("(800) 555-1234")
pb.add("(212) 867-5309")
print(pb.contains("(800) 555-1234"))  # True
print(pb.contains("(999) 999-9999"))  # False

trie = PhoneTrie()
trie.insert("8005551234")
print(trie.search("8005551234"))   # True
print(trie.starts_with("800"))     # True (area code autocomplete)

2026 context: This problem type appears in mobile contacts apps, telecom caller-ID lookups, and ad-tech phone number matching. Modern solutions use delta encoding (store sorted differences) + variable-length encoding to achieve ~2-3 bytes per phone number. Redis handles phone lookups via sorted sets (O(log n)); Cassandra uses bloom filters to avoid disk reads for non-existent numbers.

💡Strategies for Solving This Problem

System Design Question

Got this at Google. It's not about writing code - it's about discussing data structures, storage, and trade-offs. This is testing your systems thinking.

What They're Really Asking

  • How would you store them?
  • How would you search them?
  • What about memory constraints?
  • Need to support prefix search? ("Find all numbers starting with 415")

Approach 1: Hash Table

If you just need fast lookup by exact number: O(1) search, O(n) space.

Pros: Super fast exact matches
Cons: Can't do prefix search, can't get sorted order

Approach 2: Trie (Prefix Tree)

If you need prefix search ("all numbers starting with 650"):

Pros: Efficient prefix search, O(k) where k is number length
Cons: More memory than hash table

This is what my interviewer wanted to hear about.

Approach 3: Database

In reality, you'd use a database with indexes. Discuss:

  • B-tree index for range queries
  • Partitioning by area code
  • Replication for availability

Phone Number Specifics

  • Fixed length (10 digits in US)
  • Can represent as integer (but watch out for leading zeros!)
  • Area code provides natural partitioning

My Interview Discussion

I talked about Trie for prefix search. Interviewer asked:

  • "What's the space complexity?" (O(n × k) where k=10)
  • "Can you compress it?" (Yes, path compression)
  • "What if numbers don't fit in memory?" (Disk-based B-tree)

Be ready to dive deep into whichever structure you choose.

Scroll to Top