Detect Phone Numbers Missing a Country Code: Heuristics and libphonenumber

“Given a list of phone numbers in mixed formats, identify which are missing a country code.” This is a classic Apple / consumer-product engineering interview question that tests string parsing, regular expression skill, and the ability to reason about heuristics under ambiguous data. The base problem is simple — check whether a phone number has a country code prefix — but real-world phone-number data is messy enough that pure rule-based detection misses cases. This guide covers the standard approaches and the trade-offs between strict pattern matching and heuristic-based detection.

Problem Statement

Given a string representing a phone number in arbitrary format, determine whether it includes a country code. Country codes are 1 to 3 digits long and are usually written after a “+” or after an international dialing prefix such as “00” (the prefix dialed in much of Europe).

Examples:

"+1 555-123-4567" → has country code (1 = US)
"555-123-4567" → missing country code
"+44 20 7123 4567" → has country code (44 = UK)
"00 49 30 12345678" → has country code (49 = Germany, prefix 00)
"(555) 123-4567" → missing country code

Approach 1: Strict Prefix Detection

Check for “+” or “00” at the start. Naive but works for clean data.

def has_country_code_strict(phone: str) -> bool:
    """Detect country code via prefix."""
    s = phone.strip()
    return s.startswith("+") or s.startswith("00")


# Tests
print(has_country_code_strict("+1 555-123-4567"))      # True
print(has_country_code_strict("555-123-4567"))          # False
print(has_country_code_strict("00 49 30 12345678"))     # True
print(has_country_code_strict("(555) 123-4567"))        # False

Acceptable for input with consistent formatting. Misses cases where the country code is implied without an explicit “+” — e.g., “1-800-…” in US format where “1” is the country code, or a leading “1” before a 10-digit number.

Approach 2: Length-Based Heuristic

Count the digits in the number. Most national numbers have 7 to 10 digits (US/Canada numbers have exactly 10), so in US-centric data a number with 11 or more digits usually has a country code embedded in it.

import re

def count_digits(phone: str) -> int:
    return len(re.sub(r"\D", "", phone))  # \D matches any non-digit


def has_country_code_heuristic(phone: str) -> bool:
    """Heuristic combining prefix and length."""
    s = phone.strip()
    if s.startswith("+") or s.startswith("00"):
        return True
    digit_count = count_digits(s)
    # US format with leading 1 (11 digits): country code present
    if digit_count == 11 and re.sub(r"\D", "", s).startswith("1"):
        return True
    # Strictly more than 10 digits: probably has country code
    if digit_count > 10:
        return True
    return False


# Tests
print(has_country_code_heuristic("+1 555-123-4567"))       # True
print(has_country_code_heuristic("1-555-123-4567"))         # True (11 digits, leading 1)
print(has_country_code_heuristic("555-123-4567"))           # False (10 digits, no prefix)
print(has_country_code_heuristic("(555) 123-4567"))         # False

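The heuristic catches “1-555-…” style US numbers that lack an explicit “+”. Trade-off: heuristics are imperfect. The main failure mode is a foreign national number written with its trunk prefix: UK “020 7123 4567” has 11 digits, so the length rule reports a country code that isn't there. The opposite error is rare for US data, because North American area codes never start with 0 or 1, so a 10-digit US number cannot be confused with one that begins with country code 1.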
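To see the failure mode concretely, has_country_code_heuristic("020 7123 4567") returns True because the string has 11 digits. Below is a minimal sketch of one way to tighten the rule. It assumes that a single leading 0 is a national trunk prefix, which holds for the UK and many other countries but not all. The name has_country_code_tightened is hypothetical. The assumption is safe in one direction: no country code begins with 0, so a leading 0 never hides a real country code.

```python
import re

def has_country_code_tightened(phone: str) -> bool:
    """Prefix + length heuristic that treats a leading 0 as a national trunk prefix."""
    s = phone.strip()
    # Explicit international prefixes win outright.
    if s.startswith("+") or s.startswith("00"):
        return True
    digits = re.sub(r"\D", "", s)
    # Assumption: no country code starts with 0, so a single leading 0
    # (e.g. UK "020 ...") marks a national number with its trunk prefix.
    if digits.startswith("0"):
        return False
    # 11+ digits without a trunk prefix: assume an embedded country code.
    return len(digits) > 10
```

The trunk-prefix check runs before the length check. This ordering means an 11-digit national number is never treated as international just because of its length.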

Approach 3: Library-Based (Production)

For real-world phone-number parsing, use Google’s libphonenumber (its Python port is the phonenumbers package). It ships per-region format metadata for virtually every country, normalizes input, and provides validation.

# pip install phonenumbers
import phonenumbers
from phonenumbers import CountryCodeSource

def has_country_code_lib(phone: str, default_region: str = "US") -> bool:
    """Use libphonenumber to tell whether the input supplied its own country code."""
    try:
        # keep_raw_input=True makes the parser record where the country code came from
        parsed = phonenumbers.parse(phone, default_region, keep_raw_input=True)
    except phonenumbers.phonenumberutil.NumberParseException:
        return False
    # FROM_DEFAULT_COUNTRY means the code was filled in from default_region,
    # i.e. the input itself carried no country code. "00" is only recognized
    # as an international prefix when default_region dials it (e.g. "DE").
    return parsed.country_code_source != CountryCodeSource.FROM_DEFAULT_COUNTRY

For interview purposes, mention libphonenumber as the production answer; implement a manual heuristic to demonstrate skill.

Approach 4: Country Code Lookup

Maintain a list of valid country codes (1, 7, 20, 27, 30, 31, …, 998). Strip non-digits; check whether the leading digits match any valid country code.

COUNTRY_CODES = {
    "1", "7", "20", "27", "30", "31", "32", "33", "34", "36", "39", "40", "41",
    "43", "44", "45", "46", "47", "48", "49", "51", "52", "53", "54", "55",
    "56", "57", "58", "60", "61", "62", "63", "64", "65", "66", "81", "82",
    "84", "86", "90", "91", "92", "93", "94", "95", "98",
    # ... (full list ~250 codes)
}

def has_country_code_lookup(phone: str) -> bool:
    digits = re.sub(r"\D", "", phone)  # keep digits only
    # Try 1, 2, 3-digit prefixes
    for length in (1, 2, 3):
        if digits[:length] in COUNTRY_CODES:
            # Validate remaining length
            remaining = len(digits) - length
            if 7 <= remaining <= 12:  # plausible national number length
                return True
    return False

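The lookup checks the prefix against real codes, but on its own it can misfire. For example, “555-123-4567” starts with 55 (Brazil’s code) and leaves 8 plausible digits, so has_country_code_lookup reports a country code for a plain US number. Use the lookup as a check after the prefix and length rules, not in place of them. Keep in mind that the code list also needs maintenance. libphonenumber is preferable for production.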
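Here is one way to guard the lookup. The sketch assumes that a string of 10 or fewer digits with no prefix is a national number. It trusts an explicit “+” or “00” first and consults the table only for longer digit strings. SAMPLE_COUNTRY_CODES is an illustrative subset, and has_country_code_guarded is a hypothetical name.

```python
import re

# Illustrative subset only; a real table needs every ITU-assigned code.
SAMPLE_COUNTRY_CODES = {"1", "7", "20", "33", "44", "49", "55", "61", "81", "86", "91"}

def has_country_code_guarded(phone: str) -> bool:
    s = phone.strip()
    if s.startswith("+") or s.startswith("00"):
        return True
    digits = re.sub(r"\D", "", s)
    # Assumption: 10 digits or fewer without a prefix is a national number,
    # so only longer strings are checked against the country-code table.
    if len(digits) <= 10:
        return False
    for length in (1, 2, 3):
        if digits[:length] in SAMPLE_COUNTRY_CODES and 7 <= len(digits) - length <= 12:
            return True
    return False
```

With this guard, “555-123-4567” never reaches the table, so its leading 55 is not mistaken for Brazil.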

Common Variations

Format / normalize phone numbers

Convert various formats to a canonical E.164 format (e.g., +14155551234). libphonenumber’s format_number handles this.
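For intuition, here is a deliberately narrow sketch of E.164 normalization that assumes North American input only. The helper to_e164_nanp is hypothetical. It returns None for anything it cannot handle rather than guessing. With libphonenumber, the general version is phonenumbers.format_number(parsed, phonenumbers.PhoneNumberFormat.E164).

```python
import re
from typing import Optional

def to_e164_nanp(phone: str) -> Optional[str]:
    """Normalize a North American number to E.164 (+1XXXXXXXXXX), else None."""
    s = phone.strip()
    digits = re.sub(r"\D", "", s)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]          # drop the explicit country code 1
    elif s.startswith("+"):
        return None                  # some other country code: out of scope here
    # A national NANP number is 10 digits and its area code never starts with 0 or 1.
    if len(digits) == 10 and digits[0] not in "01":
        return "+1" + digits
    return None
```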

Detect emergency numbers

Numbers like 911 (US), 999 (UK), 112 (EU). Special-case these; they don’t follow standard country-code rules.

Detect short codes / SMS shortcodes

Marketing shortcodes (4-6 digits) aren’t traditional phone numbers. Different validation; libphonenumber distinguishes them.
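Emergency numbers and short codes can both be routed away before any country-code logic runs, so that “911” is never reported as “missing a country code.” This is a minimal sketch under stated assumptions. EMERGENCY_NUMBERS is a tiny hypothetical table, the 4 to 6 digit range is taken from the paragraph above, and classify_special is an illustrative name. Real rules are region-specific, and libphonenumber’s short-number metadata covers them properly.

```python
import re

# Illustrative assumptions: a tiny emergency table and the 4-6 digit
# short-code range quoted above. Real rules are region-specific.
EMERGENCY_NUMBERS = {"911", "999", "112"}

def classify_special(phone: str) -> str:
    """Return "emergency", "short_code", or "regular" (needs country-code detection)."""
    s = phone.strip()
    digits = re.sub(r"\D", "", s)
    # Numbers written with "+" are treated as regular international numbers.
    if not s.startswith("+"):
        if digits in EMERGENCY_NUMBERS:
            return "emergency"
        if 4 <= len(digits) <= 6:
            return "short_code"
    return "regular"
```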

Detect VoIP / mobile / landline

By number prefix within a country, you can sometimes infer line type. libphonenumber provides this; rule-based detection is fragile.

Common Mistakes

Assuming all phone numbers are 10 digits. US/Canada are 10; UK is 10–11; Germany is 10–13; Sweden is 7–10. National lengths vary widely.
Not normalizing whitespace and separators. Strip all non-digits before pattern matching, except the leading “+” which is meaningful.
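Confusing “+” with “00” prefixes. Both mark international format. “+” is the standard written form. “00” is the dialed international prefix in much of Europe and elsewhere, but not everywhere (North America dials 011), so treat “00” as international only when you know where the data comes from.
Misreading a leading “1”. “1” is the country code for the US, Canada, and the other North American Numbering Plan members, and NANP area codes never start with 0 or 1, so a 10-digit North American number never begins with 1. Length disambiguates: 11 digits starting with 1 → has country code; 10 digits → no country code.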
Hand-rolling regex when libphonenumber exists. For production, use libphonenumber. Hand-rolled regex misses edge cases (extensions, special services, regional formats). For interview purposes, hand-rolling demonstrates skill, but mention the production answer.
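The normalization advice above (strip separators, keep the leading “+”) can be sketched as follows. normalize is a hypothetical helper. Folding a leading “00” into “+” is an assumption that is only valid where 00 is the international prefix.

```python
import re

def normalize(phone: str) -> str:
    """Strip separators but keep the meaningful leading '+'."""
    s = phone.strip()
    digits = re.sub(r"\D", "", s)
    if s.startswith("+"):
        return "+" + digits
    if s.startswith("00"):
        return "+" + digits[2:]   # assumes 00 is the international prefix
    return digits
```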

Frequently Asked Questions

What’s the expected interview answer?

Combine prefix detection (“+”, “00”) with digit-count heuristics (11+ digits typically indicates country code). Mention libphonenumber for production; implement the heuristic to demonstrate skill. Walk through edge cases: 10-digit US, 11-digit US-with-1, “+44”, “00 49”. Strong candidates anticipate that real data is messy and design for it.

Why is this hard?

Phone-number formats vary by country, region, and provider. Real-world data is dirty: missing prefixes, mixed separators, inconsistent formatting. A clean rule that works for one country fails for another. The interview question tests whether you account for this messiness instead of writing a one-size-fits-all rule.

How accurate are heuristic approaches?

Treat the following as rough, data-dependent rules of thumb. On mostly US data, simple heuristics are usually 95%+ accurate. On globally mixed data, hand-rolled rules can drop to around 80%. Library-based parsing (libphonenumber) approaches 99% accuracy when given the correct default region. Interview answers should mention these bounds and that they depend on the data; don’t claim 100%.

What about “extensions” like “+1 555-123-4567 ext. 200”?

Extensions are part of the original phone number’s format but typically separate from the routable number. libphonenumber parses them as a distinct field. For interview purposes, strip “ext.”, “x”, “extension” before processing the main number.
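Stripping the extension markers can be sketched with one anchored regular expression. The function split_extension is hypothetical. The sketch assumes the extension comes last and consists only of digits. libphonenumber itself exposes the parsed extension as a separate field.

```python
import re
from typing import Optional, Tuple

# "extension", "ext"/"ext.", or "x", followed by digits at the end of the string.
_EXT_RE = re.compile(r"\s*(?:extension|ext\.?|x)\s*(\d+)\s*$", re.IGNORECASE)

def split_extension(phone: str) -> Tuple[str, Optional[str]]:
    """Return (main_number, extension or None)."""
    m = _EXT_RE.search(phone)
    if m is None:
        return phone.strip(), None
    return phone[:m.start()].strip(), m.group(1)
```

The main number returned here can then go through any of the country-code checks above.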

How does this generalize to other phone-number tasks?

The same parsing pipeline (strip non-digits → check prefix → look up country code → validate remaining length) applies to formatting, validating, and routing. Building a robust phone-number system is a real engineering investment; the interview question is a small slice of that broader problem.
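The four pipeline steps map onto a short function. This sketch returns the split parts instead of a boolean. SAMPLE_CODES is an illustrative subset and split_country_code is a hypothetical name. It relies on E.164 country codes being prefix-free, so at most one prefix length can match.

```python
import re
from typing import Optional, Tuple

# Illustrative subset; a real table lists every ITU-assigned code.
SAMPLE_CODES = {"1", "7", "33", "44", "49", "55", "86", "91"}

def split_country_code(phone: str) -> Optional[Tuple[str, str]]:
    """Return (country_code, national_number), or None if no code is detected."""
    s = phone.strip()
    digits = re.sub(r"\D", "", s)              # 1. strip non-digits
    if s.startswith("00"):                     # 2. check the international prefix
        digits = digits[2:]
    elif not s.startswith("+"):
        return None                            # no explicit prefix: treat as national
    for length in (1, 2, 3):                   # 3. look up the country code
        code = digits[:length]
        if code in SAMPLE_CODES:
            national = digits[length:]
            # 4. validate the remaining length (codes are prefix-free, so stop here)
            return (code, national) if 7 <= len(national) <= 12 else None
    return None
```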

💡Strategies for Solving This Problem

Finding Missing Element

This is usually "find missing number in array" in disguise. Common variation: array has numbers 1 to n, but one is missing. Find it.

The Setup

Imagine you have phone numbers with country codes 1 to n. One country code is missing from your dataset. Which one?

Approach 1: Sum Formula

Sum of 1 to n = n(n+1)/2

Calculate expected sum, subtract actual sum. The difference is the missing number.

O(n) time, O(1) space. Clean and simple.

Approach 2: XOR

XOR has two key properties: a ⊕ a = 0 and a ⊕ 0 = a.

XOR all numbers 1 to n, then XOR all array elements. The result is the missing number.

O(n) time, O(1) space. No overflow risk (unlike sum).

Approach 3: Hash Set

Add all array elements to set. Check which number from 1 to n is not in set.

O(n) time, O(n) space. Works but uses more memory.

Why XOR is Better

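The sum formula can overflow in fixed-width integers: n(n+1)/2 exceeds a signed 32-bit int once n ≥ 65,536. JavaScript numbers are doubles, so the sum stays exact up to Number.MAX_SAFE_INTEGER (2^53 - 1). XOR never produces more bits than its inputs, so it doesn't have this problem.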

XOR is also more versatile: it works for finding duplicate, missing, or single-number problems.
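The single-number case (every value appears twice except one) reduces to a single XOR fold. Here is a minimal sketch in Python, the document’s other language. single_number is an illustrative name, and the input is assumed to satisfy the appears-twice guarantee.

```python
from functools import reduce
from operator import xor

def single_number(values):
    """Every value appears exactly twice except one; pairs cancel under XOR."""
    return reduce(xor, values, 0)
```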

At Various Companies

This shows up all the time with different stories: missing file number, missing ID, missing floor number, etc. The solution is always the same.

Key question: What if there are multiple missing numbers? Or one duplicate? Each variation has a trick.

✅Solution

Solution: XOR Approach

function findMissingCode(codes, n) {
    // XOR all numbers from 1 to n
    let xorAll = 0;
    for (let i = 1; i <= n; i++) {
        xorAll ^= i;
    }

    // XOR all codes in array
    let xorArray = 0;
    for (const code of codes) {
        xorArray ^= code;
    }

    // Missing number is xorAll XOR xorArray
    return xorAll ^ xorArray;
}

// Test
console.log(findMissingCode([1, 2, 4, 5, 6], 6));  // 3
console.log(findMissingCode([1, 2, 3, 5], 5));     // 4

Alternative: Sum Formula

function findMissingCodeSum(codes, n) {
    // Expected sum: 1 + 2 + ... + n = n(n+1)/2
    const expectedSum = (n * (n + 1)) / 2;

    // Actual sum
    const actualSum = codes.reduce((sum, code) => sum + code, 0);

    return expectedSum - actualSum;
}

console.log(findMissingCodeSum([1, 2, 4, 5, 6], 6));  // 3

Alternative: Hash Set

function findMissingCodeSet(codes, n) {
    const set = new Set(codes);

    for (let i = 1; i <= n; i++) {
        if (!set.has(i)) {
            return i;
        }
    }

    return -1;  // Not found (shouldn't happen if input valid)
}

console.log(findMissingCodeSet([1, 2, 4, 5, 6], 6));  // 3

Variation: Two Missing Numbers

function findTwoMissing(codes, n) {
    // XOR gives us xor of two missing numbers
    let xorAll = 0;
    for (let i = 1; i <= n; i++) xorAll ^= i;
    for (const code of codes) xorAll ^= code;

    // xorAll = a ^ b (where a, b are missing)
    // Find rightmost set bit
    const rightBit = xorAll & -xorAll;

    // Partition numbers by this bit
    let xor1 = 0, xor2 = 0;

    for (let i = 1; i <= n; i++) {
        if (i & rightBit) {
            xor1 ^= i;
        } else {
            xor2 ^= i;
        }
    }

    for (const code of codes) {
        if (code & rightBit) {
            xor1 ^= code;
        } else {
            xor2 ^= code;
        }
    }

    return [xor1, xor2];
}

console.log(findTwoMissing([1, 2, 5, 6], 6));  // [3, 4] or [4, 3]

Variation: Find Duplicate

function findDuplicate(codes) {
    // Array has n+1 elements with values 1 to n
    // One value appears twice
    // Floyd's cycle detection

    let slow = codes[0];
    let fast = codes[0];

    // Find intersection point
    do {
        slow = codes[slow];
        fast = codes[codes[fast]];
    } while (slow !== fast);

    // Find entrance to cycle (the duplicate)
    slow = codes[0];
    while (slow !== fast) {
        slow = codes[slow];
        fast = codes[fast];
    }

    return slow;
}

console.log(findDuplicate([1, 3, 4, 2, 2]));  // 2
console.log(findDuplicate([3, 1, 3, 4, 2]));  // 3

Variation: Missing and Duplicate

function findMissingAndDuplicate(codes, n) {
    // One number appears twice, one is missing

    // Using XOR
    let xor = 0;
    for (let i = 1; i <= n; i++) xor ^= i;
    for (const code of codes) xor ^= code;

    // xor = missing ^ duplicate
    const rightBit = xor & -xor;

    let xor1 = 0, xor2 = 0;
    for (let i = 1; i <= n; i++) {
        if (i & rightBit) xor1 ^= i;
        else xor2 ^= i;
    }

    for (const code of codes) {
        if (code & rightBit) xor1 ^= code;
        else xor2 ^= code;
    }

    // Determine which is missing and which is duplicate
    for (const code of codes) {
        if (code === xor1) {
            return {duplicate: xor1, missing: xor2};
        }
        if (code === xor2) {
            return {duplicate: xor2, missing: xor1};
        }
    }
}

console.log(findMissingAndDuplicate([1, 2, 2, 4, 5], 5));
// {duplicate: 2, missing: 3}

Why XOR Works

Properties:

a ⊕ a = 0
a ⊕ 0 = a
Commutative and associative

Example: [1, 2, 4, 5, 6], n=6, missing 3

xorAll = 1 ⊕ 2 ⊕ 3 ⊕ 4 ⊕ 5 ⊕ 6
xorArray = 1 ⊕ 2 ⊕ 4 ⊕ 5 ⊕ 6

xorAll ⊕ xorArray = (1 ⊕ 2 ⊕ 3 ⊕ 4 ⊕ 5 ⊕ 6) ⊕ (1 ⊕ 2 ⊕ 4 ⊕ 5 ⊕ 6)
                  = 3 ⊕ (1 ⊕ 1) ⊕ (2 ⊕ 2) ⊕ (4 ⊕ 4) ⊕ (5 ⊕ 5) ⊕ (6 ⊕ 6)
                  = 3 ⊕ 0 ⊕ 0 ⊕ 0 ⊕ 0 ⊕ 0
                  = 3 ✓

Complexity Comparison

Approach	Time	Space	Overflow Risk
Sum formula	O(n)	O(1)	Yes
XOR	O(n)	O(1)	No
Hash set	O(n)	O(n)	No
Sort + scan	O(n log n)	O(1)	No
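Sort + scan has no implementation above, so here is a minimal Python sketch (find_missing_sorted is a hypothetical name). One caveat on the table’s O(1) space entry: Python’s sorted() copies the list, and its Timsort needs up to O(n) auxiliary memory even when sorting in place. The O(1) figure therefore assumes an in-place sort such as heapsort. Sorting in place also mutates the caller’s array, which is the “Modifying array” pitfall below.

```python
def find_missing_sorted(codes, n):
    """Sort, then return the first position where the value isn't i + 1."""
    ordered = sorted(codes)          # copy; codes.sort() would mutate the input
    for i, code in enumerate(ordered):
        if code != i + 1:
            return i + 1
    return n                         # 1..n-1 all present, so n is missing
```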

Common Mistakes

Integer overflow: the sum formula fails for large n in fixed-width integer types
Modifying array: Some solutions mark visited indices - doesn't work if array is readonly
Wrong range: Are numbers 0 to n or 1 to n? Clarify!
Multiple missing: Basic XOR only works for one missing

Follow-Up Questions

Q: What if range is 0 to n instead of 1 to n?
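A: Same algorithms work. The expected sum is still n(n+1)/2 because 0 adds nothing, but n is now the array length (n of the n+1 values 0..n are present) rather than length + 1.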
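Here is a Python sketch of the 0-to-n variant using XOR. It assumes exactly one value from 0..n is missing, so the array length equals n. find_missing_zero_based is an illustrative name.

```python
def find_missing_zero_based(codes):
    """Values come from 0..n with exactly one missing, so len(codes) == n."""
    n = len(codes)
    result = n                        # seed with n: indices below only cover 0..n-1
    for i, code in enumerate(codes):
        result ^= i ^ code            # cancels each index against its matching value
    return result
```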

Q: What if array is huge and doesn't fit in memory?
A: Stream through it, XOR or sum still works in one pass
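Here is a streaming sketch. It assumes the codes arrive as a Python iterable such as file lines or a generator, and find_missing_stream and xor_1_to_n are illustrative names. It also replaces the 1..n loop with the well-known period-4 closed form of 1 ⊕ 2 ⊕ … ⊕ n, so only the stream itself is iterated.

```python
from typing import Iterable

def xor_1_to_n(n: int) -> int:
    """1 ^ 2 ^ ... ^ n, using the period-4 pattern of prefix XORs."""
    return (n, 1, n + 1, 0)[n % 4]

def find_missing_stream(codes: Iterable[int], n: int) -> int:
    """Single pass over any iterable; O(1) memory."""
    acc = xor_1_to_n(n)
    for code in codes:          # each value is read once and then discarded
        acc ^= code
    return acc
```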

Q: Find missing in unsorted array without extra space?
A: XOR or sum formula
