Missing or Duplicate Number in an Array: XOR, Floyd, Cyclic Sort

Missing or Duplicate Number in an Array: XOR, Sum, Cyclic Sort, and Floyd’s Algorithm

Finding the missing number or the duplicate number in an array of integers is a foundational problem family asked at FAANG, AI labs, and most coding interviews. The variants — single missing in [0, n], single duplicate in [1, n], multiple missing, multiple duplicates — share a common toolkit: XOR tricks, arithmetic sums, cyclic sort, hash sets, and Floyd’s cycle detection. This guide covers the standard approaches with working code, when to use each, and the trade-offs in time and space.

Variant 1: Missing Number in [0, n]

(LeetCode #268) Given an array of n distinct numbers in the range [0, n], find the one number missing from the range.

Approach: Sum formula

Sum of [0, n] is n(n+1)/2. Subtract the array sum to get the missing number.

def missing_number_sum(nums: list[int]) -> int:
    """O(n) time, O(1) space."""
    n = len(nums)
    expected = n * (n + 1) // 2
    return expected - sum(nums)

Risk: integer overflow in languages without arbitrary precision integers. Python is safe; C++ and Java may overflow for large n.

Approach: XOR

XOR all numbers in [0, n] with all numbers in the array. Pairs cancel; the unmatched value is the missing number. Avoids overflow.

def missing_number_xor(nums: list[int]) -> int:
    """O(n) time, O(1) space — no overflow risk."""
    result = len(nums)
    for i, num in enumerate(nums):
        result ^= i ^ num
    return result

Why XOR works: XOR is associative, commutative, and self-inverse (x ^ x = 0; x ^ 0 = x). XOR-ing all expected values with all actual values cancels everything except the missing one.

Variant 2: Duplicate Number in [1, n]

(LeetCode #287) Given an array of n+1 integers in [1, n], find the duplicate without modifying the array, in O(1) space and O(n) time.

Approach: Floyd’s Cycle Detection

Treat the array as a function: index i “points to” nums[i]. The duplicate creates a cycle. Floyd’s algorithm finds the cycle entry, which is the duplicate.

def find_duplicate(nums: list[int]) -> int:
    """O(n) time, O(1) space, no array modification."""
    slow = fast = nums[0]

    # Phase 1: detect cycle
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break

    # Phase 2: find cycle entry
    pointer = nums[0]
    while pointer != slow:
        pointer = nums[pointer]
        slow = nums[slow]
    return pointer

Why this works: Treating nums[i] as the next pointer creates a graph. Since values are in [1, n] and there are n+1 elements, by pigeonhole at least one value repeats. The duplicate creates two indices that map to the same next-value, forming a cycle. The cycle entry is the duplicate.

Approach: Hash set (O(n) space)

def find_duplicate_set(nums: list[int]) -> int:
    """O(n) time, O(n) space."""
    seen = set()
    for num in nums:
        if num in seen:
            return num
        seen.add(num)
    return -1

Simpler but uses extra space. Acceptable as a warm-up; interviewers will push for O(1) space.

Approach: Sort then scan (O(n log n))

def find_duplicate_sort(nums: list[int]) -> int:
    nums.sort()
    for i in range(1, len(nums)):
        if nums[i] == nums[i - 1]:
            return nums[i]
    return -1

Modifies the array (often disallowed). O(n log n) time. Mention as an alternative; not the expected answer.

Variant 3: Multiple Missing in [1, n]

(LeetCode #448) Find all numbers in [1, n] that don’t appear in the array.

Approach: Cyclic sort / negation marking

Use the array itself as a hash. For each value v, negate nums[v - 1] (mark that v is present). After processing, indices with positive values are missing.

def find_disappeared_numbers(nums: list[int]) -> list[int]:
    """O(n) time, O(1) extra space (output not counted)."""
    for i in range(len(nums)):
        idx = abs(nums[i]) - 1
        if nums[idx] > 0:
            nums[idx] = -nums[idx]
    return [i + 1 for i, v in enumerate(nums) if v > 0]

Modifies the input. Restore by re-iterating with abs() if needed.

Variant 4: Multiple Duplicates in [1, n]

(LeetCode #442) Find all elements that appear twice in an array of n integers in [1, n].

Approach: Negation marking

def find_duplicates(nums: list[int]) -> list[int]:
    """O(n) time, O(1) extra space."""
    result = []
    for i in range(len(nums)):
        idx = abs(nums[i]) - 1
        if nums[idx] < 0:
            result.append(idx + 1)
        else:
            nums[idx] = -nums[idx]
    return result

Same negation trick as above; collect the values that get marked twice.

Variant 5: Two Numbers — One Missing, One Duplicate

(LeetCode #645 — Set Mismatch) Given an array originally containing [1, n] but with one value duplicated and one missing, find both.

Approach: XOR variant

XOR the array with [1, n]. The result is missing ^ duplicate. Find a bit where they differ; partition both groups by that bit; XOR each group separately.

def find_error_nums(nums: list[int]) -> list[int]:
    """O(n) time, O(1) space."""
    n = len(nums)
    xor_all = 0
    for i in range(1, n + 1):
        xor_all ^= i
    for num in nums:
        xor_all ^= num
    # xor_all = missing ^ duplicate; partition by lowest set bit
    diff_bit = xor_all & -xor_all
    a = b = 0
    for i in range(1, n + 1):
        if i & diff_bit:
            a ^= i
        else:
            b ^= i
    for num in nums:
        if num & diff_bit:
            a ^= num
        else:
            b ^= num
    # a and b are missing and duplicate; figure out which
    duplicate = a if a in nums else b
    missing = b if duplicate == a else a
    return [duplicate, missing]

Tricky implementation; rarely required in interviews. The hash-set or sum-and-sum-of-squares approach is easier and acceptable.

Common Mistakes

  • Integer overflow on the sum approach. n(n+1)/2 grows with n²; for n = 10⁵, the sum is ~5×10⁹ which fits in 64-bit int but not 32-bit. Use 64-bit or XOR to avoid the issue.
  • Modifying input when not allowed. The negation-marking approach mutates the array. If the problem requires preserving input, use Floyd’s algorithm or a hash set.
  • Off-by-one in range. Some problems use [0, n], others [1, n]. Check the range before applying any approach; the index calculations differ.
  • Forgetting the cycle detection edge cases. Floyd’s algorithm assumes a cycle exists. If the input is malformed (no duplicate), the algorithm doesn’t terminate. Add a sanity check or rely on the problem’s guarantees.
  • Confusing XOR with set difference. XOR finds elements appearing an odd number of times; for finding the missing single element when all others appear exactly twice, XOR is perfect. For finding multiple missings or duplicates, XOR alone isn’t enough — you need bit partitioning or other tricks.

Frequently Asked Questions

What’s the expected interview answer for “find the duplicate number”?

Floyd’s cycle detection. O(n) time, O(1) space, no array modification. Walk through the analogy: treat indices as a graph, the duplicate creates a cycle, find the cycle entry. Mention hash set as a warm-up that uses O(n) space; mention sorting as another option that’s slower and modifies input. Strong candidates explain why the cycle entry is the duplicate.

What about find-the-missing-number?

XOR or sum formula. Both are O(n) time and O(1) space. XOR avoids overflow concerns; sum is more intuitive. Either is acceptable; mention both. The trick to know is the property that XOR is its own inverse.

How do I handle multiple missing numbers efficiently?

The negation-marking trick on the input array (LeetCode #448). For each value, negate the corresponding index; after processing, positive indices indicate missing values. O(n) time, O(1) extra space (output not counted). The trade-off is mutation of the input.

Why do interviewers ask these questions so frequently?

The problem family teaches multiple foundational tricks: XOR’s self-inverse property, arithmetic sum tricks, treating arrays as hash maps via index mapping, cycle detection on functional graphs. Each is a powerful tool with broader applications. Strong candidates recognize the problem family and choose the right tool; weak candidates default to hash sets and miss the elegance.

What if the input range isn’t [1, n] or [0, n]?

The XOR and sum approaches still work — adjust the expected sum or XOR to match the range. The negation-marking approach requires the values to map to valid indices; for arbitrary ranges, normalize first or use a hash set. The cycle-detection approach requires the values to map to valid indices specifically; if the range doesn’t match, you can’t use Floyd’s directly.

See also: Two Sum: Find a Pair in an ArrayCheck if a Number is a Power of TwoDetect a Cycle in a Linked List

💡Strategies for Solving This Problem

Multiple Solutions, Different Trade-offs

I've gotten variants of this at Amazon and Microsoft. The problem: array of n numbers from 1 to n, one is missing or duplicated. Find it.

Approach 1: Math Formula (Cleanest)

Sum of 1 to n is: n(n+1)/2

Expected sum - actual sum = missing number (or duplicate with negative)

This is O(n) time, O(1) space. My go-to solution.

Approach 2: XOR Bit Manipulation (Clever)

XOR has cool properties:

  • a ⊕ a = 0
  • a ⊕ 0 = a
  • XOR is commutative and associative

XOR all numbers 1 to n, then XOR with array elements. Result is the missing/duplicate number.

Also O(n) time, O(1) space, but shows you know bit tricks.

Approach 3: Hash Set (Straightforward)

Put all numbers in a Set, then check which from 1 to n is missing.

O(n) time, O(n) space. Works but uses extra space.

Approach 4: In-Place Marking (Tricky)

Use array indices as a hash. For each number, mark array[number] as negative. The index that stays positive (or goes negative twice) is your answer.

O(n) time, O(1) space, but modifies the array.

Which to Use?

I usually start with the math approach (it's cleanest). If interviewer asks to optimize or show another method, I'll do XOR. Hash set is the fallback if I'm blanking.

Scroll to Top