Find the Duplicate Number: The Constraint-Driven Design Problem

You have an array of n+1 integers where each integer is in the range [1, n]. By the pigeonhole principle, at least one duplicate exists. Find any duplicate. But there are constraints: you cannot modify the array, you must use only O(1) extra space, and your runtime must be less than O(n²).

This is LeetCode 287, “Find the Duplicate Number”, and it is the canonical “constraint-driven design” problem in the modern interview canon. The constraints — read-only input, O(1) space, sub-quadratic time — rule out all the obvious solutions: cannot use a hash set (violates O(1) space), cannot sort (modifies the array), cannot use a counting array (also O(n) space). The only remaining option is something genuinely clever: Floyd’s cycle detection algorithm, applied in a non-obvious way.

Why the obvious solutions fail

The interview value of this problem is entirely in the constraints. Each of the natural solutions hits one of them:

  • Hash set. Walk the array, add each element to a set, return the first one already present. O(n) time, O(n) space. Violates the space constraint.
  • Sort. Sort the array, find adjacent duplicates. O(n log n) time, O(1) space if in-place. Violates the read-only constraint.
  • Counting array. Use an auxiliary array of size n+1 indexed by value, count occurrences. O(n) time, O(n) space. Violates the space constraint.
  • Mark visited (using sign). Negate a[a[i]] as a way to mark “seen”. Brilliant but requires modifying the array; violates that constraint.
  • Brute force compare-all-pairs. Two nested loops. O(n²) time, O(1) space. Satisfies the constraints but is too slow.
  • Binary search on value space. For each candidate value v, count how many array elements are ≤ v; find the v where the count first exceeds v. O(n log n) time, O(1) space. Satisfies all constraints. Often given as a “good” answer.
  • Floyd’s cycle detection. Treat the array as a linked list (index → value as next pointer). The duplicate creates a cycle. Find the cycle entry. O(n) time, O(1) space. The intended optimal answer.

The binary search and Floyd’s are both acceptable; Floyd’s is the cleaner answer.

The Floyd’s cycle insight

The key reframing: think of the array as a function. f(i) = a[i]. Starting from any index and repeatedly applying f produces a sequence: i, a[i], a[a[i]], a[a[a[i]]], ... All these values are in [1, n], so the sequence stays within the indexable range. Because there are only n possible values but the sequence is infinite, it must eventually repeat — that is, there must be a cycle.

The duplicate creates the cycle, because two distinct indices i and j with a[i] = a[j] both map to the same next value, creating a “rho” shape (a tail leading into a cycle). The duplicate value is the entry point of the cycle.

So the problem becomes: find the entry point of the cycle in this implicit linked list. Floyd’s tortoise-and-hare algorithm does exactly this in O(n) time and O(1) space.

The algorithm

def find_duplicate(nums):
    # Phase 1: find any point inside the cycle (tortoise and hare meet)
    tortoise = nums[0]
    hare = nums[0]
    while True:
        tortoise = nums[tortoise]
        hare = nums[nums[hare]]
        if tortoise == hare:
            break

    # Phase 2: find the cycle entry (the duplicate)
    tortoise = nums[0]
    while tortoise != hare:
        tortoise = nums[tortoise]
        hare = nums[hare]

    return hare

O(n) time. Each pointer moves at most 2n steps. O(1) space — just two integer variables.

The two phases:

  1. Phase 1. Tortoise moves one step at a time, hare moves two steps. They will meet inside the cycle, at some position that is not the cycle entry.
  2. Phase 2. Reset tortoise to the start. Move both at one step at a time. They will meet at the cycle entry. The cycle entry is the duplicate.

The math behind why phase 2 works is a clean number-theory argument: if the tail length is μ and the cycle length is λ, then the meeting point in phase 1 is at distance from the cycle entry for some integer k. Resetting one pointer to the start and moving both at one step at a time means they will meet at distance μ from the start — which is the cycle entry.

Why this question is canonical

Find the Duplicate is famous because it is the cleanest example of how artificial constraints in an interview question can force a candidate toward a non-obvious solution. The constraints individually look reasonable; together they rule out everything except Floyd’s. Recognizing that the array can be reinterpreted as a linked list — and then knowing Floyd’s algorithm well enough to apply it in this non-cycle context — is the kind of insight that signals genuinely deep algorithmic preparation.

The problem also has the rare property that the obvious framing is wrong. The array does not look like a linked list; it does not have explicit “next” pointers. The candidate has to discover the framing themselves before the algorithm is even applicable. This step — reframing the problem in terms of a familiar abstraction — is one of the highest-signal moments in a coding interview.

What interviewers grade

This is a senior+ FAANG question, often appearing in onsites for L5+ engineering roles. Signal layers:

  1. State the obvious solutions and explain why they fail. Hash set (space), sort (modifies), counting array (space). The candidate should walk through these systematically.
  2. Propose binary search as a fallback. Most candidates can derive the binary search solution. It is acceptable as the final answer if the candidate cannot get to Floyd’s.
  3. Recognize the linked list framing. The candidate articulates that the array can be treated as a graph where each index has an outgoing edge.
  4. Apply Floyd’s. The two-phase algorithm is non-trivial; the candidate should know it.
  5. Argue correctness. Why does phase 2 give the cycle entry? The number-theory argument is what separates strong from average candidates.

Variations interviewers ask

  • “Find all duplicates” (LC 442). Different problem; uses the negate-as-mark trick that violates the read-only constraint of LC 287.
  • “Missing number” (LC 268). Sum-based or XOR-based approach; different technique entirely.
  • “First missing positive” (LC 41). Uses cyclic sort, which is related but a different specific algorithm.
  • “Linked list cycle” (LC 141 / 142). The pure form of Floyd’s algorithm. Often asked as a warmup before LC 287 to confirm the candidate knows the algorithm.

Is it asked in 2026?

Yes, regularly at senior+ FAANG and AI labs. Find the Duplicate is in the canonical “hard but elegant” tier of interview questions, and the constraint-driven framing means the question continues to differentiate even when candidates have heard about Floyd’s algorithm. The follow-up “now do it without the read-only constraint” or “now do it in O(n) without the linked list trick” extends the question into harder territory.

Frequently Asked Questions

What is the time complexity?

O(n) for Floyd’s cycle detection. Each pointer moves at most 2n steps in total.

What is the space complexity?

O(1). Just two integer pointers.

Is binary search an acceptable answer?

Yes, with O(n log n) time and O(1) space. Floyd’s is faster, but binary search demonstrates that the candidate can solve the problem under the constraints. Many interviewers accept it.

Why does Floyd’s apply when the input is not a linked list?

Because the function f(i) = nums[i] defines an implicit graph where each node has out-degree 1. Such a graph is a collection of “rho” shapes (tail + cycle). Floyd’s finds cycle entries in any such graph.

Is this problem still asked in 2026?

Yes, regularly at senior+ FAANG and AI labs. It is one of the most-asked Mediums on the LeetCode top 100.

Scroll to Top