The 100 Prisoners and the Light Bulb: Signaling Under Constraint

One hundred prisoners are sent to solitary cells. Each day, a guard picks one prisoner uniformly at random — possibly the same prisoner many times — and brings them to a room with a single light bulb. The prisoner can choose to flip the switch (or not) and then is sent back to their cell. The bulb starts in the off position. At any point, any prisoner can declare “every prisoner has been to this room at least once”. If they are correct, all prisoners are freed. If they are wrong, all prisoners are executed. The prisoners can confer once before being separated to agree on a strategy. What strategy maximizes their chance of survival?

The 100 prisoners and the light bulb is a classic puzzle in the quant interview canon, asked at Citadel, Jane Street, Two Sigma, and many smaller trading firms. It is one of the most beautiful “communicate under constraint” problems in interview literature, and the elegant solution — the counter-leader strategy — is a staple of game theory and information theory.

The setup matters

Several details of the setup are doing real work and should be confirmed up front in any interview:

The bulb starts in a known state (off). If it could start in either state, the strategy needs an extra correction step.
The prisoner brought to the room is chosen uniformly at random with replacement. This means each prisoner has probability 1/100 of being picked each day independently.
The prisoners can confer once before being separated. They can agree on a strategy but cannot communicate after.
Any prisoner can declare; only one declaration is allowed.
The light bulb is the only communication channel between prisoners.
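Uniform selection with replacement also fixes a useful baseline: how long until every prisoner has actually visited, regardless of whether anyone can know it. The sketch below computes that expectation with the standard coupon-collector argument; the function name is an assumption made for illustration.

```python
from fractions import Fraction

def expected_days_until_all_visited(n=100):
    """Expected days until all n prisoners have been picked at least once."""
    # With k prisoners not yet seen, a new one is picked with probability k/n
    # per day, so the expected wait for the next new face is n/k days.
    return sum(Fraction(n, k) for k in range(1, n + 1))

print(round(float(expected_days_until_all_visited()), 1))  # about 518.7 days
```

Everyone has visited after roughly a year and a half in expectation. The hard part of the puzzle is letting one prisoner become certain of that fact through a one-bit channel.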

A polished candidate confirms these constraints before attempting a strategy. Different problem variants exist (random selection without replacement, multiple declarations allowed, multiple bulbs), and the optimal strategy changes for each.

The counter-leader strategy

Designate one prisoner as the “leader” (counter). All other 99 prisoners are “regular” prisoners.

Regular prisoner rule: The first time a regular prisoner enters the room and finds the light off, they turn it on. Otherwise (the light is already on, or they have already turned it on once), they do nothing. Each regular prisoner therefore turns on the light at most once in their lifetime.
Leader rule: Every time the leader enters the room and finds the light on, they turn it off and increment their internal counter. The leader does nothing when the light is off.
Termination rule: When the leader’s counter reaches 99, they declare. By then every regular prisoner has turned on the light exactly once, so every regular prisoner has been in the room, and the leader has been there to do the counting.

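The strategy is correct because the counter can reach 99 only if 99 distinct regular prisoners have each turned the light on. The bulb starts off, so every on-state the leader sees was caused by a regular prisoner, and each regular prisoner signals at most once. The leader has also been in the room (they observed the on-events), so all 100 have visited.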
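The three rules can be sketched as a short simulation. This is a minimal sketch under stated assumptions, not a reference implementation: the function name `simulate`, the choice of prisoner 0 as leader, and the `visited` set are all introduced for illustration. The `visited` set is bookkeeping the prisoners never see; it exists only to check the declaration.

```python
import random

def simulate(n=100, seed=None):
    """Play one game; return (day of declaration, set of prisoners who visited)."""
    rng = random.Random(seed)
    leader = 0                    # assumption: prisoner 0 is the agreed leader
    light_on = False              # the bulb starts off
    has_signaled = [False] * n    # each regular prisoner's private memory
    count = 0                     # the leader's private counter
    visited = set()               # outside observer's view, for verification only
    day = 0
    while True:
        day += 1
        p = rng.randrange(n)      # uniform selection, with replacement
        visited.add(p)
        if p == leader:
            if light_on:          # leader rule: turn off and count
                light_on = False
                count += 1
                if count == n - 1:
                    return day, visited   # termination rule: declare
        elif not light_on and not has_signaled[p]:
            light_on = True       # regular rule: signal once, only when off
            has_signaled[p] = True

day, visited = simulate(seed=1)
print(day, len(visited))
```

Whatever day the leader declares on, the second printed value is 100: the declaration is never wrong, only slow.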

Why the strategy works

The strategy converts the multi-prisoner communication problem into a single-bit signaling channel where the channel can carry exactly one “I have been here” event per regular prisoner. The leader counts the events. The cleverness is in ensuring no event is double-counted or missed:

Each regular prisoner can only turn on the light once ever. This guarantees no double-counting.
The leader resets the light each time they observe an “on” state. This frees the channel for the next regular prisoner.
The light bulb’s “on” state is the persistent message; the leader’s accumulator is the counter.

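One subtle case: what if a regular prisoner enters when the light is already on? They do nothing and save their own signal for a later visit. The bulb is already carrying another prisoner's signal that the leader has not yet collected; turning it "on" again would merge two signals into a single count, so waiting preserves the invariant.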
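The no-double-counting argument can be stated as a single invariant: the number of regular prisoners who have signaled equals the leader's count plus one if the light is currently on. The sketch below asserts that invariant after every visit. The function name, the smaller default of 10 prisoners, and the fixed number of days are assumptions chosen to keep the run short.

```python
import random

def run_with_invariant_check(n=10, days=5000, seed=0):
    """Run the protocol for a fixed number of days, checking the invariant daily."""
    rng = random.Random(seed)
    light_on = False
    count = 0                 # leader's counter (prisoner 0 is the leader)
    signaled = set()          # regular prisoners who have used their one signal
    for _ in range(days):
        p = rng.randrange(n)
        if p == 0:
            if light_on:
                light_on = False
                count += 1
        elif not light_on and p not in signaled:
            light_on = True
            signaled.add(p)
        # Every signal sent is either already counted or still held by the bulb.
        assert len(signaled) == count + int(light_on)
    return count, len(signaled)

print(run_with_invariant_check())
```

If a regular prisoner were allowed to act while the light was on, the assertion would have nothing to hold on to: two signals would share one on-state.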

Expected time to completion

The expected time for the strategy to terminate is the sum of expected waiting times across all 99 “transmissions”. For each transmission, we need:

An uncounted regular prisoner to be selected while the light is off. With k prisoners still uncounted, this has probability k/100 per day, so the expected wait is 100/k days (100 days only when a single prisoner remains).
The leader to be selected after that prisoner has turned on the light (probability 1/100 per day, expected wait 100 days).

A crude upper bound treats every wait as 100 days, giving 200 days per transmission. Across 99 transmissions, that is 99 × 200 = 19,800 days, or about 54 years.

The 200-day figure overstates the early rounds. The first transmissions go faster (any of the 99 uncounted prisoners can be selected) and the last go slower (only one uncounted prisoner remains, with an expected wait of 100 days). The careful expected-value calculation:

E[total time] = E[wait for first regular prisoner] + E[wait for leader to count] + E[wait for second regular prisoner] + E[wait for leader to count] + … = sum from k=1 to 99 of (100/k + 100) = 100 × H₉₉ + 99 × 100 ≈ 518 + 9,900 ≈ 10,418 days, or roughly 28.5 years. Here k is the number of regular prisoners still uncounted and H₉₉ ≈ 5.177 is the 99th harmonic number.

The figure depends on how the regular-prisoner waits are modeled: treating them as sequential (always 100 days) gives about 54 years, while letting any uncounted prisoner serve gives about 28.5. Different sources give figures in the 27–32 year range. Either way, the answer is measured in decades, which is itself part of the interview signal: does the candidate intuitively expect the strategy to work fast, and have to recalibrate when the math comes out at 28 years?
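Both estimates are easy to reproduce. The sketch below evaluates the harmonic-sum expectation exactly and compares it with the crude 200-days-per-transmission bound; the function names and the 365.25-day year are assumptions made for illustration.

```python
from fractions import Fraction

def expected_days(n=100):
    """Expected days until the leader's counter reaches n - 1 (bulb starts off)."""
    # With k regular prisoners still uncounted, the wait for one of them is n/k
    # days; the wait for the leader to return afterwards is always n days.
    return sum(Fraction(n, k) + n for k in range(1, n))

def crude_bound(n=100):
    """Upper bound that treats every wait as n days: 2n days per transmission."""
    return (n - 1) * 2 * n

days = float(expected_days())
print(round(days, 1), round(days / 365.25, 1))            # 10417.7 days, 28.5 years
print(crude_bound(), round(crude_bound() / 365.25, 1))    # 19800 days, 54.2 years
```

Almost all of the 28.5 years comes from the 99 waits for the leader (9,900 days); the waits for fresh regular prisoners add only about 518 days in total.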

Variations and improvements

The standard counter-leader strategy is not optimal. Several improvements exist:

Two-phase strategy. For the first 1,000 days or so, use a different signaling protocol to identify a leader more efficiently, then switch to the standard strategy. This reduces the expected time.
Multiple counters. Designate several “tier-2 leaders” who each count partial groups, then signal up to the master leader. Complicated but reduces variance.
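The bulb starts in an unknown state. A more challenging variant: the leader can no longer assume that every on-state is a signal, so the counting rule has to tolerate one spurious count (for example, by having each regular prisoner signal twice and raising the leader's target accordingly).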
The selection is not uniform. If the guard’s selection is adversarial or biased, the strategy needs robustness modifications.
Prisoners cannot tell if the bulb is on. A communication-theory variant where the channel is even noisier.

A polished candidate, after producing the standard strategy, may be asked to discuss faster ones. The refinements are genuinely hard, and only the strongest candidates produce them under interview pressure.

What this puzzle tests

The 100 prisoners and the light bulb tests three skills:

Information-theoretic thinking. Recognizing the bulb as a 1-bit channel and reasoning about how to use it efficiently.
Invariant design. The “regular prisoners only signal once” invariant prevents double-counting. Articulating invariants under pressure is a generalizable skill.
Asymmetry. Designating one prisoner as the leader is the key insight. Strategies in which every prisoner follows the same rule and nobody keeps a running total cannot work, because the bulb holds one bit and the count has to accumulate somewhere.

For quant trading firms, the puzzle is also a test of “can this person reason about randomness and time scales without panicking when the math comes out at 28 years”. A trader who is uncomfortable with the answer being a long time is going to be uncomfortable pricing long-dated options.

Common failure modes

Symmetric strategies. “Every prisoner increments a count.” This fails because the bulb can only hold one bit, not a count.
Counting without a leader. “All prisoners count by some shared rule.” This fails because the prisoners cannot tell whose turn it is.
Time confusion. Stating that the strategy completes in days or weeks, and failing to recognize that 99 transmissions at 100 to 200 expected days each add up to decades.
Forgetting the bulb’s initial state. If the bulb starts on by accident, the standard strategy miscounts. The setup matters.

Is it asked in 2026?

Yes, regularly at quant firms — Jane Street, Citadel, Two Sigma, HRT, and others use this puzzle and its variants. It is not commonly asked in tech interviews, where the brainteaser tradition is essentially dead, but in quant tradition it remains one of the most beloved puzzles because the underlying skills (information theory, invariant design, expected-value calculation under uncertainty) are job-relevant for traders.

The puzzle has not been retired despite being well-known because the standard solution does not exhaust the question. Follow-ups about optimal strategies, expected-value calculations, and adversarial variants give the interviewer plenty of material even when the candidate has seen the basic version before.

Frequently Asked Questions

What is the optimal strategy?

The simple counter-leader strategy described here is the standard answer; it is not optimal. Several refinements exist (two-phase strategies, tier-2 leaders), but the standard version is what most interviewers expect to see in the time available. Asking for the optimal version after the standard is a common follow-up.

How long does the strategy take?

Approximately 28 years in expectation, depending on the exact modeling. Long but finite. The fact that the answer is in decades is part of the interview signal.

What if the light bulb starts on instead of off?

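If the prisoners know the bulb starts on, the fix is easy: the leader ignores the first on-state they see. If the starting state is unknown, the standard strategy can miscount, because the leader's first count may come from the initial state rather than from a prisoner. One known fix is for each regular prisoner to turn the light on twice instead of once, with the leader declaring at a count of 198 rather than 99. At most one of those counts can be spurious, and 197 genuine signals cannot come from only 98 prisoners, so the declaration is safe. This roughly doubles the expected time but preserves correctness.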
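One known way to handle an unknown starting state is the double-signal rule: each regular prisoner turns the light on twice, and the leader declares at 2 × 99 = 198. The sketch below simulates it for either starting state. The function name, the `start_on` parameter, and the `visited` bookkeeping are assumptions made for illustration.

```python
import random

def simulate_unknown_start(n=100, start_on=False, seed=None):
    """Double-signal variant; return (day of declaration, prisoners who visited)."""
    rng = random.Random(seed)
    light_on = start_on            # the prisoners do not know this value
    signals_left = [2] * n         # each regular prisoner may signal twice
    count = 0                      # leader's counter (prisoner 0 is the leader)
    target = 2 * (n - 1)           # 198 for n = 100
    visited = set()                # outside observer's view, for verification only
    day = 0
    while True:
        day += 1
        p = rng.randrange(n)
        visited.add(p)
        if p == 0:
            if light_on:
                light_on = False
                count += 1         # at most one of these counts is spurious
                if count == target:
                    return day, len(visited)
        elif not light_on and signals_left[p] > 0:
            light_on = True
            signals_left[p] -= 1

for start_on in (False, True):
    print(start_on, simulate_unknown_start(start_on=start_on, seed=1))
```

In both cases the second value returned is 100. The safety argument is a counting one: at a count of 198, at least 197 signals are genuine, and 98 prisoners can produce at most 196.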

Why is asymmetry necessary?

Without a designated counter, no prisoner has the information needed to know when to declare. Counting requires accumulation, and accumulation requires a fixed observer. The leader is what makes counting possible.

Can the puzzle be solved without a light bulb?

Not without some persistent communication channel. The bulb is the channel; without it, prisoners cannot share information across visits, and no strategy works.