Coin Flip Streaks: HT vs TT and the Waiting-Time Paradox

⏱ 6 min read

You flip a fair coin repeatedly. What is the expected number of flips until you see the pattern HT? And how does that compare to the expected wait for HH?

Most people guess that both waits are equal — both patterns have probability 1/4 in any pair of flips, so symmetry suggests their expected first-occurrence times should be equal. The actual answer: E[HT] = 4, E[HH] = 6. This is the waiting-time paradox, and it is one of the most-asked probability puzzles at quant firms because it reveals something deep about how overlap structure affects waiting times for patterns.

Setup and the answer

Flip a fair coin. Wait for the first occurrence of pattern P. Let E[P] be the expected number of flips.

E[HT] = 4
E[TH] = 4
E[HH] = 6
E[TT] = 6

Both single-flip patterns have probability 1/4 in a fixed pair, but the expected first-occurrence time differs by 50%. The reason is a structural property of patterns called the autocorrelation, and it captures how a partial match for the pattern can or cannot continue into a complete match.

Why HT is faster than HH

Think about what happens when you have just seen one flip that could start the pattern. You have an H. Now you flip again.

For pattern HT: The next flip is either T (you complete HT and stop) or H (you have not seen HT, but you now have an H, which is the start of HT — you can continue from here). Either way, the H you already saw is not “wasted”. You make progress with every flip.

For pattern HH: The next flip is either H (you complete HH and stop) or T (you fail; you have to start over from scratch — your one H is now wasted because T cannot extend toward HH). When you fail, you reset.

The reset behavior in HH is what makes it slower. After a failed match, you are back to “no progress”, whereas in HT a failed match (H followed by H) still leaves you with an H, ready to try again with the next flip.

The formal calculation: E[HT] = 4

Set up states by tracking how much of the pattern has been matched so far.

State 0: No part of HT seen. Need a fresh H to advance.
State 1: Seen H, waiting for T.
State 2: Seen HT (absorbing).

Let e0 = expected flips from state 0 to state 2. Let e1 = expected flips from state 1 to state 2.

From state 0: with probability 1/2 the next flip is H (go to state 1, used 1 flip), with probability 1/2 the next flip is T (stay in state 0, used 1 flip). So e0 = 1 + 0.5 × e1 + 0.5 × e0, which gives e0 = 2 + e1.

From state 1: with probability 1/2 the next flip is T (go to state 2, used 1 flip), with probability 1/2 the next flip is H (stay in state 1, used 1 flip). So e1 = 1 + 0.5 × 0 + 0.5 × e1, which gives e1 = 2.

Therefore e0 = 2 + 2 = 4. The expected wait for HT is 4 flips.

The formal calculation: E[HH] = 6

Same approach with three states.

State 0: No part of HH seen.
State 1: Seen H, waiting for second H.
State 2: Seen HH (absorbing).

From state 0: e0 = 1 + 0.5 × e1 + 0.5 × e0, giving e0 = 2 + e1.

From state 1: with probability 1/2 the next flip is H (go to state 2), with probability 1/2 the next flip is T (back to state 0, since the lone H is now followed by T and cannot contribute to HH). So e1 = 1 + 0.5 × 0 + 0.5 × e0.

Substituting: e1 = 1 + 0.5 × (2 + e1) = 2 + 0.5 × e1, giving e1 = 4. So e0 = 2 + 4 = 6.

The “back to state 0” transition is what makes HH slower. In HT, the failed-match transition was “stay at state 1” (the H is still useful). In HH, the failed-match transition is “go back to state 0” (the H followed by T provides no continuation).

The Conway leading number

For longer patterns, the structural property generalizing this insight is captured by Conway’s leading number, defined for any pattern P. The leading number measures how often the pattern “overlaps with itself” in a specific sense, and the expected wait for pattern P is approximately 2^|P| × LeadNumber(P) for fair coin flips (with some exact-formula refinement). Patterns that overlap a lot with themselves wait longer; patterns that do not overlap wait less.

For HT: Conway leading number = 2. Expected wait = 2² × 1 = 4. Confirms.

For HH: Conway leading number = 3. Expected wait = 2² × 1.5 = 6. Confirms.

For HHH: Conway leading number = 7. Expected wait = 2³ × 1.75 = 14.

Penney’s game: the most counterintuitive consequence

Walter Penney in 1969 noticed that the unequal waiting times have a surprising consequence for two-player betting. In Penney’s game, two players each pick a 3-flip pattern. Coins are flipped until one of the patterns appears. The player whose pattern appears first wins.

The shocker: this game is not transitive. For any pattern player 1 picks, player 2 (who picks second, with knowledge of player 1’s choice) can always pick a pattern that beats it more than 50% of the time. In fact, for any 3-flip pattern, there exists a different 3-flip pattern that beats it with at least 2/3 probability.

The standard rule for picking a winning response to player 1’s pattern: take the opposite of the second flip and prepend it to the pattern, dropping the last flip. So if player 1 picks HHT, player 2 picks THH and wins more than 2/3 of the time.

Penney’s game appears regularly in advanced quant interviews because it tests whether the candidate understands waiting-time paradox at a deep level. The game is featured in many puzzle-book chapters and has been analyzed in dozens of papers.

What this question tests

Quant firms ask coin-flip streak questions because the underlying skill — recognizing that “probability of pattern occurring” and “expected first-occurrence time” are not the same thing — is exactly the skill that prevents traders from making systematic errors when reasoning about market patterns. Many real-world trading strategies are based on rare-event reasoning, and confusing “frequency” with “first-arrival time” causes consistent miscalibration.

The signal layers in a strong answer:

Recognize the structural difference. The candidate articulates that HT and HH have different “reset” behavior, without writing equations.
Set up Markov chain states cleanly. The state-by-state expected-value derivation is the standard tool. Speed and cleanliness matter.
Solve the linear system. The chain gives a small system of equations; the candidate should solve it efficiently.
Generalize to longer patterns. A polished candidate extends to HHH or HTT and computes the expected wait.

Variations interviewers ask

Compare HHT vs HTH vs THH. Different waiting times for different 3-flip patterns. Tests case analysis.
Penney’s game variant. Pick the winning response to a given pattern.
Biased coin. What happens when the coin is biased? Generalize the formulas.
Expected number of HT subsequences in n flips. Linearity of expectation gives a clean answer; tests whether the candidate uses the right tool.
Variance of the waiting time. Harder calculation; advanced interview territory.

Is it asked in 2026?

Yes, regularly at quant firms. The HH vs HT comparison is essentially required knowledge for entry-level quant interviews at Jane Street, Citadel, HRT, Two Sigma, Optiver, and SIG. Penney’s game appears as a follow-up. The puzzle has not been retired despite being well-known because the underlying skill is genuinely useful and the question can be extended endlessly with longer patterns and biased coins.

In tech interviews, this puzzle is rare. Some statistics-heavy data science roles include it as a probability sanity check, but it is far more common in finance.

Frequently Asked Questions

Why is E[HH] = 6 and not 4?

When you fail to extend HH (you get HT instead of HH), you reset to state 0 because the T cannot start a new HH. With HT, failing to extend HH (you get HH instead of HT) still leaves you with an H, ready to continue. The reset behavior makes HH slower.

Are all 4-flip patterns equally fast?

No. HHHH is the slowest (E = 30). HHHT is faster (E = 16). The Conway leading number captures the differences cleanly.

Why is Penney’s game non-transitive?

Because the waiting-time paradox makes some patterns fundamentally faster than others. Player 2, picking with knowledge of player 1’s pattern, can always exploit the structural difference. The non-transitivity is the surprising consequence.

How is this different from “probability of pattern in n flips”?

Probability of pattern in a fixed window is symmetric — HT and HH each have probability 1/4 of appearing in any given pair. But “expected wait for first occurrence in an infinite sequence” is a different question, and asymmetry shows up there because of overlap structure.

Is the Markov chain approach the only way to derive this?

It is the most standard. Generating functions also work and are more general. Conway’s leading number gives a closed-form shortcut. For interview purposes, the Markov chain derivation is the bar.