Return True with Probability p/q: Sampling Without Bias

“Given a function that returns a uniform random value, write a function that returns True with probability p/q.” This is a foundational probability / programming interview question that tests your understanding of uniform random sampling and your ability to convert between random sources. Variants include “given rand10, generate rand7,” “given a biased coin, simulate a fair coin,” and “implement weighted random selection.” This guide covers the core technique, the standard variants with working code, and the common pitfalls.

Problem Statement

Given a function random() that returns a uniform float in [0, 1), implement a function that returns True with probability p/q and False with probability 1 - p/q.

Examples:

  • p = 1, q = 2 → fair coin (50/50)
  • p = 1, q = 4 → True 25% of the time
  • p = 3, q = 7 → True ~42.86% of the time

Approach 1: Direct Comparison

If you have random() returning a uniform value in [0, 1), comparing against the threshold p/q works directly.

import random

def biased_bool(p: int, q: int) -> bool:
    """Return True with probability p/q."""
    return random.random() < p / q


# Test (run many times to verify distribution)
true_count = sum(biased_bool(3, 7) for _ in range(100_000))
print(true_count / 100_000)  # roughly 0.4286

Complexity: O(1) per call. The standard answer when uniform random() is available.

Variant 1: Generate rand_k from rand_n

Given rand7() (returns 1–7 uniformly), implement rand10(). Or vice versa.

Easy direction: smaller from larger

Given rand7 → rand5: just call rand7, retry if > 5.

def rand5_from_rand7(rand7) -> int:
    """Generate uniform 1..5 from rand7."""
    while True:
        x = rand7()  # 1..7
        if x <= 5:
            return x

Expected calls: 7/5 = 1.4 (from geometric distribution). Some calls “waste” 6 and 7 outcomes; the retry loop ensures uniformity.

Hard direction: larger from smaller

Given rand5 → rand7: combine multiple rand5 calls. The standard trick is to compute (rand5() - 1) * 5 + rand5(), which gives a uniform 1..25. Then map 1..21 to 1..7 (each value of rand7 corresponds to 3 values), and retry if the result is 22..25.

def rand7_from_rand5(rand5) -> int:
    """Generate uniform 1..7 from rand5."""
    while True:
        x = (rand5() - 1) * 5 + rand5()  # uniform 1..25
        if x <= 21:
            return ((x - 1) % 7) + 1

Expected calls: 2 × (25/21) ≈ 2.38 calls to rand5 per call to rand7.
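A quick empirical check helps confirm the mapping stays uniform. This sketch simulates rand5 with random.randint (an assumption for the demo; in an interview you'd be handed rand5 as a black box) and repeats the function above so the snippet runs standalone:

```python
import random

def rand5():
    return random.randint(1, 5)  # stand-in for the black-box generator

def rand7_from_rand5(rand5):
    while True:
        x = (rand5() - 1) * 5 + rand5()  # uniform 1..25
        if x <= 21:
            return ((x - 1) % 7) + 1

counts = [0] * 8  # index 0 unused
for _ in range(70_000):
    counts[rand7_from_rand5(rand5)] += 1
print(counts[1:])  # each bucket roughly 10_000
```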

Variant 2: Fair Coin from Biased Coin (Von Neumann)

Given a biased coin that lands heads with probability p (unknown), produce a fair 50/50 outcome.

def fair_from_biased(biased) -> bool:
    """Von Neumann's trick: pair flips, accept HT or TH."""
    while True:
        first = biased()
        second = biased()
        if first != second:
            return first  # HT → heads, TH → tails; each pair has probability p(1-p)

Why this works: Pairs (H,T) and (T,H) each have probability p(1-p) — equal. Pairs (H,H) and (T,T) get rejected (retry). Result: 50/50 fair coin.

Expected calls: 2 / (2p(1-p)). For p = 0.5 (already fair), expected 4 calls. For p near 0 or 1, much higher.
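To see the trick in action, here's a sketch with an assumed 80/20 coin (heads = True):

```python
import random

def biased():
    return random.random() < 0.8  # heads 80% of the time (assumed for the demo)

def fair_from_biased(biased):
    while True:
        first, second = biased(), biased()
        if first != second:
            return first

heads = sum(fair_from_biased(biased) for _ in range(100_000))
print(heads / 100_000)  # roughly 0.5 despite the 80/20 coin
```

With p = 0.8, the acceptance probability per pair is 2 × 0.8 × 0.2 = 0.32, so each fair flip costs about 2/0.32 ≈ 6.25 biased flips on average.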

Variant 3: Weighted Random Selection

Given an array of weights, return index i with probability weights[i] / sum(weights). (LeetCode #528)

import bisect
import random

class Solution:
    def __init__(self, weights: list[int]):
        # Cumulative weights
        self.total = sum(weights)
        self.cumulative = []
        running = 0
        for w in weights:
            running += w
            self.cumulative.append(running)

    def pick_index(self) -> int:
        """O(log n) per call."""
        target = random.random() * self.total
        return bisect.bisect_left(self.cumulative, target)


# Test
sol = Solution([1, 3, 2])  # weights
counts = [0, 0, 0]
for _ in range(100_000):
    counts[sol.pick_index()] += 1
print(counts)  # roughly [16667, 50000, 33333]

Complexity: O(n) preprocessing, O(log n) per call. The standard answer for weighted selection.

Variant 4: Reservoir Sampling

Given a stream of unknown length, select k elements uniformly at random. Standard technique: keep the first k elements; for the i-th element after that (1-indexed position i > k), keep it with probability k/i, replacing a uniformly random member of the reservoir.

import random

def reservoir_sample(stream, k: int) -> list:
    """Sample k elements uniformly from stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

Useful when the stream is too large to fit in memory or unknown in length. Each element ends up with probability k/n where n is the (unknown) total length.
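A quick sanity check (repeating the function so the snippet runs standalone): sampling k = 3 from a stream of 10 items, every item should land in the reservoir about 30% of the time.

```python
import random

def reservoir_sample(stream, k):
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)  # i+1 equally likely slots
            if j < k:
                reservoir[j] = item
    return reservoir

# Each of 0..9 should appear in the k=3 sample with probability 3/10.
hits = [0] * 10
for _ in range(50_000):
    for x in reservoir_sample(range(10), 3):
        hits[x] += 1
print([h / 50_000 for h in hits])  # each roughly 0.3
```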

Common Mistakes

  • Bias from non-uniform mapping. Mapping rand7 (values 1–7) to {0, 1} via mod 2 gives 3/7 zeros and 4/7 ones — biased, because four of the seven values are odd. Always retry on out-of-range outcomes to maintain uniformity.
  • Forgetting the retry loop. Direct mapping that “wraps around” introduces bias. The retry loop preserves uniformity at the cost of extra calls.
  • Confusing rand_n() = “1 to n” with “0 to n-1”. Conventions vary. Clarify with the interviewer; adjust formulas accordingly.
  • Bad expected-call analysis. “Some calls retry” doesn’t mean infinite loop; the retry probability is bounded, so expected calls are finite. Compute via geometric distribution: expected calls = 1 / acceptance probability.
  • Using floor/ceiling incorrectly. When mapping a uniform [0, 1) value to discrete outcomes, off-by-one errors can shift the distribution. int(random() * n) gives 0..n-1 uniformly; int(random() * n) + 1 gives 1..n uniformly.
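The first two pitfalls are easy to see empirically. This sketch (simulating rand7 with random.randint, an assumption for the demo) compares the naive mod-2 mapping against a retry-based fair bit:

```python
import random

def rand7():
    return random.randint(1, 7)  # stand-in for the black-box generator

# Naive: four of {1..7} are odd, so parity is biased toward 1.
naive_ones = sum(rand7() % 2 for _ in range(70_000))
print(naive_ones / 70_000)  # roughly 4/7 ≈ 0.571 — biased

# Fixed: discard 7, then take the parity of the uniform 1..6.
def fair_bit():
    while True:
        x = rand7()
        if x <= 6:
            return x % 2

fair_ones = sum(fair_bit() for _ in range(70_000))
print(fair_ones / 70_000)  # roughly 0.5
```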

Frequently Asked Questions

What’s the expected interview answer for “rand7 from rand5”?

Compute (rand5() - 1) * 5 + rand5() to get uniform 1..25. Accept if ≤ 21; map to 1..7 via mod. Otherwise retry. Expected ~2.38 calls per output. Walk through the uniformity reasoning: 25 outcomes / 7 buckets = 3 each + 4 leftover that we discard.

Why not just (rand5() – 1) % 7 + 1 to map?

That gives incomplete, biased output. rand5 returns values 1..5, so (rand5() – 1) % 7 is just 0..4 — the outputs 6 and 7 can never occur. Mod operations on inputs that don't cover a full multiple of the target range are always biased. Always use the “scale up, accept-or-retry” pattern.

How does Von Neumann’s fair-coin trick work?

Pair consecutive flips. The pair “HT” has probability p(1-p); the pair “TH” has probability (1-p)p. They’re equal, so accepting one or the other based on which comes first gives 50/50. Pairs “HH” and “TT” get rejected (retry). The bias of the coin doesn’t matter because the trick exploits the symmetry of pair probabilities.

What’s the time complexity of weighted random selection?

O(n) preprocessing to compute cumulative weights, O(log n) per query via binary search. For a small number of weights or rare changes to weights, this is optimal. For frequently-changing weights, consider segment trees or alias methods (constant time per query at the cost of O(n) reconstruction).
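For reference, here is a sketch of the alias method mentioned above (Vose's variant; function names are illustrative). Building the tables is O(n); each draw then costs one array lookup plus one biased coin flip:

```python
import random

def build_alias(weights):
    """Vose's alias method: O(n) preprocessing into (prob, alias) tables."""
    n = len(weights)
    total = sum(weights)
    prob = [w * n / total for w in weights]  # scaled so the average is 1
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                    # s's slack is filled by l
        prob[l] += prob[s] - 1.0        # l donated (1 - prob[s]) of its mass
        (small if prob[l] < 1.0 else large).append(l)
    return prob, alias

def pick(prob, alias):
    """O(1) per draw: pick a column uniformly, then flip one biased coin."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

prob, alias = build_alias([1, 3, 2])
counts = [0, 0, 0]
for _ in range(60_000):
    counts[pick(prob, alias)] += 1
print(counts)  # roughly [10000, 30000, 20000]
```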

Why is reservoir sampling useful?

It samples k elements from a stream without knowing the total length. Memory: O(k). Critical for large datasets that don’t fit in memory or for online algorithms where the stream is infinite. Used in many production data pipelines (log sampling, A/B test cohort selection).

See also: Generate Random Numbers · Two Sum · Power of Two Check

💡Strategies for Solving This Problem

Continued Fractions and Approximation

This problem appeared in a quant-firm interview. It asks you to approximate a decimal as a fraction within a given tolerance, and tests your understanding of numerical methods and algorithmic precision.

The Problem

Convert a decimal to a fraction: 0.25 → 1/4. With tolerance: 0.24 with tolerance 0.01 still gives 1/4 (since |0.25 − 0.24| ≤ 0.01).

Naive Approach: Brute Force

Try all denominators from 1 to some limit. For each, find best numerator. Check if within tolerance.

O(n²) if you scan numerators and denominators, or O(n) if you round x·q to get the best numerator for each denominator q directly. Works, but slow compared with continued fractions.
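A minimal version of the brute force (hypothetical function name; rounding x·q gives the best numerator for each q, so this is the O(n) variant):

```python
def to_fraction_bruteforce(x, tol, max_den=1000):
    """Smallest denominator whose best fraction is within tol of x."""
    for q in range(1, max_den + 1):
        p = round(x * q)  # best numerator for this denominator
        if abs(x - p / q) <= tol:
            return p, q
    return None  # no fraction within tolerance up to max_den

print(to_fraction_bruteforce(0.25, 1e-9))  # (1, 4)
print(to_fraction_bruteforce(0.24, 0.02))  # (1, 4)
```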

Better: Continued Fractions

Use Euclidean algorithm to find best rational approximation. Produces sequence of "convergents" that are optimal approximations.

For π = 3.14159...:

  • 3/1
  • 22/7 (error 0.0013)
  • 333/106
  • 355/113 (error 0.00000027!)

Each convergent is the best approximation for its denominator size.

Algorithm Steps

  1. Start with value v
  2. Integer part: floor(v)
  3. Fractional part: v - floor(v)
  4. Invert fractional part: 1 / frac
  5. Repeat until frac < tolerance or denominator too large
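The steps above can be sketched as follows (a minimal version; the standard library's fractions.Fraction.limit_denominator does something similar):

```python
def to_fraction_cf(x, tol, max_den=10**6):
    """Walk the continued-fraction convergents of x until one is within tol."""
    h_prev, h = 1, int(x)  # convergent numerators   h_{-1}, h_0
    k_prev, k = 0, 1       # convergent denominators k_{-1}, k_0
    frac = x - int(x)
    while abs(x - h / k) > tol and k <= max_den and frac > 1e-12:
        frac = 1.0 / frac  # invert the fractional part
        a = int(frac)      # next continued-fraction term
        frac -= a
        # standard recurrence: c_n = a_n * c_{n-1} + c_{n-2}
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k

print(to_fraction_cf(0.25, 1e-9))               # (1, 4)
print(to_fraction_cf(3.141592653589793, 1e-4))  # (333, 106)
```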

Key Insight

Each step of Euclidean algorithm gives next convergent. These are provably optimal approximations.

Scroll to Top