What is reservoir sampling and when do you use it?

Reservoir sampling selects k items from a stream of unknown size n with uniform probability, using O(k) memory. Algorithm: fill a reservoir with the first k items. For each subsequent item i (0-indexed), generate a random index j in [0, i]. If j < k, replace reservoir[j] with item i. Correctness: any item ends up in the final reservoir with exactly probability k/n. Used when the stream is too large to fit in memory or when n is not known in advance. Example: select 1000 random log lines from a multi-gigabyte log file in a single pass.

How do you implement a fair random shuffle in O(n) time?

Use the Fisher-Yates algorithm: iterate from index n-1 down to 1. At each index i, pick a random index j in [0, i] inclusive, then swap arr[i] with arr[j]. This produces all n! permutations with equal probability. The critical mistake is using random.randint(0, n-1) at every step instead of random.randint(0, i) - this biases the result because it produces n^n outcomes instead of n!. Fisher-Yates runs in O(n) time and O(1) extra space. This is LC 384 (Shuffle an Array).

How does a Bloom filter work and what are its tradeoffs?

A Bloom filter uses m bits and k hash functions to test set membership. Insert: hash the item k times, set each bit position to 1. Query: hash k times, check if all positions are 1. False positives are possible (all k positions happen to be set by other insertions). False negatives are impossible. False positive rate = (1 - e^(-kn/m))^k where n is the number of items inserted. Optimal k = (m/n) * ln(2). For 1% false positive rate, need ~9.6 bits per element. Applications: Chrome Safe Browsing (URL blacklist lookup before DB check), Cassandra (skip SSTables that do not contain a key), cache penetration prevention.

How do you generate a weighted random selection efficiently?

Prefix sum approach: compute cumulative weights, generate a random number r in [0, total_weight), binary search for the first prefix sum > r - O(log n) per query after O(n) build. Alias Method: O(n) build, O(1) per query. Preprocess weights into probability and alias tables where each bucket has a primary item and at most one alias. On query, pick a random bucket, then flip a biased coin to choose the bucket item or its alias. The Alias Method is preferred for high-throughput sampling (word2vec negative sampling, A/B test traffic splitting).

What is the birthday problem and why does it matter for system design?

With n people, the probability that two share a birthday exceeds 50% at just 23 people. Formula: P(collision) ~ 1 - e^(-n^2 / 2m) where m is the number of equally likely outcomes. For hash tables with m buckets and n inserts, expected first collision at n ~ sqrt(m). For UUIDs (128-bit, m ~ 3.4e38): need ~sqrt(3.4e38) ~ 1.8e19 UUIDs before expecting a collision - effectively never. For 32-bit IDs (m ~ 4e9): collision expected after ~65K inserts. This guides ID space selection. Also applies to hash function collision analysis.

Probability and Statistics Interview Patterns

Q: How do you implement a fair random shuffle in O(n) time?

Use the Fisher-Yates algorithm: iterate from index n-1 down to 1. At each index i, pick a random index j in [0, i] inclusive, then swap arr[i] with arr[j]. This produces all n! permutations with equal probability. The critical mistake is using random.randint(0, n-1) at every step instead of random.randint(0, i) - this biases the result because it produces n^n outcomes instead of n!. Fisher-Yates runs in O(n) time and O(1) extra space. This is LC 384 (Shuffle an Array).

Q: How do you generate a weighted random selection efficiently?

Prefix sum approach: compute cumulative weights, generate a random number r in [0, total_weight), binary search for the first prefix sum > r - O(log n) per query after O(n) build. Alias Method: O(n) build, O(1) per query. Preprocess weights into probability and alias tables where each bucket has a primary item and at most one alias. On query, pick a random bucket, then flip a biased coin to choose the bucket item or its alias. The Alias Method is preferred for high-throughput sampling (word2vec negative sampling, A/B test traffic splitting).

⏱ 7 min read

Core Probability Concepts

Probability questions test your ability to reason about uncertainty and expected behavior. Three concepts appear repeatedly in interviews:

Expected value is the probability-weighted average of all outcomes. For a fair die: E[X] = (1+2+3+4+5+6)/6 = 3.5. Linearity of expectation lets you sum expected values of independent (or even dependent) events.

Birthday problem: how many people do you need in a room before there is a 50% chance two share a birthday? The answer is 23. The complement approach: P(all different) = 365/365 * 364/365 * … * (365-n+1)/365. This drops below 0.5 at n=23. The approximation is n ~ sqrt(2 * 365 * ln(2)) ~ 23.

Geometric distribution: the number of trials until the first success when each trial succeeds with probability p. E[trials] = 1/p. Example: expected rolls of a die to see a 6 is 6.

Reservoir Sampling

Problem: given a stream of unknown length N, select k items uniformly at random using O(k) memory.

def reservoir_sample(stream, k):
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Replace element at random index with probability k/(i+1)
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

Correctness proof: at step i (0-indexed), item i is selected with probability k/(i+1). An earlier item at position j survives step i with probability 1 – (k/(i+1)) * (1/k) = i/(i+1). By induction, after processing N items each item has probability k/N of being in the reservoir.

Weighted reservoir sampling (Algorithm A-Res): assign each item a key = uniform(0,1)^(1/weight), keep the k items with the largest keys. This gives exact weighted sampling over a stream.

Random Shuffle / Fisher-Yates

The Fisher-Yates (Knuth) shuffle produces a uniformly random permutation in O(n) time.

def fisher_yates_shuffle(arr):
    n = len(arr)
    for i in range(n - 1, 0, -1):
        j = random.randint(0, i)  # j in [0, i] inclusive
        arr[i], arr[j] = arr[j], arr[i]
    return arr

Why the naive approach is wrong: picking a random position from [0, n-1] for every element produces n^n possible outcomes, which is not divisible by n! for most n. The result is a biased permutation. Fisher-Yates produces exactly n! outcomes.

Weighted Random Selection

Given items with weights [w0, w1, …, wn-1], select item i with probability wi / sum(weights).

import bisect
import random

class WeightedRandom:
    def __init__(self, weights):
        self.prefix = []
        total = 0
        for w in weights:
            total += w
            self.prefix.append(total)
        self.total = total

    def pick(self):
        target = random.uniform(0, self.total)
        # bisect_left finds leftmost index where prefix[i] >= target
        return bisect.bisect_left(self.prefix, target)

Construction is O(n), each pick is O(log n). The Alias Method achieves O(1) picks after O(n) preprocessing by decomposing the distribution into a fair coin flip plus a bias table. Worth mentioning in interviews when the pick frequency is very high.

Random Point in Circle

To generate a uniformly random point inside a circle of radius R:

def random_point_in_circle(R):
    while True:
        x = random.uniform(-R, R)
        y = random.uniform(-R, R)
        if x*x + y*y <= R*R:
            return (x, y)  # rejection sampling

Acceptance rate is pi/4 ~ 78.5%, so expected iterations is about 1.27.

Why you cannot use r = random.uniform(0, R) with angle: the area element in polar coordinates is r*dr*dtheta. Uniform r gives density proportional to 1/r, which oversamples the center. Correct polar approach: r = R * sqrt(random.uniform(0, 1)). The sqrt corrects for the area scaling.

Bloom Filters in Probability

A Bloom filter uses m bits and k hash functions to test set membership with no false negatives and a tunable false positive rate.

False positive rate: p = (1 – e^(-kn/m))^k where n is the number of inserted elements. After inserting n elements, each bit is set with probability 1 – e^(-kn/m).

Optimal k: k = (m/n) * ln(2). With optimal k, p = (1/2)^k ~ 0.6185^(m/n). To achieve 1% false positive rate, you need about 9.6 bits per element.

Applications:

Chrome Safe Browsing: bloom filter of malicious URLs loaded locally, full check only on positive hit
Cassandra: per-SSTable bloom filters to avoid disk reads for missing keys
Akamai CDN: one-hit-wonder filter to avoid caching items requested only once
Bitcoin: SPV clients use bloom filters to request only relevant transactions from full nodes

Monte Carlo Estimation

def estimate_pi(n_samples):
    inside = 0
    for _ in range(n_samples):
        x, y = random.random(), random.random()
        if x*x + y*y <= 1.0:
            inside += 1
    return 4 * inside / n_samples

Convergence rate is O(1/sqrt(N)) by the Central Limit Theorem. To gain one extra decimal place of accuracy, you need 100x more samples. With N=10^6, expect about 3 correct digits.

Coupon Collector Problem

How many random draws (with replacement) from n coupons until you have collected all n types?

E[draws] = n * H(n) = n * (1 + 1/2 + 1/3 + … + 1/n) ~ n * ln(n)

When you have k distinct coupons, the expected additional draws to get the next new one is n/(n-k). Sum these from k=0 to n-1: n/n + n/(n-1) + … + n/1 = n * H(n).

For n=365 (birthday problem variant): expected draws to see all birthdays is 365 * ln(365) + O(1) ~ 2153.

rand5 from rand7 / rand7 from rand5

The general pattern is rejection sampling: generate a uniform distribution over a larger range, reject values outside the target range.

# rand7 from rand5
def rand7():
    while True:
        # Generate uniform over [0, 24] using 5*rand5() + rand5()
        val = 5 * (rand5() - 1) + (rand5() - 1)  # [0, 24]
        if val < 21:  # 21 = 3*7, reject values 21-24
            return val % 7 + 1

# rand5 from rand7 (simpler - just reject 6 and 7)
def rand5_from_rand7():
    while True:
        val = rand7()
        if val <= 5:
            return val

For rand7 from rand5: acceptance rate is 21/25 = 84%. Expected calls to rand5 per call to rand7 is 2 * (1/0.84) ~ 2.38.

The key insight: if you have randN, you can generate uniform integers in [0, N^k – 1] using base-N representation with k calls, then reject the remainder of the range that does not divide evenly into your target range.

Meta coding and data science interviews frequently test probability. See common questions for Meta interview: probability and statistics questions.

Databricks interviews cover probability and ML-related statistics. See patterns for Databricks interview: probability and ML statistics.

Coinbase interviews test probability reasoning and algorithmic thinking. Review patterns for Coinbase interview: probability and financial algorithms.