Core Probability Concepts
Probability questions test your ability to reason about uncertainty and expected behavior. Three concepts appear repeatedly in interviews:
Expected value is the probability-weighted average of all outcomes. For a fair die: E[X] = (1+2+3+4+5+6)/6 = 3.5. Linearity of expectation lets you sum expected values of random variables even when they are dependent: the expected number of fixed points in a random permutation of n items is n * (1/n) = 1, even though the underlying events are far from independent.
Birthday problem: how many people do you need in a room before there is a 50% chance two share a birthday? The answer is 23. The complement approach: P(all different) = 365/365 * 364/365 * … * (365-n+1)/365. This drops below 0.5 at n=23. The approximation is n ~ sqrt(2 * 365 * ln(2)) ~ 23.
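A direct computation of the complement product confirms the crossover; a minimal sketch (the helper name is illustrative):

def birthday_collision_probability(n, days=365):
    # P(all n birthdays distinct) = days/days * (days-1)/days * ... * (days-n+1)/days
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (days - i) / days
    return 1.0 - p_distinct

print(birthday_collision_probability(22))  # ~0.476
print(birthday_collision_probability(23))  # ~0.507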
Geometric distribution: the number of trials until the first success when each trial succeeds with probability p. E[trials] = 1/p, which follows from E = 1 + (1-p)*E: one trial always happens, and with probability 1-p you are back where you started. Example: the expected number of rolls of a fair die to see a 6 is 6.
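A quick simulation agrees with the formula; a sketch (expect an average near 6 for p = 1/6):

import random

def trials_until_success(p):
    # Count Bernoulli(p) trials up to and including the first success
    trials = 1
    while random.random() >= p:
        trials += 1
    return trials

samples = [trials_until_success(1 / 6) for _ in range(100_000)]
print(sum(samples) / len(samples))  # ~6.0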
Reservoir Sampling
Problem: given a stream whose length N is not known in advance, select k items uniformly at random in a single pass using O(k) memory.
import random

def reservoir_sample(stream, k):
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items
            reservoir.append(item)
        else:
            # Keep item i with probability k/(i+1), replacing a uniform slot
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
Correctness proof: at step i (0-indexed), item i is selected with probability k/(i+1). An earlier item currently in the reservoir survives step i with probability 1 - (k/(i+1)) * (1/k) = i/(i+1), since item i must both be accepted and land on that particular slot to evict it. By induction, after step i every item seen so far is in the reservoir with probability k/(i+1): the inductive step is (k/i) * (i/(i+1)) = k/(i+1). After processing all N items, each item has probability exactly k/N of being in the reservoir.
Weighted reservoir sampling (Algorithm A-Res): assign each item a key = uniform(0,1)^(1/weight), keep the k items with the largest keys. This gives exact weighted sampling over a stream.
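A heap-based sketch of A-Res, assuming the stream yields (item, weight) pairs with strictly positive weights:

import heapq
import random

def weighted_reservoir_sample(stream, k):
    # A-Res: key = uniform(0,1)^(1/weight); keep the k largest keys
    heap = []  # min-heap of (key, item); the root is the smallest kept key
    for item, weight in stream:
        key = random.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))  # evict the smallest key
    return [item for _, item in heap]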
Random Shuffle / Fisher-Yates
The Fisher-Yates (Knuth) shuffle produces a uniformly random permutation in O(n) time.
import random

def fisher_yates_shuffle(arr):
    n = len(arr)
    for i in range(n - 1, 0, -1):
        j = random.randint(0, i)  # j in [0, i] inclusive
        arr[i], arr[j] = arr[j], arr[i]
    return arr
Why the naive approach is wrong: picking a random position from [0, n-1] for every element produces n^n equally likely execution paths, but there are n! permutations, and n^n is not divisible by n! for any n > 2. Some permutations must therefore occur more often than others. For n=3, the naive shuffle has 27 paths spread over 6 permutations, and 27/6 is not an integer. Fisher-Yates produces exactly n! equally likely outcomes.
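A quick empirical demonstration of the bias for n=3; a sketch where naive_shuffle is the incorrect variant described above:

import random
from collections import Counter

def naive_shuffle(arr):
    # Buggy on purpose: j ranges over the whole array at every step
    n = len(arr)
    for i in range(n):
        j = random.randint(0, n - 1)
        arr[i], arr[j] = arr[j], arr[i]
    return arr

counts = Counter(tuple(naive_shuffle([0, 1, 2])) for _ in range(270_000))
print(counts)  # visibly uneven; a fair shuffle would give ~45,000 per permutation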
Weighted Random Selection
Given items with weights [w0, w1, …, wn-1], select item i with probability wi / sum(weights).
import bisect
import random

class WeightedRandom:
    def __init__(self, weights):
        self.prefix = []
        total = 0
        for w in weights:
            total += w
            self.prefix.append(total)
        self.total = total

    def pick(self):
        target = random.uniform(0, self.total)
        # bisect_left returns the leftmost index where prefix[i] >= target
        return bisect.bisect_left(self.prefix, target)
Construction is O(n), each pick is O(log n). The Alias Method achieves O(1) picks after O(n) preprocessing by decomposing the distribution into a uniform bucket choice plus one biased coin flip, as in the sketch below. Worth mentioning in interviews when the pick frequency is very high.
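A sketch of Vose's variant of the Alias Method; the class name and structure are illustrative:

import random

class AliasSampler:
    def __init__(self, weights):
        n = len(weights)
        total = sum(weights)
        # Scale weights so the average bucket probability is exactly 1
        scaled = [w * n / total for w in weights]
        self.prob, self.alias = [0.0] * n, [0] * n
        small = [i for i, p in enumerate(scaled) if p < 1.0]
        large = [i for i, p in enumerate(scaled) if p >= 1.0]
        while small and large:
            s, l = small.pop(), large.pop()
            self.prob[s], self.alias[s] = scaled[s], l
            scaled[l] -= 1.0 - scaled[s]  # donor l fills the rest of bucket s
            (small if scaled[l] < 1.0 else large).append(l)
        for i in small + large:
            self.prob[i] = 1.0  # leftovers equal 1 up to float rounding
        self.n = n

    def pick(self):
        i = random.randrange(self.n)  # uniform bucket choice: O(1)
        # biased coin: the bucket's own item or its alias
        return i if random.random() < self.prob[i] else self.alias[i]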
Random Point in Circle
To generate a uniformly random point inside a circle of radius R:
import random

def random_point_in_circle(R):
    # Rejection sampling: draw from the bounding square, keep points inside
    while True:
        x = random.uniform(-R, R)
        y = random.uniform(-R, R)
        if x*x + y*y <= R*R:
            return (x, y)
The acceptance rate is pi/4 ~ 78.5%, so the expected number of iterations is 4/pi ~ 1.27.
Why you cannot use r = random.uniform(0, R) with angle: the area element in polar coordinates is r*dr*dtheta. Uniform r gives density proportional to 1/r, which oversamples the center. Correct polar approach: r = R * sqrt(random.uniform(0, 1)). The sqrt corrects for the area scaling.
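The corrected polar method as a sketch:

import math
import random

def random_point_in_circle_polar(R):
    r = R * math.sqrt(random.random())  # sqrt compensates for area growing like r^2
    theta = random.uniform(0, 2 * math.pi)
    return (r * math.cos(theta), r * math.sin(theta))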
Bloom Filters in Probability
A Bloom filter uses m bits and k hash functions to test set membership with no false negatives and a tunable false positive rate.
False positive rate: p = (1 – e^(-kn/m))^k where n is the number of inserted elements. After inserting n elements, each bit is set with probability 1 – e^(-kn/m).
Optimal k: k = (m/n) * ln(2). With optimal k, p = (1/2)^k ~ 0.6185^(m/n). To achieve 1% false positive rate, you need about 9.6 bits per element.
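A minimal sketch of the data structure, assuming double hashing (h1 + i*h2) to derive the k hash positions and the standard sizing formula m = -n*ln(p)/(ln 2)^2; the class name and hashing scheme are illustrative, not production-grade:

import hashlib
import math

class BloomFilter:
    def __init__(self, n_items, fp_rate):
        # m = -n*ln(p)/(ln 2)^2 bits; k = (m/n)*ln(2) hash functions
        self.m = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray(self.m)  # one byte per bit position, for simplicity

    def _positions(self, item):
        # Double hashing: two 64-bit halves of SHA-256 simulate k functions
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True may be a false positive; False is always correct
        return all(self.bits[pos] for pos in self._positions(item))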
Applications:
- Chrome Safe Browsing: a Bloom filter of malicious URLs loaded locally, with a full check only on a positive hit
- Cassandra: per-SSTable Bloom filters to avoid disk reads for missing keys
- Akamai CDN: a one-hit-wonder filter to avoid caching items requested only once
- Bitcoin: SPV clients use Bloom filters to request only relevant transactions from full nodes
Monte Carlo Estimation
import random

def estimate_pi(n_samples):
    # The fraction of random points in the unit square that land inside
    # the quarter circle x^2 + y^2 <= 1 estimates pi/4
    inside = 0
    for _ in range(n_samples):
        x, y = random.random(), random.random()
        if x*x + y*y <= 1.0:
            inside += 1
    return 4 * inside / n_samples
Convergence rate is O(1/sqrt(N)) by the Central Limit Theorem. To gain one extra decimal place of accuracy, you need 100x more samples. With N=10^6, expect about 3 correct digits.
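Observing the O(1/sqrt(N)) scaling empirically; a sketch reusing estimate_pi above, where the error roughly shrinks 10x for every 100x more samples:

import math

for n in (10**2, 10**4, 10**6):
    err = abs(estimate_pi(n) - math.pi)
    print(f"N={n}: error={err:.5f}")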
Coupon Collector Problem
How many random draws (with replacement) from n coupons until you have collected all n types?
E[draws] = n * H(n) = n * (1 + 1/2 + 1/3 + … + 1/n) ~ n * ln(n)
When you have k distinct coupons, the expected additional draws to get the next new one is n/(n-k). Sum these from k=0 to n-1: n/n + n/(n-1) + … + n/1 = n * H(n).
For n=365 (a birthday-problem variant): the expected number of draws to see all 365 birthdays is 365 * H(365) ~ 2365. Note that the leading term 365 * ln(365) ~ 2153 alone underestimates; the next term in the expansion n * ln(n) + gamma * n + 1/2, with gamma * 365 ~ 211, is not negligible.
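The exact expectation from the harmonic sum, as a quick check:

def coupon_collector_expected(n):
    # E[draws] = n * H(n) = n * (1 + 1/2 + ... + 1/n)
    return n * sum(1.0 / k for k in range(1, n + 1))

print(coupon_collector_expected(365))  # ~2364.6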
rand5 from rand7 / rand7 from rand5
The general pattern is rejection sampling: generate a uniform distribution over a larger range, reject values outside the target range.
# rand7 from rand5 (rand5 is assumed given, uniform over [1, 5])
def rand7():
    while True:
        # 5*(rand5()-1) + (rand5()-1) is uniform over [0, 24]
        val = 5 * (rand5() - 1) + (rand5() - 1)
        if val < 21:  # 21 = 3*7; reject 21-24 to preserve uniformity
            return val % 7 + 1

# rand5 from rand7 (simpler: just reject 6 and 7)
def rand5_from_rand7():
    while True:
        val = rand7()
        if val <= 5:
            return val
For rand7 from rand5: acceptance rate is 21/25 = 84%. Expected calls to rand5 per call to rand7 is 2 * (1/0.84) ~ 2.38.
The key insight: if you have randN, you can generate uniform integers in [0, N^k – 1] using base-N representation with k calls, then reject the remainder of the range that does not divide evenly into your target range.
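A hedged generalization of this pattern; rand_m_from_rand_n is a hypothetical helper, and rand_n is assumed uniform over [1, n]:

def rand_m_from_rand_n(rand_n, n, m):
    # Find the smallest k with n^k >= m, then build a uniform value in [0, n^k - 1]
    k, span = 1, n
    while span < m:
        k += 1
        span *= n
    limit = span - span % m  # largest multiple of m that fits in the span
    while True:
        val = 0
        for _ in range(k):
            val = val * n + (rand_n() - 1)  # append one base-n digit
        if val < limit:  # rejection keeps the result uniform
            return val % m + 1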
Meta coding and data science interviews frequently test probability.
See the common Meta interview questions on probability and statistics.
Databricks interviews cover probability and ML-related statistics; see the Databricks interview patterns for probability and ML statistics.
Coinbase interviews test probability reasoning and algorithmic thinking; review the Coinbase interview patterns for probability and financial algorithms.