Core Probability Concepts
Probability questions test your ability to reason about uncertainty and expected behavior. Three concepts appear repeatedly in interviews:
Expected value is the probability-weighted average of all outcomes. For a fair die: E[X] = (1+2+3+4+5+6)/6 = 3.5. Linearity of expectation lets you sum expected values of random variables even when they are dependent: the expected number of fixed points in a random permutation of n items is n * (1/n) = 1, even though the underlying events are far from independent.
Birthday problem: how many people do you need in a room before there is a 50% chance two share a birthday? The answer is 23. The complement approach: P(all different) = 365/365 * 364/365 * … * (365-n+1)/365. This drops below 0.5 at n=23. The approximation is n ~ sqrt(2 * 365 * ln(2)) ~ 23.
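A direct computation of the complement product confirms the crossover; a minimal sketch (the helper name is illustrative):

def birthday_collision_probability(n, days=365):
    # P(all n birthdays distinct) = days/days * (days-1)/days * ... * (days-n+1)/days
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (days - i) / days
    return 1.0 - p_distinct

print(birthday_collision_probability(22))  # ~0.476
print(birthday_collision_probability(23))  # ~0.507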
Geometric distribution: the number of trials until the first success when each trial succeeds with probability p. E[trials] = 1/p, which follows from E = 1 + (1-p)*E: one trial always happens, and with probability 1-p you are back where you started. Example: the expected number of rolls of a fair die to see a 6 is 6.
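A quick simulation agrees with the formula; a sketch (expect an average near 6 for p = 1/6):

import random

def trials_until_success(p):
    # Count Bernoulli(p) trials up to and including the first success
    trials = 1
    while random.random() >= p:
        trials += 1
    return trials

samples = [trials_until_success(1 / 6) for _ in range(100_000)]
print(sum(samples) / len(samples))  # ~6.0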
Reservoir Sampling
Problem: given a stream whose length N is not known in advance, select k items uniformly at random in a single pass using O(k) memory.
import random

def reservoir_sample(stream, k):
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items
            reservoir.append(item)
        else:
            # Keep item i with probability k/(i+1), replacing a uniform slot
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
Correctness proof: at step i (0-indexed), item i is selected with probability k/(i+1). An earlier item currently in the reservoir survives step i with probability 1 - (k/(i+1)) * (1/k) = i/(i+1), since item i must both be accepted and land on that particular slot to evict it. By induction, after step i every item seen so far is in the reservoir with probability k/(i+1): the inductive step is (k/i) * (i/(i+1)) = k/(i+1). After processing all N items, each item has probability exactly k/N of being in the reservoir.
Weighted reservoir sampling (Algorithm A-Res): assign each item a key = uniform(0,1)^(1/weight), keep the k items with the largest keys. This gives exact weighted sampling over a stream.
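A heap-based sketch of A-Res, assuming the stream yields (item, weight) pairs with strictly positive weights:

import heapq
import random

def weighted_reservoir_sample(stream, k):
    # A-Res: key = uniform(0,1)^(1/weight); keep the k largest keys
    heap = []  # min-heap of (key, item); the root is the smallest kept key
    for item, weight in stream:
        key = random.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))  # evict the smallest key
    return [item for _, item in heap]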
Random Shuffle / Fisher-Yates
The Fisher-Yates (Knuth) shuffle produces a uniformly random permutation in O(n) time.
import random

def fisher_yates_shuffle(arr):
    n = len(arr)
    for i in range(n - 1, 0, -1):
        j = random.randint(0, i)  # j in [0, i] inclusive
        arr[i], arr[j] = arr[j], arr[i]
    return arr
Why the naive approach is wrong: picking a random position from [0, n-1] for every element produces n^n equally likely execution paths, but there are n! permutations, and n^n is not divisible by n! for any n > 2. Some permutations must therefore occur more often than others. For n=3, the naive shuffle has 27 paths spread over 6 permutations, and 27/6 is not an integer. Fisher-Yates produces exactly n! equally likely outcomes.
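A quick empirical demonstration of the bias for n=3; a sketch where naive_shuffle is the incorrect variant described above:

import random
from collections import Counter

def naive_shuffle(arr):
    # Buggy on purpose: j ranges over the whole array at every step
    n = len(arr)
    for i in range(n):
        j = random.randint(0, n - 1)
        arr[i], arr[j] = arr[j], arr[i]
    return arr

counts = Counter(tuple(naive_shuffle([0, 1, 2])) for _ in range(270_000))
print(counts)  # visibly uneven; a fair shuffle would give ~45,000 per permutation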
Weighted Random Selection
Given items with weights [w0, w1, …, wn-1], select item i with probability wi / sum(weights).
import bisect
import random

class WeightedRandom:
    def __init__(self, weights):
        self.prefix = []
        total = 0
        for w in weights:
            total += w
            self.prefix.append(total)
        self.total = total

    def pick(self):
        target = random.uniform(0, self.total)
        # bisect_left returns the leftmost index where prefix[i] >= target
        return bisect.bisect_left(self.prefix, target)
Construction is O(n), each pick is O(log n). The Alias Method achieves O(1) picks after O(n) preprocessing by decomposing the distribution into a uniform bucket choice plus one biased coin flip, as in the sketch below. Worth mentioning in interviews when the pick frequency is very high.
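A sketch of Vose's variant of the Alias Method; the class name and structure are illustrative:

import random

class AliasSampler:
    def __init__(self, weights):
        n = len(weights)
        total = sum(weights)
        # Scale weights so the average bucket probability is exactly 1
        scaled = [w * n / total for w in weights]
        self.prob, self.alias = [0.0] * n, [0] * n
        small = [i for i, p in enumerate(scaled) if p < 1.0]
        large = [i for i, p in enumerate(scaled) if p >= 1.0]
        while small and large:
            s, l = small.pop(), large.pop()
            self.prob[s], self.alias[s] = scaled[s], l
            scaled[l] -= 1.0 - scaled[s]  # donor l fills the rest of bucket s
            (small if scaled[l] < 1.0 else large).append(l)
        for i in small + large:
            self.prob[i] = 1.0  # leftovers equal 1 up to float rounding
        self.n = n

    def pick(self):
        i = random.randrange(self.n)  # uniform bucket choice: O(1)
        # biased coin: the bucket's own item or its alias
        return i if random.random() < self.prob[i] else self.alias[i]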
Random Point in Circle
To generate a uniformly random point inside a circle of radius R:
import random

def random_point_in_circle(R):
    # Rejection sampling: draw from the bounding square, keep points inside
    while True:
        x = random.uniform(-R, R)
        y = random.uniform(-R, R)
        if x*x + y*y <= R*R:
            return (x, y)
The acceptance rate is pi/4 ~ 78.5%, so the expected number of iterations is 4/pi ~ 1.27.
Why you cannot use r = random.uniform(0, R) with angle: the area element in polar coordinates is r*dr*dtheta. Uniform r gives density proportional to 1/r, which oversamples the center. Correct polar approach: r = R * sqrt(random.uniform(0, 1)). The sqrt corrects for the area scaling.
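The corrected polar method as a sketch:

import math
import random

def random_point_in_circle_polar(R):
    r = R * math.sqrt(random.random())  # sqrt compensates for area growing like r^2
    theta = random.uniform(0, 2 * math.pi)
    return (r * math.cos(theta), r * math.sin(theta))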
Bloom Filters in Probability
A Bloom filter uses m bits and k hash functions to test set membership with no false negatives and a tunable false positive rate.
False positive rate: p = (1 – e^(-kn/m))^k where n is the number of inserted elements. After inserting n elements, each bit is set with probability 1 – e^(-kn/m).
Optimal k: k = (m/n) * ln(2). With optimal k, p = (1/2)^k ~ 0.6185^(m/n). To achieve 1% false positive rate, you need about 9.6 bits per element.
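A minimal sketch of the data structure, assuming double hashing (h1 + i*h2) to derive the k hash positions and the standard sizing formula m = -n*ln(p)/(ln 2)^2; the class name and hashing scheme are illustrative, not production-grade:

import hashlib
import math

class BloomFilter:
    def __init__(self, n_items, fp_rate):
        # m = -n*ln(p)/(ln 2)^2 bits; k = (m/n)*ln(2) hash functions
        self.m = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray(self.m)  # one byte per bit position, for simplicity

    def _positions(self, item):
        # Double hashing: two 64-bit halves of SHA-256 simulate k functions
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True may be a false positive; False is always correct
        return all(self.bits[pos] for pos in self._positions(item))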
Applications:
- Chrome Safe Browsing: a Bloom filter of malicious URLs loaded locally, with a full check only on a positive hit
- Cassandra: per-SSTable Bloom filters to avoid disk reads for missing keys
- Akamai CDN: a one-hit-wonder filter to avoid caching items requested only once
- Bitcoin: SPV clients use Bloom filters to request only relevant transactions from full nodes
Monte Carlo Estimation
import random

def estimate_pi(n_samples):
    # The fraction of random points in the unit square that land inside
    # the quarter circle x^2 + y^2 <= 1 estimates pi/4
    inside = 0
    for _ in range(n_samples):
        x, y = random.random(), random.random()
        if x*x + y*y <= 1.0:
            inside += 1
    return 4 * inside / n_samples
Convergence rate is O(1/sqrt(N)) by the Central Limit Theorem. To gain one extra decimal place of accuracy, you need 100x more samples. With N=10^6, expect about 3 correct digits.
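Observing the O(1/sqrt(N)) scaling empirically; a sketch reusing estimate_pi above, where the error roughly shrinks 10x for every 100x more samples:

import math

for n in (10**2, 10**4, 10**6):
    err = abs(estimate_pi(n) - math.pi)
    print(f"N={n}: error={err:.5f}")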
Coupon Collector Problem
How many random draws (with replacement) from n coupons until you have collected all n types?
E[draws] = n * H(n) = n * (1 + 1/2 + 1/3 + … + 1/n) ~ n * ln(n)
When you have k distinct coupons, the expected additional draws to get the next new one is n/(n-k). Sum these from k=0 to n-1: n/n + n/(n-1) + … + n/1 = n * H(n).
For n=365 (a birthday-problem variant): the expected number of draws to see all 365 birthdays is 365 * H(365) ~ 2365. Note that the leading term 365 * ln(365) ~ 2153 alone underestimates; the next term in the expansion n * ln(n) + gamma * n + 1/2, with gamma * 365 ~ 211, is not negligible.
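The exact expectation from the harmonic sum, as a quick check:

def coupon_collector_expected(n):
    # E[draws] = n * H(n) = n * (1 + 1/2 + ... + 1/n)
    return n * sum(1.0 / k for k in range(1, n + 1))

print(coupon_collector_expected(365))  # ~2364.6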
rand5 from rand7 / rand7 from rand5
The general pattern is rejection sampling: generate a uniform distribution over a larger range, reject values outside the target range.
# rand7 from rand5 (rand5 is assumed given, uniform over [1, 5])
def rand7():
    while True:
        # 5*(rand5()-1) + (rand5()-1) is uniform over [0, 24]
        val = 5 * (rand5() - 1) + (rand5() - 1)
        if val < 21:  # 21 = 3*7; reject 21-24 to preserve uniformity
            return val % 7 + 1

# rand5 from rand7 (simpler: just reject 6 and 7)
def rand5_from_rand7():
    while True:
        val = rand7()
        if val <= 5:
            return val
For rand7 from rand5: acceptance rate is 21/25 = 84%. Expected calls to rand5 per call to rand7 is 2 * (1/0.84) ~ 2.38.
The key insight: if you have randN, you can generate uniform integers in [0, N^k – 1] using base-N representation with k calls, then reject the remainder of the range that does not divide evenly into your target range.
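A hedged generalization of this pattern; rand_m_from_rand_n is a hypothetical helper, and rand_n is assumed uniform over [1, n]:

def rand_m_from_rand_n(rand_n, n, m):
    # Find the smallest k with n^k >= m, then build a uniform value in [0, n^k - 1]
    k, span = 1, n
    while span < m:
        k += 1
        span *= n
    limit = span - span % m  # largest multiple of m that fits in the span
    while True:
        val = 0
        for _ in range(k):
            val = val * n + (rand_n() - 1)  # append one base-n digit
        if val < limit:  # rejection keeps the result uniform
            return val % m + 1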
Meta coding and data science interviews frequently test probability.
See the common Meta interview questions on probability and statistics.
Databricks interviews cover probability and ML-related statistics; see the Databricks interview patterns for probability and ML statistics.
Coinbase interviews test probability reasoning and algorithmic thinking; review the Coinbase interview patterns for probability and financial algorithms.