Core Probability Concepts
Probability questions test your ability to reason about uncertainty and expected behavior. Three concepts appear repeatedly in interviews:
Expected value is the probability-weighted average of all outcomes. For a fair die: E[X] = (1+2+3+4+5+6)/6 = 3.5. Linearity of expectation lets you sum expected values of independent (or even dependent) events.
Birthday problem: how many people do you need in a room before there is a 50% chance two share a birthday? The answer is 23. The complement approach: P(all different) = 365/365 * 364/365 * … * (365-n+1)/365. This drops below 0.5 at n=23. The approximation is n ~ sqrt(2 * 365 * ln(2)) ~ 23.
Geometric distribution: the number of trials until the first success when each trial succeeds with probability p. E[trials] = 1/p. Example: expected rolls of a die to see a 6 is 6.
Reservoir Sampling
Problem: given a stream of unknown length N, select k items uniformly at random using O(k) memory.
def reservoir_sample(stream, k):
reservoir = []
for i, item in enumerate(stream):
if i < k:
reservoir.append(item)
else:
# Replace element at random index with probability k/(i+1)
j = random.randint(0, i)
if j < k:
reservoir[j] = item
return reservoir
Correctness proof: at step i (0-indexed), item i is selected with probability k/(i+1). An earlier item at position j survives step i with probability 1 – (k/(i+1)) * (1/k) = i/(i+1). By induction, after processing N items each item has probability k/N of being in the reservoir.
Weighted reservoir sampling (Algorithm A-Res): assign each item a key = uniform(0,1)^(1/weight), keep the k items with the largest keys. This gives exact weighted sampling over a stream.
Random Shuffle / Fisher-Yates
The Fisher-Yates (Knuth) shuffle produces a uniformly random permutation in O(n) time.
def fisher_yates_shuffle(arr):
n = len(arr)
for i in range(n - 1, 0, -1):
j = random.randint(0, i) # j in [0, i] inclusive
arr[i], arr[j] = arr[j], arr[i]
return arr
Why the naive approach is wrong: picking a random position from [0, n-1] for every element produces n^n possible outcomes, which is not divisible by n! for most n. The result is a biased permutation. Fisher-Yates produces exactly n! outcomes.
Weighted Random Selection
Given items with weights [w0, w1, …, wn-1], select item i with probability wi / sum(weights).
import bisect
import random
class WeightedRandom:
def __init__(self, weights):
self.prefix = []
total = 0
for w in weights:
total += w
self.prefix.append(total)
self.total = total
def pick(self):
target = random.uniform(0, self.total)
# bisect_left finds leftmost index where prefix[i] >= target
return bisect.bisect_left(self.prefix, target)
Construction is O(n), each pick is O(log n). The Alias Method achieves O(1) picks after O(n) preprocessing by decomposing the distribution into a fair coin flip plus a bias table. Worth mentioning in interviews when the pick frequency is very high.
Random Point in Circle
To generate a uniformly random point inside a circle of radius R:
def random_point_in_circle(R):
while True:
x = random.uniform(-R, R)
y = random.uniform(-R, R)
if x*x + y*y <= R*R:
return (x, y) # rejection sampling
Acceptance rate is pi/4 ~ 78.5%, so expected iterations is about 1.27.
Why you cannot use r = random.uniform(0, R) with angle: the area element in polar coordinates is r*dr*dtheta. Uniform r gives density proportional to 1/r, which oversamples the center. Correct polar approach: r = R * sqrt(random.uniform(0, 1)). The sqrt corrects for the area scaling.
Bloom Filters in Probability
A Bloom filter uses m bits and k hash functions to test set membership with no false negatives and a tunable false positive rate.
False positive rate: p = (1 – e^(-kn/m))^k where n is the number of inserted elements. After inserting n elements, each bit is set with probability 1 – e^(-kn/m).
Optimal k: k = (m/n) * ln(2). With optimal k, p = (1/2)^k ~ 0.6185^(m/n). To achieve 1% false positive rate, you need about 9.6 bits per element.
Applications:
- Chrome Safe Browsing: bloom filter of malicious URLs loaded locally, full check only on positive hit
- Cassandra: per-SSTable bloom filters to avoid disk reads for missing keys
- Akamai CDN: one-hit-wonder filter to avoid caching items requested only once
- Bitcoin: SPV clients use bloom filters to request only relevant transactions from full nodes
Monte Carlo Estimation
def estimate_pi(n_samples):
inside = 0
for _ in range(n_samples):
x, y = random.random(), random.random()
if x*x + y*y <= 1.0:
inside += 1
return 4 * inside / n_samples
Convergence rate is O(1/sqrt(N)) by the Central Limit Theorem. To gain one extra decimal place of accuracy, you need 100x more samples. With N=10^6, expect about 3 correct digits.
Coupon Collector Problem
How many random draws (with replacement) from n coupons until you have collected all n types?
E[draws] = n * H(n) = n * (1 + 1/2 + 1/3 + … + 1/n) ~ n * ln(n)
When you have k distinct coupons, the expected additional draws to get the next new one is n/(n-k). Sum these from k=0 to n-1: n/n + n/(n-1) + … + n/1 = n * H(n).
For n=365 (birthday problem variant): expected draws to see all birthdays is 365 * ln(365) + O(1) ~ 2153.
rand5 from rand7 / rand7 from rand5
The general pattern is rejection sampling: generate a uniform distribution over a larger range, reject values outside the target range.
# rand7 from rand5
def rand7():
while True:
# Generate uniform over [0, 24] using 5*rand5() + rand5()
val = 5 * (rand5() - 1) + (rand5() - 1) # [0, 24]
if val < 21: # 21 = 3*7, reject values 21-24
return val % 7 + 1
# rand5 from rand7 (simpler - just reject 6 and 7)
def rand5_from_rand7():
while True:
val = rand7()
if val <= 5:
return val
For rand7 from rand5: acceptance rate is 21/25 = 84%. Expected calls to rand5 per call to rand7 is 2 * (1/0.84) ~ 2.38.
The key insight: if you have randN, you can generate uniform integers in [0, N^k – 1] using base-N representation with k calls, then reject the remainder of the range that does not divide evenly into your target range.
Meta coding and data science interviews frequently test probability. See common questions for Meta interview: probability and statistics questions.
Databricks interviews cover probability and ML-related statistics. See patterns for Databricks interview: probability and ML statistics.
Coinbase interviews test probability reasoning and algorithmic thinking. Review patterns for Coinbase interview: probability and financial algorithms.