Probability Distribution Functions: PMF, PDF, CDF, and the Common Distributions
Understanding probability distribution functions is foundational for quant trading interviews, ML / statistics interviews, and any role involving uncertainty modeling. The standard objects — PMF, PDF, CDF, expectation, variance — appear constantly. This guide covers the definitions, the relationships between them, and the standard distributions (normal, exponential, Poisson, geometric, binomial) that interviewers expect candidates to recognize and reason about.
Probability Mass Function (PMF) — Discrete
For a discrete random variable X taking values in a countable set, the PMF gives the probability of each outcome:
PMF: p(x) = P(X = x)
Properties:
- p(x) ≥ 0 for all x
- Σ p(x) = 1 (sum over all possible values)
Example: a fair die has PMF p(x) = 1/6 for x ∈ {1, 2, 3, 4, 5, 6}.
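As a quick illustration, here is one way to represent and validate that die PMF in code (a Python sketch; the dict representation is just for illustration):

from fractions import Fraction

# PMF of a fair die as a mapping from outcome to probability
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

assert all(p >= 0 for p in die_pmf.values())  # non-negativity
assert sum(die_pmf.values()) == 1             # total mass is 1
print(die_pmf[3])                             # P(X = 3) = 1/6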
Probability Density Function (PDF) — Continuous
For a continuous random variable, single points have probability 0; instead, the PDF gives a density:
PDF: f(x), where P(a ≤ X ≤ b) = ∫[a..b] f(x) dx
Properties:
- f(x) ≥ 0 for all x
- ∫[-∞..∞] f(x) dx = 1
- f(x) is NOT a probability — it can exceed 1
Example: Standard normal has PDF f(x) = (1/√(2π)) × exp(-x²/2). At x = 0, f(0) ≈ 0.399 — a density, not a probability.
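A small sketch (Python, illustrative) showing that PDF values are densities: the standard normal density at 0 is about 0.399, and a narrow uniform density happily exceeds 1:

import math

def std_normal_pdf(x):
    # f(x) = (1/sqrt(2π)) * exp(-x²/2)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

print(std_normal_pdf(0))  # ≈ 0.3989, a density, not a probability

# Uniform on [0, 0.1] has f(x) = 1/(b - a) = 10 on its support,
# yet it still integrates to 10 × 0.1 = 1.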
Cumulative Distribution Function (CDF)
For both discrete and continuous random variables, the CDF gives the probability that X is at most some value:
CDF: F(x) = P(X ≤ x)
For discrete: F(x) = Σ p(y) for all y ≤ x.
For continuous: F(x) = ∫[-∞..x] f(t) dt.
Properties:
- F is non-decreasing
- F(-∞) = 0, F(∞) = 1
- F is right-continuous
The CDF is universal — works for any random variable, discrete or continuous, mixed or singular.
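Interval probabilities come from CDF differences: P(a < X ≤ b) = F(b) - F(a). A quick check with the exponential CDF, F(x) = 1 - e^(-λx) for x ≥ 0 (Python, illustrative):

import math

def exp_cdf(x, lam=1.0):
    # CDF of the exponential distribution with rate lam
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

# P(1 < X <= 2) = F(2) - F(1)
print(exp_cdf(2) - exp_cdf(1))  # ≈ 0.2325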
Expectation and Variance
Expectation (mean):
- Discrete: E[X] = Σ x × p(x)
- Continuous: E[X] = ∫ x × f(x) dx
Variance:
Var(X) = E[(X – E[X])²] = E[X²] – (E[X])²
Standard deviation: σ = √Var(X), in the same units as X.
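These definitions are easy to sanity-check on the fair die from earlier (Python, illustrative):

outcomes = range(1, 7)
mean = sum(x / 6 for x in outcomes)               # E[X] = 3.5
var = sum((x - mean) ** 2 / 6 for x in outcomes)  # Var(X) ≈ 2.9167

# The shortcut formula agrees: Var(X) = E[X²] - (E[X])²
ex2 = sum(x * x / 6 for x in outcomes)
assert abs(var - (ex2 - mean ** 2)) < 1e-12
print(mean, var)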
Common Distributions to Know
Bernoulli
Single trial with success probability p. PMF: p(1) = p, p(0) = 1-p. E[X] = p; Var(X) = p(1-p).
Use case: Coin flip, success/failure events.
Binomial
Sum of n independent Bernoulli trials. PMF: p(k) = C(n, k) × p^k × (1-p)^(n-k). E[X] = np; Var(X) = np(1-p).
Use case: Number of heads in n flips, conversion counts in A/B tests.
Geometric
Number of Bernoulli trials until first success. PMF: p(k) = (1-p)^(k-1) × p for k = 1, 2, …. E[X] = 1/p; Var(X) = (1-p)/p².
Use case: Number of trials until the first success (e.g., flips until the first head); the discrete memoryless waiting time.
Poisson
Number of events in fixed interval, rate λ. PMF: p(k) = λ^k × e^(-λ) / k!. E[X] = λ; Var(X) = λ.
Use case: Arrivals in queueing models, rare events, network packet counts.
Uniform (Continuous)
Equal density on [a, b]. PDF: f(x) = 1/(b-a) for a ≤ x ≤ b, else 0. E[X] = (a+b)/2; Var(X) = (b-a)²/12.
Use case: rand() in most languages, baseline for sampling.
Exponential
Time until first arrival in a Poisson process, rate λ. PDF: f(x) = λ × e^(-λx) for x ≥ 0. E[X] = 1/λ; Var(X) = 1/λ².
Use case: Time-until-event modeling, memoryless processes (network packet arrival, radioactive decay).
Normal (Gaussian)
Bell curve. PDF: f(x) = (1/(σ√(2π))) × exp(-(x-μ)²/(2σ²)). E[X] = μ; Var(X) = σ².
Use case: Most aggregate statistics (Central Limit Theorem), measurement errors, asset returns at long horizons.
Lognormal
X is lognormal if ln(X) is normal. Skewed right; common for asset prices, income distributions.
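Every mean/variance formula above can be checked empirically by sampling. A minimal Monte Carlo check for the exponential, using only the standard library (Python; the sample size is arbitrary):

import random
import statistics

lam = 2.0
samples = [random.expovariate(lam) for _ in range(100_000)]
print(statistics.mean(samples))      # ≈ 1/λ = 0.5
print(statistics.variance(samples))  # ≈ 1/λ² = 0.25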
Memorylessness Property
The geometric and exponential distributions are memoryless: P(X > s + t | X > s) = P(X > t). The waiting time doesn’t depend on how long you’ve already waited.
Memorylessness characterizes these two distributions: the exponential is the only memoryless continuous distribution, and the geometric is the only memoryless discrete one. In real-world processes, memorylessness rarely holds exactly, but the exponential is often a good approximation for short-horizon events.
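The memoryless identity is easy to verify numerically for the exponential, whose survival function is P(X > t) = e^(-λt) (Python; the values of s and t are illustrative):

import math

lam = 1.0

def surv(t):
    # Survival function: P(X > t) = e^(-λt)
    return math.exp(-lam * t)

s, t = 2.0, 1.5
print(surv(s + t) / surv(s))  # P(X > s + t | X > s) = e^(-1.5)
print(surv(t))                # P(X > t)             = e^(-1.5)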
Common Interview Problems
Computing expectation
“X is uniform on [0, 1]. What’s E[X²]?” Answer: ∫₀¹ x² dx = 1/3. Strong candidates do this without paper.
Variance under transformation
“If Var(X) = σ², what’s Var(aX + b)?” Answer: a²σ². Variance is invariant to constant shifts and scales by the square of the multiplier.
Sum of independent random variables
“X and Y are independent. Var(X + Y) = ?” Answer: Var(X) + Var(Y). Without independence, you’d need covariance terms.
Recognize the distribution
“You count events that occur independently at a constant average rate over a fixed interval. What distribution is the count?” Poisson, with λ equal to the expected count.
Compute a tail probability
“X is exponential with rate 1. What’s P(X > 2)?” Answer: e^(-2) ≈ 0.135. Use the survival function (1 – CDF).
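Both the E[X²] answer and the tail probability can be sanity-checked by simulation (Python; sample sizes are arbitrary):

import random

xs = [random.random() for _ in range(200_000)]
print(sum(x * x for x in xs) / len(xs))  # E[X²] for U(0,1) ≈ 1/3

ys = [random.expovariate(1.0) for _ in range(200_000)]
print(sum(y > 2 for y in ys) / len(ys))  # P(X > 2) ≈ e^(-2) ≈ 0.135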
Common Mistakes
- Treating PDF values as probabilities. f(x) is a density, not a probability. P(X = x) = 0 for any single point in a continuous distribution.
- Forgetting the area-under-curve = 1 constraint. A function isn’t a valid PDF unless it integrates to 1.
- Confusing variance and standard deviation. SD = √Var. SD is in the original units; variance is squared.
- Misapplying independence. Sum of variances applies only for independent random variables. With covariance, Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y).
- Mixing discrete and continuous. For a continuous distribution, including or excluding the endpoints of [a, b] doesn't change the probability, since single points carry zero mass; for a discrete distribution it can, so specify whether endpoints count. The CDF unifies them, but PMF/PDF behavior differs.
Frequently Asked Questions
What’s the practical difference between PMF and PDF?
PMF gives a probability at each outcome (discrete). PDF gives a density (continuous; integrate to get probability over an interval). The CDF unifies them. For interviews: know which type the problem implies, and use the right formulas.
Why is the normal distribution so important?
Central Limit Theorem: sums of many independent random variables converge to normal. This makes normal a good approximation for aggregate statistics across diverse domains. In quant finance specifically, normality is assumed in many models (Black-Scholes, mean-variance portfolio optimization) even when returns are leptokurtic in practice.
What does memorylessness mean and why does it matter?
Future behavior doesn’t depend on past history (given the current state). The exponential and geometric are the only distributions with this property (continuous and discrete, respectively). It matters in queueing theory and Markov chain analysis, where memorylessness simplifies the math considerably.
How do I quickly compute moments?
For known distributions, memorize the formulas (E[X] and Var(X)). For functions of random variables, use moment-generating functions (MGFs) or characteristic functions when the algebra gets ugly. Most interview problems can be solved with basic E[X] and Var(X) formulas plus linearity of expectation.
What books should I use for distribution prep?
Sheldon Ross’s Introduction to Probability Models is the classic text. For quant-flavored preparation, Zhou’s “green book” (A Practical Guide to Quantitative Finance Interviews) has many distribution-related problems. Stat 110 by Joe Blitzstein (Harvard, free online lectures) covers the same material at lecture pace. Most candidates over-prepare here; familiarity with the basic distributions and how to compute moments is sufficient.
See also: Expected Value • Conditional Probability • Random Walks and Stopping Times
💡 Strategies for Solving This Problem
Statistics and Sampling
Got this at Two Sigma in 2024. Tests understanding of probability distributions, sampling, and implementing statistical functions from scratch. Common in quantitative trading interviews.
The Problem
Implement a function that samples from a custom probability distribution given as discrete probabilities.
Example: Given array [0.1, 0.3, 0.4, 0.2], return index 0 with 10% probability, index 1 with 30%, index 2 with 40%, index 3 with 20%.
Approach 1: Linear Search
Generate a random number r in [0, 1). Walk through the array, accumulating probabilities until the running sum reaches r.
Algorithm (a runnable Python sketch of the idea):

import random

def sample_linear(probs):
    r = random.random()   # e.g. 0.65
    cum = 0.0             # running sum of probability mass
    for i, p in enumerate(probs):
        cum += p
        if cum >= r:
            return i
    return len(probs) - 1  # guard against floating-point undershoot

probs = [0.1, 0.3, 0.4, 0.2]
print(sample_linear(probs))
Example: r=0.65
- 0.1 < 0.65, continue
- 0.1 + 0.3 = 0.4 < 0.65, continue
- 0.4 + 0.4 = 0.8 >= 0.65, return index 2 ✓
Time: O(n) per sample
Approach 2: Binary Search with CDF (Optimal) ✓
Pre-compute cumulative distribution function (CDF), then use binary search.
CDF: [0.1, 0.4, 0.8, 1.0]
To sample:
- Generate r = random()
- Binary search CDF for first value >= r
- Return that index
Time: O(n) setup, O(log n) per sample
Much better when sampling multiple times from same distribution.
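A minimal sketch using the standard library's bisect module (Python; function names are illustrative):

import bisect
import random
from itertools import accumulate

def make_sampler(probs):
    cdf = list(accumulate(probs))  # e.g. [0.1, 0.4, 0.8, 1.0]
    def sample():
        r = random.random()
        i = bisect.bisect_right(cdf, r)  # first index with cdf[i] > r
        return min(i, len(cdf) - 1)      # guard against float rounding
    return sample

sampler = make_sampler([0.1, 0.3, 0.4, 0.2])
print(sampler())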
Approach 3: Alias Method (Advanced)
Pre-process into O(n) space structure that allows O(1) sampling. Complex but optimal for many samples.
Used in high-frequency systems where sampling speed is critical.
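A sketch of Vose's variant of the alias method (Python; illustrative, and it assumes the input already sums to 1):

import random

def build_alias(probs):
    # O(n) preprocessing: scale each probability by n, then pair every
    # under-full column with an "alias" that absorbs its leftover mass.
    n = len(probs)
    scaled = [p * n for p in probs]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    prob, alias = [0.0] * n, [0] * n
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers are 1.0 up to rounding
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias):
    # O(1) per sample: uniform column, then a biased coin flip.
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

prob, alias = build_alias([0.1, 0.3, 0.4, 0.2])
print(alias_sample(prob, alias))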
Edge Cases
- Probabilities don't sum to 1: Normalize first (see the sketch after this list)
- Zero probabilities: these produce repeated CDF values; searching for the first value strictly greater than r skips them naturally
- Floating point errors: Use epsilon comparisons
- Empty array: Error or return null
- Single element: Always return 0
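A sketch of the normalization step from the first bullet (Python; raising on non-positive mass is one reasonable choice, not the only one):

def normalize(probs):
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must have positive total mass")
    return [p / total for p in probs]

print(normalize([1, 3, 4, 2]))  # [0.1, 0.3, 0.4, 0.2]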
At Two Sigma
I initially did linear search. The interviewer said "You'll sample millions of times. Can you do better?" I then did CDF with binary search. He asked about the space-time tradeoff and mentioned the alias method. We discussed when each approach is best.