Write code to generate all possible combinations of ‘r’ elements in a given array of size ‘n’?
💡Strategies for Solving This Problem
Statistics and Sampling
Got this at Two Sigma in 2024. Tests understanding of probability distributions, sampling, and implementing statistical functions from scratch. Common in quantitative trading interviews.
The Problem
Implement a function that samples from a custom probability distribution given as discrete probabilities.
Example: Given array [0.1, 0.3, 0.4, 0.2], return index 0 with 10% probability, index 1 with 30%, index 2 with 40%, index 3 with 20%.
Approach 1: Linear Search
Generate random number [0, 1). Walk through array adding probabilities until sum exceeds random number.
Algorithm:
probs = [0.1, 0.3, 0.4, 0.2]
r = random() // e.g. 0.65
sum = 0
for i, p in probs:
sum += p
if sum >= r:
return i
Example: r=0.65
- 0.1 < 0.65, continue
- 0.1 + 0.3 = 0.4 < 0.65, continue
- 0.4 + 0.4 = 0.8 >= 0.65, return index 2 ✓
Time: O(n) per sample
Approach 2: Binary Search with CDF (Optimal) ✓
Pre-compute cumulative distribution function (CDF), then use binary search.
CDF: [0.1, 0.4, 0.8, 1.0]
To sample:
- Generate r = random()
- Binary search CDF for first value >= r
- Return that index
Time: O(n) setup, O(log n) per sample
Much better when sampling multiple times from same distribution.
Approach 3: Alias Method (Advanced)
Pre-process into O(n) space structure that allows O(1) sampling. Complex but optimal for many samples.
Used in high-frequency systems where sampling speed is critical.
Edge Cases
- Probabilities don't sum to 1: Normalize first
- Zero probabilities: Skip in CDF
- Floating point errors: Use epsilon comparisons
- Empty array: Error or return null
- Single element: Always return 0
At Two Sigma
I initially did linear search. Interviewer said "You'll sample millions of times. Can you do better?" Then I did CDF with binary search. He asked about space-time tradeoff and mentioned alias method. We discussed when each approach is best.