Bayesian Weather Forecasting: How Forecasts Update with Evidence

“How do you forecast tomorrow’s weather given today’s observation?” is a classic probability interview question that tests Bayesian reasoning under uncertainty. The puzzle has many forms — predicting rain given a wet sidewalk, predicting a coin’s bias from observed flips, classifying email as spam given keywords — but they all share the same Bayes-rule structure. This guide covers the framework, the canonical examples, and the variations that come up at quant trading firms (Jane Street, SIG, Optiver) and machine-learning interviews at FAANG / AI labs.

The Core Framework: Bayes’ Theorem

Bayes’ theorem relates the conditional probability of an event A given evidence B to the reverse conditional and the priors:

P(A | B) = P(B | A) × P(A) / P(B)

Or, more practically:

P(A | B) = P(B | A) × P(A) / [P(B | A) × P(A) + P(B | not A) × P(not A)]

For weather forecasting:

A = “it will rain tomorrow”
B = today’s observation (e.g., “the sky was overcast at sunset”)
P(A) = prior probability of rain (climatological average)
P(B | A) = how often the sky is overcast given rain follows
P(B | not A) = how often the sky is overcast given no rain
P(A | B) = posterior probability of rain given the observation

Canonical Example: Will It Rain Tomorrow?

Suppose:

P(rain) = 0.30 (climate average — rains 30% of days)
P(overcast at sunset | rain follows) = 0.80
P(overcast at sunset | no rain) = 0.20

You observe overcast skies at sunset. What’s the probability it will rain tomorrow?

Apply Bayes:

P(rain | overcast) = P(overcast | rain) × P(rain) / [P(overcast | rain) × P(rain) + P(overcast | no rain) × P(no rain)]

= (0.80 × 0.30) / (0.80 × 0.30 + 0.20 × 0.70)

= 0.24 / (0.24 + 0.14)

= 0.24 / 0.38

= 0.632

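The observation lifted the probability from 30% (prior) to ~63% (posterior). The key step is weighting the likelihood of the observation under each hypothesis by that hypothesis's prior: overcast skies are four times as likely before rain (0.80 vs 0.20), but rain is the less common outcome, so the posterior lands well short of 80%.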
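The arithmetic above can be checked with exact fractions. This is a minimal sketch; the function name `posterior` is illustrative and not part of any library.

```python
from fractions import Fraction


def posterior(prior, like_h, like_not_h):
    """Return P(H | E) from P(H), P(E | H), and P(E | not H)."""
    # P(E) by the law of total probability.
    evidence = like_h * prior + like_not_h * (1 - prior)
    return like_h * prior / evidence


# Numbers from the rain example: prior 0.30, likelihoods 0.80 and 0.20.
p_rain = posterior(Fraction(3, 10), Fraction(8, 10), Fraction(2, 10))
print(p_rain)                  # 12/19
print(f"{float(p_rain):.3f}")  # 0.632
```

Working in fractions shows that the posterior is exactly 12/19; the decimal 0.632 is a rounding of it.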

Variant: Sequential Updates

Bayesian updates compound. Each new piece of evidence updates the posterior, which becomes the prior for the next observation.

def bayes_update(prior: float, likelihood_h: float, likelihood_not_h: float) -> float:
    """Return P(H | E) given P(H), P(E | H), and P(E | not H)."""
    p_h = prior
    p_not_h = 1 - prior
    # Total probability of the evidence across both hypotheses.
    p_e = likelihood_h * p_h + likelihood_not_h * p_not_h
    if p_e == 0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return likelihood_h * p_h / p_e


# Sequential weather updates: each posterior becomes the next prior.
prior = 0.30  # initial: 30% chance of rain
prior = bayes_update(prior, 0.80, 0.20)  # observe overcast
print(f"{prior:.3f}")  # 0.632

prior = bayes_update(prior, 0.70, 0.30)  # observe falling barometer
print(f"{prior:.3f}")  # 0.800

prior = bayes_update(prior, 0.60, 0.40)  # observe morning fog
print(f"{prior:.3f}")  # 0.857

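Each observation refines the estimate, provided the observations are conditionally independent given the outcome (rain or no rain). If two signals largely repeat each other, such as overcast skies and high humidity, chaining updates this way double-counts the evidence. Start with the climate prior; update with successive evidence.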
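An equivalent way to see sequential updating is the odds form of Bayes' rule: posterior odds equal prior odds times the product of the likelihood ratios. The sketch below assumes the observations are conditionally independent given the outcome; `posterior_from_odds` is an illustrative name.

```python
from math import prod


def posterior_from_odds(prior, likelihood_ratios):
    """Combine a prior with likelihood ratios P(E | H) / P(E | not H)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    # Convert odds back to a probability.
    return posterior_odds / (1 + posterior_odds)


# Overcast, falling barometer, morning fog (same numbers as the example above).
ratios = [0.80 / 0.20, 0.70 / 0.30, 0.60 / 0.40]
p = posterior_from_odds(0.30, ratios)
print(f"{p:.3f}")  # 0.857

# Multiplication commutes, so the order of the evidence does not matter.
p_reversed = posterior_from_odds(0.30, list(reversed(ratios)))
```

The prior odds are 3:7 and the three ratios multiply to 14, giving posterior odds of 6:1, which is a probability of 6/7.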

Variant: Detecting a Biased Coin

You’re given a coin from a bag. The bag contains either fair coins (50/50) or biased coins (heads with probability 0.75). The bag is half-and-half — you don’t know which type your coin is.

You flip the coin 10 times and observe 8 heads. What’s the probability the coin is biased?

Likelihoods:

P(8 heads | fair) = C(10, 8) × 0.5^10 ≈ 0.044
P(8 heads | biased) = C(10, 8) × 0.75^8 × 0.25^2 ≈ 0.282

Apply Bayes:

P(biased | 8 heads) = (0.282 × 0.5) / (0.282 × 0.5 + 0.044 × 0.5) = 0.282 / 0.326 ≈ 0.865

The 8-of-10 heads observation makes “biased” much more likely than the 50/50 prior would suggest.
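The same calculation as a short sketch, using the binomial likelihood for each coin type. The function names and default arguments are illustrative; the numbers come from the puzzle statement.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips with heads probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_biased(heads, flips, p_bias=0.75, prior_biased=0.5):
    """Posterior probability that the coin is the biased type."""
    like_biased = binom_pmf(heads, flips, p_bias)
    like_fair = binom_pmf(heads, flips, 0.5)
    numerator = like_biased * prior_biased
    return numerator / (numerator + like_fair * (1 - prior_biased))


print(f"{p_biased(8, 10):.3f}")  # 0.865
```

The binomial coefficient C(10, 8) appears in both likelihoods and cancels, so only the sequence probabilities matter: the posterior simplifies to 6561 / 7585.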

Variant: Spam Filtering (Naive Bayes)

The email-spam classifier — and its descendant, naive Bayes text classification — uses the same framework:

P(spam | words) ∝ P(words | spam) × P(spam)

Naive Bayes assumes word occurrences are independent given class, so:

P(words | spam) = ∏ P(word_i | spam)

Estimate P(word_i | spam) from labeled training data (count word frequencies in spam vs ham). For a new email, multiply through and compare. The “naive” part is the independence assumption (words don’t actually occur independently); the classifier works well in practice anyway.
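A minimal sketch of the count-and-compare procedure follows. The four training emails are hypothetical toy data. Add-one (Laplace) smoothing and skipping unseen words are assumptions made here so that a zero count does not wipe out a class; scores are summed in log space rather than multiplied.

```python
import math
from collections import Counter

# Hypothetical toy training set; a real filter trains on many labeled emails.
TRAIN = [
    ("spam", "win cash prize now"),
    ("spam", "cheap prize win"),
    ("ham", "meeting agenda for monday"),
    ("ham", "lunch on monday"),
]


def train(examples):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(label for label, _ in examples)
    word_counts = {label: Counter() for label in class_counts}
    for label, text in examples:
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def log_score(text, label, model):
    """log P(label) + sum of log P(word | label), with add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_words = sum(word_counts[label].values())
    score = math.log(class_counts[label] / sum(class_counts.values()))
    for word in text.split():
        if word in vocab:  # words never seen in training are skipped
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
    return score


def classify(text, model):
    """Return the class with the highest log-score."""
    return max(model[0], key=lambda label: log_score(text, label, model))


model = train(TRAIN)
print(classify("win a prize", model))     # spam
print(classify("monday meeting", model))  # ham
```

Only the ranking of the scores matters for classification, which is why the normalizing constant P(words) can be dropped.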

Common Variations

Disease testing

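“A test has 99% sensitivity and 99% specificity. The disease has 1% prevalence. What’s the probability someone with a positive test actually has the disease?” Apply Bayes: (0.99 × 0.01) / (0.99 × 0.01 + 0.01 × 0.99) = 0.5. The answer is 50%, not 99%: true positives from the rare sick group are matched one-for-one by false positives from the much larger healthy group. It is a counterintuitive result that illustrates the importance of base rates.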
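A short sketch of the calculation, which also shows how quickly the answer falls as the disease gets rarer. The function name is illustrative, and the 0.1% prevalence is a hypothetical second case, not part of the original question.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' rule."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


ppv = positive_predictive_value(0.99, 0.99, 0.01)
print(f"{ppv:.2f}")  # 0.50

# Hypothetical rarer disease: same test, 0.1% prevalence.
ppv_rare = positive_predictive_value(0.99, 0.99, 0.001)
print(f"{ppv_rare:.2f}")  # 0.09
```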

The Monty Hall problem

Famous puzzle. Bayesian update on the host’s choice of door. See our Monty Hall variations guide.
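The update can be written out exactly. This sketch assumes the standard rules: you pick door 0, the host knows where the car is, never reveals it, and chooses uniformly at random when two goat doors are available. Door numbering and names are illustrative.

```python
from fractions import Fraction

# Prior: the car is equally likely to be behind each of doors 0, 1, 2.
PRIOR = Fraction(1, 3)

# P(host opens door 2 | car behind door d), given that you picked door 0.
LIKELIHOOD = {
    0: Fraction(1, 2),  # host chooses at random between doors 1 and 2
    1: Fraction(1),     # host is forced to open door 2
    2: Fraction(0),     # host never reveals the car
}

# P(host opens door 2), by the law of total probability.
evidence = sum(PRIOR * like for like in LIKELIHOOD.values())

p_stay_wins = PRIOR * LIKELIHOOD[0] / evidence
p_switch_wins = PRIOR * LIKELIHOOD[1] / evidence
print(p_stay_wins, p_switch_wins)  # 1/3 2/3
```

The host's forced choice is twice as likely as his free choice, which is exactly the 2:1 likelihood ratio in favor of switching.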

A/B test analysis

Bayesian posteriors over conversion-rate differences. Used in production-grade A/B test platforms (Lyft, Uber, Etsy publish on this).
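One common textbook formulation, sketched here under stated assumptions, puts a uniform Beta(1, 1) prior on each variant's conversion rate and estimates P(rate B > rate A) by sampling from the two Beta posteriors. The conversion counts are hypothetical, and this is not a description of how any particular company's platform works.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for a rate is Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical experiment: 100/1000 conversions for A, 130/1000 for B.
p_better = prob_b_beats_a(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"P(B beats A) is roughly {p_better:.2f}")
```

The output is a direct probability statement about the variants, which is the main practical difference from a p-value.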

Particle filtering / state estimation

Bayesian filter applied to time-series. Used in robotics, navigation, sensor fusion. Beyond standard interview scope but worth knowing exists.

Common Mistakes

Inverting conditional direction. P(A | B) ≠ P(B | A). The disease-test paradox shows this dramatically. Always state both directions explicitly.
Forgetting the prior. The prior dominates when evidence is weak. The disease-test trap is exactly this — high test accuracy doesn’t help much against a strong prior of “no disease.”
Multiplying probabilities of dependent events. Naive Bayes assumes independence; in reality, this assumption can fail badly. Recognize when the assumption is unreasonable.
Numerical underflow. Multiplying many small probabilities (like in naive Bayes) underflows quickly. Work with log probabilities and add instead.
Not asking about the prior. “What’s the chance it rains given…” has no answer without the prior. Strong candidates always elicit or assume the prior explicitly.
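The underflow problem is easy to reproduce. The sketch below uses 100 hypothetical per-word likelihoods of 1e-5 each: the direct product underflows to zero, while the sum of logs stays finite. The `normalize_log_scores` helper is illustrative; it subtracts the maximum before exponentiating (the log-sum-exp trick) to turn log-scores back into probabilities.

```python
import math

probs = [1e-5] * 100  # hypothetical per-word likelihoods

naive_product = math.prod(probs)           # true value 1e-500 is below float range
log_sum = sum(math.log(p) for p in probs)  # about -1151.3, perfectly representable
print(naive_product, log_sum)


def normalize_log_scores(log_scores):
    """Convert unnormalized log-scores into probabilities that sum to 1."""
    top = max(log_scores)
    weights = [math.exp(score - top) for score in log_scores]
    total = sum(weights)
    return [weight / total for weight in weights]


# Two classes whose scores differ by a factor of 3 in probability space.
posteriors = normalize_log_scores([log_sum, log_sum + math.log(3)])
print([round(p, 2) for p in posteriors])  # [0.25, 0.75]
```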

Frequently Asked Questions

What’s the expected interview answer?

Walk through Bayes’ theorem with concrete numbers. Identify prior, likelihood, marginal, and posterior. Compute. Strong candidates verbalize each step (“the prior is 0.3; the likelihood under hypothesis A is 0.8; under not-A is 0.2; so the posterior is…”) rather than just plugging into the formula.

What’s the most common Bayes mistake on interviews?

Inverting the conditional direction. Confusing P(positive test | disease) with P(disease | positive test). The two can differ by orders of magnitude when the prior is far from 0.5. Always write both directions out before plugging in.

How does Bayes generalize to time-series?

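Sequential updates: each observation updates the posterior, which becomes the prior for the next observation. When the underlying state also changes over time, a prediction step moves the belief forward before each update. Kalman filters (linear dynamics with Gaussian noise) and particle filters (nonlinear or non-Gaussian models) are formal versions of this for continuous state spaces. The same theorem applies; the bookkeeping gets more involved.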
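For a two-state weather model, the predict-then-update loop fits in a few lines. This is a sketch of a discrete Bayes filter, not a Kalman or particle filter; the day-to-day transition probabilities are hypothetical, and the weather is assumed to follow a simple two-state Markov chain.

```python
# Hypothetical day-to-day transition probabilities.
P_RAIN_AFTER_RAIN = 0.6
P_RAIN_AFTER_DRY = 0.2


def predict(p_rain):
    """Push yesterday's belief through the transition model."""
    return P_RAIN_AFTER_RAIN * p_rain + P_RAIN_AFTER_DRY * (1 - p_rain)


def update(p_rain, like_rain, like_dry):
    """Bayes update with P(observation | rain) and P(observation | dry)."""
    numerator = like_rain * p_rain
    return numerator / (numerator + like_dry * (1 - p_rain))


def filter_rain(p_initial, observations):
    """Return the belief in rain after each (like_rain, like_dry) observation."""
    belief = p_initial
    history = []
    for like_rain, like_dry in observations:
        belief = update(predict(belief), like_rain, like_dry)
        history.append(belief)
    return history


# Two days of overcast sunsets, then a clear one (likelihoods swapped).
beliefs = filter_rain(0.30, [(0.80, 0.20), (0.80, 0.20), (0.20, 0.80)])
print([round(b, 3) for b in beliefs])
```

A particle filter follows the same loop but represents the belief with weighted samples instead of a single number.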

When does naive Bayes fail in practice?

When the independence assumption is badly violated. Naive Bayes holds up for text classification even though words are not truly independent, largely because the classifier only needs the correct class to score highest; its probability estimates can be poorly calibrated while the ranking is still right. For applications where features are highly correlated (e.g., physical sensor readings), correlated evidence gets double-counted and naive Bayes tends to underperform gradient-boosted trees and other methods that model dependencies. Modern recommendation systems and ML-driven fraud detection rarely use pure naive Bayes; they use models that handle dependencies explicitly.

Why do quant trading firms ask Bayesian problems?

Trading is fundamentally Bayesian — every order, news event, or counterparty action is evidence that updates your estimate of “fair value.” Strong candidates think Bayesian by reflex. The interview tests whether you can apply the framework correctly under pressure with messy real-world inputs.