Find Median in a Stream: The Two-Heaps Insight Problem

Numbers arrive one at a time in a stream. After each number, return the median of all numbers seen so far. The median of an even number of values is the average of the two middle values; for an odd number, it is the middle value itself.

This is LeetCode 295, “Find Median from Data Stream”, and it is the canonical “data structure insight” problem in the modern interview canon. The brute force is simple but slow. The sorted-list approach is faster but still suboptimal. The optimal solution — two heaps, one max-heap for the lower half and one min-heap for the upper half — is the kind of small but powerful insight that signals algorithmic maturity. Once a candidate has internalized the two-heaps pattern on this problem, related problems (sliding window median, kth largest in a stream) become accessible.

The brute force: O(n log n) per query

class MedianFinderBrute:
    def __init__(self):
        self.nums = []
    def add(self, num):
        self.nums.append(num)
    def find_median(self):
        s = sorted(self.nums)
        n = len(s)
        if n % 2:
            return s[n // 2]
        return (s[n // 2 - 1] + s[n // 2]) / 2

O(1) per addition, O(n log n) per query. Fine for occasional queries, terrible if the median is queried after every insertion.

Sorted list: O(n) per insertion

import bisect

class MedianFinderSorted:
    def __init__(self):
        self.nums = []
    def add(self, num):
        bisect.insort(self.nums, num)
    def find_median(self):
        n = len(self.nums)
        if n % 2:
            return self.nums[n // 2]
        return (self.nums[n // 2 - 1] + self.nums[n // 2]) / 2

Insertion is O(n) because of the shift required to insert in the middle of a sorted array. Median lookup is O(1). Total O(n) per insertion plus O(1) per query. Better than brute force in some access patterns; still suboptimal.

Two heaps: O(log n) per insertion, O(1) per query

Maintain two heaps:

  • Lower half (max-heap). Contains the smaller half of the numbers seen so far. The top is the largest of the lower half.
  • Upper half (min-heap). Contains the larger half. The top is the smallest of the upper half.

Maintain the invariant: |lower| ≥ |upper| and |lower| - |upper| ≤ 1. (Equivalently: the lower half has either the same number of elements as the upper, or exactly one more.) Then the median is either the top of the lower heap (if the total count is odd) or the average of the two tops (if even).

import heapq

class MedianFinder:
    def __init__(self):
        self.lower = []  # max-heap (negate values for Python's min-heap)
        self.upper = []  # min-heap

    def add(self, num):
        # Always push to lower first
        heapq.heappush(self.lower, -num)
        # Move the max of lower into upper so every element of lower <= every element of upper
        heapq.heappush(self.upper, -heapq.heappop(self.lower))
        # Rebalance if upper has more than lower
        if len(self.upper) > len(self.lower):
            heapq.heappush(self.lower, -heapq.heappop(self.upper))

    def find_median(self):
        if len(self.lower) > len(self.upper):
            return -self.lower[0]
        return (-self.lower[0] + self.upper[0]) / 2

O(log n) per insertion (at most three heap operations, each O(log n)). O(1) per query (heap top access). Optimal asymptotic complexity.
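A quick sanity check of MedianFinder on a short stream (the class is repeated inline so the snippet runs standalone):

```python
import heapq

class MedianFinder:
    def __init__(self):
        self.lower = []  # max-heap via negation
        self.upper = []  # min-heap

    def add(self, num):
        heapq.heappush(self.lower, -num)
        heapq.heappush(self.upper, -heapq.heappop(self.lower))
        if len(self.upper) > len(self.lower):
            heapq.heappush(self.lower, -heapq.heappop(self.upper))

    def find_median(self):
        if len(self.lower) > len(self.upper):
            return -self.lower[0]
        return (-self.lower[0] + self.upper[0]) / 2

mf = MedianFinder()
medians = []
for x in [5, 2, 8, 1]:
    mf.add(x)
    medians.append(mf.find_median())
assert medians == [5, 3.5, 5, 3.5]  # medians of [5], [2,5], [2,5,8], [1,2,5,8]
```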

The Python max-heap trick

Python’s heapq only provides a min-heap. To use it as a max-heap, negate the values on push and negate again on pop. The boilerplate is annoying but the asymptotic complexity is unchanged.
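A minimal illustration of the negation trick:

```python
import heapq

h = []                       # min-heap of negated values behaves as a max-heap
for x in [3, 1, 4, 1, 5]:
    heapq.heappush(h, -x)

assert -h[0] == 5            # peek the maximum
assert -heapq.heappop(h) == 5
assert -h[0] == 4            # next-largest is now on top
```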

In Java, PriorityQueue takes a custom comparator, so a max-heap is straightforward. In C++, priority_queue defaults to a max-heap; a min-heap needs std::greater as the comparator (or the same negation trick).

The rebalancing logic

The crux of the algorithm is keeping the two heaps balanced. The standard sequence:

  1. Push the new number into the lower heap.
  2. Pop the maximum of the lower heap and push it into the upper heap. (This may move the new number directly into the upper, if it was larger than the previous lower max.)
  3. If the upper heap now has more elements than the lower, pop the minimum of the upper and push it into the lower. (This rebalances.)

After this sequence, the invariants hold: every element in the lower is at most every element in the upper, and the lower has either the same count or one more.

Many candidates produce a buggy version of this on their first attempt because the rebalancing has multiple cases that are easy to confuse. Writing it as a fixed three-step sequence — push, swap, rebalance — is cleaner than trying to handle every case explicitly.
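The fixed three-step sequence can be traced on a tiny stream. Here insert is the same push/swap/rebalance logic written as a free function (the name is illustrative):

```python
import heapq

def insert(lower, upper, num):
    # Step 1: push the new number into lower (negated max-heap)
    heapq.heappush(lower, -num)
    # Step 2: move the lower max across to upper
    heapq.heappush(upper, -heapq.heappop(lower))
    # Step 3: rebalance if upper outgrew lower
    if len(upper) > len(lower):
        heapq.heappush(lower, -heapq.heappop(upper))

lower, upper = [], []
insert(lower, upper, 10)   # lower holds {10}, upper empty
assert (sorted(-x for x in lower), sorted(upper)) == ([10], [])
insert(lower, upper, 20)   # 20 > 10, so step 2 routes it into upper
assert (sorted(-x for x in lower), sorted(upper)) == ([10], [20])
insert(lower, upper, 5)    # upper briefly outgrows lower; step 3 moves 10 back
assert (sorted(-x for x in lower), sorted(upper)) == ([5, 10], [20])
```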

Variations interviewers ask

  • Sliding window median (LC 480). Compute the median of the last k numbers as the window slides. Same two-heaps idea, plus a “lazy deletion” mechanism for handling expired elements.
  • Median of two sorted arrays (LC 4). A different problem entirely (uses binary search), but the median framing is similar.
  • Kth largest in a stream (LC 703). Use a min-heap of size k. Tests recognition of when “single heap” is sufficient.
  • Median of a multiset of integers in a small range. Use a count-array instead of heaps; O(1) per insertion if the value range is small.
  • Approximate median for very large streams. Real-world question. Uses sampling or sketching algorithms (t-digest, q-digest); harder; usually a senior+ follow-up.
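For contrast with the two-heaps pattern, a sketch of the single-heap idea behind LC 703 (constructor signature follows the LeetCode problem): keep a min-heap of the k largest elements, so the heap top is always the kth largest.

```python
import heapq

class KthLargest:
    """Maintain the k largest values seen; the heap top is the kth largest."""
    def __init__(self, k, nums):
        self.k = k
        self.heap = nums[:]
        heapq.heapify(self.heap)
        while len(self.heap) > k:     # shrink to the k largest
            heapq.heappop(self.heap)

    def add(self, num):
        heapq.heappush(self.heap, num)
        if len(self.heap) > self.k:   # evict the smallest of k+1
            heapq.heappop(self.heap)
        return self.heap[0]

kl = KthLargest(3, [4, 5, 8, 2])
assert kl.add(3) == 4   # 3rd largest of {2,3,4,5,8}
assert kl.add(5) == 5   # 3rd largest of {2,3,4,5,5,8}
```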

What interviewers grade

Find median in a stream is a senior-level FAANG question, occasionally appearing at staff+. The signal layers:

  1. Brute force first. The candidate states the O(n log n) brute force quickly.
  2. Articulate the two-halves insight. “I want O(1) median lookup. That means the data structure should keep me near the middle. Two heaps — one for each half — would do that.”
  3. Code the heaps cleanly. Including the max-heap-via-negation if working in Python.
  4. Get the rebalancing right. Most candidates struggle here; it differentiates strong from average.
  5. Discuss the streaming context. What if the stream is unbounded? What if memory is constrained? Does the candidate think about real-world constraints?

Is it asked in 2026?

Yes, regularly at senior+ FAANG, at fintech tier 1 (which uses streaming financial data and finds the question domain-relevant), and at AI labs for ML infrastructure roles. The two-heaps pattern itself is generative — once a candidate has it, several other LeetCode problems unlock — so the question continues to differentiate even among candidates who have heard of the basic two-heaps trick.

Frequently Asked Questions

Why two heaps and not a balanced BST?

A balanced BST works (red-black or AVL, augmented with subtree sizes so the median can be found by rank in O(log n)). Insertion matches the heaps at O(log n), though median lookup is O(log n) rather than O(1). Two heaps are simpler to implement, especially in languages where a balanced BST is not in the standard library. Most interviewers prefer the heap solution for that reason.

What if the stream is very long?

The standard two-heaps approach uses O(n) memory, which is unbounded for an unbounded stream. For very long streams, approximate algorithms (t-digest, q-digest, or quantile sketches) trade accuracy for bounded memory. This is a common follow-up at senior+ levels.

Can I use a sorted list instead?

Yes for correctness, but it is O(n) per insertion versus O(log n) for two heaps. The interviewer expects the heap solution.

What’s the time complexity for finding the median?

O(1) for the two-heaps approach. The median is either the top of one heap or the average of the two tops, both O(1) to access.

Is this still asked in 2026?

Yes, regularly at senior+ FAANG and at fintech and AI infrastructure roles. It remains in the canonical “data structure insight” tier of LeetCode hards.
