Five webloggers – Joshua Allen, Meg Hourihan, Jason Kottke, Robert Scoble, and Joel Spolsky – were competing for karma points on the major search engines: Google, Yahoo, AltaVista, Lycos, and MSN. Karma was distributed on a five-point scale: the most popular weblog received 5 points, and the least popular received 1 point. For each search engine, no two webloggers received the same number of points. Overall scores were determined by adding up the individual scores from each search engine.
Allen got the highest number of karma points – 24. Kottke was consistent in his scores: he got the same number of karma points from 4 different search engines. Spolsky got 5 points from Lycos, and 3 from MSN.
No two webloggers got the same total score, and the final rankings were as follows: Allen, Hourihan, Kottke, Scoble, and Spolsky. How many karma points did Hourihan get from Lycos?
Solution
Let’s start with what we know:
     G  Y  A  L  M  Total
   ======================
A |                    24
H |
K |
Sc|
Sp|           5  3
The only possible combination of scores for Allen to reach 24 is { 5 5 5 5 4 }, and since Spolsky got the 5 from Lycos, Lycos must be where Allen's 4 comes from.
We also know that the total number of points given out was 75 (5 engines × (5 + 4 + 3 + 2 + 1)).
Spolsky had to have at least 11 points: 5 from Lycos, 3 from MSN, and at least 1 from each of the other three engines. Could Spolsky have had more than 11, say 12? Then Scoble would have needed at least 13 (since there were no ties), Kottke at least 14, and Hourihan at least 15. That would yield an overall total of at least 24 + 12 + 13 + 14 + 15 = 78. Too much! So Spolsky had exactly 11 points.
     G  Y  A  L  M  Total
   ======================
A |  5  5  5  4  5     24
H |
K |
Sc|
Sp|  1  1  1  5  3     11
Using the same logic, Scoble could not have had more than 12 points: if he had 13, Kottke at least 14, and Hourihan at least 15, the total would be at least 24 + 11 + 13 + 14 + 15 = 77. Still too much. So Scoble had 12, and continuing on, Kottke had to have 13 and Hourihan 15; any other assignment would push the total over 75.
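The counting argument above can be checked mechanically. A small brute force over candidate totals (a sketch; the variable names are mine) confirms that (Spolsky, Scoble, Kottke, Hourihan) = (11, 12, 13, 15) is the only assignment consistent with a 75-point pool, Allen's 24, and strictly ordered totals:

```python
# Totals must be distinct integers, strictly ordered below Allen's 24,
# with Spolsky >= 11, and all five totals summing to 75.
valid = [
    (sp, sc, k, h)
    for sp in range(11, 24)
    for sc in range(sp + 1, 24)
    for k in range(sc + 1, 24)
    for h in range(k + 1, 24)
    if sp + sc + k + h + 24 == 75
]
print(valid)  # only one candidate survives
```
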
Now we know Kottke had 13 points, four of them identical. Four 4's wouldn't work (16 is already too many). Four 2's wouldn't work either: 8 points plus a fifth score of at most 4 (all the 5's are spoken for) is only 12. Four 1's top out at 4 + 4 = 8. So he had to have received four 3's plus a 1. And since Spolsky already holds the 3 from MSN, MSN is where Kottke received his 1.
     G  Y  A  L  M  Total
   ======================
A |  5  5  5  4  5     24
H |                    15
K |  3  3  3  3  1     13
Sc|                    12
Sp|  1  1  1  5  3     11
Let's look at Scoble. We can see from the chart that all the 5's and 3's have already been given out, and only one 1 remains. So Scoble's scores can only contain 4's, 2's, and at most a single 1. Given that information, the only possible combination of 5 scores that would yield 12 is { 4 2 2 2 2 }. Since Allen already has the 4 from Lycos, Scoble must have a 2 there.
     G  Y  A  L  M  Total
   ======================
A |  5  5  5  4  5     24
H |                    15
K |  3  3  3  3  1     13
Sc|           2        12
Sp|  1  1  1  5  3     11
Hence Hourihan must have gotten the 1 from Lycos!
     G  Y  A  L  M  Total
   ======================
A |  5  5  5  4  5     24
H |           1        15
K |  3  3  3  3  1     13
Sc|           2        12
Sp|  1  1  1  5  3     11
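For the skeptical, the finished deduction can be verified in code. Scoble's lone 4 (and Hourihan's matching 2) can sit under any of the four non-Lycos engines, so four grids are consistent with the reasoning above. A quick check (a sketch, with index names of my own choosing) confirms every one of them gives Hourihan a 1 from Lycos:

```python
A, H, K, SC, SP = range(5)   # rows: Allen, Hourihan, Kottke, Scoble, Spolsky
G, Y, AV, L, M = range(5)    # columns: Google, Yahoo, AltaVista, Lycos, MSN

def is_valid(grid):
    # Every engine hands out exactly the scores 1..5, no ties per engine.
    if any(sorted(col) != [1, 2, 3, 4, 5] for col in zip(*grid)):
        return False
    totals = [sum(row) for row in grid]
    return (totals[A] == 24                                  # Allen scored 24
            and grid[SP][L] == 5 and grid[SP][M] == 3        # Spolsky's known scores
            and max(grid[K].count(v) for v in grid[K]) == 4  # Kottke: four equal scores
            and totals == sorted(totals, reverse=True)       # final ranking order
            and len(set(totals)) == 5)                       # no tied totals

solutions = []
for col in (G, Y, AV, M):    # where Scoble's single 4 might sit
    scoble = [2] * 5
    scoble[col] = 4
    hourihan = [4] * 5
    hourihan[L], hourihan[col] = 1, 2
    grid = [[5, 5, 5, 4, 5], hourihan, [3, 3, 3, 3, 1], scoble, [1, 1, 1, 5, 3]]
    if is_valid(grid):
        solutions.append(grid)

print(len(solutions), "valid grids; Hourihan's Lycos score is 1 in each")
```
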
2026 Update: Log Processing at Scale — Stream Analytics and Bloom Filters
Processing web server logs at scale is a common data engineering interview topic. The naive approach (keeping every IP in memory) fails at hundreds of millions of log lines; you need probabilistic data structures and streaming algorithms.
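For contrast, here is what the naive exact approach looks like (the sample log lines are illustrative, not real data). It is correct, but memory grows linearly with the number of distinct IPs, which is exactly what the sketches below avoid:

```python
# Naive exact baseline: keep every distinct IP in a set. At 100M log
# entries with tens of millions of distinct IPs, the set alone can
# consume gigabytes of RAM due to Python string/object overhead.
def count_unique_exact(log_lines):
    seen = set()
    for line in log_lines:
        seen.add(line.split()[0])  # first whitespace-separated field is the client IP
    return len(seen)

sample = [
    "203.0.113.5 - - [10/Oct/2026] GET /",
    "203.0.113.7 - - [10/Oct/2026] GET /about",
    "203.0.113.5 - - [10/Oct/2026] GET /feed",
]
print(count_unique_exact(sample))  # 2 distinct IPs
```
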
import hashlib
import math
from collections import defaultdict
# Problem: Count unique visitors from 100M log entries
# Constraint: Can't store all IPs (too much memory)
# Approach 1: HyperLogLog — count distinct elements with O(log log n) memory
class HyperLogLog:
    """Approximates count of distinct elements with ~2% error."""

    def __init__(self, b=10):
        self.b = b
        self.m = 2 ** b  # Number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Low b bits of the hash pick a register; the next 32 bits are the
        # value whose leading-zero run length the register tracks.
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16)
        register_idx = h & (self.m - 1)
        w = (h >> self.b) & 0xFFFFFFFF
        self.registers[register_idx] = max(
            self.registers[register_idx],
            self._leading_zeros(w) + 1,
        )

    def _leading_zeros(self, n, bits=32):
        if n == 0:
            return bits
        count = 0
        for i in range(bits - 1, -1, -1):
            if n & (1 << i):
                break
            count += 1
        return count

    def count(self):
        # Raw HLL estimate: alpha * m^2 / sum(2^-register). Production
        # implementations add small- and large-range corrections.
        Z = 1.0 / sum(2 ** -r for r in self.registers)
        return int(self.alpha * self.m * self.m * Z)
# Approach 2: Bloom Filter — check if IP has been seen (with false positives)
class BloomFilter:
    """Space-efficient probabilistic set membership. No false negatives."""

    def __init__(self, n_items, false_positive_rate=0.01):
        # Standard sizing: m bits and k hash functions.
        self.m = int(-n_items * math.log(false_positive_rate) / math.log(2) ** 2)
        self.k = int(self.m / n_items * math.log(2))
        self.bits = bytearray(self.m // 8 + 1)

    def _hashes(self, item):
        # Double hashing: derive k bit positions from two independent digests.
        h1 = int(hashlib.md5(str(item).encode()).hexdigest(), 16)
        h2 = int(hashlib.sha256(str(item).encode()).hexdigest(), 16)
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._hashes(item):
            self.bits[pos // 8] |= (1 << (pos % 8))

    def might_contain(self, item) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._hashes(item))

# Approach 3: Count-Min Sketch, which estimates per-item frequency
# (may overestimate, never underestimates)
class CountMinSketch:
    def __init__(self, width, depth):
        self.width = width
        self.seeds = list(range(depth))
        self.table = [[0] * width for _ in self.seeds]

    def _hash(self, item, seed):
        digest = hashlib.md5(f"{seed}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item):
        for i, seed in enumerate(self.seeds):
            self.table[i][self._hash(item, seed)] += 1

    def query(self, item) -> int:
        return min(self.table[i][self._hash(item, seed)]
                   for i, seed in enumerate(self.seeds))
# Real-world log processing pipeline
def analyze_logs_streaming(log_stream, max_memory_mb=100):
    hll = HyperLogLog(b=12)         # ~4KB of registers for unique IP count
    bf = BloomFilter(10_000_000)    # Bloom filter for new IP detection
    cms = CountMinSketch(10000, 7)  # Frequency estimation
    for line in log_stream:
        ip = line.split()[0]        # Extract IP from log line
        hll.add(ip)
        if not bf.might_contain(ip):
            bf.add(ip)              # First sighting of this IP
        cms.add(ip)                 # Count every request per IP
    return {
        "estimated_unique_ips": hll.count(),
        "top_ip_requests": "use CMS to query specific IPs",
    }
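To make the Bloom filter's memory budget concrete, the standard sizing formulas used in its constructor can be evaluated for the pipeline's parameters (a sketch; the numbers come straight from the formulas, not from measurement):

```python
import math

# Bloom filter sizing for 10M items at a 1% false-positive rate:
# m = -n * ln(p) / ln(2)^2 bits, k = (m / n) * ln(2) hash functions.
# (k is truncated here, matching the class above; rounding up to 7 is
# also a common choice.)
n, p = 10_000_000, 0.01
m = int(-n * math.log(p) / math.log(2) ** 2)
k = int(m / n * math.log(2))
print(f"{m} bits = {m / 8 / 2**20:.1f} MiB, k = {k} hash functions")
# Roughly 11.4 MiB of bits, versus hundreds of MB to store the raw
# IP strings exactly.
```
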
2026 interview context: Log processing questions at FAANG now routinely ask about probabilistic data structures. Redis ships native HyperLogLog commands (PFADD/PFCOUNT). Apache Flink and Kafka Streams applications commonly use Count-Min Sketch for windowed analytics. Bloom filters have powered Chrome’s malware URL checking and back Cassandra’s SSTable lookups.