What an Access Log Service Provides
Every HTTP request to a web service generates a log record. At scale, this is millions of records per minute. The access log service must ingest this stream durably at high throughput, make logs searchable for recent debugging, archive efficiently for long-term compliance, and support real-time anomaly detection without becoming the bottleneck in the request path.
Log Schema
{
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"timestamp": "2024-01-15T10:23:45.123Z",
"user_id": "user:12345",
"ip_address": "203.0.113.42",
"method": "GET",
"path": "/api/v1/products",
"query_params": "category=electronics&page=2",
"status_code": 200,
"latency_ms": 47,
"user_agent": "Mozilla/5.0 ...",
"referer": "https://example.com/",
"bytes_sent": 8192,
"trace_id": "abc123def456",
"service_name": "product-api"
}
All fields use consistent names across all services. Structured JSON format makes every field machine-parseable without regex extraction downstream.
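As a minimal sketch of the producer side, the snippet below builds a record matching the schema above and writes it as one JSON line to stdout for the agent to tail. The function name and the idea of passing extra fields via keyword arguments are illustrative, not part of the original design.

```python
import json
import sys
import uuid
from datetime import datetime, timezone

def emit_access_log(method, path, status_code, latency_ms, service_name, **extra):
    """Build one structured access-log record and write it to stdout.

    Field names mirror the schema above; a real service would also
    populate user_id, trace_id, etc. from request context.
    """
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc)
                     .isoformat(timespec="milliseconds")
                     .replace("+00:00", "Z"),
        "method": method,
        "path": path,
        "status_code": status_code,
        "latency_ms": latency_ms,
        "service_name": service_name,
        **extra,
    }
    sys.stdout.write(json.dumps(record, separators=(",", ":")) + "\n")
    return record
```

Writing a single compact JSON object per line keeps the agent's parsing trivial: one `json.loads` per line, no regex extraction.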
Write Pipeline
The path from application to durable storage:
- Application: writes structured JSON log to stdout or a local Unix socket. No network call in the request path.
- Local log agent (Vector, Fluentd, Filebeat): tails stdout, buffers in a local ring buffer (survives brief agent restarts), batch-sends to Kafka every 1 second or 1,000 events — whichever comes first.
- Kafka topic access-logs: durable, partitioned by service_name for ordered processing per service. Retention: 7 days.
- Kafka consumers: fan out to parallel sinks, with Elasticsearch for hot search, S3 for archival, and Flink for real-time processing.
Batching Benefits
Writing 1,000 log events as a single compressed batch to Kafka costs roughly the same network round-trip as writing one event. Batching reduces:
- Write amplification on Kafka brokers
- Number of S3 PUT requests (S3 charges per request)
- CPU overhead from TLS handshakes per small write
The local agent absorbs burst traffic in its ring buffer, smoothing the write rate to Kafka.
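The flush policy above ("every 1 second or 1,000 events, whichever comes first") can be sketched as a small buffer class. The `sink` callable stands in for a Kafka producer's batch send; a real agent would also flush on a background timer rather than only when a new event arrives.

```python
import time

class LogBatcher:
    """Toy model of the agent's batch-flush policy: send when the buffer
    reaches max_events or max_age_s has elapsed, whichever comes first."""

    def __init__(self, sink, max_events=1000, max_age_s=1.0):
        self.sink = sink
        self.max_events = max_events
        self.max_age_s = max_age_s
        self.buffer = []
        self.oldest = None

    def add(self, event, now=None):
        now = time.monotonic() if now is None else now
        if not self.buffer:
            self.oldest = now          # timestamp of the oldest buffered event
        self.buffer.append(event)
        if len(self.buffer) >= self.max_events or now - self.oldest >= self.max_age_s:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)     # one compressed batch write to Kafka
            self.buffer = []
```

The dual trigger bounds both the batch size (memory, broker request size) and the added latency (at most max_age_s before an event reaches Kafka).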
S3 Archival for Long-Term Storage
Kafka consumers write to S3 in Parquet format, partitioned by time and service:
s3://logs/{service_name}/year=2024/month=01/day=15/hour=10/part-0001.parquet.gz
Parquet's columnar format enables Athena to scan only the columns needed for a query (e.g., only status_code and latency_ms), dramatically reducing scan costs compared to JSON. GZIP compression reduces storage and transfer costs by 5-10x over raw JSON.
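A small helper, assuming the key layout shown above, illustrates how a consumer maps a log timestamp to its Hive-style partition prefix; the function name and `part` numbering scheme are assumptions for illustration.

```python
from datetime import datetime

def s3_partition_key(service_name, ts, part=1):
    """Map a log timestamp to the Hive-style S3 key layout shown above."""
    return (
        f"logs/{service_name}/"
        f"year={ts.year:04d}/month={ts.month:02d}/"
        f"day={ts.day:02d}/hour={ts.hour:02d}/"
        f"part-{part:04d}.parquet.gz"
    )
```

Zero-padding the month, day, and hour keeps keys lexicographically sortable, which matters for prefix listing and for Athena's partition pruning.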
Athena Query Interface
AWS Athena (or any query engine over S3) is configured with an external table:
CREATE EXTERNAL TABLE access_logs (
  request_id STRING,
  `timestamp` TIMESTAMP,
  status_code INT,
  latency_ms INT,
  ...
)
PARTITIONED BY (service_name STRING, year INT, month INT, day INT)
STORED AS PARQUET
LOCATION 's3://logs/'
Queries targeting a specific service and date range scan only the relevant partitions, typically returning in seconds even over billions of records.
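To make partition pruning concrete, the sketch below composes a query whose WHERE clause filters only on partition columns plus one projected column, so Athena skips every other prefix. The query builder and the specific p99 analysis are illustrative; `approx_percentile` is a standard Athena/Presto aggregate.

```python
def pruned_latency_query(service_name, year, month, day):
    """Compose an Athena query that filters on the partition columns
    (service_name, year, month, day), so only one day's prefixes
    for one service are scanned. Names follow the DDL above."""
    return (
        "SELECT path, approx_percentile(latency_ms, 0.99) AS p99 "
        "FROM access_logs "
        f"WHERE service_name = '{service_name}' "
        f"AND year = {year} AND month = {month} AND day = {day} "
        "GROUP BY path"
    )
```

Because Parquet is columnar, this query reads only the path and latency_ms columns from the matching partitions, not the full records.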
Real-Time Stream Processing
A Flink job consumes from the access-logs Kafka topic in real time, computing:
- Error rate per service per minute (alert if > 1% for 3 consecutive minutes)
- Latency p99 per endpoint per minute
- Unusual IP address behavior (single IP > 1,000 requests per minute → scraper alert)
- Sudden traffic spike per service (3x normal rate → capacity alert)
Flink emits alerts to PagerDuty and publishes real-time metrics to Prometheus via a metrics sink.
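The per-minute error-rate computation can be sketched without Flink as a tumbling-window aggregation over (timestamp, service, status) tuples; the event shape and the 5xx-only error definition are assumptions for this sketch.

```python
from collections import defaultdict

def error_rates_per_minute(events):
    """Bucket events into 1-minute tumbling windows per service and
    compute the 5xx error rate per window.
    Each event is (epoch_seconds, service_name, status_code)."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for ts, service, status in events:
        window = (service, ts // 60)   # integer minute = tumbling window key
        totals[window] += 1
        if status >= 500:
            errors[window] += 1
    return {w: errors[w] / totals[w] for w in totals}
```

In Flink the same logic would run incrementally with keyed state and event-time windows, emitting each window's rate when its watermark passes; the alert rule (> 1% for 3 consecutive minutes) then fires on the emitted series.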
Log Sampling
For very high-traffic services (millions of requests per minute), 100% logging is expensive. Selective sampling:
- Success responses (2xx): sample 10%
- Error responses (4xx, 5xx): 100%
- Slow responses (latency > p99 threshold): 100%
The sampling decision is made at the log agent with a consistent hash of request_id, so the same request always yields the same decision. Sampled metrics are scaled up by the sampling factor in dashboards.
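A minimal sketch of the agent-side decision, assuming SHA-256 as the consistent hash (the hash choice and the 10% constant are assumptions; the policy tiers are from the rules above):

```python
import hashlib

SAMPLE_RATE_2XX = 0.10  # success-response sampling rate from the policy above

def should_log(request_id, status_code, latency_ms, p99_ms):
    """Reproducible sampling decision: errors and slow requests are
    always logged; successes are kept for a deterministic 10% of
    request_ids, so the same request always gets the same answer."""
    if status_code >= 400 or latency_ms > p99_ms:
        return True
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < SAMPLE_RATE_2XX
```

Hashing the request_id rather than drawing a random number means every agent replica makes the same decision for the same request, so retries and multi-hop traces stay consistent.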
Log Retention Policy
- Hot (Elasticsearch): 7 days. Full-text search, sub-second queries for recent debugging. Expensive per GB.
- Warm (S3 Parquet): 90 days. SQL queries via Athena. Low cost, seconds to minutes for complex queries.
- Cold (S3 Glacier): 7 years. Required for PCI-DSS, SOC2, HIPAA compliance. Retrieval takes hours, cost is minimal.
S3 lifecycle policies automate the transition: objects move from S3 Standard → S3 Infrequent Access → S3 Glacier automatically based on age.
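An illustrative lifecycle configuration for this tiering might look like the following; the 30/90-day thresholds and the 2,555-day (roughly 7-year) expiration are assumed values, not prescribed ones.

```json
{
  "Rules": [
    {
      "ID": "access-log-tiering",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 2555 }
    }
  ]
}
```

Expiration after the compliance window also matters: keeping logs past the retention requirement is a liability, not a safety margin.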
PII in Logs and Log-Based Metrics
Before writing to any sink, the log agent strips or hashes PII from query_params and path: remove password fields, hash user-identifying values, drop request bodies. Refer to the PII scrubber design for the detection pipeline.
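A simplified scrubber for query_params might look like the sketch below; the denylist and hash-list contents are assumptions, and a production version would use the detection pipeline referenced above rather than a static key list.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode

DROP_PARAMS = {"password", "token"}   # assumed: secrets dropped outright
HASH_PARAMS = {"email", "user_id"}    # assumed: identifiers hashed, not dropped

def scrub_query_params(query_params):
    """Drop secret parameters, replace user-identifying values with a
    short stable hash, and pass everything else through unchanged."""
    cleaned = []
    for key, value in parse_qsl(query_params, keep_blank_values=True):
        if key in DROP_PARAMS:
            continue
        if key in HASH_PARAMS:
            value = hashlib.sha256(value.encode()).hexdigest()[:12]
        cleaned.append((key, value))
    return urlencode(cleaned)
```

Hashing rather than dropping identifiers preserves the ability to group or join on a user without storing who the user is.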
The Kafka consumer also extracts RED metrics (Rate, Errors, Duration) from the log stream and pushes them to Prometheus, providing per-endpoint metrics without requiring services to instrument individually.