Low Level Design: Access Log Service

What Is an Access Log Service?

An access log service captures structured HTTP request/response details from every service in the platform, stores them durably, makes them searchable, enforces retention policies, and supports compliance exports.

AccessLog Record Schema

{
  request_id:           UUID,
  timestamp:            ISO8601,
  method:               TEXT,       -- GET, POST, etc.
  path:                 TEXT,
  query_string:         TEXT,       -- PII-masked (see below)
  status_code:          INT,
  latency_ms:           INT,
  request_size_bytes:   INT,
  response_size_bytes:  INT,
  client_ip:            TEXT,
  user_agent:           TEXT,
  user_id:              UUID nullable,
  api_key:              TEXT nullable,
  upstream_service:     TEXT,
  trace_id:             TEXT
}

Log Emission

Each gateway or service emits a log event to Kafka after the response is sent. Emission is fire-and-forget and non-blocking — it must not add latency to the request path. Use an in-process async queue with a background producer thread/goroutine. If Kafka is unavailable, buffer locally and retry; drop oldest entries if the buffer exceeds a size limit.
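The emission path above can be sketched as follows. This is a minimal illustration, not the production emitter: `send_fn` stands in for a real Kafka producer, and a bounded deque implements the drop-oldest policy.

```python
import collections
import threading

class AsyncLogEmitter:
    """Fire-and-forget log emitter sketch.

    emit() only appends to a bounded in-memory buffer and returns, so it
    never adds latency to the request path. A daemon thread drains the
    buffer and forwards events via send_fn (stand-in for a Kafka producer).
    deque(maxlen=...) silently drops the oldest entry when the buffer is
    full, matching the drop-oldest policy.
    """

    def __init__(self, send_fn, buffer_size=10_000):
        self._buf = collections.deque(maxlen=buffer_size)
        self._send_fn = send_fn
        self._wake = threading.Event()
        threading.Thread(target=self._drain, daemon=True).start()

    def emit(self, event: dict) -> None:
        self._buf.append(event)   # non-blocking; may evict oldest entry
        self._wake.set()

    def _drain(self) -> None:
        while True:
            self._wake.wait()
            self._wake.clear()
            while self._buf:
                event = self._buf.popleft()
                try:
                    self._send_fn(event)
                except Exception:
                    # Put it back; retried the next time an event arrives.
                    # A production emitter would also retry on a timer.
                    self._buf.appendleft(event)
                    break
```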

Kafka to Elasticsearch Pipeline

A Kafka consumer group reads log events and bulk-writes to Elasticsearch:

  • Batch size: 1000 records per bulk request
  • Flush interval: 5 seconds (whichever comes first — size or time)
  • On Elasticsearch error: retry with exponential backoff, dead-letter to a separate Kafka topic after 3 failures
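The size-or-time flush rule can be sketched with a small buffer class. `flush_fn` stands in for the Elasticsearch bulk call; a real consumer loop would also invoke maybe_flush() on a periodic tick so the 5-second limit fires even when no new records arrive.

```python
import time

class BulkBuffer:
    """Accumulates log events and flushes whichever limit is hit first:
    batch size (default 1000 records) or age (default 5 seconds)."""

    def __init__(self, flush_fn, max_size=1000, max_age_s=5.0):
        self.flush_fn = flush_fn
        self.max_size = max_size
        self.max_age_s = max_age_s
        self.items = []
        self.first_at = None  # monotonic time of the oldest buffered item

    def add(self, event) -> None:
        if not self.items:
            self.first_at = time.monotonic()
        self.items.append(event)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        if not self.items:
            return
        too_big = len(self.items) >= self.max_size
        too_old = time.monotonic() - self.first_at >= self.max_age_s
        if too_big or too_old:
            batch, self.items, self.first_at = self.items, [], None
            self.flush_fn(batch)
```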

Elasticsearch Index Design

Index pattern: access-logs-YYYY-MM-DD  (daily rolling)

ILM policy:
  hot   → 7 days   (primary shards, full search)
  warm  → 30 days  (read-only, compressed)
  delete → 90 days

Use keyword type for status_code, method, upstream_service, user_id, api_key. Use date type for timestamp. Use integer for latency_ms. Avoid indexing query_string as analyzed text — store only.
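The field types above can be written down as an index mapping. A minimal sketch of the template body follows; the field set comes from the schema, but the exact mapping options (e.g. using "index": false to store query_string without making it searchable) are an assumption.

```python
# Hypothetical mapping body for the access-logs-* index template.
# keyword fields give cheap exact-match filtering with no text analysis;
# query_string is kept retrievable but not searchable.
ACCESS_LOG_MAPPING = {
    "properties": {
        "timestamp":        {"type": "date"},
        "status_code":      {"type": "keyword"},
        "method":           {"type": "keyword"},
        "upstream_service": {"type": "keyword"},
        "user_id":          {"type": "keyword"},
        "api_key":          {"type": "keyword"},
        "latency_ms":       {"type": "integer"},
        "query_string":     {"type": "keyword", "index": False},
    }
}
```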

Search API

GET /logs?path=/api/v1/users&status=500&user_id=abc&from=2026-04-17T00:00:00Z&to=2026-04-17T23:59:59Z&page=1&page_size=50

Translates to an Elasticsearch bool query with range on timestamp and term filters on requested fields. Returns paginated results with total, page, results[]. Cap page_size at 500 to prevent expensive deep pagination.
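The translation can be sketched as a small query builder. Parameter names mirror the example URL above; the exact field list is illustrative, and `frm` stands in for the reserved word `from`.

```python
def build_logs_query(path=None, status=None, user_id=None,
                     frm=None, to=None, page=1, page_size=50):
    """Translate /logs query parameters into an Elasticsearch bool query:
    a range filter on timestamp plus term filters for each supplied field.
    page_size is capped at 500 to prevent expensive deep pagination."""
    page_size = min(page_size, 500)
    filters = [{"range": {"timestamp": {"gte": frm, "lte": to}}}]
    for field, value in (("path", path),
                         ("status_code", status),
                         ("user_id", user_id)):
        if value is not None:
            filters.append({"term": {field: value}})
    return {
        "query": {"bool": {"filter": filters}},
        "from": (page - 1) * page_size,
        "size": page_size,
    }
```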

IP Geolocation Enrichment

At ingest time (in the Kafka consumer before writing to Elasticsearch), look up client_ip using the MaxMind GeoIP2 database (loaded into memory). Append geo_country and geo_city fields to the log record. The GeoIP2 database is updated weekly via a background job.
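A sketch of the enrichment step, with an in-memory dict standing in for the MaxMind GeoIP2 reader (in production this would be a reader.city(client_ip) lookup; the sample IP and mapping below are hypothetical):

```python
# Stand-in for an in-memory GeoIP2 database: client_ip -> (country, city).
GEO_DB = {
    "203.0.113.7": ("AU", "Sydney"),  # hypothetical sample entry
}

def enrich_with_geo(record: dict) -> dict:
    """Append geo_country and geo_city to a log record at ingest time.
    Unknown or missing IPs produce null fields rather than an error."""
    country, city = GEO_DB.get(record.get("client_ip"), (None, None))
    record["geo_country"] = country
    record["geo_city"] = city
    return record
```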

PII Masking

Before writing to Elasticsearch, parse the query_string and mask values for sensitive parameter names: password, token, secret, api_key, auth. Replace the value with ***.

Input:  ?user=alice&password=hunter2&token=abc123
Output: ?user=alice&password=***&token=***
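The masking step above can be sketched as a small pure function. Splitting on & and = directly (rather than decoding and re-encoding) preserves the original encoding of non-sensitive values:

```python
# Parameter names whose values are masked before indexing.
SENSITIVE = {"password", "token", "secret", "api_key", "auth"}

def mask_query_string(qs: str) -> str:
    """Replace the value of each sensitive query parameter with ***.
    Non-sensitive pairs pass through byte-for-byte."""
    raw = qs.lstrip("?")
    if not raw:
        return qs
    parts = []
    for pair in raw.split("&"):
        name, sep, _value = pair.partition("=")
        if sep and name.lower() in SENSITIVE:
            parts.append(f"{name}=***")
        else:
            parts.append(pair)
    prefix = "?" if qs.startswith("?") else ""
    return prefix + "&".join(parts)
```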

Compliance Export

Support exporting all access logs for a given user_id and date range to S3 in Parquet format (columnar, compressed with Snappy). Used for GDPR data subject requests and SOC2 audit evidence. Export is triggered via an async job API; the caller polls for completion and receives a presigned S3 URL.
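The async job contract can be sketched as follows. The in-memory JOBS dict stands in for a durable job store, and the worker that actually filters records, writes Snappy-compressed Parquet, and generates the presigned URL is elided; only the submit/poll/complete lifecycle is shown.

```python
import uuid

JOBS = {}  # stand-in for a durable job store

def submit_export(user_id: str, date_from: str, date_to: str) -> str:
    """Kick off an async export for one user_id and date range.
    Returns a job_id the caller polls for completion."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "PENDING", "user_id": user_id,
                    "from": date_from, "to": date_to, "url": None}
    return job_id

def complete_export(job_id: str, presigned_url: str) -> None:
    """Called by the export worker once the Parquet file is in S3."""
    JOBS[job_id].update(status="DONE", url=presigned_url)

def poll_export(job_id: str) -> dict:
    """Polling endpoint: status plus, when DONE, the presigned S3 URL."""
    job = JOBS[job_id]
    return {"status": job["status"], "url": job["url"]}
```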

Sampling for High-Volume Endpoints

For endpoints receiving more than 1000 requests per second, apply 10% sampling for DEBUG-level log events. Error-level events (status >= 500) and slow requests (latency_ms > 1000) are always written regardless of sampling. Sampling decision is made at emission time using a deterministic hash of request_id to ensure consistency across retries.
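A sketch of the sampling decision, assuming a SHA-256 hash of request_id bucketed into 100 slots (the hash choice is an assumption; any stable hash works):

```python
import hashlib

def should_log(request_id: str, status_code: int, latency_ms: int,
               sample_pct: int = 10) -> bool:
    """Deterministic sampling decision made at emission time.

    Errors (status >= 500) and slow requests (> 1000 ms) always pass.
    Otherwise the request_id is hashed into one of 100 buckets, and only
    buckets below sample_pct are kept, so retries of the same request_id
    always get the same answer.
    """
    if status_code >= 500 or latency_ms > 1000:
        return True
    digest = hashlib.sha256(request_id.encode()).hexdigest()
    return int(digest, 16) % 100 < sample_pct
```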

Summary

An access log service decouples log emission from storage via Kafka, enriches records at ingest with geolocation and PII masking, indexes into daily-rolling Elasticsearch indices with ILM-based retention, and supports fast search, compliance export to S3 Parquet, and intelligent sampling for high-volume endpoints.

