Audit Log System Low-Level Design

What is an Audit Log?

An audit log is an immutable, append-only record of every significant action taken in a system: who did what, to which resource, when, and from where. Audit logs are required for compliance (SOC 2, HIPAA, PCI-DSS), security incident investigation, and debugging production issues. The key properties: tamper-evident (entries cannot be modified or deleted), complete (no events silently dropped), and queryable (quickly find all actions by a user or on a resource).

Requirements

  • Record every create/update/delete operation with actor, resource, old and new values
  • Immutable: no UPDATE or DELETE on audit records
  • Query: all actions by user X in the last 30 days; all changes to resource Y
  • Retention: 7 years for financial data (SOX), 1 year for general access logs
  • 10K events/second at peak; query response <500ms
  • Export audit report for compliance officers (date range, actor, resource type)

Data Model

AuditEvent(
    event_id    UUID PRIMARY KEY,
    tenant_id   UUID NOT NULL,
    actor_id    UUID NOT NULL,       -- user or service account who performed the action
    actor_type  ENUM(USER, SERVICE, SYSTEM),
    action      VARCHAR NOT NULL,    -- 'user.login', 'order.cancel', 'permission.grant'
    resource_type VARCHAR,           -- 'order', 'user', 'document'
    resource_id UUID,
    before_state JSONB,              -- snapshot of resource before change (nullable)
    after_state  JSONB,              -- snapshot of resource after change (nullable)
    metadata     JSONB,              -- ip_address, user_agent, request_id, etc.
    created_at   TIMESTAMPTZ NOT NULL,  -- immutable
    -- NO updated_at column — this record never changes
    CHECK (created_at IS NOT NULL)
)

-- Indexes for common query patterns
CREATE INDEX idx_audit_actor ON AuditEvent(tenant_id, actor_id, created_at DESC);
CREATE INDEX idx_audit_resource ON AuditEvent(tenant_id, resource_type, resource_id, created_at DESC);
CREATE INDEX idx_audit_action ON AuditEvent(tenant_id, action, created_at DESC);

Capturing Events

Two approaches for capturing audit events:

Application-level: call an audit logging function explicitly in every service operation:

def cancel_order(order_id, actor_id, reason):
    before = db.get_order(order_id)
    db.update_order(order_id, status='CANCELLED')
    after = db.get_order(order_id)

    audit_log.record(
        actor_id=actor_id,
        action='order.cancel',
        resource_type='order',
        resource_id=order_id,
        before_state=before.to_dict(),
        after_state=after.to_dict(),
        metadata={'reason': reason, 'ip': request.remote_addr}
    )

Database triggers: PostgreSQL triggers fire on every INSERT/UPDATE/DELETE and write to the audit table automatically. No risk of missing events if developers forget to call the audit function. Con: triggers run in the same transaction — a slow audit write blocks the main operation.

Async Write Path

For high throughput (10K events/second), don’t write audit events synchronously in the transaction:

# Application writes to Kafka (non-blocking)
def record(event_data):
    kafka.produce('audit-events', value=json.dumps({
        **event_data,
        'event_id': str(uuid4()),
        'created_at': datetime.utcnow().isoformat()
    }))

# Kafka consumer writes to audit DB (separate process)
for message in kafka.consume('audit-events'):
    db.insert(AuditEvent(**json.loads(message.value)))

Trade-off: async = eventual consistency. The audit event may arrive 1-2 seconds after the action. Acceptable for compliance queries; not for real-time security alerting (use a separate streaming pipeline for that).

Immutability Enforcement

-- PostgreSQL: revoke UPDATE and DELETE privileges on audit table
REVOKE UPDATE, DELETE ON AuditEvent FROM application_user;
-- Only INSERT is allowed; SELECT for queries

-- Append-only table: use PostgreSQL partitioning with partition by month
-- Old partitions can be moved to cold storage (S3 via foreign data wrapper)
-- after their compliance retention window expires

Tamper Detection

For high-assurance audit trails, chain events with a hash of the previous entry:

AuditEvent.chain_hash = SHA256(previous_event_id || event_id || created_at || action || resource_id)

Any modification to a past event breaks the hash chain. Verify chain integrity by replaying all hashes from the beginning. Used in financial systems and blockchain-inspired audit trails.

Key Design Decisions

  • JSONB for before/after state — flexible schema for any resource type without altering the audit table
  • Kafka for async ingestion — decouples application throughput from audit DB write speed
  • DB-level REVOKE — immutability enforced at the database, not just application, layer
  • action namespace (resource.verb) — enables precise filtering (all ‘user.*’ actions, all ‘*.delete’ actions)
  • Partitioning by month — enables cheap archival of old partitions to cold storage after retention window

{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”What data should an audit log capture for compliance?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”An audit log must capture: who (actor_id, actor_type), what (action using resource.verb namespace like "order.cancel"), which resource (resource_type, resource_id), when (immutable created_at timestamp), and context (ip_address, user_agent, request_id in JSONB metadata). For data change compliance (SOC 2, GDPR), also capture before_state and after_state as JSONB snapshots. This enables reconstruction of the full history of any resource.”}},{“@type”:”Question”,”name”:”How do you make an audit log tamper-evident?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”At the database level: REVOKE UPDATE, DELETE on the audit table from the application user — only INSERT is permitted. For higher assurance: chain events with a hash linking each entry to the previous one (chain_hash = SHA256(prev_event_id || event_id || created_at || action)). Any modification to a past event breaks the hash chain. Verify integrity by replaying all hashes. This approach is used in financial audit trails and blockchain-inspired systems.”}},{“@type”:”Question”,”name”:”How do you handle 10,000 audit events per second without blocking application code?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Write audit events to Kafka asynchronously (non-blocking publish). The application records the event to Kafka and continues — the audit write does not block or slow the main operation. A Kafka consumer writes events to the audit database in batches. Trade-off: there is a 1-2 second delay before the event appears in the audit log. This is acceptable for compliance queries; for real-time security alerting, add a second consumer that streams to an alerting pipeline.”}},{“@type”:”Question”,”name”:”How long do you need to retain audit logs?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Retention depends on regulatory requirements: SOX (Sarbanes-Oxley) requires 7 years for financial records; HIPAA requires 6 years for healthcare access logs; PCI-DSS requires 1 year for payment system audit trails; GDPR does not specify but 1-2 years is standard. Implement retention with PostgreSQL table partitioning by month: old partitions can be moved to cold storage (S3 via foreign data wrapper or archival export) after the hot retention window, then deleted after the compliance window.”}},{“@type”:”Question”,”name”:”What is the difference between an audit log and application logs?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Application logs (stdout, structured JSON to Datadog/Splunk) track technical events for debugging: errors, latencies, request traces. They are high-volume, short-retention, and may contain sensitive data that should not be broadly accessible. Audit logs track business events for compliance and security: who did what to which resource. They are lower-volume, long-retention (years), immutable, and queryable by compliance officers. They should be stored in a dedicated, access-controlled database — not mixed with application logs.”}}]}

Audit log and compliance system design is discussed in Atlassian system design interview questions.

Immutable audit trail and tamper-evident log design is covered in Coinbase system design interview preparation.

Audit logging and financial record compliance design is in Stripe system design interview guide.

Scroll to Top