What is an Audit Log?
An audit log is an immutable, append-only record of every significant action taken in a system: who did what, to which resource, when, and from where. Audit logs are required for compliance (SOC 2, HIPAA, PCI-DSS), security incident investigation, and debugging production issues. The key properties: tamper-evident (entries cannot be modified or deleted), complete (no events silently dropped), and queryable (quickly find all actions by a user or on a resource).
Requirements
- Record every create/update/delete operation with actor, resource, old and new values
- Immutable: no UPDATE or DELETE on audit records
- Query: all actions by user X in the last 30 days; all changes to resource Y
- Retention: 7 years for financial data (SOX), 1 year for general access logs
- 10K events/second at peak; query response <500ms
- Export audit report for compliance officers (date range, actor, resource type)
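These numbers imply a sizable storage budget. A back-of-envelope sketch, with two stated assumptions: roughly 1 KB per event (the JSONB before/after snapshots dominate), and treating the 10K/s peak as sustained, which gives an upper bound:

```python
# Back-of-envelope storage estimate for the audit store.
# Assumptions (not from the spec): ~1 KB average event size, and the
# 10K events/s peak treated as a sustained rate (an upper bound).
PEAK_EVENTS_PER_SEC = 10_000
AVG_EVENT_BYTES = 1_000           # JSONB before/after snapshots dominate

events_per_day = PEAK_EVENTS_PER_SEC * 86_400      # 864M events/day
bytes_per_day = events_per_day * AVG_EVENT_BYTES   # ~864 GB/day raw
bytes_7_years = bytes_per_day * 365 * 7            # SOX retention window

print(f"{events_per_day:,} events/day")
print(f"{bytes_7_years / 1e15:.1f} PB over 7 years (upper bound, pre-compression)")
```

Even if the sustained average is a tenth of peak, the multi-year window lands in the hundreds of terabytes, which motivates the partitioning and cold-storage archival discussed below.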
Data Model
-- PostgreSQL: enum type for the actor, then the append-only events table
CREATE TYPE actor_kind AS ENUM ('USER', 'SERVICE', 'SYSTEM');

CREATE TABLE AuditEvent (
    event_id      UUID PRIMARY KEY,
    tenant_id     UUID NOT NULL,
    actor_id      UUID NOT NULL,    -- user or service account who performed the action
    actor_type    actor_kind NOT NULL,
    action        VARCHAR NOT NULL, -- 'user.login', 'order.cancel', 'permission.grant'
    resource_type VARCHAR,          -- 'order', 'user', 'document'
    resource_id   UUID,
    before_state  JSONB,            -- snapshot of resource before change (nullable)
    after_state   JSONB,            -- snapshot of resource after change (nullable)
    metadata      JSONB,            -- ip_address, user_agent, request_id, etc.
    created_at    TIMESTAMPTZ NOT NULL -- immutable; no updated_at column: this record never changes
);
-- Indexes for common query patterns
CREATE INDEX idx_audit_actor ON AuditEvent(tenant_id, actor_id, created_at DESC);
CREATE INDEX idx_audit_resource ON AuditEvent(tenant_id, resource_type, resource_id, created_at DESC);
CREATE INDEX idx_audit_action ON AuditEvent(tenant_id, action, created_at DESC);
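The first index serves the "all actions by user X in the last 30 days" requirement directly. A sketch of building that query as a parameterized statement; the `%s` placeholder style is an assumption (psycopg-like driver), and `actor_history_query` is an illustrative name:

```python
from datetime import datetime, timedelta, timezone

# Builds the "all actions by actor X in the last N days" query.
# The (tenant_id, actor_id, created_at DESC) index turns this into an
# index range scan rather than a full-table scan.
def actor_history_query(tenant_id, actor_id, days=30):
    sql = (
        "SELECT event_id, action, resource_type, resource_id, created_at "
        "FROM AuditEvent "
        "WHERE tenant_id = %s AND actor_id = %s AND created_at >= %s "
        "ORDER BY created_at DESC"
    )
    since = datetime.now(timezone.utc) - timedelta(days=days)
    return sql, (tenant_id, actor_id, since)
```

Leading every index with tenant_id keeps multi-tenant queries from ever scanning another tenant's rows.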
Capturing Events
Two approaches for capturing audit events:
Application-level: call an audit logging function explicitly in every service operation:
def cancel_order(order_id, actor_id, reason):
    before = db.get_order(order_id)
    db.update_order(order_id, status='CANCELLED')
    after = db.get_order(order_id)
    audit_log.record(
        actor_id=actor_id,
        action='order.cancel',
        resource_type='order',
        resource_id=order_id,
        before_state=before.to_dict(),
        after_state=after.to_dict(),
        metadata={'reason': reason, 'ip': request.remote_addr},
    )
Database triggers: PostgreSQL triggers fire on every INSERT/UPDATE/DELETE and write to the audit table automatically, so there is no risk of missing events when a developer forgets to call the audit function. Cons: triggers run in the same transaction, so a slow audit write blocks the main operation, and they lack application context (actor_id, IP address, request_id) unless it is passed in via session variables (e.g. SET LOCAL with a custom setting).
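The application-level approach's main risk (developers forgetting the audit call) can be reduced by centralizing capture in a decorator. A hypothetical sketch: the `@audited` decorator and `AUDIT_SINK` list are illustrative stand-ins for the real `audit_log.record()` call, and this version does not capture before/after state:

```python
import functools

AUDIT_SINK = []  # stand-in for the real audit_log.record() call

# A hypothetical @audited decorator: audit capture stays application-level
# but is centralized, so individual handlers cannot silently omit it.
def audited(action, resource_type):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(resource_id, actor_id, *args, **kwargs):
            result = fn(resource_id, actor_id, *args, **kwargs)
            AUDIT_SINK.append({
                'actor_id': actor_id,
                'action': action,
                'resource_type': resource_type,
                'resource_id': resource_id,
            })
            return result
        return wrapper
    return decorator

@audited(action='order.cancel', resource_type='order')
def cancel_order(order_id, actor_id, reason):
    return {'order_id': order_id, 'status': 'CANCELLED', 'reason': reason}
```

A real version would also snapshot the resource before and after calling `fn` to populate before_state and after_state.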
Async Write Path
For high throughput (10K events/second), don’t write audit events synchronously in the transaction:
import json
from datetime import datetime, timezone
from uuid import uuid4

# Application writes to Kafka (non-blocking)
def record(event_data):
    kafka.produce('audit-events', value=json.dumps({
        **event_data,
        'event_id': str(uuid4()),
        'created_at': datetime.now(timezone.utc).isoformat(),
    }))

# Kafka consumer writes to the audit DB (separate process)
for message in kafka.consume('audit-events'):
    db.insert(AuditEvent(**json.loads(message.value)))
Trade-off: async means eventual consistency. The audit event may land 1-2 seconds after the action, which is acceptable for compliance queries but not for real-time security alerting (use a separate streaming pipeline for that). To keep the log complete, configure producer acknowledgments so events are not dropped on broker failure, and make the consumer idempotent: with event_id as the primary key, redelivered messages deduplicate on insert.
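To sustain 10K events/second on the write side, the consumer should insert in batches rather than row by row. A sketch of the batching loop, with a plain list standing in for the Kafka consumer and `insert_many` standing in for a bulk DB insert (both are assumptions, not a real client API):

```python
import json
from uuid import uuid4

BATCH_SIZE = 500

# Batching consumer sketch: buffering rows amortizes per-insert overhead
# so the audit DB can keep up with the ingest rate.
def consume_in_batches(messages, insert_many, batch_size=BATCH_SIZE):
    batch = []
    for raw in messages:
        batch.append(json.loads(raw))
        if len(batch) >= batch_size:
            insert_many(batch)
            batch = []
    if batch:                      # flush the final partial batch
        insert_many(batch)

inserted = []
events = [json.dumps({'event_id': str(uuid4()), 'action': 'user.login'})
          for _ in range(1_200)]
consume_in_batches(events, inserted.extend)
```

In production, commit Kafka offsets only after the batch insert succeeds; combined with event_id as the primary key, redelivery after a crash is then harmless.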
Immutability Enforcement
-- PostgreSQL: revoke UPDATE and DELETE privileges on the audit table
REVOKE UPDATE, DELETE ON AuditEvent FROM application_user;
-- Only INSERT is allowed; SELECT for queries.
-- Append-only table: use PostgreSQL partitioning, partitioned by month.
-- Old partitions can be moved to cold storage (S3 via foreign data wrapper)
-- after their compliance retention window expires.
Tamper Detection
For high-assurance audit trails, chain each event to its predecessor by hashing the previous entry's hash into the new entry:
AuditEvent.chain_hash = SHA256(previous_chain_hash || event_id || created_at || action || resource_id)
Because each hash covers the previous one, altering a past event invalidates every hash after it; an attacker cannot rewrite one entry without recomputing the entire suffix of the chain. Verify chain integrity by replaying all hashes from the beginning. This technique is used in financial systems and blockchain-inspired audit trails.
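A runnable sketch of the chain, assuming each entry's hash covers the previous entry's hash (so tampering anywhere breaks verification from that point on); `hashlib` is standard library, field names follow the schema above:

```python
import hashlib

GENESIS = '0' * 64  # seed hash for the first event in the chain

def chain_hash(prev_hash, event):
    payload = '|'.join([prev_hash, event['event_id'],
                        event['created_at'], event['action'],
                        str(event.get('resource_id'))])
    return hashlib.sha256(payload.encode()).hexdigest()

def verify_chain(events):
    """Replay all hashes from the beginning; False if any entry was altered."""
    prev = GENESIS
    for event in events:
        if event['chain_hash'] != chain_hash(prev, event):
            return False
        prev = event['chain_hash']
    return True

# Append events, linking each to its predecessor's hash
log = []
prev = GENESIS
for i in range(3):
    e = {'event_id': f'e{i}', 'created_at': f'2024-01-0{i + 1}T00:00:00Z',
         'action': 'order.cancel', 'resource_id': f'o{i}'}
    e['chain_hash'] = chain_hash(prev, e)
    prev = e['chain_hash']
    log.append(e)

assert verify_chain(log)
log[1]['action'] = 'order.refund'   # tamper with a past entry
assert not verify_chain(log)
```

Periodically anchoring the latest chain_hash somewhere outside the database (e.g. a write-once store) also detects wholesale truncation of the log's tail.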
Key Design Decisions
- JSONB for before/after state — flexible schema for any resource type without altering the audit table
- Kafka for async ingestion — decouples application throughput from audit DB write speed
- DB-level REVOKE — immutability enforced at the database, not just application, layer
- action namespace (resource.verb) — enables precise filtering (all 'user.*' actions, all '*.delete' actions)
- Partitioning by month — enables cheap archival of old partitions to cold storage after retention window