How does hash chaining detect tampering in an audit log?

Each record stores the SHA-256 hash of its own fields combined with the hash of the previous record. To verify integrity, recompute each hash and confirm it matches the stored value and that the prev_hash field matches the predecessor record_hash. Any deletion, modification, or insertion in the middle breaks the chain at that point.

How is append-only enforced at the database level?

Two mechanisms work together: a BEFORE UPDATE OR DELETE trigger raises an exception for any modification attempt, and the application database role is granted INSERT and SELECT privileges only, with no UPDATE, DELETE, or TRUNCATE on the audit table.

What retention periods apply to audit logs for compliance?

Retention depends on regulation and event type. Financial transaction events typically require 7 years under SOX. Authentication and access events are commonly retained for 1 year. Debug and informational events may be kept for 90 days. A RetentionPolicy table maps event types to durations so that partition archival is automated.

How do you paginate audit queries efficiently at scale?

Use a keyset cursor on (created_at, id) rather than OFFSET. The query filters WHERE (created_at, id) > (cursor_ts, cursor_id) with an ORDER BY created_at, id LIMIT N. A compound index on (actor_id, created_at, id) satisfies actor-filtered queries without a sort step and remains stable under concurrent inserts.

How does hash chaining detect tampering in audit logs?

Each record stores a hash of (previous_record_hash + current_record_fields); verifying integrity means recomputing the hash chain from genesis to the latest record and checking for any mismatch.

How is the append-only guarantee enforced at the database level?

A database trigger raises an exception on any UPDATE or DELETE against the audit_event table; additionally, the application role is granted INSERT-only privileges, preventing any non-insert operation.

How are compliance queries executed efficiently on a large partitioned table?

Queries include the created_at range so the planner prunes irrelevant partitions; a GIN index on (actor_id, resource_type, resource_id) accelerates filtered lookups within a partition.

How is the 7-year retention enforced for financial events?

RetentionPolicy rows define per event_type retention periods; a maintenance job checks partition ages against the policy and drops (or archives to cold storage) only partitions entirely outside the retention window.

Audit Logging System Low-Level Design: Tamper-Evident Records, Structured Events, and Compliance Queries

⏱ 8 min read

An audit logging system records every significant action in a platform — who did what to which resource, when, and with what outcome — in a tamper-evident, append-only store. This guide covers the low-level design: structured event schema, append-only enforcement, hash chaining for tamper detection, partition-based retention, and a compliance query API.

1. Structured Event Schema

Each audit event captures five dimensions:

Who: actor_id (user or service identifier), actor_type (HUMAN, SERVICE, SYSTEM)
What: action (CREATE, READ, UPDATE, DELETE, LOGIN, EXPORT, …), resource_type, resource_id
Context: ip_address, user_agent, request_id (correlates with application trace)
Outcome: SUCCESS or FAILURE (with error code)
State diff: before_state JSONB and after_state JSONB — capture only changed fields for UPDATE events to keep storage bounded

-- Audit event table, partitioned by month
CREATE TABLE AuditEvent (
    id            BIGSERIAL,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    actor_id      TEXT NOT NULL,
    actor_type    TEXT NOT NULL,
    action        TEXT NOT NULL,
    resource_type TEXT NOT NULL,
    resource_id   TEXT NOT NULL,
    outcome       TEXT NOT NULL,   -- SUCCESS | FAILURE
    error_code    TEXT,
    ip_address    INET,
    user_agent    TEXT,
    request_id    UUID,
    before_state  JSONB,
    after_state   JSONB,
    prev_hash     TEXT NOT NULL,   -- SHA-256 of previous record
    record_hash   TEXT NOT NULL    -- SHA-256 of this record fields + prev_hash
) PARTITION BY RANGE (created_at);

-- Monthly partitions (created by automated job or pg_partman)
CREATE TABLE AuditEvent_2025_01
    PARTITION OF AuditEvent
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- Indexes on common filter columns
CREATE INDEX ON AuditEvent (actor_id, created_at);
CREATE INDEX ON AuditEvent (resource_type, resource_id, created_at);
CREATE INDEX ON AuditEvent (action, created_at);

2. Append-Only Enforcement

Application code must never issue UPDATE or DELETE on AuditEvent. Enforce this at two levels:

DB trigger: a BEFORE UPDATE OR DELETE trigger raises an exception unconditionally.
DB role: the application database user is granted INSERT and SELECT only — no UPDATE, DELETE, or TRUNCATE on AuditEvent.

CREATE OR REPLACE FUNCTION audit_immutable()
RETURNS TRIGGER LANGUAGE plpgsql AS $$
BEGIN
    RAISE EXCEPTION 'Audit log is immutable: % not allowed', TG_OP;
END;
$$;

CREATE TRIGGER audit_no_modify
BEFORE UPDATE OR DELETE ON AuditEvent
FOR EACH ROW EXECUTE FUNCTION audit_immutable();

Partition detachment (for archival) is a privileged DBA operation logged separately and requires a second approver in the runbook.

3. Hash Chaining for Tamper Detection

Each record stores prev_hash (the record_hash of the immediately preceding row) and record_hash (SHA-256 of the concatenation of all record fields plus prev_hash). The genesis record uses prev_hash = '0' * 64.

To verify integrity of a range [A, B]:

Fetch all records ordered by id from A to B.
For each record, recompute SHA-256(fields + prev_hash) and compare to stored record_hash.
Verify that record[n].prev_hash == record[n-1].record_hash.
Any mismatch indicates tampering or gap at that position.

For large tables, verify in chunks of 10,000 rows and publish the chain head hash to an external notary (e.g., a public blockchain timestamp or a signed S3 object) periodically.

4. Partitioning and Retention

Monthly RANGE partitions allow dropping old data by detaching a partition instead of row-level deletes — avoiding table bloat and lock contention.
After detachment, the partition is compressed (pg_dump to zstd) and moved to cold storage (S3 Glacier).
A RetentionPolicy table maps event types to retention durations. Financial transactions: 7 years (SOX). Auth events: 1 year. Debug events: 90 days.
A nightly job computes which partitions are past retention for all their event types and schedules detachment.

5. Compliance Query API

Queries use a cursor of (created_at, id) for stable pagination that survives concurrent inserts:

-- Fetch next page after cursor
SELECT * FROM AuditEvent
WHERE actor_id = $1
  AND created_at BETWEEN $2 AND $3
  AND (created_at, id) > ($4, $5)   -- cursor
ORDER BY created_at, id
LIMIT 100;

Compound index on (actor_id, created_at, id) satisfies this query without a sort step.

6. Python Reference Implementation

import hashlib, json, time, uuid
from typing import Optional, Any

GENESIS_HASH = "0" * 64

def _compute_hash(prev_hash: str, fields: dict) -> str:
    payload = prev_hash + json.dumps(fields, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()

def log_event(
    actor_id: str,
    actor_type: str,
    action: str,
    resource_type: str,
    resource_id: str,
    outcome: str,
    ip_address: Optional[str] = None,
    request_id: Optional[str] = None,
    before: Optional[Any] = None,
    after: Optional[Any] = None,
) -> int:
    """Insert a new audit event; compute and chain hashes."""
    # Serialize state diffs — only store changed keys for UPDATE
    before_state = json.dumps(before) if before is not None else None
    after_state  = json.dumps(after)  if after  is not None else None

    # Fetch previous hash with row lock to serialize inserts
    row = db.query_one(
        "SELECT record_hash FROM AuditEvent ORDER BY id DESC LIMIT 1 FOR UPDATE"
    )
    prev_hash = row.record_hash if row else GENESIS_HASH

    fields = dict(
        actor_id=actor_id, actor_type=actor_type, action=action,
        resource_type=resource_type, resource_id=resource_id,
        outcome=outcome, ip_address=ip_address, request_id=request_id,
        before_state=before_state, after_state=after_state,
    )
    record_hash = _compute_hash(prev_hash, fields)

    event_id = db.execute_returning(
        """INSERT INTO AuditEvent
           (actor_id, actor_type, action, resource_type, resource_id, outcome,
            ip_address, request_id, before_state, after_state, prev_hash, record_hash)
           VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s) RETURNING id""",
        [actor_id, actor_type, action, resource_type, resource_id, outcome,
         ip_address, request_id, before_state, after_state, prev_hash, record_hash]
    )
    return event_id

def verify_chain(from_id: int, to_id: int) -> bool:
    """Recompute and verify hash chain over a range of records."""
    rows = db.query(
        "SELECT * FROM AuditEvent WHERE id BETWEEN %s AND %s ORDER BY id",
        [from_id, to_id]
    )
    expected_prev = rows[0].prev_hash
    for row in rows:
        if row.prev_hash != expected_prev:
            print(f"Chain break at id={row.id}: expected prev_hash {expected_prev}")
            return False
        fields = {k: getattr(row, k) for k in
                  ["actor_id","actor_type","action","resource_type","resource_id",
                   "outcome","ip_address","request_id","before_state","after_state"]}
        computed = _compute_hash(row.prev_hash, fields)
        if computed != row.record_hash:
            print(f"Hash mismatch at id={row.id}")
            return False
        expected_prev = row.record_hash
    return True

def query_audit(filters: dict, cursor: Optional[tuple] = None, page_size: int = 100) -> list:
    """Paginated compliance query with optional cursor (created_at, id)."""
    conditions = []
    params = []
    if "actor_id" in filters:
        conditions.append("actor_id = %s"); params.append(filters["actor_id"])
    if "resource_type" in filters:
        conditions.append("resource_type = %s"); params.append(filters["resource_type"])
    if "action" in filters:
        conditions.append("action = %s"); params.append(filters["action"])
    if "from_date" in filters:
        conditions.append("created_at >= %s"); params.append(filters["from_date"])
    if "to_date" in filters:
        conditions.append("created_at  (%s, %s)")
        params.extend(cursor)
    where = ("WHERE " + " AND ".join(conditions)) if conditions else ""
    params.append(page_size)
    return db.query(
        f"SELECT * FROM AuditEvent {where} ORDER BY created_at, id LIMIT %s",
        params
    )

7. Operational Considerations

Write throughput: the row-level lock for hash chaining serializes inserts. For high-throughput systems, use a dedicated audit writer service with an in-memory queue and micro-batch inserts (batch 100 events, chain within the batch, single transaction). Alternatively, use asynchronous chaining: insert without hash, then a background job chains in order — but this creates a window where records are unchained.
Index bloat: monthly partition pruning keeps index sizes bounded; use BRIN indexes on created_at within each partition for range scans.
Export for SIEM: tail the AuditEvent table with logical replication or a Change Data Capture (CDC) tool and stream to Splunk / Elasticsearch.