Question 1

How do you prevent even database administrators from deleting audit logs?

Accepted Answer

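Defense in depth, combining four layers:
(1) Revoke privileges: the application service account has INSERT-only permission on the audit_log table, so even a compromised application cannot UPDATE or DELETE rows. Grants alone do not stop a DBA, because superusers and table owners bypass them. Restrict DBAs further by removing standing DELETE rights on this table and requiring dual approval to obtain them.
(2) Write to an immutable external store: replicate audit events to S3 with Object Lock (WORM, Write Once Read Many). In compliance mode, no user, including the AWS account root user, can delete or overwrite a locked object version until its retention period expires. The primary database holds no credentials that could modify this bucket.
(3) Hash chaining: each entry's hash covers the entry's contents plus the previous entry's hash. Deleting or modifying an entry breaks the chain at that point, which integrity verification detects. An attacker who can rewrite the whole table could recompute every later hash, and truncating the newest entries leaves a valid shorter chain, so the latest hash must also be anchored outside the attacker's reach (the WORM store or a signed batch).
(4) Signed batches: batch audit events and sign each batch with a key protected by a hardware security module (HSM). Any modification invalidates the signature.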
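A minimal sketch of layers (3) and (4), assuming events are JSON-serializable dicts. The names `GENESIS`, `append`, `verify_chain`, `sign_batch`, and `verify_batch` are hypothetical. HMAC-SHA256 with an in-memory key stands in for the HSM-protected signature; a real deployment would keep the key inside the HSM (or a cloud KMS) and would more likely use an asymmetric signature so that verifiers cannot forge batches.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical prev_hash value for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, no spaces) so the same event always hashes the same.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, event: dict) -> None:
    # Each new entry records the previous entry's hash and its own chained hash.
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev_hash": prev, "hash": entry_hash(prev, event)})


def verify_chain(chain: list) -> int:
    """Return the index of the first broken entry, or -1 if every link holds."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return i
        prev = entry["hash"]
    return -1


def sign_batch(key: bytes, chain: list) -> str:
    # Sign the head hash: once the links verify, it commits to every earlier
    # entry, and a truncated chain has a different head, so truncation is caught.
    head = chain[-1]["hash"] if chain else GENESIS
    return hmac.new(key, head.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_batch(key: bytes, chain: list, signature: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sign_batch(key, chain), signature)
```

In use, entries are added with `append`, and `sign_batch` runs periodically with its signature shipped to the WORM store. An investigator recomputes both checks: `verify_chain` locates a deleted or edited entry, and `verify_batch` catches a chain that was rewritten or cut short after signing.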

Question 2

What is the difference between an audit log and an application log?

Accepted Answer

Application logs are operational: INFO/WARN/ERROR messages for debugging, performance monitoring, and incident response. They capture system behavior: "database query took 500ms," "cache miss on key X," "null pointer exception in method Y." They are typically short-lived (retained 7-30 days), should not contain sensitive data, and are broadly accessible to engineers.

Audit logs are compliance-grade: they record business actions, meaning who did what, to which data, and when. They capture the actor, the action, and the resulting data change: "user admin_john@company.com deleted user account 12345 at 14:23:00 UTC." They must be retained for years (as regulations require), access-controlled (readable only by security and compliance teams), tamper-resistant and tamper-evident, and complete (no missing events). A compromised system should not be able to erase its own audit trail. Application logs can be purged freely under a retention policy; audit logs cannot.
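To make the contrast concrete, here is a minimal Python sketch. The application log line goes through the standard `logging` module as free-form text, while the audit record is a structured, frozen object with explicit who/what/which/when fields. The `AuditEvent` class, `log_slow_query` helper, and field names are hypothetical, not a standard schema. Note that `frozen=True` only prevents in-process mutation; durable immutability still comes from the storage controls in Question 1.

```python
import json
import logging
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
app_log = logging.getLogger("app")


def log_slow_query(duration_ms: int) -> None:
    # Application log: operational, free-form text, short retention.
    app_log.warning("database query took %dms", duration_ms)


@dataclass(frozen=True)
class AuditEvent:
    # Audit log: who (actor) did what (action) to which data (resource), when, with what outcome.
    actor: str
    action: str
    resource_type: str
    resource_id: str
    outcome: str = "SUCCESS"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Stable key order so the serialized form can later be hashed or signed.
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    log_slow_query(500)
    event = AuditEvent(
        actor="admin_john@company.com",
        action="DELETE",
        resource_type="user_account",
        resource_id="12345",
    )
    print(event.to_json())
```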

Question 3

How do you query audit logs efficiently during a security investigation?

Accepted Answer

Typical investigation queries: "all actions by user X in the last 30 days," "all access to patient record Y," "all failed login attempts in the last hour," "all admin privilege grants this quarter."

Primary database (PostgreSQL): composite indexes on (actor_id, timestamp), (resource_type, resource_id, timestamp), and (action_type, timestamp) cover the most common investigation patterns. Each index leads with the equality columns and ends with timestamp, so a query for one actor over a time range becomes a single index range scan.

Elasticsearch: sync audit events via change data capture (CDC) for full-text search and complex filter queries, such as the boolean filter actor_id=X AND action_type IN [DELETE, EXPORT] AND timestamp > NOW()-30d. Give security teams Kibana or Grafana dashboards on top of this index.

For compliance exports (e.g., all PHI access for a HIPAA audit): pre-built report queries with scheduled execution. Store report results as downloadable CSVs with their own audit trail (who requested the report, and when).
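The boolean filter above can be expressed in the Elasticsearch query DSL. A minimal sketch, assuming the index maps `actor_id` and `action_type` as `keyword` fields and `timestamp` as a `date` field (a hypothetical mapping); `investigation_query` is an illustrative helper, not part of any client library. Clauses under `bool.filter` are not scored and can be cached, which suits yes/no investigation predicates.

```python
from __future__ import annotations

import json


def investigation_query(actor_id: str, action_types: list[str], lookback: str = "30d") -> dict:
    """actor_id = X AND action_type IN [...] AND timestamp > now - lookback."""
    return {
        "query": {
            "bool": {
                # Filter context: exact matches and ranges, no relevance scoring.
                "filter": [
                    {"term": {"actor_id": actor_id}},
                    {"terms": {"action_type": action_types}},
                    {"range": {"timestamp": {"gt": f"now-{lookback}"}}},
                ]
            }
        },
        # Newest events first, the usual order when reconstructing an incident.
        "sort": [{"timestamp": {"order": "desc"}}],
    }


if __name__ == "__main__":
    body = investigation_query("user_x", ["DELETE", "EXPORT"])
    print(json.dumps(body, indent=2))
```

The resulting body would be sent to the index's `_search` endpoint (for example `POST /audit-events/_search`, where the index name is hypothetical).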

Question 4

What events must be logged for SOC 2 compliance?

Accepted Answer

SOC 2 does not prescribe a fixed list of events; auditors assess logging against the Trust Services Criteria, chiefly the Security criteria (the Common Criteria, required in every SOC 2 report) plus Availability or other criteria when they are in scope. In practice, auditors expect to see these categories logged:
Authentication events: login, logout, failed login, password change, MFA enrollment/disable.
Authorization events: privilege changes, role assignments, permission grants/revocations.
Data access: reads, writes, and deletes of sensitive data.
System configuration changes: firewall rule changes, security setting changes, new user provisioning.
Data export events: bulk downloads, API exports.
Third-party access: SSO logins, OAuth token grants.
Infrastructure events: server start/stop, deployments.
For each event, the minimum fields are actor, timestamp, source IP, outcome, and the resource affected. SOC 2 auditors typically review log completeness and the ability to investigate specific incidents. Be ready to demonstrate that logs cannot be modified and that they cover the entire audit period: a Type II report typically covers 3 to 12 months, and retaining logs for at least 1 year is common practice.
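A minimal sketch of a completeness check that could run at ingestion time, flagging events that lack the minimum fields above or use an unrecognized event type. The snake_case event-type names, the `REQUIRED_FIELDS` and `EVENT_CATEGORIES` tables, and the `categorize`/`validate` helpers are hypothetical; SOC 2 does not define them.

```python
from __future__ import annotations

# Minimum fields listed above (field names are hypothetical).
REQUIRED_FIELDS = ("actor", "timestamp", "source_ip", "outcome", "resource")

# Hypothetical event-type taxonomy mirroring the categories above.
EVENT_CATEGORIES = {
    "authentication": {"login", "logout", "login_failed", "password_change",
                       "mfa_enroll", "mfa_disable"},
    "authorization": {"privilege_change", "role_assign",
                      "permission_grant", "permission_revoke"},
    "data_access": {"data_read", "data_write", "data_delete"},
    "configuration": {"firewall_rule_change", "security_setting_change",
                      "user_provision"},
    "data_export": {"bulk_download", "api_export"},
    "third_party": {"sso_login", "oauth_token_grant"},
    "infrastructure": {"server_start", "server_stop", "deployment"},
}


def categorize(event_type: str) -> str | None:
    """Return the category an event type belongs to, or None if unknown."""
    for category, types in EVENT_CATEGORIES.items():
        if event_type in types:
            return category
    return None


def validate(event: dict) -> list[str]:
    """List every problem with an event; an empty list means it passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if not event.get(name)]
    if categorize(event.get("event_type", "")) is None:
        problems.append(f"unknown event_type: {event.get('event_type')!r}")
    return problems
```

Counting `validate` failures per emitting service gives evidence that completeness is monitored rather than assumed.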

Question 5

How do you implement real-time alerting on audit log events?

Accepted Answer

Publish audit events to a Kafka topic in real time (not in batches). A SIEM (Security Information and Event Management) system consumes from Kafka and evaluates alert rules. Common alert rules:
(1) Brute force: more than 5 failed logins for the same user within 5 minutes. Alert and auto-lock the account.
(2) Impossible travel: a user logs in from New York, then from Tokyo within 1 hour. Alert for review.
(3) Bulk data export: any single export of more than 10,000 records. Alert the security team.
(4) Privilege escalation: any grant of the admin role. Alert and require dual approval.
(5) Off-hours access to sensitive data: access to PII records outside 9am-6pm business hours. Flag for review.
Rules are evaluated as streaming queries (Flink, Spark Structured Streaming, or SIEM rule engines such as Splunk alerts). Alert routing: PagerDuty for critical (active breach), Slack for high (review needed), and an email digest for medium.
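Rule (1) is a per-user sliding-window count. A minimal in-memory sketch, assuming failed-login events arrive in timestamp order with epoch-second timestamps; `BruteForceDetector` and `on_failed_login` are hypothetical names. A production stream processor such as Flink would keep this state per key and handle out-of-order events with watermarks, which this sketch does not.

```python
from collections import defaultdict, deque


class BruteForceDetector:
    """Fire when a user has more than `threshold` failed logins within `window_seconds`."""

    def __init__(self, threshold: int = 5, window_seconds: int = 300) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        # user_id -> timestamps of that user's recent failed logins, oldest first
        self._failures = defaultdict(deque)

    def on_failed_login(self, user_id: str, ts: float) -> bool:
        """Record a failure at time `ts`; return True if the rule fires."""
        recent = self._failures[user_id]
        recent.append(ts)
        # Evict failures that fall outside the window ending at `ts`.
        while recent and recent[0] <= ts - self.window_seconds:
            recent.popleft()
        return len(recent) > self.threshold


if __name__ == "__main__":
    detector = BruteForceDetector()
    for t in range(6):  # six failures in six seconds
        fired = detector.on_failed_login("user_x", t)
    print(fired)  # True: the sixth failure exceeds the threshold of 5
```

When the rule fires, the consumer would raise the alert and trigger the account lock. Whether a successful login should reset a user's window is a policy choice this sketch leaves out.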

System Design: Audit Log — Immutable Event Trail, Compliance, and Tamper Detection

Why Audit Logs?

What to Log

Immutability and Tamper Detection

Schema and Storage

Implementation Patterns

Interview Tips