Core Models: ACL, RBAC, ABAC, ReBAC
Access control models define how permissions are assigned and evaluated. Each has different tradeoffs in expressiveness, scalability, and operational complexity.
ACL (Access Control List): each resource stores a list of (principal, permission) pairs. Simple, but doesn’t scale – updating a user’s access requires touching every resource they have access to. Suitable for file systems with small user counts (Unix permissions).
RBAC (Role-Based Access Control): users are assigned roles, roles have permissions. Adding a user to a role grants all that role’s permissions. Most enterprise systems use RBAC. Tradeoff: role explosion when you need fine-grained resource-level control (“editor of project X” vs “editor of project Y”).
ABAC (Attribute-Based Access Control): policies are boolean expressions over attributes of the user, resource, and environment. Example: allow if user.department == resource.department AND user.clearance >= resource.classification AND time.hour in [9, 17]. Very expressive but policies become hard to audit and debug.
ReBAC (Relationship-Based Access Control): permissions are derived from the graph of relationships between entities. “User can view document if user is member of a group that has viewer on the document’s parent folder.” Google Zanzibar is the canonical implementation. Handles complex inheritance naturally.
Database Schema for RBAC
-- Core entities
CREATE TABLE users (
id BIGINT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE roles (
id BIGINT PRIMARY KEY,
name VARCHAR(100) UNIQUE NOT NULL, -- 'admin', 'editor', 'viewer'
description TEXT
);
CREATE TABLE permissions (
id BIGINT PRIMARY KEY,
resource VARCHAR(100) NOT NULL, -- 'post', 'user', 'billing'
action VARCHAR(50) NOT NULL, -- 'create', 'read', 'update', 'delete'
UNIQUE(resource, action)
);
-- Junction tables
CREATE TABLE user_roles (
user_id BIGINT REFERENCES users(id) ON DELETE CASCADE,
role_id BIGINT REFERENCES roles(id) ON DELETE CASCADE,
granted_by BIGINT REFERENCES users(id),
granted_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (user_id, role_id)
);
CREATE TABLE role_permissions (
role_id BIGINT REFERENCES roles(id) ON DELETE CASCADE,
permission_id BIGINT REFERENCES permissions(id) ON DELETE CASCADE,
PRIMARY KEY (role_id, permission_id)
);
-- Resource-scoped roles (editor of specific project)
CREATE TABLE user_resource_roles (
user_id BIGINT REFERENCES users(id) ON DELETE CASCADE,
role_id BIGINT REFERENCES roles(id) ON DELETE CASCADE,
resource_type VARCHAR(100) NOT NULL,
resource_id BIGINT NOT NULL,
granted_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (user_id, role_id, resource_type, resource_id)
);
CREATE INDEX idx_user_resource_roles_lookup
ON user_resource_roles(user_id, resource_type, resource_id);
Access Check Algorithm
def can_access(user_id, resource_type, resource_id, action):
# Check global roles first
global_perms = db.query("""
SELECT 1 FROM user_roles ur
JOIN role_permissions rp ON ur.role_id = rp.role_id
JOIN permissions p ON rp.permission_id = p.id
WHERE ur.user_id = %s
AND p.resource = %s
AND p.action = %s
LIMIT 1
""", [user_id, resource_type, action])
if global_perms:
return True
# Check resource-scoped roles
scoped_perms = db.query("""
SELECT 1 FROM user_resource_roles urr
JOIN role_permissions rp ON urr.role_id = rp.role_id
JOIN permissions p ON rp.permission_id = p.id
WHERE urr.user_id = %s
AND urr.resource_type = %s
AND urr.resource_id = %s
AND p.action = %s
LIMIT 1
""", [user_id, resource_type, resource_id, action])
return bool(scoped_perms)
Caching Strategy
Access checks are on the hot path. A naive implementation hits the database on every request.
Cache key structure: acl:{user_id}:{resource_type}:{resource_id}:{action} -> boolean, TTL 5 minutes.
For bulk prefetch (page load checks 20+ permissions): cache acl:user:{user_id}:roles as the set of (role_id, resource_type, resource_id) tuples, TTL 5 minutes. Derive individual permission checks locally from this set.
Invalidation on role change: when a user’s roles change, delete acl:user:{user_id}:* from Redis. Use Redis SCAN with pattern rather than KEYS to avoid blocking. Alternatively, version the cache: store a version counter per user, include it in cache key.
Negative caching: cache denied results too. Without negative caching, a user probing non-existent resources can flood the database. Use the same TTL.
Consistency tradeoff: 5-minute TTL means a revoked permission can still be exercised for up to 5 minutes. For sensitive actions (billing, delete), bypass cache or use a shorter TTL (30s). Document this explicitly in your design.
Permission Inheritance
Three models for hierarchical resources (folders containing files):
Explicit-only: permissions are not inherited. Simple, auditable, but requires setting permissions on every resource. Suitable for flat structures.
Downward inheritance: permissions flow from parent to children. If you have “edit” on a folder, you have “edit” on all files inside. Recursive check: walk up the resource tree until you find an ACL entry or reach the root.
Override model (Google Drive behavior): inherited permissions can be overridden at any level. A folder shared with “anyone can view” can contain a subfolder that is “only owner.” Implementation: store an explicit “deny” entry, or store a break-inheritance flag that stops traversal.
For deeply nested hierarchies, precompute effective permissions using a materialized permission table, updated asynchronously when ACLs change.
Audit Logging
CREATE TABLE access_logs (
id BIGINT PRIMARY KEY, -- use sequence, not UUID for clustering
user_id BIGINT NOT NULL,
resource_type VARCHAR(100),
resource_id BIGINT,
action VARCHAR(50),
decision BOOLEAN NOT NULL, -- true=allow, false=deny
reason VARCHAR(255), -- 'role:admin', 'resource_role:editor:42'
ip_address INET,
user_agent TEXT,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at); -- monthly partitions
-- Async write pattern
def log_access(event):
queue.publish('access-log-queue', json.dumps(event))
# Consumer writes to DB in batches of 500, every 1 second
Audit logs must be append-only. Use INSERT-only table with no UPDATE/DELETE permissions for the application user. Retention policy: archive to cold storage after 90 days, delete after 7 years (compliance).
API Design
# Role management
POST /roles # create role
GET /roles/{role_id} # get role details
PUT /roles/{role_id}/permissions # set permissions on role
# User role assignment
POST /users/{user_id}/roles # assign global role
DELETE /users/{user_id}/roles/{role_id}
# Resource-scoped roles
POST /resources/{type}/{id}/roles # assign role on specific resource
GET /resources/{type}/{id}/roles # list roles on resource
# The critical check endpoint
POST /access/check
# Request: {"user_id": 42, "resource_type": "post", "resource_id": 7, "action": "edit"}
# Response: {"allowed": true, "reason": "role:editor", "cached": true}
The /access/check endpoint should support bulk checks in one request to reduce round trips. Batch up to 100 checks, return an array of results.
Google Zanzibar / ReBAC Pattern
Zanzibar stores relation tuples: (object#relation@user). Examples:
doc:readme#viewer@user:alice
doc:readme#owner@user:bob
folder:eng#viewer@user:alice
doc:readme#parent@folder:eng # doc is in folder
# Namespace config defines how to expand "viewer" on a doc:
# viewer = owner | editor | (parent->viewer)
# This means: viewer of doc = union of owners, editors,
# and anyone who is viewer of parent folder
The check algorithm is a graph traversal: expand the permission expression recursively, checking tuples until you find one matching the user or exhaust all paths. Zanzibar uses a distributed cache (called “leopard”) and zookies (consistency tokens based on Spanner timestamps) to avoid serving stale results from cache after ACL writes.
Scale Considerations
Read path: access checks are extremely read-heavy. Target p99 < 5ms for cached checks, p99 < 20ms for cache misses. Use Redis cluster with read replicas. Pre-warm cache on login by loading all user roles.
Write path: role assignments are infrequent (admin operations). Write to primary DB, invalidate cache synchronously, audit log asynchronously via queue. Writes can tolerate 100-200ms latency.
Hot users: a superadmin user’s cache entry gets evicted and re-fetched frequently. Consider a local in-process cache (LRU, 1000 entries, 30s TTL) in front of Redis to absorb spikes.
Scale numbers for interviews: 10k RPS of access checks, 99% cache hit rate, 100 cache nodes each handling 10k ops/sec = comfortable headroom. DB only sees 100 RPS on cache misses.
Stripe system design interviews cover authorization and access control. See design patterns for Stripe interview: authorization and permissions system design.
Atlassian products require complex permission systems. See system design questions for Atlassian interview: permissions and access control system design.
LinkedIn system design covers enterprise access control and RBAC. See patterns for LinkedIn interview: RBAC and enterprise access control design.