Core Models: ACL, RBAC, ABAC, ReBAC
Access control models define how permissions are assigned and evaluated. Each has different tradeoffs in expressiveness, scalability, and operational complexity.
ACL (Access Control List): each resource stores a list of (principal, permission) pairs. Simple, but doesn’t scale – updating a user’s access requires touching every resource they have access to. Suitable for file systems with small user counts (Unix permissions).
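As a rough illustration of the ACL tradeoff, here is a minimal in-memory sketch. The `acls` dictionary and the function names are hypothetical, not taken from any real system. A check is one lookup on the resource, but removing a user means visiting every resource.

```python
from collections import defaultdict

# Hypothetical in-memory ACL store: resource -> set of (principal, permission).
acls: dict[str, set[tuple[str, str]]] = defaultdict(set)


def grant(resource: str, principal: str, permission: str) -> None:
    acls[resource].add((principal, permission))


def is_allowed(resource: str, principal: str, permission: str) -> bool:
    # A check is a single lookup on the resource's own list.
    return (principal, permission) in acls.get(resource, set())


def revoke_everywhere(principal: str) -> int:
    """Remove a principal from every ACL; cost grows with the number of resources."""
    touched = 0
    for resource, entries in list(acls.items()):
        remaining = {entry for entry in entries if entry[0] != principal}
        if remaining != entries:
            acls[resource] = remaining
            touched += 1
    return touched
```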
RBAC (Role-Based Access Control): users are assigned roles, roles have permissions. Adding a user to a role grants all that role’s permissions. Most enterprise systems use RBAC. Tradeoff: role explosion when you need fine-grained resource-level control (“editor of project X” vs “editor of project Y”).
ABAC (Attribute-Based Access Control): policies are boolean expressions over attributes of the user, resource, and environment. Example: allow if user.department == resource.department AND user.clearance >= resource.classification AND 9 <= time.hour < 17. Very expressive, but policies become hard to audit and debug because the answer to "who can access this?" depends on attribute values at evaluation time.
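The example policy can be sketched as a plain function. The `User` and `Resource` classes are hypothetical. Reading the time condition as business hours from 09:00 up to (not including) 17:00 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    department: str
    clearance: int


@dataclass(frozen=True)
class Resource:
    department: str
    classification: int


def abac_allow(user: User, resource: Resource, hour: int) -> bool:
    """Evaluate the example policy; `hour` is the local hour of day (0-23)."""
    same_department = user.department == resource.department
    cleared = user.clearance >= resource.classification
    business_hours = 9 <= hour < 17  # assumption: 09:00 inclusive, 17:00 exclusive
    return same_department and cleared and business_hours
```

Every input to the decision is an attribute value, so an audit has to capture those values at decision time, not just the policy text.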
ReBAC (Relationship-Based Access Control): permissions are derived from the graph of relationships between entities. “User can view document if user is member of a group that has viewer on the document’s parent folder.” Google Zanzibar is the canonical implementation. Handles complex inheritance naturally.
Database Schema for RBAC
-- Core entities
CREATE TABLE users (
    id BIGINT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE roles (
    id BIGINT PRIMARY KEY,
    name VARCHAR(100) UNIQUE NOT NULL, -- 'admin', 'editor', 'viewer'
    description TEXT
);

CREATE TABLE permissions (
    id BIGINT PRIMARY KEY,
    resource VARCHAR(100) NOT NULL, -- 'post', 'user', 'billing'
    action VARCHAR(50) NOT NULL, -- 'create', 'read', 'update', 'delete'
    UNIQUE(resource, action)
);

-- Junction tables
CREATE TABLE user_roles (
    user_id BIGINT REFERENCES users(id) ON DELETE CASCADE,
    role_id BIGINT REFERENCES roles(id) ON DELETE CASCADE,
    granted_by BIGINT REFERENCES users(id),
    granted_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (user_id, role_id)
);

CREATE TABLE role_permissions (
    role_id BIGINT REFERENCES roles(id) ON DELETE CASCADE,
    permission_id BIGINT REFERENCES permissions(id) ON DELETE CASCADE,
    PRIMARY KEY (role_id, permission_id)
);

-- Resource-scoped roles (editor of specific project)
CREATE TABLE user_resource_roles (
    user_id BIGINT REFERENCES users(id) ON DELETE CASCADE,
    role_id BIGINT REFERENCES roles(id) ON DELETE CASCADE,
    resource_type VARCHAR(100) NOT NULL,
    resource_id BIGINT NOT NULL,
    granted_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (user_id, role_id, resource_type, resource_id)
);

CREATE INDEX idx_user_resource_roles_lookup
    ON user_resource_roles(user_id, resource_type, resource_id);
Access Check Algorithm
def can_access(user_id, resource_type, resource_id, action):
    # `db` is assumed to be a database client whose query() returns a list of rows.
    # Check global roles first
    global_perms = db.query("""
        SELECT 1 FROM user_roles ur
        JOIN role_permissions rp ON ur.role_id = rp.role_id
        JOIN permissions p ON rp.permission_id = p.id
        WHERE ur.user_id = %s
          AND p.resource = %s
          AND p.action = %s
        LIMIT 1
    """, [user_id, resource_type, action])
    if global_perms:
        return True

    # Check resource-scoped roles
    scoped_perms = db.query("""
        SELECT 1 FROM user_resource_roles urr
        JOIN role_permissions rp ON urr.role_id = rp.role_id
        JOIN permissions p ON rp.permission_id = p.id
        WHERE urr.user_id = %s
          AND urr.resource_type = %s
          AND urr.resource_id = %s
          -- without this, a role's permission on another resource type would match
          AND p.resource = urr.resource_type
          AND p.action = %s
        LIMIT 1
    """, [user_id, resource_type, resource_id, action])
    # No matching row in either query means deny by default
    return bool(scoped_perms)
Caching Strategy
Access checks are on the hot path. A naive implementation hits the database on every request.
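Cache key structure: acl:user:{user_id}:check:{resource_type}:{resource_id}:{action} -> boolean, TTL 5 minutes. Every key for a user shares the acl:user:{user_id}: prefix, so a single pattern can match all of that user's entries.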
For bulk prefetch (page load checks 20+ permissions): cache acl:user:{user_id}:roles as the set of (role_id, resource_type, resource_id) tuples, TTL 5 minutes. Derive individual permission checks locally from this set.
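A minimal sketch of deriving checks locally from the prefetched set. It assumes global roles are stored with `None` for the resource fields, and that a small role-to-permission map (`ROLE_PERMISSIONS`, hypothetical) is also available in process.

```python
from typing import Optional

# Assumed shape of the cached value: (role_id, resource_type, resource_id);
# resource_type and resource_id are None for a global role.
RoleGrant = tuple[int, Optional[str], Optional[int]]

# Hypothetical role -> {(resource, action)} map, small enough to cache per process.
ROLE_PERMISSIONS: dict[int, set[tuple[str, str]]] = {
    1: {("post", "read")},                      # viewer
    2: {("post", "read"), ("post", "update")},  # editor
}


def check_local(grants: set[RoleGrant], resource_type: str,
                resource_id: int, action: str) -> bool:
    """Answer one check from a prefetched grant set, with no network call."""
    for role_id, scope_type, scope_id in grants:
        if (resource_type, action) not in ROLE_PERMISSIONS.get(role_id, set()):
            continue
        is_global = scope_type is None
        if is_global or (scope_type == resource_type and scope_id == resource_id):
            return True
    return False  # deny by default
```

With this shape, a page that needs 20 checks costs one cache read plus 20 in-memory lookups.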
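Invalidation on role change: when a user’s roles change, delete every key matching acl:user:{user_id}:* from Redis (all of that user’s keys must share this prefix for the pattern to match). Iterate with SCAN and a MATCH pattern rather than KEYS, which blocks the server while it walks the whole keyspace. Alternatively, version the cache: store a version counter per user and include it in the cache key. Incrementing the counter makes all old entries unreachable, and they expire through their TTL.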
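The versioned-key alternative might look like the sketch below. The key layout and the `VersionedKeys` class are illustrative assumptions. In a real deployment the counter would live in Redis, not in a Python dict.

```python
class VersionedKeys:
    """Builds cache keys that embed a per-user version counter."""

    def __init__(self) -> None:
        # Stand-in for a per-user counter stored in the cache itself.
        self._versions: dict[int, int] = {}

    def key(self, user_id: int, resource_type: str, resource_id: int, action: str) -> str:
        version = self._versions.get(user_id, 0)
        return f"acl:user:{user_id}:v{version}:{resource_type}:{resource_id}:{action}"

    def invalidate(self, user_id: int) -> None:
        # Bumping the version orphans every old key; they expire via TTL.
        self._versions[user_id] = self._versions.get(user_id, 0) + 1
```

Invalidation becomes a single counter increment with no key scan. The cost is one extra read of the counter on each check, and stale entries stay in memory until their TTL passes.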
Negative caching: cache denied results too. Without negative caching, a user probing non-existent resources can flood the database. Use the same TTL.
Consistency tradeoff: 5-minute TTL means a revoked permission can still be exercised for up to 5 minutes. For sensitive actions (billing, delete), bypass cache or use a shorter TTL (30s). Document this explicitly in your design.
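The caching rules above (TTL, negative caching, bypass for sensitive actions) can be combined in one wrapper. This is a sketch with a plain dict standing in for Redis. `CachedChecker` and `is_sensitive` are hypothetical names, and treating "billing" as a resource type and "delete" as an action is an assumption.

```python
import time
from typing import Callable

TTL_SECONDS = 300.0  # 5 minutes, as in the text

CheckFn = Callable[[int, str, int, str], bool]


def is_sensitive(resource_type: str, action: str) -> bool:
    # Assumption: "billing" is a resource type and "delete" is an action.
    return resource_type == "billing" or action == "delete"


class CachedChecker:
    def __init__(self, check: CheckFn, clock: Callable[[], float] = time.monotonic) -> None:
        self._check = check  # authoritative check, e.g. a database lookup
        self._clock = clock
        self._cache: dict[tuple, tuple[bool, float]] = {}  # key -> (decision, expiry)

    def allowed(self, user_id: int, resource_type: str, resource_id: int, action: str) -> bool:
        if is_sensitive(resource_type, action):
            # Sensitive checks always go to the source of truth.
            return self._check(user_id, resource_type, resource_id, action)
        key = (user_id, resource_type, resource_id, action)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]
        decision = self._check(user_id, resource_type, resource_id, action)
        # Denials are stored too (negative caching), with the same TTL.
        self._cache[key] = (decision, now + TTL_SECONDS)
        return decision
```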
Permission Inheritance
Three models for hierarchical resources (folders containing files):
Explicit-only: permissions are not inherited. Simple, auditable, but requires setting permissions on every resource. Suitable for flat structures.
Downward inheritance: permissions flow from parent to children. If you have “edit” on a folder, you have “edit” on all files inside. Recursive check: walk up the resource tree until you find an ACL entry or reach the root.
Override model (Google Drive behavior): inherited permissions can be overridden at any level. A folder shared with “anyone can view” can contain a subfolder that is “only owner.” Implementation: store an explicit “deny” entry, or store a break-inheritance flag that stops traversal.
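Downward inheritance with a break-inheritance flag can be sketched as a walk toward the root. The `Node` class and its fields are hypothetical. This variant uses the flag, not explicit deny entries.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    # Explicit grants on this node: user -> set of actions.
    grants: dict[str, set[str]] = field(default_factory=dict)
    # Override model: when True, nothing above this node is inherited.
    break_inheritance: bool = False


def has_permission(node: Optional[Node], user: str, action: str) -> bool:
    """Walk from the resource toward the root, stopping at a match or a break flag."""
    while node is not None:
        if action in node.grants.get(user, set()):
            return True
        if node.break_inheritance:
            return False
        node = node.parent
    return False  # reached the root without a grant
```

For example, if a shared root folder grants alice "view" and a subfolder sets `break_inheritance=True`, alice can view files directly under the root but not files inside that subfolder.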
For deeply nested hierarchies, precompute effective permissions using a materialized permission table, updated asynchronously when ACLs change.
Audit Logging
CREATE TABLE access_logs (
    id BIGINT NOT NULL, -- use sequence, not UUID for clustering
    user_id BIGINT NOT NULL,
    resource_type VARCHAR(100),
    resource_id BIGINT,
    action VARCHAR(50),
    decision BOOLEAN NOT NULL, -- true=allow, false=deny
    reason VARCHAR(255), -- 'role:admin', 'resource_role:editor:42'
    ip_address INET,
    user_agent TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    -- PostgreSQL requires the partition key to be part of the primary key
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at); -- monthly partitions
# Async write pattern
import json

def log_access(event):
    # `queue` is assumed to be a message-queue client exposing publish(topic, body)
    queue.publish('access-log-queue', json.dumps(event))
# Consumer writes to DB in batches of 500, every 1 second
Audit logs must be append-only. Use an INSERT-only table: grant the application user no UPDATE or DELETE privileges on it. Retention policy: archive to cold storage after 90 days, delete after 7 years (compliance).
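The consumer side of the async write pattern (batches of 500, or every 1 second) could be sketched as below. `BatchWriter` and the `write_batch` callback are hypothetical. In practice the callback would issue one multi-row INSERT, and a timer would call `maybe_flush` so that a quiet queue still drains.

```python
import time
from typing import Callable

BATCH_SIZE = 500
FLUSH_INTERVAL = 1.0  # seconds


class BatchWriter:
    """Buffers audit events and flushes on size or elapsed time."""

    def __init__(self, write_batch: Callable[[list[dict]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._write_batch = write_batch
        self._clock = clock
        self._buffer: list[dict] = []
        self._last_flush = clock()

    def add(self, event: dict) -> None:
        self._buffer.append(event)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        due = self._clock() - self._last_flush >= FLUSH_INTERVAL
        if len(self._buffer) >= BATCH_SIZE or (due and self._buffer):
            self._write_batch(self._buffer)
            self._buffer = []  # start a fresh list; the flushed one is handed off
            self._last_flush = self._clock()
```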
API Design
# Role management
POST /roles # create role
GET /roles/{role_id} # get role details
PUT /roles/{role_id}/permissions # set permissions on role
# User role assignment
POST /users/{user_id}/roles # assign global role
DELETE /users/{user_id}/roles/{role_id}
# Resource-scoped roles
POST /resources/{type}/{id}/roles # assign role on specific resource
GET /resources/{type}/{id}/roles # list roles on resource
# The critical check endpoint
POST /access/check
# Request: {"user_id": 42, "resource_type": "post", "resource_id": 7, "action": "edit"}
# Response: {"allowed": true, "reason": "role:editor", "cached": true}
The /access/check endpoint should support bulk checks in one request to reduce round trips. Batch up to 100 checks, return an array of results.
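A bulk variant of the check handler might look like this sketch. The request shape `{"checks": [...]}`, the response shape `{"results": [...]}`, and the `bulk_check` name are assumptions, not a defined API.

```python
from typing import Callable

MAX_BATCH = 100

# A single check returns (allowed, reason).
CheckFn = Callable[[int, str, int, str], tuple[bool, str]]


def bulk_check(payload: dict, check: CheckFn) -> dict:
    """Handle a hypothetical bulk request body: {"checks": [{...}, ...]}."""
    checks = payload.get("checks", [])
    if len(checks) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} checks per request")
    results = []
    for item in checks:
        allowed, reason = check(item["user_id"], item["resource_type"],
                                item["resource_id"], item["action"])
        results.append({"allowed": allowed, "reason": reason})
    return {"results": results}  # same order as the request
```

Returning results in request order lets the client match answers to questions by index without echoing each check back.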
Google Zanzibar / ReBAC Pattern
Zanzibar stores relation tuples: (object#relation@user). Examples:
doc:readme#viewer@user:alice
doc:readme#owner@user:bob
folder:eng#viewer@user:alice
doc:readme#parent@folder:eng # doc is in folder
# Namespace config defines how to expand "viewer" on a doc:
# viewer = owner | editor | (parent->viewer)
# This means: viewer of doc = union of owners, editors,
# and anyone who is viewer of parent folder
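The check algorithm is a graph traversal: expand the permission expression recursively, checking tuples until you find one matching the user or exhaust all paths. To keep this fast, Zanzibar caches results across its serving cluster and uses a specialized index called Leopard to flatten deeply nested group membership. Zookies (consistency tokens that encode Spanner timestamps) let a check be evaluated at a snapshot at least as fresh as a given ACL write, so stale cached results are not served after that write.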
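The traversal can be sketched in a few lines. This is a toy model, not Zanzibar's actual configuration language. The `REWRITES` table is a simplified stand-in for the namespace config, and the direct `doc:readme#viewer@user:alice` tuple is left out so that alice's access comes only from the parent folder.

```python
# Relation tuples as (object, relation, subject), mirroring object#relation@subject.
TUPLES = {
    ("doc:readme", "owner", "user:bob"),
    ("folder:eng", "viewer", "user:alice"),
    ("doc:readme", "parent", "folder:eng"),
}

# viewer = owner | editor | (parent->viewer), in a simplified encoding:
# ("computed", relation) or ("via", linking_relation, target_relation).
REWRITES = {
    "viewer": [("computed", "owner"), ("computed", "editor"), ("via", "parent", "viewer")],
}


def check(obj: str, relation: str, user: str, seen: frozenset = frozenset()) -> bool:
    if (obj, relation) in seen:  # guard against cycles in the graph
        return False
    seen = seen | {(obj, relation)}
    if (obj, relation, user) in TUPLES:  # direct tuple
        return True
    for rule in REWRITES.get(relation, []):
        if rule[0] == "computed":
            if check(obj, rule[1], user, seen):
                return True
        else:
            _, link, target = rule
            parents = [s for (o, r, s) in TUPLES if o == obj and r == link]
            if any(check(parent, target, user, seen) for parent in parents):
                return True
    return False  # all paths exhausted
```

Here `check("doc:readme", "viewer", "user:alice")` succeeds through the parent folder, and the same check for `user:bob` succeeds through ownership.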
Scale Considerations
Read path: access checks are extremely read-heavy. Target p99 < 5ms for cached checks, p99 < 20ms for cache misses. Use Redis cluster with read replicas. Pre-warm cache on login by loading all user roles.
Write path: role assignments are infrequent (admin operations). Write to primary DB, invalidate cache synchronously, audit log asynchronously via queue. Writes can tolerate 100-200ms latency.
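Hot users: a superadmin user’s cache entries are read on a large share of requests, so the Redis shard holding them can become a hotspot, and each expiry or eviction triggers a burst of re-fetches. Consider a local in-process cache (LRU, 1000 entries, 30s TTL) in front of Redis to absorb spikes.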
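One way to build the local in-process cache is an `OrderedDict` with a per-entry expiry time. The `LocalTTLCache` class is a hypothetical sketch and is not thread-safe. A multi-threaded server would need a lock around `get` and `put`.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable, Optional


class LocalTTLCache:
    """In-process LRU cache with a per-entry TTL, meant to sit in front of Redis."""

    def __init__(self, max_entries: int = 1000, ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl
        self._clock = clock
        self._data: OrderedDict[Hashable, tuple[bool, float]] = OrderedDict()

    def get(self, key: Hashable) -> Optional[bool]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at <= self._clock():
            del self._data[key]  # expired
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: bool) -> None:
        self._data[key] = (value, self._clock() + self._ttl)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict least recently used
```

A `None` result from `get` means "not cached", which is distinct from a cached denial (`False`), so negative results are preserved at this layer too.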
Scale numbers for interviews: 10k RPS of access checks at a 99% cache hit rate means the DB sees only the 1% of misses, about 100 RPS. 100 cache nodes each handling 10k ops/sec provide 1M ops/sec of capacity, roughly 100x the check load, which is comfortable headroom.
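The arithmetic behind these numbers is easy to reproduce and to rerun with different assumptions. The variable names are illustrative.

```python
total_rps = 10_000      # access checks per second
miss_percent = 1        # 99% cache hit rate
cache_nodes = 100
ops_per_node = 10_000   # ops/sec each node can serve

db_rps = total_rps * miss_percent // 100    # traffic that reaches the database
cache_capacity = cache_nodes * ops_per_node  # total cache ops/sec
headroom = cache_capacity // total_rps       # capacity relative to load

print(db_rps, cache_capacity, headroom)  # prints: 100 1000000 100
```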