What is the difference between RBAC, ABAC, and ReBAC?

RBAC (Role-Based): users get roles, roles get permissions. Access check = does user have a role with the required permission? Simple, auditable, most common for B2B SaaS. ABAC (Attribute-Based): permissions are policies evaluated against subject attributes, resource attributes, and environment. Example: "user.department == document.department AND time == business_hours". Flexible but complex and slower to evaluate. ReBAC (Relationship-Based): permissions derived from graph relationships between subjects and resources. "User can view doc if user is a member of the group that owns the folder containing the doc." Used by Google Zanzibar, Notion, Carta. Best for hierarchical or social-graph resources.

How do you cache permission checks efficiently?

Cache each user's effective permissions as a blob in Redis: key = perms:{user_id}, value = JSON list of {resource_type, action, resource_id} tuples. TTL = 5 minutes. Cache is built lazily on first access check and reused for all subsequent checks. Invalidation: when a role is granted or revoked, immediately DEL perms:{affected_user_id} from Redis. For bulk role changes (adding a permission to a role that many users have), query all users with that role and pipeline-DEL all their cache keys. Cache DENY results too with shorter TTL (60 seconds) to prevent DB hammering for unauthorized users.

How does permission inheritance work in hierarchical resources?

Three common models: Explicit-only - every permission assignment is explicit, nothing is inherited. Simple and auditable but verbose. Downward inheritance - permissions on a parent resource propagate to all children. If a user has "read" on a project, they can read all documents in it. Requires walking up the resource hierarchy on access checks (or pre-computing inherited permissions and caching). Override model - inherited by default, but child resources can have explicit permissions overriding the parent. Used by Google Drive: a file can be shared with specific people even if it lives in a private folder. Implementation: check explicit permissions first, then fall back to inherited from parent.

How do you prevent authorization bypass in an access control system?

Four principles: (1) Deny by default - if no matching permission is found, deny. Never allow unless explicitly permitted. (2) Check at every layer - do not rely on the UI to hide unauthorized actions; enforce in the API handler and in critical business logic. (3) Use the resource ID from the authenticated context, not from user input, when scoping access checks - prevents IDOR attacks. (4) Log all access decisions including denials - audit logs reveal probing attacks. For token-based systems: validate the token is not revoked and the permissions claim matches the database (do not trust JWT claims alone for high-sensitivity operations).

What is Google Zanzibar and how does it work?

Zanzibar is Google's global authorization system serving 10+ products (Drive, YouTube, Calendar). Core model: relation tuples (object, relation, user) - e.g., (doc:readme, owner, user:alice) or (folder:home, parent, doc:readme). Access check: evaluate a "check" expression by traversing the relation graph - can user:alice perform the "viewer" action on doc:readme? The system looks for any path from alice to the doc via the viewer relation, including through group membership and folder hierarchies. Key design: consistent snapshots called "zookies" prevent TOCTOU races, and the system achieves single-digit millisecond p99 latency at global scale through aggressive caching of intermediate results.

Access Control System Low-Level Design

⏱ 9 min read

Core Models: ACL, RBAC, ABAC, ReBAC

Access control models define how permissions are assigned and evaluated. Each has different tradeoffs in expressiveness, scalability, and operational complexity.

ACL (Access Control List): each resource stores a list of (principal, permission) pairs. Simple, but doesn’t scale – updating a user’s access requires touching every resource they have access to. Suitable for file systems with small user counts (Unix permissions).

RBAC (Role-Based Access Control): users are assigned roles, roles have permissions. Adding a user to a role grants all that role’s permissions. Most enterprise systems use RBAC. Tradeoff: role explosion when you need fine-grained resource-level control (“editor of project X” vs “editor of project Y”).

ABAC (Attribute-Based Access Control): policies are boolean expressions over attributes of the user, resource, and environment. Example: allow if user.department == resource.department AND user.clearance >= resource.classification AND time.hour in [9, 17]. Very expressive but policies become hard to audit and debug.

ReBAC (Relationship-Based Access Control): permissions are derived from the graph of relationships between entities. “User can view document if user is member of a group that has viewer on the document’s parent folder.” Google Zanzibar is the canonical implementation. Handles complex inheritance naturally.

Database Schema for RBAC

-- Core entities
CREATE TABLE users (
    id          BIGINT PRIMARY KEY,
    email       VARCHAR(255) UNIQUE NOT NULL,
    created_at  TIMESTAMP DEFAULT NOW()
);

CREATE TABLE roles (
    id          BIGINT PRIMARY KEY,
    name        VARCHAR(100) UNIQUE NOT NULL,  -- 'admin', 'editor', 'viewer'
    description TEXT
);

CREATE TABLE permissions (
    id          BIGINT PRIMARY KEY,
    resource    VARCHAR(100) NOT NULL,  -- 'post', 'user', 'billing'
    action      VARCHAR(50)  NOT NULL,  -- 'create', 'read', 'update', 'delete'
    UNIQUE(resource, action)
);

-- Junction tables
CREATE TABLE user_roles (
    user_id     BIGINT REFERENCES users(id) ON DELETE CASCADE,
    role_id     BIGINT REFERENCES roles(id) ON DELETE CASCADE,
    granted_by  BIGINT REFERENCES users(id),
    granted_at  TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (user_id, role_id)
);

CREATE TABLE role_permissions (
    role_id        BIGINT REFERENCES roles(id) ON DELETE CASCADE,
    permission_id  BIGINT REFERENCES permissions(id) ON DELETE CASCADE,
    PRIMARY KEY (role_id, permission_id)
);

-- Resource-scoped roles (editor of specific project)
CREATE TABLE user_resource_roles (
    user_id       BIGINT REFERENCES users(id) ON DELETE CASCADE,
    role_id       BIGINT REFERENCES roles(id) ON DELETE CASCADE,
    resource_type VARCHAR(100) NOT NULL,
    resource_id   BIGINT NOT NULL,
    granted_at    TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (user_id, role_id, resource_type, resource_id)
);

CREATE INDEX idx_user_resource_roles_lookup
    ON user_resource_roles(user_id, resource_type, resource_id);

Access Check Algorithm

def can_access(user_id, resource_type, resource_id, action):
    # Check global roles first
    global_perms = db.query("""
        SELECT 1 FROM user_roles ur
        JOIN role_permissions rp ON ur.role_id = rp.role_id
        JOIN permissions p ON rp.permission_id = p.id
        WHERE ur.user_id = %s
          AND p.resource = %s
          AND p.action = %s
        LIMIT 1
    """, [user_id, resource_type, action])

    if global_perms:
        return True

    # Check resource-scoped roles
    scoped_perms = db.query("""
        SELECT 1 FROM user_resource_roles urr
        JOIN role_permissions rp ON urr.role_id = rp.role_id
        JOIN permissions p ON rp.permission_id = p.id
        WHERE urr.user_id = %s
          AND urr.resource_type = %s
          AND urr.resource_id = %s
          AND p.action = %s
        LIMIT 1
    """, [user_id, resource_type, resource_id, action])

    return bool(scoped_perms)

Caching Strategy

Access checks are on the hot path. A naive implementation hits the database on every request.

Cache key structure: acl:{user_id}:{resource_type}:{resource_id}:{action} -> boolean, TTL 5 minutes.

For bulk prefetch (page load checks 20+ permissions): cache acl:user:{user_id}:roles as the set of (role_id, resource_type, resource_id) tuples, TTL 5 minutes. Derive individual permission checks locally from this set.

Invalidation on role change: when a user’s roles change, delete acl:user:{user_id}:* from Redis. Use Redis SCAN with pattern rather than KEYS to avoid blocking. Alternatively, version the cache: store a version counter per user, include it in cache key.

Negative caching: cache denied results too. Without negative caching, a user probing non-existent resources can flood the database. Use the same TTL.

Consistency tradeoff: 5-minute TTL means a revoked permission can still be exercised for up to 5 minutes. For sensitive actions (billing, delete), bypass cache or use a shorter TTL (30s). Document this explicitly in your design.

Permission Inheritance

Three models for hierarchical resources (folders containing files):

Explicit-only: permissions are not inherited. Simple, auditable, but requires setting permissions on every resource. Suitable for flat structures.

Downward inheritance: permissions flow from parent to children. If you have “edit” on a folder, you have “edit” on all files inside. Recursive check: walk up the resource tree until you find an ACL entry or reach the root.

Override model (Google Drive behavior): inherited permissions can be overridden at any level. A folder shared with “anyone can view” can contain a subfolder that is “only owner.” Implementation: store an explicit “deny” entry, or store a break-inheritance flag that stops traversal.

For deeply nested hierarchies, precompute effective permissions using a materialized permission table, updated asynchronously when ACLs change.

Audit Logging

CREATE TABLE access_logs (
    id            BIGINT PRIMARY KEY,  -- use sequence, not UUID for clustering
    user_id       BIGINT NOT NULL,
    resource_type VARCHAR(100),
    resource_id   BIGINT,
    action        VARCHAR(50),
    decision      BOOLEAN NOT NULL,    -- true=allow, false=deny
    reason        VARCHAR(255),        -- 'role:admin', 'resource_role:editor:42'
    ip_address    INET,
    user_agent    TEXT,
    created_at    TIMESTAMP NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);  -- monthly partitions

-- Async write pattern
def log_access(event):
    queue.publish('access-log-queue', json.dumps(event))
    # Consumer writes to DB in batches of 500, every 1 second

Audit logs must be append-only. Use INSERT-only table with no UPDATE/DELETE permissions for the application user. Retention policy: archive to cold storage after 90 days, delete after 7 years (compliance).

API Design

# Role management
POST   /roles                          # create role
GET    /roles/{role_id}               # get role details
PUT    /roles/{role_id}/permissions   # set permissions on role

# User role assignment
POST   /users/{user_id}/roles         # assign global role
DELETE /users/{user_id}/roles/{role_id}

# Resource-scoped roles
POST   /resources/{type}/{id}/roles   # assign role on specific resource
GET    /resources/{type}/{id}/roles   # list roles on resource

# The critical check endpoint
POST   /access/check
# Request:  {"user_id": 42, "resource_type": "post", "resource_id": 7, "action": "edit"}
# Response: {"allowed": true, "reason": "role:editor", "cached": true}

The /access/check endpoint should support bulk checks in one request to reduce round trips. Batch up to 100 checks, return an array of results.

Google Zanzibar / ReBAC Pattern

Zanzibar stores relation tuples: (object#relation@user). Examples:

doc:readme#viewer@user:alice
doc:readme#owner@user:bob
folder:eng#viewer@user:alice
doc:readme#parent@folder:eng   # doc is in folder

# Namespace config defines how to expand "viewer" on a doc:
# viewer = owner | editor | (parent->viewer)
# This means: viewer of doc = union of owners, editors,
# and anyone who is viewer of parent folder

The check algorithm is a graph traversal: expand the permission expression recursively, checking tuples until you find one matching the user or exhaust all paths. Zanzibar uses a distributed cache (called “leopard”) and zookies (consistency tokens based on Spanner timestamps) to avoid serving stale results from cache after ACL writes.

Scale Considerations

Read path: access checks are extremely read-heavy. Target p99 < 5ms for cached checks, p99 < 20ms for cache misses. Use Redis cluster with read replicas. Pre-warm cache on login by loading all user roles.

Write path: role assignments are infrequent (admin operations). Write to primary DB, invalidate cache synchronously, audit log asynchronously via queue. Writes can tolerate 100-200ms latency.

Hot users: a superadmin user’s cache entry gets evicted and re-fetched frequently. Consider a local in-process cache (LRU, 1000 entries, 30s TTL) in front of Redis to absorb spikes.

Scale numbers for interviews: 10k RPS of access checks, 99% cache hit rate, 100 cache nodes each handling 10k ops/sec = comfortable headroom. DB only sees 100 RPS on cache misses.

Stripe system design interviews cover authorization and access control. See design patterns for Stripe interview: authorization and permissions system design.

Atlassian products require complex permission systems. See system design questions for Atlassian interview: permissions and access control system design.

LinkedIn system design covers enterprise access control and RBAC. See patterns for LinkedIn interview: RBAC and enterprise access control design.