Access Control System Low-Level Design

Core Models: ACL, RBAC, ABAC, ReBAC

Access control models define how permissions are assigned and evaluated. Each has different tradeoffs in expressiveness, scalability, and operational complexity.

ACL (Access Control List): each resource stores a list of (principal, permission) pairs. Simple, but doesn’t scale – updating a user’s access requires touching every resource they have access to. Suitable for file systems with small user counts (Unix permissions).
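Here is a minimal in-memory sketch of the ACL model. It assumes string identifiers, and the names acl_grant, acl_check, and acl_revoke_everywhere are illustrative, not taken from any library. Revoking a user requires a scan of every resource's list, which is the scaling problem described above.

```python
from collections import defaultdict

# resource -> set of (principal, permission) pairs stored on that resource
acl: dict[str, set[tuple[str, str]]] = defaultdict(set)

def acl_grant(resource: str, principal: str, permission: str) -> None:
    acl[resource].add((principal, permission))

def acl_check(resource: str, principal: str, permission: str) -> bool:
    # Deny by default: only an explicit (principal, permission) entry allows.
    return (principal, permission) in acl.get(resource, set())

def acl_revoke_everywhere(principal: str) -> int:
    """Remove a principal from every list; cost grows with the number of resources."""
    touched = 0
    for entries in acl.values():          # must visit every resource
        stale = {e for e in entries if e[0] == principal}
        if stale:
            entries -= stale              # in-place set difference
            touched += 1
    return touched
```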

RBAC (Role-Based Access Control): users are assigned roles, roles have permissions. Adding a user to a role grants all that role’s permissions. Most enterprise systems use RBAC. Tradeoff: role explosion when you need fine-grained resource-level control (“editor of project X” vs “editor of project Y”).

ABAC (Attribute-Based Access Control): policies are boolean expressions over attributes of the user, resource, and environment. Example: allow if user.department == resource.department AND user.clearance >= resource.classification AND 9 <= time.hour < 17 (business hours). Very expressive, but policies become hard to audit and debug.
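The example policy can be sketched as a plain Python predicate. The Subject and Resource types are hypothetical. The hour window is read as 9 <= hour < 17 (09:00 to 16:59), which is one reasonable interpretation of business hours.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Subject:
    department: str
    clearance: int

@dataclass(frozen=True)
class Resource:
    department: str
    classification: int

def abac_allow(user: Subject, resource: Resource, now: datetime) -> bool:
    """Every attribute condition must hold (AND of user, resource, environment)."""
    return (
        user.department == resource.department
        and user.clearance >= resource.classification
        and 9 <= now.hour < 17          # environment attribute: time of day
    )
```

In a real ABAC engine the policy would usually live in data or a policy language rather than code. That separation is what makes auditing possible, and it is also where the debugging difficulty comes from.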

ReBAC (Relationship-Based Access Control): permissions are derived from the graph of relationships between entities. “User can view document if user is member of a group that has viewer on the document’s parent folder.” Google Zanzibar is the canonical implementation. Handles complex inheritance naturally.

Database Schema for RBAC

-- Core entities
CREATE TABLE users (
    id          BIGINT PRIMARY KEY,
    email       VARCHAR(255) UNIQUE NOT NULL,
    created_at  TIMESTAMP DEFAULT NOW()
);

CREATE TABLE roles (
    id          BIGINT PRIMARY KEY,
    name        VARCHAR(100) UNIQUE NOT NULL,  -- 'admin', 'editor', 'viewer'
    description TEXT
);

CREATE TABLE permissions (
    id          BIGINT PRIMARY KEY,
    resource    VARCHAR(100) NOT NULL,  -- 'post', 'user', 'billing'
    action      VARCHAR(50)  NOT NULL,  -- 'create', 'read', 'update', 'delete'
    UNIQUE(resource, action)
);

-- Junction tables
CREATE TABLE user_roles (
    user_id     BIGINT REFERENCES users(id) ON DELETE CASCADE,
    role_id     BIGINT REFERENCES roles(id) ON DELETE CASCADE,
    granted_by  BIGINT REFERENCES users(id),
    granted_at  TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (user_id, role_id)
);

CREATE TABLE role_permissions (
    role_id        BIGINT REFERENCES roles(id) ON DELETE CASCADE,
    permission_id  BIGINT REFERENCES permissions(id) ON DELETE CASCADE,
    PRIMARY KEY (role_id, permission_id)
);

-- Resource-scoped roles (editor of specific project)
CREATE TABLE user_resource_roles (
    user_id       BIGINT REFERENCES users(id) ON DELETE CASCADE,
    role_id       BIGINT REFERENCES roles(id) ON DELETE CASCADE,
    resource_type VARCHAR(100) NOT NULL,
    resource_id   BIGINT NOT NULL,
    granted_at    TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (user_id, role_id, resource_type, resource_id)
);

CREATE INDEX idx_user_resource_roles_lookup
    ON user_resource_roles(user_id, resource_type, resource_id);

Access Check Algorithm

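# Assumes `db.query(sql, params)` is a DB-API-style helper (e.g. wrapping psycopg2,
# which uses %s placeholders) that returns a possibly empty list of rows.
def can_access(user_id, resource_type, resource_id, action):
    # 1. Global roles: a role granting (resource_type, action) applies to every resource
    global_perms = db.query("""
        SELECT 1 FROM user_roles ur
        JOIN role_permissions rp ON ur.role_id = rp.role_id
        JOIN permissions p ON rp.permission_id = p.id
        WHERE ur.user_id = %s
          AND p.resource = %s
          AND p.action = %s
        LIMIT 1
    """, [user_id, resource_type, action])

    if global_perms:
        return True

    # 2. Resource-scoped roles: the role must be granted on this exact resource
    #    AND must include the permission for this resource type and action
    #    (without the p.resource filter, a scoped role holding e.g.
    #    ('billing', 'delete') would wrongly allow 'delete' on this resource)
    scoped_perms = db.query("""
        SELECT 1 FROM user_resource_roles urr
        JOIN role_permissions rp ON urr.role_id = rp.role_id
        JOIN permissions p ON rp.permission_id = p.id
        WHERE urr.user_id = %s
          AND urr.resource_type = %s
          AND urr.resource_id = %s
          AND p.resource = %s
          AND p.action = %s
        LIMIT 1
    """, [user_id, resource_type, resource_id, resource_type, action])

    # 3. Deny by default: no matching grant means no access
    return bool(scoped_perms)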

Caching Strategy

Access checks are on the hot path. A naive implementation hits the database on every request.

Cache key structure: acl:{user_id}:{resource_type}:{resource_id}:{action} -> boolean, TTL 5 minutes.

For bulk prefetch (a page load that checks 20+ permissions): cache acl:user:{user_id}:roles as the set of (role_id, resource_type, resource_id) tuples, TTL 5 minutes. Derive individual permission checks locally from this set plus the role-to-permission mapping, which is small and can be cached once for all users.
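Here is a sketch of answering many checks from one cached snapshot. It assumes the role-to-permission mapping is also held in memory and that global roles are stored in the tuple set with a None scope. Both that representation and the build_checker name are illustrative.

```python
from typing import Callable, Optional

# One cached grant: (role_id, scope_type, scope_id); global roles use (role_id, None, None).
RoleGrant = tuple[int, Optional[str], Optional[int]]

def build_checker(role_grants: set[RoleGrant],
                  role_permissions: dict[int, set[tuple[str, str]]]
                  ) -> Callable[[str, int, str], bool]:
    """Return a function that answers checks locally, with no further I/O."""
    def can_access(resource_type: str, resource_id: int, action: str) -> bool:
        for role_id, scope_type, scope_id in role_grants:
            if (resource_type, action) not in role_permissions.get(role_id, set()):
                continue  # this role lacks the permission entirely
            is_global = scope_type is None
            is_this_resource = scope_type == resource_type and scope_id == resource_id
            if is_global or is_this_resource:
                return True
        return False  # deny by default
    return can_access
```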

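Invalidation on role change: when a user’s roles change, delete acl:user:{user_id}:roles and every key matching acl:{user_id}:* from Redis. Use Redis SCAN with a MATCH pattern rather than KEYS, which blocks the server while it walks the entire keyspace. In Redis Cluster, SCAN only covers the node it runs on, so it has to be run against every primary. Alternatively, version the cache: keep a version counter per user, include it in every cache key, and increment it on role change. Old entries then become unreachable and age out via their TTL.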
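Here is a sketch of the versioning alternative, using plain dicts as a stand-in for Redis. With Redis, the counter would be a per-user key bumped with INCR. The class and key layout are hypothetical. Denied results are cached the same way as allowed ones.

```python
import time
from typing import Callable

class VersionedPermissionCache:
    """In-memory stand-in for Redis illustrating per-user cache versioning."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._versions: dict[int, int] = {}
        self._entries: dict[str, tuple[bool, float]] = {}  # key -> (decision, expires_at)

    def _key(self, user_id: int, resource_type: str, resource_id: int, action: str) -> str:
        version = self._versions.get(user_id, 0)
        return f"acl:{user_id}:v{version}:{resource_type}:{resource_id}:{action}"

    def check(self, user_id: int, resource_type: str, resource_id: int, action: str,
              compute: Callable[[], bool]) -> bool:
        key = self._key(user_id, resource_type, resource_id, action)
        hit = self._entries.get(key)
        if hit is not None and hit[1] > time.monotonic():
            return hit[0]  # cached allow or deny (negative caching)
        decision = compute()  # e.g. the database-backed check
        self._entries[key] = (decision, time.monotonic() + self.ttl)
        return decision

    def invalidate_user(self, user_id: int) -> None:
        # One O(1) write: keys with the old version are never read again.
        # (In Redis they would also expire via their TTL; this dict keeps them.)
        self._versions[user_id] = self._versions.get(user_id, 0) + 1
```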

Negative caching: cache denied results too. Without negative caching, a user probing non-existent resources can flood the database. Use the same TTL.

Consistency tradeoff: 5-minute TTL means a revoked permission can still be exercised for up to 5 minutes. For sensitive actions (billing, delete), bypass cache or use a shorter TTL (30s). Document this explicitly in your design.

Permission Inheritance

Three models for hierarchical resources (folders containing files):

Explicit-only: permissions are not inherited. Simple, auditable, but requires setting permissions on every resource. Suitable for flat structures.

Downward inheritance: permissions flow from parent to children. If you have “edit” on a folder, you have “edit” on all files inside. Recursive check: walk up the resource tree until you find an ACL entry or reach the root.

Override model (Google Drive behavior): inherited permissions can be overridden at any level. A folder shared with “anyone can view” can contain a subfolder that is “only owner.” Implementation: store an explicit “deny” entry, or store a break-inheritance flag that stops traversal.
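Here is a sketch of the upward walk for the override model. It assumes a tree stored as a parent map, per-node explicit allow/deny entries, and a set of nodes that break inheritance; all three are hypothetical in-memory structures. The sketch uses a "nearest explicit entry wins" rule. That rule is a design choice made here, not a description of how Google Drive resolves conflicts.

```python
from typing import Optional

# Hypothetical in-memory stores (a real system would read these from the DB).
parent: dict[str, Optional[str]] = {}                 # node -> parent (None at root)
entries: dict[str, dict[tuple[str, str], str]] = {}   # node -> {(user, perm): "allow" | "deny"}
breaks_inheritance: set[str] = set()                  # nodes that stop upward traversal

def inherited_check(user: str, node: str, perm: str) -> bool:
    current: Optional[str] = node
    while current is not None:
        decision = entries.get(current, {}).get((user, perm))
        if decision is not None:
            return decision == "allow"   # nearest explicit entry wins
        if current in breaks_inheritance:
            return False                 # ancestors' entries do not apply here
        current = parent.get(current)    # move one level up
    return False                         # no entry up to the root: deny by default
```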

For deeply nested hierarchies, precompute effective permissions using a materialized permission table, updated asynchronously when ACLs change.
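Here is a sketch of the precompute step. It assumes a tree stored as a parent map, per-node explicit allow/deny entries, and a set of nodes that break inheritance; these structures are hypothetical. A single top-down pass yields each node's effective (user, permission) set, and each pair would become one row in the materialized table. A production job would typically recompute only the subtree under the changed node.

```python
from collections import defaultdict, deque
from typing import Optional

Grant = tuple[str, str]  # (user, permission)

def materialize(parent: dict[str, Optional[str]],
                entries: dict[str, dict[Grant, str]],
                breaks_inheritance: set[str]) -> dict[str, frozenset[Grant]]:
    """Top-down pass computing the allowed (user, permission) set for every node."""
    children = defaultdict(list)
    roots = []
    for node, p in parent.items():
        if p is None:
            roots.append(node)
        else:
            children[p].append(node)

    effective: dict[str, frozenset[Grant]] = {}
    queue = deque((root, frozenset()) for root in roots)
    while queue:
        node, inherited = queue.popleft()
        # A break-inheritance node starts from nothing instead of its parent's set.
        allowed = set() if node in breaks_inheritance else set(inherited)
        for grant, decision in entries.get(node, {}).items():
            if decision == "allow":
                allowed.add(grant)
            else:
                allowed.discard(grant)   # explicit deny overrides inheritance
        effective[node] = frozenset(allowed)
        queue.extend((child, effective[node]) for child in children[node])
    return effective
```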

Audit Logging

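CREATE TABLE access_logs (
    id            BIGSERIAL,           -- use sequence, not UUID, for clustering
    user_id       BIGINT NOT NULL,
    resource_type VARCHAR(100),
    resource_id   BIGINT,
    action        VARCHAR(50),
    decision      BOOLEAN NOT NULL,    -- true=allow, false=deny
    reason        VARCHAR(255),        -- 'role:admin', 'resource_role:editor:42'
    ip_address    INET,
    user_agent    TEXT,
    created_at    TIMESTAMP NOT NULL DEFAULT NOW(),
    PRIMARY KEY (id, created_at)       -- PostgreSQL requires the partition key in the PK
) PARTITION BY RANGE (created_at);  -- monthly partitions

-- Each month gets its own partition, e.g.:
CREATE TABLE access_logs_2025_01 PARTITION OF access_logs
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');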

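# Async write pattern
import json

def log_access(event):
    # `queue` is the application's message-queue client (placeholder). Publish and
    # return immediately so the request never waits on the audit database.
    queue.publish('access-log-queue', json.dumps(event, default=str))
    # Consumer writes to DB in batches of 500, every 1 second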
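Here is a sketch of the consumer-side batching described in the comment above: flush at 500 events or after 1 second, whichever comes first. The BatchWriter name and the injected write_batch callable are hypothetical. In practice write_batch would issue one multi-row INSERT. Queue messages should be acknowledged only after that INSERT succeeds, so a crash does not drop audit events.

```python
import time
from typing import Callable

class BatchWriter:
    """Buffer audit events; flush when max_batch accumulate or max_wait_s elapses."""

    def __init__(self, write_batch: Callable[[list[dict]], None],
                 max_batch: int = 500, max_wait_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.write_batch = write_batch
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.clock = clock
        self.buffer: list[dict] = []
        self.last_flush = clock()

    def add(self, event: dict) -> None:
        self.buffer.append(event)
        if len(self.buffer) >= self.max_batch:
            self.flush()                 # size bound reached

    def tick(self) -> None:
        """Call periodically from the consumer loop to enforce the time bound."""
        if self.buffer and self.clock() - self.last_flush >= self.max_wait_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.write_batch(self.buffer)  # e.g. one multi-row INSERT
            self.buffer = []
        self.last_flush = self.clock()
```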

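Audit logs must be append-only. Use an INSERT-only table: grant the application's database user INSERT but not UPDATE or DELETE. Retention policy: archive to cold storage after 90 days, delete after 7 years (compliance). With monthly partitions, both steps operate on whole partitions (detach, export, drop) instead of running row-by-row DELETEs.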

API Design

# Role management
POST   /roles                          # create role
GET    /roles/{role_id}               # get role details
PUT    /roles/{role_id}/permissions   # set permissions on role

# User role assignment
POST   /users/{user_id}/roles         # assign global role
DELETE /users/{user_id}/roles/{role_id}

# Resource-scoped roles
POST   /resources/{type}/{id}/roles   # assign role on specific resource
GET    /resources/{type}/{id}/roles   # list roles on resource

# The critical check endpoint
POST   /access/check
# Request:  {"user_id": 42, "resource_type": "post", "resource_id": 7, "action": "update"}
# Response: {"allowed": true, "reason": "role:editor", "cached": true}

The /access/check endpoint should support bulk checks in one request to reduce round trips. Accept up to 100 checks per request and return an array of results in the same order.
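Here is a sketch of the bulk variant. It assumes the request body is a JSON array of the single-check objects shown above and that can_access is passed in as a function; the names are hypothetical. Results come back in request order. The reason and cached fields are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable

MAX_CHECKS = 100

@dataclass(frozen=True)
class CheckRequest:
    user_id: int
    resource_type: str
    resource_id: int
    action: str

def bulk_check(checks: list[dict],
               can_access: Callable[[int, str, int, str], bool]) -> list[dict]:
    if len(checks) > MAX_CHECKS:
        raise ValueError(f"at most {MAX_CHECKS} checks per request")
    results = []
    for raw in checks:
        req = CheckRequest(**raw)  # TypeError on missing or unknown fields
        allowed = can_access(req.user_id, req.resource_type, req.resource_id, req.action)
        results.append({"allowed": allowed})
    return results
```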

Google Zanzibar / ReBAC Pattern

Zanzibar stores relation tuples: (object#relation@user). Examples:

doc:readme#viewer@user:alice
doc:readme#owner@user:bob
folder:eng#viewer@user:alice
doc:readme#parent@folder:eng   # doc is in folder

# Namespace config defines how to expand "viewer" on a doc:
# viewer = owner | editor | (parent->viewer)
# This means: viewer of doc = union of owners, editors,
# and anyone who is viewer of parent folder

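The check algorithm is a graph traversal: expand the permission expression recursively, following tuples until you find a path that reaches the user or exhaust all paths. Zanzibar caches intermediate check results across servers and uses a specialized index called Leopard to flatten deeply nested group memberships. It also issues zookies, opaque consistency tokens that encode a Spanner timestamp. A zookie ensures a check is evaluated at a snapshot at least as fresh as the client's last ACL or content write, so cached results never ignore a recent ACL change.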
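Here is a toy version of this traversal, assuming an in-memory tuple set and a simplified namespace config. The rule encoding (this, computed, ttu) loosely mirrors Zanzibar's direct relations, computed usersets, and tuple-to-userset. It is not Zanzibar's configuration language, and parent tuples here point at a bare object (folder:eng) for simplicity. The viewer relation also checks direct viewer tuples, because the examples above store them. The group tuples are hypothetical and were added to show an indirect path.

```python
from typing import Optional

# Relation tuples (object, relation, subject).
TUPLES: set[tuple[str, str, str]] = {
    ("doc:readme", "viewer", "user:alice"),
    ("doc:readme", "owner", "user:bob"),
    ("folder:eng", "viewer", "user:alice"),
    ("doc:readme", "parent", "folder:eng"),
    ("folder:eng", "viewer", "group:eng#member"),   # hypothetical
    ("group:eng", "member", "user:carol"),          # hypothetical
}

# ("this",) = direct tuples, ("computed", r) = another relation on the same object,
# ("ttu", r1, r2) = follow r1 to a linked object and check r2 there.
REWRITES = {
    "doc": {
        "owner": [("this",)],
        "editor": [("this",)],
        "viewer": [("this",), ("computed", "owner"), ("computed", "editor"),
                   ("ttu", "parent", "viewer")],
    },
    "folder": {"viewer": [("this",)]},
    "group": {"member": [("this",)]},
}

def rebac_check(obj: str, relation: str, user: str, seen: Optional[set] = None) -> bool:
    # Only unions are modeled, so a check is graph reachability and one shared
    # visited set stops cycles without changing the answer.
    seen = set() if seen is None else seen
    if (obj, relation) in seen:
        return False
    seen.add((obj, relation))
    namespace = obj.split(":", 1)[0]
    for rule in REWRITES.get(namespace, {}).get(relation, []):
        if rule[0] == "this":
            for o, r, subject in TUPLES:
                if (o, r) != (obj, relation):
                    continue
                if subject == user:
                    return True
                if "#" in subject:  # userset such as group:eng#member
                    sub_obj, sub_rel = subject.split("#", 1)
                    if rebac_check(sub_obj, sub_rel, user, seen):
                        return True
        elif rule[0] == "computed":
            if rebac_check(obj, rule[1], user, seen):
                return True
        elif rule[0] == "ttu":
            _, via, target_rel = rule
            for o, r, linked in TUPLES:
                if (o, r) == (obj, via) and rebac_check(linked, target_rel, user, seen):
                    return True
    return False
```

Scanning every tuple at each step is only for clarity. A real store indexes tuples by (object, relation).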

Scale Considerations

Read path: access checks are extremely read-heavy. Target p99 < 5ms for cached checks, p99 < 20ms for cache misses. Use Redis cluster with read replicas. Pre-warm cache on login by loading all user roles.

Write path: role assignments are infrequent (admin operations). Write to primary DB, invalidate cache synchronously, audit log asynchronously via queue. Writes can tolerate 100-200ms latency.

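Hot users: a very active principal, such as a superadmin or a service account that many requests run as, concentrates traffic on a few Redis keys and therefore on one shard. Each TTL expiry also sends a burst of concurrent misses to the database. Consider a local in-process cache (LRU, 1000 entries, 30s TTL) in front of Redis to absorb spikes. Note that this adds up to 30 seconds of staleness to revocations on top of the Redis TTL.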
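Here is a sketch of the in-process layer using the parameters above (1000 entries, 30 s TTL). LocalTTLCache is a hypothetical name, and the loader would normally be a Redis lookup that falls back to the database. The sketch is not thread-safe and does not coalesce concurrent misses.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable

class LocalTTLCache:
    """Small in-process LRU cache with a per-entry TTL, placed in front of Redis."""

    def __init__(self, max_entries: int = 1000, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: OrderedDict = OrderedDict()  # key -> (value, expires_at)

    def get_or_load(self, key: Hashable, load: Callable[[], object]) -> object:
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            self._data.move_to_end(key)        # mark as most recently used
            return entry[0]
        value = load()                         # miss or expired: go to Redis/DB
        self._data[key] = (value, now + self.ttl)
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)     # evict least recently used
        return value
```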

Scale numbers for interviews: 10k RPS of access checks at a 99% cache hit rate leaves 10,000 × 0.01 = 100 RPS of cache misses for the DB. On the cache side, 100 nodes handling 10k ops/sec each give 1M ops/sec of capacity against 10k RPS of load, i.e. 100× headroom.

