Multi-Tenancy System Low-Level Design – Tech Interview Dot Org

What is Multi-Tenancy?

Multi-tenancy means a single application instance serves multiple customers (tenants), with each tenant’s data isolated from others. SaaS products like Salesforce, Slack, and Atlassian are multi-tenant. The key design question: how do you isolate tenant data — at the database level, schema level, or row level? Each approach makes different trade-offs between isolation, cost, and operational complexity.

Three Isolation Models

Database-per-tenant: each tenant gets a dedicated database. Strongest isolation, simple queries (no tenant_id filter needed), easy to migrate/backup individual tenants. Con: expensive at scale (1000 tenants = 1000 DBs), complex provisioning, hard to run cross-tenant analytics.

Schema-per-tenant: each tenant gets a dedicated schema within the same database instance (PostgreSQL schemas). Good isolation, shared DB infrastructure. Con: schema migrations must run for every tenant, harder to manage at thousands of tenants.

Shared database, row-level isolation: all tenants share tables; every row has a tenant_id foreign key. Cheapest, easiest to operate. Con: risk of data leakage if tenant_id filters are missed, query performance degrades without proper indexing, noisy neighbor problem (one tenant’s heavy queries affect others).

Row-Level Isolation (Most Common for SaaS)

-- Every table has tenant_id
CREATE TABLE projects (
    project_id UUID PRIMARY KEY,
    tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),
    name VARCHAR,
    created_at TIMESTAMP
);

-- Composite index for tenant queries
CREATE INDEX idx_projects_tenant ON projects(tenant_id, created_at DESC);

-- Row Level Security (PostgreSQL RLS)
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON projects
    USING (tenant_id = current_setting('app.current_tenant_id')::UUID);

PostgreSQL Row Level Security (RLS) enforces isolation at the DB level — even if application code forgets the WHERE tenant_id filter, the policy blocks access. Set the tenant context at query time:

def execute_query(query, tenant_id, params):
    with db.connection() as conn:
        conn.execute("SET LOCAL app.current_tenant_id = %s", (str(tenant_id),))
        return conn.execute(query, params)

Tenant Routing

Identify the tenant from the request. Common methods:

# 1. Subdomain: acme.app.example.com → tenant=acme
tenant = request.host.split('.')[0]

# 2. Custom domain: acme.com → look up in TenantDomain table
tenant = TenantDomain.get(domain=request.host).tenant

# 3. JWT claim: token contains tenant_id
tenant = jwt.decode(token)['tenant_id']

# 4. API key: each key is scoped to a tenant
tenant = ApiKey.get(key=request.headers['X-API-Key']).tenant

Data Model for Tenants

Tenant(tenant_id UUID, name VARCHAR, slug VARCHAR UNIQUE,
       plan ENUM(FREE, STARTER, PROFESSIONAL, ENTERPRISE),
       status ENUM(ACTIVE, SUSPENDED, CANCELLED),
       max_users INT, max_storage_gb INT,
       created_at, billing_customer_id VARCHAR)

TenantDomain(domain VARCHAR PRIMARY KEY, tenant_id UUID)
-- for custom domain support

TenantUser(tenant_id UUID, user_id UUID, role ENUM(OWNER, ADMIN, MEMBER),
           invited_at, joined_at,
           PRIMARY KEY (tenant_id, user_id))

Noisy Neighbor Mitigation

In shared infrastructure, one tenant’s heavy usage degrades others. Mitigations: (1) Per-tenant rate limits on API endpoints. (2) Query timeouts: kill queries running longer than N seconds. (3) Resource quotas: max_storage_gb, max_api_calls_per_day enforced at the application layer. (4) Connection pooling per tenant: PgBouncer pool_mode=transaction with per-tenant limits. (5) For enterprise tenants on paid plans: dedicated DB replicas for read-heavy workloads.

Key Design Decisions

PostgreSQL RLS for row-level isolation — DB-enforced, fails safe if application code omits tenant filter
Composite index (tenant_id, …) on all tenant tables — required for query performance; full table scans without it
Tenant context from request, not from user — users can belong to multiple tenants; the request determines which tenant is active
Plan-based resource limits — prevent noisy neighbor without operational complexity of separate DBs
Slug for subdomain routing — human-readable, URL-safe, must be globally unique across all tenants

{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”What are the three approaches to multi-tenant data isolation?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”(1) Database-per-tenant: each tenant gets a dedicated database. Strongest isolation, easy backup per tenant, but expensive at scale and complex to provision. (2) Schema-per-tenant: each tenant gets a PostgreSQL schema in a shared instance. Good isolation, shared infrastructure, but schema migrations must run for every tenant. (3) Shared database with row-level isolation: all tenants share tables with a tenant_id column. Cheapest, easiest to operate, but risks data leakage if filters are missed. Most SaaS products use approach 3 with PostgreSQL Row Level Security.”}},{“@type”:”Question”,”name”:”How does PostgreSQL Row Level Security (RLS) enforce tenant isolation?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Enable RLS on each table (ALTER TABLE t ENABLE ROW LEVEL SECURITY), then create a policy: CREATE POLICY tenant_isolation ON t USING (tenant_id = current_setting("app.current_tenant_id")::UUID). At query time, set the tenant context: SET LOCAL app.current_tenant_id = ‘tenant-uuid’. Every query on that table will automatically filter to only the current tenant’s rows — even if the application code omits the WHERE clause. This is a fail-safe: data leakage requires both the app and the DB policy to fail simultaneously.”}},{“@type”:”Question”,”name”:”How do you route requests to the correct tenant?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Common methods: (1) Subdomain routing — acme.app.example.com, extract "acme" from the host header and look up the tenant. (2) Custom domain — acme.com maps to a tenant via a TenantDomain table. (3) JWT claim — the auth token contains tenant_id, validated and extracted by middleware. (4) API key scoping — each API key is associated with a tenant_id. Middleware extracts the tenant from the request and sets it in the request context before any handler runs.”}},{“@type”:”Question”,”name”:”How do you prevent one tenant from degrading performance for others (noisy neighbor)?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Implement per-tenant resource limits: (1) API rate limiting per tenant_id (e.g., 1000 req/min for free tier, 10K for paid). (2) DB query timeouts — kill queries running longer than 30 seconds. (3) Storage quotas enforced at application level. (4) PgBouncer connection limits per tenant — prevents one tenant from consuming all DB connections. (5) For enterprise customers on high-volume plans, provide dedicated DB read replicas to isolate their read workloads.”}},{“@type”:”Question”,”name”:”How do you handle tenant-specific schema customizations?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Use a JSON/JSONB column for custom fields rather than adding columns per tenant: ALTER TABLE records ADD COLUMN custom_fields JSONB. Tenants can store arbitrary key-value data here. For indexing custom fields: CREATE INDEX ON records USING GIN(custom_fields) for containment queries. For tenants needing heavy custom field queries, allow them to create expression indexes on specific paths. This avoids schema migrations for each tenant’s customization while maintaining query performance for common cases.”}}]}

Multi-tenant SaaS architecture is a common topic in Atlassian system design interview questions.

Multi-tenancy and tenant isolation design is discussed in Shopify system design interview guide.

Multi-tenant data isolation and resource quotas are covered in Databricks system design interview preparation.