Tenant Isolation System Low-Level Design: Row-Level Security, Connection Pool Limits, and Noisy Neighbor Prevention

Tenant Isolation System: Low-Level Design

A multi-tenant SaaS application serves many organizations (tenants) from a shared infrastructure. Tenant isolation ensures that one tenant cannot access another tenant’s data, and that one tenant’s heavy usage does not degrade performance for others. This design covers the three isolation models (shared schema, separate schema, separate database), row-level security for shared schema, connection pool isolation, and rate limiting per tenant.

Isolation Model Comparison

"""
Model 1: Shared Schema (all tenants in same tables)
  - Every table has a tenant_id column
  - Lowest cost: single database, single schema
  - Risk: missing WHERE tenant_id=X in one query leaks all tenants' data
  - Mitigation: Postgres Row-Level Security (RLS) enforces tenant_id at DB level
  - Best for: SMB SaaS, cost-sensitive products, homogeneous tenant needs

Model 2: Separate Schema (one schema per tenant, same database)
  - search_path set per connection: SET search_path = tenant_abc, public
  - Each tenant has their own table instances (orders, users, etc.)
  - Moderate isolation: still same Postgres process
  - Supports tenant-specific schema customizations
  - Best for: mid-market SaaS, custom fields per tenant

Model 3: Separate Database (one DB per tenant)
  - Maximum isolation: separate connection pool, separate WAL, separate backups
  - Highest cost: N databases to operate, patch, and back up
  - Required for: compliance (HIPAA, FedRAMP), enterprise contracts with data residency
  - Best for: enterprise/regulated customers
"""

Row-Level Security (Shared Schema Model)

-- Enable RLS on every tenant-scoped table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
ALTER TABLE orders FORCE ROW LEVEL SECURITY;  -- applies to table owners too

-- Policy: a session can only see rows matching its tenant_id
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.tenant_id')::BIGINT);

-- Same pattern for every tenant-scoped table
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON users
    USING (tenant_id = current_setting('app.tenant_id')::BIGINT);

-- Application sets tenant context at connection checkout:
-- SET LOCAL app.tenant_id = '42';  (transaction-scoped)
-- All subsequent queries on this connection only see tenant 42's rows.

Application-Level Tenant Context

from contextlib import contextmanager
import threading

_local = threading.local()

@contextmanager
def tenant_context(tenant_id: int):
    """
    Set the active tenant for all DB operations in this scope.
    Passes tenant_id to Postgres via SET LOCAL so RLS policies apply.
    """
    old = getattr(_local, 'tenant_id', None)
    _local.tenant_id = tenant_id
    conn = db_pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute("SET LOCAL app.tenant_id = %s", (str(tenant_id),))
        yield conn
    finally:
        conn.commit()
        db_pool.putconn(conn)
        _local.tenant_id = old

# Usage in a request handler:
# @app.route('/api/orders')
# def list_orders():
#     with tenant_context(request.tenant_id) as conn:
#         orders = conn.fetchall("SELECT * FROM orders ORDER BY created_at DESC LIMIT 20")
#     return jsonify(orders)

Per-Tenant Rate Limiting

import redis, time

r = redis.Redis(decode_responses=True)

TENANT_RATE_LIMITS = {
    'free':       {'requests_per_minute': 60,   'burst': 10},
    'starter':    {'requests_per_minute': 600,  'burst': 100},
    'business':   {'requests_per_minute': 3000, 'burst': 500},
    'enterprise': {'requests_per_minute': 30000,'burst': 5000},
}

def check_rate_limit(tenant_id: int, plan: str) -> dict:
    """
    Token bucket rate limiter per tenant.
    Returns: {'allowed': bool, 'remaining': int, 'reset_at': float}
    """
    limits = TENANT_RATE_LIMITS.get(plan, TENANT_RATE_LIMITS['free'])
    rpm = limits['requests_per_minute']
    burst = limits['burst']

    now = time.time()
    window_key = f"ratelimit:{tenant_id}:{int(now // 60)}"  # 1-minute fixed window

    current = r.incr(window_key)
    if current == 1:
        r.expire(window_key, 120)  # 2-minute TTL for safety

    allowed = current  dict:
    """
    Aggregate resource usage for a tenant across dimensions.
    Used for billing and quota enforcement.
    """
    return db.fetchone("""
        SELECT
            (SELECT COUNT(*) FROM users WHERE tenant_id=%s) AS user_count,
            (SELECT COUNT(*) FROM orders WHERE tenant_id=%s) AS order_count,
            (SELECT COALESCE(SUM(size_bytes),0) FROM Document WHERE tenant_id=%s) AS storage_bytes,
            (SELECT api_call_count FROM TenantUsage WHERE tenant_id=%s AND period=date_trunc('month',NOW()))
                AS api_calls_this_month
    """, (tenant_id, tenant_id, tenant_id, tenant_id))

Connection Pool Isolation for Noisy Neighbor Prevention

"""
Problem: tenant A runs 50 concurrent slow analytical queries, exhausting the connection pool
and starving tenants B, C, D of database connections.

Solution: per-tier connection pool limits.
"""

import psycopg2.pool

POOL_ALLOCATIONS = {
    'free':       2,   # max 2 concurrent DB connections
    'starter':    5,
    'business':   20,
    'enterprise': 50,
}

_tenant_pools = {}  # tenant_id -> psycopg2.pool.ThreadedConnectionPool
_global_pool = psycopg2.pool.ThreadedConnectionPool(
    minconn=10, maxconn=200, host='db-primary', dbname='appdb'
)

def get_tenant_conn(tenant_id: int, plan: str):
    """
    Return a connection for the tenant, respecting their pool allocation.
    A tenant on the 'free' plan can hold at most 2 connections simultaneously.
    """
    max_conns = POOL_ALLOCATIONS.get(plan, 2)
    if tenant_id not in _tenant_pools:
        _tenant_pools[tenant_id] = psycopg2.pool.ThreadedConnectionPool(
            minconn=1, maxconn=max_conns,
            host='db-primary', dbname='appdb'
        )
    try:
        conn = _tenant_pools[tenant_id].getconn()
    except psycopg2.pool.PoolError:
        raise TenantResourceExhaustedError(
            f"Connection limit reached for tenant {tenant_id} on plan {plan}"
        )
    # Set tenant context for RLS
    with conn.cursor() as cur:
        cur.execute("SET LOCAL app.tenant_id = %s", (str(tenant_id),))
    return conn

Key Design Decisions

  • RLS as last-resort safety net, not primary isolation: application code should always include WHERE tenant_id=%s in queries. RLS is a defense-in-depth layer that catches bugs (missing tenant filter) before they become data breaches. RLS policies have a small performance cost (~5% query overhead) from the additional predicate. Use EXPLAIN ANALYZE to verify the RLS predicate is being pushed down to index scans, not applied as a post-filter on full table scans.
  • SET LOCAL vs SET: SET LOCAL applies the tenant_id setting only for the duration of the current transaction. When the connection is returned to the pool and the transaction ends, the setting is reset. SET (without LOCAL) persists for the connection session — dangerous in a connection pool where connections are reused across tenants.
  • Per-tier connection pool limits prevent noisy neighbor: a free-tier tenant running 50 concurrent queries cannot exhaust the global pool because their pool is capped at 2 connections. This enforces the implicit SLA of different plan tiers without requiring application-level query throttling.
  • Separate schema for enterprise tenants: for customers requiring data residency (EU data must stay in EU) or strict compliance boundaries, a separate Postgres schema (or database) allows independent backups, separate encryption keys, and schema customizations without forking the application code. The application selects the schema at connection time by setting search_path.

Tenant isolation and multi-tenant SaaS design is discussed in Stripe system design interview questions.

Tenant isolation and workspace isolation design is covered in Atlassian system design interview preparation.

Tenant isolation and merchant data isolation design is discussed in Shopify system design interview guide.

Scroll to Top