System Design Interview: Design a Multi-Tenant SaaS Platform

What Is a Multi-Tenant SaaS Platform?

A multi-tenant SaaS platform serves multiple customers (tenants) from a single shared infrastructure. Each tenant’s data must be isolated from every other tenant’s, while the platform shares compute, storage, and operational overhead across all of them. Salesforce, Slack, Zendesk, and GitHub Enterprise Cloud all use multi-tenant architectures. The core engineering challenges are data isolation and security, tenant-specific customization, fair resource allocation (preventing noisy neighbors), and per-tenant billing.

Tenancy Models

    Model 1: Silo (Database-per-Tenant)

    Each tenant gets a dedicated database. Isolation is complete: every SQL query runs against one tenant’s database, so it cannot return another tenant’s rows.

    • Pros: strongest isolation, easy compliance (a GDPR erasure request becomes dropping one database), independent scaling per tenant, no noisy neighbors on database I/O when each database runs on its own host
    • Cons: operational overhead scales with tenant count (1000 tenants = 1000 databases to manage, migrate, monitor), inefficient for small tenants
    • Best for: enterprise customers with strict compliance, large tenants justifying dedicated resources

    Model 2: Shared Database, Separate Schemas

    One database, one schema per tenant. Tenant data is logically separated at the schema level.

    • Pros: easier to manage than separate databases, still good isolation (schema-level permissions), migrations run per-schema
    • Cons: schema proliferation (PostgreSQL handles thousands of schemas, but tooling complexity grows), still per-tenant migration overhead

    Model 3: Shared Database, Shared Schema (Row-Level)

    All tenants in the same table, distinguished by a tenant_id column. Most resource-efficient.

    • Pros: minimal overhead, easy to add new tenants, efficient for small tenants
    • Cons: tenant_id must be on every table and every query (easy to forget — potential data leak), harder compliance story, noisy-neighbor on table I/O
    • Best for: SMB SaaS with many small tenants, where isolation requirements are lower

    Hybrid Approach (Real-World Standard)

    Tier customers by size: enterprise customers get dedicated databases (silo); SMB customers share a database with row-level isolation. A routing layer maps each tenant_id to the correct connection pool.
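The tiering decision can be sketched as a small placement function. This is a minimal illustration, not a prescribed design: it uses the plan tier as a stand-in for customer size, and the plan names, the `Placement` type, and the pool identifiers (`db-<tenant_id>`, `shared-pool-1`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical plan names and pool identifiers, for illustration only.
ENTERPRISE_PLANS = {"enterprise"}
SHARED_POOL = "shared-pool-1"


@dataclass(frozen=True)
class Placement:
    model: str   # "silo" (dedicated database) or "shared" (row-level isolation)
    pool: str    # connection-pool identifier the routing layer will hand out
    schema: str  # schema that holds this tenant's tables


def choose_placement(tenant_id: str, plan: str) -> Placement:
    """Decide where a new tenant's data lives, based on its plan tier."""
    if plan in ENTERPRISE_PLANS:
        # Silo: a dedicated database, and therefore a dedicated pool.
        return Placement(model="silo", pool=f"db-{tenant_id}", schema="public")
    # Shared: one common database; every row carries tenant_id.
    return Placement(model="shared", pool=SHARED_POOL, schema="public")


# The routing layer's table is just the recorded placement per tenant.
routing_table: Dict[str, Placement] = {}
for tid, plan in [("acme", "enterprise"), ("tinyco", "starter"), ("smallbiz", "pro")]:
    routing_table[tid] = choose_placement(tid, plan)
```

Keeping placement as data (rather than branching on plan at query time) is what lets a tenant later be moved from the shared pool to a silo by updating one routing entry.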

    Tenant Routing Layer

    Every request carries a tenant identifier (subdomain, JWT claim, API key prefix). The routing layer maps tenant_id → database connection pool.
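A minimal sketch of resolving the tenant and holding it in request-scoped context, using Python’s standard `contextvars` module. It assumes two hypothetical identifier formats: an API key shaped like `<tenant_id>.<secret>` and a host shaped like `<tenant>.example-saas.com`. JWT claims are left out because reading them safely requires signature verification with a JWT library.

```python
from contextvars import ContextVar
from typing import Mapping

# Request-scoped tenant context: each thread or asyncio task sees its own value.
current_tenant = ContextVar("current_tenant", default=None)

# Hypothetical base domain; tenants are addressed as <tenant>.example-saas.com.
BASE_DOMAIN = "example-saas.com"


def resolve_tenant(headers: Mapping[str, str]) -> str:
    """Resolve the tenant from an API key prefix or, failing that, the subdomain."""
    api_key = headers.get("X-Api-Key")
    if api_key and "." in api_key:
        # Hypothetical key format "<tenant_id>.<secret>"; the secret part must
        # still be verified against the tenant's stored key hash elsewhere.
        return api_key.split(".", 1)[0]
    host = headers.get("Host", "").split(":", 1)[0].lower()  # drop any port
    suffix = "." + BASE_DOMAIN
    if host.endswith(suffix):
        subdomain = host[: -len(suffix)]
        if subdomain and "." not in subdomain:
            return subdomain
    raise PermissionError("request does not identify a tenant")


def handle_request(headers: Mapping[str, str]) -> str:
    token = current_tenant.set(resolve_tenant(headers))
    try:
        # Downstream code (router, repositories) reads current_tenant.get().
        return f"serving tenant {current_tenant.get()}"
    finally:
        current_tenant.reset(token)  # never leak one request's tenant into the next
```

Rejecting requests that name no tenant, instead of falling back to a default, avoids silently serving one tenant’s data under another’s identity.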

    // Resolve the tenant once per request and keep it in request-scoped context
    tenant_id = extract_from_jwt(request.headers.authorization)
    db_pool = tenant_router.get_pool(tenant_id)
    // All database queries in this request use db_pool
    

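    Cache the routing table in Redis (tenant_id → {db_host, schema, pool_config}) with a short TTL (e.g., 5 minutes) for fast lookups, backed by a durable source of truth such as the global tenants table. Update or invalidate the cached entry whenever a tenant is provisioned or migrated; the TTL then only bounds how long a missed invalidation can serve a stale route.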
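A cache-aside router with a TTL can be sketched as follows. This is an in-process stand-in, not a Redis client: a plain dict plays the role of Redis, and `load_route` is a hypothetical callback standing in for a query against the authoritative tenants table. The clock is injectable so expiry can be tested.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Route:
    db_host: str
    schema: str
    pool_size: int


class TenantRouter:
    """Cache-aside lookup of tenant_id -> Route with a fixed TTL.

    The dict below stands in for Redis; `load_route` stands in for reading the
    durable tenants table. Both are assumptions of this sketch.
    """

    def __init__(self, load_route: Callable[[str], Route], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load_route = load_route
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[Route, float]] = {}  # tenant_id -> (route, expires_at)

    def get_route(self, tenant_id: str) -> Route:
        now = self._clock()
        hit = self._cache.get(tenant_id)
        if hit is not None and hit[1] > now:
            return hit[0]  # fresh cache entry
        route = self._load_route(tenant_id)  # miss or expired: go to the source of truth
        self._cache[tenant_id] = (route, now + self._ttl)
        return route

    def invalidate(self, tenant_id: str) -> None:
        """Call after provisioning or migrating a tenant so the next lookup reloads."""
        self._cache.pop(tenant_id, None)
```

Explicit invalidation keeps routes correct in the normal case; the 300-second default TTL (the article’s 5 minutes) is only a safety net.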

    Data Isolation Enforcement

    For shared-schema tenancy, every query must include a WHERE tenant_id = ? clause. Relying on developers to remember is fragile — use framework-level enforcement:

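    • Row-Level Security (PostgreSQL RLS): define the policy in the database, e.g. CREATE POLICY tenant_isolation ON users USING (tenant_id = current_setting('app.tenant_id')::uuid) (the ::uuid cast assumes a uuid tenant_id column), and enable it with ALTER TABLE users ENABLE ROW LEVEL SECURITY. Because pooled connections are reused across tenants, set the value per transaction with SELECT set_config('app.tenant_id', $1, true), where true makes it transaction-local, rather than once at connection start. The database then enforces isolation regardless of application code: even a buggy query that forgets the WHERE clause cannot see another tenant’s rows. Superusers, roles with BYPASSRLS, and the table owner (unless FORCE ROW LEVEL SECURITY is set) bypass policies, so the application should connect as a dedicated unprivileged role.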
    • ORM scoping: in frameworks like Rails, use default_scope to automatically add tenant_id conditions to all queries for a tenant-aware model.
    • Middleware injection: request middleware sets the tenant context; a database interceptor adds WHERE tenant_id = ? to all queries automatically.
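The middleware/interceptor approach can be sketched with the standard-library `sqlite3` module (SQLite has no RLS, so this shows application-level enforcement only). `TenantScopedDB` is a hypothetical wrapper: every read ANDs in the tenant predicate and every write stamps the tenant_id, so callers cannot omit or spoof it.

```python
import sqlite3
from typing import Any, List, Tuple


class TenantScopedDB:
    """Wraps a connection so every read and write is bound to one tenant_id."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        self._conn = conn
        self._tenant_id = tenant_id

    def select(self, table: str, where: str = "1=1",
               params: Tuple[Any, ...] = ()) -> List[tuple]:
        # The tenant predicate is always ANDed in, and the caller's condition is
        # parenthesized so an "... OR 1=1" cannot escape it. `table` and `where`
        # must come from trusted code; only values go through parameters.
        sql = f"SELECT * FROM {table} WHERE tenant_id = ? AND ({where})"
        return self._conn.execute(sql, (self._tenant_id, *params)).fetchall()

    def insert(self, table: str, **values: Any) -> None:
        values["tenant_id"] = self._tenant_id  # overrides any caller-supplied tenant_id
        cols = ", ".join(values)  # column names from trusted code only
        marks = ", ".join("?" for _ in values)
        self._conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                           tuple(values.values()))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, email TEXT)")
acme = TenantScopedDB(conn, "acme")
globex = TenantScopedDB(conn, "globex")
acme.insert("users", email="a@acme.test")
globex.insert("users", email="g@globex.test", tenant_id="acme")  # spoof attempt is overwritten
```

The wrapper only protects code paths that go through it; a raw query on `conn` bypasses it, which is exactly the gap PostgreSQL RLS closes by enforcing the rule inside the database.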

    Tenant Provisioning

    When a new tenant signs up:

    1. Create tenant record in the global tenants table
    2. Assign to a database pool (silo: provision new database; shared: create new schema or insert into shared pool)
    3. Run database migrations for the new tenant’s database or schema (row-level tenants skip this step, since the shared tables are already migrated)
    4. Set up default configuration (branding, feature flags, limits)
    5. Send welcome email and activate account

    Automate with a provisioning service that orchestrates these steps. Target: new tenant fully active within 30 seconds of sign-up.
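One way to make that orchestration retry-safe is to record which steps have finished per tenant and skip them on the next attempt. The sketch below keeps that record in memory and uses hypothetical placeholder steps that only log; a real service would persist the record and call the database, config store, and mailer.

```python
from typing import Callable, Dict, List, Tuple

Step = Callable[[str], None]


class Provisioner:
    """Runs provisioning steps in order and records which ones completed.

    Completed steps are skipped on retry, so a failure partway through (e.g. in
    the migration step) can be retried without re-creating the tenant record.
    Each step should still be idempotent in case it failed after doing its work.
    """

    def __init__(self, steps: List[Tuple[str, Step]]):
        self.steps = steps  # (name, function) pairs in execution order
        self.completed: Dict[str, List[str]] = {}  # tenant_id -> finished step names

    def provision(self, tenant_id: str) -> None:
        done = self.completed.setdefault(tenant_id, [])
        for name, step in self.steps:
            if name in done:
                continue
            step(tenant_id)  # an exception stops here; finished steps stay recorded
            done.append(name)


events: List[str] = []


def _stub(name: str) -> Step:
    """Hypothetical placeholder step that only logs what it would do."""
    def run(tenant_id: str) -> None:
        events.append(f"{name}:{tenant_id}")
    return run


# The five sign-up steps listed above, in order.
PROVISIONING_STEPS = [(name, _stub(name)) for name in (
    "create_tenant_record", "assign_database", "run_migrations",
    "default_config", "send_welcome_and_activate")]
```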

    Schema Migrations at Scale

    Running database migrations across 10K tenant schemas simultaneously would overload the database servers with a thundering herd of concurrent DDL, lock waits, and I/O. Strategies:

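    • Rolling migrations: apply the migration one batch at a time (e.g., 100 tenants per batch, one batch per hour). At 10K schemas that is 100 batches, or about 100 hours at one batch per hour, so choose the batch size and interval to finish within the migration window without overloading the database.
    • Expand-contract pattern: expand by adding new columns as nullable; deploy code that writes both old and new columns; backfill existing rows; make the new column NOT NULL and switch reads to it; finally contract by removing writes to the old column and dropping it. No tenant sees a breaking change mid-migration.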
    • Migration service: dedicated service tracks migration state per tenant (tenant_id, migration_version, status). Provides visibility and retry capability.
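A rolling migration driven by per-tenant state can be sketched as below. `apply` is a hypothetical callback that migrates one tenant, `state` mirrors the (tenant_id, migration_version, status) record described above but lives in memory here, and `pause` is where a real job would wait between batches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass
class MigrationState:
    tenant_id: str
    migration_version: int
    status: str  # "done" or "failed"


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def rolling_migrate(tenants: List[str], target_version: int,
                    apply: Callable[[str, int], None],
                    state: Dict[str, MigrationState],
                    batch_size: int = 100,
                    pause: Callable[[], None] = lambda: None) -> None:
    """Migrate tenants batch by batch and record the outcome for each one.

    Tenants whose recorded version already reaches `target_version` are skipped,
    so re-running the job retries only tenants that failed or were not reached.
    """
    pending = [t for t in tenants
               if t not in state or state[t].migration_version < target_version]
    for i, batch in enumerate(batches(pending, batch_size)):
        if i > 0:
            pause()  # e.g. time.sleep(3600) for one batch per hour
        for tenant_id in batch:
            previous = state.get(tenant_id)
            try:
                apply(tenant_id, target_version)
            except Exception:
                # Keep the last good version so the tenant stays pending for a retry.
                version = previous.migration_version if previous else 0
                state[tenant_id] = MigrationState(tenant_id, version, "failed")
            else:
                state[tenant_id] = MigrationState(tenant_id, target_version, "done")
```

With 10,000 tenants and `batch_size=100`, `batches` yields 100 batches, matching the arithmetic above.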

    Resource Quotas and Noisy Neighbor

    • Per-tenant limits: API rate limits (requests per minute, typically enforced with a token bucket), storage quota, max concurrent connections, max query execution time
    • Enforce in the API gateway (rate limiter per tenant_id) and at the database level (statement timeout, connection pool size per tenant)
    • Detect noisy tenants: monitor p99 query latency per tenant. If one tenant’s queries are slow and consuming disproportionate DB CPU, throttle their connection pool or move them to a dedicated shard
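A per-tenant token bucket, as a gateway might keep it, can be sketched in a few lines. This version holds buckets in process memory and uses one rate for every tenant; both are simplifications (a multi-instance gateway would need shared state, and limits would normally vary by plan). The clock is injectable for testing.

```python
import time
from typing import Callable, Dict, Tuple


class PerTenantRateLimiter:
    """Token bucket per tenant_id: tokens refill continuously at
    `rate_per_minute`, capped at `burst`; each request spends one token."""

    def __init__(self, rate_per_minute: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self._rate = rate_per_minute / 60.0  # tokens per second
        self._burst = float(burst)
        self._clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # tenant_id -> (tokens, last_refill)

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        tokens, last = self._buckets.get(tenant_id, (self._burst, now))  # new tenants start full
        tokens = min(self._burst, tokens + (now - last) * self._rate)
        if tokens >= 1.0:
            self._buckets[tenant_id] = (tokens - 1.0, now)
            return True
        self._buckets[tenant_id] = (tokens, now)
        return False
```

Because each tenant has its own bucket, a noisy tenant exhausting its budget has no effect on the requests other tenants can make.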

    Per-Tenant Customization

    • Feature flags: per-tenant feature flag table. The feature flag service checks if a feature is enabled for the requesting tenant. Enterprise plans get advanced features; SMB gets standard set.
    • Branding: tenant-specific logos, colors, domain names. Store in a tenant_config table; serve from CDN with per-tenant cache keys.
    • Custom workflows: webhook endpoints where tenants receive events and can trigger their own logic, similar in spirit to Zendesk triggers and Salesforce flows.
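Plan-based feature flags with per-tenant overrides can be sketched as a two-level lookup: an explicit override wins, otherwise the tenant’s plan decides. The plan names, feature names, and the fallback to "standard" for unknown tenants are all hypothetical choices for this illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical plan tiers and feature names, for illustration.
PLAN_FEATURES: Dict[str, Set[str]] = {
    "standard": {"sso_google", "webhooks"},
    "enterprise": {"sso_google", "webhooks", "saml_sso", "audit_log"},
}


class FeatureFlags:
    """Resolve a feature for a tenant: per-tenant override first, then plan default."""

    def __init__(self) -> None:
        self.tenant_plan: Dict[str, str] = {}
        self.overrides: Dict[Tuple[str, str], bool] = {}  # (tenant_id, feature) -> enabled

    def is_enabled(self, tenant_id: str, feature: str) -> bool:
        override = self.overrides.get((tenant_id, feature))
        if override is not None:
            return override  # explicit on/off for this tenant, e.g. a trial
        plan = self.tenant_plan.get(tenant_id, "standard")  # assumed default tier
        return feature in PLAN_FEATURES.get(plan, set())
```

Overrides make it possible to grant one SMB tenant a trial of an enterprise feature, or switch a feature off for one enterprise tenant, without changing any plan definition.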

    Interview Tips

    • Lead with the three tenancy models and their trade-offs — this is the architecture decision that drives everything else.
    • PostgreSQL Row-Level Security is the strongest data isolation mechanism for shared-schema — mentioning it impresses interviewers.
    • Rolling migrations at scale is a practical concern many candidates miss — it shows production awareness.
    • The hybrid model (silo for enterprise, shared for SMB) is the real-world answer — pure silo or pure shared is usually wrong for a general-purpose SaaS.