What Is a Multi-Tenant SaaS Platform?
A multi-tenant SaaS platform serves multiple customers (tenants) from a single shared infrastructure. Each tenant’s data must be isolated from others, while the platform shares compute, storage, and operational overhead. Salesforce, Slack, Zendesk, and GitHub Enterprise all use multi-tenant architectures. The core engineering challenges are: data isolation and security, tenant-specific customization, fair resource allocation (preventing noisy neighbors), and per-tenant billing.
Tenancy Models
Model 1: Silo (Database-per-Tenant)
Each tenant gets a dedicated database. Data isolation is complete: a query cannot leak another tenant's rows because those rows live in a different database.
- Pros: strongest isolation, easier compliance (a GDPR erasure request can be handled by dropping one database), independent scaling per tenant, no noisy-neighbor contention on database I/O
- Cons: operational overhead scales with tenant count (1000 tenants = 1000 databases to manage, migrate, monitor), inefficient for small tenants
- Best for: enterprise customers with strict compliance, large tenants justifying dedicated resources
Model 2: Shared Database, Separate Schemas
One database, one schema per tenant. Tenant data is logically separated at the schema level.
- Pros: easier to manage than separate databases, still good isolation (schema-level permissions), migrations run per-schema
- Cons: schema proliferation (PostgreSQL handles thousands of schemas, but tooling complexity grows), still per-tenant migration overhead
Model 3: Shared Database, Shared Schema (Row-Level)
All tenants in the same table, distinguished by a tenant_id column. Most resource-efficient.
- Pros: minimal overhead, easy to add new tenants, efficient for small tenants
- Cons: tenant_id must be on every table and every query (easy to forget — potential data leak), harder compliance story, noisy-neighbor on table I/O
- Best for: SMB SaaS with many small tenants, where isolation requirements are lower
Hybrid Approach (Real-World Standard)
Tier customers by size: enterprise customers get dedicated databases (silo); SMB customers share a database with row-level isolation. A routing layer maps each tenant_id to the correct connection pool.
Tenant Routing Layer
Every request carries a tenant identifier (subdomain, JWT claim, API key prefix). The routing layer maps tenant_id → database connection pool.
// Tenant context stored in request-scoped context
tenant_id = extract_from_jwt(request.headers.authorization)
db_pool = tenant_router.get_pool(tenant_id)
// All database queries in this request use db_pool
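Store the routing table in Redis (tenant_id → {db_host, schema, pool_config}) with a short TTL (5 minutes) for fast lookups. Update the routing table whenever a tenant is provisioned or migrated, and invalidate the old entry at the same time so requests are not routed to a stale location until the TTL expires.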
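The lookup-with-TTL behaviour described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names Route, TenantRouter, and load_route are hypothetical, and a plain dictionary stands in for the Redis-backed routing store.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Route:
    db_host: str
    schema: str
    pool_size: int


class TenantRouter:
    """Resolves tenant_id -> Route, caching each entry for ttl_seconds."""

    def __init__(self, load_route: Callable[[str], Optional[Route]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load_route = load_route  # hypothetical lookup against the routing store
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Route]] = {}

    def get_route(self, tenant_id: str) -> Route:
        now = self._clock()
        cached = self._cache.get(tenant_id)
        if cached is not None and cached[0] > now:
            return cached[1]
        route = self._load_route(tenant_id)
        if route is None:
            # Fail closed: never fall back to a default database.
            raise KeyError(f"unknown tenant: {tenant_id}")
        self._cache[tenant_id] = (now + self._ttl, route)
        return route

    def invalidate(self, tenant_id: str) -> None:
        """Call after a tenant is migrated so a stale route is not served."""
        self._cache.pop(tenant_id, None)


# Hybrid layout: an enterprise tenant on a dedicated host, an SMB tenant on a shared one.
ROUTES = {
    "acme": Route("db-acme.internal", "public", 20),
    "smallco": Route("db-shared-1.internal", "public", 5),
}
router = TenantRouter(ROUTES.get)
```

Raising on an unknown tenant rather than returning a default route is a deliberate choice here: a routing miss should fail the request, not send it to someone else's database.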
Data Isolation Enforcement
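For shared-schema tenancy, every query must include a WHERE tenant_id = ? clause. Relying on developers to remember is fragile, so enforce it at the database or framework level:
- Row-Level Security (PostgreSQL RLS): enable RLS on the table (ALTER TABLE users ENABLE ROW LEVEL SECURITY) and define a policy at the database level: CREATE POLICY tenant_isolation ON users USING (tenant_id = current_setting('app.tenant_id')). Because current_setting returns text, add a cast if tenant_id is a uuid or integer column. Set app.tenant_id at the start of each request; with pooled connections, use SET LOCAL inside the transaction so a reused connection never carries a previous tenant's value. The database then enforces isolation regardless of application code: a buggy query that forgets the WHERE clause still returns only the current tenant's rows. Superusers and, by default, the table owner bypass RLS, so connect as a separate non-owner role or apply FORCE ROW LEVEL SECURITY to the table.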
- ORM scoping: in frameworks like Rails, use default_scope to automatically add tenant_id conditions to all queries for a tenant-aware model.
- Middleware injection: request middleware sets the tenant context; a database interceptor adds WHERE tenant_id = ? to all queries automatically.
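The middleware-injection idea can be illustrated with a small sketch. It assumes a request-scoped context variable and a thin query wrapper; current_tenant, TenantScopedDB, and handle_request are hypothetical names, and an in-memory SQLite database stands in for the real one so the example runs on its own.

```python
import contextvars
import sqlite3

# Request-scoped tenant context; middleware would set this once per request.
current_tenant = contextvars.ContextVar("current_tenant")


class TenantScopedDB:
    """Wraps a connection so every read is filtered by the current tenant."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def select(self, table: str, columns: str = "*", where: str = "1=1", params=()):
        # ContextVar.get() raises LookupError when no tenant is set,
        # so an unscoped read fails instead of returning every tenant's rows.
        tenant_id = current_tenant.get()
        # table, columns, and where must be trusted, code-defined strings,
        # never user input; values go through bound parameters.
        sql = f"SELECT {columns} FROM {table} WHERE tenant_id = ? AND ({where})"
        return self._conn.execute(sql, (tenant_id, *params)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (tenant_id TEXT NOT NULL, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("acme", "alice"), ("acme", "bob"), ("globex", "carol")],
)
db = TenantScopedDB(conn)


def handle_request(tenant_id: str):
    """Stands in for middleware plus a handler: set context, query, reset."""
    token = current_tenant.set(tenant_id)
    try:
        return db.select("users", "name", "name != ?", ("bob",))
    finally:
        current_tenant.reset(token)
```

Application-level scoping like this is a convenience layer; it does not replace database-enforced isolation, since any code path that bypasses the wrapper is unprotected.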
Tenant Provisioning
When a new tenant signs up:
- Create tenant record in the global tenants table
- Assign to a database pool (silo: provision new database; shared: create new schema or insert into shared pool)
- Run database migrations for the new tenant’s schema
- Set up default configuration (branding, feature flags, limits)
- Send welcome email and activate account
Automate with a provisioning service that orchestrates these steps. Target: new tenant fully active within 30 seconds of sign-up.
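One way to orchestrate these steps is an ordered list of idempotent actions with a progress record, so a failed run can be retried from the step that broke. The sketch below assumes that design; ProvisioningRun, provision, and the stub step functions are hypothetical, and the migration step is made to fail once purely to show resumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ProvisioningRun:
    """Progress record for one tenant; a real service would persist this."""
    tenant_id: str
    completed: List[str] = field(default_factory=list)
    status: str = "pending"  # pending -> active | failed
    error: str = ""


Step = Tuple[str, Callable[[str], None]]


def provision(run: ProvisioningRun, steps: List[Step]) -> ProvisioningRun:
    """Runs steps in order, skipping completed ones so a retry resumes mid-way."""
    for name, action in steps:
        if name in run.completed:
            continue
        try:
            action(run.tenant_id)
        except Exception as exc:
            run.status, run.error = "failed", f"{name}: {exc}"
            return run
        run.completed.append(name)
    run.status, run.error = "active", ""
    return run


# Demo with stub actions that only record their name.
log: List[str] = []
migration_attempts = {"count": 0}


def run_migrations(tenant_id: str) -> None:
    migration_attempts["count"] += 1
    if migration_attempts["count"] == 1:
        raise RuntimeError("database not ready")
    log.append("run_migrations")


def record(name: str) -> Callable[[str], None]:
    return lambda tenant_id: log.append(name)


STEPS: List[Step] = [
    ("create_tenant_record", record("create_tenant_record")),
    ("assign_database", record("assign_database")),
    ("run_migrations", run_migrations),
    ("apply_default_config", record("apply_default_config")),
    ("send_welcome_email", record("send_welcome_email")),
]

run = provision(ProvisioningRun("acme"), STEPS)  # fails at run_migrations
run = provision(run, STEPS)                      # resumes and completes
```

Keeping the welcome email last means a tenant is only told the account is ready after every earlier step has succeeded.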
Schema Migrations at Scale
Running database migrations across 10K tenant schemas simultaneously would cause a thundering herd. Strategies:
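- Rolling migrations: apply the migration in batches (e.g., 100 tenants per batch, one batch per hour). At that rate 10K tenants need 100 batches, roughly four days, so size the batches to finish within the migration window without overloading the database.
- Expand-contract pattern: add new columns as nullable first; deploy code that writes both the old and new columns; backfill existing rows; then add the NOT NULL constraint and, once nothing reads the old column, drop it. Never break existing tenants during migration.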
- Migration service: dedicated service tracks migration state per tenant (tenant_id, migration_version, status). Provides visibility and retry capability.
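The batching and per-tenant state tracking can be sketched together. This is an illustration under assumptions: run_rolling_migration and its parameters are hypothetical, the state dictionary stands in for the migration service's tracking table, and migrate is whatever callable applies the migration to one tenant.

```python
from typing import Callable, Dict, Iterator, List


def batches(tenant_ids: List[str], size: int) -> Iterator[List[str]]:
    """Yields consecutive slices of at most `size` tenant ids."""
    for start in range(0, len(tenant_ids), size):
        yield tenant_ids[start:start + size]


def run_rolling_migration(
    tenant_ids: List[str],
    target_version: int,
    migrate: Callable[[str, int], None],
    state: Dict[str, dict],
    batch_size: int = 100,
    pause: Callable[[], None] = lambda: None,
) -> List[str]:
    """Migrates tenants batch by batch and returns the ids that failed.

    `state` maps tenant_id -> {"version": int, "status": str}.
    """
    failed: List[str] = []
    for batch in batches(tenant_ids, batch_size):
        for tenant_id in batch:
            record = state.setdefault(tenant_id, {"version": 0, "status": "pending"})
            if record["version"] >= target_version:
                continue  # already at target, so re-running is safe
            try:
                migrate(tenant_id, target_version)
            except Exception as exc:
                record["status"] = f"failed: {exc}"
                failed.append(tenant_id)
            else:
                record.update(version=target_version, status="done")
        pause()  # e.g. time.sleep(...) or wait until database load drops
    return failed
```

Because tenants already at the target version are skipped, the same call doubles as the retry path: running it again only touches tenants that failed or were never reached.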
Resource Quotas and Noisy Neighbor
- Per-tenant limits: API rate limiting (tokens per minute), storage quota, max concurrent connections, max query execution time
- Enforce in the API gateway (rate limiter per tenant_id) and at the database level (statement timeout, connection pool size per tenant)
- Detect noisy tenants: monitor p99 query latency per tenant. If one tenant’s queries are slow and consuming disproportionate DB CPU, throttle their connection pool or move them to a dedicated shard
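Per-tenant API rate limiting is commonly built on a token bucket. The sketch below is one minimal version, assuming an in-process limiter with a single default limit; TokenBucket and TenantRateLimiter are hypothetical names, and a production gateway would typically keep the counters in a shared store rather than in memory.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Refills at rate_per_minute up to `capacity`; each request spends tokens."""

    def __init__(self, rate_per_minute: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self._rate_per_second = rate_per_minute / 60.0
        self._capacity = capacity
        self._tokens = float(capacity)
        self._clock = clock
        self._updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._updated
        self._tokens = min(self._capacity,
                           self._tokens + elapsed * self._rate_per_second)
        self._updated = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class TenantRateLimiter:
    """Keeps one bucket per tenant so one tenant cannot drain another's budget."""

    def __init__(self, default_rate_per_minute: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self._default_rate = default_rate_per_minute
        self._burst = burst
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str) -> bool:
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = TokenBucket(self._default_rate, self._burst, self._clock)
            self._buckets[tenant_id] = bucket
        return bucket.allow()
```

The clock is injectable so the limiter can be tested deterministically; in a gateway, a rejected call would map to an HTTP 429 response for that tenant only.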
Per-Tenant Customization
- Feature flags: per-tenant feature flag table. The feature flag service checks if a feature is enabled for the requesting tenant. Enterprise plans get advanced features; SMB gets standard set.
- Branding: tenant-specific logos, colors, domain names. Store in a tenant_config table; serve from CDN with per-tenant cache keys.
- Custom workflows: webhook endpoints where tenants receive events and can trigger their own logic. Zendesk triggers and Salesforce flows are examples of this kind of tenant-defined automation.
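The feature-flag check can be sketched as plan defaults plus per-tenant overrides. The plan names, feature names, and the Tenant and is_enabled identifiers below are assumptions made for illustration; in practice the data would come from the per-tenant feature flag table.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet

# Hypothetical plan defaults: enterprise gets the advanced features.
PLAN_FEATURES: Dict[str, FrozenSet[str]] = {
    "smb": frozenset({"dashboards"}),
    "enterprise": frozenset({"dashboards", "sso", "audit_log"}),
}


@dataclass
class Tenant:
    tenant_id: str
    plan: str
    # Explicit per-tenant overrides, e.g. a beta feature switched on early.
    overrides: Dict[str, bool] = field(default_factory=dict)


def is_enabled(tenant: Tenant, feature: str) -> bool:
    """An override wins; otherwise fall back to the plan's default set."""
    if feature in tenant.overrides:
        return tenant.overrides[feature]
    # Unknown plans get no features rather than raising.
    return feature in PLAN_FEATURES.get(tenant.plan, frozenset())


smallco = Tenant("smallco", "smb", overrides={"sso": True})
bigcorp = Tenant("bigcorp", "enterprise", overrides={"audit_log": False})
```

Here smallco gains sso through an override even though its plan does not include it, while bigcorp has audit_log turned off despite being on the enterprise plan.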
Interview Tips
- Lead with the three tenancy models and their trade-offs — this is the architecture decision that drives everything else.
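- PostgreSQL Row-Level Security is the strongest isolation mechanism for shared-schema tenancy because the database, not application code, enforces it. Mention it by name and explain why it protects against a forgotten WHERE clause.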
- Rolling migrations at scale is a practical concern many candidates miss — it shows production awareness.
- The hybrid model (silo for enterprise, shared for SMB) is the real-world answer — pure silo or pure shared is usually wrong for a general-purpose SaaS.