Multi-Tenancy Architecture: Low-Level Design

Multi-tenancy is an architecture where a single application instance serves multiple customers (tenants), with each tenant’s data isolated from others. Designing multi-tenancy correctly is one of the foundational decisions in SaaS architecture — the choice made early is expensive to change later and affects security, scalability, and operational complexity at every layer of the stack.

Tenancy Models

Silo (Database per Tenant)

Each tenant gets a dedicated database instance. Complete isolation: a noisy tenant cannot affect others, a breach of one database does not expose other tenants, and tenant databases can be scaled independently. Downside: operational cost scales linearly with tenant count — 1000 tenants means 1000 database instances, each requiring backups, monitoring, and patching. Best for: enterprise SaaS where tenants pay for dedicated infrastructure, strict compliance requirements (HIPAA, FedRAMP), or tenants with highly variable load patterns.

Pool (Shared Database, Separate Schema)

All tenants share a database server but each gets a separate schema (or schema prefix). Queries stay within a schema; a connection pool routes to the correct schema per request. Isolation is logical rather than physical — a SQL bug could cross schema boundaries, but most ORMs prevent this with schema-level configuration. Better resource utilization than silo: 1000 tenants on one database server. Suitable for mid-market SaaS with moderate per-tenant load.
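Per-request schema routing can be sketched as below. This is a minimal illustration, not a specific ORM's API: the tenant-to-schema naming convention (`tenant_<id>`) is an assumption, and the key point is that schema names cannot be bound as query parameters, so they must be validated before interpolation.

```python
import re

def schema_for_tenant(tenant_id: str) -> str:
    """Derive the schema name for a tenant (hypothetical tenant_<id> convention).

    Schema names cannot be passed as bound query parameters, so the name
    must be validated before string interpolation to rule out SQL injection.
    """
    schema = f"tenant_{tenant_id}"
    if not re.fullmatch(r"[a-z0-9_]{1,63}", schema):
        raise ValueError(f"invalid tenant schema: {schema!r}")
    return schema

def set_search_path_sql(tenant_id: str) -> str:
    # Issued once per request on the connection checked out of the pool,
    # so unqualified table names resolve inside the tenant's schema.
    return f'SET search_path TO "{schema_for_tenant(tenant_id)}"'
```

For example, `set_search_path_sql("acme42")` produces `SET search_path TO "tenant_acme42"`, while anything containing quotes, semicolons, or uppercase is rejected before it can reach the database.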

Pool (Shared Database, Shared Schema)

All tenants share tables, with a tenant_id column in every table. Every query must include WHERE tenant_id = ? — a “forgotten WHERE clause” leaks data across tenants. This is the most resource-efficient model and the most dangerous: data isolation relies entirely on application-level filtering. Mitigate with Row-Level Security (RLS) in PostgreSQL — define a policy that enforces tenant_id filtering at the database level, making it impossible to query without the filter. Best for high-volume SMB SaaS where operational efficiency matters most.
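The application-level half of this mitigation is a base repository that stamps and filters tenant_id itself, so callers cannot forget the WHERE clause. The sketch below uses an in-memory list as a stand-in for the shared table; the class and field names are illustrative, not from any particular framework.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class TenantScopedRepository:
    """Every read and write is scoped to one tenant; tenant_id is never
    supplied by callers, so it cannot be omitted or spoofed."""
    tenant_id: str
    # Stands in for a shared table: each row carries its tenant_id column.
    rows: list = field(default_factory=list)

    def insert(self, row: dict) -> None:
        # tenant_id is stamped by the repository itself.
        self.rows.append({**row, "tenant_id": self.tenant_id})

    def find_all(self) -> list:
        # The equivalent of WHERE tenant_id = ? appended to every query.
        return [r for r in self.rows if r["tenant_id"] == self.tenant_id]
```

Two repositories sharing the same backing store see disjoint data: rows inserted through tenant A's repository are invisible to tenant B's.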

Tenant Identification

How the application knows which tenant a request belongs to: subdomain routing (acme.app.com extracts “acme”); custom domain mapping (acme.com → tenant_id lookup in DNS/config table); JWT claims (tenant_id embedded in access token); path prefix (/org/acme/resource). Subdomain is most common for web SaaS. Store the mapping in a tenant registry service: domain → tenant_id, with caching (Redis) to avoid a DB lookup on every request.
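Subdomain resolution with a look-aside cache might look like the following sketch. `BASE_DOMAIN`, the in-process dict cache, and the registry lookup are stand-ins: in production the cache would be Redis and the registry a database table or service.

```python
BASE_DOMAIN = "app.com"          # assumed base domain for illustration
_cache = {}                      # stand-in for Redis

def _registry_lookup(subdomain: str) -> str:
    # Placeholder for the tenant registry query:
    # SELECT tenant_id FROM tenants WHERE slug = ?
    fake_registry = {"acme": "tenant-uuid-acme"}
    return fake_registry[subdomain]

def resolve_tenant(host: str) -> str:
    """Extract the subdomain from the Host header and map it to a tenant_id."""
    host = host.split(":")[0].lower()              # strip port, normalize case
    if not host.endswith("." + BASE_DOMAIN):
        raise ValueError(f"unknown host: {host}")
    subdomain = host[: -len(BASE_DOMAIN) - 1]
    if subdomain not in _cache:                    # cache miss: hit the registry
        _cache[subdomain] = _registry_lookup(subdomain)
    return _cache[subdomain]
```

After the first request for `acme.app.com`, subsequent requests resolve from the cache without touching the registry.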

Data Isolation Enforcement

For shared-schema tenancy, enforce isolation at multiple layers: (1) PostgreSQL RLS policy: first ALTER TABLE orders ENABLE ROW LEVEL SECURITY; then CREATE POLICY tenant_isolation ON orders USING (tenant_id = current_setting('app.tenant_id')::uuid); and SET LOCAL app.tenant_id = ? at the start of each transaction (SET LOCAL is transaction-scoped, so the setting cannot leak across pooled connections). (2) ORM-level tenant scoping: all queries go through a base repository that injects WHERE tenant_id = current_tenant. (3) Integration tests: verify that a request authenticated as tenant A cannot access tenant B's data by attempting cross-tenant fetches and asserting 404/403.
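The shape of the layer-(3) test is what matters, so the sketch below fakes the service with an in-memory lookup; the `ORDERS` data and `get_order` endpoint are hypothetical. The assertion is that a request authenticated as tenant A, fetching tenant B's resource, gets a 404 rather than the row.

```python
# Stand-in for the orders table, keyed by (tenant_id, order_id).
ORDERS = {
    ("tenant-a", "order-1"): {"total": 100},
    ("tenant-b", "order-2"): {"total": 250},
}

def get_order(authenticated_tenant: str, order_id: str):
    """Return (status, body). The lookup is keyed by tenant, so another
    tenant's order id behaves exactly like a missing row: 404 rather than
    403, to avoid confirming that the resource exists."""
    row = ORDERS.get((authenticated_tenant, order_id))
    return (200, row) if row else (404, None)

def test_cross_tenant_fetch_is_denied():
    status, body = get_order("tenant-a", "order-2")   # tenant B's order
    assert status == 404 and body is None
```

Returning 404 instead of 403 for cross-tenant ids is a deliberate choice: a 403 would leak the existence of the resource.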

Noisy Neighbor Problem

In shared models, one tenant generating heavy load degrades performance for all others. Mitigate with: per-tenant rate limiting (token bucket per tenant_id at the API gateway), per-tenant query timeouts (SET statement_timeout per connection), connection pool limits per tenant, and workload isolation (move heavy analytics queries to read replicas). For severely noisy tenants, offer a migration to the silo model — enterprise pricing that covers dedicated infrastructure.
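The first mitigation, a token bucket per tenant_id, can be sketched as follows. The rate and capacity numbers are illustrative, and a gateway implementation would typically keep the buckets in shared storage rather than process memory.

```python
import time
from collections import defaultdict

class TenantRateLimiter:
    """Token bucket per tenant_id, as run at an API gateway. Each tenant
    gets `capacity` burst tokens, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = defaultdict(lambda: capacity)   # per-tenant bucket
        self.updated = {}                             # last refill timestamp

    def allow(self, tenant_id: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        last = self.updated.get(tenant_id, now)
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens[tenant_id] = min(
            self.capacity, self.tokens[tenant_id] + (now - last) * self.rate
        )
        self.updated[tenant_id] = now
        if self.tokens[tenant_id] >= 1:
            self.tokens[tenant_id] -= 1
            return True
        return False
```

Because buckets are keyed by tenant_id, one tenant exhausting its tokens has no effect on any other tenant's budget.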

Schema Migrations

Shared-schema migrations are straightforward — one migration runs once. Separate-schema migrations must run for every tenant schema — 1000 tenants means 1000 ALTER TABLE statements. Run with parallelism but with circuit breakers: if migration fails on tenant 50, stop and investigate before proceeding. Use a migration tracking table per tenant schema to enable idempotent reruns. Separate-schema migrations are the main operational pain of the pool-with-schema model.
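The runner's circuit-breaker logic can be sketched as below. Parallelism is omitted for brevity (a real runner would dispatch schemas to a worker pool), and `apply_migration` is a placeholder for running the DDL against one schema and recording it in that schema's tracking table.

```python
def run_for_all_tenants(schemas, apply_migration, max_failures=5):
    """Apply one migration to every tenant schema; abort once failures
    exceed max_failures, since a systematic error should stop the rollout
    rather than march through the remaining tenants."""
    failed = []
    for schema in schemas:
        try:
            apply_migration(schema)
        except Exception as exc:
            failed.append((schema, str(exc)))
            if len(failed) > max_failures:
                raise RuntimeError(
                    f"circuit breaker tripped after {len(failed)} failures: {failed}"
                )
    return failed   # empty list means the migration applied everywhere
```

Isolated failures are collected and returned for investigation; only when the failure count crosses the threshold does the runner halt the entire rollout.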

Tenant Onboarding and Offboarding

Onboarding: provision tenant record, run schema migrations (for separate-schema), configure DNS (for subdomain model), initialize default data. Automate fully — manual onboarding at 1000 tenants/month is not viable. Offboarding: export tenant data (data portability, required by some regulations), archive or delete all rows (GDPR erasure), deprovision DNS and infrastructure. Model onboarding/offboarding as a state machine: pending → provisioning → active → suspended → terminated. Each state transition is logged for audit.
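The lifecycle state machine can be sketched as a transition table plus an append-only log. The transitions follow the chain above; the suspended → active reinstatement edge is an added assumption for illustration, and the audit log is a plain list standing in for an append-only table.

```python
# Allowed transitions in the tenant lifecycle. Anything not listed here
# is rejected, which is what makes the lifecycle auditable.
TRANSITIONS = {
    "pending": {"provisioning"},
    "provisioning": {"active"},
    "active": {"suspended"},
    "suspended": {"active", "terminated"},   # reinstatement edge is assumed
    "terminated": set(),                     # terminal state
}

class TenantLifecycle:
    def __init__(self):
        self.state = "pending"
        self.audit_log = []                  # stand-in for an audit table

    def transition(self, new_state: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.audit_log.append((self.state, new_state))
        self.state = new_state
```

Encoding the transitions as data rather than scattered if-statements means the valid lifecycle is inspectable in one place and every accepted change leaves an audit record.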

