Multi-Tenant Service Low-Level Design: Tenant Isolation Strategies, Data Partitioning, and Noisy Neighbor Prevention

Multi-Tenant Service: Overview and Requirements

A multi-tenant service hosts multiple customers (tenants) on shared infrastructure while providing strong isolation guarantees: one tenant cannot access another tenant's data, and resource contention from one tenant does not degrade service for others. Isolation strategy, data partitioning, and noisy neighbor prevention are the three axes interviewers probe in depth.

Functional Requirements

  • Support thousands of tenants on a single deployment with per-tenant configuration.
  • Enforce strict data isolation: no cross-tenant data leakage at any layer.
  • Apply per-tenant resource quotas: requests per second, storage bytes, CPU time.
  • Provide tenant-level usage reporting for billing and capacity planning.
  • Allow tenant onboarding and offboarding without downtime for other tenants.

Non-Functional Requirements

  • P99 latency within 20% of single-tenant baseline for any given tenant during peak load.
  • Zero cross-tenant data leakage — verified by automated isolation tests in CI.
  • Tenant provisioning completed within 30 seconds of an onboarding request.

Isolation Strategies

Schema-Per-Tenant

Each tenant gets a dedicated database schema (PostgreSQL) or keyspace (Cassandra). The application resolves the correct schema at request time using the tenant identifier extracted from the JWT or API key. This provides strong isolation without the cost of separate database instances and allows schema migration per tenant on different rollout schedules.
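As a sketch, the per-request schema resolution reduces to validating the slug and scoping the database session. The Python below is illustrative — the function names and the `tenant_` prefix are assumptions, not part of any specific driver API:

```python
# Sketch: resolve a tenant's schema from its slug and build the
# statement that scopes the session to that schema.
import re

def resolve_schema(slug: str) -> str:
    """Map a tenant slug to its dedicated schema name.

    The slug is validated strictly because it becomes an SQL
    identifier; anything unexpected is rejected rather than quoted.
    """
    if not re.fullmatch(r"[a-z][a-z0-9_]{0,40}", slug):
        raise ValueError(f"invalid tenant slug: {slug!r}")
    return f"tenant_{slug}"

def scope_session_sql(slug: str) -> str:
    """SQL executed at the start of each request's transaction."""
    return f"SET LOCAL search_path TO {resolve_schema(slug)}"

print(scope_session_sql("acme_corp"))
# SET LOCAL search_path TO tenant_acme_corp
```

Using SET LOCAL (rather than SET) confines the schema choice to the current transaction, so a pooled connection cannot leak one tenant's search_path into the next request.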

Row-Level Security

For tenants sharing a single schema (cost-optimized tiers), use PostgreSQL Row-Level Security (RLS) policies. Each table has a tenant_id column. The application sets a session variable (SET app.current_tenant_id) before executing any query; the RLS policy filters all reads and writes to rows matching that variable. This prevents accidental cross-tenant access even if the application layer has a bug in tenant context propagation.
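The PostgreSQL statements behind this pattern can be sketched as follows; they are generated as strings here for illustration, and in practice the DDL runs once per table in a migration while the SET LOCAL runs per request:

```python
# Sketch of the RLS setup for one shared-schema table.
def rls_ddl(table: str) -> str:
    return (
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;\n"
        # FORCE applies the policy even to the table owner.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;\n"
        f"CREATE POLICY tenant_isolation ON {table}\n"
        f"  USING (tenant_id = current_setting('app.current_tenant_id')::uuid);"
    )

# Per-request session setup, run before any query in the transaction:
SET_TENANT = "SET LOCAL app.current_tenant_id = %(tenant_id)s"

print(rls_ddl("documents"))
```

Because the policy's USING clause applies to every read and (by default) every write, a query that omits the tenant filter simply returns no rows rather than another tenant's data.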

Shard-Per-Tenant

Large or high-throughput tenants get dedicated database shards or dedicated service pods. A tenant metadata store maps tenant_id to its shard endpoint. The routing layer performs the lookup on each request. This shard model eliminates noisy neighbor risk for premium tier tenants.

Data Model

Tenant Record

  • tenant_id — UUID, immutable.
  • slug — human-readable identifier used in URLs and schema names.
  • tier — FREE, STANDARD, ENTERPRISE.
  • isolation_model — SHARED_SCHEMA, SCHEMA_PER_TENANT, DEDICATED_SHARD.
  • shard_endpoint — database connection string for dedicated shards.
  • status — PROVISIONING, ACTIVE, SUSPENDED, OFFBOARDING.
  • created_at, offboarded_at.

Resource Quota Record

  • tenant_id — foreign key.
  • rps_limit — requests per second ceiling.
  • storage_bytes_limit, storage_bytes_used.
  • cpu_ms_per_minute_limit.
  • quota_reset_at — for period-based quotas.

Core Algorithms

Quota Enforcement

Use a token bucket per tenant stored in Redis. On each request, the API gateway calls a Lua script (atomic in Redis) to deduct one token. If the bucket is empty, return HTTP 429 with a Retry-After header. Bucket parameters are loaded from the quota record and cached in Redis with a short TTL; quota changes propagate within seconds.
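A pure-Python reference version of the refill-and-deduct arithmetic might look like the following. In production this logic lives inside the Redis Lua script so the check is atomic across all gateway instances; the class here only illustrates the same math:

```python
import time

class TokenBucket:
    """Reference implementation of the per-tenant bucket logic."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate        # tokens added per second (rps_limit)
        self.burst = burst      # bucket capacity (allowed burst size)
        self.tokens = burst
        self.updated = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.updated
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller responds with HTTP 429 + Retry-After
```

Storing only (tokens, updated) per tenant keeps Redis memory at a few bytes per bucket, and lazy refill avoids any background timer per tenant.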

Noisy Neighbor Prevention

  • Assign each tenant to a weighted fair queue in the application tier. Requests are dequeued proportionally to tenant tier weights, preventing a single bursty tenant from starving others.
  • Apply CPU time accounting per tenant using a thread-local timer. When a tenant exhausts its CPU quota for the current minute, enqueue subsequent requests with a lower priority or defer them.
  • Storage writes that exceed the tenant byte limit are rejected immediately with a descriptive error, not silently dropped.
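The first bullet's proportional dequeue can be sketched as a simplified weighted round-robin over per-tenant queues; a production implementation would track virtual finish times (true weighted fair queuing), but the proportional-share idea is the same. Tier weights below are illustrative:

```python
import collections

TIER_WEIGHTS = {"FREE": 1, "STANDARD": 2, "ENTERPRISE": 4}

class FairScheduler:
    """Each drain cycle lets a tenant dequeue up to its tier weight,
    so one bursty tenant cannot starve the others."""

    def __init__(self):
        self.queues: dict[str, collections.deque] = {}
        self.tiers: dict[str, str] = {}

    def enqueue(self, tenant_id: str, tier: str, request) -> None:
        self.tiers[tenant_id] = tier
        self.queues.setdefault(tenant_id, collections.deque()).append(request)

    def drain_cycle(self) -> list:
        """Dequeue one cycle's worth of requests, proportional to weight."""
        out = []
        for tenant_id, q in self.queues.items():
            quota = TIER_WEIGHTS[self.tiers[tenant_id]]
            for _ in range(min(quota, len(q))):
                out.append(q.popleft())
        return out
```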

Scalability Design

  • Cache tenant metadata (isolation model, shard endpoint, quota parameters) in an in-process LRU cache with a 60-second TTL to avoid a Redis or database lookup on every request.
  • Use connection pooling per shard with PgBouncer to prevent tenant-specific shards from exhausting database connection limits during traffic spikes.
  • Run tenant usage aggregation asynchronously: collect raw usage events in Kafka, aggregate in a stream processor, and write summaries to a reporting store every minute.
  • Scale the shared-schema tier horizontally by adding read replicas; route read-only queries to replicas using the RLS context variable.
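The in-process metadata cache from the first bullet can be sketched as a TTL'd LRU built on OrderedDict (capacity and TTL values are illustrative; a library such as cachetools offers equivalent behavior):

```python
import time
from collections import OrderedDict

class TTLCache:
    """LRU cache whose entries also expire after a fixed TTL."""

    def __init__(self, capacity: int = 10_000, ttl: float = 60.0):
        self.capacity, self.ttl = capacity, ttl
        self._data: OrderedDict[str, tuple[float, object]] = OrderedDict()

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: caller falls back to Redis/DB
            return None
        self._data.move_to_end(key)  # mark as recently used
        return value

    def put(self, key: str, value) -> None:
        self._data[key] = (time.monotonic() + self.ttl, value)
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
```

The 60-second TTL bounds staleness after a quota or shard change, which is why quota updates in the design above still propagate within about a minute even with this cache in front.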

API Design

  • POST /v1/tenants — provision a new tenant; returns tenant_id and initial credentials.
  • GET /v1/tenants/{tenant_id} — retrieve tenant metadata and current status.
  • PATCH /v1/tenants/{tenant_id}/quota — update resource quota limits (operator only).
  • GET /v1/tenants/{tenant_id}/usage — return current period usage vs. quota for all resource types.
  • DELETE /v1/tenants/{tenant_id} — initiate offboarding: suspend access, schedule data deletion after retention period.
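The status values these endpoints move a tenant through imply a small state machine. The allowed transitions below are an assumption consistent with the lifecycle described above (provisioning completes to active, offboarding is terminal), not a spec from the source:

```python
# Illustrative guard for tenant lifecycle transitions.
ALLOWED = {
    "PROVISIONING": {"ACTIVE"},
    "ACTIVE": {"SUSPENDED", "OFFBOARDING"},
    "SUSPENDED": {"ACTIVE", "OFFBOARDING"},
    "OFFBOARDING": set(),  # terminal: data deletion is already scheduled
}

def transition(current: str, new: str) -> str:
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```

Enforcing this in one place keeps the DELETE handler from racing a re-activation: once a tenant reaches OFFBOARDING, no API call can move it back.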

Observability

  • Track per-tenant latency percentiles to detect when one tenant is being degraded by neighbors even within quota limits.
  • Alert when any tenant exceeds 80% of its storage or RPS quota, giving the customer success team time to reach out before a hard limit is hit.
  • Monitor RLS policy evaluation errors — any error here is a potential isolation breach and must page the security on-call immediately.

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What are the trade-offs between schema-per-tenant and row-level security for tenant isolation?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Schema-per-tenant provides strong isolation and independent migration capability at the cost of connection pool explosion and higher operational overhead as tenant count grows. Row-level security (RLS) uses a single schema with a tenant_id predicate enforced by the database, enabling better resource sharing and simpler ops, but risks misconfiguration that could expose cross-tenant data and makes per-tenant schema changes harder."
      }
    },
    {
      "@type": "Question",
      "name": "How is data partitioning designed in a multi-tenant service?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Tenant ID is used as the leading partition key in horizontal sharding (consistent hashing or range-based) so all data for a tenant lands on the same shard, enabling efficient tenant-scoped queries and simple data deletion. Large tenants may be assigned dedicated shards while small tenants are co-located. Partition metadata is maintained in a routing table consulted by the application layer."
      }
    },
    {
      "@type": "Question",
      "name": "How is resource quota enforcement implemented per tenant?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Each tenant is assigned quota limits (API rate, storage bytes, compute seconds) stored in a configuration store. A quota-enforcement middleware checks a per-tenant token bucket (in Redis) on each request. Background jobs track slow-moving metrics (storage, monthly API usage) against quota and emit events when thresholds are approached, allowing soft warnings before hard limits kick in."
      }
    },
    {
      "@type": "Question",
      "name": "How do you prevent the noisy neighbor problem in a multi-tenant service?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Noisy neighbor prevention combines rate limiting at the API gateway, CPU and I/O cgroup limits at the container level, database query timeouts and connection limits per tenant, and queue-based work isolation where each tenant's jobs run in separate priority queues. Monitoring detects tenants consuming disproportionate resources so they can be migrated to dedicated infrastructure."
      }
    }
  ]
}

