The CAP theorem comes up in almost every senior-level system design interview. Interviewers ask it because it forces you to make real trade-offs — there is no universally correct answer. The question is usually framed as: “Is your design CP or AP? What do you sacrifice, and under what conditions?”
Strategy
The CAP theorem is often misunderstood. Before explaining what each letter means, clear up the common misconception: you don’t get to choose partition tolerance. Network partitions happen in any distributed system — routers fail, cables get cut, data centers lose connectivity. You can’t opt out. So the real trade-off is: when a partition occurs, do you sacrifice consistency or availability?
That framing is much more useful in an interview than memorizing definitions.
The Three Properties
Consistency (C)
Every read receives the most recent write, or an error. All nodes see the same data at the same time. If you write a value to node A, any subsequent read from node B returns that same value (or the system refuses to answer).
Availability (A)
Every request receives a response — not necessarily the most recent data, but a response. The system stays up even if some nodes fail.
Partition Tolerance (P)
The system continues operating even when network communication between nodes fails. Messages between nodes can be lost or delayed arbitrarily.
The Actual Trade-off: CP vs. AP
Since partitions are a given in any real distributed system, the theorem reduces to:
- CP systems — when a partition occurs, they refuse requests (or return an error) rather than risk serving stale data. They sacrifice availability to preserve consistency.
- AP systems — when a partition occurs, they serve potentially stale data rather than refuse requests. They sacrifice consistency to preserve availability.
CP Examples
Apache Zookeeper: Zookeeper is a coordination service — it stores leader election state, configuration, and distributed locks. If a partition splits the cluster, the minority partition stops accepting writes and returns errors. Wrong configuration state is worse than no response at all.
HBase: Strongly consistent reads and writes. If a region server fails or is partitioned away, its regions become unavailable until they are reassigned and recovered: consistency is preserved at the cost of availability.
Traditional relational databases (single node): A single Postgres instance is trivially consistent — but it’s not distributed, so partition tolerance doesn’t meaningfully apply.
AP Examples
Apache Cassandra: Cassandra stays up and accepts reads/writes during a partition, but different nodes may temporarily disagree on the latest value. It uses “eventual consistency” — nodes sync up after the partition heals. You tune the trade-off per-query with consistency levels (ONE, QUORUM, ALL).
Amazon DynamoDB (default settings): Eventual consistency by default. Strongly consistent reads are available but cost 2× the read units.
CouchDB: Designed around availability and partition tolerance. Uses multi-version concurrency control and syncs changes when partitions heal.
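These tunable consistency levels come down to quorum arithmetic: with N replicas, W write acknowledgments, and R read responses, any W + R > N configuration guarantees that a read overlaps at least one replica that saw the latest write. A minimal toy model in Python (an illustration of the arithmetic only, not any database's actual implementation):

```python
# Toy replica model: N replicas each hold a (value, version) pair.
# A write returns after W replicas ack; a read queries R replicas and
# returns the highest-version value it sees.

class ReplicaSet:
    def __init__(self, n: int = 3):
        self.replicas = [(None, 0)] * n   # (value, version) per replica

    def write(self, value, w: int) -> None:
        version = max(v for _, v in self.replicas) + 1
        for i in range(w):                # only the first W replicas ack in time
            self.replicas[i] = (value, version)

    def read(self, r: int):
        # Worst case: query the R replicas least likely to have seen the write
        return max(self.replicas[-r:], key=lambda pair: pair[1])[0]

rs = ReplicaSet(n=3)
rs.write("v1", w=2)        # QUORUM write: 2 of 3 replicas ack
latest = rs.read(r=2)      # QUORUM read: W + R = 4 > 3, overlap guaranteed
stale = rs.read(r=1)       # ONE read: may hit the replica that missed the write
```

With N = 3, the QUORUM write plus QUORUM read always overlaps on an up-to-date replica (`latest` is "v1"), while the ONE read can return the stale replica's value. Lowering W and R buys availability and latency at the price of exactly this kind of staleness.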
Eventual Consistency vs. Strong Consistency
AP systems often use eventual consistency: after a partition heals, all nodes will converge to the same state — eventually. How eventual depends on the system.
Practical implications:
- A user updates their profile picture. Another user might see the old picture for a few seconds. Fine for social media; not fine for bank balances.
- A shopping cart add can be handled eventually consistent — if two devices add items at the same time, merge them. The alternative (rejecting one add during a partition) is worse for the user.
- Inventory counts for flash sales need strong consistency — overselling is a real business problem.
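The cart-merge idea above can be sketched as a simple merge policy. The policy here (union of items, larger quantity wins, so a user's add is never silently dropped) is one reasonable assumption for illustration, not the only choice:

```python
# Merge two device-local carts after a partition heals.
# Policy (an assumption for this sketch): keep the union of items and,
# on a quantity conflict, keep the larger count rather than drop an add.

def merge_carts(a: dict, b: dict) -> dict:
    merged = dict(a)
    for item, qty in b.items():
        merged[item] = max(merged.get(item, 0), qty)
    return merged

phone_cart = {"book": 1, "pen": 2}
laptop_cart = {"book": 2, "mug": 1}   # added while the devices were partitioned
result = merge_carts(phone_cart, laptop_cart)
# result: {"book": 2, "pen": 2, "mug": 1}
```

Note the merge is commutative and idempotent, which is exactly what lets both replicas converge to the same cart regardless of sync order — the same property CRDTs formalize.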
PACELC — The Practical Extension
CAP only describes behavior during a partition. But what about when there’s no partition? That’s where PACELC comes in:
If there is a Partition (P), trade off Availability (A) vs. Consistency (C); Else (E), trade off Latency (L) vs. Consistency (C).
Even without a partition, a strongly consistent system has higher latency — it must confirm writes across multiple nodes before acknowledging. DynamoDB is PA/EL (available during partitions, low-latency else). Spanner is PC/EC (consistent during partitions, consistent else — but higher latency).
Mentioning PACELC in an interview signals you’ve thought beyond the textbook.
How to Use This in a System Design Interview
When you choose a database, be explicit about the trade-off:
# Decision framework (sketch)
def pick_database(needs_strong_consistency: bool) -> str:
    if needs_strong_consistency:
        # Bank transactions, inventory, distributed locks
        return "CP system: Postgres, Spanner, HBase, Zookeeper"
    # User profiles, social feeds, shopping carts: availability and latency win
    return "AP system: Cassandra, DynamoDB (eventual), CouchDB"
Sample response to “is your design consistent or available?”:
“I’d use Cassandra here with a QUORUM consistency level for writes — that means writes must be acknowledged by a majority of replicas before returning. This gives us a strong-enough consistency guarantee for this use case while staying available if one replica goes down. For a flash sale inventory counter, I’d use a different approach — Redis with Lua scripts for atomic decrements, or an RDBMS with row-level locking.”
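The flash-sale counter in that answer hinges on making the stock check and the decrement a single atomic step. A toy Python model of that guarantee, using a lock to stand in for the server-side atomicity a Redis Lua script provides (illustration only, not Redis client code):

```python
# Toy model of atomic check-and-decrement: with 100 concurrent buyers and
# 10 units of stock, exactly 10 purchases succeed and none oversell.
import threading

class InventoryCounter:
    def __init__(self, stock: int):
        self.stock = stock
        self._lock = threading.Lock()

    def try_buy(self, qty: int = 1) -> bool:
        with self._lock:               # check and decrement happen atomically
            if self.stock >= qty:
                self.stock -= qty
                return True
            return False

inventory = InventoryCounter(stock=10)
results = []
threads = [threading.Thread(target=lambda: results.append(inventory.try_buy()))
           for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
sold = sum(results)                    # exactly 10 succeed; stock never goes negative
```

Without the lock (or the Lua script's single-threaded execution on the Redis side), two buyers could both pass the `stock >= qty` check before either decrement lands — the oversell the strong-consistency choice exists to prevent.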
Common Interview Follow-ups
Q: Can you get all three — C, A, and P?
No. That’s the theorem. In practice, though, modern systems blur the line: DynamoDB lets you choose consistency per operation, and Cassandra with QUORUM writes plus QUORUM reads (so that W + R > N) gives you strongly consistent reads without giving up availability entirely.
Q: Is Cassandra CP or AP?
AP by default, but tunable. At consistency level ALL, a single unreachable replica fails the request, so Cassandra behaves as CP. The interviewer wants to hear that you know it’s tunable, not that you memorized “Cassandra = AP.”
Q: Where would you use a CP system vs. AP?
CP for: financial transactions, distributed locks, leader election, configuration management.
AP for: social feeds, shopping carts, user-generated content, DNS, CDN caches.
Summary
CAP theorem states that a distributed system can’t simultaneously guarantee consistency, availability, and partition tolerance. Since partitions are unavoidable in real networks, the practical choice is CP (consistency under partition) vs. AP (availability under partition). Know which real-world databases fall into each camp and — more importantly — know why you’d choose one over the other for a given use case.
Related System Design Topics
CAP theorem is the theoretical foundation; these are the practical applications:
- Consistent Hashing — how distributed databases route keys to nodes, and rebalance with minimal data movement when nodes join or leave during partitions and failures.
- Database Sharding — sharding strategies and their consistency vs. availability trade-offs map directly onto the CAP spectrum.
- Caching Strategies — caches are inherently AP systems; understanding CAP helps you reason about cache staleness and invalidation.
- Message Queues — async communication via queues is a common strategy for maintaining availability in AP systems.
Also see:
- API Design (REST vs GraphQL vs gRPC) and SQL vs NoSQL — the remaining two system design foundations.
- Design a Web Crawler — the URL frontier chooses availability over strict consistency.
- Design Dropbox / Google Drive — strong consistency for metadata vs eventual consistency for block storage.
- Design a Distributed Key-Value Store — quorum reads (W + R > N) and eventual consistency in practice, with DynamoDB as the reference.
- Design a Payment System — why payment systems choose CP over availability.