System Design Article

CAP Theorem & Trade-offs

Difficulty: Easy

The CAP theorem says any distributed data store must trade off Consistency, Availability, or Partition tolerance during a network split, and you only get to keep two. This lesson cuts through the textbook version with the practical engineer's reading: partitions are non-negotiable, so the real choice is between consistency and availability when the network breaks. We cover what each property actually means, why CAP is misleading without PACELC, and how real systems (MongoDB, DynamoDB, Cassandra, Spanner) place themselves on the spectrum. By the end you can defend a system's CAP choice in an interview without falling into the common 'I picked CA' trap.

CAP Theorem & Trade-offs

System Design

Easy

cap-theorem

distributed-systems

consistency

availability

partition-tolerance

system-design

beginner

free

1,151 views

What is the CAP Theorem?

Proposed by Eric Brewer in 2000 and formally proven by Gilbert and Lynch in 2002, the CAP theorem states that a distributed data store can guarantee at most two of these three properties at the same time:

Consistency (C) - every read returns the most recent write or an error. All nodes see the same data at the same time.
Availability (A) - every request to a non-failing node receives a non-error response. The system is always answering.
Partition tolerance (P) - the system continues to operate even when network messages are dropped or delayed between nodes.

The interview-ready summary: in the presence of a network partition, you must choose between consistency and availability. You cannot have both.

Text

---------- The CAP triangle ----------
              Consistency (C)
                   /\
                  /  \
                 /    \
                / CP   \
               /        \
              /  CA      \
             /  (theory)  \
            /              \
           /     AP         \
          /__________________\
  Availability (A)     Partition Tolerance (P)

Why the 'Pick Two' Framing Is Misleading

The popular reading of CAP (often drawn as a triangle) suggests you can pick any two corners. This is technically true but practically wrong: partitions happen. Networks fail, switches reboot, cables get cut, cloud providers have bad days. Any system running on more than one machine will eventually face a partition.

So the meaningful choice is not 'pick two of three'. It is: when (not if) a partition happens, do you sacrifice C or A? A 'CA' system that gives up partition tolerance is just a single-node system - which is fine for SQLite but not for anything called distributed.

This reframing is due to Eric Brewer himself in his 2012 retrospective: 'During a partition, you choose between consistency and availability. Otherwise you do not have to choose.'

CP Systems: Consistency Over Availability

When the network splits, a CP system stops serving requests rather than return stale or conflicting data. The minority partition refuses writes (and often reads) until it can reach a quorum.

Examples: MongoDB (default config with majority writes), HBase, etcd, ZooKeeper, Google Spanner, CockroachDB, traditional RDBMS in synchronous replication.

Use when:

Money is involved (banking, payments, ledgers).
Inventory must not oversell (e-commerce stock).
Strict ordering matters (auctions, scheduling).
You would rather show 'service unavailable' than 'wrong balance'.

The trade-off: real downtime during partitions. If a network split lasts 30 seconds, a CP system has 30 seconds of unavailability for the minority side.

Text

---------- CP system during a partition ----------
  region A (3 nodes) ---X--- region B (2 nodes)
        majority                minority
        accepts writes          refuses writes
        (still consistent)      (returns 503)

AP Systems: Availability Over Consistency

When the network splits, an AP system keeps accepting reads and writes on every side of the partition, even if that means different sides see different data temporarily. Once the partition heals, the sides reconcile (often via last-write-wins or vector clocks or application logic).

Examples: Amazon DynamoDB (default), Cassandra, Riak, CouchDB, DNS itself, most caching layers (Redis in cluster mode with async replication).

Use when:

The product can tolerate brief inconsistency (social feeds, likes, view counts).
Availability is a hard requirement (shopping cart, session data, IoT telemetry).
The write rate is high and global.
Downtime hurts the business more than a temporarily wrong like-count.

The trade-off: clients can read stale data, and concurrent writes can conflict. The application must be designed to handle that (idempotent operations, CRDTs, application-level conflict resolution).

Text

---------- AP system during a partition ----------
  region A (3 nodes) ---X--- region B (2 nodes)
        keeps writing            keeps writing
        (own view)               (own view)
              \                   /
               partition heals
               reconcile (LWW, vector clocks, CRDTs)

A Decision Matrix

Use case	Better choice	Why
Bank account balance	CP	Showing the wrong balance is unacceptable; brief downtime is.
Social media likes	AP	An off-by-five like count is fine; downtime is not.
Inventory checkout	CP	Overselling 50 units of a flash-sale item is a bigger PR problem than slow checkout.
User shopping cart	AP	Customers want their cart to always work, even if it temporarily diverges.
Distributed lock service	CP	The whole point is to be authoritative; no lock service is better than a wrong one.
Real-time analytics dashboard	AP	A few seconds of stale numbers are fine; an empty dashboard is not.
Configuration / service discovery	CP	A wrong config sent to thousands of nodes is catastrophic.
Click-tracking / event ingestion	AP	Drop none, dedupe later. Availability over consistency.

PACELC: The Honest Extension

CAP is silent about what happens when there is no partition. That is most of the time. For that 99% case, every system still trades latency against consistency: synchronous replication adds a round-trip; asynchronous replication is fast but eventually consistent.

PACELC, proposed by Daniel Abadi in 2010, fills the gap:

If a Partition occurs, choose between Availability and Consistency. Else (no partition), choose between Latency and Consistency.

Notation: a system is described by two letters - what it does during a partition, and what it does normally.

System	PACELC	Reading
Cassandra	PA/EL	During partition: availability. Normally: low latency over consistency.
DynamoDB	PA/EL	Same as Cassandra.
MongoDB (majority)	PC/EC	During partition: consistency. Normally: consistency over latency.
HBase	PC/EC	Same.
Spanner	PC/EC	Same, but uses TrueTime to make 'C' really mean linearizable globally.
MySQL with async replicas	PA/EL	Followers can serve stale reads to keep latency low.

PACELC is a more honest framework. Mention it in interviews to signal you have actually thought past the textbook.

How Real Systems Implement Their Choice

Spanner (PC/EC)

Google Spanner is a globally-distributed CP system that uses TrueTime (synchronized atomic clocks + GPS at every data center) to give linearizable transactions across continents. The cost is latency: a write commits in roughly 10 ms locally and 100 ms globally, because Spanner waits out the clock uncertainty before acknowledging.

DynamoDB (PA/EL by default, optional strong consistency)

DynamoDB picks AP by default: writes are accepted by any replica, replicated asynchronously, and eventually consistent reads are the cheap default. You can opt into strongly consistent reads per-request (twice the read capacity unit cost), but you cannot opt into strongly consistent writes - those still propagate eventually.

Cassandra (tunable per request)

Cassandra exposes consistency as a per-request parameter. With 3 replicas (N=3):

R=1, W=1 -> AP, eventual consistency (fastest).
R=2, W=2 -> R + W > N, linearizable for that key.
R=3, W=3 -> linearizable but no fault tolerance.

The standard production setting for many Cassandra deployments is R=QUORUM, W=QUORUM (here, R=2, W=2 for N=3), giving consistency with one node down.

MongoDB (write concern + read preference)

MongoDB lets you tune the same dial:

w: 1 (default) -> acknowledged by primary only. AP-leaning.
w: majority -> waits for a majority of replicas to confirm. CP-leaning.
readPreference: primary -> always read from primary, strong consistency.
readPreference: secondary -> read from a follower, eventual consistency.

How to Talk About CAP in an Interview

The magic words that signal seniority:

Acknowledge that partitions are not optional. 'In a real distributed system, partition tolerance is a given, so the choice is C vs A during a partition.'
Name the system's PACELC class, not just CP/AP. 'DynamoDB is PA/EL; it picks availability under partition and low latency normally.'
Map the choice to the product requirement. 'For a payment ledger, I would pick CP because oversell is worse than downtime.'
Mention the dial. 'Most modern systems let you tune consistency per-request; the question is what the default is and what the application opts into.'
Acknowledge the cost. 'Strong consistency costs latency (extra round trips) and availability (refuse writes during partition). Eventual costs application complexity (conflict resolution).'

Quick Review

CAP says: during a partition, choose C or A. You cannot have both.
'CA' is not a real category - any distributed system must tolerate partitions.
CP -> stops serving in minority partition; preserves consistency.
AP -> keeps serving everywhere; reconciles inconsistencies later.
PACELC adds: even without a partition, every system trades latency vs consistency.
Pick CP for money, locks, inventory, configuration. Pick AP for feeds, carts, telemetry, analytics.
Real systems give you a per-request dial; defaults matter.

Real-World Examples

How real systems implement this in production

Google Spanner (PC/EC)

Spanner is a globally-distributed SQL database that provides linearizable transactions across continents. It uses TrueTime (synchronized atomic clocks plus GPS at every data center) to bound clock uncertainty, then waits out that uncertainty before committing a write. This gives you strong consistency at global scale.

Trade-off: Strong global consistency costs latency. A write that needs cross-region quorum takes ~100 ms; a read at a stale read-timestamp is faster but trades freshness. You pay TrueTime hardware (atomic clocks) and the wait time for the safety it gives you.

Amazon DynamoDB (PA/EL)

DynamoDB defaults to availability and low latency. Reads are eventually consistent unless you explicitly request a strongly consistent read (which costs 2x read capacity units). Writes are always async-replicated. The design assumes that most workloads (shopping carts, sessions, IoT) can tolerate brief staleness in exchange for single-digit-ms latency at any scale.

Trade-off: You get extreme write availability and predictable low latency, but the application must handle eventual consistency. Conditional writes and atomic counters are provided as primitives so the app does not have to roll its own.

Cassandra (tunable per request)

Cassandra exposes the CAP dial directly. Each query specifies a consistency level (ONE, QUORUM, ALL) for both reads and writes. Production deployments commonly use QUORUM/QUORUM (R + W > N for N=3 means R=2, W=2), which gives strong consistency with one node down. Lighter use cases use ONE for both, sacrificing consistency for speed.

Trade-off: The flexibility shifts the burden to the developer. Choosing the wrong level on the wrong query gives you mysterious bugs (stale reads on write-light queries, write timeouts on read-light queries). Most Cassandra outages trace back to misconfigured consistency levels.

etcd / ZooKeeper (PC/EC)

Distributed coordination services like etcd (used by Kubernetes) and ZooKeeper choose CP. They use Raft (etcd) or ZAB (ZooKeeper) to maintain a strongly consistent log across an odd number of nodes. The minority side of any partition refuses writes and stale reads.

Trade-off: Configuration data must be correct everywhere, so consistency wins over availability. The cluster cannot serve writes if more than half the nodes are unreachable, but a stale Kubernetes config is far worse than a brief control-plane outage.

Quick Interview Phrases

Key terms to use in your answer

consistency vs availability trade-off

partition tolerance is non-negotiable

PACELC

eventual consistency

quorum reads and writes

linearizable

Common Interview Questions

Questions you might be asked about this topic

Explain the CAP theorem and how it applies to a system you would design.

Define each letter precisely. State that partition tolerance is mandatory in any distributed system, so the real choice is C vs A during a partition. Map to a concrete product: 'For a payment system, I would pick CP because incorrect balances are worse than brief downtime; I would use a database like Spanner or PostgreSQL with synchronous replicas.' Bonus: mention PACELC to acknowledge the latency/consistency trade-off when there is no partition.

Is DynamoDB a CP or AP system? Why?

Why is 'CA' (consistent and available without partition tolerance) usually not a meaningful option?

How would you handle a 30-second network partition between your two data centers?

What is PACELC and why is it more useful than CAP?

Interview Tips

How to discuss this topic effectively

Always say 'during a partition' before discussing C vs A. Without that qualifier you sound like you have only read the Wikipedia page.

Name PACELC. It signals you understand the latency/consistency trade-off that applies even without a partition, which is most of the time.

Map the CAP choice to the product. 'Bank ledger: CP because oversell is worse than 30 seconds of downtime' is much stronger than 'CP for ACID'.

Avoid claiming a real system is 'CA'. Any database that runs on multiple nodes must be either CP or AP under a partition.

Mention the per-request dial in modern databases (Cassandra R/W, MongoDB write concern, DynamoDB consistent read). It shows you know the defaults are not the whole story.

Common Mistakes

Pitfalls to avoid in interviews

Treating CAP as 'pick any two of three'

Partitions will happen in any distributed system, so the real choice is C vs A during a partition. Calling a system 'CA' usually means it is not actually distributed (single-node databases like SQLite).

Thinking 'eventual consistency' means data is never consistent

Eventual consistency means that, in the absence of new writes, all replicas will converge to the same value within some bounded time (often milliseconds). It is not 'maybe consistent' - it is a precise model with measurable convergence guarantees.

Assuming SQL means CP and NoSQL means AP

The choice is per-system, not per-paradigm. PostgreSQL with async replicas is AP-leaning; DynamoDB with strong reads is CP-leaning. Spanner is SQL and CP; Cassandra is NoSQL and tunable. Always check the actual configuration.

Believing CP systems have zero downtime

CP systems trade availability for consistency. During a partition, the minority side returns errors. The 'consistency' in CP is achieved by refusing requests, not by magically working through the partition.

Forgetting that latency is part of the CAP picture

Strong consistency requires synchronous replication, which adds round-trip latency on every write. PACELC makes this explicit: even without a partition, you trade latency for consistency. A 'strongly consistent' globally-distributed database is also a high-latency one.

Back to System Design