System Design Article
CAP Theorem & Trade-offs
Difficulty: Easy
The CAP theorem says any distributed data store must trade off Consistency, Availability, or Partition tolerance during a network split, and you only get to keep two. This lesson cuts through the textbook version with the practical engineer's reading: partitions are non-negotiable, so the real choice is between consistency and availability when the network breaks. We cover what each property actually means, why CAP is misleading without PACELC, and how real systems (MongoDB, DynamoDB, Cassandra, Spanner) place themselves on the spectrum. By the end you can defend a system's CAP choice in an interview without falling into the common 'I picked CA' trap.
CAP Theorem & Trade-offs
The CAP theorem says any distributed data store must trade off Consistency, Availability, or Partition tolerance during a network split, and you only get to keep two. This lesson cuts through the textbook version with the practical engineer's reading: partitions are non-negotiable, so the real choice is between consistency and availability when the network breaks. We cover what each property actually means, why CAP is misleading without PACELC, and how real systems (MongoDB, DynamoDB, Cassandra, Spanner) place themselves on the spectrum. By the end you can defend a system's CAP choice in an interview without falling into the common 'I picked CA' trap.
1,151 views
4
What is the CAP Theorem?
Proposed by Eric Brewer in 2000 and formally proven by Gilbert and Lynch in 2002, the CAP theorem states that a distributed data store can guarantee at most two of these three properties at the same time:
- Consistency (C) - every read returns the most recent write or an error. All nodes see the same data at the same time.
- Availability (A) - every request to a non-failing node receives a non-error response. The system is always answering.
- Partition tolerance (P) - the system continues to operate even when network messages are dropped or delayed between nodes.
The interview-ready summary: in the presence of a network partition, you must choose between consistency and availability. You cannot have both.
---------- The CAP triangle ----------
Consistency (C)
/\
/ \
/ \
/ CP \
/ \
/ CA \
/ (theory) \
/ \
/ AP \
/__________________\
Availability (A) Partition Tolerance (P)Why the 'Pick Two' Framing Is Misleading
The popular reading of CAP (often drawn as a triangle) suggests you can pick any two corners. This is technically true but practically wrong: partitions happen. Networks fail, switches reboot, cables get cut, cloud providers have bad days. Any system running on more than one machine will eventually face a partition.
So the meaningful choice is not 'pick two of three'. It is: when (not if) a partition happens, do you sacrifice C or A? A 'CA' system that gives up partition tolerance is just a single-node system - which is fine for SQLite but not for anything called distributed.
This reframing is due to Eric Brewer himself in his 2012 retrospective: 'During a partition, you choose between consistency and availability. Otherwise you do not have to choose.'
CP Systems: Consistency Over Availability
When the network splits, a CP system stops serving requests rather than return stale or conflicting data. The minority partition refuses writes (and often reads) until it can reach a quorum.
Examples: MongoDB (default config with majority writes), HBase, etcd, ZooKeeper, Google Spanner, CockroachDB, traditional RDBMS in synchronous replication.
Use when:
- Money is involved (banking, payments, ledgers).
- Inventory must not oversell (e-commerce stock).
- Strict ordering matters (auctions, scheduling).
- You would rather show 'service unavailable' than 'wrong balance'.
The trade-off: real downtime during partitions. If a network split lasts 30 seconds, a CP system has 30 seconds of unavailability for the minority side.
---------- CP system during a partition ----------
region A (3 nodes) ---X--- region B (2 nodes)
majority minority
accepts writes refuses writes
(still consistent) (returns 503)AP Systems: Availability Over Consistency
When the network splits, an AP system keeps accepting reads and writes on every side of the partition, even if that means different sides see different data temporarily. Once the partition heals, the sides reconcile (often via last-write-wins or vector clocks or application logic).
Examples: Amazon DynamoDB (default), Cassandra, Riak, CouchDB, DNS itself, most caching layers (Redis in cluster mode with async replication).
Use when:
- The product can tolerate brief inconsistency (social feeds, likes, view counts).
- Availability is a hard requirement (shopping cart, session data, IoT telemetry).
- The write rate is high and global.
- Downtime hurts the business more than a temporarily wrong like-count.
The trade-off: clients can read stale data, and concurrent writes can conflict. The application must be designed to handle that (idempotent operations, CRDTs, application-level conflict resolution).
---------- AP system during a partition ----------
region A (3 nodes) ---X--- region B (2 nodes)
keeps writing keeps writing
(own view) (own view)
\ /
partition heals
reconcile (LWW, vector clocks, CRDTs)A Decision Matrix
| Use case | Better choice | Why |
|---|---|---|
| Bank account balance | CP | Showing the wrong balance is unacceptable; brief downtime is. |
| Social media likes | AP | An off-by-five like count is fine; downtime is not. |
| Inventory checkout | CP | Overselling 50 units of a flash-sale item is a bigger PR problem than slow checkout. |
| User shopping cart | AP | Customers want their cart to always work, even if it temporarily diverges. |
| Distributed lock service | CP | The whole point is to be authoritative; no lock service is better than a wrong one. |
| Real-time analytics dashboard | AP | A few seconds of stale numbers are fine; an empty dashboard is not. |
| Configuration / service discovery | CP | A wrong config sent to thousands of nodes is catastrophic. |
| Click-tracking / event ingestion | AP | Drop none, dedupe later. Availability over consistency. |
PACELC: The Honest Extension
CAP is silent about what happens when there is no partition. That is most of the time. For that 99% case, every system still trades latency against consistency: synchronous replication adds a round-trip; asynchronous replication is fast but eventually consistent.
PACELC, proposed by Daniel Abadi in 2010, fills the gap:
If a Partition occurs, choose between Availability and Consistency. Else (no partition), choose between Latency and Consistency.
Notation: a system is described by two letters - what it does during a partition, and what it does normally.
| System | PACELC | Reading |
|---|---|---|
| Cassandra | PA/EL | During partition: availability. Normally: low latency over consistency. |
| DynamoDB | PA/EL | Same as Cassandra. |
| MongoDB (majority) | PC/EC | During partition: consistency. Normally: consistency over latency. |
| HBase | PC/EC | Same. |
| Spanner | PC/EC | Same, but uses TrueTime to make 'C' really mean linearizable globally. |
| MySQL with async replicas | PA/EL | Followers can serve stale reads to keep latency low. |
PACELC is a more honest framework. Mention it in interviews to signal you have actually thought past the textbook.
How Real Systems Implement Their Choice
Spanner (PC/EC)
Google Spanner is a globally-distributed CP system that uses TrueTime (synchronized atomic clocks + GPS at every data center) to give linearizable transactions across continents. The cost is latency: a write commits in roughly 10 ms locally and 100 ms globally, because Spanner waits out the clock uncertainty before acknowledging.
DynamoDB (PA/EL by default, optional strong consistency)
DynamoDB picks AP by default: writes are accepted by any replica, replicated asynchronously, and eventually consistent reads are the cheap default. You can opt into strongly consistent reads per-request (twice the read capacity unit cost), but you cannot opt into strongly consistent writes - those still propagate eventually.
Cassandra (tunable per request)
Cassandra exposes consistency as a per-request parameter. With 3 replicas (N=3):
R=1, W=1-> AP, eventual consistency (fastest).R=2, W=2-> R + W > N, linearizable for that key.R=3, W=3-> linearizable but no fault tolerance.
The standard production setting for many Cassandra deployments is R=QUORUM, W=QUORUM (here, R=2, W=2 for N=3), giving consistency with one node down.
MongoDB (write concern + read preference)
MongoDB lets you tune the same dial:
w: 1(default) -> acknowledged by primary only. AP-leaning.w: majority-> waits for a majority of replicas to confirm. CP-leaning.readPreference: primary-> always read from primary, strong consistency.readPreference: secondary-> read from a follower, eventual consistency.
How to Talk About CAP in an Interview
The magic words that signal seniority:
- Acknowledge that partitions are not optional. 'In a real distributed system, partition tolerance is a given, so the choice is C vs A during a partition.'
- Name the system's PACELC class, not just CP/AP. 'DynamoDB is PA/EL; it picks availability under partition and low latency normally.'
- Map the choice to the product requirement. 'For a payment ledger, I would pick CP because oversell is worse than downtime.'
- Mention the dial. 'Most modern systems let you tune consistency per-request; the question is what the default is and what the application opts into.'
- Acknowledge the cost. 'Strong consistency costs latency (extra round trips) and availability (refuse writes during partition). Eventual costs application complexity (conflict resolution).'
Quick Review
- CAP says: during a partition, choose C or A. You cannot have both.
- 'CA' is not a real category - any distributed system must tolerate partitions.
- CP -> stops serving in minority partition; preserves consistency.
- AP -> keeps serving everywhere; reconciles inconsistencies later.
- PACELC adds: even without a partition, every system trades latency vs consistency.
- Pick CP for money, locks, inventory, configuration. Pick AP for feeds, carts, telemetry, analytics.
- Real systems give you a per-request dial; defaults matter.
Real-World Examples
How real systems implement this in production
Spanner is a globally-distributed SQL database that provides linearizable transactions across continents. It uses TrueTime (synchronized atomic clocks plus GPS at every data center) to bound clock uncertainty, then waits out that uncertainty before committing a write. This gives you strong consistency at global scale.
Trade-off: Strong global consistency costs latency. A write that needs cross-region quorum takes ~100 ms; a read at a stale read-timestamp is faster but trades freshness. You pay TrueTime hardware (atomic clocks) and the wait time for the safety it gives you.
DynamoDB defaults to availability and low latency. Reads are eventually consistent unless you explicitly request a strongly consistent read (which costs 2x read capacity units). Writes are always async-replicated. The design assumes that most workloads (shopping carts, sessions, IoT) can tolerate brief staleness in exchange for single-digit-ms latency at any scale.
Trade-off: You get extreme write availability and predictable low latency, but the application must handle eventual consistency. Conditional writes and atomic counters are provided as primitives so the app does not have to roll its own.
Cassandra exposes the CAP dial directly. Each query specifies a consistency level (ONE, QUORUM, ALL) for both reads and writes. Production deployments commonly use QUORUM/QUORUM (R + W > N for N=3 means R=2, W=2), which gives strong consistency with one node down. Lighter use cases use ONE for both, sacrificing consistency for speed.
Trade-off: The flexibility shifts the burden to the developer. Choosing the wrong level on the wrong query gives you mysterious bugs (stale reads on write-light queries, write timeouts on read-light queries). Most Cassandra outages trace back to misconfigured consistency levels.
Distributed coordination services like etcd (used by Kubernetes) and ZooKeeper choose CP. They use Raft (etcd) or ZAB (ZooKeeper) to maintain a strongly consistent log across an odd number of nodes. The minority side of any partition refuses writes and stale reads.
Trade-off: Configuration data must be correct everywhere, so consistency wins over availability. The cluster cannot serve writes if more than half the nodes are unreachable, but a stale Kubernetes config is far worse than a brief control-plane outage.
Quick Interview Phrases
Key terms to use in your answer
Common Interview Questions
Questions you might be asked about this topic
Define each letter precisely. State that partition tolerance is mandatory in any distributed system, so the real choice is C vs A during a partition. Map to a concrete product: 'For a payment system, I would pick CP because incorrect balances are worse than brief downtime; I would use a database like Spanner or PostgreSQL with synchronous replicas.' Bonus: mention PACELC to acknowledge the latency/consistency trade-off when there is no partition.
DynamoDB is AP by default (PA/EL in PACELC). Writes are accepted by any replica and replicated asynchronously; reads default to eventually consistent. You can opt into strongly consistent reads per-request at twice the cost, but writes remain async. The design favors availability and low latency over global consistency, which fits its target use cases (high-throughput key-value workloads, shopping carts, sessions).
Networks fail. Any system running on more than one machine will eventually face a partition (cable cut, switch reboot, cloud-provider issue). A 'CA' system is one that ignores partitions, which means either (a) it runs on a single machine (e.g., SQLite), or (b) it claims to be CA but actually fails badly under partition (split brain, data corruption). Practically, any distributed database is CP or AP.
Decision depends on the product. CP path: minority data center stops serving writes, returns 503; clients are routed to the majority side; on heal, minority catches up via replication. AP path: both sides keep accepting writes; on heal, conflicts are resolved (last-write-wins, vector clocks, or application-level merge). Discuss detection (health checks, heartbeats), failover (automatic vs manual), and reconciliation. For most user-facing systems with tolerable inconsistency, AP is the default.
PACELC extends CAP by addressing what happens when there is no partition - which is most of the time. The framework: 'If Partition, choose Availability or Consistency; Else, choose Latency or Consistency.' This captures the fact that synchronous replication (strong consistency) costs latency on every write, even in the happy path. Naming a system as PA/EL (e.g., Cassandra, DynamoDB) or PC/EC (e.g., MongoDB majority, Spanner) gives a more complete picture than CP/AP alone.
Interview Tips
How to discuss this topic effectively
Always say 'during a partition' before discussing C vs A. Without that qualifier you sound like you have only read the Wikipedia page.
Name PACELC. It signals you understand the latency/consistency trade-off that applies even without a partition, which is most of the time.
Map the CAP choice to the product. 'Bank ledger: CP because oversell is worse than 30 seconds of downtime' is much stronger than 'CP for ACID'.
Avoid claiming a real system is 'CA'. Any database that runs on multiple nodes must be either CP or AP under a partition.
Mention the per-request dial in modern databases (Cassandra R/W, MongoDB write concern, DynamoDB consistent read). It shows you know the defaults are not the whole story.
Common Mistakes
Pitfalls to avoid in interviews
Treating CAP as 'pick any two of three'
Partitions will happen in any distributed system, so the real choice is C vs A during a partition. Calling a system 'CA' usually means it is not actually distributed (single-node databases like SQLite).
Thinking 'eventual consistency' means data is never consistent
Eventual consistency means that, in the absence of new writes, all replicas will converge to the same value within some bounded time (often milliseconds). It is not 'maybe consistent' - it is a precise model with measurable convergence guarantees.
Assuming SQL means CP and NoSQL means AP
The choice is per-system, not per-paradigm. PostgreSQL with async replicas is AP-leaning; DynamoDB with strong reads is CP-leaning. Spanner is SQL and CP; Cassandra is NoSQL and tunable. Always check the actual configuration.
Believing CP systems have zero downtime
CP systems trade availability for consistency. During a partition, the minority side returns errors. The 'consistency' in CP is achieved by refusing requests, not by magically working through the partition.
Forgetting that latency is part of the CAP picture
Strong consistency requires synchronous replication, which adds round-trip latency on every write. PACELC makes this explicit: even without a partition, you trade latency for consistency. A 'strongly consistent' globally-distributed database is also a high-latency one.
