CAP, PACELC, and the Trade-off People Misquote

CAP is a real theorem about a narrow edge case. PACELC is the framing that captures the trade-off teams actually make in production.

By @calebhadid

May 8, 2026

Updated May 18, 2026

When I interview senior backend candidates and CAP comes up, I hear a version of this answer about half the time: "CAP says you can pick two of consistency, availability, and partition tolerance." That sentence is wrong in three different ways at once, and the wrongness matters because the people who repeat it then go on to design systems based on it.

This article is the version of the conversation I would rather have. My stance: CAP is a real theorem about a narrow worst-case scenario, but it is misquoted so often that PACELC is the more useful framing for actual production decisions. PACELC is the part most articles skip, and it is the part that captures the choice teams actually make.

What CAP actually says

CAP, conjectured by Eric Brewer and later formalized and proved by Seth Gilbert and Nancy Lynch, makes a precise claim: in a distributed system, when a network partition happens, you cannot simultaneously guarantee both linearizable consistency and availability for every request. During a partition, a node that is cut off from its peers must either refuse the request (sacrificing availability) or answer it without checking with the others (potentially sacrificing consistency).

That is the whole theorem. Note three things it does not say:

  1. It does not say you have to give up partition tolerance. You cannot give up partition tolerance; the network can fail whether you like it or not. Saying "I chose CA" is saying "I chose to assume the network is perfect", which is not a design choice; it is a fantasy.
  2. It does not apply when there is no partition. The theorem only constrains behavior during the partition window. When the network is healthy, you can have both consistency and availability.
  3. It does not say "pick two". The CAP triangle drawing where three databases sit on three vertices is the part the internet got hooked on, but it is a teaching aid, not the theorem. The theorem is about a forced trade-off during partitions, not about a permanent personality trait of the system.

The third point is the one I push hardest in interviews. "Pick two" treats CAP like a menu. The theorem is about behavior under an event (a partition), not a menu of features.
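The "behavior under an event" point can be made concrete with a toy node. This is a minimal sketch, not a real replication protocol: the `Node` class, its `prefer` flag, and the `Unavailable` exception are names invented for illustration. The only thing it shows is that the consistency-versus-availability choice changes what the node does only while it is partitioned.

```python
from dataclasses import dataclass
from typing import Optional


class Unavailable(Exception):
    """Raised when a node refuses a request rather than risk a stale answer."""


@dataclass
class Node:
    value: str                 # last value this node has seen
    partitioned: bool = False  # True when the node cannot reach its peers
    prefer: str = "C"          # "C" = refuse when isolated, "A" = answer anyway

    def read(self) -> str:
        if self.partitioned and self.prefer == "C":
            # Consistency first: the node cannot confirm its value is current.
            raise Unavailable("cannot reach peers")
        # Healthy network, or availability first: answer from local state.
        return self.value


def try_read(node: Node) -> Optional[str]:
    """Return the node's answer, or None if it refused the request."""
    try:
        return node.read()
    except Unavailable:
        return None
```

With no partition, both settings return the same answer; the `prefer` flag is invisible until the network fails.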

Why PACELC is the better question

Daniel Abadi added the necessary second clause: PACELC. It reads as: when there is a Partition, the system must choose between Availability and Consistency; Else (when there is no partition), it chooses between Latency and Consistency. The interesting choice in production is usually the second one.

Why? Because partitions are rare in well-run modern infrastructure. They happen, but they account for a small minority of operating time. The decision a system makes during normal operation, the latency-vs-consistency trade-off, dominates user-perceived behavior. CAP only describes the rare-failure path. PACELC describes both the rare-failure path and the common path.

In practice, the latency-vs-consistency choice shows up like this: do you wait for an acknowledgment from a quorum of replicas before returning success (slower writes, stronger consistency), or do you return as soon as the leader has the write and replicate asynchronously (faster writes, possible read-your-writes failure on a different replica)? That is a real product decision. CAP cannot help you make it. PACELC names it directly.
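A rough way to see the cost of that decision is to model write latency as "how long until enough copies have acknowledged". The sketch below assumes a single leader and a list of follower acknowledgment times; the function name and the timings are hypothetical, chosen only to illustrate the shape of the trade-off.

```python
from typing import Sequence


def write_latency_ms(leader_ms: float,
                     follower_ack_ms: Sequence[float],
                     acks_required: int) -> float:
    """Time until `acks_required` copies (leader included) have the write.

    follower_ack_ms[i] is the time, measured from the start of the request,
    at which the leader hears back from follower i.
    """
    copies = len(follower_ack_ms) + 1
    if not 1 <= acks_required <= copies:
        raise ValueError(f"acks_required must be between 1 and {copies}")
    if acks_required == 1:
        # Leader-only ack: replication continues in the background.
        return leader_ms
    # Wait for the (acks_required - 1) fastest followers.
    slowest_needed = sorted(follower_ack_ms)[acks_required - 2]
    return max(leader_ms, slowest_needed)


# Hypothetical timings for one leader and two followers.
leader, followers = 2.0, [12.0, 45.0]
async_ms = write_latency_ms(leader, followers, acks_required=1)     # leader only
majority_ms = write_latency_ms(leader, followers, acks_required=2)  # 2 of 3
all_ms = write_latency_ms(leader, followers, acks_required=3)       # 3 of 3
```

In this model a majority write is bounded by the fastest follower and an all-replica write by the slowest one, which is why waiting for every replica exposes you to tail latency.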

A concrete shape: the same database, different settings

The same database can sit in different PACELC quadrants depending on configuration. Consider a typical leader-follower setup with three replicas:

Configuration A: synchronous replication to majority
  P -> C  (during partition: refuse writes that cannot reach majority)
  E -> C  (during normal: wait for majority ack, ~10-20ms latency floor)
  PACELC: PC/EC

Configuration B: leader-only writes, async replication
  P -> A  (during partition: leader keeps accepting writes, followers diverge)
  E -> L  (during normal: 1-3ms latency, possible stale reads on followers)
  PACELC: PA/EL

In configuration A you have something like Spanner-style behavior: you pay tens of milliseconds per write to keep things consistent, and you refuse writes on any side of a partition that cannot reach a majority. In configuration B you have something closer to MongoDB with a w: 1 write concern and reads from secondaries, or Cassandra at consistency level ONE: faster writes, faster reads, but read-your-writes is not guaranteed.

Real databases let you tune this. Cassandra's CONSISTENCY level lets you pick levels such as ONE, QUORUM, or ALL per query. MongoDB's writeConcern and readConcern are the same idea. DynamoDB offers strongly consistent and eventually consistent reads. Postgres with synchronous replicas behaves like config A; Postgres with async replicas behaves like config B. The PACELC quadrant is not a property of the database; it is a property of the configuration.
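For quorum-tunable stores, the usual rule of thumb is that a read is guaranteed to touch at least one replica holding the latest acknowledged write when the read and write replica counts overlap, that is, when R + W > N. The sketch below applies that rule to the three levels named above. The helper names are my own, and the model is deliberately simplified: it ignores concurrent writes, failed replicas, and features such as hinted handoff.

```python
def replicas_for(level: str, n: int) -> int:
    """Number of replicas that must respond at a given level, out of n."""
    return {"ONE": 1, "QUORUM": n // 2 + 1, "ALL": n}[level]


def read_overlaps_write(write_level: str, read_level: str, n: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return replicas_for(write_level, n) + replicas_for(read_level, n) > n


# With three replicas:
quorum_both = read_overlaps_write("QUORUM", "QUORUM", 3)  # 2 + 2 > 3
one_both = read_overlaps_write("ONE", "ONE", 3)           # 1 + 1 > 3 fails
quorum_one = read_overlaps_write("QUORUM", "ONE", 3)      # 2 + 1 > 3 fails
```

The same cluster lands on the consistency side or the latency side of the Else clause depending purely on the levels the application passes per query.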

The misquote that costs teams real money

The misquote I see most often in design docs: "we chose AP" when describing a system that does not actually have a partition. What that usually means is "we accept eventual consistency for performance reasons." That is a perfectly defensible choice. It just is not what AP means in CAP terms. AP describes behavior during a partition. The choice the team is actually making is in the EL/EC dimension of PACELC.

Why does this matter? Because a team that thinks they have chosen AP is reasoning about partitions, but the bug they will hit is read-your-writes inconsistency on a happy-path day with no partition at all. A user updates their email, refreshes the page, and sees the old email because the read went to a follower that has not replicated yet. That is an EL choice, not a P choice. The team blames the partition that did not happen.

The cost is real: incidents get misdiagnosed, fixes target the wrong layer, and the team's mental model of their system stays wrong. I have sat in postmortems where someone wrote "this was the CAP trade-off catching up to us" and the fix was to change a write concern, which has nothing to do with CAP.
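The email scenario above can be reproduced in a few lines with no partition anywhere in the picture. This is a toy model, assuming one leader, one follower, and replication that runs only when `replicate()` is called; the class and method names are invented for the example.

```python
from typing import Dict, List, Optional, Tuple


class LaggyCluster:
    """One leader and one follower with explicit, delayed replication."""

    def __init__(self) -> None:
        self.leader: Dict[str, str] = {}
        self.follower: Dict[str, str] = {}
        self.pending: List[Tuple[str, str]] = []  # writes not yet on the follower

    def write(self, key: str, value: str) -> None:
        # The leader acknowledges immediately; the follower catches up later.
        self.leader[key] = value
        self.pending.append((key, value))

    def replicate(self) -> None:
        # Apply queued writes to the follower in order.
        for key, value in self.pending:
            self.follower[key] = value
        self.pending.clear()

    def read(self, key: str, from_leader: bool = False) -> Optional[str]:
        source = self.leader if from_leader else self.follower
        return source.get(key)


cluster = LaggyCluster()
cluster.write("email", "old@example.com")
cluster.replicate()
cluster.write("email", "new@example.com")
stale = cluster.read("email")                    # follower has not caught up
fresh = cluster.read("email", from_leader=True)  # leader has the new value
```

The network is healthy the whole time. The stale read comes from the choice to acknowledge before replicating, and from routing the read to the follower.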

What I want every backend engineer to be able to answer

Two questions. If you cannot answer both for a given service, you do not understand its consistency posture.

  1. During a partition, does this service prioritize availability or consistency? That is the P clause. Answer is usually "availability for reads, consistency for writes" or "consistency for everything (we'd rather refuse writes than diverge)".
  2. During normal operation, does this service prioritize latency or consistency? That is the EL/EC clause. Answer is usually "latency for reads (followers can serve stale data), consistency for writes (leader confirms before responding)".

Most production services are PA/EL or PC/EC at the database layer. The interesting middle ground is PA/EC: "during partitions we'll keep accepting writes, but during normal operation we always check the leader." That is what some payment systems aim for, and the implementation is non-trivial.

A shopping cart, walked through PACELC

Imagine a shopping cart service. The product owner wants:

Cart service requirements
  - cart writes (add to cart) must never be lost
  - cart reads should be fast (sub-50ms p99)
  - it is acceptable if a freshly added item takes 1-2 seconds to appear on a different device
  - on a network partition, the user should still be able to add to cart on their primary device

How does that map to PACELC?

The first requirement (no lost writes) is durability, not consistency. Quorum write to a majority handles it. The second requirement (fast reads) is the EL choice: serve reads from a local replica even if it is slightly stale. The third requirement (cross-device delay tolerance) is EL again: the team accepts read-your-writes failures across replicas. The fourth requirement (write availability under partition) is PA.

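So the cart service is PA/EL. That maps to a leader-follower database with majority-acknowledged writes, asynchronous replication to the remaining replicas, and reads served from any replica. Cassandra-style quorum levels, or Postgres with async replicas plus a read-replica-routing layer, both fit. One caveat: a majority-acknowledged write cannot succeed on the minority side of a partition, so the fourth requirement holds only while the user's primary device can reach a majority. Covering the other case means relaxing the write quorum or buffering the write and reconciling it later.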

If the requirements changed, the answer would change. "Cart writes must always be visible on the next read from any device, no exceptions" pushes the system toward PC/EC: stronger consistency, slower reads, refuse writes during partitions. That is what banking services and inventory-counting services tend to choose.
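The mapping from requirements to quadrant in this walkthrough is mechanical enough to write down. The sketch below is only a restatement of the reasoning above; the function and its two boolean parameters are hypothetical names, and real requirements rarely reduce to two flags.

```python
def pacelc_quadrant(must_accept_writes_in_partition: bool,
                    stale_reads_acceptable: bool) -> str:
    """Map two product requirements onto a PACELC label."""
    partition_choice = "PA" if must_accept_writes_in_partition else "PC"
    normal_choice = "EL" if stale_reads_acceptable else "EC"
    return f"{partition_choice}/{normal_choice}"


# The cart as specified: writes stay available, a short delay is tolerated.
cart = pacelc_quadrant(must_accept_writes_in_partition=True,
                       stale_reads_acceptable=True)

# The stricter variant: every read must see the latest write.
strict_cart = pacelc_quadrant(must_accept_writes_in_partition=False,
                              stale_reads_acceptable=False)
```

Here `cart` comes out as PA/EL and `strict_cart` as PC/EC, matching the two cases discussed above.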

Things that are not CAP trade-offs

Three confusions I have run into and pushed back on:

The first is caching. "We added Redis in front of Postgres, so we made a CAP trade-off." No. A cache is a performance optimization with its own consistency model (write-through, write-behind, cache-aside, with TTLs). The cache is not partition behavior; it is a separate question with its own answer.
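To see why a cache is its own consistency question, here is a minimal cache-aside sketch with a TTL. The `CacheAside` class and its injectable `clock` parameter are assumptions made for the example (the clock is injectable only so the behavior can be shown without sleeping); this is not how Redis or any specific client library works.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Read-through on miss, serve from memory until the TTL expires."""

    def __init__(self, load: Callable[[str], str], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load      # fetches the value from the backing store
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]    # hit: may be stale relative to the store
        value = self._load(key)
        self._entries[key] = (value, now + self._ttl_s)
        return value

    def invalidate(self, key: str) -> None:
        # Writers call this after updating the store to shorten staleness.
        self._entries.pop(key, None)


db = {"plan": "free"}
now = [0.0]  # fake clock, in seconds
cache = CacheAside(lambda k: db[k], ttl_s=30.0, clock=lambda: now[0])
first = cache.get("plan")
db["plan"] = "pro"          # store changes, cache is not told
during_ttl = cache.get("plan")
now[0] = 31.0
after_ttl = cache.get("plan")
```

The staleness window here is set by the TTL and the invalidation policy, not by anything the network does.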

The second is replication lag. "We have read replicas, so we are AP." No, you are EL. Replication lag is a normal-operation latency-consistency trade-off. CAP only constrains you during a partition.

The third is multi-region writes. "We do active-active across regions, so we are AP." Closer, but not quite. Active-active multi-region is genuinely a place where partitions happen often (cross-region links fail more than intra-region links), so the P clause is exercised. But the trade-off the team made is usually still EL on the within-region path; the cross-region replication adds an extra dimension that PACELC does not fully capture, and you might want a finer-grained model like "causal consistency" or "bounded staleness".

Where the theorem is genuinely useful

I want to be fair to CAP. There are cases where the theorem captures a real, sharp trade-off:

In a leader-elected system where the leader is on one side of a partition and a quorum of followers is on the other side, the followers can either elect a new leader (sacrificing the writes the old leader was accepting) or refuse to elect a new leader (sacrificing availability for the half that wanted to elect). That is a CAP moment. Raft and Paxos are designed around exactly this dilemma.
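The rule that makes this dilemma tractable in majority-based protocols is that only a side holding a strict majority of voters may elect a leader or commit writes. The sketch below shows just that counting rule, with a function name of my own choosing; it leaves out terms, logs, and everything else a real Raft or Paxos implementation needs.

```python
def has_majority(reachable_voters: int, cluster_size: int) -> bool:
    """True if this side of a partition holds a strict majority of voters."""
    return reachable_voters > cluster_size // 2


# A five-node cluster split 2 / 3, with the old leader on the two-node side.
old_leader_side = has_majority(2, 5)  # cannot commit new writes
follower_side = has_majority(3, 5)    # may elect a new leader

# An even split of a four-node cluster leaves neither side with a majority.
even_split = has_majority(2, 4)
```

Because two disjoint sides can never both hold a strict majority, at most one side makes progress, and the other side pays with availability.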

In a distributed lock service, partition behavior is the entire game. If a node thinks it holds the lock but cannot reach the coordinator, does it keep holding (consistency at the cost of liveness) or release (availability at the cost of consistency)? That is a CAP moment.
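One common way to take the "release" branch is a lease: the holder treats the lock as valid only until a deadline and gives it up if it cannot renew in time. The sketch below is a simplified illustration with invented names. It assumes the holder's clock and the coordinator's clock advance at roughly the same rate, and it omits safeguards such as fencing tokens that production lock services typically add.

```python
import time
from typing import Callable


class Lease:
    """A lock that is only valid until its lease expires on the holder's clock."""

    def __init__(self, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._expires_at = clock() + ttl_s

    def held(self) -> bool:
        return self._clock() < self._expires_at

    def renew(self, coordinator_reachable: bool) -> bool:
        # Renewal needs the coordinator; an expired lease cannot be revived.
        if coordinator_reachable and self.held():
            self._expires_at = self._clock() + self._ttl_s
            return True
        return False


now = [0.0]  # fake clock, in seconds
lease = Lease(ttl_s=10.0, clock=lambda: now[0])
now[0] = 5.0
renewed = lease.renew(coordinator_reachable=True)       # deadline moves to 15.0
now[0] = 14.0
renewed_in_partition = lease.renew(coordinator_reachable=False)
still_held = lease.held()                               # deadline not reached yet
now[0] = 15.0
held_after_expiry = lease.held()
```

Once the partition outlasts the lease, the holder stops acting as the owner on its own, which is what lets the coordinator hand the lock to someone else.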

In a globally-replicated database with strong consistency goals (Spanner-style), the synchronous replication strategy is a direct response to CAP: pay the latency cost to avoid the partition-time inconsistency. That is a CAP-driven design.

These are real cases. They just do not describe most of the services most of us work on, which is why the misquote rate is so high.

How I would frame the question in a design review

Three questions I write into design templates I review:

  1. What is this service's consistency posture during normal operation? Answer in EL/EC terms, not CAP.
  2. What is its consistency posture during a partition? Answer in PA/PC terms.
  3. What user-visible behavior would change if we flipped the EL/EC choice? Make the team articulate the real trade-off, not the abstract one.

The third question is the one that usually shakes loose a wrong assumption. A team that says "we chose eventual consistency" but cannot describe what user-visible bug that would cause has not actually thought about it. A team that can describe it, and can name the user populations affected, has thought about it and the answer is much more likely to be right.

The position I would defend

CAP is a real theorem about a real edge case. PACELC is a more honest framing for production systems because it surfaces the latency-versus-consistency trade-off that dominates normal operation. Most teams that say "we chose AP" are actually choosing in the EL dimension and not the P dimension; the misquote is so widespread that it is a reliable signal of unfamiliarity with the literature. If I am hiring for senior backend roles, the candidate who can describe their service's PACELC quadrant in concrete terms (with numbers, with the actual replication setup, with the user-visible failure modes during partition vs during normal operation) is a stronger candidate than the one who can recite the CAP triangle from memory. Memorize the triangle for trivia; reach for PACELC when designing the system.
