Community Article

SQL vs NoSQL: Stop Asking the Wrong Question

The choice that matters is not the data model. It is which guarantees you need on read and write. A decision table and the JSONB middle ground that retired half my Mongo use cases.

SQL vs NoSQL: Stop Asking the Wrong Question

The choice that matters is not the data model. It is which guarantees you need on read and write. A decision table and the JSONB middle ground that retired half my Mongo use cases.

database

sql

nosql

data-modeling

scalability

By @tylerperry

April 20, 2026

Updated May 18, 2026

967 views

4.3 (14)

I have been on too many architecture review calls that opened with someone announcing "we should use Mongo because our data is unstructured" and ended with that same person shipping a relational schema with five JSON columns. The conversation is almost never about data structure. It is about transactional guarantees, query patterns, write throughput, and operational familiarity. SQL vs NoSQL is the wrong frame because the choice that matters is which guarantees you need, not which data model you ship on top of them.

This article is the question I now lead with on those calls, the one I want every team asking before they pick a data store: what does my workload actually need, and which database engine matches it? The answer might still be "Postgres". It might be "Postgres plus Redis". It might be "DynamoDB". But it will not be "Mongo because our data is unstructured".

The two questions that decide the database

I distill it down to two questions when I help a team pick.

Question one: what are your access patterns? Do you read by primary key only, or do you need to filter, join, aggregate, sort? Do reads come in predictable shapes (one endpoint, one query) or arbitrary shapes (analytics, ad-hoc filtering)? Are queries point lookups, range scans, or full-table aggregations?

Question two: what guarantees do you need on write? Strong consistency across rows, or eventual? Multi-row transactions, or single-row atomic? Synchronous durability, or fire and forget? What is your tolerated data-loss window if a node dies right now?

Those two questions narrow the choice more than any "SQL vs NoSQL" list ever did. The data model (rows vs documents vs wide columns) follows the answers; it does not lead them.

What "NoSQL" actually contains

The term "NoSQL" lumps together at least four very different categories. Pretending they are one is most of the confusion.

Document stores (MongoDB, Couchbase, Firestore) hold JSON-like records, queryable by fields. Schema-less in name, schema-shaped in practice. Single-document atomicity is the strong guarantee; multi-document transactions exist but are an opt-in with a cost.
Key-value stores (Redis, Memcached, RocksDB) take a key, return a value. Sometimes with TTL, lists, sets. No queries beyond key lookup. Tiny per-op latency.
Wide-column stores (Cassandra, ScyllaDB, HBase) are designed for write-heavy workloads with predictable access patterns. You denormalize hard, you query by partition key, you trade query flexibility for horizontal write scalability.
Graph databases (Neo4j, Neptune) optimize for relationship traversal. If your workload is "find all friends-of-friends within three hops", the relational answer is painful and the graph answer is one query.

The first time I saw someone reach for Cassandra to replace a relational schema with multi-row joins, I knew they had made the call from a slide deck and not from their workload. Cassandra does not do joins. It does not do ad-hoc queries. It does write throughput, partitioned by a key you pick in advance, and it does that better than almost anything else. Picking it for the wrong reasons produces a system that is both expensive and constrained.

Where SQL still wins, by default

I default to a relational database (Postgres, specifically) for almost every new system. The reasons stack up faster than people think.

Multi-row ACID transactions. Most product features touch more than one row. Creating an order writes the order, decrements the inventory, inserts a charge. Doing this atomically in Postgres is one BEGIN/COMMIT. Doing it in a document store typically means either wrapping the writes in the document store's multi-document transaction (slower, with restrictions) or designing your way around the absence (idempotency keys, sagas, compensating transactions). The cost of designing around the absence is real engineering time.

Ad-hoc queries. When the product team asks "how many users signed up last month and converted to paid within seven days", you write that as a SQL query and run it. In a document store, the same question requires either a pre-computed aggregation pipeline or a full scan that reads every document. Postgres can answer the question in seconds the first time you ask it, without preparation.

The query planner. Postgres's planner reasons about your data, your indexes, and your statistics, and picks an execution plan. The plan adapts as your data grows. NoSQL datastores typically push that responsibility to the developer: you choose the access path when you design the schema, and changing access patterns means redesigning the data model.

Operational maturity. Postgres has 25+ years of production scars. Replication, backup, failover, monitoring, query tuning, all the operational tooling exists, is well documented, and works. NoSQL operational tooling has gotten much better, but "my Mongo cluster needs a major version upgrade with zero downtime" is still a more interesting weekend than "my Postgres needs a major version upgrade".

Where SQL pushes back

I do not pretend SQL wins everything. Specific workloads where I reach for a non-SQL store:

Caching. Redis is not a primary store; it is a cache and a coordination service. Anything I would query with sub-millisecond latency, where the source of truth lives elsewhere, lives in Redis. Sessions, rate limit counters, leaderboards, deduplication keys, ephemeral state.

High-write time-series workloads. Logs, metrics, IoT events, click streams. The write pattern is append-only, the read pattern is range scans by time, the volume is huge. Postgres can handle a lot here, especially with TimescaleDB, but at very high ingest rates a purpose-built time-series store (InfluxDB, ClickHouse) earns its keep.

Schema flexibility for genuinely heterogeneous data. If you are storing user-submitted forms with no consistent shape, where each tenant has their own field set and you cannot control or evolve those shapes, a document store is a reasonable fit. Postgres has JSONB, which covers a lot of this case, but truly schema-less use cases (CMS-style content, varied integrations) are a fair Mongo or DynamoDB use.

Massive horizontal write scale. When you genuinely need to write a million rows per second across multiple regions, Cassandra-style wide-column stores win. The number of products that genuinely need that level is small, but they exist (telemetry, ad serving, very large social networks).

The JSONB middle ground

The single biggest reason I have stopped reaching for Mongo as often as I used to is JSONB in Postgres. JSONB is a binary JSON column type with indexable paths, GIN indexes for full-document search, and full SQL access. You get document storage when you need it, with relational guarantees and SQL queries surrounding it.

CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    user_id INT NOT NULL REFERENCES users(id),
    payload JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX events_user_idx ON events(user_id, created_at DESC);
CREATE INDEX events_payload_gin ON events USING GIN (payload);

-- query both relational and document-shaped fields together
SELECT id, payload->>'event_name', payload->'metadata'->>'source'
FROM events
WHERE user_id = 42
  AND created_at >= NOW() - INTERVAL '7 days'
  AND payload @> '{"event_name": "signup"}';

This shape covers most workloads where teams reach for "document database". Per-tenant custom fields, varied event payloads, configuration objects with unknown shapes. You get the relational joins and the ACID transactions and the SQL queries. You also get JSONB without giving up indexes or the planner. The combination is hard to match elsewhere.

The decision table I actually use

Distilled to a small table:

Workload	My pick	Why
Standard product CRUD with relations	Postgres	Joins, transactions, planner, JSONB for the soft parts
Cache, sessions, ephemeral state	Redis	Sub-ms latency, TTL, structures
Logs, metrics, time-series	ClickHouse / Timescale	Write-optimized, range-scan queries, columnar compression
Single-region high-write key-value	DynamoDB / Postgres + partitioning	Horizontal write scale; managed scaling on Dynamo
Heavy graph traversals	Neo4j	Relationship traversal as a primitive
Search-heavy text queries	Postgres FTS / Elasticsearch	Postgres for moderate volumes, Elastic when relevance ranking matters
Multi-region writes with eventual consistency	DynamoDB / Cassandra	Designed for it; Postgres is single-primary

Mistakes I keep seeing on review

Three patterns that keep showing up in code review and architecture sessions:

One: picking Mongo because the team thinks JSON-shaped data needs a JSON store. JSONB exists. Postgres handles documents. The actual question is whether you need cross-document transactions, cross-document joins, ad-hoc queries. If yes, Postgres wins. If no, Mongo can fit, but the choice should be based on the access pattern, not on the shape of the records.

Two: picking DynamoDB without an access pattern document. DynamoDB is great if you know the access patterns up front and design the partition key and sort key around them. Picking it for a workload where queries are still being discovered is asking for a year of pain. Postgres lets you be wrong about access patterns and fix it with an index later. DynamoDB does not.

Three: picking microservices with one database per service "because that is the pattern", then needing a join across services. The boundaries between services were not the boundaries between aggregates. The fix is to put those tables back in the same database, not to invent cross-database transactions. The data model should follow the consistency boundaries, not the deployment topology.

The default that has earned its keep

If I had to pick one database for a green-field product without knowing what the product would become, I would pick Postgres every time, and add Redis when I needed a cache. The combination covers maybe 90% of all workloads I have seen in the last decade. The remaining 10% need a specialty store; you reach for that store when you have evidence (a benchmark, a measured bottleneck, a query pattern that does not fit) that Postgres is the wrong tool for that specific workload.

The asymmetry is what makes the default sticky. The cost of being wrong about Postgres is small: add an index, add a JSONB column, scale up the box, throw a read replica behind a load balancer. The cost of being wrong about a NoSQL choice is much larger: redesign the data model around new access patterns, rewrite the access layer, migrate data while the system is live, retrain the team on a second operational stack. I have lived through both kinds of correction, and one of them is a Tuesday and the other is a quarter.

Open with the workload, not the database

Stop opening architecture calls with the database. Open with the workload. Two questions: what are the access patterns, and what guarantees do you need on write. If the conversation gets to the database before those two answers are clear, the conversation is happening in the wrong order. The database falls out of the answers. The answers do not fall out of the database. That is the frame I wish more teams started from, and the one that has saved me from a long string of regrettable choices on systems I had to maintain for years afterward.

Back to Articles