I have been on too many architecture review calls that opened with someone announcing "we should use Mongo because our data is unstructured" and ended with that same person shipping a relational schema with five JSON columns. The conversation is almost never about data structure. It is about transactional guarantees, query patterns, write throughput, and operational familiarity. SQL vs NoSQL is the wrong frame because the choice that matters is which guarantees you need, not which data model you ship on top of them.
This article is the question I now lead with on those calls, the one I want every team asking before they pick a data store: what does my workload actually need, and which database engine matches it? The answer might still be "Postgres". It might be "Postgres plus Redis". It might be "DynamoDB". But it will not be "Mongo because our data is unstructured".
The two questions that decide the database
I distill it down to two questions when I help a team pick.
Question one: what are your access patterns? Do you read by primary key only, or do you need to filter, join, aggregate, sort? Do reads come in predictable shapes (one endpoint, one query) or arbitrary shapes (analytics, ad-hoc filtering)? Are queries point lookups, range scans, or full-table aggregations?
Question two: what guarantees do you need on write? Strong consistency across rows, or eventual? Multi-row transactions, or single-row atomic? Synchronous durability, or fire and forget? What is your tolerated data-loss window if a node dies right now?
Those two questions narrow the choice more than any "SQL vs NoSQL" list ever did. The data model (rows vs documents vs wide columns) follows the answers; it does not lead them.
What "NoSQL" actually contains
The term "NoSQL" lumps together at least four very different categories. Pretending they are one is most of the confusion.
- Document stores (MongoDB, Couchbase, Firestore) hold JSON-like records, queryable by fields. Schema-less in name, schema-shaped in practice. Single-document atomicity is the strong guarantee; multi-document transactions exist but are an opt-in with a cost.
- Key-value stores (Redis, Memcached, RocksDB) take a key, return a value. Sometimes with TTL, lists, sets. No queries beyond key lookup. Tiny per-op latency.
- Wide-column stores (Cassandra, ScyllaDB, HBase) are designed for write-heavy workloads with predictable access patterns. You denormalize hard, you query by partition key, you trade query flexibility for horizontal write scalability.
- Graph databases (Neo4j, Neptune) optimize for relationship traversal. If your workload is "find all friends-of-friends within three hops", the relational answer is painful and the graph answer is one query.
The first time I saw someone reach for Cassandra to replace a relational schema with multi-row joins, I knew they had made the call from a slide deck and not from their workload. Cassandra does not do joins. It does not do ad-hoc queries. It does write throughput, partitioned by a key you pick in advance, and it does that better than almost anything else. Picking it for the wrong reasons produces a system that is both expensive and constrained.
Where SQL still wins, by default
I default to a relational database (Postgres, specifically) for almost every new system. The reasons stack up faster than people think.
Multi-row ACID transactions. Most product features touch more than one row. Creating an order writes the order, decrements the inventory, inserts a charge. Doing this atomically in Postgres is one BEGIN/COMMIT. Doing it in a document store typically means either wrapping the writes in the document store's multi-document transaction (slower, with restrictions) or designing your way around the absence (idempotency keys, sagas, compensating transactions). The cost of designing around the absence is real engineering time.
Ad-hoc queries. When the product team asks "how many users signed up last month and converted to paid within seven days", you write that as a SQL query and run it. In a document store, the same question requires either a pre-computed aggregation pipeline or a full scan that reads every document. Postgres can answer the question in seconds the first time you ask it, without preparation.
The query planner. Postgres's planner reasons about your data, your indexes, and your statistics, and picks an execution plan. The plan adapts as your data grows. NoSQL datastores typically push that responsibility to the developer: you choose the access path when you design the schema, and changing access patterns means redesigning the data model.
Operational maturity. Postgres has 25+ years of production scars. Replication, backup, failover, monitoring, query tuning, all the operational tooling exists, is well documented, and works. NoSQL operational tooling has gotten much better, but "my Mongo cluster needs a major version upgrade with zero downtime" is still a more interesting weekend than "my Postgres needs a major version upgrade".
Where SQL pushes back
I do not pretend SQL wins everything. Specific workloads where I reach for a non-SQL store:
Caching. Redis is not a primary store; it is a cache and a coordination service. Anything I would query with sub-millisecond latency, where the source of truth lives elsewhere, lives in Redis. Sessions, rate limit counters, leaderboards, deduplication keys, ephemeral state.
High-write time-series workloads. Logs, metrics, IoT events, click streams. The write pattern is append-only, the read pattern is range scans by time, the volume is huge. Postgres can handle a lot here, especially with TimescaleDB, but at very high ingest rates a purpose-built time-series store (InfluxDB, ClickHouse) earns its keep.
Schema flexibility for genuinely heterogeneous data. If you are storing user-submitted forms with no consistent shape, where each tenant has their own field set and you cannot control or evolve those shapes, a document store is a reasonable fit. Postgres has JSONB, which covers a lot of this case, but truly schema-less use cases (CMS-style content, varied integrations) are a fair Mongo or DynamoDB use.
Massive horizontal write scale. When you genuinely need to write a million rows per second across multiple regions, Cassandra-style wide-column stores win. The number of products that genuinely need that level is small, but they exist (telemetry, ad serving, very large social networks).
The JSONB middle ground
The single biggest reason I have stopped reaching for Mongo as often as I used to is JSONB in Postgres. JSONB is a binary JSON column type with indexable paths, GIN indexes for full-document search, and full SQL access. You get document storage when you need it, with relational guarantees and SQL queries surrounding it.
This shape covers most workloads where teams reach for "document database". Per-tenant custom fields, varied event payloads, configuration objects with unknown shapes. You get the relational joins and the ACID transactions and the SQL queries. You also get JSONB without giving up indexes or the planner. The combination is hard to match elsewhere.
The decision table I actually use
Distilled to a small table:
| Workload | My pick | Why |
|---|---|---|
| Standard product CRUD with relations | Postgres | Joins, transactions, planner, JSONB for the soft parts |
| Cache, sessions, ephemeral state | Redis | Sub-ms latency, TTL, structures |
| Logs, metrics, time-series | ClickHouse / Timescale | Write-optimized, range-scan queries, columnar compression |
| Single-region high-write key-value | DynamoDB / Postgres + partitioning | Horizontal write scale; managed scaling on Dynamo |
| Heavy graph traversals | Neo4j | Relationship traversal as a primitive |
| Search-heavy text queries | Postgres FTS / Elasticsearch | Postgres for moderate volumes, Elastic when relevance ranking matters |
| Multi-region writes with eventual consistency | DynamoDB / Cassandra | Designed for it; Postgres is single-primary |
Mistakes I keep seeing on review
Three patterns that keep showing up in code review and architecture sessions:
One: picking Mongo because the team thinks JSON-shaped data needs a JSON store. JSONB exists. Postgres handles documents. The actual question is whether you need cross-document transactions, cross-document joins, ad-hoc queries. If yes, Postgres wins. If no, Mongo can fit, but the choice should be based on the access pattern, not on the shape of the records.
Two: picking DynamoDB without an access pattern document. DynamoDB is great if you know the access patterns up front and design the partition key and sort key around them. Picking it for a workload where queries are still being discovered is asking for a year of pain. Postgres lets you be wrong about access patterns and fix it with an index later. DynamoDB does not.
Three: picking microservices with one database per service "because that is the pattern", then needing a join across services. The boundaries between services were not the boundaries between aggregates. The fix is to put those tables back in the same database, not to invent cross-database transactions. The data model should follow the consistency boundaries, not the deployment topology.
The default that has earned its keep
If I had to pick one database for a green-field product without knowing what the product would become, I would pick Postgres every time, and add Redis when I needed a cache. The combination covers maybe 90% of all workloads I have seen in the last decade. The remaining 10% need a specialty store; you reach for that store when you have evidence (a benchmark, a measured bottleneck, a query pattern that does not fit) that Postgres is the wrong tool for that specific workload.
The asymmetry is what makes the default sticky. The cost of being wrong about Postgres is small: add an index, add a JSONB column, scale up the box, throw a read replica behind a load balancer. The cost of being wrong about a NoSQL choice is much larger: redesign the data model around new access patterns, rewrite the access layer, migrate data while the system is live, retrain the team on a second operational stack. I have lived through both kinds of correction, and one of them is a Tuesday and the other is a quarter.
Open with the workload, not the database
Stop opening architecture calls with the database. Open with the workload. Two questions: what are the access patterns, and what guarantees do you need on write. If the conversation gets to the database before those two answers are clear, the conversation is happening in the wrong order. The database falls out of the answers. The answers do not fall out of the database. That is the frame I wish more teams started from, and the one that has saved me from a long string of regrettable choices on systems I had to maintain for years afterward.
