System Design Article
SQL vs NoSQL - Choosing the Right Database
Difficulty: Easy
SQL vs NoSQL is the most common storage decision in system design interviews. SQL databases give you ACID guarantees, joins, and a fixed relational schema; NoSQL databases give you flexible schemas, horizontal scaling, and specialized data models. This lesson teaches you the four NoSQL families, the real engineering trade-offs, and a clear decision framework so you can defend your database choice in any interview.
SQL vs NoSQL - Choosing the Right Database
SQL vs NoSQL is the most common storage decision in system design interviews. SQL databases give you ACID guarantees, joins, and a fixed relational schema; NoSQL databases give you flexible schemas, horizontal scaling, and specialized data models. This lesson teaches you the four NoSQL families, the real engineering trade-offs, and a clear decision framework so you can defend your database choice in any interview.
525 views
15
What the SQL vs NoSQL Question Really Asks
When an interviewer asks 'SQL or NoSQL?', they are not asking which is better. They are asking whether you understand the trade-offs between a strict relational model with ACID guarantees and a flexible model that scales horizontally.
The wrong answer is 'NoSQL because it scales' or 'SQL because it is reliable'. Both statements are wrong. The right answer always sounds like:
'Given the access patterns (X), the consistency requirements (Y), and the expected scale (Z), I would choose [database] because [specific reason]. The main trade-off I am accepting is [specific trade-off].'
This lesson gives you the vocabulary and the framework to produce that answer for any workload.
SQL: The Relational Model
Core Idea
A SQL database stores data in tables with a fixed schema. Rows in different tables are connected through foreign keys and combined at query time using joins. The schema is enforced by the database, so every row in users must have the columns the schema declares.
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id),
total_cents INT NOT NULL,
status VARCHAR(20) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
-- Get all orders for a user with email
SELECT o.id, o.total_cents, o.status
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.email = '[email protected]';ACID: The Four Guarantees
SQL databases provide ACID transactions. These are the guarantees that make your money safe.
| Letter | Meaning | What it gives you |
|---|---|---|
| A | Atomicity | All statements in a transaction commit together, or none do. No half-applied transfers. |
| C | Consistency | The database moves from one valid state to another. Constraints (foreign keys, unique) are always enforced. |
| I | Isolation | Concurrent transactions do not see each other's partial writes. Configurable via isolation levels. |
| D | Durability | Once a transaction commits, it survives crashes. Written to disk before the client gets the ack. |
A classic ACID example is a bank transfer:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;If the server crashes between the two UPDATE statements, atomicity guarantees neither happens. Money does not disappear.
Strengths of SQL
- Strong consistency by default. The database is the source of truth.
- Joins let you model relationships once and query them many ways. No need to denormalize.
- Mature query optimizer. The database figures out the best execution plan; you just declare the result you want.
- Schema enforcement catches application bugs at the storage layer.
- 30+ years of tooling - migrations, ORMs, BI tools, observability.
Weaknesses of SQL
- Vertical scaling first. Adding read replicas is straightforward; sharding writes is painful and usually application-controlled.
- Schema migrations on huge tables can lock writes (mitigated by tools like
pt-online-schema-changeor PostgreSQL'sCONCURRENTLYoperations, but still operationally heavy). - Joins across shards do not work. Once you shard a SQL database, you give up the convenience of
JOIN.
Popular SQL Engines
- PostgreSQL: extensible, JSON support, strong consistency, the default for most new projects.
- MySQL / MariaDB: extremely popular, great read scaling, weaker default isolation.
- CockroachDB / Spanner / YugabyteDB: 'NewSQL' - SQL with horizontal scaling and global consistency.
- SQLite: embedded, single-file, perfect for edge and local-first apps.
NoSQL: Four Families
'NoSQL' is not one thing. It is an umbrella over four very different data models. You must know all four because the interviewer might ask you to pick between them.
1. Key-Value Stores
The simplest model: a giant distributed hash map. You store a value (any blob) under a key, and you can GET, SET, or DELETE by key. There are no queries by value.
// Redis client
await redis.set('session:abc123', JSON.stringify({ userId: 42, exp: 1700000000 }), 'EX', 3600);
const session = JSON.parse(await redis.get('session:abc123'));- Engines: Redis, Memcached, DynamoDB (in pure KV mode), Riak.
- Best for: caching, session storage, rate-limit counters, leaderboards.
- Weakness: cannot query by anything except the key.
2. Document Stores
Store self-contained JSON-like documents. Each document has its own structure; the database does not enforce a schema. You can index any field and query by it.
{
"_id": "user_42",
"name": "Alice",
"email": "[email protected]",
"addresses": [
{ "label": "home", "city": "Paris" },
{ "label": "work", "city": "Lyon" }
],
"prefs": { "theme": "dark", "lang": "en" }
}- Engines: MongoDB, Couchbase, Firestore, DynamoDB (document mode).
- Best for: content with variable shape (user profiles with optional fields, product catalogs, CMS content).
- Weakness: joins are weak or absent. Heavy reliance on denormalization, which makes updates hard.
3. Wide-Column Stores
Picture a table where each row can have a different set of columns, optimized for massive write throughput and range scans by primary key. Internally based on the LSM-tree storage engine.
---------- Cassandra row example ----------
row key: user_42
column: profile.name = 'Alice'
column: profile.email = '[email protected]'
column: events.2026-04-26 = '{"type":"login"}'
column: events.2026-04-27 = '{"type":"purchase"}'- Engines: Cassandra, ScyllaDB, HBase, Bigtable.
- Best for: time-series data, event logs, write-heavy workloads at petabyte scale.
- Weakness: query patterns must be designed up front (you model around the queries, not the data). Ad-hoc queries are painful.
4. Graph Databases
First-class support for nodes and edges. Queries traverse relationships efficiently (find friends-of-friends in 2 hops).
---------- Graph snippet ----------
(Alice:User) -[FOLLOWS]-> (Bob:User) -[FOLLOWS]-> (Carol:User)
(Alice:User) -[LIKES]-> (Post:p123) <-[POSTED]- (Bob:User)- Engines: Neo4j, Amazon Neptune, ArangoDB, JanusGraph.
- Best for: social networks, fraud detection, recommendation engines, knowledge graphs.
- Weakness: harder to scale horizontally; smaller ecosystem; many problems do not actually need graph traversal.
BASE: The NoSQL Counterpart to ACID
Most distributed NoSQL systems offer BASE semantics instead of ACID:
- Basically Available - the system always responds, even during partial failures.
- Soft state - the state may change without input as replicas converge.
- Eventual consistency - all replicas converge to the same value, given enough time without new writes.
BASE is a deliberate trade: you give up immediate consistency in exchange for availability and horizontal scale. See the CAP Theorem lesson for why this trade-off is forced on distributed systems.
SQL vs NoSQL: The Decision Matrix
| Dimension | SQL | NoSQL |
|---|---|---|
| Schema | Fixed, enforced | Flexible, app-enforced |
| Joins | First-class | Weak or absent |
| Transactions | Multi-row ACID | Often single-row only |
| Consistency | Strong by default | Tunable, often eventual |
| Scaling | Vertical first, then read replicas, then shard manually | Horizontal sharding built in |
| Query language | Standard SQL | Vendor-specific (CQL, MQL, etc.) |
| Best for | Complex relationships, money, reporting | Variable schema, massive scale, simple access patterns |
A Decision Framework
Ask these questions in order. Stop at the first 'yes' that strongly signals one side.
- Do you need multi-row ACID transactions across related entities? -> SQL. (Payments, inventory reservations, double-entry ledgers.)
- Are your access patterns mostly key lookups at very high QPS? -> Key-value store. (Sessions, rate limiters, caches.)
- Is your data shape highly variable or evolving rapidly? -> Document store. (User-generated content, CMS, product catalogs with diverse attributes.)
- Are you ingesting massive write volume that one machine cannot handle? -> Wide-column. (IoT telemetry, event logs, time series, audit trails.)
- Are your queries primarily about traversing relationships? -> Graph. (Social graphs, recommendation, fraud rings.)
- None of the above strongly applies? -> Default to PostgreSQL. It is rarely the wrong call for a new system.
Polyglot Persistence: Use Both
Real systems usually combine multiple databases. Each store is chosen for the workload it handles best.
---------- Typical polyglot stack ----------
PostgreSQL -> users, orders, payments (ACID, source of truth)
Redis -> sessions, rate limits, cache (low-latency KV)
Elasticsearch -> full-text search index (inverted index)
S3 + DynamoDB -> uploaded files + metadata (blob + KV)
Kafka -> event log between services (durable queue)Real-World Examples
How real systems implement this in production
Stripe runs its core payment ledger on PostgreSQL because every transfer needs multi-row ACID and durable audit trails. They scale by sharding at the application layer rather than switching to NoSQL.
Trade-off: Sharded Postgres costs a lot of engineering effort to maintain (custom routing, cross-shard reports), but the team keeps SQL semantics, joins, and ACID for money flows.
Amazon's shopping cart and many internal services run on DynamoDB because it offers single-digit-millisecond key lookups at any scale with built-in multi-region replication.
Trade-off: DynamoDB requires you to design around access patterns up front. Ad-hoc queries and joins are not feasible without a secondary indexing layer (DAX, OpenSearch, or a stream-driven materialized view).
Discord stores trillions of chat messages in a wide-column store. They migrated from MongoDB (which struggled with hot collections) to Cassandra, then again to ScyllaDB for lower tail latency.
Trade-off: Wide-column stores deliver linear write scale but force you to model schema around the queries you will run. Two database migrations were needed because workload reality outgrew earlier choices.
Quick Interview Phrases
Key terms to use in your answer
Common Interview Questions
Questions you might be asked about this topic
Read-heavy workload with simple key (short code) -> long URL lookups. A key-value store like Redis (with persistence) or DynamoDB is ideal for sub-millisecond lookups. The metadata (creator, click count, created_at) can sit alongside the value. Mention you would still want a relational store if you needed user accounts, billing, and analytics queries.
Posts themselves are write-once and read-many, so a wide-column store like Cassandra works well for fan-out. The user/follower graph is better in a graph database or a sharded relational store with caching (like Meta's TAO). The newsfeed timeline is usually pre-computed and stored in Redis. So the answer is polyglot: Cassandra for posts, sharded MySQL or graph DB for the social graph, Redis for the feed.
Ask what specifically does not scale - is it write throughput, storage, or query latency? Postgres can scale to tens of TB with proper indexing, partitioning, and read replicas. Citus adds horizontal sharding while keeping SQL. MongoDB does not solve common Postgres scaling problems (slow queries, missing indexes) - it just hides them under different ones. The migration is a 6-12 month project that loses ACID, joins, and team familiarity. Suggest fixing the bottleneck first.
Wide-column store like Cassandra, ScyllaDB, or a purpose-built time-series database (InfluxDB, TimescaleDB). Partition key is sensor_id, clustering key is timestamp - reads become sequential within a partition. Use TTL to expire old data automatically. For dashboards, materialize hourly/daily rollups into a separate table. Mention that Postgres + TimescaleDB is a great middle-ground if the team already knows SQL.
When the dominant query is 'find all entities connected to X within N hops' and N is small but greater than 1. Examples: friend-of-friend recommendations, fraud ring detection, knowledge graph traversal, organizational hierarchies. For 1-hop relationships, a relational join is fine. For deep traversal at query time, a graph database avoids the explosion of recursive CTEs. Caveat: many 'graph problems' can be served by a sharded relational store with denormalized adjacency tables.
Interview Tips
How to discuss this topic effectively
Always tie your database choice to specific access patterns. 'I'd use Postgres because it's relational' is weak. 'I'd use Postgres because the order/payment flow needs multi-row ACID and joins to compute totals' is strong.
Default to PostgreSQL for the primary store unless you have a concrete reason not to. Interviewers see this as a sign of engineering maturity, not a lack of NoSQL knowledge.
When you do propose NoSQL, name the family (key-value, document, wide-column, graph) and the specific engine. 'We'd use Cassandra for the message store because writes are append-mostly and we need to scale linearly' beats 'we'd use NoSQL'.
Mention polyglot persistence early. Real systems use multiple databases. Saying 'Postgres for the source of truth, Redis for sessions, S3 for blobs, Elasticsearch for search' shows production experience.
Acknowledge the migration tax. If asked 'could you swap stores later?', say 'yes, but it's a multi-quarter project, so we want to get the primary store right the first time'.
Common Mistakes
Pitfalls to avoid in interviews
Saying 'NoSQL scales better than SQL' as a blanket statement
Modern SQL databases scale to billions of rows with read replicas, partitioning, and Citus or Vitess for sharding. NewSQL systems like CockroachDB and Spanner give you SQL semantics with horizontal scaling. Scaling is about access patterns and operational maturity, not the SQL/NoSQL label.
Treating MongoDB and Cassandra as interchangeable NoSQL options
MongoDB is a document store optimized for flexible schemas and rich queries. Cassandra is a wide-column store optimized for write-heavy linear scaling with predetermined access patterns. They solve completely different problems. Pick the family first, then the engine.
Picking NoSQL to 'avoid schema migrations'
NoSQL does not eliminate schema; it pushes schema enforcement into the application. You still need migration code to reshape documents when fields evolve. Postgres with a `JSONB` column gives you the same flexibility without giving up joins or constraints.
Thinking ACID and BASE are mutually exclusive
Many NoSQL stores now offer ACID transactions (MongoDB multi-document transactions, DynamoDB transactions) and many SQL stores expose tunable consistency (Postgres read replicas can be eventually consistent). The line is blurred. Always state the consistency mode you actually need.
Forgetting that joins disappear once you shard
A sharded relational database loses easy cross-shard joins, which is the main reason to use joins in the first place. If you know you must shard from day one, plan to denormalize and accept that part of the SQL value proposition is gone.
