System Design Article

SQL vs NoSQL - Choosing the Right Database

Difficulty: Easy

SQL vs NoSQL is the most common storage decision in system design interviews. SQL databases give you ACID guarantees, joins, and a fixed relational schema; NoSQL databases give you flexible schemas, horizontal scaling, and specialized data models. This lesson teaches you the four NoSQL families, the real engineering trade-offs, and a clear decision framework so you can defend your database choice in any interview.

SQL vs NoSQL - Choosing the Right Database

System Design

Easy

sql

nosql

database

acid

data-modeling

horizontal-scaling

system-design

beginner

525 views

What the SQL vs NoSQL Question Really Asks

When an interviewer asks 'SQL or NoSQL?', they are not asking which is better. They are asking whether you understand the trade-offs between a strict relational model with ACID guarantees and a flexible model that scales horizontally.

The wrong answer is 'NoSQL because it scales' or 'SQL because it is reliable'. Both statements are wrong. The right answer always sounds like:

'Given the access patterns (X), the consistency requirements (Y), and the expected scale (Z), I would choose [database] because [specific reason]. The main trade-off I am accepting is [specific trade-off].'

This lesson gives you the vocabulary and the framework to produce that answer for any workload.

SQL: The Relational Model

Core Idea

A SQL database stores data in tables with a fixed schema. Rows in different tables are connected through foreign keys and combined at query time using joins. The schema is enforced by the database, so every row in users must have the columns the schema declares.

SQL

CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),
    total_cents INT NOT NULL,
    status VARCHAR(20) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

-- Get all orders for a user with email
SELECT o.id, o.total_cents, o.status
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.email = '[email protected]';

ACID: The Four Guarantees

SQL databases provide ACID transactions. These are the guarantees that make your money safe.

Letter	Meaning	What it gives you
A	Atomicity	All statements in a transaction commit together, or none do. No half-applied transfers.
C	Consistency	The database moves from one valid state to another. Constraints (foreign keys, unique) are always enforced.
I	Isolation	Concurrent transactions do not see each other's partial writes. Configurable via isolation levels.
D	Durability	Once a transaction commits, it survives crashes. Written to disk before the client gets the ack.

A classic ACID example is a bank transfer:

SQL

BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

If the server crashes between the two UPDATE statements, atomicity guarantees neither happens. Money does not disappear.

Strengths of SQL

Strong consistency by default. The database is the source of truth.
Joins let you model relationships once and query them many ways. No need to denormalize.
Mature query optimizer. The database figures out the best execution plan; you just declare the result you want.
Schema enforcement catches application bugs at the storage layer.
30+ years of tooling - migrations, ORMs, BI tools, observability.

Weaknesses of SQL

Vertical scaling first. Adding read replicas is straightforward; sharding writes is painful and usually application-controlled.
Schema migrations on huge tables can lock writes (mitigated by tools like pt-online-schema-change or PostgreSQL's CONCURRENTLY operations, but still operationally heavy).
Joins across shards do not work. Once you shard a SQL database, you give up the convenience of JOIN.

Popular SQL Engines

PostgreSQL: extensible, JSON support, strong consistency, the default for most new projects.
MySQL / MariaDB: extremely popular, great read scaling, weaker default isolation.
CockroachDB / Spanner / YugabyteDB: 'NewSQL' - SQL with horizontal scaling and global consistency.
SQLite: embedded, single-file, perfect for edge and local-first apps.

NoSQL: Four Families

'NoSQL' is not one thing. It is an umbrella over four very different data models. You must know all four because the interviewer might ask you to pick between them.

1. Key-Value Stores

The simplest model: a giant distributed hash map. You store a value (any blob) under a key, and you can GET, SET, or DELETE by key. There are no queries by value.

JavaScript

Python

// Redis client
await redis.set('session:abc123', JSON.stringify({ userId: 42, exp: 1700000000 }), 'EX', 3600);
const session = JSON.parse(await redis.get('session:abc123'));

Engines: Redis, Memcached, DynamoDB (in pure KV mode), Riak.
Best for: caching, session storage, rate-limit counters, leaderboards.
Weakness: cannot query by anything except the key.

2. Document Stores

Store self-contained JSON-like documents. Each document has its own structure; the database does not enforce a schema. You can index any field and query by it.

Jsonc

{
  "_id": "user_42",
  "name": "Alice",
  "email": "[email protected]",
  "addresses": [
    { "label": "home", "city": "Paris" },
    { "label": "work", "city": "Lyon" }
  ],
  "prefs": { "theme": "dark", "lang": "en" }
}

Engines: MongoDB, Couchbase, Firestore, DynamoDB (document mode).
Best for: content with variable shape (user profiles with optional fields, product catalogs, CMS content).
Weakness: joins are weak or absent. Heavy reliance on denormalization, which makes updates hard.

3. Wide-Column Stores

Picture a table where each row can have a different set of columns, optimized for massive write throughput and range scans by primary key. Internally based on the LSM-tree storage engine.

Text

---------- Cassandra row example ----------
row key: user_42
  column: profile.name      = 'Alice'
  column: profile.email     = '[email protected]'
  column: events.2026-04-26 = '{"type":"login"}'
  column: events.2026-04-27 = '{"type":"purchase"}'

Engines: Cassandra, ScyllaDB, HBase, Bigtable.
Best for: time-series data, event logs, write-heavy workloads at petabyte scale.
Weakness: query patterns must be designed up front (you model around the queries, not the data). Ad-hoc queries are painful.

4. Graph Databases

First-class support for nodes and edges. Queries traverse relationships efficiently (find friends-of-friends in 2 hops).

Text

---------- Graph snippet ----------
(Alice:User) -[FOLLOWS]-> (Bob:User) -[FOLLOWS]-> (Carol:User)
(Alice:User) -[LIKES]-> (Post:p123) <-[POSTED]- (Bob:User)

Engines: Neo4j, Amazon Neptune, ArangoDB, JanusGraph.
Best for: social networks, fraud detection, recommendation engines, knowledge graphs.
Weakness: harder to scale horizontally; smaller ecosystem; many problems do not actually need graph traversal.

BASE: The NoSQL Counterpart to ACID

Most distributed NoSQL systems offer BASE semantics instead of ACID:

Basically Available - the system always responds, even during partial failures.
Soft state - the state may change without input as replicas converge.
Eventual consistency - all replicas converge to the same value, given enough time without new writes.

BASE is a deliberate trade: you give up immediate consistency in exchange for availability and horizontal scale. See the CAP Theorem lesson for why this trade-off is forced on distributed systems.

SQL vs NoSQL: The Decision Matrix

Dimension	SQL	NoSQL
Schema	Fixed, enforced	Flexible, app-enforced
Joins	First-class	Weak or absent
Transactions	Multi-row ACID	Often single-row only
Consistency	Strong by default	Tunable, often eventual
Scaling	Vertical first, then read replicas, then shard manually	Horizontal sharding built in
Query language	Standard SQL	Vendor-specific (CQL, MQL, etc.)
Best for	Complex relationships, money, reporting	Variable schema, massive scale, simple access patterns

A Decision Framework

Ask these questions in order. Stop at the first 'yes' that strongly signals one side.

Do you need multi-row ACID transactions across related entities? -> SQL. (Payments, inventory reservations, double-entry ledgers.)
Are your access patterns mostly key lookups at very high QPS? -> Key-value store. (Sessions, rate limiters, caches.)
Is your data shape highly variable or evolving rapidly? -> Document store. (User-generated content, CMS, product catalogs with diverse attributes.)
Are you ingesting massive write volume that one machine cannot handle? -> Wide-column. (IoT telemetry, event logs, time series, audit trails.)
Are your queries primarily about traversing relationships? -> Graph. (Social graphs, recommendation, fraud rings.)
None of the above strongly applies? -> Default to PostgreSQL. It is rarely the wrong call for a new system.

Polyglot Persistence: Use Both

Real systems usually combine multiple databases. Each store is chosen for the workload it handles best.

Text

---------- Typical polyglot stack ----------
PostgreSQL      -> users, orders, payments       (ACID, source of truth)
Redis           -> sessions, rate limits, cache  (low-latency KV)
Elasticsearch   -> full-text search index        (inverted index)
S3 + DynamoDB   -> uploaded files + metadata     (blob + KV)
Kafka           -> event log between services    (durable queue)

Real-World Examples

How real systems implement this in production

Stripe (PostgreSQL)

Stripe runs its core payment ledger on PostgreSQL because every transfer needs multi-row ACID and durable audit trails. They scale by sharding at the application layer rather than switching to NoSQL.

Trade-off: Sharded Postgres costs a lot of engineering effort to maintain (custom routing, cross-shard reports), but the team keeps SQL semantics, joins, and ACID for money flows.

Amazon DynamoDB

Amazon's shopping cart and many internal services run on DynamoDB because it offers single-digit-millisecond key lookups at any scale with built-in multi-region replication.

Trade-off: DynamoDB requires you to design around access patterns up front. Ad-hoc queries and joins are not feasible without a secondary indexing layer (DAX, OpenSearch, or a stream-driven materialized view).

Discord (Cassandra to ScyllaDB)

Discord stores trillions of chat messages in a wide-column store. They migrated from MongoDB (which struggled with hot collections) to Cassandra, then again to ScyllaDB for lower tail latency.

Trade-off: Wide-column stores deliver linear write scale but force you to model schema around the queries you will run. Two database migrations were needed because workload reality outgrew earlier choices.

Quick Interview Phrases

Key terms to use in your answer

ACID vs BASE

polyglot persistence

denormalize at write time

horizontal sharding

OLTP versus OLAP

schema-on-write vs schema-on-read

Common Interview Questions

Questions you might be asked about this topic

Design a URL shortener - what database would you pick and why?

Read-heavy workload with simple key (short code) -> long URL lookups. A key-value store like Redis (with persistence) or DynamoDB is ideal for sub-millisecond lookups. The metadata (creator, click count, created_at) can sit alongside the value. Mention you would still want a relational store if you needed user accounts, billing, and analytics queries.

We have a social network feed. Should we use SQL or NoSQL for posts?

Your team is migrating from PostgreSQL to MongoDB to scale. Convince me that's the wrong move.

How would you store and query 10 billion IoT sensor readings per day?

When would you use a graph database over a relational database?

Interview Tips

How to discuss this topic effectively

Always tie your database choice to specific access patterns. 'I'd use Postgres because it's relational' is weak. 'I'd use Postgres because the order/payment flow needs multi-row ACID and joins to compute totals' is strong.

Default to PostgreSQL for the primary store unless you have a concrete reason not to. Interviewers see this as a sign of engineering maturity, not a lack of NoSQL knowledge.

When you do propose NoSQL, name the family (key-value, document, wide-column, graph) and the specific engine. 'We'd use Cassandra for the message store because writes are append-mostly and we need to scale linearly' beats 'we'd use NoSQL'.

Mention polyglot persistence early. Real systems use multiple databases. Saying 'Postgres for the source of truth, Redis for sessions, S3 for blobs, Elasticsearch for search' shows production experience.

Acknowledge the migration tax. If asked 'could you swap stores later?', say 'yes, but it's a multi-quarter project, so we want to get the primary store right the first time'.

Common Mistakes

Pitfalls to avoid in interviews

Saying 'NoSQL scales better than SQL' as a blanket statement

Modern SQL databases scale to billions of rows with read replicas, partitioning, and Citus or Vitess for sharding. NewSQL systems like CockroachDB and Spanner give you SQL semantics with horizontal scaling. Scaling is about access patterns and operational maturity, not the SQL/NoSQL label.

Treating MongoDB and Cassandra as interchangeable NoSQL options

MongoDB is a document store optimized for flexible schemas and rich queries. Cassandra is a wide-column store optimized for write-heavy linear scaling with predetermined access patterns. They solve completely different problems. Pick the family first, then the engine.

Picking NoSQL to 'avoid schema migrations'

NoSQL does not eliminate schema; it pushes schema enforcement into the application. You still need migration code to reshape documents when fields evolve. Postgres with a `JSONB` column gives you the same flexibility without giving up joins or constraints.

Thinking ACID and BASE are mutually exclusive

Many NoSQL stores now offer ACID transactions (MongoDB multi-document transactions, DynamoDB transactions) and many SQL stores expose tunable consistency (Postgres read replicas can be eventually consistent). The line is blurred. Always state the consistency mode you actually need.

Forgetting that joins disappear once you shard

A sharded relational database loses easy cross-shard joins, which is the main reason to use joins in the first place. If you know you must shard from day one, plan to denormalize and accept that part of the SQL value proposition is gone.

Back to System Design