System Design Article
Database Indexing & Query Optimization
Difficulty: Easy
Indexes turn O(N) full-table scans into O(log N) lookups, but every index costs storage and slows writes. This lesson teaches how B-tree and hash indexes work, when to use composite or covering indexes, how to read an EXPLAIN plan, and the common indexing mistakes that cause production outages. By the end you can defend any indexing decision in an interview and diagnose a slow query in production.
Database Indexing & Query Optimization
Indexes turn O(N) full-table scans into O(log N) lookups, but every index costs storage and slows writes. This lesson teaches how B-tree and hash indexes work, when to use composite or covering indexes, how to read an EXPLAIN plan, and the common indexing mistakes that cause production outages. By the end you can defend any indexing decision in an interview and diagnose a slow query in production.
332 views
10
Why Indexes Exist
Without an index, the database answers SELECT * FROM users WHERE email = '[email protected]' by scanning every row in the table. With one billion users, that is one billion comparisons - several minutes per query. With an index on email, the database does roughly log2(1_000_000_000) = 30 comparisons. That is the entire point of indexing: turn linear scans into logarithmic lookups.
Think of an index as the alphabetical thumb tabs on a paper dictionary. The dictionary itself is the table. Without tabs, you flip page by page. With tabs, you jump directly to 'M' and search a small range.
How a B-Tree Index Works
Most relational databases (Postgres, MySQL/InnoDB, SQL Server, Oracle) use a B+ tree as the default index structure. A B+ tree is a balanced tree where:
- Each internal node holds many keys (typically 100-1000) so the tree stays shallow even for huge tables.
- All actual data pointers live in the leaf nodes, which are linked together in a sorted doubly-linked list. That linked list is what makes range scans (
WHERE created_at BETWEEN ... AND ...) cheap. - Trees stay balanced as you insert and delete: nodes split or merge to keep the height roughly
log_B(N)where B is the branching factor.
---------- B+ Tree on email column ----------
[ M ]
/ \
[ E , H ] [ R , T ]
/ | \ / | \
[A,B] [F,G] [I,K] [N,P] [S] [U,Z]
\____\____\____ leaf list ____/____/____/A lookup for email = 'I...' walks the tree: root -> middle child -> leaf containing I,K. The actual table row is fetched via the row pointer stored at the leaf.
For a 1 billion row table with branching factor 100, the tree height is only 5. Five page reads per lookup, even at huge scale.
Clustered vs Secondary Indexes
- A clustered index stores the table rows themselves in the leaves. There is one per table (because the data can only be sorted one way). MySQL's InnoDB makes the primary key the clustered index. Looking up by primary key returns the full row in one tree traversal.
- A secondary (non-clustered) index stores only the indexed columns plus a pointer to the row (in InnoDB, that pointer is the primary key). Looking up by a secondary index requires two tree walks: one in the secondary index, then one in the clustered index to fetch the row. Postgres calls this a 'heap fetch' and it is one reason composite/covering indexes matter.
Other Index Types
B-tree is the default but not the only choice. Different workloads need different structures.
| Index Type | Best For | Engines |
|---|---|---|
| B-tree | Equality and range queries on ordered values | Postgres, MySQL, SQL Server (default) |
| Hash | Equality only, fastest single-key lookup | Postgres (HASH index), Redis, Memcached |
| Bitmap | Low-cardinality columns in OLAP/data-warehouse queries | Oracle, ClickHouse, Druid |
| GIN / inverted | JSON, arrays, full-text search | Postgres GIN, Elasticsearch, Solr |
| GiST / R-tree | Geospatial, range types | Postgres GiST, PostGIS |
| LSM-tree | Write-heavy workloads, sorted keys | Cassandra, RocksDB, ScyllaDB |
For a web app you will use B-tree 95% of the time. Reach for the others when you have a specific access pattern (full text, geo, time series).
Composite and Covering Indexes
Real queries filter on multiple columns. Composite indexes (also called multi-column indexes) speed those up.
Leftmost-Prefix Rule
A composite index (a, b, c) can serve queries that filter on a, (a, b), or (a, b, c). It cannot help a query that filters only on b or only on c. Always order the columns in the index from highest to lowest selectivity, with WHERE = columns before WHERE > columns and finally ORDER BY columns.
-- Composite index supporting (status, created_at)
CREATE INDEX idx_orders_status_created
ON orders (status, created_at DESC);
-- Uses the index efficiently:
SELECT id FROM orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 50;
-- Can also use the index (just the leftmost column):
SELECT id FROM orders WHERE status = 'pending';
-- Cannot use the index efficiently (no leading column):
SELECT id FROM orders WHERE created_at > NOW() - INTERVAL '1 day';Covering Indexes
If the index already contains every column the query needs (in WHERE, ORDER BY, and SELECT), the database can answer the query without ever touching the heap. This is called an index-only scan in Postgres or a covering index in MySQL/SQL Server.
-- Covering index: includes status + created_at + id (the columns we SELECT)
CREATE INDEX idx_orders_covering
ON orders (status, created_at DESC) INCLUDE (id, total_cents);
-- This query is now an index-only scan: no heap fetch
SELECT id, total_cents
FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 50;Covering indexes are a tail-latency superpower. They commonly cut p99 latency by 5-10x for hot read paths.
Reading an EXPLAIN Plan
Every serious DB engineer reads query plans. The shape of the plan tells you whether your index is doing its job.
EXPLAIN ANALYZE
SELECT id FROM orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 50;A Postgres plan:
---------- Postgres EXPLAIN sample ----------
Limit (cost=0.43..18.20 rows=50 width=8) (actual time=0.05..0.42 rows=50 loops=1)
-> Index Scan using idx_orders_status_created on orders
(cost=0.43..3554.21 rows=10000 width=8)
(actual time=0.05..0.41 rows=50 loops=1)
Index Cond: (status = 'pending')
Planning Time: 0.18 ms
Execution Time: 0.46 msLook for these signals:
Seq Scanon a large table -> probably missing or unused index.Index Scan-> index is being used, but a heap fetch follows.Index Only Scan-> covering index, no heap fetch (best case).Bitmap Index Scan+Bitmap Heap Scan-> useful for combining multiple index conditions.- A huge gap between the planner's estimated
rowsand the actualrows-> stats are stale; runANALYZE.
Programmatic example: comparing with and without index
import pg from 'pg';
const client = new pg.Client();
await client.connect();
async function explain(sql) {
const res = await client.query(`EXPLAIN ANALYZE ${sql}`);
return res.rows.map((r) => r['QUERY PLAN']).join('\n');
}
console.log(await explain(
"SELECT id FROM orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 50"
));When Indexes Hurt
Indexes are not free. Each index pays cost on every write because the database must update the index pages alongside the table.
| Cost | Effect |
|---|---|
| Write amplification | An INSERT into a table with 5 indexes does 1 + 5 = 6 page modifications. |
| Storage | A non-trivial table can have indexes larger than the data itself. |
| Buffer pool pressure | Hot index pages compete with table pages for memory. |
| Locking | Index maintenance can serialize writes on highly contended pages. |
| Plan complexity | Too many indexes confuses the planner; it may pick the wrong one. |
A practical rule: add an index when a frequent query is slow, drop indexes that are never used. Postgres exposes usage stats in pg_stat_user_indexes; MySQL has sys.schema_unused_indexes.
Common Index-Killing Patterns
These are the patterns interviewers love to ask about. Memorize them.
1. Leading wildcard in LIKE
-- Cannot use a B-tree index on email
SELECT * FROM users WHERE email LIKE '%@gmail.com';Fix: store the reversed string and search for LIKE 'moc.liamg@%', or use a trigram index (Postgres pg_trgm) for substring search.
2. Function on indexed column
-- Index on created_at is useless: the function hides the column
SELECT * FROM events WHERE DATE(created_at) = '2026-04-26';Fix: rewrite as a range, which preserves index usage:
SELECT * FROM events
WHERE created_at >= '2026-04-26'
AND created_at < '2026-04-27';Or create an expression index: CREATE INDEX ON events (DATE(created_at));
3. Type mismatch
-- user_id is BIGINT; comparing to a string forces an implicit cast
SELECT * FROM orders WHERE user_id = '42';Most engines drop the index in this case. Always pass parameters with the correct type from your driver.
4. Low-cardinality columns
A boolean is_active column has only two values. An index on it is rarely useful because the planner expects to scan half the table anyway. Use a partial index that only indexes rows matching a specific value:
CREATE INDEX idx_users_active_only
ON users (created_at)
WHERE is_active = true;5. OR across columns
SELECT * FROM orders WHERE status = 'pending' OR user_id = 42;The planner often cannot use two separate indexes for an OR. Either rewrite as UNION ALL, or create a multi-column index, or accept a sequential scan if the table is small.
Real-World Examples
How real systems implement this in production
Stripe applies strict index discipline: every new query pattern triggers a discussion about which index will serve it, and unused indexes are dropped after monitoring pg_stat_user_indexes. They use GIN indexes on JSONB columns to query metadata without forcing a schema migration.
Trade-off: Carrying many indexes increases write amplification on hot tables. Stripe accepts the cost on read-heavy ledger tables and is more conservative on write-heavy event tables.
GitHub stores billions of rows of issue and PR data in InnoDB. They use auto-increment primary keys so that inserts append to the rightmost B-tree leaf, keeping write throughput high.
Trade-off: Auto-increment primary keys leak ordering information and create a global hotspot on the rightmost page; for some workloads ULIDs or sharded sequences are a better compromise.
Discord stores trillions of chat messages in Cassandra/ScyllaDB, which uses LSM-trees instead of B-trees. Writes are appended to a memtable and flushed to immutable SSTables, giving linear write scaling.
Trade-off: LSM-trees pay back the write cost at read time: a read might check multiple SSTables, and background compaction consumes I/O. Discord trades read complexity for predictable write throughput.
Quick Interview Phrases
Key terms to use in your answer
Common Interview Questions
Questions you might be asked about this topic
A clustered index stores the actual table rows in the leaves of the B-tree, ordered by the index key. There can only be one per table. A secondary index stores only the indexed columns plus a pointer to the row, so a lookup by secondary index needs two tree walks unless the index covers all selected columns. InnoDB makes the primary key the clustered index; Postgres has no clustered index by default (heap storage).
A composite index on (user_id, created_at DESC) handles both the filter and the sort with no separate sort step. Two single indexes force the planner to merge or pick one, leaving a sort step. A covering index that also INCLUDEs the SELECT columns lets the engine skip the heap fetch entirely. Pick the composite first; promote to covering if profiling shows the heap fetch dominates.
Only when the workload is pure equality lookups, never ranges, never ORDER BY. Postgres hash indexes are slightly faster than B-tree for equality and slightly smaller, but they cannot serve ranges or sorts. In practice we almost always default to B-tree because tomorrow's query might add a range, and the gap is small.
Identify the slow query from logs or APM, run EXPLAIN ANALYZE, confirm there is no usable index, choose columns matching the WHERE/ORDER BY/SELECT (ESR rule), benchmark on a copy of production, deploy with monitoring on write latency, and revisit pg_stat_user_indexes after a week to confirm it is being used.
Every INSERT/UPDATE/DELETE must update the table and every index that references the changed columns. With five indexes you do six page modifications per write. On contended pages this also increases lock waits. For very write-heavy tables, prefer fewer indexes, use partial indexes to limit work, or move to an LSM-based store like Cassandra.
Interview Tips
How to discuss this topic effectively
Always state the kind of scan you want: 'a covering index turns this into an index-only scan'. Naming the scan type signals you actually understand the engine.
When designing a query, decide the index in the same breath. Saying 'I'll add an index on (status, created_at DESC) so the WHERE filter and ORDER BY both come from the index' is much stronger than 'I'll add an index'.
Acknowledge the write trade-off. 'I'd add this index, but on a write-heavy table I'd benchmark insert latency before and after, since each index multiplies write amplification.'
Memorize the ESR rule for compound indexes: Equality columns first, Sort columns next, Range columns last. It applies to Postgres, MySQL, and MongoDB.
Show curiosity about query plans. Asking 'what does EXPLAIN show before vs after?' is a senior-engineer move that interviewers love.
Common Mistakes
Pitfalls to avoid in interviews
Indexing every column 'just in case'
Indexes slow down writes, consume storage, and compete for buffer pool memory. Add indexes that match real query patterns from the access log; drop the ones with zero usage in pg_stat_user_indexes.
Putting low-selectivity columns first in a composite index
An index on (is_active, created_at) is mostly useless because is_active has two values. Reverse the order or use a partial index: CREATE INDEX ... WHERE is_active = true.
Wrapping the indexed column in a function (DATE(created_at), LOWER(email))
Functions on indexed columns hide them from the planner. Either rewrite the predicate as a range, or create an expression index that matches the function call exactly.
Using random UUIDs as the primary key in InnoDB without thinking
InnoDB stores rows in primary-key order. Random UUIDs scatter inserts across the B-tree and tank write throughput. Use auto-increment IDs, ULIDs, or UUIDv7 (time-ordered) instead.
Assuming the database will pick the right index automatically
The optimizer can choose a worse plan when statistics are stale or when many indexes overlap. Always verify with EXPLAIN, run ANALYZE on changed tables, and consider planner hints (PG_HINT_PLAN, MySQL FORCE INDEX) only as a last resort.
