System Design Article

Database Indexing & Query Optimization

Difficulty: Easy

Indexes turn O(N) full-table scans into O(log N) lookups, but every index costs storage and slows writes. This lesson teaches how B-tree and hash indexes work, when to use composite or covering indexes, how to read an EXPLAIN plan, and the common indexing mistakes that cause production outages. By the end you can defend any indexing decision in an interview and diagnose a slow query in production.

System Design
/

Database Indexing & Query Optimization

Database Indexing & Query Optimization

Indexes turn O(N) full-table scans into O(log N) lookups, but every index costs storage and slows writes. This lesson teaches how B-tree and hash indexes work, when to use composite or covering indexes, how to read an EXPLAIN plan, and the common indexing mistakes that cause production outages. By the end you can defend any indexing decision in an interview and diagnose a slow query in production.

System Design
Easy
database-indexing
query-optimization
sql
btree
performance
database
system-design
beginner

332 views

10

Why Indexes Exist

Without an index, the database answers SELECT * FROM users WHERE email = '[email protected]' by scanning every row in the table. With one billion users, that is one billion comparisons - several minutes per query. With an index on email, the database does roughly log2(1_000_000_000) = 30 comparisons. That is the entire point of indexing: turn linear scans into logarithmic lookups.

Think of an index as the alphabetical thumb tabs on a paper dictionary. The dictionary itself is the table. Without tabs, you flip page by page. With tabs, you jump directly to 'M' and search a small range.

How a B-Tree Index Works

Most relational databases (Postgres, MySQL/InnoDB, SQL Server, Oracle) use a B+ tree as the default index structure. A B+ tree is a balanced tree where:

  • Each internal node holds many keys (typically 100-1000) so the tree stays shallow even for huge tables.
  • All actual data pointers live in the leaf nodes, which are linked together in a sorted doubly-linked list. That linked list is what makes range scans (WHERE created_at BETWEEN ... AND ...) cheap.
  • Trees stay balanced as you insert and delete: nodes split or merge to keep the height roughly log_B(N) where B is the branching factor.
Text
---------- B+ Tree on email column ----------
                       [ M ]
                  /            \
          [ E , H ]            [ R , T ]
         /    |    \          /    |    \
      [A,B] [F,G] [I,K]    [N,P] [S]    [U,Z]
        \____\____\____ leaf list ____/____/____/

A lookup for email = 'I...' walks the tree: root -> middle child -> leaf containing I,K. The actual table row is fetched via the row pointer stored at the leaf.

For a 1 billion row table with branching factor 100, the tree height is only 5. Five page reads per lookup, even at huge scale.

Clustered vs Secondary Indexes

  • A clustered index stores the table rows themselves in the leaves. There is one per table (because the data can only be sorted one way). MySQL's InnoDB makes the primary key the clustered index. Looking up by primary key returns the full row in one tree traversal.
  • A secondary (non-clustered) index stores only the indexed columns plus a pointer to the row (in InnoDB, that pointer is the primary key). Looking up by a secondary index requires two tree walks: one in the secondary index, then one in the clustered index to fetch the row. Postgres calls this a 'heap fetch' and it is one reason composite/covering indexes matter.

Other Index Types

B-tree is the default but not the only choice. Different workloads need different structures.

Index TypeBest ForEngines
B-treeEquality and range queries on ordered valuesPostgres, MySQL, SQL Server (default)
HashEquality only, fastest single-key lookupPostgres (HASH index), Redis, Memcached
BitmapLow-cardinality columns in OLAP/data-warehouse queriesOracle, ClickHouse, Druid
GIN / invertedJSON, arrays, full-text searchPostgres GIN, Elasticsearch, Solr
GiST / R-treeGeospatial, range typesPostgres GiST, PostGIS
LSM-treeWrite-heavy workloads, sorted keysCassandra, RocksDB, ScyllaDB

For a web app you will use B-tree 95% of the time. Reach for the others when you have a specific access pattern (full text, geo, time series).

Composite and Covering Indexes

Real queries filter on multiple columns. Composite indexes (also called multi-column indexes) speed those up.

Leftmost-Prefix Rule

A composite index (a, b, c) can serve queries that filter on a, (a, b), or (a, b, c). It cannot help a query that filters only on b or only on c. Always order the columns in the index from highest to lowest selectivity, with WHERE = columns before WHERE > columns and finally ORDER BY columns.

SQL
-- Composite index supporting (status, created_at)
CREATE INDEX idx_orders_status_created
    ON orders (status, created_at DESC);

-- Uses the index efficiently:
SELECT id FROM orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 50;

-- Can also use the index (just the leftmost column):
SELECT id FROM orders WHERE status = 'pending';

-- Cannot use the index efficiently (no leading column):
SELECT id FROM orders WHERE created_at > NOW() - INTERVAL '1 day';

Covering Indexes

If the index already contains every column the query needs (in WHERE, ORDER BY, and SELECT), the database can answer the query without ever touching the heap. This is called an index-only scan in Postgres or a covering index in MySQL/SQL Server.

SQL
-- Covering index: includes status + created_at + id (the columns we SELECT)
CREATE INDEX idx_orders_covering
    ON orders (status, created_at DESC) INCLUDE (id, total_cents);

-- This query is now an index-only scan: no heap fetch
SELECT id, total_cents
  FROM orders
 WHERE status = 'pending'
 ORDER BY created_at DESC
 LIMIT 50;

Covering indexes are a tail-latency superpower. They commonly cut p99 latency by 5-10x for hot read paths.

Reading an EXPLAIN Plan

Every serious DB engineer reads query plans. The shape of the plan tells you whether your index is doing its job.

SQL
EXPLAIN ANALYZE
SELECT id FROM orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 50;

A Postgres plan:

Text
---------- Postgres EXPLAIN sample ----------
Limit  (cost=0.43..18.20 rows=50 width=8) (actual time=0.05..0.42 rows=50 loops=1)
  ->  Index Scan using idx_orders_status_created on orders
        (cost=0.43..3554.21 rows=10000 width=8)
        (actual time=0.05..0.41 rows=50 loops=1)
        Index Cond: (status = 'pending')
Planning Time: 0.18 ms
Execution Time: 0.46 ms

Look for these signals:

  • Seq Scan on a large table -> probably missing or unused index.
  • Index Scan -> index is being used, but a heap fetch follows.
  • Index Only Scan -> covering index, no heap fetch (best case).
  • Bitmap Index Scan + Bitmap Heap Scan -> useful for combining multiple index conditions.
  • A huge gap between the planner's estimated rows and the actual rows -> stats are stale; run ANALYZE.

Programmatic example: comparing with and without index

import pg from 'pg';
const client = new pg.Client();
await client.connect();

async function explain(sql) {
    const res = await client.query(`EXPLAIN ANALYZE ${sql}`);
    return res.rows.map((r) => r['QUERY PLAN']).join('\n');
}

console.log(await explain(
    "SELECT id FROM orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 50"
));

When Indexes Hurt

Indexes are not free. Each index pays cost on every write because the database must update the index pages alongside the table.

CostEffect
Write amplificationAn INSERT into a table with 5 indexes does 1 + 5 = 6 page modifications.
StorageA non-trivial table can have indexes larger than the data itself.
Buffer pool pressureHot index pages compete with table pages for memory.
LockingIndex maintenance can serialize writes on highly contended pages.
Plan complexityToo many indexes confuses the planner; it may pick the wrong one.

A practical rule: add an index when a frequent query is slow, drop indexes that are never used. Postgres exposes usage stats in pg_stat_user_indexes; MySQL has sys.schema_unused_indexes.

Common Index-Killing Patterns

These are the patterns interviewers love to ask about. Memorize them.

1. Leading wildcard in LIKE

SQL
-- Cannot use a B-tree index on email
SELECT * FROM users WHERE email LIKE '%@gmail.com';

Fix: store the reversed string and search for LIKE 'moc.liamg@%', or use a trigram index (Postgres pg_trgm) for substring search.

2. Function on indexed column

SQL
-- Index on created_at is useless: the function hides the column
SELECT * FROM events WHERE DATE(created_at) = '2026-04-26';

Fix: rewrite as a range, which preserves index usage:

SQL
SELECT * FROM events
 WHERE created_at >= '2026-04-26'
   AND created_at <  '2026-04-27';

Or create an expression index: CREATE INDEX ON events (DATE(created_at));

3. Type mismatch

SQL
-- user_id is BIGINT; comparing to a string forces an implicit cast
SELECT * FROM orders WHERE user_id = '42';

Most engines drop the index in this case. Always pass parameters with the correct type from your driver.

4. Low-cardinality columns

A boolean is_active column has only two values. An index on it is rarely useful because the planner expects to scan half the table anyway. Use a partial index that only indexes rows matching a specific value:

SQL
CREATE INDEX idx_users_active_only
    ON users (created_at)
    WHERE is_active = true;

5. OR across columns

SQL
SELECT * FROM orders WHERE status = 'pending' OR user_id = 42;

The planner often cannot use two separate indexes for an OR. Either rewrite as UNION ALL, or create a multi-column index, or accept a sequential scan if the table is small.

Real-World Examples

How real systems implement this in production

PostgreSQL at Stripe

Stripe applies strict index discipline: every new query pattern triggers a discussion about which index will serve it, and unused indexes are dropped after monitoring pg_stat_user_indexes. They use GIN indexes on JSONB columns to query metadata without forcing a schema migration.

Trade-off: Carrying many indexes increases write amplification on hot tables. Stripe accepts the cost on read-heavy ledger tables and is more conservative on write-heavy event tables.

MySQL InnoDB at GitHub

GitHub stores billions of rows of issue and PR data in InnoDB. They use auto-increment primary keys so that inserts append to the rightmost B-tree leaf, keeping write throughput high.

Trade-off: Auto-increment primary keys leak ordering information and create a global hotspot on the rightmost page; for some workloads ULIDs or sharded sequences are a better compromise.

Cassandra LSM-Tree at Discord

Discord stores trillions of chat messages in Cassandra/ScyllaDB, which uses LSM-trees instead of B-trees. Writes are appended to a memtable and flushed to immutable SSTables, giving linear write scaling.

Trade-off: LSM-trees pay back the write cost at read time: a read might check multiple SSTables, and background compaction consumes I/O. Discord trades read complexity for predictable write throughput.

Quick Interview Phrases

Key terms to use in your answer

B-tree index
covering index
index-only scan
leftmost prefix rule
EXPLAIN ANALYZE
write amplification

Common Interview Questions

Questions you might be asked about this topic

A clustered index stores the actual table rows in the leaves of the B-tree, ordered by the index key. There can only be one per table. A secondary index stores only the indexed columns plus a pointer to the row, so a lookup by secondary index needs two tree walks unless the index covers all selected columns. InnoDB makes the primary key the clustered index; Postgres has no clustered index by default (heap storage).

Interview Tips

How to discuss this topic effectively

1

Always state the kind of scan you want: 'a covering index turns this into an index-only scan'. Naming the scan type signals you actually understand the engine.

2

When designing a query, decide the index in the same breath. Saying 'I'll add an index on (status, created_at DESC) so the WHERE filter and ORDER BY both come from the index' is much stronger than 'I'll add an index'.

3

Acknowledge the write trade-off. 'I'd add this index, but on a write-heavy table I'd benchmark insert latency before and after, since each index multiplies write amplification.'

4

Memorize the ESR rule for compound indexes: Equality columns first, Sort columns next, Range columns last. It applies to Postgres, MySQL, and MongoDB.

5

Show curiosity about query plans. Asking 'what does EXPLAIN show before vs after?' is a senior-engineer move that interviewers love.

Common Mistakes

Pitfalls to avoid in interviews

Indexing every column 'just in case'

Indexes slow down writes, consume storage, and compete for buffer pool memory. Add indexes that match real query patterns from the access log; drop the ones with zero usage in pg_stat_user_indexes.

Putting low-selectivity columns first in a composite index

An index on (is_active, created_at) is mostly useless because is_active has two values. Reverse the order or use a partial index: CREATE INDEX ... WHERE is_active = true.

Wrapping the indexed column in a function (DATE(created_at), LOWER(email))

Functions on indexed columns hide them from the planner. Either rewrite the predicate as a range, or create an expression index that matches the function call exactly.

Using random UUIDs as the primary key in InnoDB without thinking

InnoDB stores rows in primary-key order. Random UUIDs scatter inserts across the B-tree and tank write throughput. Use auto-increment IDs, ULIDs, or UUIDv7 (time-ordered) instead.

Assuming the database will pick the right index automatically

The optimizer can choose a worse plan when statistics are stale or when many indexes overlap. Always verify with EXPLAIN, run ANALYZE on changed tables, and consider planner hints (PG_HINT_PLAN, MySQL FORCE INDEX) only as a last resort.