Community Article

Database Indexes Explained with Real EXPLAIN Output

What an index actually is, how the planner picks one, and the EXPLAIN output I read every day. Postgres examples, real numbers, and the three indexing mistakes I keep finding in code review.

Database Indexes Explained with Real EXPLAIN Output

What an index actually is, how the planner picks one, and the EXPLAIN output I read every day. Postgres examples, real numbers, and the three indexing mistakes I keep finding in code review.

database
database-indexing
sql
query-optimization
performance
theowatanabe

By @theowatanabe

December 2, 2025

·

Updated May 20, 2026

1,171 views

24

4.4 (16)

I have given a version of this talk to maybe twelve teams over the years, and the first slide is always the same. It is the EXPLAIN output of a slow query they sent me, with one line highlighted, and the question "why is this a Seq Scan?". The conversation that follows usually reveals that the team knows indexes exist, knows they are good, and has no idea what the planner is actually doing. This article is the conversation, written down once, with the EXPLAIN output you can read tomorrow morning to fix your own queries.

The thesis up front, because I want my opinion to be visible: you do not really understand a slow query until you have read its EXPLAIN ANALYZE output and matched each line to a physical operation. Reasoning about indexes from the schema alone, without ever reading the planner's actual choices, is how teams end up with twenty redundant indexes, none of which the planner uses on the query that matters.

What an index physically is

A B-tree index in Postgres is a separate on-disk structure that stores (indexed_columns, row_pointer) pairs sorted by the indexed columns. "Row pointer" in Postgres is the ctid (a (page, offset) tuple). The tree depth is typically 3 to 4 levels for tables with millions of rows, so a lookup is a handful of disk reads instead of a full scan.

I find a tiny visual helps people more than the formal definition.

B-tree index over orders(user_id)
                    [user_id < 1000]
                   /                \
         [user_id < 500]      [user_id < 2000]
          /        \           /        \
    leaf pages with sorted (user_id, ctid) pairs:
    (1, p3o4) (1, p7o9) (2, p1o2) (3, p5o0) ...

The index is not the table. It is a sorted lookup structure that says "if you want rows with user_id = 42, here are their ctids; go to those pages of the heap to read them.". The heap is the actual table data on disk; the index is metadata pointing into it.

Two consequences fall out of this model that everything else builds on. First, an index makes lookups fast but slows down writes (every insert, update, or delete on the indexed column has to update the index too). Second, an index can answer queries without touching the heap at all, but only if every column the query needs is in the index. That second one is what "covering index" means and why it matters for read-heavy paths.

EXPLAIN output, line by line

Here is a real query I tuned recently. The table has 4 million rows; the user-facing endpoint was timing out at 3 seconds.

EXPLAIN ANALYZE
SELECT id, status, total_cents
FROM orders
WHERE user_id = 42
  AND created_at >= NOW() - INTERVAL '30 days'
ORDER BY created_at DESC
LIMIT 20;

Before I added an index, the plan looked like this:

Limit  (cost=89234.12..89234.17 rows=20 width=32)
       (actual time=2418.011..2418.019 rows=20 loops=1)
  ->  Sort  (cost=89234.12..89351.45 rows=46932 width=32)
             (actual time=2418.009..2418.013 rows=20 loops=1)
        Sort Key: created_at DESC
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Seq Scan on orders  (cost=0..87501.00 rows=46932 width=32)
                              (actual time=0.214..2401.882 rows=51234 loops=1)
              Filter: ((user_id = 42) AND (created_at >= now() - '30 days'::interval))
              Rows Removed by Filter: 3948766
 Planning Time: 0.412 ms
 Execution Time: 2418.058 ms

Reading from the bottom up, which is how I always read an EXPLAIN tree: the Seq Scan reads the entire heap (4 million rows, dropping 3.9 million via filter), the Sort then sorts the 51,234 surviving rows by created_at, and the Limit takes the top 20. The execution time is 2418ms; the planning time is 0.4ms. The query is dominated by the seq scan.

After adding CREATE INDEX orders_user_created_idx ON orders(user_id, created_at DESC):

Limit  (cost=0.43..18.91 rows=20 width=32)
       (actual time=0.041..0.213 rows=20 loops=1)
  ->  Index Scan using orders_user_created_idx on orders
        (cost=0.43..43398.20 rows=46932 width=32)
        (actual time=0.040..0.207 rows=20 loops=1)
        Index Cond: ((user_id = 42)
                    AND (created_at >= now() - '30 days'::interval))
 Planning Time: 0.521 ms
 Execution Time: 0.245 ms

From 2418ms to 0.245ms. Roughly a 10,000x speedup, on a query that drove our p99 latency. Three things changed:

  1. Seq Scan became Index Scan. The planner found an index whose leading column matches user_id = 42, used it to seek directly into the small slice of rows for that user, then walked the second index column (created_at) in descending order.
  2. The Sort step disappeared entirely. Because the index is (user_id, created_at DESC), the rows come out of the index already in the desired order. The planner does not need an explicit sort.
  3. The Index Cond covers both filter conditions. Both user_id = 42 and created_at >= ... are evaluated as part of the index seek, not as a post-filter. The planner only reads the rows that match.

The single most useful EXPLAIN reading skill

If I had to teach one thing, it would be the difference between Index Cond, Filter, and Recheck Cond in the EXPLAIN output. They are not the same.

Index Cond means the condition was used to traverse the index. The planner only visits rows that match. This is fast.

Filter means the condition is applied after the rows are pulled from the heap. The planner reads the row, then checks the condition. If most rows fail the filter, you are doing a lot of useless I/O.

Recheck Cond appears in bitmap scans. The bitmap may be lossy (block-level, not row-level), so each row is rechecked after fetch. Usually this is fine; it becomes a problem only when the rechecked condition is expensive.

When you see Filter with a high Rows Removed by Filter value, that is your sign. The planner is reading too many rows from the heap and discarding most of them. Either the index does not cover the filter condition, or the planner did not pick the right index.

Composite index column order matters more than people think

An index on (user_id, created_at) and one on (created_at, user_id) are not interchangeable. The planner can use a B-tree index efficiently for any prefix of its columns. So:

IndexWHERE user_id = 42WHERE created_at > XWHERE user_id = 42 AND created_at > X
(user_id, created_at)FastSlow (full scan)Fast
(created_at, user_id)SlowFastFast (but less efficient)

The rule of thumb: put the column you filter by exact-equality on first, and the column you filter by range or sort on second. user_id = 42 AND created_at > X matches the equality-then-range shape, so (user_id, created_at) is the right order.

If a query never filters on user_id alone, you might still want it as the leading column because it is more selective. "Selectivity" is the fraction of rows a condition keeps. A condition that keeps 1 in 10,000 rows is more selective than one that keeps 1 in 10. The planner uses statistics from ANALYZE to estimate selectivity; it picks the index whose leading column gives the smallest estimated row count.

The three indexing mistakes I keep finding in code review

Mistake one: indexing every foreign key by reflex. The advice "always index foreign keys" is not wrong, but it is incomplete. You index a foreign key when you query by it, or when you delete the parent row and need the child rows scanned to enforce or cascade. If neither applies, the index is just write-amplification with no payoff. I have removed many indexes that existed only because someone heard "always index foreign keys".

Mistake two: indexing low-cardinality columns alone. A boolean is_active column has two values. An index on it alone is almost never used; the planner correctly decides a seq scan and filter is cheaper. Indexes on low-cardinality columns are useful only as part of a composite (e.g., (is_active, created_at) works because the leading equality narrows the search before the range scan).

Mistake three: not running ANALYZE after a big data load. Without recent statistics, the planner's row estimates can be wildly wrong, and it picks bad plans. After a big migration or bulk import, run ANALYZE table_name. I have spent hours debugging "wrong index choice" only to discover the table statistics were three months stale.

Partial indexes: the underrated tool

A partial index is a B-tree on a subset of rows defined by a WHERE clause. They are smaller, faster to maintain, and ideal for queries that always filter on a low-cardinality predicate.

CREATE INDEX active_orders_idx ON orders(user_id, created_at DESC)
WHERE status = 'active';

If 95% of your queries are over status = 'active' and 95% of the rows are not active, this index is a fraction of the size of a full one and just as fast for the queries that hit it. The planner uses the partial index automatically when the query's WHERE clause logically implies the index's predicate.

I have shipped partial indexes on soft-delete tables (WHERE deleted_at IS NULL), on tenant scopes (WHERE tenant_id = 'x'), and on archive boundaries (WHERE created_at >= '2025-01-01'). All three patterns are common; almost no one reaches for them by default.

When the planner ignores your index

Sometimes you add an index, run the query, and the planner still does a Seq Scan. Three usual culprits:

  1. The query returns too many rows. If the planner estimates the query will return 30% of the table, a Seq Scan is genuinely faster than chasing index pointers. The planner is right; remove the index or change the query.
  2. A function on the indexed column. WHERE LOWER(email) = '[email protected]' does not match an index on email alone. Either index the expression (CREATE INDEX ... ON users(LOWER(email))) or store the lowercased form in a column.
  3. Stale statistics. Run ANALYZE. Repeat. If the plan changes, statistics were the issue.

When in doubt, you can force the planner to consider only specific access methods via SET enable_seqscan = off for the session; if the indexed plan is faster, you have your answer. Do not leave that setting on in production; it is a debugging knob, not a tuning one.

What I check before shipping any new query

My personal pre-merge checklist for any query that touches a hot table:

  1. Run EXPLAIN ANALYZE on production-like data (use a snapshot, not toy data; planner choices change with table size).
  2. Confirm the plan uses Index Scan or Index Only Scan, not Seq Scan, on tables larger than 100k rows.
  3. Check Rows Removed by Filter is small or zero. If it is large, the index is not covering the filter.
  4. Confirm Sort does not appear unless the data really needs sorting that no index can provide.
  5. Confirm Planning Time is small (sub-millisecond). If planning takes longer than execution, the query may be too complex for the planner to handle well, and you might need a query hint or rewrite.

It is five minutes of work and it has caught real regressions that would have hit production otherwise.

Reading EXPLAIN as a daily habit

The skill that has paid me back the most over my career is treating EXPLAIN output as something I read every day, not as a tool I reach for when something is on fire. Adding an index in a migration without running EXPLAIN before and after is a coin flip. Removing an index because nobody is using it, without checking the planner's actual usage statistics (pg_stat_user_indexes), is a different coin flip. Both go badly often enough that I have stopped doing either without the planner's input.

If the only thing you take from this article is the habit of pasting your slow query into a psql session with EXPLAIN ANALYZE before you guess, I will count it as time well spent. Indexes are not magic. They are a sorted lookup structure with rules the planner can read. Once you can read those rules too, the conversation about "why is this slow" stops being mystical and starts being mechanical.

Back to Articles