I have given a version of this talk to maybe twelve teams over the years, and the first slide is always the same. It is the EXPLAIN output of a slow query they sent me, with one line highlighted, and the question "why is this a Seq Scan?". The conversation that follows usually reveals that the team knows indexes exist, knows they are good, and has no idea what the planner is actually doing. This article is the conversation, written down once, with the EXPLAIN output you can read tomorrow morning to fix your own queries.
The thesis up front, because I want my opinion to be visible: you do not really understand a slow query until you have read its EXPLAIN ANALYZE output and matched each line to a physical operation. Reasoning about indexes from the schema alone, without ever reading the planner's actual choices, is how teams end up with twenty redundant indexes, none of which the planner uses on the query that matters.
What an index physically is
A B-tree index in Postgres is a separate on-disk structure that stores (indexed_columns, row_pointer) pairs sorted by the indexed columns. "Row pointer" in Postgres is the ctid (a (page, offset) tuple). The tree depth is typically 3 to 4 levels for tables with millions of rows, so a lookup is a handful of disk reads instead of a full scan.
I find a tiny visual helps people more than the formal definition.
The index is not the table. It is a sorted lookup structure that says "if you want rows with user_id = 42, here are their ctids; go to those pages of the heap to read them.". The heap is the actual table data on disk; the index is metadata pointing into it.
Two consequences fall out of this model that everything else builds on. First, an index makes lookups fast but slows down writes (every insert, update, or delete on the indexed column has to update the index too). Second, an index can answer queries without touching the heap at all, but only if every column the query needs is in the index. That second one is what "covering index" means and why it matters for read-heavy paths.
EXPLAIN output, line by line
Here is a real query I tuned recently. The table has 4 million rows; the user-facing endpoint was timing out at 3 seconds.
Before I added an index, the plan looked like this:
Reading from the bottom up, which is how I always read an EXPLAIN tree: the Seq Scan reads the entire heap (4 million rows, dropping 3.9 million via filter), the Sort then sorts the 51,234 surviving rows by created_at, and the Limit takes the top 20. The execution time is 2418ms; the planning time is 0.4ms. The query is dominated by the seq scan.
After adding CREATE INDEX orders_user_created_idx ON orders(user_id, created_at DESC):
From 2418ms to 0.245ms. Roughly a 10,000x speedup, on a query that drove our p99 latency. Three things changed:
Seq ScanbecameIndex Scan. The planner found an index whose leading column matchesuser_id = 42, used it to seek directly into the small slice of rows for that user, then walked the second index column (created_at) in descending order.- The
Sortstep disappeared entirely. Because the index is(user_id, created_at DESC), the rows come out of the index already in the desired order. The planner does not need an explicit sort. - The
Index Condcovers both filter conditions. Bothuser_id = 42andcreated_at >= ...are evaluated as part of the index seek, not as a post-filter. The planner only reads the rows that match.
The single most useful EXPLAIN reading skill
If I had to teach one thing, it would be the difference between Index Cond, Filter, and Recheck Cond in the EXPLAIN output. They are not the same.
Index Cond means the condition was used to traverse the index. The planner only visits rows that match. This is fast.
Filter means the condition is applied after the rows are pulled from the heap. The planner reads the row, then checks the condition. If most rows fail the filter, you are doing a lot of useless I/O.
Recheck Cond appears in bitmap scans. The bitmap may be lossy (block-level, not row-level), so each row is rechecked after fetch. Usually this is fine; it becomes a problem only when the rechecked condition is expensive.
When you see Filter with a high Rows Removed by Filter value, that is your sign. The planner is reading too many rows from the heap and discarding most of them. Either the index does not cover the filter condition, or the planner did not pick the right index.
Composite index column order matters more than people think
An index on (user_id, created_at) and one on (created_at, user_id) are not interchangeable. The planner can use a B-tree index efficiently for any prefix of its columns. So:
| Index | WHERE user_id = 42 | WHERE created_at > X | WHERE user_id = 42 AND created_at > X |
|---|---|---|---|
(user_id, created_at) | Fast | Slow (full scan) | Fast |
(created_at, user_id) | Slow | Fast | Fast (but less efficient) |
The rule of thumb: put the column you filter by exact-equality on first, and the column you filter by range or sort on second. user_id = 42 AND created_at > X matches the equality-then-range shape, so (user_id, created_at) is the right order.
If a query never filters on user_id alone, you might still want it as the leading column because it is more selective. "Selectivity" is the fraction of rows a condition keeps. A condition that keeps 1 in 10,000 rows is more selective than one that keeps 1 in 10. The planner uses statistics from ANALYZE to estimate selectivity; it picks the index whose leading column gives the smallest estimated row count.
The three indexing mistakes I keep finding in code review
Mistake one: indexing every foreign key by reflex. The advice "always index foreign keys" is not wrong, but it is incomplete. You index a foreign key when you query by it, or when you delete the parent row and need the child rows scanned to enforce or cascade. If neither applies, the index is just write-amplification with no payoff. I have removed many indexes that existed only because someone heard "always index foreign keys".
Mistake two: indexing low-cardinality columns alone. A boolean is_active column has two values. An index on it alone is almost never used; the planner correctly decides a seq scan and filter is cheaper. Indexes on low-cardinality columns are useful only as part of a composite (e.g., (is_active, created_at) works because the leading equality narrows the search before the range scan).
Mistake three: not running ANALYZE after a big data load. Without recent statistics, the planner's row estimates can be wildly wrong, and it picks bad plans. After a big migration or bulk import, run ANALYZE table_name. I have spent hours debugging "wrong index choice" only to discover the table statistics were three months stale.
Partial indexes: the underrated tool
A partial index is a B-tree on a subset of rows defined by a WHERE clause. They are smaller, faster to maintain, and ideal for queries that always filter on a low-cardinality predicate.
If 95% of your queries are over status = 'active' and 95% of the rows are not active, this index is a fraction of the size of a full one and just as fast for the queries that hit it. The planner uses the partial index automatically when the query's WHERE clause logically implies the index's predicate.
I have shipped partial indexes on soft-delete tables (WHERE deleted_at IS NULL), on tenant scopes (WHERE tenant_id = 'x'), and on archive boundaries (WHERE created_at >= '2025-01-01'). All three patterns are common; almost no one reaches for them by default.
When the planner ignores your index
Sometimes you add an index, run the query, and the planner still does a Seq Scan. Three usual culprits:
- The query returns too many rows. If the planner estimates the query will return 30% of the table, a
Seq Scanis genuinely faster than chasing index pointers. The planner is right; remove the index or change the query. - A function on the indexed column.
WHERE LOWER(email) = '[email protected]'does not match an index onemailalone. Either index the expression (CREATE INDEX ... ON users(LOWER(email))) or store the lowercased form in a column. - Stale statistics. Run
ANALYZE. Repeat. If the plan changes, statistics were the issue.
When in doubt, you can force the planner to consider only specific access methods via SET enable_seqscan = off for the session; if the indexed plan is faster, you have your answer. Do not leave that setting on in production; it is a debugging knob, not a tuning one.
What I check before shipping any new query
My personal pre-merge checklist for any query that touches a hot table:
- Run
EXPLAIN ANALYZEon production-like data (use a snapshot, not toy data; planner choices change with table size). - Confirm the plan uses
Index ScanorIndex Only Scan, notSeq Scan, on tables larger than 100k rows. - Check
Rows Removed by Filteris small or zero. If it is large, the index is not covering the filter. - Confirm
Sortdoes not appear unless the data really needs sorting that no index can provide. - Confirm
Planning Timeis small (sub-millisecond). If planning takes longer than execution, the query may be too complex for the planner to handle well, and you might need a query hint or rewrite.
It is five minutes of work and it has caught real regressions that would have hit production otherwise.
Reading EXPLAIN as a daily habit
The skill that has paid me back the most over my career is treating EXPLAIN output as something I read every day, not as a tool I reach for when something is on fire. Adding an index in a migration without running EXPLAIN before and after is a coin flip. Removing an index because nobody is using it, without checking the planner's actual usage statistics (pg_stat_user_indexes), is a different coin flip. Both go badly often enough that I have stopped doing either without the planner's input.
If the only thing you take from this article is the habit of pasting your slow query into a psql session with EXPLAIN ANALYZE before you guess, I will count it as time well spent. Indexes are not magic. They are a sorted lookup structure with rules the planner can read. Once you can read those rules too, the conversation about "why is this slow" stops being mystical and starts being mechanical.
