The first paginated endpoint I shipped used LIMIT 20 OFFSET ?. It was fine for two years. Then a customer started scrolling page 4,000 of an audit log and the database went sideways for everyone. The query plan looked harmless until I read it carefully: OFFSET 80000 was not a free skip, it was scanning and discarding eighty thousand rows on every fetch. The user was paying our database to throw away their own data eighty thousand rows at a time.
That afternoon I learned the difference between offset, cursor, and keyset pagination, and I have been picking deliberately ever since. My stance: offset is a default that breaks under load, keyset is what you actually want for most lists, and cursor is the public-API-friendly version of keyset that you should reach for when you have callers you do not control.
Three strategies, one query each
Same scenario throughout: a messages table sorted by created_at DESC with around fifty million rows. We want twenty messages at a time.
Offset is the textbook version. The server takes a page number and a page size:
Page 4,001. The database has to materialize the first 80,020 rows in sort order, throw away the first 80,000, and return the remaining twenty. The cost grows linearly with page depth.
Keyset (sometimes called "seek" pagination) replaces the offset with a "where I left off" predicate:
The composite predicate (created_at, id) < (...) is a row-value comparison: any message strictly older than the last one I saw. With an index on (conversation_id, created_at DESC, id DESC) the query is O(log n) to find the entry point and then a sequential read of twenty rows. Page depth does not matter. Page 4,001 is as fast as page 1.
Cursor is keyset wearing a public costume. The server hands the client an opaque token and the client returns it on the next call:
Inside the server, you decode the cursor (here, base64-encoded JSON of { createdAt, id }), and the SQL is the keyset query above. The cursor is just the encoded "where I left off" tuple plus whatever metadata you want to bind into it (the sort key, a query fingerprint, an issued-at).
Why offset feels right and is wrong
Offset feels right because it matches the mental model of a numbered book. Page 1, page 2, page 3. Implementing "Jump to page 47" is trivial. Showing a "Page X of Y" pager is trivial. Most ORMs and admin panels give offset to you for free.
Three things go wrong as the dataset grows.
The first is the cost. OFFSET N in every database I have used (Postgres, MySQL, SQLite, MongoDB) is implemented by reading and discarding the first N rows. There is no algorithm I am aware of in mainstream OLTP databases that can skip directly to row N+1 without producing the rows before it. So the wall-clock time of "page N" grows linearly with N. On a cold cache it is worse, because you are reading pages you will immediately throw away.
The second is correctness under writes. If the underlying list changes between two page fetches, offset breaks in subtle ways. A new message arrives between fetching page 1 and page 2: the user sees the most-recent message twice (once on page 1, once shifted into page 2). A message gets deleted: the user skips a message they should have seen. Pagers that read fast-changing data with offset always have this bug; users either notice and complain or never notice and silently miss things. Keyset pagination is immune to this because it anchors to a specific row identity.
The third is the COUNT(*) cost. Offset pagination almost always pairs with "show total count" because the UI wants "Page 4 of 200". SELECT COUNT(*) FROM messages WHERE conversation_id = $1 on a fifty-million-row table is its own problem. You can mitigate with cached counts, approximate counts, or just removing the total, but the count is a separate cost.
Why keyset wins for most internal lists
Keyset gets ridiculed sometimes because it cannot do "Jump to page 47" easily. That is true. It is also a feature. Most lists in real products do not need random page access. They need: load the first page; load more as the user scrolls; refresh when the data changes. Keyset gives you all three with constant time per page and zero double-counting under writes.
The shape of the SQL is the part that trips people up the first time, especially for composite sort keys. The pattern is to order by your sort columns followed by the primary key as a tie-breaker (so that ties in the sort columns produce a deterministic order), and to use a row-value predicate in the WHERE clause that mirrors the ORDER BY exactly:
If you order ascending, flip both: ORDER BY created_at ASC, id ASC with WHERE (created_at, id) > (...). The trap I have hit twice in my career is mismatched directions: ordering DESC but using > in the WHERE. The query returns wrong rows in a hard-to-spot way (you get one or two duplicates per page transition). Keep the directions in lockstep.
The index has to match too. CREATE INDEX ON messages (conversation_id, created_at DESC, id DESC) is what makes keyset O(log n). Without that index, the database will sort fifty million rows on each query, which is even worse than offset.
When cursor pagination is the right call
If your endpoint is consumed only by your own backends or your own UI, internal keyset pagination with raw "after this id and timestamp" parameters is fine. If your endpoint is public, you almost certainly want cursor pagination instead. Three reasons.
First, cursors are opaque. The client should not need to know that you sort by created_at DESC, id DESC. If you change the sort order tomorrow (say, from created_at to published_at), an opaque cursor lets you migrate without breaking clients. They send back whatever you handed them; what is inside is your business.
Second, cursors are tamperproof if you sign them. A naive cursor is a base64-encoded JSON tuple, which a client can decode and modify. If you do not want clients reaching into your sort key (because you might add fields that are filter-sensitive, or because you want to bind the cursor to a specific issuing query so it cannot be replayed against a different one), sign the cursor with HMAC and reject any cursor whose signature does not verify.
Third, cursors carry context that keyset alone does not. A typical cursor payload looks like this:
The fingerprint and version let the server detect when a client is paginating with a stale cursor against a query whose parameters have changed. When that happens, return a clear error ("cursor invalid for this query, restart from the beginning") rather than silently returning wrong rows.
The wire format looks something like:
I have shipped this exact shape twice and it has never let me down. The payload is small (a hundred bytes), the verification is fast, and the failure modes are explicit.
The trade-off table
| Concern | Offset | Keyset | Cursor |
|---|---|---|---|
| Page-depth cost | Linear in page number | Constant | Constant |
| Random page access ("jump to 47") | Yes | No | No |
| Stable under inserts/deletes | No (skips/dupes) | Yes | Yes |
| Total count visible | Easy | Needs separate count | Needs separate count |
| Public API friendly | OK but leaks shape | Reveals sort key | Opaque |
| Server complexity | Trivial | Composite WHERE + matching index | Cursor encode/decode + validation |
| Caller complexity | Page number | Last seen sort tuple | Opaque token |
The honest truth is that keyset and cursor have the same SQL underneath. The difference is the contract you expose to the caller.
The four pagination footguns
The first is forgetting the tie-breaker. Sorting by a non-unique column (say, created_at) without a secondary sort by the primary key gives you nondeterministic order under ties. Two messages with the same timestamp can swap places between calls, and the keyset comparison loses one or duplicates one. Always include the primary key as the last sort column, and include it in the WHERE predicate too.
The second is using offset on a WHERE clause that does not match the index. WHERE conversation_id = $1 ORDER BY created_at DESC LIMIT 20 OFFSET 80000 without an index on (conversation_id, created_at DESC) will sort fifty million rows even when you only want twenty. EXPLAIN will tell you. Run it.
The third is not signing cursors and letting clients mutate the contents. I have seen a vulnerability where a client decoded the cursor, changed the conversation_id, and re-encoded it, fetching messages from a conversation they did not have access to. The fix is either to never trust cursor contents (re-validate the query parameters server-side every page) or to sign the cursor (so any tampering invalidates it). I prefer the latter because it is a smaller code change and it composes cleanly.
The fourth is the +1 trick. If the client asks for page size 20, fetch 21 rows. If you got 21 back, there is at least one more page (drop the 21st row from the response, encode its sort key as nextCursor). If you got 20 or fewer, you reached the end (hasMore: false). This is much cheaper than a separate "is there a next page" query.
What I default to now
For internal lists in admin panels with low row counts: offset is fine. The simplicity wins.
For user-facing infinite-scroll feeds, chat history, audit logs, anything that grows without bound: keyset pagination with the composite sort key index. Random page access is not a real requirement; users do not jump to page 4,000.
For public APIs and SDKs: cursor pagination with signed, fingerprinted cursors. The cursor is opaque, the contract is stable, and I can change the sort column without breaking clients tomorrow.
The one I never default to anymore is offset on user-facing data that grows. The day you ship it, it works. Eight months and a million rows later, your users teach you what OFFSET 80000 actually costs.
Where pagination is going next
Two trends I expect over the next couple of years. The first is more libraries shipping cursor pagination as the default in their query builders. Prisma already exposes cursor-style pagination cleanly; SQLAlchemy's with_loader_criteria and Drizzle's API both lean keyset-friendly. As these defaults get better, the activation energy to "do pagination right" drops. The second is more standardized cursor formats in public APIs (the GitHub, Stripe, and Twitter APIs all use slightly different shapes; a shared convention would help SDK authors). For now, the shape is: opaque, signed, fingerprinted, with a clear "cursor invalid" error path. That is what I would build today, and it has held up to every load profile I have shipped against.
