System Design Article

Database Sharding & Partitioning Strategies

Difficulty: Medium

Sharding splits a database into many smaller pieces (shards) so writes and storage can scale across servers. The hard part is not the splitting; it is choosing a shard key that avoids hot shards, supporting cross-shard queries, and rebalancing as the data grows. This lesson covers the four sharding strategies, how to pick a shard key, the operational realities of resharding, and when sharding is the wrong answer.

Database Sharding & Partitioning Strategies

System Design

Medium

data-partitioning

partitioning

database

horizontal-scaling

consistent-hashing

sql

system-design

intermediate

282 views

Vertical vs Horizontal Partitioning vs Sharding

The terms get tossed around interchangeably, but they mean different things.

Vertical partitioning: split a wide table by columns. The users table becomes users_core (id, email, created_at) and users_profile (id, bio, avatar_url). Useful when a table has hot small columns and cold large ones.
Horizontal partitioning (single-machine): split a table by rows but keep all partitions on the same database server. Postgres PARTITION BY RANGE (created_at) is the canonical example. Speeds up scans and lets you drop old data by dropping a partition.
Sharding: horizontal partitioning where each partition lives on its own database server. This is what scales writes and total storage past one machine.

In an interview, when someone says 'shard', they almost always mean the third one: row-level splits across multiple physical servers.

When (and When Not) to Shard

Sharding is expensive. It complicates joins, transactions, schema migrations, backups, and observability. Reach for it only when you cannot avoid it.

Good signals you need sharding:

A single leader cannot keep up with write QPS even after vertical scaling.
The dataset is larger than what one node can hold or back up in a reasonable window (usually multi-TB and growing fast).
Geographic latency requires data to live near users in different regions.

Signals you do not need sharding yet:

Read traffic dominates - add read replicas instead.
Specific tables are huge but the rest are normal - try table-level partitioning or moving cold data to archive storage.
Slow queries on indexed columns - the problem is query design, not horizontal scale.

A common rule of thumb: a tuned PostgreSQL or MySQL instance on modern hardware comfortably serves 10-50k write QPS and tens of TB of data. Most apps never need to shard.

The Four Sharding Strategies

1. Hash-Based Sharding

Apply a hash function to the shard key. Take hash(key) mod N to pick a shard. Distributes load uniformly because any clustering in the key (sequential IDs, common prefixes) is destroyed by the hash.

Text

---------- Hash sharding ----------
shard_id = hash(user_id) % 4

user_id 42 -> hash 0x9F2A -> shard 2
user_id 43 -> hash 0x1B07 -> shard 3
user_id 44 -> hash 0xC4E1 -> shard 1

Pros: even distribution, simple routing.
Cons: range scans are impossible (related keys land on different shards), and mod N rebalancing reshuffles almost everything when N changes. Use consistent hashing (covered in its own lesson) to avoid the second problem.

2. Range-Based Sharding

Assign each shard a contiguous range of the shard key. user_id 1-1M -> shard 1, 1M-2M -> shard 2, and so on.

Pros: range scans are fast (WHERE user_id BETWEEN ... hits a single shard).
Cons: monotonically increasing keys cause hot shards (every new user lands on the latest shard). Mitigation: use a non-monotonic key, prepend a hash bucket, or use ordered keys with rotation.

Used by HBase, Bigtable, MongoDB ranged sharding, and CockroachDB.

3. Geographic / Directory-Based Sharding

A lookup table maps each entity (user, tenant) to a specific shard. Common in multi-tenant SaaS:

tenant_id	shard
acme	shard-us-1
globex	shard-eu-3
initech	shard-ap-2

Pros: maximum flexibility (move a tenant to its own shard for noisy-neighbor reasons), easy compliance with data-residency rules.
Cons: every read needs a directory lookup (cache it!), the directory is itself a single point of failure.

4. Functional Sharding (a.k.a. service-based)

Different tables live on different databases entirely. users on one cluster, orders on another, analytics_events on a third. Often a stepping stone before true horizontal sharding.

Choosing the Shard Key

The shard key is the single most important decision in any sharded system. Get it wrong and the database is forever painful.

A good shard key:

Has high cardinality - many distinct values, so load can spread.
Has even access frequency - no key dominates the read or write load.
Matches the dominant query pattern - so most queries hit a single shard.
Is immutable per row - changing a row's shard key means moving it, which is slow.

Classic shard keys by domain:

Multi-tenant SaaS: tenant_id. All of a tenant's data lives together; cross-tenant queries are rare.
Social networks: user_id. The user's posts, friends, settings all live together.
Time-series and IoT: (device_id, day) or just device_id with TTL. Range scans within a device are fast.
E-commerce: customer_id for orders, but product_id for inventory - often a system uses both depending on the table.

Bad shard key examples

country - low cardinality and skewed distribution (the US shard explodes).
status - tiny number of values, plus values change so rows would have to migrate.
created_at - monotonic, so the newest shard is always the hot one.

When the dominant queries do not have a single key

Reality check: most apps have multiple query patterns. You can:

Pick the most frequent pattern's key as the shard key.
Maintain a secondary index in a separate sharded table indexed by the other key.
Use a search engine (Elasticsearch) for ad-hoc queries that span shards.
Periodically materialize aggregates into a small global table.

This is the kind of trade-off senior engineers make explicitly.

Cross-Shard Operations: The Painful Part

Cross-Shard Joins

A SQL JOIN between rows on different shards is not natively supported. Options:

Co-locate by shard key - if both tables are sharded by user_id, the join is local to one shard.
Replicate small reference tables - copy currencies or countries to every shard so joins always work.
Application-side join - fetch from each shard and merge in code. Pay the latency.
Scatter-gather - send the query to all shards in parallel, merge results. Bounded by the slowest shard.

Cross-Shard Transactions

Single-shard ACID is easy; cross-shard ACID needs two-phase commit (2PC) or the Saga pattern (covered in the Distributed Transactions lesson). Either adds latency and failure modes. The pragmatic answer is to design the schema so that the things that must be transactional live on the same shard.

Resharding

When one shard fills up or runs hot, you must split it. This is the operationally hardest part of sharding:

Add new shard(s) and start a logical-replication stream from the affected source shard.
Wait for the new shard to catch up.
Switch the routing layer to send the moved keys to the new shard.
Verify and clean up data on the old shard.

With hash sharding and a fixed N, resharding requires moving most of the data. With consistent hashing or with virtual shards (logically thousands of shards mapped to a smaller set of physical nodes), only a small fraction moves. Always design for resharding from day one.

Query Routing

When the application sends a query, something must route it to the right shard(s). Three architectures:

Text

---------- Routing options ----------
1. Smart client     - app library knows the shard map, routes directly
2. Routing tier     - a stateless proxy (Vitess vtgate, Citus coordinator) accepts SQL,
                       parses it, routes to the right shard, merges results
3. Coordinator node - a cluster member that knows the shard map and forwards

Most teams choose option 2 because it keeps app code simple, lets you change the shard map without redeploying every service, and provides a natural place for connection pooling, query rewriting, and metrics.

Pseudocode for a smart client routing layer

JavaScript

Python

const SHARDS = [pgShard0, pgShard1, pgShard2, pgShard3];

function shardFor(userId) {
    // FNV-1a 32-bit hash, then mod by shard count
    const h = fnv1a(String(userId));
    return SHARDS[h % SHARDS.length];
}

async function getUser(userId) {
    const shard = shardFor(userId);
    return shard.query('SELECT * FROM users WHERE id = $1', [userId]);
}

async function getUsersByEmail(email) {
    // Cross-shard scatter-gather (we have no email -> shard map)
    const results = await Promise.all(
        SHARDS.map((s) => s.query('SELECT * FROM users WHERE email = $1', [email]))
    );
    return results.flatMap((r) => r.rows);
}

Note how the second function pays N times the cost of the first. Cross-shard queries are why the shard key matters so much.

Real-World Examples

How real systems implement this in production

YouTube and Vitess

YouTube built Vitess to shard MySQL transparently. The vtgate router speaks the MySQL protocol, parses queries, and routes them to the right shards. Slack, GitHub, and HubSpot run Vitess at very large scale today.

Trade-off: Vitess hides most sharding complexity from the application but constrains schema design - some SQL features (foreign keys across shards, certain joins) are not supported. Teams adapt the schema to fit the routing layer.

Instagram sharded Postgres

Instagram sharded Postgres by user_id from early on, encoding the shard ID inside the primary key. A user's posts are co-located on one shard, so profile rendering is a single-shard read. Cross-user feeds are computed by separate denormalized pipelines.

Trade-off: Per-user queries are blazing fast and trivially cacheable, but cross-shard analytics requires a separate data warehouse path. The split between OLTP and OLAP became a permanent architectural decision.

Discord bucketed Cassandra

Discord uses (channel_id, time_bucket) as the partition key for messages. Within a channel, recent messages cluster together for fast reads, but a viral channel only heats up the current bucket instead of the entire channel partition.

Trade-off: Bucketing complicates queries that span buckets (the client must iterate over recent buckets) but keeps tail latency predictable even when individual channels go viral.

Quick Interview Phrases

Key terms to use in your answer

shard key selection

hot shard

scatter-gather query

co-located joins

consistent hashing

virtual shards

Common Interview Questions

Questions you might be asked about this topic

How would you shard a multi-tenant SaaS database?

Shard by tenant_id. All of a tenant's rows live on one shard, so per-tenant queries are local and tenant-level isolation is easier. Use a directory service (small lookup table) to map tenant_id -> shard, and cache it. For very large tenants, consider giving them a dedicated shard. For very small tenants, pack many of them onto one shard.

Compare hash sharding versus range sharding for user data.

Why is consistent hashing better than mod-N hashing for sharding?

How do you handle a transaction that spans two shards?

Your shard 7 is at 90% CPU while others are idle. What's the diagnosis and fix?

Interview Tips

How to discuss this topic effectively

State the shard key in the first sentence and tie it to the dominant query pattern. 'I'd shard by user_id because the profile read - 80% of traffic - is then a single-shard lookup.'

Always raise the cross-shard query problem unprompted. 'Joins across shards are expensive, so I'd co-locate friends and posts on the same shard or denormalize.' Interviewers expect to hear this.

Acknowledge the operational tax of sharding: harder migrations, harder backups, harder transactions, harder analytics. Then justify why this workload still needs it.

Bring up resharding strategy proactively. 'I'd use virtual shards / consistent hashing so adding a node moves only 1/N of the data, not most of it.'

Know when not to shard. 'Postgres on a 96-core box handles 50k QPS; we should saturate one node before adding shard complexity.' Restraint is a senior signal.

Common Mistakes

Pitfalls to avoid in interviews

Sharding by a low-cardinality column like country or status

Low cardinality means too few distinct values to spread load. The US shard or the 'pending' shard becomes a hot spot. Pick a high-cardinality column, ideally one whose values are accessed roughly uniformly.

Sharding by a monotonically increasing key (created_at, auto-increment id)

All new writes land on the same shard. Either hash the key first, prepend a small bucket, or use an explicit per-shard sequence so growth is spread out.

Choosing the shard key based on what is convenient now, not the dominant query

The shard key has to align with the queries that run the most. Decide your top 3 query patterns first, then pick a key that makes the top 1 or 2 single-shard lookups.

Sharding before you have to

Sharding multiplies the operational load: harder schema migrations, no easy joins, complicated transactions. A well-tuned single Postgres instance handles most workloads up to 50k QPS. Saturate one node first.

Forgetting to plan for resharding from day one

Workloads grow unevenly and shards need to split. If you used naive hash-mod-N, splitting requires moving most rows. Use consistent hashing or virtual shards so adding capacity moves only a small fraction.

Back to System Design