System Design Article

Design Twitter / X (Social Feed)

Difficulty: Medium

Design a microblogging service like Twitter or X with 250M daily active users posting 500M tweets a day, served as a personalized timeline at sub-200 ms p99. The interview centerpiece is the home timeline: hybrid fan-out at the celebrity boundary, write amplification math, and how Twitter built Manhattan and the Timeline Service to make 250M people see fresh tweets within seconds. We also cover trending topics, the search index, retweet semantics, and how Twitter handles 50,000 tweets per second when a major event happens.

System Design
/

Design Twitter / X (Social Feed)

Design Twitter / X (Social Feed)

Design a microblogging service like Twitter or X with 250M daily active users posting 500M tweets a day, served as a personalized timeline at sub-200 ms p99. The interview centerpiece is the home timeline: hybrid fan-out at the celebrity boundary, write amplification math, and how Twitter built Manhattan and the Timeline Service to make 250M people see fresh tweets within seconds. We also cover trending topics, the search index, retweet semantics, and how Twitter handles 50,000 tweets per second when a major event happens.

System Design
Medium
design-twitter
case-study
social-content-platforms
fan-out-on-write
fan-out-on-read
hybrid-fan-out
celebrity-problem
timeline-service
feed-ranking
trending-topics
social-media
system-design
intermediate
premium

1,186 views

31

Requirements

Functional Requirements

  1. Post a tweet (up to 280 chars, with media attachments).
  2. Home timeline: see tweets from accounts you follow, ranked by recency (and ML in extended scope).
  3. User timeline: see all tweets by a single account.
  4. Follow / unfollow another user.
  5. Retweet, quote-tweet, reply, like.
  6. Search for tweets by keyword.
  7. Trending topics: top hashtags / keywords by region in the last hour.

Out of Scope (state explicitly)

  • Direct messages (Chat System case study).
  • Spaces (audio rooms, separate design).
  • Ad targeting and ranking pipeline.
  • Twitter Blue / verification / payments.

Non-Functional Requirements

  1. Scale: 250M DAU, 500M tweets per day, ~5,800 tweets/sec average.
  2. Read-heavy: 1000:1 timeline reads vs tweets (read amplification through fan-out).
  3. Timeline latency: p99 < 200 ms.
  4. Tweet propagation: a tweet should reach normal followers within ~5 sec, celebrities within seconds at most (it's expected slower for huge accounts).
  5. High availability: 99.99%. Twitter outages make global news.
  6. Eventually consistent likes / retweet counts / follower counts are acceptable.

Back-of-the-Envelope Estimation

Traffic

Text
---------- Tweet volume ----------
DAU:                     250M
Tweets / DAU / day:      ~2 (average; a small minority writes most)
Tweets per day:          500M
Tweets per second avg:   500M / 86400 = 5,800 /sec
Tweets per second peak:  ~17,000 /sec (3x average; 50,000+ during major events)

Timeline reads / DAU:    20 (open the app many times)
Timeline reads/sec avg:  250M * 20 / 86400 = 58,000 /sec
Timeline reads/sec peak: ~175,000 /sec

Fan-out Write Amplification

The scary number for Twitter. With pure fan-out on write:

Text
---------- Fan-out cost ----------
Avg followers per user:   ~700
Writes per tweet (pure):  ~700 fan-out inserts
Total fan-out writes/sec: 5,800 * 700 = 4M writes/sec to feed storage

With celebrity (1M+ followers):
Writes for ONE celebrity tweet: 1M-100M

A single celebrity tweet (Lady Gaga, Bieber, Musk) generates more writes than the entire normal-user write rate combined. This is why hybrid fan-out exists.

Storage

Text
---------- Tweet storage ----------
Tweet row (id, user_id, text, timestamps, counts, etc.):  ~300 bytes (text + metadata)
With media reference:                                       ~400 bytes

Daily tweet storage:    500M * 400 = 200 GB/day
Yearly:                 ~75 TB/year
5 years:                ~375 TB (manageable across sharded clusters)

Follower graph rows:    250M users * 700 follows avg = 175B rows
Row size:               ~24 bytes (two BIGINT + timestamp)
Graph storage:          175B * 24 = ~4 TB

Timeline storage in Redis (precomputed feeds):

Text
Active users:                    100M
Timeline length per user:        800 tweet IDs (cap)
Per-entry size:                  ~16 bytes (id + score)
Total Redis memory:              100M * 800 * 16 = ~1.3 TB

Fits in a few hundred Redis nodes.

High-Level Design

Text
---------- High-level architecture ----------
            +----------+
            |  Client  |
            +----------+
                 |
                 v
        +---------------------+
        |    API Gateway      |
        +---------------------+
           |        |       |
           v        v       v
    +--------+ +--------+ +--------+
    | Tweet  | |Timeline| | Search |
    | Service| | Service| | Service|
    +--------+ +--------+ +--------+
        |          |         |
        v          v         v
    +--------+ +-------+ +----------+
    | Manhattan| Redis | |Elastic   |
    | (tweets) | feeds | |Search    |
    +--------+ +-------+ +----------+
        |          ^
        v          |
    +--------+ +-------+
    | Kafka  |->|Fan-out|
    |        |  | Worker|
    +--------+ +-------+
                  |
                  v
           +---------------+
           | User Service  |
           | (followers)   |
           +---------------+

API Design

Jsonc
// Post a tweet
POST /api/v1/tweets
Authorization: Bearer <token>
{
    "text": "Just shipped a new design system!",
    "reply_to": null,
    "quoted_tweet_id": null,
    "media_ids": ["01HW..."]
}
// Response 201
{
    "id": "01HW3M9...",
    "created_at": "2026-04-26T10:00:00Z"
}

// Get home timeline
GET /api/v1/timeline?limit=20&cursor=<opaque>
// Response
{
    "items": [
        {
            "id": "01HW...",
            "author": { "handle": "alice", "name": "Alice", "avatar_url": "..." },
            "text": "Just shipped a new design system!",
            "created_at": "2026-04-26T10:00:00Z",
            "like_count": 42,
            "retweet_count": 5,
            "reply_count": 3,
            "reply_to": null,
            "quoted": null
        }
    ],
    "next_cursor": "<opaque>"
}

// Retweet
POST /api/v1/tweets/:id/retweet

// Trending topics for a region
GET /api/v1/trends?region=US
// Response
{
    "trends": [
        { "topic": "#NBA", "tweets_per_min": 12000, "rank": 1 },
        ...
    ]
}

Tweet Write Path

  1. Client POSTs tweet to Tweet Service.
  2. Tweet Service generates a Snowflake ID, writes the tweet to Manhattan (Twitter's distributed key-value store) sharded by tweet_id.
  3. Tweet Service emits a Kafka event tweet.created.
  4. Fan-out Worker consumes the event, looks up the author's followers (from User Service), and:
    • If follower count < 1M: enqueue per-follower writes to Redis ZSET home_timeline:<follower_id>.
    • If follower count >= 1M: do NOT fan out. The tweet stays only in the author's user timeline and Manhattan; followers pull it on read.
  5. The tweet also goes to the Search Service via a separate Kafka topic for indexing into Elasticsearch.

Home Timeline Read Path

  1. Client GETs /timeline?cursor=....
  2. Timeline Service: a. Fetches precomputed timeline from home_timeline:<user_id> (~800 tweet IDs). b. Looks up which celebrities the user follows (cached list of celebrity follow IDs). c. For each celebrity, pulls their last ~50 tweets from Manhattan. d. Merges by timestamp (or score), deduplicates, applies viewer-side filters (muted users, blocked, reported). e. Hydrates tweet objects (full text, author info, counts) by batched read from Manhattan + counter cache. f. Returns top 20.

Latency budget:

Text
---------- Timeline read latency budget (p99 200 ms) ----------
Network to API gateway:           20 ms
Fetch precomputed timeline:        2 ms (Redis ZSET)
Fetch celebrity follow list:       2 ms (cached)
Fetch celebrity recent tweets:    20 ms (parallel reads from Manhattan)
Merge + filter:                    5 ms
Hydrate tweet objects:            30 ms (batched Manhattan read)
Return:                           20 ms (network back)
Total:                            ~99 ms median, ~150 ms p99

Detailed Design

The Hybrid Fan-out Strategy

This is the central design decision and the source of most interview points.

Why pure fan-out on write fails

A Justin Bieber tweet (~100M followers) generates 100M Redis ZSET inserts. Even at 100K writes/sec per Redis node, that's 1,000 node-seconds of throughput - the cluster is saturated for several seconds per Bieber tweet. Now imagine 10 celebrities tweeting in the same minute.

Storage explodes too: 100M follower copies of one tweet ID. Even at 16 bytes per entry, a single celebrity tweet costs 1.6 GB of feed storage. Multiply by tweets/year and the math doesn't work.

Why pure fan-out on read fails

If Bob follows 700 accounts, his timeline read does 700 lookups. At 58K timeline reads/sec, that's 40M lookups/sec across the cluster. Even with caching it's wasteful.

The hybrid threshold

Define a 'celebrity' as a user with > N followers. Twitter historically used N around 10K to 1M depending on era; let's pick 1M for clarity.

CELEBRITY_THRESHOLD = 1_000_000

def on_tweet_created(tweet):
    if tweet.author.follower_count < CELEBRITY_THRESHOLD:
        fan_out_worker.enqueue_normal(tweet)
    else:
        # No fan-out. Followers will pull on read.
        pass
How followers see celebrity tweets

On timeline read:

def build_home_timeline(user_id):
    precomputed = redis.zrevrange(f"home_timeline:{user_id}", 0, 800)
    celebs = cache.get(f"celebrities_followed:{user_id}")  # the user's celebrity follow list
    pulled = []
    for celeb_id in celebs:
        pulled.extend(get_recent_tweets(celeb_id, limit=50))
    merged = merge_by_score(precomputed, pulled)
    return rank_and_filter(merged)[:20]
The threshold trade-off
  • Lower threshold (10K): more pull, less fan-out work, more read latency.
  • Higher threshold (10M): less pull, more fan-out work, faster reads.

Real Twitter tunes this dynamically per author and per follower (e.g., a power user follower may get celebrity tweets pushed; a light user pulls).

Manhattan: the Tweet Store

Twitter built Manhattan, a Cassandra-like distributed key-value store, because:

  • They needed sub-10ms p99 reads at high QPS.
  • Schema flexibility (tweets evolve).
  • Multi-DC replication with tunable consistency.

Key design:

  • Partition key: tweet_id (Snowflake; sortable by time).
  • Replication: 3 replicas per tweet, multi-region.
  • Consistency: writes are quorum (2 of 3); reads are LOCAL_QUORUM (low latency, AP under regional partition).

Schema (logical):

Text
tweets: { tweet_id, user_id, text, in_reply_to, quoted_tweet_id, media_ids, created_at, deleted_at }
user_tweets: { user_id, tweet_id, created_at } -- secondary index, partition by user_id, for user timeline

Counters: likes, retweets, replies

Do NOT increment counters on the Manhattan tweet row directly. Hot tweets (a celebrity's viral tweet) generate millions of likes per hour; lock contention kills throughput.

Use Redis counters with periodic flush:

  • like_count:<tweet_id> -> Redis INCR.
  • Background job flushes to Manhattan every 30 sec for durability.
  • On Redis failure, recompute from the Likes table (Cassandra).

For super-hot tweets, use approximate counters (HyperLogLog) or sharded counters across multiple Redis keys (like_count:<tweet_id>:<bucket>) and sum on read.

Trending Topics

Real-time top-K hashtags. Two pieces:

Stream processing: a Kafka consumer (built on Kafka Streams or Flink) ingests tweets, extracts hashtags/keywords, and updates a sliding-window count per topic per region.

Text
---------- Trending pipeline ----------
Kafka topic: tweet.created
  -> extract hashtags
  -> Flink job: window(15 minutes, sliding 1 minute) by hashtag x region
  -> output: (hashtag, region, count_in_window)
  -> writes top-K per region into Redis

Top-K maintenance: Redis sorted set per region, score = count, member = hashtag. Reads are O(log N) for top-N. Use Count-Min Sketch for memory efficiency on the long tail.

Anti-spam: weight each tweet by author quality (follower count, age, prior spam reports). Rate-limit per author so a botnet can't trend a hashtag.

Data Model

Manhattan: tweets and user_tweets

Text
---------- Manhattan tables ----------
tweets:
  partition key: tweet_id
  columns: user_id, text, in_reply_to, quoted_tweet_id, media_ids, created_at, status

user_tweets:
  partition key: user_id
  clustering key: tweet_id DESC
  -- supports 'all tweets by Alice in reverse chronological'

Postgres (sharded): users and the follow graph

SQL
-- users
CREATE TABLE users (
    id              BIGINT PRIMARY KEY,
    handle          VARCHAR(15) UNIQUE NOT NULL,
    name            VARCHAR(50),
    bio             TEXT,
    follower_count  INTEGER NOT NULL DEFAULT 0,    -- denormalized; reconcile via batch job
    following_count INTEGER NOT NULL DEFAULT 0,
    is_celebrity    BOOLEAN GENERATED ALWAYS AS (follower_count >= 1000000) STORED
);

-- follows: shard by follower_id (forward queries)
CREATE TABLE follows (
    follower_id BIGINT NOT NULL,
    followee_id BIGINT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (follower_id, followee_id)
);

-- followers: shard by followee_id (fan-out queries: 'who follows Alice?')
CREATE TABLE followers (
    followee_id BIGINT NOT NULL,
    follower_id BIGINT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (followee_id, follower_id)
);

Both follows and followers are written on every follow action (write-through dual write). Reads use whichever is sharded for the query. is_celebrity is denormalized so the Tweet Service can decide fan-out without a follower count query.

Cassandra: likes, retweets, replies

Text
CREATE TABLE likes (
    tweet_id  bigint,
    user_id   bigint,
    created_at timestamp,
    PRIMARY KEY ((tweet_id), user_id)
);

CREATE TABLE retweets (
    tweet_id    bigint,
    user_id     bigint,
    retweet_id  bigint,                     -- the new tweet ID created for the RT
    created_at  timestamp,
    PRIMARY KEY ((tweet_id), user_id)
);

Redis: timelines, counters, online presence

  • home_timeline:<user_id> -> ZSET (member=tweet_id, score=created_at unix ms). TTL 30 days.
  • user_timeline:<user_id> -> ZSET. (Optional: rebuild from Manhattan on miss.)
  • like_count:<tweet_id>, retweet_count:<tweet_id> -> integer.
  • celebrities_followed:<user_id> -> SET of celeb IDs (recomputed on follow/unfollow events).

Elasticsearch: tweet search

Index tweets (text, author handle, hashtags, language) for keyword search. Indexed asynchronously via Kafka. Use ILM to roll daily indices and delete after 90 days for cost. Older tweets are still in Manhattan for permalink reads but not searchable.

Scaling and Bottlenecks

Major event spike (Super Bowl, election)

Tweet rate spikes to 50K-100K/sec for short windows. Mitigations:

  • Tweet Service is stateless and autoscales.
  • Manhattan absorbs writes via partition spreading.
  • Fan-out workers backlog; the queue drains in minutes. Tweet appears in author's timeline immediately; reaches followers within 1-2 minutes.
  • Trending pipeline counts tweets in 15-min windows; spikes amplify the signal naturally.

One celebrity gains followers fast (a viral moment)

If an account crosses the 1M threshold mid-tweet, the design should not break:

  • The is_celebrity flag is recomputed during user updates.
  • For new tweets after the flip, no fan-out happens.
  • For old fan-out timeline entries, leave them; they're already there.
  • For the next ~10 minutes, the user's followers may see some tweets via fan-out and others via pull-on-read - that's fine; both paths produce correct results.

Hot read tweets (viral)

  • A single tweet permalink can hit 1M reads/sec.
  • CDN caches the tweet HTML (for web) with a short TTL.
  • API responses are cached at the edge with a 30-sec TTL where possible.
  • Manhattan serves the row from in-memory page cache trivially.

Multi-region

Active-active across 5+ regions. Manhattan replicates with eventual consistency between regions. A user's home region is sticky for write consistency. Read replicas in every region for low-latency timeline reads.

Counter inconsistency

During Redis failover, counters can lose recent increments. Mitigations:

  • Persist Redis with RDB+AOF.
  • Periodic reconciliation: a batch job recomputes like counts from Cassandra and updates the cached counter.
  • Display 'approximately' for counts > 100K to absorb minor drift.

Trade-offs and Alternatives

Why not use the Instagram design?

Twitter's tweet-rate (5,800/sec avg, 50K+ peak) is much higher than Instagram's photo upload rate (1,200/sec). The fan-out math is more punishing because text tweets are cheaper to create and users tweet much more frequently. Twitter's celebrity follower distribution is also more skewed (a top-100 account easily has 50M+ followers; Instagram's top photographer has 10M).

Push vs Pull thresholds

Real Twitter uses a per-edge decision rather than a global threshold:

  • High-value followers (active users, verified) get push even from celebrities.
  • Low-value or inactive followers always pull, even from normal users.

This adds complexity but optimizes the cost-per-follower-served metric. For an interview, mention this as an extension once you've defended the global threshold model.

Why Manhattan instead of Cassandra?

Twitter built Manhattan in 2014 because off-the-shelf Cassandra at that time had operational issues at Twitter's scale (multi-DC consistency, ops tooling). Today you'd probably use Cassandra, ScyllaDB, or DynamoDB. The architectural choice (sharded KV with partition by tweet_id) is the same.

Strong vs eventual consistency on follows

Follow / unfollow is eventually consistent: it can take seconds for the change to propagate to fan-out. A user might briefly see tweets from someone they just unfollowed. Acceptable for the product. If we needed strong consistency (rare), we'd add a synchronous write-through path that blocks the unfollow response until the follower's timeline is rewritten - but that's overkill.

Search: Elasticsearch vs custom inverted index

Elasticsearch handles most use cases. Twitter built Earlybird internally for real-time search because they needed seconds-fresh indexing across billions of tweets, which off-the-shelf Elasticsearch struggled with. Today Elasticsearch with ILM rollovers and aggressive shard sizing gets close.

Why GraphQL? (Twitter actually uses it.)

Twitter's product surface (timeline, profile, search, conversations) has very different read shapes. GraphQL lets the client request exactly what it needs without backend per-screen endpoints. Trade-off: more complex query planning and caching layer, but worth it for product velocity.

Real-World Examples

How real systems implement this in production

Mastodon

Federated microblogging via ActivityPub. Each instance handles its own users, fan-out, and timelines. Inter-instance fan-out happens via HTTP push. No central feed service.

Trade-off: Federation eliminates the centralized celebrity problem (each instance scales for its own users) but introduces inter-instance reliability problems. A celebrity on a small instance can DDoS the instance via federation traffic. Decentralized social media has its own pathologies.

Bluesky (AT Protocol)

Twitter alternative built on the AT Protocol with personal data stores per user and an algorithmic relay. Feeds are constructed by app-side aggregation from user PDS endpoints, not server-side fan-out.

Trade-off: Pull-based federated reads scale linearly with relay throughput, not with follower count, eliminating celebrity write amplification. The cost is a more complex client and slower cold-start for users following many accounts.

Threads (Meta)

Built on Instagram infrastructure. Reuses the Instagram fan-out machinery, follow graph, and CDN. Threads launched at hundreds of millions of users in days because it bolted onto an existing platform.

Trade-off: Reusing infrastructure is a massive launch advantage but constrains future divergence. Threads inherits Instagram's product decisions (no chronological feed for years, ML-heavy ranking) which were not always loved by users.

Reddit (front page feed)

Different model: feeds are per-subreddit (community), not per-user follow. The 'home feed' aggregates posts from subscribed subreddits ranked by hot/top algorithm. No celebrity problem because there's no celebrity follower graph; subreddit subscription is bounded.

Trade-off: Community-based feeds avoid the celebrity write amplification problem entirely but lose the personalized 'what my friends are doing' product. Different product = different scaling problem; copy the model that fits your product.

Quick Interview Phrases

Key terms to use in your answer

hybrid fan-out at the celebrity boundary
celebrity problem
write amplification
timeline service
approximate counters
Manhattan key-value store

Common Interview Questions

Questions you might be asked about this topic

It's not via fan-out; it's pull-on-read. Each follower pulls celebrity tweets when they open their timeline. The first follower sees it in milliseconds (tweet is in Manhattan immediately). Followers who haven't opened the app see nothing until they refresh. There's no 'propagation' for celebrity tweets in the push sense; the model is 'available immediately, delivered on demand.'

Interview Tips

How to discuss this topic effectively

1

Open with the write amplification math. 'Average user has 700 followers; a celebrity has 100M. The cost difference forces the hybrid.' This frames the entire interview.

2

Always quote the celebrity threshold (1M) and explain it's tunable. Real Twitter uses per-edge decisions; mention this as a follow-up.

3

Separate 'tweet creation' from 'fan-out' from 'timeline assembly'. Three distinct services, three distinct SLOs. Drawing them as one box is the most common red flag.

4

Mention asynchronous Search and Trending indexing via Kafka. Both must be decoupled from the hot tweet write path.

5

When asked about counters (likes, retweets), say 'Redis counter with periodic flush, sharded for hot tweets'. Saying 'increment a column in the database' is an instant downgrade.

Common Mistakes

Pitfalls to avoid in interviews

Choosing pure fan-out on write and not addressing celebrity accounts

A pure write fan-out melts your storage and queue infrastructure when a 100M-follower account tweets. The celebrity problem is the defining constraint of Twitter; never ignore it. Always propose a hybrid model with a celebrity threshold.

Storing tweets in a relational database with full text + media in one row

5,800 tweets/sec writes overwhelm a single Postgres primary. Tweets go in a sharded distributed KV store (Manhattan, Cassandra, DynamoDB) with tweet_id as the partition key. Media goes in S3, referenced by ID.

Treating likes as a column update on the tweet row

Hot tweets get millions of likes per hour. Updating a single Manhattan row that fast causes contention. Use Redis counters with sharding for hot tweets, periodically flushed to durable storage. The likes table itself (one row per like) lives in Cassandra partitioned by tweet_id.

Forgetting to maintain both follower-by-follower and followee-by-follower indexes

Forward queries ('who do I follow?') need shard by follower_id. Fan-out queries ('who follows Alice?') need shard by followee_id. Maintain both via dual-write on follow/unfollow. A single index is insufficient.

Indexing tweets into the search engine synchronously during write

Search indexing adds latency and couples write availability to Elasticsearch availability. Push tweets to a Kafka topic and index asynchronously. A few seconds of search lag is acceptable; tweet write must remain fast and durable independently.