System Design Article

Design Reddit (Forum / Voting)

Difficulty: Medium

Design a community-driven forum like Reddit with 50M daily active users, 500K subreddits, and the famous hot/top/best ranking algorithms that decide which posts you see. The interview centerpiece is the ranking system: how to score posts in real time as votes pour in, how to make the front page personalized without per-user fan-out, and how to render nested comment trees at sub-200 ms when a popular thread has 10,000 nested replies. We also cover voting fraud detection, the difference between hot and Wilson score, and the tiered cache that makes 50K reads per second on the front page survive a viral post.

System Design
/

Design Reddit (Forum / Voting)

Design Reddit (Forum / Voting)

Design a community-driven forum like Reddit with 50M daily active users, 500K subreddits, and the famous hot/top/best ranking algorithms that decide which posts you see. The interview centerpiece is the ranking system: how to score posts in real time as votes pour in, how to make the front page personalized without per-user fan-out, and how to render nested comment trees at sub-200 ms when a popular thread has 10,000 nested replies. We also cover voting fraud detection, the difference between hot and Wilson score, and the tiered cache that makes 50K reads per second on the front page survive a viral post.

System Design
Medium
design-reddit
case-study
social-content-platforms
voting-systems
hot-ranking
wilson-score
nested-comments
subreddits
feed-ranking
social-media
system-design
intermediate
premium

909 views

24

Requirements

Functional Requirements

  1. Post a submission (link or text) to a subreddit.
  2. Comment on a post or another comment (nested arbitrarily deep).
  3. Vote (upvote / downvote) on posts and comments.
  4. Subscribe / unsubscribe to subreddits.
  5. View feeds: front page (subscribed subreddits), subreddit page, user profile, all (global).
  6. Sort feeds by hot, new, top (time-windowed), controversial, best (for comments).
  7. Search posts and comments.
  8. Moderation: subreddit mods can remove posts, ban users.

Out of Scope (state explicitly)

  • Real-time chat (Reddit's chat is a separate product).
  • Live threads / streaming events.
  • Reddit Coins, awards, premium features.
  • Notifications (mentioned briefly but not designed).

Non-Functional Requirements

  1. Scale: 50M DAU, 5M posts per day, 50M comments per day, 500M votes per day.
  2. Read-heavy: 50:1 reads vs writes (front page reads dominate).
  3. Feed latency: p99 < 200 ms for the front page.
  4. Vote latency: p99 < 100 ms; the vote button must feel instant.
  5. High availability: 99.95%. Subreddit downtime is annoying but not catastrophic.
  6. Eventually consistent vote counts (within seconds) are acceptable.

Back-of-the-Envelope Estimation

Traffic

Text
---------- Traffic estimation ----------
DAU:                      50M
Posts/day:                5M           (~60/sec avg, ~180/sec peak)
Comments/day:             50M          (~580/sec avg, ~1,700/sec peak)
Votes/day:                500M         (~5,800/sec avg, ~17,000/sec peak)
Front-page reads/day:     500M         (~5,800/sec avg, ~17,000/sec peak)
Subreddit page reads/day: 1B           (~12,000/sec avg, ~35,000/sec peak)

Voting is the highest-rate write. Front-page assembly is the highest-latency-sensitive read.

Storage

Text
---------- Storage ----------
Post row:           ~600 bytes (id, author, subreddit, title, body or url, timestamps, scores)
Daily post storage: 5M * 600 = 3 GB/day
Yearly:             ~1.1 TB/year
5 years:            ~5.5 TB (manageable; sharded)

Comment row:        ~400 bytes
Daily comments:     50M * 400 = 20 GB/day
5 years:            ~36 TB across sharded clusters

Vote row:           ~30 bytes (post_id, user_id, value=+1/-1)
Daily votes:        500M * 30 = 15 GB/day
5 years:            ~27 TB

Cache

Front page is community-based, not personalized. Cache the top N posts per subreddit:

Text
---------- Subreddit hot list cache ----------
Subreddits:                500K
Top 1000 posts per sub:    1000
Per entry size:            ~30 bytes (post_id + score)
Total:                     500K * 1000 * 30 = ~15 GB across cluster

Fits in a Redis cluster with room to spare.

High-Level Design

Text
---------- High-level architecture ----------
            +----------+
            |  Client  |
            +----------+
                 |
                 v
        +---------------------+
        |   CDN (Fastly)      |  <- caches popular post bodies
        +---------------------+
                 |
                 v
        +---------------------+
        |    API Gateway      |
        +---------------------+
          /     |        \
         v      v         v
  +---------+ +---------+ +---------+
  |  Post   | |  Vote   | |  Feed   |
  | Service | | Service | | Service |
  +---------+ +---------+ +---------+
      |          |            |
      v          v            v
  +---------+ +-------+   +---------+
  | Postgres| | Redis |   | Redis   |
  | (posts, | |(vote  |   | (sub    |
  | comments| | counts|   |  hot    |
  | sharded)| | & rate|   |  lists) |
  +---------+ |  lim) |   +---------+
      |       +-------+        ^
      v          |             |
  +---------+    v             |
  | Cassandra| +-------+       |
  | (votes,  | | Kafka |-------+
  |  one row | |       |
  |  per vote|+-------+
  +---------+    |
                 v
           +----------------+
           | Score Updater  |
           | (hot, top, etc)|
           +----------------+

API Design

Jsonc
// Submit a post
POST /api/v1/subreddits/:name/posts
{
    "title": "TIL about hot ranking",
    "kind": "text",            // text | link
    "body": "Reddit's hot algorithm is...",
    "url": null
}
// Response
{ "id": "01HW...", "permalink": "/r/programming/01HW.../til-about-hot-ranking" }

// Vote
POST /api/v1/posts/:id/vote
{ "value": 1 }              // +1, -1, or 0 (unvote)
// Response 204

// Get a subreddit feed
GET /api/v1/subreddits/:name?sort=hot&limit=25&cursor=<opaque>
// Response
{
    "items": [
        {
            "id": "01HW...",
            "title": "TIL about hot ranking",
            "author": "alice",
            "subreddit": "programming",
            "score": 2453,
            "comment_count": 87,
            "created_at": "2026-04-26T08:00:00Z",
            "hot_rank_score": 12345.67   // computed by the score updater
        }
    ],
    "next_cursor": "<opaque>"
}

// Get the front page (personalized: union of subscribed subreddit feeds)
GET /api/v1/front?sort=hot&limit=25

// Get nested comments for a post
GET /api/v1/posts/:id/comments?depth=10&load=top
// Response
{
    "comments": [
        {
            "id": "c1",
            "author": "bob",
            "body": "Great post",
            "score": 42,
            "depth": 0,
            "replies": [
                {"id": "c2", "parent_id": "c1", "depth": 1, ...},
                {"id": "c3", "parent_id": "c1", "depth": 1, "more_replies": 50}
            ]
        }
    ]
}

Vote Path (the hot write)

  1. Client POSTs vote.
  2. Vote Service: a. Rate limit by user (e.g., 100 votes/min). b. Write the vote into Cassandra (insert; or upsert if user changes their vote). c. Update Redis counter for that post (INCRBY score:<post_id> 1). d. Publish a Kafka event vote.cast for downstream score recalculation.
  3. Return 204 immediately.

The Score Updater consumes vote.cast, recomputes the hot score, and updates the per-subreddit Redis ZSET (subreddit_hot:<name>).

Feed Read Path

  1. For a subreddit page: read top N from subreddit_hot:<name> (Redis ZSET, O(log N) per insert).
  2. For the front page: union the user's subscribed subreddits' top lists.
    • User subscribes to ~50 subreddits on average.
    • Read top 100 from each (subreddit_hot:<name> ZSET) in parallel = 5,000 candidate posts.
    • Merge by hot score, deduplicate (rare cross-posts), filter (NSFW, blocked), return top 25.

This is fan-out on read with bounded fan-out (50 subreddits, not 5,000 followers). The front page is essentially personalized but cheap because subreddit lists are pre-ranked.

Detailed Design

The two interesting components are the ranking algorithms and nested comments.

Ranking Algorithms

Hot (the front page sort)

Reddit's hot ranking balances vote score with post age. The original (open-source from circa 2010):

import math
from datetime import datetime

EPOCH = datetime(2005, 12, 8)  # Reddit launch

def hot_score(ups, downs, created_at):
    score = ups - downs
    order = math.log10(max(abs(score), 1))
    sign = 1 if score > 0 else (-1 if score < 0 else 0)
    seconds = (created_at - EPOCH).total_seconds() - 1134028003
    return round(sign * order + seconds / 45000, 7)

Key properties:

  • Logarithmic in score: a post with 10,000 votes is only ~4 'hotter' than one with 1; first votes matter most.
  • Linear in time: 12.5 hours (45,000 sec) of age cost the same as one log10 of votes. Newer posts win even with fewer votes.
  • Sign-aware: a heavily-downvoted post sinks fast.

The hot score is stored on the post row and recomputed on every vote (cheap; 5 multiplications). The front-end ZSET is updated with the new score.

Best (the comment sort - Wilson score)

Why not just ups - downs? Because a comment with 5 ups and 0 downs is more 'reliably good' than one with 20 ups and 15 downs, even though the latter has higher net score. We want a confidence interval on the true upvote ratio.

import math

def wilson_score(ups, downs, confidence=1.96):
    n = ups + downs
    if n == 0:
        return 0
    p = ups / n
    z = confidence  # 95%
    return (p + z*z/(2*n) - z * math.sqrt((p*(1-p) + z*z/(4*n)) / n)) / (1 + z*z/n)

This returns the lower bound of a 95% confidence interval for the true upvote ratio. A 5/0 comment scores ~0.45 (high confidence floor); a 20/15 comment scores ~0.43. Sample size matters more than raw count.

Top (time-windowed)

SELECT * FROM posts WHERE subreddit = ? AND created_at > now() - <window> ORDER BY (ups - downs) DESC LIMIT 25. Cache the result per (subreddit, window) for 60 seconds.

Controversial
Python
def controversial_score(ups, downs):
    if ups + downs == 0:
        return 0
    magnitude = ups + downs
    balance = downs / ups if ups > downs else ups / downs if downs > 0 else 0
    return magnitude ** balance

High when a post has many votes AND is roughly balanced upvotes-to-downvotes.

Subreddit Hot List Cache

Each subreddit has a Redis ZSET subreddit_hot:<name> with entries (post_id, hot_score). On every vote, the Score Updater writes the new hot score with ZADD. The ZSET auto-sorts. Reads are O(log N) for top-K.

Text
subreddit_hot:programming = ZSET [
    ("01HW...", 12345.67),
    ("01HV...", 12340.12),
    ...
] (capped at 1000 entries; trim on insert)

Nested Comments

A viral post can have 10,000+ comments nested up to dozens of levels. Two storage models exist; pick wisely.

Adjacency List (the simple choice)
SQL
CREATE TABLE comments (
    id          BIGINT PRIMARY KEY,
    post_id     BIGINT NOT NULL,
    parent_id   BIGINT,                -- null for top-level
    author_id   BIGINT,
    body        TEXT,
    score       INTEGER DEFAULT 0,
    created_at  TIMESTAMPTZ
);
CREATE INDEX idx_comments_post ON comments (post_id, parent_id, score DESC);

Pros: Simple. Inserts are trivial.

Cons: To render a tree, you query all comments for a post (potentially 10K rows) and reconstruct the tree in memory. Network and memory cost grows with thread size.

Materialized Path

Store the path from root: c1.c5.c12.c47. Range queries can fetch entire subtrees by prefix.

SQL
CREATE TABLE comments (
    id          BIGINT PRIMARY KEY,
    post_id     BIGINT NOT NULL,
    path        VARCHAR(500),          -- e.g., "00001.00005.00012.00047"
    depth       INTEGER,
    score       INTEGER,
    body        TEXT,
    ...
);
CREATE INDEX idx_comments_path ON comments (post_id, path);

Pros: Fetch subtree with WHERE path LIKE 'c1.c5.%'. Limit depth easily.

Cons: Path strings get long. Inserts under high-vote parents need a sequence to avoid path collisions.

Lazy Loading

For either model, don't fetch all 10K comments. Render top-K top-level (sorted by Wilson score), then their top-K children, recursively for ~3 levels. Show 'load 50 more replies' for the rest. The client requests subtrees on demand.

Jsonc
// Initial GET
{
    "comments": [
        { "id": "c1", "depth": 0, "replies": [...10 children...], "more_replies": 240 },
        ...
    ]
}

// On 'load more' click
GET /api/v1/posts/:post/comments/:c1/children?cursor=...

Vote Manipulation Detection

Real Reddit fights vote manipulation constantly (paid upvote services, bot rings). Mitigations:

  • Rate limit per user: 100 votes/min; rapid voting gets shadow-banned (votes accepted to UI, ignored on backend).
  • Vote weighting by account age: brand new accounts have 0.1x vote weight for 24 hours.
  • Anomaly detection: a Kafka consumer flags posts with abnormal vote-velocity from new accounts; these enter a review queue.
  • Subreddit-level vote fuzzing (controversial historical Reddit feature): displayed vote counts are slightly randomized to make manipulation harder to verify.

Data Model

Postgres (sharded by subreddit_id): posts and comments

Why shard by subreddit? Most queries are within one subreddit ('show me hot in /r/programming'). Cross-subreddit queries (the front page) parallelize across shards.

SQL
CREATE TABLE posts (
    id            BIGINT PRIMARY KEY,
    subreddit_id  BIGINT NOT NULL,
    author_id     BIGINT,
    title         VARCHAR(300),
    kind          VARCHAR(8),         -- text | link
    body          TEXT,
    url           VARCHAR(2048),
    score         INTEGER DEFAULT 0,
    hot_score     DOUBLE PRECISION,
    comment_count INTEGER DEFAULT 0,
    nsfw          BOOLEAN,
    created_at    TIMESTAMPTZ,
    deleted_at    TIMESTAMPTZ
);

CREATE TABLE comments (
    id          BIGINT PRIMARY KEY,
    post_id     BIGINT NOT NULL,
    parent_id   BIGINT,
    author_id   BIGINT,
    body        TEXT,
    score       INTEGER DEFAULT 0,
    created_at  TIMESTAMPTZ,
    deleted_at  TIMESTAMPTZ
);

Cassandra: votes (one row per vote, write-heavy, append-only)

Text
CREATE TABLE post_votes (
    post_id   bigint,
    user_id   bigint,
    value     tinyint,                   -- +1, -1
    voted_at  timestamp,
    PRIMARY KEY ((post_id), user_id)
);

CREATE TABLE user_votes (
    user_id   bigint,
    post_id   bigint,
    value     tinyint,
    voted_at  timestamp,
    PRIMARY KEY ((user_id), post_id)
);

The user_votes table is the inverse, sharded by user_id, used to render the user's vote state ('show me which posts I upvoted').

Redis: counters, hot lists, rate limits

  • score:<post_id> -> integer (live score).
  • subreddit_hot:<name> -> ZSET (post_id, hot_score), capped at 1000.
  • user_subscriptions:<user_id> -> SET of subreddit IDs.
  • rate:vote:<user_id> -> INCR with EXPIRE 60.

Object storage: post images / videos

Media goes to S3 with CDN front. Post rows store URLs.

Scaling and Bottlenecks

Viral front-page post: 1M concurrent readers

  • The post body is cached at the CDN (Cache-Control: public, max-age=60), absorbs ~99% of reads.
  • Comment tree: cache the rendered top-100 comments per post in Redis with a 30-second TTL.
  • Vote button stays responsive: votes go straight to Cassandra + Redis counter, no synchronous coupling to the read path.

Front-page assembly cost

Front-page reads do parallel ZRANGE on ~50 subreddit hot lists. Each ZRANGE is ~1 ms; 50 in parallel is bounded by the slowest. p99 ~10 ms for the assembly step. Hydrating 25 post objects is another ~30 ms.

Comment thread with 10K nested replies

Lazy load: initial render shows top 100 (limited depth). Subsequent 'load more' fetches subtrees on demand. A user rarely scrolls all 10K comments; we never need to ship them all.

Voting at 17K/sec peak

  • Cassandra absorbs writes via partition spreading (post_id is the partition key).
  • Redis counters handle the increment.
  • Rate limiting at the gateway prevents bot-driven floods.

Search at scale

Elasticsearch index of posts and comments. Asynchronous indexing via Kafka. Indices are sharded by date for ILM. Older posts (> 1 year) move to a slower tier.

Trade-offs and Alternatives

Community feed vs follow feed (vs Twitter)

Reddit's feed is community-based: subscribe to subreddits, not users. This means:

  • No celebrity problem (a popular subreddit isn't a single 'user' with 10M followers; it has 10M subscribers but the unit of distribution is the post, not the author).
  • Cheaper feed assembly (50 subreddits is bounded; 5,000 follows is not).
  • Different product feel (you discover via communities, not via friends).

Use community feeds when the content is topical, public, and discovery-focused. Use follow feeds when relationships matter (Instagram, Twitter friend feeds).

Materialized path vs adjacency list for comments

Real Reddit uses adjacency list with smart caching. The trees are big but lazy loading hides it. Materialized path is faster for subtree fetches but harder to maintain (path resequencing on edits, depth limits).

For an interview, propose adjacency list + lazy loading; mention materialized path as an option if the interviewer wants subtree queries to be ultra-fast.

Postgres sharded by subreddit vs Cassandra for posts

We chose Postgres because:

  • Posts are queried with JOINs (subreddit metadata, author info) and updates (score, comment_count).
  • Sharding by subreddit_id gives natural locality.
  • Reddit was historically Postgres-heavy; the operational pattern is well-understood.

Cassandra would handle the write rate fine but loses transactional updates and joins. Pick what your team operates well.

Why fuzz vote counts (historically)?

Reddit's old approach to anti-manipulation was randomizing displayed counts within a small range. This made it hard for paid-vote services to verify their work. The community hated it (dishonest counts). Modern Reddit ditched this in favor of behind-the-scenes shadow-banning of suspicious votes.

Hot algorithm vs ML ranking

The original hot algorithm is deterministic and explainable. Modern Reddit also uses ML for personalization on the front page (which subreddits to surface to YOU, given your subscriptions and read history). The hot score is the base signal; ML re-ranks at read time.

Strong vs eventual consistency on votes

Vote counts are eventually consistent: Redis counter and Cassandra row may be momentarily out of sync. The user sees their own vote reflected immediately (the Vote Service updates Redis and returns); other users see the new count within seconds. Acceptable.

Real-World Examples

How real systems implement this in production

Hacker News

Y Combinator's news site uses a similar hot algorithm: score / (age_in_hours + 2)^1.8. Single-machine architecture (Arc / Racket) for years; recently scaled with caching but still tiny operationally. ~10M monthly visitors.

Trade-off: HN proves you can run a top-tier social-content site with surprising simplicity if your moderation philosophy is consistent and your community is small and engaged. Reddit's complexity comes from its community count (500K subreddits), not its underlying social model.

Discourse (forum software)

Open-source forum used by Stack Overflow, Slack support, and many communities. Adjacency-list comments, server-rendered, Postgres-backed. Real-time updates via WebSockets. Optimized for long-form discussion rather than meme-scale virality.

Trade-off: Discourse trades viral-scale optimization (no millions-of-comments per post) for richer threading UX and easier self-hosting. Different sweet spot: medium-size focused communities, not Reddit-scale.

Lemmy

Federated alternative to Reddit using ActivityPub. Each instance hosts its own communities, federates posts and comments to subscribed instances. Same hot algorithm, similar data model.

Trade-off: Federation eliminates centralized scaling pressure but introduces inter-instance reliability and moderation complexity. A single instance can become unresponsive under federation load from a viral post on another instance.

Stack Overflow

Q&A site with voting on questions and answers. Uses a different ranking (acceptance, score, time) optimized for 'find the right answer fast' rather than 'discover content'. Server-rendered, cached aggressively.

Trade-off: Stack Overflow's Q&A model means there's no real-time hot list; pages are stable for hours. Trades Reddit's discovery feel for canonical-answer optimization. Same voting primitives, totally different ranking philosophy.

Quick Interview Phrases

Key terms to use in your answer

hot ranking algorithm
Wilson score for best
community-based feed
lazy comment loading
vote manipulation detection
shadow banning

Common Interview Questions

Questions you might be asked about this topic

Each vote triggers a Redis INCR on score:<post_id>. The Score Updater (Kafka consumer of vote.cast events) recomputes hot_score = log10(|score|) * sign + (created_at - epoch) / 45000 and updates the per-subreddit ZSET via ZADD. Hot ZSETs are capped at 1000 entries via ZADD GT (only update if new score is greater) plus periodic trim. Cost per vote: a few microseconds of math, two Redis writes.

Interview Tips

How to discuss this topic effectively

1

Open with 'Reddit's feed is community-based, not follow-based.' This is the most important architectural distinction from Twitter; surfacing it early signals depth.

2

Walk through the hot algorithm formula. Logarithm of score plus linear time penalty. Most candidates wave hands here; writing the formula puts you ahead.

3

Contrast hot (posts) and Wilson score (comments). Different problems, different math. Confidence intervals matter for low-sample data; that's why best uses Wilson and hot doesn't.

4

Bring up nested comment lazy loading. Saying 'I'd fetch all 10K comments and tree them in memory' fails the scaling question.

5

Mention shadow banning of suspicious votes. It's the magic phrase for vote manipulation that keeps the hot path fast.

Common Mistakes

Pitfalls to avoid in interviews

Sorting by raw vote count for the front page

Raw vote count means a 5-year-old post with 100K upvotes always beats today's posts. Use the hot algorithm: score + time decay, with logarithmic vote weighting so first votes matter most. Newer posts win at equal score.

Using ups - downs for comment sorting

Vote difference rewards loud comments and punishes new ones. A 5/0 comment is more reliable than a 20/15 comment. Use Wilson score: lower confidence bound on the true upvote ratio. Sample size matters.

Loading the entire comment tree on first request

A viral post has 10K nested comments; shipping all that on initial load is megabytes per request. Render top-K top-level comments with bounded depth, lazy-load deeper subtrees on demand. The client controls expansion.

Treating votes as immediate database writes that block the user response

At 17K votes/sec peak, blocking on durable storage adds latency. Write to Cassandra + Redis counter, return immediately. The Score Updater consumes vote events asynchronously to recompute hot scores. Vote button stays sub-100ms.

Doing fan-out-on-write for the front page (per user)

The front page is the union of subscribed subreddits' hot lists. Subreddit hot lists are computed once per subreddit (not per user), then assembled at read time from the user's subscription set. Per-user fan-out would be wasted work since most subscriptions overlap heavily across users.