System Design Article
Design Reddit (Forum / Voting)
Difficulty: Medium
Design a community-driven forum like Reddit with 50M daily active users, 500K subreddits, and the famous hot/top/best ranking algorithms that decide which posts you see. The interview centerpiece is the ranking system: how to score posts in real time as votes pour in, how to make the front page personalized without per-user fan-out, and how to render nested comment trees at sub-200 ms when a popular thread has 10,000 nested replies. We also cover voting fraud detection, the difference between hot and Wilson score, and the tiered cache that makes 50K reads per second on the front page survive a viral post.
Design Reddit (Forum / Voting)
Design a community-driven forum like Reddit with 50M daily active users, 500K subreddits, and the famous hot/top/best ranking algorithms that decide which posts you see. The interview centerpiece is the ranking system: how to score posts in real time as votes pour in, how to make the front page personalized without per-user fan-out, and how to render nested comment trees at sub-200 ms when a popular thread has 10,000 nested replies. We also cover voting fraud detection, the difference between hot and Wilson score, and the tiered cache that makes 50K reads per second on the front page survive a viral post.
909 views
24
Requirements
Functional Requirements
- Post a submission (link or text) to a subreddit.
- Comment on a post or another comment (nested arbitrarily deep).
- Vote (upvote / downvote) on posts and comments.
- Subscribe / unsubscribe to subreddits.
- View feeds: front page (subscribed subreddits), subreddit page, user profile, all (global).
- Sort feeds by hot, new, top (time-windowed), controversial, best (for comments).
- Search posts and comments.
- Moderation: subreddit mods can remove posts, ban users.
Out of Scope (state explicitly)
- Real-time chat (Reddit's chat is a separate product).
- Live threads / streaming events.
- Reddit Coins, awards, premium features.
- Notifications (mentioned briefly but not designed).
Non-Functional Requirements
- Scale: 50M DAU, 5M posts per day, 50M comments per day, 500M votes per day.
- Read-heavy: 50:1 reads vs writes (front page reads dominate).
- Feed latency: p99 < 200 ms for the front page.
- Vote latency: p99 < 100 ms; the vote button must feel instant.
- High availability: 99.95%. Subreddit downtime is annoying but not catastrophic.
- Eventually consistent vote counts (within seconds) are acceptable.
Back-of-the-Envelope Estimation
Traffic
---------- Traffic estimation ----------
DAU: 50M
Posts/day: 5M (~60/sec avg, ~180/sec peak)
Comments/day: 50M (~580/sec avg, ~1,700/sec peak)
Votes/day: 500M (~5,800/sec avg, ~17,000/sec peak)
Front-page reads/day: 500M (~5,800/sec avg, ~17,000/sec peak)
Subreddit page reads/day: 1B (~12,000/sec avg, ~35,000/sec peak)Voting is the highest-rate write. Front-page assembly is the highest-latency-sensitive read.
Storage
---------- Storage ----------
Post row: ~600 bytes (id, author, subreddit, title, body or url, timestamps, scores)
Daily post storage: 5M * 600 = 3 GB/day
Yearly: ~1.1 TB/year
5 years: ~5.5 TB (manageable; sharded)
Comment row: ~400 bytes
Daily comments: 50M * 400 = 20 GB/day
5 years: ~36 TB across sharded clusters
Vote row: ~30 bytes (post_id, user_id, value=+1/-1)
Daily votes: 500M * 30 = 15 GB/day
5 years: ~27 TBCache
Front page is community-based, not personalized. Cache the top N posts per subreddit:
---------- Subreddit hot list cache ----------
Subreddits: 500K
Top 1000 posts per sub: 1000
Per entry size: ~30 bytes (post_id + score)
Total: 500K * 1000 * 30 = ~15 GB across clusterFits in a Redis cluster with room to spare.
High-Level Design
---------- High-level architecture ----------
+----------+
| Client |
+----------+
|
v
+---------------------+
| CDN (Fastly) | <- caches popular post bodies
+---------------------+
|
v
+---------------------+
| API Gateway |
+---------------------+
/ | \
v v v
+---------+ +---------+ +---------+
| Post | | Vote | | Feed |
| Service | | Service | | Service |
+---------+ +---------+ +---------+
| | |
v v v
+---------+ +-------+ +---------+
| Postgres| | Redis | | Redis |
| (posts, | |(vote | | (sub |
| comments| | counts| | hot |
| sharded)| | & rate| | lists) |
+---------+ | lim) | +---------+
| +-------+ ^
v | |
+---------+ v |
| Cassandra| +-------+ |
| (votes, | | Kafka |-------+
| one row | | |
| per vote|+-------+
+---------+ |
v
+----------------+
| Score Updater |
| (hot, top, etc)|
+----------------+API Design
// Submit a post
POST /api/v1/subreddits/:name/posts
{
"title": "TIL about hot ranking",
"kind": "text", // text | link
"body": "Reddit's hot algorithm is...",
"url": null
}
// Response
{ "id": "01HW...", "permalink": "/r/programming/01HW.../til-about-hot-ranking" }
// Vote
POST /api/v1/posts/:id/vote
{ "value": 1 } // +1, -1, or 0 (unvote)
// Response 204
// Get a subreddit feed
GET /api/v1/subreddits/:name?sort=hot&limit=25&cursor=<opaque>
// Response
{
"items": [
{
"id": "01HW...",
"title": "TIL about hot ranking",
"author": "alice",
"subreddit": "programming",
"score": 2453,
"comment_count": 87,
"created_at": "2026-04-26T08:00:00Z",
"hot_rank_score": 12345.67 // computed by the score updater
}
],
"next_cursor": "<opaque>"
}
// Get the front page (personalized: union of subscribed subreddit feeds)
GET /api/v1/front?sort=hot&limit=25
// Get nested comments for a post
GET /api/v1/posts/:id/comments?depth=10&load=top
// Response
{
"comments": [
{
"id": "c1",
"author": "bob",
"body": "Great post",
"score": 42,
"depth": 0,
"replies": [
{"id": "c2", "parent_id": "c1", "depth": 1, ...},
{"id": "c3", "parent_id": "c1", "depth": 1, "more_replies": 50}
]
}
]
}Vote Path (the hot write)
- Client POSTs vote.
- Vote Service: a. Rate limit by user (e.g., 100 votes/min). b. Write the vote into Cassandra (insert; or upsert if user changes their vote). c. Update Redis counter for that post (
INCRBY score:<post_id> 1). d. Publish a Kafka eventvote.castfor downstream score recalculation. - Return 204 immediately.
The Score Updater consumes vote.cast, recomputes the hot score, and updates the per-subreddit Redis ZSET (subreddit_hot:<name>).
Feed Read Path
- For a subreddit page: read top N from
subreddit_hot:<name>(Redis ZSET, O(log N) per insert). - For the front page: union the user's subscribed subreddits' top lists.
- User subscribes to ~50 subreddits on average.
- Read top 100 from each (
subreddit_hot:<name>ZSET) in parallel = 5,000 candidate posts. - Merge by hot score, deduplicate (rare cross-posts), filter (NSFW, blocked), return top 25.
This is fan-out on read with bounded fan-out (50 subreddits, not 5,000 followers). The front page is essentially personalized but cheap because subreddit lists are pre-ranked.
Detailed Design
The two interesting components are the ranking algorithms and nested comments.
Ranking Algorithms
Hot (the front page sort)
Reddit's hot ranking balances vote score with post age. The original (open-source from circa 2010):
import math
from datetime import datetime
EPOCH = datetime(2005, 12, 8) # Reddit launch
def hot_score(ups, downs, created_at):
score = ups - downs
order = math.log10(max(abs(score), 1))
sign = 1 if score > 0 else (-1 if score < 0 else 0)
seconds = (created_at - EPOCH).total_seconds() - 1134028003
return round(sign * order + seconds / 45000, 7)Key properties:
- Logarithmic in score: a post with 10,000 votes is only ~4 'hotter' than one with 1; first votes matter most.
- Linear in time: 12.5 hours (45,000 sec) of age cost the same as one log10 of votes. Newer posts win even with fewer votes.
- Sign-aware: a heavily-downvoted post sinks fast.
The hot score is stored on the post row and recomputed on every vote (cheap; 5 multiplications). The front-end ZSET is updated with the new score.
Best (the comment sort - Wilson score)
Why not just ups - downs? Because a comment with 5 ups and 0 downs is more 'reliably good' than one with 20 ups and 15 downs, even though the latter has higher net score. We want a confidence interval on the true upvote ratio.
import math
def wilson_score(ups, downs, confidence=1.96):
n = ups + downs
if n == 0:
return 0
p = ups / n
z = confidence # 95%
return (p + z*z/(2*n) - z * math.sqrt((p*(1-p) + z*z/(4*n)) / n)) / (1 + z*z/n)This returns the lower bound of a 95% confidence interval for the true upvote ratio. A 5/0 comment scores ~0.45 (high confidence floor); a 20/15 comment scores ~0.43. Sample size matters more than raw count.
Top (time-windowed)
SELECT * FROM posts WHERE subreddit = ? AND created_at > now() - <window> ORDER BY (ups - downs) DESC LIMIT 25. Cache the result per (subreddit, window) for 60 seconds.
Controversial
def controversial_score(ups, downs):
if ups + downs == 0:
return 0
magnitude = ups + downs
balance = downs / ups if ups > downs else ups / downs if downs > 0 else 0
return magnitude ** balanceHigh when a post has many votes AND is roughly balanced upvotes-to-downvotes.
Subreddit Hot List Cache
Each subreddit has a Redis ZSET subreddit_hot:<name> with entries (post_id, hot_score). On every vote, the Score Updater writes the new hot score with ZADD. The ZSET auto-sorts. Reads are O(log N) for top-K.
subreddit_hot:programming = ZSET [
("01HW...", 12345.67),
("01HV...", 12340.12),
...
] (capped at 1000 entries; trim on insert)Nested Comments
A viral post can have 10,000+ comments nested up to dozens of levels. Two storage models exist; pick wisely.
Adjacency List (the simple choice)
CREATE TABLE comments (
id BIGINT PRIMARY KEY,
post_id BIGINT NOT NULL,
parent_id BIGINT, -- null for top-level
author_id BIGINT,
body TEXT,
score INTEGER DEFAULT 0,
created_at TIMESTAMPTZ
);
CREATE INDEX idx_comments_post ON comments (post_id, parent_id, score DESC);Pros: Simple. Inserts are trivial.
Cons: To render a tree, you query all comments for a post (potentially 10K rows) and reconstruct the tree in memory. Network and memory cost grows with thread size.
Materialized Path
Store the path from root: c1.c5.c12.c47. Range queries can fetch entire subtrees by prefix.
CREATE TABLE comments (
id BIGINT PRIMARY KEY,
post_id BIGINT NOT NULL,
path VARCHAR(500), -- e.g., "00001.00005.00012.00047"
depth INTEGER,
score INTEGER,
body TEXT,
...
);
CREATE INDEX idx_comments_path ON comments (post_id, path);Pros: Fetch subtree with WHERE path LIKE 'c1.c5.%'. Limit depth easily.
Cons: Path strings get long. Inserts under high-vote parents need a sequence to avoid path collisions.
Lazy Loading
For either model, don't fetch all 10K comments. Render top-K top-level (sorted by Wilson score), then their top-K children, recursively for ~3 levels. Show 'load 50 more replies' for the rest. The client requests subtrees on demand.
// Initial GET
{
"comments": [
{ "id": "c1", "depth": 0, "replies": [...10 children...], "more_replies": 240 },
...
]
}
// On 'load more' click
GET /api/v1/posts/:post/comments/:c1/children?cursor=...Vote Manipulation Detection
Real Reddit fights vote manipulation constantly (paid upvote services, bot rings). Mitigations:
- Rate limit per user: 100 votes/min; rapid voting gets shadow-banned (votes accepted to UI, ignored on backend).
- Vote weighting by account age: brand new accounts have 0.1x vote weight for 24 hours.
- Anomaly detection: a Kafka consumer flags posts with abnormal vote-velocity from new accounts; these enter a review queue.
- Subreddit-level vote fuzzing (controversial historical Reddit feature): displayed vote counts are slightly randomized to make manipulation harder to verify.
Data Model
Postgres (sharded by subreddit_id): posts and comments
Why shard by subreddit? Most queries are within one subreddit ('show me hot in /r/programming'). Cross-subreddit queries (the front page) parallelize across shards.
CREATE TABLE posts (
id BIGINT PRIMARY KEY,
subreddit_id BIGINT NOT NULL,
author_id BIGINT,
title VARCHAR(300),
kind VARCHAR(8), -- text | link
body TEXT,
url VARCHAR(2048),
score INTEGER DEFAULT 0,
hot_score DOUBLE PRECISION,
comment_count INTEGER DEFAULT 0,
nsfw BOOLEAN,
created_at TIMESTAMPTZ,
deleted_at TIMESTAMPTZ
);
CREATE TABLE comments (
id BIGINT PRIMARY KEY,
post_id BIGINT NOT NULL,
parent_id BIGINT,
author_id BIGINT,
body TEXT,
score INTEGER DEFAULT 0,
created_at TIMESTAMPTZ,
deleted_at TIMESTAMPTZ
);Cassandra: votes (one row per vote, write-heavy, append-only)
CREATE TABLE post_votes (
post_id bigint,
user_id bigint,
value tinyint, -- +1, -1
voted_at timestamp,
PRIMARY KEY ((post_id), user_id)
);
CREATE TABLE user_votes (
user_id bigint,
post_id bigint,
value tinyint,
voted_at timestamp,
PRIMARY KEY ((user_id), post_id)
);The user_votes table is the inverse, sharded by user_id, used to render the user's vote state ('show me which posts I upvoted').
Redis: counters, hot lists, rate limits
score:<post_id>-> integer (live score).subreddit_hot:<name>-> ZSET (post_id, hot_score), capped at 1000.user_subscriptions:<user_id>-> SET of subreddit IDs.rate:vote:<user_id>-> INCR with EXPIRE 60.
Object storage: post images / videos
Media goes to S3 with CDN front. Post rows store URLs.
Scaling and Bottlenecks
Viral front-page post: 1M concurrent readers
- The post body is cached at the CDN (
Cache-Control: public, max-age=60), absorbs ~99% of reads. - Comment tree: cache the rendered top-100 comments per post in Redis with a 30-second TTL.
- Vote button stays responsive: votes go straight to Cassandra + Redis counter, no synchronous coupling to the read path.
Front-page assembly cost
Front-page reads do parallel ZRANGE on ~50 subreddit hot lists. Each ZRANGE is ~1 ms; 50 in parallel is bounded by the slowest. p99 ~10 ms for the assembly step. Hydrating 25 post objects is another ~30 ms.
Comment thread with 10K nested replies
Lazy load: initial render shows top 100 (limited depth). Subsequent 'load more' fetches subtrees on demand. A user rarely scrolls all 10K comments; we never need to ship them all.
Voting at 17K/sec peak
- Cassandra absorbs writes via partition spreading (post_id is the partition key).
- Redis counters handle the increment.
- Rate limiting at the gateway prevents bot-driven floods.
Search at scale
Elasticsearch index of posts and comments. Asynchronous indexing via Kafka. Indices are sharded by date for ILM. Older posts (> 1 year) move to a slower tier.
Trade-offs and Alternatives
Community feed vs follow feed (vs Twitter)
Reddit's feed is community-based: subscribe to subreddits, not users. This means:
- No celebrity problem (a popular subreddit isn't a single 'user' with 10M followers; it has 10M subscribers but the unit of distribution is the post, not the author).
- Cheaper feed assembly (50 subreddits is bounded; 5,000 follows is not).
- Different product feel (you discover via communities, not via friends).
Use community feeds when the content is topical, public, and discovery-focused. Use follow feeds when relationships matter (Instagram, Twitter friend feeds).
Materialized path vs adjacency list for comments
Real Reddit uses adjacency list with smart caching. The trees are big but lazy loading hides it. Materialized path is faster for subtree fetches but harder to maintain (path resequencing on edits, depth limits).
For an interview, propose adjacency list + lazy loading; mention materialized path as an option if the interviewer wants subtree queries to be ultra-fast.
Postgres sharded by subreddit vs Cassandra for posts
We chose Postgres because:
- Posts are queried with JOINs (subreddit metadata, author info) and updates (score, comment_count).
- Sharding by subreddit_id gives natural locality.
- Reddit was historically Postgres-heavy; the operational pattern is well-understood.
Cassandra would handle the write rate fine but loses transactional updates and joins. Pick what your team operates well.
Why fuzz vote counts (historically)?
Reddit's old approach to anti-manipulation was randomizing displayed counts within a small range. This made it hard for paid-vote services to verify their work. The community hated it (dishonest counts). Modern Reddit ditched this in favor of behind-the-scenes shadow-banning of suspicious votes.
Hot algorithm vs ML ranking
The original hot algorithm is deterministic and explainable. Modern Reddit also uses ML for personalization on the front page (which subreddits to surface to YOU, given your subscriptions and read history). The hot score is the base signal; ML re-ranks at read time.
Strong vs eventual consistency on votes
Vote counts are eventually consistent: Redis counter and Cassandra row may be momentarily out of sync. The user sees their own vote reflected immediately (the Vote Service updates Redis and returns); other users see the new count within seconds. Acceptable.
Real-World Examples
How real systems implement this in production
Y Combinator's news site uses a similar hot algorithm: score / (age_in_hours + 2)^1.8. Single-machine architecture (Arc / Racket) for years; recently scaled with caching but still tiny operationally. ~10M monthly visitors.
Trade-off: HN proves you can run a top-tier social-content site with surprising simplicity if your moderation philosophy is consistent and your community is small and engaged. Reddit's complexity comes from its community count (500K subreddits), not its underlying social model.
Open-source forum used by Stack Overflow, Slack support, and many communities. Adjacency-list comments, server-rendered, Postgres-backed. Real-time updates via WebSockets. Optimized for long-form discussion rather than meme-scale virality.
Trade-off: Discourse trades viral-scale optimization (no millions-of-comments per post) for richer threading UX and easier self-hosting. Different sweet spot: medium-size focused communities, not Reddit-scale.
Federated alternative to Reddit using ActivityPub. Each instance hosts its own communities, federates posts and comments to subscribed instances. Same hot algorithm, similar data model.
Trade-off: Federation eliminates centralized scaling pressure but introduces inter-instance reliability and moderation complexity. A single instance can become unresponsive under federation load from a viral post on another instance.
Q&A site with voting on questions and answers. Uses a different ranking (acceptance, score, time) optimized for 'find the right answer fast' rather than 'discover content'. Server-rendered, cached aggressively.
Trade-off: Stack Overflow's Q&A model means there's no real-time hot list; pages are stable for hours. Trades Reddit's discovery feel for canonical-answer optimization. Same voting primitives, totally different ranking philosophy.
Quick Interview Phrases
Key terms to use in your answer
Common Interview Questions
Questions you might be asked about this topic
Each vote triggers a Redis INCR on score:<post_id>. The Score Updater (Kafka consumer of vote.cast events) recomputes hot_score = log10(|score|) * sign + (created_at - epoch) / 45000 and updates the per-subreddit ZSET via ZADD. Hot ZSETs are capped at 1000 entries via ZADD GT (only update if new score is greater) plus periodic trim. Cost per vote: a few microseconds of math, two Redis writes.
CDN absorbs ~99% via Cache-Control on the post body. The remaining 10K/sec hits the Read Service. Post metadata is in Postgres page cache (one row, hot). Comment top-100 is in Redis with a 30-sec TTL; one cache fill per 30 seconds against Postgres. Vote count is in Redis. The page renders in p99 ~50 ms. The vote button posts independently.
Rate limit per account (100 votes/min). Anomaly detection in a Kafka consumer flags abnormal velocity from new accounts or geographically clustered accounts. Suspicious votes are shadow-banned: accepted to the user's UI, ignored in score computation. Periodic reconciliation removes confirmed-fraudulent votes from totals. Subreddit mods can also remove posts manually.
Compute controversial_score = (ups + downs) ^ (min(ups,downs) / max(ups,downs)) on every vote update. Store on the comment row. Maintain a separate Redis ZSET per post for controversial sort: comment_controversial:<post_id>. Read top-K when sort=controversial. The other sorts (best/Wilson, top/score, new/created_at) each have their own ZSET or just use the indexed column on read.
Stickies bypass hot ranking; they pin to the top by a sticky=true flag. No fan-out needed because subreddit lists are read-time assembly. The 10M subscribers will see the sticky next time they load the subreddit; no precomputation. The post itself is one row in Postgres + entry in subreddit_hot ZSET. The CDN caches the rendered subreddit page for short TTLs (~30 sec) so a sticky propagates within seconds globally.
Interview Tips
How to discuss this topic effectively
Open with 'Reddit's feed is community-based, not follow-based.' This is the most important architectural distinction from Twitter; surfacing it early signals depth.
Walk through the hot algorithm formula. Logarithm of score plus linear time penalty. Most candidates wave hands here; writing the formula puts you ahead.
Contrast hot (posts) and Wilson score (comments). Different problems, different math. Confidence intervals matter for low-sample data; that's why best uses Wilson and hot doesn't.
Bring up nested comment lazy loading. Saying 'I'd fetch all 10K comments and tree them in memory' fails the scaling question.
Mention shadow banning of suspicious votes. It's the magic phrase for vote manipulation that keeps the hot path fast.
Common Mistakes
Pitfalls to avoid in interviews
Sorting by raw vote count for the front page
Raw vote count means a 5-year-old post with 100K upvotes always beats today's posts. Use the hot algorithm: score + time decay, with logarithmic vote weighting so first votes matter most. Newer posts win at equal score.
Using ups - downs for comment sorting
Vote difference rewards loud comments and punishes new ones. A 5/0 comment is more reliable than a 20/15 comment. Use Wilson score: lower confidence bound on the true upvote ratio. Sample size matters.
Loading the entire comment tree on first request
A viral post has 10K nested comments; shipping all that on initial load is megabytes per request. Render top-K top-level comments with bounded depth, lazy-load deeper subtrees on demand. The client controls expansion.
Treating votes as immediate database writes that block the user response
At 17K votes/sec peak, blocking on durable storage adds latency. Write to Cassandra + Redis counter, return immediately. The Score Updater consumes vote events asynchronously to recompute hot scores. Vote button stays sub-100ms.
Doing fan-out-on-write for the front page (per user)
The front page is the union of subscribed subreddits' hot lists. Subreddit hot lists are computed once per subreddit (not per user), then assembled at read time from the user's subscription set. Per-user fan-out would be wasted work since most subscriptions overlap heavily across users.
