System Design Article

Design a Ticketing System (Ticketmaster)

Difficulty: Medium

Design a Ticketmaster-style ticketing platform that sells reserved seats for concerts and sports events, with the central challenge being a flash onsale where 1M users compete for 50K seats in five minutes. The interview centerpiece is the seat reservation lock: each unique seat (Section A, Row 12, Seat 7) cannot be split or sub-bucketed like fungible inventory, so contention is unavoidable. We cover seat-level pessimistic holds with TTL, the virtual waiting room that randomizes queue position to absorb flash demand fairly, anti-bot defenses, dynamic pricing tiers, and the read-replica explosion that interactive seat maps cause.

Requirements

Functional Requirements

  1. Browse events: customers search by artist, venue, date, city.
  2. Interactive seat map: customer sees the venue layout with each seat colored by availability and price tier.
  3. Reserve seats: customer picks 1-8 seats; the system holds them for 5-10 minutes while the customer enters payment.
  4. Checkout: process payment, issue tickets (digital + barcode), email confirmation.
  5. Pricing tiers: seats are grouped into tiers (VIP, Floor, 100s, 200s, etc.) with different prices.
  6. Onsale time: a popular event opens at a specific moment (e.g., Friday 10:00 AM ET) with 1M users waiting.
  7. Resale marketplace: ticketholders can list at a price, buyers purchase, original ticket becomes invalid.

Out of Scope (state explicitly)

  • Venue management (where seats are defined; managed offline by venue ops).
  • Tax and fee computation (delegated to a fees service).
  • Physical ticket printing (most events are digital tickets now).
  • Scalper detection at the model level (heuristics only here).

Non-Functional Requirements

  1. Onsale concurrency: 1M concurrent users for a hot event; ~200K admitted to seat selection over 5 minutes.
  2. Seat-reservation latency: p99 < 500 ms per attempt.
  3. No double-booking: strong consistency on every seat. Selling the same seat twice is unforgivable.
  4. Read latency: p99 < 200 ms for seat-map updates.
  5. Availability: 99.95% globally; the onsale is a single point of business risk that we engineer specifically for.
  6. Bot resistance: at least basic captcha + per-identity rate limit; advanced anti-bot is a separate workstream.

Back-of-the-Envelope Estimation

Onsale Traffic

Text
---------- Onsale moment ----------
Concurrent users at T=0:               1,000,000
Seats available:                       50,000
Users admitted to seat selection:      ~200,000 over 5 min (4x oversubscribe)
Seat reservation attempts/sec peak:    ~10,000
Seats sold in first 5 min:             ~50,000
Seats sold per minute (peak):          ~10,000

Steady-State Traffic

Text
---------- Steady-state daily ----------
DAU:                                   2M
Events browsed per DAU:                ~10
Seat maps loaded per DAU:              ~3
Tickets sold per day:                  ~500K
QPS browse:                            ~250
QPS seat-map view:                     ~60
QPS reservation attempt:               ~30

The gap between peak and average is enormous: ~10K reservation attempts/sec at onsale vs ~30 attempts/sec on average. Capacity planning targets the onsale moment.

Seat Map Read Multiplier

Text
---------- Read multiplier ----------
Users viewing the seat map at onsale:  ~200,000
Seat-map state pushes per user/sec:    ~1 (every available seat update)
Total pushes:                          ~200,000/sec
Message size per delta:                ~200 B
Bandwidth at edge:                     ~40 MB/sec

We must use server-pushed deltas (WebSocket / SSE), not polling. 200K clients polling at 1 Hz against the seat-map service would be 200K QPS of identical reads.
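
The envelope numbers above can be re-derived with a few lines of arithmetic. The sketch below is only a sanity check: the helper names (`avg_qps`, `push_bandwidth_mb_per_s`) are illustrative, not part of the design, and it assumes steady-state traffic is spread evenly over the day.

```python
SECONDS_PER_DAY = 86_400


def avg_qps(dau: int, actions_per_user_per_day: float) -> float:
    """Average requests/sec if daily traffic were spread evenly."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY


def push_bandwidth_mb_per_s(viewers: int, pushes_per_viewer_per_s: float,
                            bytes_per_push: int) -> float:
    """Edge egress in MB/s, taking 1 MB = 1,000,000 bytes."""
    return viewers * pushes_per_viewer_per_s * bytes_per_push / 1_000_000


browse_qps = avg_qps(2_000_000, 10)    # about 231; the table lists ~250
seatmap_qps = avg_qps(2_000_000, 3)    # about 69; the table lists ~60
seats_per_min = 50_000 / 5             # 10,000 seats sold per minute at peak
edge_mb_per_s = push_bandwidth_mb_per_s(200_000, 1, 200)  # 40.0 MB/s
```

Real traffic is bursty, so the averages are a floor; the onsale figures are what drive capacity.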

High-Level Design

Text
---------- Architecture overview ----------
[ Client ] -> [ CDN / Edge ] -> [ Waiting Room ] -> [ API Gateway ]
                                                            |
        +----------------+----------------+----------------+
        v                v                v                v
 +--------------+ +--------------+ +--------------+ +--------------+
 | Catalog Svc  | | Seat Map Svc | | Reservation  | | Order /      |
 |              | |              | | Service      | | Payment Svc  |
 +--------------+ +--------------+ +--------------+ +--------------+
        |                |                |                |
        v                v                v                v
 +--------------+ +--------------+ +--------------+ +--------------+
 | Postgres     | | Redis pub/sub| | Postgres +   | | Postgres +   |
 | (events)     | | + WebSocket  | | Redis (seat  | | payment      |
 |              | |              | | state)       | | provider     |
 +--------------+ +--------------+ +--------------+ +--------------+

Key APIs

Jsonc
GET  /api/v1/events/:id                   // event detail
GET  /api/v1/events/:id/seatmap           // seat map structure (cached)
WS   /ws/seatmap/:event_id                // live seat-state stream
POST /api/v1/reservations                 // try to reserve [seat_ids]
  body: { event_id, seat_ids: [...], idempotency_key }
POST /api/v1/reservations/:id/confirm     // confirm + pay
POST /api/v1/waiting-room/enter           // join the waiting room
GET  /api/v1/waiting-room/status          // poll for admission token

Onsale Lifecycle

  1. Pre-onsale: customers visit event page; queue placement opens 30 min before onsale time.
  2. T=0: waiting room releases users in randomized batches at the rate the reservation backend can serve.
  3. Admitted user lands on the seat-map page; sees current availability via WebSocket; picks seats; clicks 'Hold'.
  4. Reservation Service runs the per-seat lock; either grants the hold (with 5-min TTL) or returns conflict.
  5. Customer enters payment; on success, hold becomes a permanent sale and the seat is removed from availability.
  6. On failure or timeout, hold expires; seat reappears in the live seat map for other admitted users.

Detailed Design

The two interesting components are the per-seat reservation lock and the virtual waiting room.

Per-Seat Reservation Lock

Unlike e-commerce inventory where 100 units of one SKU are fungible (and can be sub-bucketed across shards), seat A12-7 is unique. We cannot split it; contention is unavoidable when 50 customers all want that one seat.

The lock is per-seat with explicit ownership and TTL:

Text
---------- Seat state machine ----------
states: AVAILABLE -> HELD(by user X, expires_at) -> SOLD
        HELD timeout -> AVAILABLE
        HELD then payment success -> SOLD
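
One way to keep these transitions honest in application code is an explicit transition table. This is a minimal sketch; the event names (`hold`, `timeout`, `payment_success`) and the `transition` helper are assumptions made for illustration.

```python
from enum import Enum


class SeatState(Enum):
    AVAILABLE = "AVAILABLE"
    HELD = "HELD"
    SOLD = "SOLD"


# Only the transitions listed in the state machine are legal.
ALLOWED = {
    (SeatState.AVAILABLE, "hold"): SeatState.HELD,
    (SeatState.HELD, "timeout"): SeatState.AVAILABLE,
    (SeatState.HELD, "payment_success"): SeatState.SOLD,
}


def transition(state: SeatState, event: str) -> SeatState:
    """Return the next state, or raise if the move is not allowed."""
    try:
        return ALLOWED[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} on {event!r}") from None
```

SOLD has no outgoing edges here, so a late timeout can never reopen a sold seat.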

Implementation: Redis hash per event keyed by seat id, plus a sorted set of holds for sweeping.

Text
---------- Redis structures per event ----------
seats:<event_id>                hash {seat_id: state}            (AVAILABLE/HELD/SOLD)
seats:<event_id>:holds          ZSET (member=seat_id, score=expires_ms)
seats:<event_id>:holders        hash {seat_id: user_id}

Atomic Multi-Seat Reservation (Lua)

To reserve 4 contiguous seats atomically (so the customer never ends up with 3 of 4), we use a single Lua script that locks all of them or none:

Multi-seat reservation Lua script (KEYS = seats:<event_id>, seats:<event_id>:holds, seats:<event_id>:holders; ARGV = user_id, expires_ms, seat_id_1, seat_id_2, ...):

Lua
-- check all seats exist AND are AVAILABLE; reject if any are missing or held/sold
for i = 3, #ARGV do
    local state = redis.call('HGET', KEYS[1], ARGV[i])
    if not state or state ~= 'AVAILABLE' then
        return {0, ARGV[i]}  -- conflict; return which seat blocked us
    end
end
-- mark them HELD atomically
for i = 3, #ARGV do
    redis.call('HSET', KEYS[1], ARGV[i], 'HELD')
    redis.call('ZADD', KEYS[2], tonumber(ARGV[2]), ARGV[i])
    redis.call('HSET', KEYS[3], ARGV[i], ARGV[1])
end
return {1}

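The whole script runs single-threaded on the Redis shard owning the event, so all reservation traffic for one event serializes through one CPU core. As a rough budget, a single shard sustains ~100K simple ops/sec; counting each seat in a call as one op, a 4-seat script gives ~25K reservation attempts/sec, comfortably above our 10K peak. A successful call actually issues four Redis commands per seat, so benchmark the script before relying on that headroom.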
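
The check-then-mark logic is easier to unit-test outside Redis. The sketch below is an in-memory Python stand-in for the same all-or-nothing hold, not the production path: `EventSeatStore` is a hypothetical class, and a `threading.Lock` plays the role that Redis's single-threaded script execution plays in the real system.

```python
import threading


class EventSeatStore:
    """In-memory stand-in for the three per-event Redis structures."""

    def __init__(self, seat_ids):
        self.state = {seat_id: "AVAILABLE" for seat_id in seat_ids}  # seats:<event_id>
        self.holds = {}     # seats:<event_id>:holds   (seat_id -> expires_ms)
        self.holders = {}   # seats:<event_id>:holders (seat_id -> user_id)
        self._lock = threading.Lock()

    def try_hold(self, user_id, expires_ms, seat_ids):
        """Hold every seat or none. Returns (True, None) or (False, blocking_seat)."""
        with self._lock:
            # Phase 1: verify all seats exist and are AVAILABLE before writing anything.
            for seat_id in seat_ids:
                if self.state.get(seat_id) != "AVAILABLE":
                    return (False, seat_id)
            # Phase 2: mark them HELD; nothing else can interleave under the lock.
            for seat_id in seat_ids:
                self.state[seat_id] = "HELD"
                self.holds[seat_id] = expires_ms
                self.holders[seat_id] = user_id
            return (True, None)
```

A conflicting request leaves every seat untouched, which is the property the customer-facing "all 4 or nothing" promise depends on.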

Sharding Strategy

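All seats for one event live on one Redis shard (event_id is the shard key; on Redis Cluster, a hash tag such as seats:{<event_id>} keeps the event's three keys in the same slot). This is the right call because: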

  • Multi-seat reservations need atomicity, which requires same-shard residency.
  • Different events do not share state, so we naturally horizontally scale by event.

Background Hold Sweeper

A worker scans the holds sorted set every second and releases any seat whose expires_at < now. The release is a small Lua that flips state back to AVAILABLE and emits a pub/sub event so all viewers see the seat reappear.
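
A minimal sketch of one sweep pass, using plain dicts in place of the Redis structures. The function name and the delta shape are assumptions; in Redis the expired members would come from a range query on the holds sorted set (for example `ZRANGEBYSCORE`) rather than a scan over a dict.

```python
def sweep_expired(state, holds, holders, now_ms):
    """Release every seat whose hold has expired; return the deltas to publish."""
    expired = sorted(seat_id for seat_id, expires_ms in holds.items()
                     if expires_ms < now_ms)
    deltas = []
    for seat_id in expired:
        del holds[seat_id]
        holders.pop(seat_id, None)
        # Guard: only flip seats that are still HELD. A seat that was SOLD
        # in the meantime keeps its state; we just drop the stale hold entry.
        if state.get(seat_id) == "HELD":
            state[seat_id] = "AVAILABLE"
            deltas.append({"seat_id": seat_id, "state": "AVAILABLE", "ts": now_ms})
    return deltas
```

The HELD guard matters: without it, a sweep racing with a successful checkout could put a sold seat back on the map.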

Virtual Waiting Room

During onsale, 1M users hit the front door. The Reservation Service can absorb ~10K reservation attempts/sec; admitting all 1M at once would crash everything.

The waiting room is a fair queue. Behaviorally:

Text
---------- Waiting-room flow ----------
1. User clicks 'Buy Tickets' before T=0; assigned a queue token + position.
2. At T=0, server randomizes the queue (so first-clickers do not always win) and starts admitting in batches.
3. Admission token is a signed JWT with a 5-min validity; user uses it to access seat-selection.
4. Server admits at ~40K/min (the ~200K over 5 min from the estimate) so reservation backend never sees more than its capacity.
5. User sees 'You are #45,231 in line; estimated wait about 1 minute'.

Implementation: a Redis sorted set per event, where the score is the randomized priority assigned at queue entry.

Text
---------- Waiting room storage ----------
waitingroom:<event_id>             ZSET (member=session_id, score=randomized_priority)
waitingroom:<event_id>:admitted    SET of session_ids that have been admitted
waitingroom:<event_id>:admit_rate  current admission rate (admin-tunable)

A worker pops the lowest-score sessions at the configured admit rate and pushes admission JWTs to those sessions via WebSocket; clients without a live socket pick the token up from GET /api/v1/waiting-room/status.
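
The queue behavior can be sketched in memory with a heap standing in for the sorted set. This is an illustration under assumptions: the `WaitingRoom` class and its method names are hypothetical, and in Redis the batch pop would map to something like `ZPOPMIN` on the waiting-room key.

```python
import heapq
import random


class WaitingRoom:
    """Randomized-priority queue: lowest score is admitted first."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._heap = []          # (randomized_priority, session_id)
        self._queued = set()
        self.admitted = set()

    def enter(self, session_id):
        """Join the queue once; repeat calls do not re-roll the priority."""
        if session_id in self._queued or session_id in self.admitted:
            return
        self._queued.add(session_id)
        heapq.heappush(self._heap, (self._rng.random(), session_id))

    def admit_batch(self, n):
        """Pop up to n sessions with the lowest scores and mark them admitted."""
        batch = []
        while self._heap and len(batch) < n:
            _, session_id = heapq.heappop(self._heap)
            self._queued.discard(session_id)
            self.admitted.add(session_id)
            batch.append(session_id)
        return batch
```

Making `enter` idempotent is deliberate: if refreshing the page re-rolled the priority, refresh scripts would regain the advantage that randomization is meant to remove.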

Why Randomize the Queue?

Naive FIFO (first-come-first-served) rewards refresh tricks and scripts: power users with automation hit T=0 microseconds before everyone else and dominate. Randomization within a window levels the field for everyone who showed up before T=0. Ticketmaster's queue admission works this way (see Real-World Examples).

Admission Token

Jsonc
// JWT payload
{
    "sub": "u_42",
    "event_id": "e_123",
    "iat": 1714142400,
    "exp": 1714142700,
    "jti": "ticket_abc"
}

Reservation Service validates the JWT on every reservation attempt; expired or missing -> 401 with 'go back to the waiting room'.
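
A minimal sketch of issuing and validating such a token with only the standard library, assuming an HMAC-SHA256 (HS256) signature and a shared secret between the waiting room and the Reservation Service. The function names are hypothetical, and a production system would normally use a vetted JWT library instead.

```python
import base64
import hashlib
import hmac
import json


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    return _b64(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())


def issue_token(secret, sub, event_id, now, ttl_s=300, jti="ticket_abc"):
    """Build a compact HS256 token with the payload fields shown above."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": sub, "event_id": event_id, "iat": now,
              "exp": now + ttl_s, "jti": jti}
    payload = _b64(json.dumps(claims).encode())
    return f"{header}.{payload}.{_sign(secret, header + '.' + payload)}"


def verify_token(secret, token, event_id, now):
    """Return the claims if the token is valid, else None (caller answers 401)."""
    if not isinstance(token, str):
        return None
    try:
        header, payload, signature = token.split(".")
        if not hmac.compare_digest(signature, _sign(secret, header + "." + payload)):
            return None
        claims = json.loads(_unb64(payload))
    except (ValueError, TypeError):
        return None
    # Reject tokens for another event and tokens at or past their expiry.
    if claims.get("event_id") != event_id or now >= claims.get("exp", 0):
        return None
    return claims
```

Binding the token to `event_id` keeps an admission to one onsale from being replayed against another.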

Seat Map Live Updates (WebSocket)

Clients connecting to the seat map subscribe to /ws/seatmap/<event_id>, the WebSocket endpoint listed under Key APIs. The server maintains a per-event Redis Pub/Sub channel; whenever a seat changes state (HELD, SOLD, or back to AVAILABLE), the seat map service publishes a delta. Connected WebSocket workers fan out to subscribed clients.

Jsonc
// Delta message
{
    "seat_id": "A12-7",
    "state": "HELD",
    "event_id": "e_123",
    "ts": 1714142400123
}

Client maintains a local view of the seat map and applies deltas as they arrive. On disconnect, reconnect and request a full snapshot to recover.
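
The client-side bookkeeping can be sketched as follows. Using the `ts` field to discard stale or duplicate deltas per seat is an assumption added here for illustration; the function and variable names are hypothetical.

```python
def apply_delta(view, last_ts, delta):
    """Apply one delta to the local seat view; return True if it changed anything.

    view:    seat_id -> state, the client's local copy of the seat map
    last_ts: seat_id -> ts of the newest delta applied for that seat
    """
    seat_id = delta["seat_id"]
    if delta["ts"] <= last_ts.get(seat_id, -1):
        return False  # older than (or same as) what we already applied
    view[seat_id] = delta["state"]
    last_ts[seat_id] = delta["ts"]
    return True


def load_snapshot(view, last_ts, snapshot, snapshot_ts):
    """Replace the local view after a reconnect with a full snapshot."""
    view.clear()
    view.update(snapshot)
    last_ts.clear()
    last_ts.update({seat_id: snapshot_ts for seat_id in snapshot})
```

After a reconnect, loading the snapshot first and then applying buffered deltas through `apply_delta` drops any delta the snapshot already reflects.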

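For a 50K-seat venue with 200K viewers at onsale, every delta fans out to 200K connections. At ~10 deltas/sec (one per ~100 ms reservation cycle), the WebSocket fleet pushes ~2M messages/sec. At 200 B per message, that is 400 MB/sec, which we spread across ~10 WebSocket worker nodes (~40 MB/sec per node). That is 10x the earlier envelope, which assumed ~1 push per viewer per second. The real delta rate at peak is also higher than 10/sec (10K seats sold per minute is already ~170 holds/sec), so workers should coalesce the deltas from each short window into one message per viewer to keep the message rate near that envelope.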

Anti-Bot Defenses

  • Captcha at queue entry; Ticketmaster also injects challenges deeper in the flow.
  • Per-identity rate limits on reservation attempts (4 attempts per minute per user).
  • Device fingerprinting to flag accounts attempting from many devices.
  • Velocity checks: if the same payment instrument tries to buy for many events at scalper-like volume, flag for review.

None of these is perfect; determined bot operators eventually adapt. The goal is to raise the cost of an attack, not to stop bots completely.
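
The per-identity limit from the list above (4 attempts per minute) can be sketched as a sliding-window counter. This is an in-memory illustration with a hypothetical class name; a real deployment would keep the counters in a shared store so every API node sees the same window.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` attempts per identity in any `window_s` seconds."""

    def __init__(self, limit=4, window_s=60.0):
        self.limit = limit
        self.window_s = window_s
        self._hits = defaultdict(deque)   # identity -> timestamps of allowed attempts

    def allow(self, identity, now):
        hits = self._hits[identity]
        # Drop attempts that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Rejected attempts are not recorded, so a blocked client regains capacity as soon as its oldest allowed attempt ages out.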

Pricing Tiers

Seats belong to a tier (VIP, Floor, 100s, etc.) defined per event. Base tier pricing is set offline. Dynamic pricing (where prices rise as seats sell out) is handled by a pricing service that recomputes tier prices every minute; each reservation captures the price at hold time.

If the price changes between hold and confirmation, the customer pays the price quoted at hold time (the price they saw). If we re-priced at confirmation, customers would feel deceived.
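
A small sketch of that rule, with hypothetical names (`Hold`, `place_hold`, `checkout_amount_cents`): the quoted price is copied into the hold record, and checkout reads only that record, never the live tier table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hold:
    seat_id: str
    user_id: str
    quoted_price_cents: int   # price the customer saw when clicking 'Hold'


def place_hold(seat_id, user_id, tier, tier_prices_cents):
    """Snapshot the current tier price into the hold."""
    return Hold(seat_id, user_id, tier_prices_cents[tier])


def checkout_amount_cents(hold):
    """Charge the quoted price, regardless of later repricing."""
    return hold.quoted_price_cents


prices = {"VIP": 25_000, "100s": 9_000}
hold = place_hold("A12-7", "u_42", "VIP", prices)
prices["VIP"] = 30_000                      # pricing service reprices mid-hold
amount = checkout_amount_cents(hold)        # still 25_000
```

Making the hold immutable (`frozen=True`) is one way to ensure nothing rewrites the quote between hold and confirmation.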

Data Model

Postgres: events

SQL
CREATE TABLE events (
    event_id        BIGINT PRIMARY KEY,
    artist_id       BIGINT,
    venue_id        BIGINT,
    title           VARCHAR(255),
    starts_at       TIMESTAMPTZ,
    onsale_at       TIMESTAMPTZ,
    status          VARCHAR(32),               -- ANNOUNCED, ONSALE, SOLDOUT, COMPLETED
    seat_count      INT,
    created_at      TIMESTAMPTZ
);

CREATE INDEX idx_events_starts ON events (starts_at);
CREATE INDEX idx_events_onsale ON events (onsale_at);

Postgres: seats (canonical, per event)

SQL
CREATE TABLE seats (
    seat_id         BIGINT,
    event_id        BIGINT,
    section         VARCHAR(32),               -- 'A'
    row             VARCHAR(8),                -- '12'
    seat_number     INT,                       -- 7
    tier            VARCHAR(16),               -- 'VIP', '100s'
    price_cents     INT,
    status          VARCHAR(16) NOT NULL,      -- AVAILABLE, HELD, SOLD
    holder_user_id  BIGINT,
    holds_until     TIMESTAMPTZ,
    PRIMARY KEY (event_id, seat_id)
);

CREATE INDEX idx_seats_event_status ON seats (event_id, status);

Postgres is the durable system of record; Redis is the hot path. Outside an onsale, CDC from Postgres keeps Redis warm. During the onsale the direction reverses: Redis is authoritative for hold state and a background job persists changes to Postgres.

Postgres: tickets (issued, post-confirmation)

SQL
CREATE TABLE tickets (
    ticket_id       BIGINT PRIMARY KEY,
    event_id        BIGINT NOT NULL,
    seat_id         BIGINT NOT NULL,
    owner_user_id   BIGINT NOT NULL,
    barcode         VARCHAR(64) UNIQUE NOT NULL,
    issued_at       TIMESTAMPTZ NOT NULL,
    transferred_to  BIGINT,
    is_active       BOOLEAN DEFAULT TRUE
);

When a ticket is resold, we issue a new barcode and invalidate the old one (is_active = false), preventing duplicate-entry.
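
The reissue step can be sketched as below. The `Ticket` dataclass mirrors a subset of the table's columns; `resell` and the use of `secrets.token_hex` for barcode generation are assumptions for illustration, not the platform's actual barcode scheme.

```python
import secrets
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Ticket:
    ticket_id: int
    event_id: int
    seat_id: int
    owner_user_id: int
    barcode: str
    is_active: bool = True
    transferred_to: Optional[int] = None


def resell(ticket, buyer_user_id, new_ticket_id):
    """Return (old_ticket_deactivated, new_ticket) for a completed resale."""
    if not ticket.is_active:
        raise ValueError("cannot resell an inactive ticket")
    old = replace(ticket, is_active=False, transferred_to=buyer_user_id)
    new = Ticket(
        ticket_id=new_ticket_id,
        event_id=ticket.event_id,
        seat_id=ticket.seat_id,
        owner_user_id=buyer_user_id,
        barcode=secrets.token_hex(16),   # fresh 32-character barcode
    )
    return old, new
```

In the database, both row changes would need to commit in one transaction so the seat never has two active barcodes.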

Scaling and Bottlenecks

One Hot Event

The onsale is the single hardest moment. All techniques converge:

  • Pre-warm: 30 min before onsale, scale up Redis shard owning the event to its dedicated capacity; pre-warm CDN with the event page; pre-load seat map data.
  • Waiting room throttle: admit at the rate Reservation can serve. If Reservation slows, slow admission proportionally so users in the seat picker do not see failures.
  • Per-event Redis shard isolation: one event's onsale should not impact other events. Shard by event_id.
  • Read amplification: live seat updates over WebSocket scale linearly with viewer count; we shard WebSocket nodes by event_id and use Redis Pub/Sub to fan out within an event.
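
The waiting-room throttle above says to slow admission proportionally when Reservation slows. One possible feedback rule, sketched under assumptions: use observed p99 reservation latency against the 500 ms target from the non-functional requirements as the signal. The function name, the floor parameter, and the choice of latency as the signal are all illustrative.

```python
def next_admit_rate(base_rate_per_min, target_p99_ms, observed_p99_ms,
                    floor_per_min=0.0):
    """Scale the admission rate down in proportion to latency overshoot."""
    if observed_p99_ms <= target_p99_ms:
        return float(base_rate_per_min)      # healthy: admit at the full rate
    scaled = base_rate_per_min * target_p99_ms / observed_p99_ms
    return max(float(floor_per_min), scaled)  # never drop below the floor
```

For example, if p99 doubles from 500 ms to 1,000 ms, the rule halves the admission rate; a non-zero floor keeps the line visibly moving even during a slowdown.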

Many Concurrent Events

The platform may have 100 events on sale at once globally, none individually huge. Spread them across Redis shards by event_id. Aggregate browse traffic uses standard read-replica scaling.

Ticket Resale

A ticketholder lists a ticket at, say, $200; a buyer purchases it, and we issue a new ticket. This is essentially a mini-checkout: payment to platform, payout to seller, new barcode issued, old barcode invalidated. The Order Service runs it as a saga (payment -> issue new barcode -> invalidate old -> payout seller).

Database Hot Spots

The seats table for one event becomes hot during onsale. Postgres sees mostly batched writes (Redis flushes) which are fine; the read traffic is served from Redis. If a single event truly saturates its Redis shard, we shard within the event by section (one section per shard) at the cost of cross-shard atomicity for multi-section bookings (rare).

Audit Trail

Every state transition is published to Kafka for audit (regulatory requirements in some jurisdictions; dispute resolution; bot detection signals). Topic partitioned by event_id.

Failure Recovery

If the Redis shard owning an event crashes mid-onsale, we fail over to a replica (within seconds). Holds in flight that landed only on the primary may be lost; sessions whose holds are lost see 'we lost your hold; please re-select' rather than charging them for missing seats. This is preferable to selling the same seat twice.

Trade-offs and Alternatives

Pessimistic vs Optimistic Locking

We use pessimistic locking via Lua script because contention is severe and we need the strong serialization guarantee per shard. Optimistic (version + retry) would generate massive retry storms during onsale; pessimistic serializes cleanly.

Postgres vs Redis as the Hot Path

During onsale, Postgres row-level locks would serialize every attempt on a contended seat row, and each attempt pays a full transaction round trip, so throughput tops out at a few hundred per second at best. Redis Lua hits 10K+ per second on the same shard with much lower latency. Postgres remains the durable record; Redis is the live arbiter that absorbs the spike.

Polling vs WebSocket for Seat Map

Polling at 1 Hz from 200K viewers is 200K QPS of identical reads. WebSocket plus Pub/Sub gives the same UX at the cost of stateful connections. Stateful connections have their own scaling story (session pinning, reconnect storms), but the bandwidth and origin load reduction is enormous. WebSocket is the right choice at this scale.

Why Randomize the Waiting Room?

FCFS rewards bots and people who can run scripts. Randomizing within the queue (assigning a random priority at entry) gives every legitimate user the same expected position. It does not stop bots from joining the queue, but it removes the technical advantage from joining 50 ms early.

Hold TTL: 5 Minutes vs 10 Minutes

5 min increases turnover (more attempts per minute) but pressures customers to check out fast; 10 min gives breathing room but slows the line. Most platforms run 5-7 min. Show the timer prominently in the UI.

Why Not Just 2PC Across Postgres?

Distributed Postgres locks across regions for a multi-seat hold would block multiple databases for the duration of the hold. We use Redis-as-source-of-truth-during-onsale instead, with an async write-back to Postgres. This trades durability for throughput during the onsale window; if Redis loses 100 ms of holds, customers re-select.

Static vs Dynamic Pricing

Static prices are simpler and customer-friendly: no surprises. Dynamic pricing (e.g., Ticketmaster's Platinum) captures more revenue but creates customer trust issues. Either way, capture the price at hold time so the customer pays what they saw.

Bots: Detect vs Prevent

Prevention (CAPTCHA, fingerprinting) raises cost but never eliminates bots. Detection (post-hoc analysis of purchases for scalper patterns) is more effective but reactive. Most platforms layer both with explicit policies (max 4 tickets per identity per event).

Real-World Examples

How real systems implement this in production

Ticketmaster

Ticketmaster runs the canonical version of this stack: Verified Fan registration before onsale, randomized queue admission, captcha gates, and dynamic Platinum pricing for hot events. They face routine outages on mega-onsales (Taylor Swift Eras Tour broke their queue in 2022) which made the design tradeoffs visible at scale.

Trade-off: Ticketmaster's randomized queue improves fairness, but the 2022 Taylor Swift incident showed that even with the queue, the seat-selection backend can saturate and cascade. The lesson: capacity planning has to extend past admission control all the way through every downstream service that an admitted user touches.

Shopify Flash Sales

Shopify's checkout system handles flash drops (Yeezy, Supreme) where 1M users show up at the second a product launches. They use a Cloudflare-backed edge waiting room that holds requests at the edge before they reach origin, plus per-shop Pod isolation so one shop's flash sale does not affect others.

Trade-off: Shopify pushes admission control to the edge, which is more efficient than backend queues but harder to make exactly fair. The lesson: edge-based queuing is the cheapest way to absorb a flash, but you trade off fine-grained queue semantics for global low-latency throttling.

AXS (entertainment ticketing)

AXS uses a similar onsale architecture with virtual waiting rooms and per-event sharded backends. They emphasize digital tickets with rotating barcodes (a new barcode every 60 s) to defeat resale via screenshot.

Trade-off: Rotating barcodes increase friction at the venue gate (the customer must have the live app, not a screenshot) but eliminate a major fraud vector. The lesson: anti-fraud features that change the customer experience must be tuned carefully; rotating barcodes work for tech-savvy crowds but cause issues for older fans without smartphones.

DICE (London-based ticketing)

DICE built mobile-first ticketing with no resale (transfers only at face value) and a heavy emphasis on bot detection via account verification. Smaller scale than Ticketmaster but interesting model: by removing the secondary market, scalper economics break down.

Trade-off: DICE's no-resale policy eliminates scalping at the cost of being less attractive to power-users who want flexibility. The lesson: scalping is partly a market-design problem; restricting transfers reshapes incentives more effectively than any technical defense.

Quick Interview Phrases

Key terms to use in your answer

  • per-seat pessimistic lock with TTL
  • virtual waiting room with randomized priority
  • WebSocket fan-out for live seat map deltas
  • Redis Lua for atomic multi-seat reservation
  • shard by event_id for Redis isolation
  • capture price at hold time

Common Interview Questions

Questions you might be asked about this topic

  1. Pre-onsale (9:30 AM): user clicks 'Buy Tickets' and joins the waiting room (Redis ZSET with randomized priority).
  2. At 10:00, the queue starts admitting; user gets a JWT admission token via WebSocket and lands on the seat-picker page.
  3. Seat map loads from CDN (static layout) plus current state via WebSocket subscription.
  4. User picks 2 adjacent seats and clicks 'Hold'.
  5. Reservation Service runs the multi-seat Lua on the event's Redis shard: atomic check that both seats are AVAILABLE, set both to HELD with the user_id and a 5-min expires_at, add both to the holds ZSET. On success it returns a reservation_id.
  6. WebSocket pushes the HELD delta to all other viewers; their seat map immediately greys out those seats.
  7. User enters payment; Order Service runs the checkout saga (charge payment, mark seats SOLD in Redis and Postgres, issue tickets with barcodes, email confirmation).
  8. On payment failure or 5-min timeout, the holds sweeper releases the seats back to AVAILABLE.

Interview Tips

How to discuss this topic effectively

  1. Distinguish ticketing from generic e-commerce immediately. Saying 'each seat is unique so I cannot sub-bucket like SKU inventory; contention is irreducible per seat' shows you understand why this is harder.
  2. Bring up the virtual waiting room before being asked. Admitting 1M users to the seat picker at once would crash everything; fair admission is the senior insight.
  3. Default to Redis-as-source-of-truth during onsale, with async write-back. Postgres locks on a single seat row cap throughput at a few hundred per second, which is way below what we need.
  4. Use WebSocket for seat-map updates, not polling. 200K viewers polling at 1 Hz is 200K QPS of identical reads; pub/sub fan-out is dramatically cheaper.
  5. Mention price capture at hold time. Re-pricing at checkout breaks customer trust; capturing the quoted price avoids the worst kind of complaint.

Common Mistakes

Pitfalls to avoid in interviews

Treating seats as fungible inventory

Seat A12-7 is unique and cannot be sub-bucketed. The Lua script must lock that specific seat atomically; sub-bucketing or sharding within an event by random hash breaks multi-seat atomicity. Shard by event_id, lock per seat.

Letting the entire 1M onsale crowd hit the reservation service

Reservation can absorb maybe 10K attempts/sec; 1M concurrent users would cascade into outages. A virtual waiting room admits at the rate the backend can serve, with randomized fairness so bots do not dominate.

Polling the seat map for live updates

200K viewers polling at 1 Hz hammers your origin with 200K QPS of identical reads. Use WebSocket subscriptions backed by Redis Pub/Sub; clients receive only deltas as seats change state.

Re-pricing seats at checkout if dynamic pricing changed during the hold

Customers feel cheated when the price changes after they clicked 'Hold'. Capture the price at hold time and honor it through checkout, even if the live price has moved.

Using Postgres row locks as the primary contention mechanism during onsale

Postgres row locks serialize every attempt on a contended seat row; throughput is a few hundred per second. A Redis Lua script on the event's shard handles 10K+ reservation attempts/sec. Treat Postgres as the durable record, Redis as the live arbiter during onsale.