System Design Article

WebSockets, Long Polling & SSE

Difficulty: Medium

Standard HTTP is a request-response protocol: the client asks, the server answers. But many modern applications need real-time communication that is server-initiated or fully bidirectional: chat messages, live notifications, stock tickers, collaborative editing, and gaming. This lesson covers three techniques for real-time communication: long polling, Server-Sent Events (SSE), and WebSockets. You will learn how each works, their trade-offs, and when to use which in a system design interview.


The Problem: HTTP Is Not Built for Real-Time

Standard HTTP follows a strict request-response pattern: the client sends a request, the server sends a response, and the connection is effectively idle until the next request. The server cannot spontaneously send data to the client.

This creates a problem for real-time applications:

Scenario: Chat Application

Alice sends a message to Bob. The server receives Alice's message, but how does Bob's client know there is a new message?

Option 1: Client polls regularly (Short Polling)

Bob's client: GET /messages?since=last_id  (every 2 seconds)
Server: 200 OK {"messages": []}              - nothing new

... 2 seconds later ...
Bob's client: GET /messages?since=last_id
Server: 200 OK {"messages": []}              - still nothing

... 2 seconds later ...
Bob's client: GET /messages?since=last_id
Server: 200 OK {"messages": [{"from": "Alice", "text": "Hi!"}]}  - finally!

Problems with short polling:

  • Wasted resources: Most requests return empty responses (two out of three in the exchange above), yet each one costs a full HTTP round trip.
  • Latency: Average delay of half the polling interval (1 second with 2-second polling).
  • Server load: 1 million connected users polling every 2 seconds = 500K requests/second of mostly empty responses.
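The latency and load figures above follow from simple arithmetic. A minimal sketch, assuming evenly spread polls and a hypothetical helper name (`shortPollingCost` is not from any library):

```javascript
// Rough cost model for short polling (illustrative helper, not a library API).
function shortPollingCost(users, intervalSeconds) {
  return {
    // Each user sends one request per interval.
    requestsPerSecond: users / intervalSeconds,
    // A message arrives at a random point in the interval, so on average
    // it waits half an interval before the next poll picks it up.
    averageDelaySeconds: intervalSeconds / 2,
  };
}

const cost = shortPollingCost(1_000_000, 2);
console.log(cost.requestsPerSecond);   // 500000
console.log(cost.averageDelaySeconds); // 1
```

Halving the interval halves the average delay but doubles the request rate, which is the trade-off that makes short polling hard to tune.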

We need better solutions. Enter long polling, SSE, and WebSockets.

Long Polling

Long polling is an improvement over short polling where the server holds the request open until new data is available (or a timeout expires).

How Long Polling Works

Bob's Client                           Server
    |                                     |
    |  GET /messages?since=last_id        |
    |------------------------------------>|
    |                                     |  (server holds connection open)
    |                                     |  ... waiting for new data ...
    |                                     |
    |                                     |  Alice sends a message!
    |                                     |
    |  200 OK {"messages": [{...}]}       |
    |<------------------------------------|
    |                                     |
    |  Immediately re-send:               |
    |  GET /messages?since=new_last_id    |
    |------------------------------------>|
    |                                     |  (holds again...)

Key Characteristics

  • Near real-time: Messages arrive as soon as the server has them (no polling interval delay).
  • HTTP-based: Works with existing HTTP infrastructure - load balancers, proxies, firewalls all support it.
  • Server holds connections: Each client has a pending HTTP request. With 100K users, the server holds 100K open connections.
  • Timeout and reconnect: The server should time out after 30-60 seconds and return an empty response. The client immediately reconnects.
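The hold-then-respond behaviour described above can be sketched on the server side as follows. This is a simplified in-memory model, not a production implementation: `LongPollHub`, its method names, and the `respond` callback (standing in for writing the HTTP response) are all assumptions made for illustration.

```javascript
// Hypothetical in-memory hub: holds pending long-poll requests until data arrives.
class LongPollHub {
  constructor(timeoutMs = 30000) {
    this.timeoutMs = timeoutMs;
    this.messages = []; // [{ id, text }]
    this.pending = [];  // [{ sinceId, respond, deadline }]
  }

  // Called when a GET /messages?since=<id> request arrives.
  wait(sinceId, respond, now = Date.now()) {
    const fresh = this.messages.filter((m) => m.id > sinceId);
    if (fresh.length > 0) {
      respond({ messages: fresh }); // data already waiting: answer at once
      return;
    }
    // Nothing new: hold the request open until publish() or expire().
    this.pending.push({ sinceId, respond, deadline: now + this.timeoutMs });
  }

  // Called when someone sends a message: answer every held request.
  publish(text) {
    const message = { id: this.messages.length + 1, text };
    this.messages.push(message);
    const held = this.pending;
    this.pending = [];
    for (const p of held) {
      p.respond({ messages: this.messages.filter((m) => m.id > p.sinceId) });
    }
    return message.id;
  }

  // Called periodically: time out stale requests with an empty response.
  expire(now = Date.now()) {
    const stillWaiting = [];
    for (const p of this.pending) {
      if (now >= p.deadline) p.respond({ messages: [] });
      else stillWaiting.push(p);
    }
    this.pending = stillWaiting;
  }
}

const hub = new LongPollHub(30000);
const responses = [];
hub.wait(0, (body) => responses.push(body), 0);    // Bob's request is held
hub.publish('Hi!');                                 // Alice sends; Bob's request completes
hub.wait(1, (body) => responses.push(body), 1000);  // Bob re-polls and is held again
hub.expire(31000);                                  // timeout passes with no new data
```

After this sequence `responses` holds one body with Alice's message followed by one empty body from the timeout, mirroring the diagram above.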

Advantages

  • Works everywhere (every browser, every proxy, every CDN supports HTTP).
  • No special protocol or library needed.
  • Easy to implement on both client and server.
  • Reduces wasted empty responses compared to short polling.

Limitations

  • One-directional push: The server can push to the client, but the client cannot push to the server over the same held request. Sending data from client to server requires a separate HTTP request (for example, a POST).
  • Connection overhead: Each response/reconnection cycle requires a new HTTP request with full headers.
  • Server resource usage: Holding many open connections consumes server resources (threads, memory).

When to Use Long Polling

  • Simple notification systems: "You have 3 new messages" - low frequency, simple data.
  • Email clients: Check for new emails - low to medium frequency.
  • Environments where WebSockets are blocked: Some corporate firewalls block WebSocket connections.
  • Prototype or MVP: Quick to implement, no special infrastructure needed.

Server-Sent Events (SSE)

Server-Sent Events (SSE) is a standard that allows the server to push events to the client over a single, long-lived HTTP connection. Unlike long polling, the connection stays open and the server can send multiple events over time.

How SSE Works

Bob's Client                           Server
    |                                     |
    |  GET /events (Accept: text/event-stream)
    |------------------------------------>|
    |                                     |
    |  HTTP/1.1 200 OK                    |
    |  Content-Type: text/event-stream    |
    |  Connection: keep-alive             |
    |<------------------------------------|
    |                                     |
    |  data: {"type":"message",           |
    |         "from":"Alice",             |
    |         "text":"Hi!"}               |
    |<------------------------------------|
    |                                     |
    |  ... connection stays open ...      |
    |                                     |
    |  data: {"type":"typing",            |
    |         "user":"Alice"}             |
    |<------------------------------------|
    |                                     |

SSE Event Format

event: message
data: {"from": "Alice", "text": "Hi!"}
id: 42

event: typing
data: {"user": "Alice"}

event: notification
data: {"count": 3}
id: 43
retry: 5000

  • event: Event type (clients can listen for specific types).
  • data: The payload (usually JSON).
  • id: Event ID for resuming after disconnection.
  • retry: Reconnection interval in milliseconds.
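On the server, producing this format is a matter of writing lines to the response body. A minimal sketch of a serializer follows; the function name `formatSseEvent` is an assumption, not part of any framework.

```javascript
// Serialize one event into the text/event-stream wire format.
function formatSseEvent({ event, data, id, retry }) {
  const lines = [];
  if (event !== undefined) lines.push(`event: ${event}`);
  // A payload containing newlines is sent as several "data:" lines.
  for (const line of String(data).split('\n')) lines.push(`data: ${line}`);
  if (id !== undefined) lines.push(`id: ${id}`);
  if (retry !== undefined) lines.push(`retry: ${retry}`);
  // A blank line terminates the event.
  return lines.join('\n') + '\n\n';
}

const frame = formatSseEvent({
  event: 'message',
  data: JSON.stringify({ from: 'Alice', text: 'Hi!' }),
  id: 42,
});
console.log(frame);
```

The trailing blank line matters: the client only dispatches an event once it sees the empty line that ends it.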

Built-in Browser Support

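// Open a persistent connection; the browser requests text/event-stream.
const source = new EventSource('/events');

// Fires for events sent with "event: message" (and for events with no event field).
source.addEventListener('message', (event) => {
    const data = JSON.parse(event.data);
    console.log('New message:', data);
});

// Fires only for events sent with "event: notification".
source.addEventListener('notification', (event) => {
    const data = JSON.parse(event.data);
    updateBadge(data.count); // updateBadge is an app-defined UI helper
});

// On a dropped connection the browser reconnects on its own and sends the
// last received id in the Last-Event-ID header.
source.onerror = () => console.log('Connection lost, reconnecting...');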

Advantages

  • Simple: Uses standard HTTP. The EventSource API handles reconnection automatically.
  • Efficient: Single connection, no reconnection overhead (unlike long polling).
  • Automatic reconnection: The browser reconnects automatically with the Last-Event-ID header, so the server can resume from where it left off.
  • Text-based: Easy to debug (you can see events in browser dev tools).
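Resuming "from where it left off" requires the server to remember recent events. One possible approach, sketched here under the assumption of numeric, increasing event IDs, is a bounded replay buffer; `EventReplayBuffer` and its methods are hypothetical names.

```javascript
// Hypothetical bounded buffer of recent events, used to replay after a reconnect.
class EventReplayBuffer {
  constructor(capacity = 1000) {
    this.capacity = capacity;
    this.events = []; // oldest first: [{ id, data }]
    this.nextId = 1;
  }

  append(data) {
    const event = { id: this.nextId++, data };
    this.events.push(event);
    if (this.events.length > this.capacity) this.events.shift(); // drop the oldest
    return event;
  }

  // lastEventId is the Last-Event-ID request header value (a string, or undefined
  // on a first connection).
  eventsAfter(lastEventId) {
    const last = Number.parseInt(lastEventId ?? '0', 10);
    if (Number.isNaN(last)) return [...this.events]; // unparseable id: replay everything held
    return this.events.filter((e) => e.id > last);
  }
}

const buffer = new EventReplayBuffer(100);
buffer.append('first');
buffer.append('second');
buffer.append('third');
const missed = buffer.eventsAfter('1'); // client last saw id 1
console.log(missed.map((e) => e.data)); // [ 'second', 'third' ]
```

Because the buffer is bounded, a client that was away longer than the buffer covers needs another recovery path, such as reloading state from a database.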

Limitations

  • Server-to-client only: SSE is unidirectional. The client cannot send data over the SSE connection - it must use separate HTTP requests.
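  • Limited connections: Over HTTP/1.1, browsers allow only about 6 concurrent connections per domain, shared across all tabs, and each SSE stream occupies one of them. HTTP/2 largely solves this by multiplexing streams over one connection.
  • Text only: SSE transmits text data (UTF-8). Binary data must be Base64-encoded.
  • Infrastructure needs configuration: All modern browsers support SSE, but proxies that buffer responses can delay events until the buffer fills, and some server frameworks need explicit streaming support.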

When to Use SSE

  • Live feeds: News tickers, social media feeds, sports scores - server pushes updates, client just displays them.
  • Notifications: Real-time notification counts, alerts.
  • Progress updates: File upload progress, long-running job status.
  • Stock tickers: Continuous price updates from server to client.

WebSockets

WebSockets provide a full-duplex, bidirectional communication channel over a single, long-lived TCP connection. Both the client and server can send messages at any time without waiting for a request.

How WebSockets Work

The Handshake (HTTP Upgrade)

WebSocket connections start as a regular HTTP request with an Upgrade header:

Client -> Server:
GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

Server -> Client:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

After the handshake, the protocol switches from HTTP to WebSocket. The TCP connection remains open.
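The Sec-WebSocket-Accept value in the response is derived from the client's Sec-WebSocket-Key: the server appends a fixed GUID defined by RFC 6455, takes the SHA-1 hash, and Base64-encodes it. A short sketch using Node's built-in crypto module (the function name `computeAcceptKey` is an assumption):

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derive the Sec-WebSocket-Accept header value from the client's key.
function computeAcceptKey(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

// The key from the handshake shown above.
const acceptKey = computeAcceptKey('dGhlIHNhbXBsZSBub25jZQ==');
console.log(acceptKey); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The check proves the server understood the WebSocket handshake rather than replaying a cached HTTP response; it is not an authentication mechanism.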

Bidirectional Communication
Client                                Server
    |                                     |
    |  [HTTP Upgrade Handshake]           |
    |<----------------------------------->|
    |                                     |
    |  === WebSocket Connection Open ===  |
    |                                     |
    |  {"type":"message","text":"Hi!"}    |
    |------------------------------------>|
    |                                     |
    |  {"type":"message","text":"Hello!"} |
    |<------------------------------------|
    |                                     |
    |  {"type":"typing","user":"Alice"}   |
    |<------------------------------------|
    |                                     |
    |  {"type":"message","text":"Bye"}    |
    |------------------------------------>|
    |                                     |
    |  Ping/Pong (keepalive)              |
    |<----------------------------------->|
    |                                     |

Key Characteristics

  • Full-duplex: Both sides can send messages simultaneously.
  • Low overhead: After the handshake, each message carries only a small frame header (2-14 bytes depending on payload size and direction, and 2-6 bytes for payloads under 126 bytes) vs. hundreds of bytes for HTTP headers.
  • Persistent connection: The connection stays open for the lifetime of the session.
  • Binary and text: Supports both text (UTF-8) and binary data.
  • Ping/Pong: Built-in heartbeat mechanism to detect dead connections.
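The framing overhead can be computed from the RFC 6455 frame layout: a 2-byte base header, an extended length field for larger payloads, and a 4-byte masking key on client-to-server frames. A small sketch (the function name is an assumption):

```javascript
// Header size in bytes for one WebSocket frame, following RFC 6455 framing.
function frameHeaderBytes(payloadLength, masked) {
  let bytes = 2;                             // FIN/opcode byte + mask bit and 7-bit length
  if (payloadLength > 65535) bytes += 8;     // 64-bit extended length
  else if (payloadLength > 125) bytes += 2;  // 16-bit extended length
  if (masked) bytes += 4;                    // masking key (client-to-server frames)
  return bytes;
}

console.log(frameHeaderBytes(50, false));    // 2  (small server-to-client message)
console.log(frameHeaderBytes(50, true));     // 6  (small client-to-server message)
console.log(frameHeaderBytes(100000, true)); // 14 (large client-to-server message)
```

Even the largest header is far smaller than a typical set of HTTP request headers, which is why WebSockets suit frequent small messages.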

Advantages

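  • Low latency: Once the connection is open, a message needs no new request or connection setup, so delivery time in either direction is close to the one-way network delay.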
  • Minimal overhead: No HTTP headers on each message. Ideal for high-frequency messaging.
  • Bidirectional: Client and server are peers - either can initiate communication.
  • Binary support: Efficient for sending images, audio, or binary protocols.

Limitations

  • Stateful connections: The server must maintain an open connection for each client. This complicates scaling.
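  • Load balancing complexity: Standard HTTP load balancers distribute individual requests. WebSocket connections are long-lived, so load is balanced only at connection time; the load balancer must pass the Upgrade handshake and tolerate long-lived connections, and sticky sessions are needed if the client may fall back to an HTTP transport such as long polling.
  • Not cacheable: WebSocket messages cannot be cached by CDNs or HTTP proxies.
  • Firewall/proxy issues: WebSockets run on the standard ports 80/443, but some corporate proxies and firewalls drop the Upgrade header or close long-lived connections.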
  • Reconnection logic: Unlike SSE, WebSockets have no built-in automatic reconnection. You must implement retry logic yourself.
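A common shape for that retry logic is exponential backoff with jitter, so that many clients disconnected at once do not all reconnect at the same instant. The sketch below uses the "full jitter" variant; the function name, the default values, and the injectable `random` parameter are assumptions chosen for illustration.

```javascript
// Delay before reconnect attempt `attempt` (0-based), using "full jitter":
// a random value between 0 and the capped exponential delay.
function reconnectDelayMs(attempt, { baseMs = 500, capMs = 30000, random = Math.random } = {}) {
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * exponential);
}

// Upper bounds grow as 500, 1000, 2000, ... and are capped at 30000 ms.
for (let attempt = 0; attempt < 8; attempt++) {
  console.log(`attempt ${attempt}: wait ${reconnectDelayMs(attempt)} ms`);
}
```

A client would call this in its `close` handler, wait the returned delay, open a new connection, and reset the attempt counter once a connection succeeds.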

When to Use WebSockets

  • Chat applications: WhatsApp, Slack, Discord - bidirectional, high-frequency messaging.
  • Collaborative editing: Google Docs - real-time cursor positions, text changes.
  • Multiplayer gaming: Low-latency, bidirectional game state updates.
  • Financial trading: Real-time order book updates, trade execution.
  • Live dashboards: Monitoring systems with real-time metrics.

Comparing All Four Techniques

Head-to-Head Comparison

| Feature | Short Polling | Long Polling | SSE | WebSocket |
| --- | --- | --- | --- | --- |
| Direction | Client-initiated pull | Server -> Client (held request) | Server -> Client | Bidirectional |
| Latency | High (polling interval) | Low-Medium | Low | Very Low |
| Connection overhead | New request per poll | New request per event | Single persistent connection | Single persistent connection |
| HTTP compatible | Yes | Yes | Yes | Initial handshake only |
| Proxy/CDN friendly | Yes | Mostly | Yes | Varies (some block) |
| Browser support | Universal | Universal | All modern browsers | All modern browsers |
| Auto-reconnection | N/A (client controls) | Client responsibility | Built-in | Must implement |
| Binary data | Yes (in body) | Yes (in body) | No (text only) | Yes |
| Server complexity | Low | Medium | Low-Medium | High |
| Scaling difficulty | Low | Medium | Medium | High |
| Message frequency | Low | Low-Medium | Medium-High | High |

Decision Matrix: Which Technique for Which Problem?

| Use Case | Best Choice | Why |
| --- | --- | --- |
| Simple notifications ("3 new emails") | Long Polling or SSE | Low frequency, server-to-client only |
| Live sports scores | SSE | Server pushes updates, no client-to-server needed |
| Chat application | WebSocket | Bidirectional, high frequency |
| Stock ticker | SSE or WebSocket | SSE for display-only; WebSocket if client sends trades |
| Online multiplayer game | WebSocket | Low latency, bidirectional, binary data |
| Collaborative editing | WebSocket | Bidirectional, real-time cursor and text sync |
| Social media feed (new posts) | SSE | Server pushes new posts; "like" is a separate HTTP POST |
| IoT sensor data | WebSocket | High-frequency bidirectional data + commands |
| Long-running job progress | SSE | Server reports progress, client just displays |
| Presence indicators ("user is online") | WebSocket | Bidirectional heartbeats, real-time status |

Scaling Real-Time Systems

Real-time connections are stateful, which makes scaling fundamentally different from stateless HTTP APIs.

The Core Challenge

With a stateless REST API, any server can handle any request. With WebSockets, a client has a persistent connection to a specific server. If you have 3 WebSocket servers, User A might be connected to Server 1 and User B to Server 2. When A sends a message to B, Server 1 must somehow deliver it to Server 2.

Solution 1: Pub/Sub Message Broker

[Client A] <--WS--> [Server 1] --publish--> [Redis Pub/Sub] --subscribe--> [Server 2] <--WS--> [Client B]

  • When Server 1 receives a message from Client A destined for Client B, it publishes the message to a Redis Pub/Sub channel.
  • Server 2 subscribes to relevant channels and forwards the message to Client B over the existing WebSocket connection.
  • This is a common pattern in real-time systems; Socket.IO, for example, provides a Redis adapter for it.
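The flow can be modelled in a few lines. In this sketch an in-memory `Broker` class stands in for Redis Pub/Sub and a plain callback stands in for each WebSocket; both classes, the `user:<id>` channel naming, and all method names are assumptions for illustration, not the API of any real client library.

```javascript
// In-memory stand-in for a broker such as Redis Pub/Sub (illustrative only).
class Broker {
  constructor() {
    this.subscribers = new Map(); // channel -> Set of handlers
  }
  subscribe(channel, handler) {
    if (!this.subscribers.has(channel)) this.subscribers.set(channel, new Set());
    this.subscribers.get(channel).add(handler);
  }
  publish(channel, message) {
    for (const handler of this.subscribers.get(channel) ?? []) handler(message);
  }
}

// A gateway server that holds client connections and relays via the broker.
class GatewayServer {
  constructor(name, broker) {
    this.name = name;
    this.broker = broker;
    this.sockets = new Map(); // userId -> send function (stands in for a WebSocket)
  }
  connect(userId, send) {
    this.sockets.set(userId, send);
    // Subscribe to this user's channel so messages from any server reach us.
    this.broker.subscribe(`user:${userId}`, (msg) => this.sockets.get(userId)?.(msg));
  }
  sendMessage(fromUserId, toUserId, text) {
    // The sending server does not need to know where the recipient is connected.
    this.broker.publish(`user:${toUserId}`, { from: fromUserId, text });
  }
}

const broker = new Broker();
const server1 = new GatewayServer('server-1', broker);
const server2 = new GatewayServer('server-2', broker);
const bobInbox = [];
server1.connect('alice', () => {});
server2.connect('bob', (msg) => bobInbox.push(msg));
server1.sendMessage('alice', 'bob', 'Hi!'); // crosses from server-1 to server-2
console.log(bobInbox);
```

The key property is that Server 1 publishes without knowing which server holds Bob's connection; the broker's subscriptions carry that routing knowledge.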

Solution 2: Consistent Hashing

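Route all connections for a specific chat room or user to the same server using consistent hashing. This reduces cross-server communication, but it limits horizontal scaling: a popular room or user can overload its server, because that load cannot be spread across machines.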
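A minimal hash ring illustrates the routing. This sketch assumes a simple FNV-1a-style string hash and 100 virtual nodes per server; the class name, server names, and key format are hypothetical, and a production ring would use a stronger hash and a binary search.

```javascript
// Map a string to a 32-bit position on the ring (FNV-1a-style hash over char codes).
function ringPosition(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

class HashRing {
  constructor(servers, virtualNodes = 100) {
    this.virtualNodes = virtualNodes;
    this.ring = []; // sorted by position: [{ position, server }]
    for (const server of servers) this.add(server);
  }
  add(server) {
    // Several virtual nodes per server spread its share around the ring.
    for (let i = 0; i < this.virtualNodes; i++) {
      this.ring.push({ position: ringPosition(`${server}#${i}`), server });
    }
    this.ring.sort((a, b) => a.position - b.position);
  }
  remove(server) {
    this.ring = this.ring.filter((node) => node.server !== server);
  }
  // Owner is the first virtual node clockwise from the key (wrapping around).
  lookup(key) {
    const position = ringPosition(key);
    const node = this.ring.find((n) => n.position >= position) ?? this.ring[0];
    return node.server;
  }
}

const ring = new HashRing(['ws-1', 'ws-2', 'ws-3']);
console.log(ring.lookup('room:42') === ring.lookup('room:42')); // true: routing is stable
```

When a server is removed, only the keys it owned move to other servers; every other key keeps its owner, which is what keeps reconnect churn small during scaling events.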

Solution 3: Dedicated Connection Manager

Separate the connection management (which client is on which server) from the business logic:

[Connection Manager]     [Business Logic Servers]
  - Tracks all WS          - Stateless
    connections             - Processes messages
  - Routes messages         - Queries databases
    to correct server
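The tracking half of this split can be sketched as a small registry that maps each user to the gateway server holding their connection. The class and method names are assumptions; in practice this state would usually live in a shared store rather than in one process's memory.

```javascript
// Hypothetical registry: records which gateway server holds each user's connection.
class ConnectionRegistry {
  constructor() {
    this.serverByUser = new Map(); // userId -> serverId
  }
  register(userId, serverId) {
    this.serverByUser.set(userId, serverId);
  }
  unregister(userId, serverId) {
    // Only remove the entry if it still points at this server, so a late
    // disconnect event cannot erase a newer connection made elsewhere.
    if (this.serverByUser.get(userId) === serverId) this.serverByUser.delete(userId);
  }
  // Returns the server to forward to, or null if the user is offline.
  route(userId) {
    return this.serverByUser.get(userId) ?? null;
  }
}

const registry = new ConnectionRegistry();
registry.register('bob', 'gateway-2');
console.log(registry.route('bob'));   // gateway-2
console.log(registry.route('carol')); // null
```

Stateless business-logic servers consult the registry to decide which gateway should receive an outgoing message, and treat a `null` result as the offline case.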

Scaling Challenges & Solutions

| Challenge | Solution |
| --- | --- |
| Connection limits (OS file descriptors) | Increase ulimit; use epoll/kqueue for efficient I/O multiplexing. A single server can handle ~100K-1M connections. |
| Memory per connection | Minimize per-connection state. A WebSocket connection uses ~2-10KB of memory. 1M connections = 2-10GB RAM. |
| Load balancer routing | Use sticky sessions (based on cookie or IP) or use a layer 4 (TCP) load balancer that forwards the initial HTTP Upgrade. |
| Cross-server messaging | Redis Pub/Sub, Kafka, or a dedicated message broker for routing messages between servers. |
| Reconnection storms | If a server restarts, all its clients reconnect simultaneously. Implement exponential backoff with jitter. |
| Health monitoring | WebSocket connections do not generate regular HTTP requests. Use ping/pong frames to detect dead connections. |
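The health-monitoring row can be illustrated with a tracker that records the last pong seen per connection and periodically sweeps out silent ones. This is a sketch under assumed names (`HeartbeatTracker`, `onPong`, `sweep`) and an assumed 60-second timeout; sending the actual ping frames is left to the WebSocket library in use.

```javascript
// Hypothetical tracker: records the last pong per connection and reports stale ones.
class HeartbeatTracker {
  constructor(timeoutMs = 60000) {
    this.timeoutMs = timeoutMs;
    this.lastPongAt = new Map(); // connectionId -> timestamp in ms
  }
  // Call whenever a pong (or any traffic) arrives from the connection.
  onPong(connectionId, now = Date.now()) {
    this.lastPongAt.set(connectionId, now);
  }
  // Remove and return connections that have been silent longer than the timeout.
  sweep(now = Date.now()) {
    const dead = [];
    for (const [id, seenAt] of this.lastPongAt) {
      if (now - seenAt > this.timeoutMs) {
        dead.push(id);
        this.lastPongAt.delete(id);
      }
    }
    return dead;
  }
}

const tracker = new HeartbeatTracker(60000);
tracker.onPong('conn-a', 0);
tracker.onPong('conn-b', 50000);
const dead = tracker.sweep(70000); // conn-a silent for 70 s, conn-b for 20 s
console.log(dead); // [ 'conn-a' ]
```

The server would close each returned connection and release its per-connection state, which also keeps the memory estimate in the table honest.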

Real-World Scale Numbers

  • Discord: Handles millions of concurrent WebSocket connections across a fleet of gateway servers built on Elixir and the Erlang VM.
  • Slack: Uses a combination of WebSockets for real-time messaging and HTTP for API calls, with a connection manager service.
  • WhatsApp: Uses a custom protocol over TCP (not standard WebSockets) to handle 2+ billion users with a remarkably small server fleet.

Real-Time Communication in Interviews

How to Choose in an Interview

When an interviewer asks you to design a real-time feature, follow this decision process:

Step 1: Identify the communication direction

  • Server-to-client only? -> SSE or Long Polling
  • Bidirectional? -> WebSocket

Step 2: Assess the message frequency

  • Low frequency (notifications, email)? -> Long Polling or SSE
  • Medium frequency (live feed, scores)? -> SSE
  • High frequency (chat, gaming)? -> WebSocket

Step 3: Consider the scaling requirements

  • Small scale or prototype? -> Long Polling (simplest)
  • Large scale, server-to-client? -> SSE (efficient, simple)
  • Large scale, bidirectional? -> WebSocket + Pub/Sub

Step 4: Address the scaling challenge

  • Always mention how you would handle cross-server message delivery (Redis Pub/Sub, Kafka).
  • Mention sticky sessions or layer 4 load balancing for WebSocket connections.
  • Discuss reconnection strategy (exponential backoff with jitter).
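Steps 1 to 3 can be condensed into a small lookup, shown here as a memory aid rather than a rule. The function name and the input shape (`bidirectional`, `frequency`, `prototype`) are assumptions, and real decisions also weigh infrastructure, fallbacks, and team experience.

```javascript
// Encodes steps 1-3 of the decision process above as a simple lookup.
// frequency is one of 'low', 'medium', 'high'.
function chooseTransport({ bidirectional, frequency, prototype = false }) {
  if (bidirectional || frequency === 'high') return 'WebSocket';
  if (frequency === 'medium') return 'SSE';
  // Low frequency, server-to-client only.
  return prototype ? 'Long Polling' : 'Long Polling or SSE';
}

console.log(chooseTransport({ bidirectional: true, frequency: 'high' }));   // WebSocket
console.log(chooseTransport({ bidirectional: false, frequency: 'medium' })); // SSE
console.log(chooseTransport({ bidirectional: false, frequency: 'low' }));    // Long Polling or SSE
```

In an interview the value lies in stating the inputs aloud (direction, frequency, scale) before naming the technique, so the choice is visibly justified.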

Example: "Design a Notification System"

Good answer: "For real-time notifications, I would use Server-Sent Events. The client opens a single SSE connection to receive notifications. SSE is simpler than WebSockets and sufficient here because notifications only flow from server to client. If the connection drops, the browser automatically reconnects with the Last-Event-ID header so the server can replay missed events. For users who are offline, notifications are stored in a database and delivered on next connection."

Why this is strong: Justifies the choice (SSE over WebSocket), explains the reconnection strategy, and handles the offline case.

Example: "Design WhatsApp"

Good answer: "For real-time messaging, I would use WebSockets. Each client maintains a persistent WebSocket connection to a gateway server. When User A sends a message to User B, the gateway publishes the message to a Redis Pub/Sub channel keyed by User B's connection. The gateway server holding User B's connection receives the event and forwards it. For offline users, messages are stored in a queue and delivered when they reconnect. I would use consistent hashing to route connections by user ID to reduce cross-server communication."

Why this is strong: Chooses WebSocket with clear reasoning, addresses cross-server routing, handles offline delivery, and optimizes with consistent hashing.
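The offline-delivery part of that answer can be sketched as a delivery layer that pushes to connected users and queues for everyone else. The class name, method names, and the in-memory queue are assumptions for illustration; a real system would persist the queue durably.

```javascript
// Hypothetical delivery layer: push to online users, queue for offline ones.
class MessageDelivery {
  constructor() {
    this.online = new Map();  // userId -> send function (stands in for a WebSocket)
    this.pending = new Map(); // userId -> messages waiting for reconnect
  }
  deliver(userId, message) {
    const send = this.online.get(userId);
    if (send) {
      send(message);
      return 'delivered';
    }
    if (!this.pending.has(userId)) this.pending.set(userId, []);
    this.pending.get(userId).push(message);
    return 'queued';
  }
  // On (re)connect, flush queued messages in order before any live traffic.
  connect(userId, send) {
    for (const message of this.pending.get(userId) ?? []) send(message);
    this.pending.delete(userId);
    this.online.set(userId, send);
  }
  disconnect(userId) {
    this.online.delete(userId);
  }
}

const delivery = new MessageDelivery();
const inbox = [];
const firstStatus = delivery.deliver('bob', 'Are you there?'); // bob is offline
delivery.connect('bob', (m) => inbox.push(m));                  // queued message is replayed
const secondStatus = delivery.deliver('bob', 'Welcome back');   // bob is now online
console.log(firstStatus, secondStatus, inbox);
```

Flushing the queue before registering the live connection preserves message order for the reconnecting user.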

Real-World Examples

How real systems implement this in production

Slack

Slack uses WebSockets for real-time messaging and presence indicators. Each client maintains a persistent WebSocket connection to a gateway service. When a message is sent, it is routed through backend services and pushed to all relevant connected clients via their WebSocket connections. Slack falls back to long polling if WebSockets are blocked.

Trade-off: WebSockets provide the best real-time experience but require connection management infrastructure and sticky load balancing. The long polling fallback ensures universal accessibility at the cost of slightly higher latency.

GitHub Actions / CI pipelines

CI systems such as GitHub Actions show build logs and workflow status updates in real time. This kind of log streaming is a good fit for Server-Sent Events: while you view a running workflow, a single SSE connection can push new log lines as they are generated. The flow is unidirectional (server to client), which matches SSE exactly.

Trade-off: SSE is simpler to implement and scale than WebSockets for this use case, since log streaming is purely server-to-client. The trade-off is that SSE does not support binary data, so log content must be text.

Discord

Discord handles millions of concurrent WebSocket connections using a custom gateway written in Elixir on the Erlang VM. Each gateway server manages thousands of connections and communicates with backend services via a message broker. Discord uses heartbeating (ping/pong) to detect dead connections and implements session resumption so clients can reconnect without missing messages.

Trade-off: Discord invested heavily in custom infrastructure (Elixir gateway, session resumption, connection management) to achieve sub-second message delivery at massive scale. This engineering investment is justified by their real-time chat and voice use case.

Quick Interview Phrases

Key terms to use in your answer

full-duplex communication
HTTP Upgrade handshake
sticky sessions
heartbeat/ping-pong
connection fan-out
pub/sub for scaling WebSockets

Common Interview Questions

Questions you might be asked about this topic

How would you scale WebSocket connections across many servers? Key points: sticky sessions via load balancer, pub/sub (Redis) for cross-server messaging, separate connection manager from business logic, horizontal scaling with consistent hashing, heartbeats for cleanup.

Interview Tips

How to discuss this topic effectively

1. Always justify your real-time technique choice. Do not just say 'I will use WebSockets.' Say 'I will use WebSockets because this chat system requires bidirectional communication with low latency. SSE would not work because the client also needs to send messages in real-time.'
2. When designing a WebSocket-based system, immediately address the scaling challenge: 'Since WebSocket connections are stateful, I need a message broker (Redis Pub/Sub) for cross-server message delivery, and I will use sticky sessions at the load balancer level.'
3. Mention the fallback strategy. 'If WebSockets are blocked by a corporate firewall, the client falls back to long polling.' This shows you think about real-world deployment constraints.
4. For notification systems, SSE is often the better choice over WebSockets. Notifications flow server-to-client, and SSE has built-in reconnection with event replay. Mentioning this shows nuanced understanding.
5. Know the connection limits: a single server can handle ~100K-1M WebSocket connections (depending on memory and message frequency). If the interviewer asks about 10M concurrent users, you need multiple gateway servers with a pub/sub layer.
6. Always handle the offline case: 'Messages sent to offline users are stored in a message queue. When the user reconnects, missed messages are replayed from the queue.'
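The connection-limit estimate in tip 5 is easy to work through on a whiteboard. A back-of-the-envelope sketch follows; the function name and the example inputs (500K connections per server, 10KB per connection) are assumptions picked from within the ranges quoted in this lesson.

```javascript
// Back-of-the-envelope sizing; every input is an assumption to state aloud.
function gatewayCapacity({ concurrentUsers, connectionsPerServer, kbPerConnection }) {
  const servers = Math.ceil(concurrentUsers / connectionsPerServer);
  // Decimal units: 1 GB = 1,000,000 KB.
  const gbPerServer = (connectionsPerServer * kbPerConnection) / 1_000_000;
  return { servers, gbPerServer };
}

const plan = gatewayCapacity({
  concurrentUsers: 10_000_000,
  connectionsPerServer: 500_000,
  kbPerConnection: 10,
});
console.log(plan); // { servers: 20, gbPerServer: 5 }
```

The result covers connection memory only; message throughput, headroom for failover, and reconnection storms usually push the real server count higher.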

Common Mistakes

Pitfalls to avoid in interviews

Using WebSockets for everything that needs 'real-time' updates

Many 'real-time' features only need server-to-client push (notifications, live feeds, progress updates). SSE is simpler, has built-in reconnection, and scales more easily than WebSockets. Reserve WebSockets for truly bidirectional use cases like chat and gaming.

Ignoring the statefulness of WebSocket connections when discussing scaling

WebSocket connections are stateful - each client is connected to a specific server. This means you cannot simply add more servers behind a stateless load balancer. You need sticky sessions, a connection registry, and a message broker for cross-server communication.

Forgetting to handle reconnection and missed messages

Connections drop. Networks fail. Servers restart. Your design must handle reconnection gracefully: implement exponential backoff with jitter, store messages for offline users, and use message IDs or timestamps to replay missed events.

Assuming the load balancer will 'just work' with WebSockets

Standard HTTP load balancers (layer 7) may not correctly handle the WebSocket upgrade handshake. You need either a layer 4 (TCP) load balancer or a layer 7 load balancer specifically configured for WebSocket support (e.g., AWS ALB supports WebSockets natively).