Design a Live Notification System

1. What is a Live Notification System?

In the early 2000s, checking for new emails meant clicking “Refresh” every 5 minutes. Today, users expect Real-Time feedback. When you order an Uber, you see the car move. When you get a DM on Slack, it pops up instantly.

This shift from Pull (Client asks Server) to Push (Server tells Client) fundamentally changes how we architect systems. We move from stateless HTTP requests to stateful, persistent TCP connections.

[!TIP] Use Cases:

  • Social Media: “Jane liked your photo”.
  • Chat Apps: WhatsApp/Slack messages.
  • FinTech: Stock price alerts.
  • Gaming: Live score updates.

2. Requirements & Goals

2.1 Functional Requirements

  1. Real-Time Delivery: Users receive notifications immediately (< 1s latency).
  2. Offline Handling: If a user is offline, store the notification and deliver it when they reconnect (Inbox).
  3. Fan-out: Ability to send one message to 1 million followers (e.g., “Celebrity tweeted”) efficiently.
  4. Cross-Platform: Support Web (WebSocket), iOS (APNS), Android (FCM), Email, and SMS.

2.2 Non-Functional Requirements

  1. Scalability: Support 10 Million concurrent open connections.
  2. Battery Efficiency: Do not drain mobile battery with constant polling.
  3. Reliability: At-least-once delivery. Never lose a notification.
  4. Ordering: Messages should appear in the order they were sent.

2.3 Extended Requirements

  1. Opt-Out: Users must be able to unsubscribe (GDPR/CCPA).
  2. Rate Limiting: Prevent a single service from spamming a user (e.g., max 10 alerts/hour).

3. Capacity Estimation

3.1 Traffic Analysis

  • DAU: 100 Million.
  • Peak Connections: Assuming 10% of users are online simultaneously -> 10 Million concurrent connections.
  • Notification Volume: 20 notifications/user/day -> 2 Billion/day.
    • Write QPS: 2B / 86,400 ≈ 23,000 msg/sec.

3.2 Memory Estimation (The Bottleneck)

In a standard HTTP app, memory is freed after the request. In WebSockets, memory is held for the duration of the session.

  • Per Connection Overhead: ~20KB (TCP stack + App context in Go/Node.js).
  • Total RAM: 10 Million * 20KB = 200 GB RAM.
  • Server Sizing: If each server has 64 GB of RAM, 200 / 64 ≈ 3.2, so ~4 servers could hold the connections alone. To leave headroom for CPU load and failover, we’ll plan for 20 servers.

4. System APIs

The client connects via WebSocket, but triggers come via REST.

| Method | Endpoint | Description |
| --- | --- | --- |
| WS | /ws/connect | Upgrades HTTP to WebSocket. Header: Authorization: Bearer <token> |
| POST | /v1/notify | Internal API to trigger a notification. Payload: { userId: "u1", type: "like", content: "..." } |
| POST | /v1/subscribe | Subscribes a user to a topic. Payload: { topic: "news:tech" } |
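
As a quick sketch, an upstream service might trigger a notification through the REST API above like this (the internal host name and service token are assumptions):

```typescript
// Illustrative trigger call against POST /v1/notify from the table above.
async function notify(userId: string, type: string, content: string): Promise<void> {
  const res = await fetch("https://notifications.internal/v1/notify", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.SERVICE_TOKEN}`, // service-to-service credential (assumed)
    },
    body: JSON.stringify({ userId, type, content }),
  });
  if (!res.ok) throw new Error(`notify failed: ${res.status}`);
}
```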

5. Database Design

We need a way to map Users to the specific WebSocket Server they are connected to.

5.1 Redis (Session Store)

  • Key: user:{id}:connection
  • Value: server_ip: 10.0.0.5
  • TTL: 60 seconds (Heartbeat refreshes this).
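
A minimal sketch of that heartbeat-refreshed session entry, assuming an ioredis client (any Redis client works; the address is illustrative):

```typescript
import Redis from "ioredis"; // assumed client library

const redis = new Redis({ host: "10.0.0.20", port: 6379 }); // illustrative address

// Called on connect and on every heartbeat (e.g., every 30s), so the key
// expires ~60 seconds after the last sign of life from the client.
async function registerConnection(userId: string, serverIp: string): Promise<void> {
  await redis.set(`user:${userId}:connection`, serverIp, "EX", 60);
}

// The Notification Engine asks: where is this user connected right now?
async function lookupConnection(userId: string): Promise<string | null> {
  return redis.get(`user:${userId}:connection`);
}
```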

5.2 Cassandra (Notification Inbox)

Stores history for offline retrieval.

```sql
CREATE TABLE notifications (
    user_id UUID,
    created_at TIMESTAMP,
    notification_id UUID,
    content TEXT,
    is_read BOOLEAN,
    PRIMARY KEY (user_id, created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);
```
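
A sketch of the offline-retrieval read path, assuming the DataStax Node.js driver and a notifications_ks keyspace (both assumptions):

```typescript
import { Client } from "cassandra-driver"; // assumed driver choice

const cassandra = new Client({
  contactPoints: ["10.0.1.10"],   // illustrative node address
  localDataCenter: "dc1",
  keyspace: "notifications_ks",   // assumed keyspace name
});

// Fetch the most recent notifications for a user who just reconnected.
// Rows come back newest-first thanks to the DESC clustering order.
async function fetchInbox(userId: string, limit = 50) {
  const query =
    "SELECT notification_id, content, created_at, is_read FROM notifications WHERE user_id = ? LIMIT ?";
  const result = await cassandra.execute(query, [userId, limit], { prepare: true });
  return result.rows;
}
```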

6. High-Level Architecture

We use a Hybrid Architecture:

  1. Stateless API: For triggering notifications.
  2. Stateful Gateway: For maintaining WebSocket connections.
[Architecture Diagram: Live Notification System. Mobile and Web clients hold persistent WebSockets to a WS Gateway Cluster (Server 1..N). The Notification Engine (routing & logic) consumes events from the MQ (Kafka), asks the Redis Cluster “where is the user?” (user -> server map), stores the inbox and preferences in Cassandra, pushes to online users over WebSocket, and falls back to APNS (Apple Push) / FCM (Google Push) when the user is offline.]

7. Component Design (Deep Dive)

7.1 Protocols: The Evolution

Short Polling (The Bad Way)

Client: setInterval(fetch, 2000).

  • Pros: Easy to code.
  • Cons: Wastes bandwidth (HTTP Header overhead). 99% of requests return “No new messages”.
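
A minimal client-side sketch (the endpoint and `since` parameter are illustrative):

```typescript
// Short polling: ask the server every 2 seconds, whether or not anything happened.
// Most responses come back empty, and every one pays full HTTP header overhead.
const POLL_INTERVAL_MS = 2000;
let lastSeenId = 0;

setInterval(async () => {
  const res = await fetch(`/v1/notifications?since=${lastSeenId}`); // illustrative endpoint
  const items: { id: number; content: string }[] = await res.json();
  if (items.length > 0) {
    lastSeenId = items[items.length - 1].id;
    console.log("new notifications:", items);
  }
}, POLL_INTERVAL_MS);
```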

Long Polling (The Better Way)

Client: “Any news?” -> Server: “Wait…” (Holds connection open for 60s) -> Server: “Yes!” or “Timeout”.

  • Pros: Real-time feel. Works over standard HTTP.
  • Cons: Server ties up a thread per user. Reconnecting adds latency.
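
A minimal client sketch, assuming the server parks the request for up to 60s and answers 204 on timeout (endpoint and status semantics are illustrative):

```typescript
// Long polling: the request waits on the server until there is news or a timeout,
// then the client immediately re-issues it.
async function longPoll(lastSeenId: number): Promise<void> {
  try {
    const res = await fetch(`/v1/notifications/poll?since=${lastSeenId}&timeout=60`);
    if (res.status === 200) {
      const items: { id: number; content: string }[] = await res.json();
      if (items.length > 0) lastSeenId = items[items.length - 1].id;
      console.log("notifications:", items);
    }
    // 204 (timeout, no news) falls through and we simply reconnect.
  } catch {
    await new Promise((r) => setTimeout(r, 1000)); // brief pause before retrying on error
  }
  void longPoll(lastSeenId);
}

void longPoll(0);
```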

WebSockets (The Gold Standard)

A persistent TCP connection upgraded from HTTP.

  • Pros: Full Duplex (Bidirectional). Low overhead (a 2-byte frame header vs ~800 bytes of HTTP headers).
  • Cons: Stateful. Hard to load balance.
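
A browser-side sketch against the /ws/connect endpoint from Section 4; passing the token as a query parameter is an assumption, since browsers cannot set custom headers on the WebSocket handshake:

```typescript
// One persistent, full-duplex connection for the lifetime of the session.
const authToken = localStorage.getItem("jwt") ?? ""; // assumed storage location for the JWT

const ws = new WebSocket(`wss://notifications.example.com/ws/connect?token=${authToken}`);

ws.onopen = () => console.log("connected");
ws.onmessage = (event) => {
  const notification = JSON.parse(event.data);
  console.log("push received:", notification);
};
ws.onclose = () => {
  // Reconnect with exponential backoff + jitter; see Section 9.1.
};
```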

Server-Sent Events (SSE) (The Specialist)

A one-way stream (text/event-stream).

  • Pros: Native browser support (EventSource), auto-reconnect. Great for News Feeds, Stock Tickers.
  • Cons: The client cannot send data back over the same stream; it must use separate HTTP requests.
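
A browser-side sketch using the native EventSource API (the stream URL is illustrative):

```typescript
// Server-Sent Events: a one-way text/event-stream the browser keeps open.
// EventSource reconnects automatically if the connection drops.
const source = new EventSource("/v1/stream/prices"); // illustrative URL

source.onmessage = (event) => {
  console.log("tick:", JSON.parse(event.data));
};
source.onerror = () => {
  // The browser retries on its own; log for visibility.
  console.warn("stream interrupted, browser will auto-reconnect");
};
```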

8. Data Partitioning & Sharding

This is the hardest part. In a stateless REST API, any server can handle any request. In WebSockets, Server A holds the connection for User 1. If the Notification Engine sends the message to Server B, User 1 will never get it.

8.1 Solution: Redis Pub/Sub (The Backplane)

Instead of knowing exactly where User 1 is, we use a Pub/Sub model.

  1. WS Server A: Subscribes to channel user:1.
  2. Notification Engine: Publishes message to channel user:1.
  3. Redis: Routes the message to Server A.
  4. Server A: Pushes to User 1.

Trade-off: Redis Pub/Sub is “Fire and Forget”. If Server A is down or not subscribed at that moment, the message is simply dropped. For higher reliability, use Redis Streams or Kafka.
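
A sketch of the backplane on both sides, assuming an ioredis client and the user:{id} channel naming from the steps above:

```typescript
import Redis from "ioredis"; // assumed client library

// --- WS Server A: one subscriber connection, one listener, a map of local sockets ---
const sub = new Redis();
const localSockets = new Map<string, { send: (data: string) => void }>();

sub.on("message", (channel, message) => {
  const userId = channel.replace("user:", "");
  localSockets.get(userId)?.send(message); // push down the open WebSocket, if the user is still here
});

async function onUserConnected(userId: string, socket: { send: (data: string) => void }) {
  localSockets.set(userId, socket);
  await sub.subscribe(`user:${userId}`);
}

// --- Notification Engine: publish without knowing which gateway holds the user ---
const pub = new Redis();
async function publishNotification(userId: string, payload: object): Promise<void> {
  // Fire-and-forget: if no gateway is subscribed, Redis drops the message.
  await pub.publish(`user:${userId}`, JSON.stringify(payload));
}
```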


9. Reliability, Caching, & Load Balancing

9.1 The “Thundering Herd” Problem

What happens if a Data Center fails and 1 Million users try to reconnect at the exact same second? The Server Melts. (CPU goes to 100%, Auth service crashes).

Solution: Jitter. When a client disconnects, do not reconnect immediately. Wait BackoffTime = Initial * 2^Retry + Random(0, 1000ms). The randomness spreads the load over time, allowing the server to recover.
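
A client-side sketch of that backoff formula:

```typescript
// Exponential backoff with jitter: BackoffTime = Initial * 2^Retry + Random(0, 1000ms).
const INITIAL_MS = 500;
const MAX_MS = 30_000;
let retry = 0;

function scheduleReconnect(connect: () => void): void {
  const backoff = Math.min(INITIAL_MS * 2 ** retry, MAX_MS);
  const jitter = Math.random() * 1000; // the randomness that spreads the herd out
  retry += 1;
  setTimeout(connect, backoff + jitter);
}

// On a successful reconnect, reset the counter so future drops start cheap again.
function onConnected(): void {
  retry = 0;
}
```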

9.2 Sticky Sessions

When using a Load Balancer (Nginx/HAProxy) for WebSockets, you MUST enable Sticky Sessions (Session Affinity) so that the handshake and subsequent frames go to the same physical server.
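
A sketch of the load-balancer side, assuming Nginx with ip_hash affinity (addresses and ports are illustrative; TLS termination omitted for brevity):

```nginx
upstream ws_gateways {
    ip_hash;                      # session affinity: same client IP -> same gateway
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;
}

server {
    listen 80;                    # terminate TLS (WSS) in front of this in production
    location /ws/ {
        proxy_pass http://ws_gateways;
        proxy_http_version 1.1;                  # required for the Upgrade handshake
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 3600s;                # keep idle sockets open
    }
}
```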


10. Interactive Decision Visualizer: Polling vs WebSocket

Experience the difference in efficiency.

  • Polling: Constant traffic, high overhead.
  • WebSocket: One connection, instant data push.

[Interactive demo: Protocol Efficiency Simulator, visualizing network traffic and latency for Short Polling (2s interval) versus a persistent WebSocket connection.]

11. System Walkthrough

Let’s trace a notification when User A likes User B’s photo.

Scenario: The “Like” Notification

  1. User A: Clicks “Like”, which triggers an HTTP POST /v1/notify to the API.
  2. Notification Engine:
    • Saves notification to Cassandra (for history).
    • Looks up User B’s active connection in Redis: GET user:B:connection.
    • Result: gateway-node-05.
  3. Routing: Engine sends internal RPC (gRPC) to gateway-node-05.
  4. WebSocket Push:
    • Gateway constructs the WebSocket Frame.
    • Opcode: 0x1 (Text Frame).
    • Payload: JSON {"type": "like", "from": "User A"}.
    • Pushes data to the open TCP socket file descriptor.
  5. Client: Phone receives data packet, wakes up app, shows banner.
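
A simplified engine-side sketch of steps 2 and 3; the plain internal HTTP call and the gateway's /internal/push endpoint stand in for the gRPC hop and are assumptions:

```typescript
import Redis from "ioredis"; // assumed client library

const redis = new Redis();

async function routeLike(fromUser: string, toUser: string): Promise<void> {
  const payload = { type: "like", from: fromUser };

  // Step 2: where is User B connected right now?
  const gateway = await redis.get(`user:${toUser}:connection`);

  if (gateway) {
    // Step 3: hand the frame to the gateway node that owns the open socket.
    await fetch(`http://${gateway}:8080/internal/push`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ userId: toUser, payload }),
    });
  } else {
    // Offline: rely on the Cassandra inbox and fall back to APNS/FCM push.
    console.log(`user ${toUser} offline, falling back to mobile push`);
  }
}
```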

12. Low-Level Optimizations

12.1 The C10M Problem (10 Million Connections)

Handling 10k connections is easy. 10M is hard.

  • Kernel Tuning: Linux default file descriptor limit is 1024. We must increase fs.file-max and ulimit -n to > 1 Million per server.
  • Ephemeral Ports: When a server connects to Redis/DB, it uses a local port. We run out of 65k ports quickly.
    • Solution: Enable net.ipv4.tcp_tw_reuse to reuse sockets in TIME_WAIT state.
  • Memory Optimization: In Go, each Goroutine takes ~2KB of stack, so 1M connections ≈ 2GB RAM. In a thread-per-connection Java model, each thread needs roughly 1MB of stack, so 1M connections would need ~1TB and the box falls over. Use Non-blocking I/O (Netty/Node.js/Go) to handle many connections on a few threads.

12.2 Message Batching

If a user receives 50 likes in 1 second, don’t send 50 frames.

  • Nagle’s Algorithm: TCP automatically buffers small packets.
  • Application Batching: Buffer notifications for 500ms and send [msg1, msg2, ...] in one frame to save CPU/Bandwidth.
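
A sketch of the application-level buffer (the 500ms window matches the text; the per-user in-memory map is illustrative):

```typescript
// Buffer notifications per user for up to 500ms, then flush them as one WebSocket frame.
const FLUSH_INTERVAL_MS = 500;
const buffers = new Map<string, object[]>();

function enqueue(
  userId: string,
  notification: object,
  socket: { send: (data: string) => void }
): void {
  const buf = buffers.get(userId) ?? [];
  buf.push(notification);
  buffers.set(userId, buf);

  if (buf.length === 1) {
    // First message of this window: schedule a single flush for everything that piles up.
    setTimeout(() => {
      socket.send(JSON.stringify(buffers.get(userId) ?? []));
      buffers.delete(userId);
    }, FLUSH_INTERVAL_MS);
  }
}
```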

13. Interview Gauntlet

Q1: How do you handle “Read Receipts” for 1 Million users?

  • Answer: Do not acknowledge every message immediately. Batch acknowledgments (e.g., “Read up to ID 500”) and send every 5 seconds.

Q2: What happens if a WebSocket Server crashes?

  • Answer: 500k clients lose connection. They all try to reconnect instantly (Thundering Herd). We must use Jitter (Random Backoff) on the client side to spread the reconnect spike over 30-60 seconds.

Q3: How do you secure WebSockets?

  • Answer: Use WSS (TLS). Authenticate during the HTTP Upgrade handshake using a Bearer Token (JWT). Do not send credentials inside the WebSocket frame.
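
A server-side sketch of that handshake check, assuming the ws and jsonwebtoken packages:

```typescript
import { createServer } from "http";
import { WebSocketServer } from "ws";   // assumed server library
import jwt from "jsonwebtoken";         // assumed JWT library

const wss = new WebSocketServer({ noServer: true });
const server = createServer();

// Authenticate during the HTTP Upgrade, before any WebSocket frames exist.
server.on("upgrade", (req, socket, head) => {
  const token = (req.headers.authorization ?? "").replace("Bearer ", "");
  try {
    const claims = jwt.verify(token, process.env.JWT_SECRET ?? "");
    wss.handleUpgrade(req, socket, head, (ws) => wss.emit("connection", ws, req, claims));
  } catch {
    socket.write("HTTP/1.1 401 Unauthorized\r\n\r\n");
    socket.destroy();
  }
});

server.listen(8080); // put TLS (WSS) in front of this, e.g., at the load balancer
```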

Q4: Why Redis Pub/Sub vs Kafka?

  • Answer: Redis Pub/Sub is ephemeral (Fire-and-forget). It’s fast for “Online” users. Kafka is durable log storage. Use Kafka if you need to replay messages or guarantee delivery to offline services.

Q5: Can we use a Load Balancer for WebSockets?

  • Answer: Yes, but it must support “Sticky Sessions” (Session Affinity) during the handshake phase, or use Layer 4 (TCP) balancing to keep the pipe open to the same backend.

14. Summary: The Whiteboard Strategy

1. Requirements

  • Func: Real-time, Offline, Fan-out.
  • Scale: 10M Concurrent Users.

2. Architecture

[Notification Engine]

[Redis Pub/Sub]
↙ ↘
[Gateway 1] [Gateway 2]

* Stateful: Gateway holds TCP.
* Stateless: Engine handles logic.

3. Data & API

GET /ws/connect -> Upgrade
Redis: SET user:1 gateway:10.0.0.5

4. Deep Dives

  • Reliability: Jitter for Thundering Herd.
  • Scaling: Redis Pub/Sub backplane.
  • Fallback: APNS/FCM if WebSocket closed.