cracking the tech interview system design and algorithms in java 25

Design a Real-Time Messaging App

13 min read Chapter 5 of 75
Summary

Covers WebSocket gateway design, message fanout strategies, delivery receipt tracking with read/delivered states, end-to-end encryption with Signal Protocol concepts, and horizontal scaling with connection-aware routing.

Requirements

Functional Requirements

  1. One-to-one messaging — Users send text messages to each other in real time.
  2. Group chats — Support groups of up to 500 members with shared message history.
  3. Delivery receipts — Three-state tracking: sent, delivered, read.
  4. Media sharing — Images, videos, documents up to 100 MB per file.
  5. Online presence — Display whether a user is online or offline, with a “last seen” timestamp for offline users.

Non-Functional Requirements

Requirement | Target
Message delivery latency | < 200 ms (same region)
Concurrent connections | 500M+ persistent WebSocket connections
Message durability | No message loss (at-least-once delivery guarantee)
Encryption | End-to-end encrypted; server never sees plaintext
Availability | 99.99% uptime
Message ordering | Per-conversation total order

Capacity Estimation

Assume 1 billion registered users, 500 million daily active users (DAU), and an average of 50 messages sent per user per day.

Metric | Estimate
Daily messages | 500M × 50 = 25 billion messages/day
Messages per second | 25B / 86,400 s ≈ 290,000 msg/s (average)
Concurrent WebSocket connections | ~500M at peak
Average message size (text) | ~200 bytes
Daily text storage | 25B × 200 bytes ≈ 5 TB/day
Media messages (10% of total) | 2.5B messages × 500 KB avg ≈ 1.25 PB/day
Bandwidth (text only) | 290,000 msg/s × 200 bytes ≈ 58 MB/s average; group fan-out and protocol overhead multiply the outbound figure

Over 5 years, text storage alone reaches ~9 PB before replication. Media storage dominates costs and requires tiered storage with CDN offloading.
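
The arithmetic behind these estimates can be reproduced with a few lines of Java. This is a minimal sketch that uses only the inputs stated above; the class and method names are hypothetical, and decimal units (1 KB = 1,000 bytes) are assumed.

```java
// Back-of-envelope capacity arithmetic using the chapter's stated assumptions.
final class CapacityEstimate {
    static final long DAILY_ACTIVE_USERS = 500_000_000L;
    static final long MESSAGES_PER_USER_PER_DAY = 50;
    static final long TEXT_BYTES = 200;          // average text message size
    static final long MEDIA_BYTES = 500_000;     // 500 KB average, decimal units (assumed)
    static final long SECONDS_PER_DAY = 86_400;

    static long dailyMessages() { return DAILY_ACTIVE_USERS * MESSAGES_PER_USER_PER_DAY; }

    // Average rate; peak traffic will be higher than this.
    static long messagesPerSecond() { return dailyMessages() / SECONDS_PER_DAY; }

    static long dailyTextBytes() { return dailyMessages() * TEXT_BYTES; }

    // 10% of messages carry media.
    static long dailyMediaBytes() { return dailyMessages() / 10 * MEDIA_BYTES; }

    // Five years of text, before replication.
    static long fiveYearTextBytes() { return dailyTextBytes() * 365 * 5; }

    // Average text ingest rate in bytes per second, before any fan-out.
    static long avgTextBytesPerSecond() { return dailyTextBytes() / SECONDS_PER_DAY; }
}
```

Integer division truncates, so `messagesPerSecond()` comes out slightly under the rounded 290,000 figure in the table.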

High-Level Design

┌──────────┐       ┌──────────────────┐       ┌─────────────────┐
│  Client   │◄────►│  WebSocket       │◄────►│  Message         │
│  (Mobile/ │  WS  │  Gateway Cluster │  gRPC │  Service         │
│   Web)    │      │  (Connection Mgmt)│      │  (Routing/Fan-out)│
└──────────┘       └──────────────────┘       └────────┬────────┘

                          ┌─────────────────────────────┼──────────────────┐
                          │                             │                  │
                   ┌──────▼──────┐            ┌────────▼────────┐  ┌──────▼──────┐
                   │  Message     │            │  Presence        │  │  Push        │
                   │  Store       │            │  Service         │  │  Notification│
                   │  (Cassandra) │            │  (Redis Cluster) │  │  Service     │
                   └─────────────┘            └─────────────────┘  └─────────────┘

                   ┌─────────────┐            ┌─────────────────┐  ┌──────▼──────┐
                   │  Media       │            │  Connection      │  │  APNs/FCM   │
                   │  Service     │            │  Registry (Redis)│  │             │
                   │  + CDN       │            └─────────────────┘  └─────────────┘
                   └─────────────┘

Core components:

  • WebSocket Gateway — Maintains persistent connections. Horizontally scaled behind a Layer 4 load balancer. Each gateway server handles 50K–100K concurrent connections.
  • Message Service — Receives messages, persists them, and routes to the correct gateway holding the recipient’s connection.
  • Connection Registry — A Redis cluster mapping userId → gatewayServerId so the Message Service knows where to forward messages.
  • Presence Service — Tracks online/offline state and “last seen” using heartbeat signals from the WebSocket Gateway.
  • Push Notification Service — Delivers push notifications via APNs (iOS) and FCM (Android) when the recipient has no active WebSocket connection.
  • Media Service — Handles upload, thumbnail generation, and CDN distribution for images, videos, and documents.

Deep Dive

WebSocket Gateway

The WebSocket Gateway manages the full connection lifecycle: TLS handshake, authentication token validation, heartbeat monitoring, and graceful reconnection.

Connection lifecycle:

  1. Client opens a WebSocket connection and sends an auth token in the first frame.
  2. Gateway validates the token against the Auth Service, extracts the userId, and registers the mapping userId → gatewayId in the Connection Registry.
  3. Gateway starts a heartbeat loop — the client pings every 30 seconds; if three consecutive pings are missed, the server closes the connection and marks the user offline.
  4. On disconnect, the gateway removes the registry entry and publishes an offline event to the Presence Service.

Connection-aware routing with Redis Pub/Sub:

Sticky sessions at the load balancer level are fragile — they break on server restarts and create uneven load distribution. A better approach uses the Connection Registry for routing. When Message Service needs to deliver a message to User B:

  1. Look up userB → gateway-7 in the Connection Registry.
  2. Publish the message to a Redis Pub/Sub channel named gateway-7.
  3. Gateway-7 subscribes to its own channel, receives the message, and pushes it down User B’s WebSocket.

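The Message Service therefore needs only a registry lookup and a channel name: it holds no direct connections to gateways, and gateways can be added, removed, or restarted without reconfiguring it.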
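
The three routing steps can be sketched without a Redis dependency. In the example below, `InMemoryPubSub` is a hypothetical in-process stand-in for Redis Pub/Sub and `MessageRouter` is an illustrative name; a real deployment would replace both maps with Redis calls.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// In-memory stand-in for Redis Pub/Sub: one channel per gateway ID.
final class InMemoryPubSub {
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    void subscribe(String channel, Consumer<String> handler) {
        subscribers.computeIfAbsent(channel, c -> new CopyOnWriteArrayList<>()).add(handler);
    }

    // Returns the number of subscribers that received the payload.
    int publish(String channel, String payload) {
        List<Consumer<String>> handlers = subscribers.getOrDefault(channel, List.of());
        handlers.forEach(h -> h.accept(payload));
        return handlers.size();
    }
}

// Routes by registry lookup instead of load-balancer stickiness.
final class MessageRouter {
    // userId -> gatewayId, standing in for the Connection Registry
    private final Map<String, String> connectionRegistry = new ConcurrentHashMap<>();
    private final InMemoryPubSub pubSub;

    MessageRouter(InMemoryPubSub pubSub) { this.pubSub = pubSub; }

    void register(String userId, String gatewayId) { connectionRegistry.put(userId, gatewayId); }

    void deregister(String userId) { connectionRegistry.remove(userId); }

    // True if the message was handed to a gateway; false means the caller takes the offline path.
    boolean deliver(String recipientId, String payload) {
        String gatewayId = connectionRegistry.get(recipientId);   // step 1: registry lookup
        if (gatewayId == null) {
            return false;
        }
        // Step 2: publish to the gateway's channel; step 3 happens in the gateway's subscriber.
        return pubSub.publish(gatewayId, recipientId + ":" + payload) > 0;
    }
}
```

A gateway would call `subscribe("gateway-7", handler)` at startup, and its handler would look up the local WebSocket session for the user ID carried in the payload.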

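// WebSocket handler using Java 25 virtual threads.
// WebSocketHandler, WebSocketSession, and CloseReason stand for the server framework's types;
// authenticate(), gatewayId(), and receivedPingSince() are helper methods not shown here.
import java.time.Duration;

public class ChatWebSocketHandler implements WebSocketHandler {

    private final ConnectionRegistry registry;
    private final MessageService messageService;
    private final PresenceService presenceService;

    public ChatWebSocketHandler(ConnectionRegistry registry,
                                MessageService messageService,
                                PresenceService presenceService) {
        this.registry = registry;
        this.messageService = messageService;
        this.presenceService = presenceService;
    }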
    @Override
    public void onOpen(WebSocketSession session) {
        Thread.startVirtualThread(() -> {
            String userId = authenticate(session);
            registry.register(userId, gatewayId(), session);
            presenceService.markOnline(userId);
            startHeartbeatMonitor(userId, session);
        });
    }

    @Override
    public void onMessage(WebSocketSession session, String payload) {
        Thread.startVirtualThread(() -> {
            String userId = registry.getUserId(session);
            ChatMessage message = ChatMessage.parse(payload);
            messageService.process(userId, message);
        });
    }

    @Override
    public void onClose(WebSocketSession session, CloseReason reason) {
        Thread.startVirtualThread(() -> {
            String userId = registry.getUserId(session);
            registry.deregister(userId);
            presenceService.markOffline(userId);
        });
    }

    private void startHeartbeatMonitor(String userId, WebSocketSession session) {
        int missedPings = 0;
        while (session.isOpen()) {
            try {
                Thread.sleep(Duration.ofSeconds(30));
                if (!receivedPingSince(session, Duration.ofSeconds(30))) {
                    missedPings++;
                    if (missedPings >= 3) {
                        session.close(CloseReason.GOING_AWAY);
                        return;
                    }
                } else {
                    missedPings = 0;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}

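Virtual threads fit this workload: each connection gets its own lightweight thread for heartbeat monitoring without tying up an OS thread while it sleeps. A gateway handling 100K connections runs 100K virtual threads, whose stacks live on the heap and typically take kilobytes each rather than the roughly 1 MB reserved per platform thread by default.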

Message Flow

Send path for 1:1 messaging:

  1. Sender’s client encrypts the message and sends it via WebSocket to their connected gateway.
  2. Gateway forwards the message to the Message Service via gRPC.
  3. Message Service generates a Snowflake-style message ID (timestamp-embedded for ordering), persists the message in the Message Store, and returns a SENT ack to the sender.
  4. Message Service looks up the recipient in the Connection Registry.
  5. If online: forward to the recipient’s gateway → push via WebSocket → recipient client acks DELIVERED.
  6. If offline: enqueue in the offline message queue and trigger a push notification.

Snowflake-style message IDs embed a millisecond timestamp in the high bits, followed by a machine ID and a sequence counter. This guarantees uniqueness without coordination between servers and gives rough time ordering across partitions. Because server clocks drift, ordering between IDs issued by different machines is only approximate.
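
A generator for such IDs might look like the sketch below. The 41/10/12 bit split, the custom epoch, and the class name are assumptions for illustration; the text fixes only that the timestamp sits in the high bits above a machine ID and a sequence counter.

```java
// Sketch of a Snowflake-style generator: 41-bit timestamp | 10-bit machine ID | 12-bit sequence.
final class SnowflakeIdGenerator {
    static final long CUSTOM_EPOCH_MS = 1_704_067_200_000L; // assumed epoch: 2024-01-01T00:00:00Z
    private static final int MACHINE_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long machineId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long machineId) {
        if (machineId < 0 || machineId >= (1L << MACHINE_BITS)) {
            throw new IllegalArgumentException("machineId must fit in 10 bits");
        }
        this.machineId = machineId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Clock moved backwards: reuse the last timestamp so IDs stay monotonic.
            now = lastTimestamp;
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: borrow the next millisecond
                // instead of busy-waiting (a simplification chosen for this sketch).
                now = lastTimestamp + 1;
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - CUSTOM_EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (machineId << SEQUENCE_BITS)
                | sequence;
    }

    // Recovers the Unix-millisecond timestamp embedded in an ID.
    static long timestampMsOf(long id) {
        return (id >>> (MACHINE_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS;
    }

    // Recovers the machine ID embedded in an ID.
    static long machineIdOf(long id) {
        return (id >>> SEQUENCE_BITS) & ((1L << MACHINE_BITS) - 1);
    }
}
```

Within one generator the IDs are strictly increasing. Across generators they are ordered by timestamp first, so two messages created in the same millisecond on different machines sort by machine ID rather than by true send order.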

// Message model with sealed delivery status
public sealed interface DeliveryStatus
        permits DeliveryStatus.Sent, DeliveryStatus.Delivered, DeliveryStatus.Read {

    record Sent(Instant serverTimestamp) implements DeliveryStatus {}
    record Delivered(Instant deviceTimestamp) implements DeliveryStatus {}
    record Read(Instant readTimestamp) implements DeliveryStatus {}
}

public record ChatMessage(
    long messageId,           // Snowflake ID — encodes timestamp + machine + seq
    String conversationId,
    String senderId,
    String recipientId,
    byte[] encryptedPayload,  // E2E encrypted — server never decrypts
    MessageType type,
    DeliveryStatus status,
    Instant createdAt
) {
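    public enum MessageType { TEXT, IMAGE, VIDEO, DOCUMENT }

    // Custom epoch in Unix milliseconds. The value is an assumption for illustration
    // (2024-01-01T00:00:00Z); it must match the epoch used by the ID generator.
    private static final long EPOCH_OFFSET = 1_704_067_200_000L;

    public Instant extractTimestamp() {
        // The 41-bit timestamp sits above the low 22 bits (machine ID + sequence)
        long timestampMs = (messageId >>> 22) + EPOCH_OFFSET;
        return Instant.ofEpochMilli(timestampMs);
    }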
}

Group messaging fanout:

For a group with N members, the sender sends one message to the server, and the Message Service delivers it to the other N-1 members. For small groups (< 50 members), fan-out-on-write works well: the server writes a copy into each member’s inbox queue. For larger groups (50–500 members), a hybrid approach writes the message once to the group’s message log and notifies online members immediately, while offline members pull from the group log on reconnect.
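
The choice between the two paths can be sketched as follows. The in-memory maps stand in for the inbox queues and the group log, and all names here are hypothetical; only the 50-member threshold comes from the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GroupFanout {
    static final int SMALL_GROUP_LIMIT = 50; // threshold taken from the text

    final Map<String, List<String>> inboxes = new HashMap<>();   // userId -> queued messages
    final Map<String, List<String>> groupLogs = new HashMap<>(); // groupId -> shared message log
    final List<String> notifications = new ArrayList<>();        // "userId:groupId" pings sent

    void send(String groupId, String senderId, List<String> members,
              Set<String> onlineUsers, String message) {
        if (members.size() < SMALL_GROUP_LIMIT) {
            // Fan-out on write: one inbox copy per recipient.
            for (String member : members) {
                if (!member.equals(senderId)) {
                    inboxes.computeIfAbsent(member, m -> new ArrayList<>()).add(message);
                }
            }
            return;
        }
        // Hybrid: a single write to the group log, plus a ping to online members only.
        groupLogs.computeIfAbsent(groupId, g -> new ArrayList<>()).add(message);
        for (String member : members) {
            if (!member.equals(senderId) && onlineUsers.contains(member)) {
                notifications.add(member + ":" + groupId);
            }
        }
    }

    // Offline members call this on reconnect with the number of log entries they already hold.
    List<String> pullSince(String groupId, int alreadySeen) {
        List<String> log = groupLogs.getOrDefault(groupId, List.of());
        return List.copyOf(log.subList(Math.min(alreadySeen, log.size()), log.size()));
    }
}
```

The write cost is what differs: the small-group path performs N-1 inbox writes per message, while the hybrid path performs one log write regardless of group size.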

Delivery Receipts

The receipt state machine tracks each message through three states:

SENT ──────► DELIVERED ──────► READ
(server ack)  (device ack)     (UI opened)

The protocol works as follows:

  1. SENT — When the Message Service persists the message, it acks back to the sender’s client. The sender’s UI shows a single checkmark.
  2. DELIVERED — When the recipient’s device receives the message via WebSocket, the client sends a DELIVERED ack back through the gateway to the Message Service. The service updates the message status and notifies the sender’s client. The sender’s UI shows a double checkmark.
  3. READ — When the recipient opens the conversation containing the message, the client sends a READ ack. The sender’s UI shows blue checkmarks.

// Receipt state machine with transition validation
import java.util.Map;
import java.util.Set;

public class ReceiptStateMachine {

    private static final Map<Class<? extends DeliveryStatus>, Set<Class<? extends DeliveryStatus>>>
        VALID_TRANSITIONS = Map.of(
            DeliveryStatus.Sent.class, Set.of(DeliveryStatus.Delivered.class),
            DeliveryStatus.Delivered.class, Set.of(DeliveryStatus.Read.class),
            DeliveryStatus.Read.class, Set.of()  // terminal state
        );

    public DeliveryStatus transition(DeliveryStatus current, DeliveryStatus next) {
        Set<Class<? extends DeliveryStatus>> allowed =
            VALID_TRANSITIONS.getOrDefault(current.getClass(), Set.of());

        if (!allowed.contains(next.getClass())) {
            throw new IllegalStateException(
                "Invalid transition: %s → %s".formatted(
                    current.getClass().getSimpleName(),
                    next.getClass().getSimpleName()
                )
            );
        }
        return next;
    }
}

For group messages, delivery receipts aggregate per member. The sender sees “delivered to 47 of 50 members” rather than individual receipts for each member.
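
One way to aggregate group receipts is to keep the furthest state reached by each member and count from there. The sketch below is an assumption-laden illustration, not the chapter's implementation: `GroupReceiptTracker` and its `State` enum are hypothetical names, and unlike the strict state machine above it accepts duplicate or out-of-order acks, which an at-least-once delivery guarantee will produce.

```java
import java.util.HashMap;
import java.util.Map;

final class GroupReceiptTracker {
    // Declared in progress order, so compareTo() tells which state is further along.
    enum State { SENT, DELIVERED, READ }

    private final Map<String, State> perMember = new HashMap<>();

    GroupReceiptTracker(Iterable<String> recipients) {
        for (String recipient : recipients) {
            perMember.put(recipient, State.SENT);
        }
    }

    // Idempotent and monotonic: duplicate or late acks never move a member backwards.
    void onAck(String memberId, State acked) {
        perMember.computeIfPresent(memberId,
                (id, current) -> acked.compareTo(current) > 0 ? acked : current);
    }

    // Number of members whose state is at least the given one.
    long countAtLeast(State state) {
        return perMember.values().stream().filter(s -> s.compareTo(state) >= 0).count();
    }

    String summary() {
        return "delivered to %d of %d members"
                .formatted(countAtLeast(State.DELIVERED), perMember.size());
    }
}
```

Because a READ ack implies delivery, members in the READ state count toward the delivered total as well.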

Offline Message Handling

When a recipient has no active connection:

  1. The Message Service writes the message to a per-user offline queue in the Message Store, keyed by (recipientId, messageId). Because message IDs embed timestamps, the queue maintains insertion order naturally.
  2. A push notification is sent via APNs or FCM with a lightweight payload (sender name + “New message” — no content, since the server holds only encrypted payloads).
  3. When the client reconnects, it sends a sync request with the ID of its last received message.
  4. The Message Service streams all messages with IDs greater than the client’s last-seen ID, in order.
  5. As the client acks each delivered message, the offline queue entries are removed.
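
The queue, sync, and ack steps above can be sketched with a sorted map per recipient. This is an in-memory illustration only: `OfflineQueue` is a hypothetical name, and the skip-list map stands in for a Message Store partition clustered by message ID.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

final class OfflineQueue {
    // recipientId -> (messageId -> encrypted payload), kept sorted by messageId
    private final Map<String, ConcurrentSkipListMap<Long, byte[]>> queues = new ConcurrentHashMap<>();

    void enqueue(String recipientId, long messageId, byte[] encryptedPayload) {
        queues.computeIfAbsent(recipientId, r -> new ConcurrentSkipListMap<>())
              .put(messageId, encryptedPayload);
    }

    // IDs strictly greater than lastSeenId, in ascending order (steps 3 and 4).
    List<Long> sync(String recipientId, long lastSeenId) {
        ConcurrentSkipListMap<Long, byte[]> queue = queues.get(recipientId);
        return queue == null
                ? List.of()
                : List.copyOf(queue.tailMap(lastSeenId, false).keySet());
    }

    // Removes an entry once the client has acknowledged it (step 5).
    void ack(String recipientId, long messageId) {
        ConcurrentSkipListMap<Long, byte[]> queue = queues.get(recipientId);
        if (queue != null) {
            queue.remove(messageId);
        }
    }
}
```

Since entries are removed only on ack, a client that disconnects mid-sync receives the unacknowledged messages again on the next sync, so the client should deduplicate by message ID.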

Messages expire after 30 days (configurable per deployment). The TTL is enforced at the storage layer — Cassandra’s built-in TTL feature handles this efficiently without additional cleanup jobs.

End-to-End Encryption

The encryption architecture follows the Signal Protocol’s core concepts:

Key exchange (X3DH — Extended Triple Diffie-Hellman):

Each user generates three kinds of keys at registration time:

  • Identity key pair — Long-term; it stays the same until the user reinstalls or sets up a new device without transferring it.
  • Signed pre-key pair — Medium-term, rotated weekly.
  • One-time pre-key bundle — Ephemeral keys uploaded in batches. Each key is used exactly once.

When Alice wants to message Bob for the first time, she fetches Bob’s pre-key bundle from the server and performs X3DH to derive a shared secret. This shared secret initializes a Double Ratchet session.
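
The shared-secret derivation can be sketched with the JDK's X25519 support. This is a simplified illustration under stated assumptions: it omits verification of the signed pre-key's signature, uses a plain SHA-256 hash where the protocol specifies HKDF, and all class and method names are hypothetical. It should not be used as a real implementation.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.KeyAgreement;

final class X3dhSketch {
    static KeyPair newKeyPair() {
        try {
            return KeyPairGenerator.getInstance("X25519").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // One Diffie-Hellman computation between a private key and a peer's public key.
    static byte[] dh(PrivateKey mine, PublicKey theirs) {
        try {
            KeyAgreement agreement = KeyAgreement.getInstance("X25519");
            agreement.init(mine);
            agreement.doPhase(theirs, true);
            return agreement.generateSecret();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in KDF: SHA-256 over the concatenated DH outputs (the real protocol uses HKDF).
    static byte[] kdf(byte[]... parts) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            for (byte[] part : parts) {
                sha256.update(part);
            }
            return sha256.digest();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Alice: her identity key, a fresh ephemeral key, and Bob's public pre-key bundle.
    static byte[] initiator(KeyPair identityA, KeyPair ephemeralA,
                            PublicKey identityB, PublicKey signedPreKeyB, PublicKey oneTimePreKeyB) {
        return kdf(dh(identityA.getPrivate(), signedPreKeyB),    // DH1 = DH(IK_A, SPK_B)
                   dh(ephemeralA.getPrivate(), identityB),       // DH2 = DH(EK_A, IK_B)
                   dh(ephemeralA.getPrivate(), signedPreKeyB),   // DH3 = DH(EK_A, SPK_B)
                   dh(ephemeralA.getPrivate(), oneTimePreKeyB)); // DH4 = DH(EK_A, OPK_B)
    }

    // Bob: mirrors the same four computations using his private keys.
    static byte[] responder(KeyPair identityB, KeyPair signedPreKeyB, KeyPair oneTimePreKeyB,
                            PublicKey identityA, PublicKey ephemeralA) {
        return kdf(dh(signedPreKeyB.getPrivate(), identityA),
                   dh(identityB.getPrivate(), ephemeralA),
                   dh(signedPreKeyB.getPrivate(), ephemeralA),
                   dh(oneTimePreKeyB.getPrivate(), ephemeralA));
    }
}
```

Both sides arrive at the same 32-byte secret without the server ever seeing a private key, which is why Bob can be offline while Alice starts the session.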

Per-message encryption:

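The Double Ratchet algorithm derives a fresh symmetric key for every message from a KDF chain built on HMAC and HKDF, and mixes in a new Diffie-Hellman exchange each time the conversation changes direction. Compromising a single message key exposes only that message. The one-way KDF chain protects earlier messages (forward secrecy), and the Diffie-Hellman ratchet restores secrecy for later messages after a compromise (post-compromise security).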
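
The symmetric half of the ratchet is small enough to sketch. The example below advances a chain key with HMAC-SHA256, using the single-byte inputs 0x01 (message key) and 0x02 (next chain key); the class name is hypothetical, and the Diffie-Hellman half of the Double Ratchet is deliberately left out.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class SymmetricRatchet {
    private byte[] chainKey;

    SymmetricRatchet(byte[] initialChainKey) {
        this.chainKey = initialChainKey.clone();
    }

    private static byte[] hmac(byte[] key, byte input) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(new byte[] { input });
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the key for the next message and advances the chain by one step.
    byte[] nextMessageKey() {
        byte[] messageKey = hmac(chainKey, (byte) 0x01);
        // The old chain key is overwritten; HMAC is one-way, so it cannot be recomputed
        // from the new chain key or from any message key.
        chainKey = hmac(chainKey, (byte) 0x02);
        return messageKey;
    }
}
```

Sender and receiver start from the same chain key, for example one derived from the X3DH secret, and step their chains in lockstep so that each message key matches on both sides.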

Server’s role:

The server stores and relays only encrypted payloads. It holds pre-key bundles for key exchange but never possesses the private keys needed to decrypt message content. The server does see metadata: who is messaging whom, message timestamps, and message sizes.

Trust model:

Users verify each other’s identity keys out-of-band (QR code scanning, safety number comparison). If a user’s identity key changes (new device, re-installation), contacts receive a warning that the encryption keys have changed.

Group Messaging

Fan-out strategy:

Strategy | Pros | Cons | Best for
Fan-out on write | Fast reads; each user reads from their inbox | High write amplification for large groups | Small groups (< 50)
Fan-out on read | Single write per message | Slower reads; N reads per recipient | Large groups, broadcast channels
Hybrid | Balanced | More complex | Medium groups (50–500)

Group encryption with Sender Keys:

Instead of encrypting a message N times (once per member), the Signal Protocol’s Sender Key scheme works as follows:

  1. Each group member generates a Sender Key and distributes it to all other members via pairwise encrypted channels.
  2. When sending a group message, the sender encrypts once with their Sender Key.
  3. All recipients who hold that Sender Key can decrypt.
  4. When a member leaves, every remaining member generates a new Sender Key and redistributes it to the others over the pairwise channels, so the departed member cannot decrypt future messages.
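
The encrypt-once property and the effect of rotation can be illustrated with a deliberately simplified sketch. It uses AES-256-GCM with a random symmetric key as a stand-in for a Sender Key; the actual scheme also ratchets the key per message and signs each ciphertext, both of which are omitted here. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class SenderKeySketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int NONCE_BYTES = 12;
    private static final int TAG_BITS = 128;

    static byte[] newSenderKey() {
        byte[] key = new byte[32];
        RANDOM.nextBytes(key);
        return key;
    }

    // Encrypts once for the whole group; the output is nonce || ciphertext-with-tag.
    static byte[] encrypt(byte[] senderKey, String plaintext) {
        try {
            byte[] nonce = new byte[NONCE_BYTES];
            RANDOM.nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(senderKey, "AES"),
                    new GCMParameterSpec(TAG_BITS, nonce));
            byte[] sealed = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[NONCE_BYTES + sealed.length];
            System.arraycopy(nonce, 0, out, 0, NONCE_BYTES);
            System.arraycopy(sealed, 0, out, NONCE_BYTES, sealed.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Any member holding the same Sender Key can decrypt; a wrong key fails the GCM tag check.
    static String decrypt(byte[] senderKey, byte[] message) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(senderKey, "AES"),
                    new GCMParameterSpec(TAG_BITS, message, 0, NONCE_BYTES));
            byte[] plain = cipher.doFinal(message, NONCE_BYTES, message.length - NONCE_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

After a rotation, a message encrypted under the new key fails authentication when a departed member tries the old key, which is the behavior step 4 relies on.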

Membership management:

Group metadata (members, roles, name, avatar) is stored server-side. Admin operations (add/remove member, promote admin) are serialized through a single leader partition to avoid conflicting concurrent edits.

Bottlenecks & Scaling

Bottleneck | Mitigation
WebSocket connection limits: A single server handles ~100K connections due to file descriptor and memory limits. | Horizontal scaling with a fleet of gateway servers behind a Layer 4 load balancer. The Connection Registry decouples routing from physical server placement.
Thundering herd on reconnection: Server restart or network blip causes all 100K clients to reconnect simultaneously. | Clients use exponential backoff with jitter: delay = min(base * 2^attempt + random(0, 1000ms), maxDelay). Gateway servers implement connection rate limiting during recovery.
Message Store hotspots: Popular group chats create write hotspots on a single partition. | Partition the Message Store by conversationId. For extremely hot groups, shard the group’s message log across multiple partitions using conversationId + bucketId.
Media upload throughput: Large file uploads compete with real-time messaging on the same connection. | Separate media uploads to a dedicated HTTP upload service. Use pre-signed URLs for direct-to-object-storage uploads, bypassing the application tier entirely.
Presence fanout: A user with 1,000 contacts going online triggers 1,000 notifications. | Batch presence updates. Deliver presence only to users who have the contact’s chat screen open, not to all contacts. Rate-limit presence updates to once per 30 seconds.
Redis Connection Registry size: 500M entries at ~200 bytes each ≈ 100 GB. | Shard across a Redis Cluster with 50+ nodes. Use hash-based slot distribution for even load. TTL entries at 2× heartbeat interval for automatic cleanup on ungraceful disconnects.
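
The reconnection backoff formula from the table translates directly into code. This is a minimal client-side sketch; the class name and the example parameter values are assumptions, and only the formula itself comes from the text.

```java
import java.util.concurrent.ThreadLocalRandom;

final class ReconnectBackoff {
    // delay = min(base * 2^attempt + random(0, jitterMs), maxDelay), all in milliseconds
    static long delayMs(long baseMs, int attempt, long jitterMs, long maxDelayMs) {
        int exponent = Math.min(Math.max(attempt, 0), 30);   // cap the exponent to avoid overflow
        long exponential = baseMs << exponent;                // base * 2^attempt
        long jitter = ThreadLocalRandom.current().nextLong(jitterMs + 1); // 0..jitterMs inclusive
        return Math.min(exponential + jitter, maxDelayMs);
    }
}
```

For example, `delayMs(500, 2, 1000, 60_000)` yields a value between 2,000 and 3,000 ms. Once the exponential term exceeds `maxDelayMs`, every client waits exactly the cap and the jitter is lost, so some implementations apply the jitter after capping instead.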

Interviewer Tips

Common follow-up questions interviewers ask:

  1. “How do you handle message ordering across multiple devices?” — Use per-device sequence numbers combined with Lamport timestamps. The server assigns a single authoritative sequence number per conversation, and every device orders messages by it to resolve conflicts.

  2. “What happens during a network partition between data centers?” — Messages are queued locally at each data center. On partition heal, conflict resolution uses last-writer-wins per message ID (Snowflake IDs prevent collisions). Delivery receipts are CRDT-mergeable (max timestamp wins).

  3. “How do you support message search if everything is E2E encrypted?” — Search happens client-side. The client maintains a local search index (SQLite FTS5) of decrypted messages. Server-side search is not possible without breaking encryption.

  4. “How would you implement typing indicators?” — Ephemeral presence events sent via WebSocket — not persisted, not queued, fire-and-forget. Rate-limited to one event per 3 seconds per conversation.

  5. “How do you migrate a user to a new device?” — The user scans a QR code from the old device to the new one, transferring the identity key. Message history can be transferred via an encrypted backup in cloud storage (Google Drive / iCloud), encrypted with a user-supplied passphrase — the server and cloud provider cannot decrypt it.

  6. “What about message editing and deletion?” — Editing sends a new message referencing the original message ID with an EDIT type. Deletion sends a tombstone message. Recipients process these commands client-side. Previously delivered messages remain on the recipient’s device unless “delete for everyone” is sent within a time window (e.g., 48 hours).
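
For the typing-indicator answer above, the rate limit of one event per 3 seconds per conversation could be enforced on the client with a small throttle. This is a hedged sketch: `TypingThrottle` is a hypothetical name, and the clock is passed in as a parameter so the behavior is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TypingThrottle {
    private final long minIntervalMs;
    // conversationId -> time (ms) at which the last typing event was sent
    private final Map<String, Long> lastSentMs = new ConcurrentHashMap<>();

    TypingThrottle(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    // True if a typing event may be sent now; suppressed events are dropped, not queued.
    boolean tryEmit(String conversationId, long nowMs) {
        boolean[] allowed = { false };
        lastSentMs.compute(conversationId, (id, last) -> {
            if (last == null || nowMs - last >= minIntervalMs) {
                allowed[0] = true;
                return nowMs;   // record this send
            }
            return last;        // too soon: keep the previous send time
        });
        return allowed[0];
    }
}
```

A client would call `tryEmit(conversationId, System.currentTimeMillis())` on each keystroke and send the WebSocket event only when it returns true.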