Scaling WebSockets: How to Handle 1 Million Concurrent Connections
The "C10K problem" is ancient history. Today, we talk about the C1M problem: handling 1 million concurrent connections on a single server.
If you are building a real-time application—whether it's a chat app, a live sports feed, or a collaborative editing tool—you will eventually hit a wall. Your server will crash, new connections will time out, and your latency will spike. And the culprit is rarely your application code. It's usually the OS configuration.
In this deep dive, we will walk through the exact steps to tune a Linux server to handle 1 million concurrent WebSocket connections. We will cover file descriptors, ephemeral ports, and the architecture required to scale beyond a single node.
The Bottleneck is Not Node.js (Usually)
Many developers assume that Node.js (or Python/Ruby) is too slow to handle millions of connections. While it's true that a single thread has limits, the first bottleneck you will hit is almost always the Operating System.
By default, Linux is configured for general-purpose computing, not for maintaining millions of persistent TCP connections.
1. The "Too Many Open Files" Error
In Linux, everything is a file. A socket is a file. A file on disk is a file. When a process opens a connection, it consumes a file descriptor (FD).
Check your current limit:
```shell
ulimit -n
# Output: 1024 (usually)
```
Once the process exhausts its descriptors (stdin, stdout, and stderr count too, so you hit the wall a few connections early), new sockets fail and the application crashes with EMFILE: too many open files.
The Fix:
You need to increase both the system-wide limit and the per-process limit.
Edit /etc/sysctl.conf:
```
fs.file-max = 2097152
```
Edit /etc/security/limits.conf:
```
* soft nofile 1048576
* hard nofile 1048576
root soft nofile 1048576
root hard nofile 1048576
```
Apply the sysctl change with sysctl -p; the limits.conf change takes effect the next time you log in. (For services managed by systemd, limits.conf is bypassed entirely: set LimitNOFILE in the unit file instead.) Now the OS allows enough file descriptors. But we are just getting started.
2. The Ephemeral Port Exhaustion Trap
This is the most common "silent killer" of high-scale WebSocket architectures.
When a client connects to a server, it uses a source IP and a source port. When your server connects to a backend database or another service, it becomes the client and uses a local port.
A TCP port is a 16-bit number, so there are at most 65,535 usable ports per source IP. If you have a load balancer or proxy in front of your WebSocket server, it can run out of source ports for its outgoing connections to the backends.
The TCP Tuple
A TCP connection is identified by a 4-tuple:
{Source IP, Source Port, Destination IP, Destination Port}
If your load balancer connects to your WebSocket server and both sit on fixed IPs, every connection shares the same {Source IP, Destination IP, Destination Port} triple, so you are limited by the number of available source ports (about 28k with the default Linux ephemeral range, roughly 64k after tuning).
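The back-of-the-envelope math is simple enough to sketch; the function name here is illustrative:

```javascript
// With a fixed {Source IP, Destination IP, Destination Port}, each extra
// connection needs its own source port, so the ephemeral range is the cap.
function maxConnsPerPair(rangeStart, rangeEnd) {
  return rangeEnd - rangeStart + 1;
}

console.log(maxConnsPerPair(32768, 60999)); // 28232 (default Linux range)
console.log(maxConnsPerPair(1024, 65535)); // 64512 (after widening the range)
```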
The Fix:
- Increase the local port range in /etc/sysctl.conf:

```
net.ipv4.ip_local_port_range = 1024 65535
```

- Enable TCP reuse:

```
net.ipv4.tcp_tw_reuse = 1
```

This lets the OS reuse ports stuck in the TIME_WAIT state for new outgoing connections instead of waiting out the full timeout.
3. Memory Usage: The Real Cost of a Socket
How much RAM does 1 million connections consume?
A kernel socket still costs a few kilobytes for its structures and TCP buffers, but at this scale, application memory is usually the bigger factor.
If you use Node.js, every Socket object takes up heap space.
- Empty Socket: ~2-4 KB
- With Metadata (User ID, Channel info): ~10 KB
Math Time:
1,000,000 connections * 10 KB = 10 GB of RAM.
This is feasible on a modern server, but you must be careful.
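A quick sanity check on that arithmetic (the helper name is illustrative):

```javascript
// Estimate heap usage for N connections at a given per-socket cost.
function estimatedHeapGB(connections, kbPerSocket) {
  return (connections * kbPerSocket) / (1024 * 1024); // KB -> GiB
}

console.log(estimatedHeapGB(1_000_000, 10)); // ~9.54, i.e. the "10 GB" above in binary units
```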
Optimization Tip:
Don't store the entire User object on the socket. Store only the userId and fetch details from Redis when needed.
```javascript
// BAD
socket.user = { id: 1, name: "Alice", email: "...", bio: "..." };

// GOOD
socket.userId = 1;
```
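Here is a minimal sketch of the "store only the ID" pattern end to end. A plain Map stands in for Redis; in production you would swap in a real client call (for example, hgetall with a library like ioredis), and all names here are illustrative.

```javascript
// Stand-in for Redis: userId -> user record.
const userStore = new Map([
  [1, { id: 1, name: "Alice", email: "alice@example.com" }],
]);

async function getUser(userId) {
  // Production equivalent: await redis.hgetall(`user:${userId}`)
  return userStore.get(userId);
}

// The socket carries only the ID...
const socket = { userId: 1 };

// ...and full details are fetched on demand.
getUser(socket.userId).then((user) => console.log(user.name)); // Alice
```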
4. The Event Loop Lag
Node.js is single-threaded. If you have 1 million connected users, and you try to broadcast a message to all of them, you will block the event loop.
```javascript
// DO NOT DO THIS
users.forEach(user => {
  user.socket.send("Hello!");
});
```
Sending 1 million packets takes time. If it takes 0.01ms per send, that's 10 seconds of blocking time. Your server will be unresponsive for 10 seconds.
The Fix: Batching and Workers
- Don't broadcast to everyone at once. Chunk your broadcasts.
- Use multiple processes. Use the Node.js cluster module or run multiple container instances.
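The chunking idea can be sketched in a few lines: split the socket list into batches and yield to the event loop between batches with setImmediate, so pending I/O and timers keep running during a large fan-out. Function names and the default batch size are illustrative.

```javascript
// Split an array into fixed-size batches.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Broadcast in batches, yielding between them; resolves when done.
function broadcastInBatches(sockets, message, batchSize = 1000) {
  return new Promise((resolve) => {
    const batches = chunk(sockets, batchSize);
    let i = 0;
    const next = () => {
      if (i === batches.length) return resolve();
      for (const socket of batches[i++]) socket.send(message);
      setImmediate(next); // give the event loop a breath between batches
    };
    next();
  });
}
```

Each batch still blocks briefly, but the pauses between batches let the server keep accepting connections and answering pings mid-broadcast.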
5. Architecture for Infinite Scale
A single server can handle 1M connections with tuning. But what about 10M? 100M?
You need a Distributed WebSocket Architecture.
The Layered Approach
- Load Balancer (Nginx/HAProxy): Terminates SSL, distributes connections to backend nodes.
- WebSocket Nodes (Node.js/Go): Hold the active connections. They are "stateful" in the sense that they hold the socket, but "stateless" regarding business logic.
- Pub/Sub Layer (Redis/NATS): The glue that holds it all together.
How Pub/Sub Works
When User A (connected to Server 1) wants to send a message to User B (connected to Server 2):
- User A sends a message to Server 1.
- Server 1 publishes the message to a Redis channel: publish('user:B', payload).
- Server 2 (and all other servers) are subscribed to Redis.
- Server 2 receives the message, checks if User B is connected locally.
- Server 2 sends the message to User B.
This architecture allows you to add servers horizontally without limits.
6. Testing Your Limits
Don't wait for production to fail. You need to test this.
Tools like Artillery or k6 are great, but for massive concurrency, you might need a fleet of client machines.
Tsung is an Erlang-based distributed load testing tool that is excellent for simulating millions of concurrent users.
Conclusion
Handling 1 million connections is a badge of honor for system engineers. It requires leaving the comfort zone of "npm install" and diving into Linux internals.
Summary Checklist:
- Increase ulimit (Open Files).
- Tune sysctl (Port range, TCP reuse).
- Optimize Application Memory (Store IDs, not Objects).
- Use a Pub/Sub backend (Redis) for horizontal scaling.
The next time your server crashes under load, don't just upgrade the instance size. Look at the kernel. The answer is usually there.