In modern distributed systems, horizontal scaling is the most reliable defense against the inevitable traffic spike. However, simply adding more servers to a cluster is not enough: without a mechanism to distribute incoming requests efficiently, you risk overwhelming specific nodes while others sit idle.
This is where the load balancer (LB) steps in. Acting as the traffic cop of your backend infrastructure, the load balancer is responsible for ensuring high availability and reliability. But an LB is only as good as the algorithm it uses to route traffic.

While there are dozens of exotic routing strategies available—from Random to IP-Hash—the architectural debate almost always centers on two contenders: the simplicity of the static Round Robin versus the intelligence of the dynamic Least Connections.
In this article, we will dissect both algorithms, analyze how they handle edge cases like long-lived connections, and cover the essential companion configurations—health checks and sticky sessions—that determine the success of your load balancing strategy.
The Static Approach: Round Robin Explained
Round Robin is the default standard for a reason. It is the "Hello World" of load balancing algorithms—easy to understand, easy to implement, and surprisingly effective for many standard use cases.
How It Works Under the Hood
Round Robin operates on a cyclical, list-based distribution method. The load balancer maintains a list of upstream servers and forwards requests sequentially. The first request goes to Server A, the second to Server B, the third to Server C, and the fourth loops back to Server A.
Crucially, this is a static algorithm. The load balancer requires absolutely no knowledge of the server's current state, CPU load, or memory usage. It simply moves down the list.
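In Nginx, for example, Round Robin is the default behavior: a bare upstream block with no algorithm directive cycles through its servers in order. A minimal sketch (the hostnames are illustrative):

```nginx
upstream backend_hosts {
    # No directive needed: Nginx cycles A -> B -> C -> A by default
    server app-01.internal;
    server app-02.internal;
    server app-03.internal;
}
```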
The Pros: Simplicity and Speed
Because the load balancer doesn't need to query the backend servers for their status or calculate active connection metrics, Round Robin has virtually zero computational overhead. It is essentially stateless routing. This makes it an excellent choice for high-throughput LBs where every millisecond of latency at the ingress point matters.
The Cons: The 'Uneven Load' Trap
The greatest weakness of Round Robin is its blindness to request complexity. It treats every request as equal, which is rarely true in production environments.
Consider a scenario with two endpoints:
- `GET /style.css` (static asset, takes 5ms)
- `GET /analytics/report` (heavy DB aggregation, takes 500ms)
If Server A coincidentally receives ten reporting requests in a row, while Server B receives ten CSS requests, the load balancer considers the distribution "fair" because both servers received ten requests. In reality, Server A is likely thrashing its CPU and timing out, while Server B is effectively idle. This leads to the "Uneven Load" trap, where aggregate throughput drops despite available capacity.
Variation: Weighted Round Robin
Production environments rarely have homogeneous hardware. You might have a legacy server with 8GB of RAM alongside a new instance with 32GB.
To address this, most load balancers support Weighted Round Robin. This allows you to assign a numerical weight to servers, dictating the ratio of traffic they receive.
Nginx Configuration Example:
```nginx
upstream backend_hosts {
    # This server gets 3x the traffic of the legacy server
    server app-01.internal weight=3;
    server app-02.internal weight=1;
}
```

The Dynamic Approach: Least Connections Explained
When your application logic varies significantly in processing time, static algorithms fail. This is where Least Connections (or Least Conn) shines by introducing state awareness to the routing decision.
How It Works: Counting Active States
Least Connections is a dynamic algorithm. The load balancer tracks the number of active, open connections between itself and each upstream server. When a new request arrives, the LB scans its internal table and forwards the packet to the server with the lowest number of current active connections.
If Server A is bogged down processing a heavy SQL query, its connection remains open. Server B, having finished its lightweight tasks quickly, will have a lower connection count and will receive the next incoming request.
The Pros: True Load Awareness
This algorithm is vastly superior for backends with varying service times, such as:
- Video transcoding services.
- APIs serving both simple JSON responses and complex file exports.
- Microservices behind a mesh.
By routing away from busy servers, Least Connections naturally smooths out the load, preventing the "convoy effect" where one slow request causes a backlog behind it.
The Cons: Complexity and 'Thundering Herd' Risks
There are two main trade-offs:
- Overhead: The LB must maintain state. In massive scale environments (millions of concurrent connections), the cost of counting connections and sorting priorities is non-zero, though usually negligible on modern hardware.
- The Cold Start Problem: If you add a fresh server to the pool (or restart one), it starts with zero connections. The Least Connections algorithm will immediately see this "winner" and route all new traffic to it until its connection count matches the peers. This can instantly overwhelm a cold server before its caches are warm.
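One common mitigation for this cold-start flood is to ramp a recovering server back up gradually instead of exposing it to full traffic at once. Nginx Plus (the commercial edition; not available in open-source Nginx) exposes this as a slow_start parameter. A sketch:

```nginx
upstream backend_hosts {
    least_conn;
    # Ramp this server's share of traffic up over 30s after it
    # rejoins the pool, protecting cold caches (Nginx Plus only)
    server app-01.internal slow_start=30s;
    server app-02.internal;
}
```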
Nginx Configuration Example:
```nginx
upstream backend_hosts {
    least_conn;  # Enables the algorithm
    server app-01.internal;
    server app-02.internal;
}
```

Critical Companion Configurations
Regardless of the algorithm you choose, two specific configurations are required to make your load balancer production-ready.
Health Checks: The Guard Rails
A load balancing algorithm is useless if it routes traffic to a dead server.
- Passive Health Checks: (Common in Nginx Open Source) The LB waits for a request to fail (e.g., a 502 Bad Gateway) before marking the server as unhealthy. This means some users will see errors.
- Active Health Checks: (Common in HAProxy / AWS ALB / Nginx Plus) The LB periodically pings a specific status endpoint (e.g., /healthz). If the server doesn't respond with a 200 OK, it is removed from rotation before user traffic is affected.
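In Nginx, both styles can be expressed in configuration. The passive parameters (max_fails, fail_timeout) work in the open-source edition; the active health_check directive and its required shared-memory zone are Nginx Plus features. The endpoint name and thresholds below are illustrative:

```nginx
upstream backend_hosts {
    zone backend_hosts 64k;  # shared memory zone, required for active checks
    # Passive: 3 failures within 30s mark the server down for 30s
    server app-01.internal max_fails=3 fail_timeout=30s;
    server app-02.internal max_fails=3 fail_timeout=30s;
}

server {
    location / {
        proxy_pass http://backend_hosts;
        # Active (Nginx Plus only): probe /healthz every 5s;
        # 2 consecutive failures remove a server, 2 passes restore it
        health_check uri=/healthz interval=5s fails=2 passes=2;
    }
}
```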
Sticky Sessions (Session Persistence)
Ideally, backend applications should be stateless. In reality, many legacy apps store session data (like shopping carts) in the server's local RAM rather than a distributed Redis cache.
If a user logs in on Server A, and the next request is routed to Server B via Round Robin, the user will appear logged out. To fix this, we use Sticky Sessions.
This is usually achieved via IP Hash (routing based on client IP) or Session Cookies. Note that enabling sticky sessions effectively overrides your load balancing algorithm, as traffic from a specific user is pinned to a specific node regardless of the node's current load.
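In open-source Nginx, the simplest form of stickiness is ip_hash; Nginx Plus adds a cookie-based sticky directive. A minimal sketch of the IP-based approach:

```nginx
upstream backend_hosts {
    ip_hash;  # hash the client IP so each user lands on the same server
    server app-01.internal;
    server app-02.internal;
}
```

Be aware that clients behind a shared NAT or corporate proxy all hash to the same server, which can skew load considerably.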
Decision Matrix: Which Algorithm Should You Choose?
Choosing between Round Robin and Least Connections is rarely about personal preference; it is about workload characteristics.
Use Round Robin (or Weighted Round Robin) when:
- Your servers have identical specifications.
- Your requests are stateless, short-lived, and predictable (e.g., high-volume static content, simple GET requests).
- You want the lowest possible latency overhead at the load balancer level.
Use Least Connections when:
- You handle long-lived connections (WebSockets, Server-Sent Events).
- Your requests have highly variable processing times (e.g., complex SQL-backed reports mixed with lightweight health-check pings).
- You are dealing with file uploads or media streaming.
Conclusion
For many developers starting out, Round Robin is the "set it and forget it" solution that works well enough for 80% of traffic patterns. However, as your application scales and the complexity of your requests diverges, moving to Least Connections becomes necessary to prevent resource exhaustion on specific nodes.
Remember, the algorithm is just one piece of the puzzle. Without active health checks to prune dead nodes and proper observability to visualize server response times, even the smartest algorithm cannot save your infrastructure.
Call to Action: Take a moment today to audit your Nginx upstream blocks or your AWS Target Group settings. Are you using the default Round Robin for a WebSocket service? A simple config change to least_conn might be the quick performance win your system needs.
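As a starting point, here is what that change might look like for a hypothetical WebSocket service (the upstream name, hostnames, and path are illustrative), including the standard Upgrade headers Nginx needs to proxy WebSockets:

```nginx
upstream ws_backend {
    least_conn;  # long-lived connections make connection counts meaningful
    server ws-01.internal;
    server ws-02.internal;
}

server {
    listen 80;
    location /socket {
        proxy_pass http://ws_backend;
        proxy_http_version 1.1;
        # Headers required for the WebSocket upgrade handshake
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```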