A load balancer distributes incoming network traffic across multiple servers so no single server becomes a bottleneck. It's the entry point to virtually every scalable system.
A load balancer sits in front of a pool of servers and routes each incoming request to one of them, with the goals of spreading load evenly, steering traffic away from failed servers, and improving overall availability. Common routing algorithms:

- **Round robin:** Requests are sent to each server in turn: server 1, server 2, server 3, server 1, ...
- **Weighted round robin:** Servers get a weight proportional to their capacity. A server with weight 3 receives three times the traffic of a server with weight 1.
- **Least connections:** New requests go to the server with the fewest active connections.
- **IP hash:** The client's IP address is hashed so that the same client is consistently routed to the same server.
- **Least response time:** Requests go to the server with the lowest combination of active connections and response time.
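The first three algorithms can be sketched in a few lines of Python (server names, weights, and connection counts are invented for illustration):

```python
from itertools import cycle

servers = ["s1", "s2", "s3"]  # hypothetical backend names

# Round robin: hand out servers in a fixed rotation.
rr = cycle(servers)
picks = [next(rr) for _ in range(4)]  # s1, s2, s3, s1

# Weighted round robin: a server with weight 3 simply appears
# three times in the rotation.
weights = {"s1": 3, "s2": 1}
wrr = cycle([s for s, w in weights.items() for _ in range(w)])
wpicks = [next(wrr) for _ in range(4)]  # s1, s1, s1, s2

# Least connections: route each new request to the server
# currently holding the fewest active connections.
active = {"s1": 5, "s2": 2, "s3": 7}
target = min(active, key=active.get)  # "s2"
```

A production balancer would update `active` as connections open and close; the selection rule itself is just this `min` over the pool.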
| Feature | Layer 4 (Transport) | Layer 7 (Application) |
|---|---|---|
| Operates on | TCP/UDP | HTTP/HTTPS |
| Sees request content | No | Yes |
| Routing flexibility | Limited | High (URL, headers, cookies) |
| Performance | Faster | Slightly slower |
| SSL termination | No | Yes |
| Example | AWS NLB | AWS ALB, Nginx, HAProxy |
L7 load balancers can route /api/* to API servers and /static/* to file servers, enabling sophisticated architectures.
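In Nginx, for example, that kind of path-based routing might look like this (upstream names, addresses, and ports are assumptions for illustration):

```nginx
# Two hypothetical backend pools.
upstream api_servers    { server 10.0.0.10:8080; server 10.0.0.11:8080; }
upstream static_servers { server 10.0.0.20:8080; }

server {
    listen 80;

    # Route by URL path: only possible at Layer 7, where the
    # balancer can see the HTTP request.
    location /api/    { proxy_pass http://api_servers; }
    location /static/ { proxy_pass http://static_servers; }
}
```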
Load balancers continuously poll each backend server (e.g., GET /health every 5 seconds). If a server fails N checks in a row, it's removed from the pool. When it recovers, it's added back. This is the key mechanism for high availability.
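The check-counting logic can be sketched as follows (the class name and threshold are illustrative, not from any particular product):

```python
class HealthTracker:
    """Track one backend: mark it down after `threshold`
    consecutive failed health checks, back up on the first
    successful check."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0
        self.healthy = True

    def record(self, check_passed: bool) -> None:
        if check_passed:
            self.failures = 0
            self.healthy = True       # recovered: rejoin the pool
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.healthy = False  # remove from the pool

tracker = HealthTracker(threshold=2)
tracker.record(False)   # 1 failure: still in the pool
tracker.record(False)   # 2 consecutive failures: marked down
assert not tracker.healthy
tracker.record(True)    # one success: back in the pool
assert tracker.healthy
```

Requiring N consecutive failures (rather than one) keeps a single dropped packet or slow response from ejecting a healthy server.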
**Active-passive:** One load balancer is active; a standby takes over if it fails. Simple, but the passive node's capacity sits idle.

**Active-active:** Multiple load balancers share traffic, often via DNS round-robin or anycast routing. Higher throughput and no wasted capacity.
The load balancer itself can become a single point of failure. The active-passive and active-active patterns above address this; common implementations include a floating virtual IP managed by VRRP (e.g., keepalived) and DNS-level failover.
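As a concrete example of active-passive failover, two load balancer nodes can share a floating virtual IP via keepalived's VRRP support. A minimal sketch, where the interface name and addresses are assumptions:

```
vrrp_instance VI_1 {
    state MASTER            # the standby node uses state BACKUP
    interface eth0          # assumed NIC name
    virtual_router_id 51
    priority 100            # standby is configured with a lower priority
    virtual_ipaddress {
        10.0.0.100          # clients connect to this floating VIP
    }
}
```

If the master stops sending VRRP advertisements, the backup claims the virtual IP and traffic fails over without a DNS change.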