A load balancer distributes incoming network traffic across multiple servers so no single server becomes a bottleneck. It's the entry point to virtually every scalable system.
A load balancer sits in front of a pool of servers and routes each incoming request to one of them, with the goals of spreading load evenly, steering traffic away from failed servers, and improving overall availability. Common routing algorithms:

- **Round robin:** Requests are sent to each server in turn: server 1, server 2, server 3, server 1, ...
- **Weighted round robin:** Servers get a weight proportional to their capacity. A server with weight 3 receives three times the traffic of a server with weight 1.
- **Least connections:** New requests go to the server with the fewest active connections.
- **IP hash:** The client's IP address is hashed so that the same client is consistently routed to the same server.
- **Least response time:** Requests go to the server with the lowest combination of active connections and response time.
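The first three algorithms can be sketched in a few lines of Python (server names, weights, and connection counts are invented for illustration):

```python
from itertools import cycle

servers = ["s1", "s2", "s3"]  # hypothetical backend names

# Round robin: hand out servers in a fixed rotation.
rr = cycle(servers)
picks = [next(rr) for _ in range(4)]  # s1, s2, s3, s1

# Weighted round robin: a server with weight 3 simply appears
# three times in the rotation.
weights = {"s1": 3, "s2": 1}
wrr = cycle([s for s, w in weights.items() for _ in range(w)])
wpicks = [next(wrr) for _ in range(4)]  # s1, s1, s1, s2

# Least connections: route each new request to the server
# currently holding the fewest active connections.
active = {"s1": 5, "s2": 2, "s3": 7}
target = min(active, key=active.get)  # "s2"
```

A production balancer would update `active` as connections open and close; the selection rule itself is just this `min` over the pool.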
| Feature | Layer 4 (Transport) | Layer 7 (Application) |
|---|---|---|
| Operates on | TCP/UDP | HTTP/HTTPS |
| Sees request content | No | Yes |
| Routing flexibility | Limited | High (URL, headers, cookies) |
| Performance | Faster | Slightly slower |
| SSL termination | No | Yes |
| Example | AWS NLB | AWS ALB, Nginx, HAProxy |
L7 load balancers can route /api/* to API servers and /static/* to file servers, enabling sophisticated architectures.
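In Nginx, for example, that kind of path-based routing might look like this (upstream names, addresses, and ports are assumptions for illustration):

```nginx
# Two hypothetical backend pools.
upstream api_servers    { server 10.0.0.10:8080; server 10.0.0.11:8080; }
upstream static_servers { server 10.0.0.20:8080; }

server {
    listen 80;

    # Route by URL path: only possible at Layer 7, where the
    # balancer can see the HTTP request.
    location /api/    { proxy_pass http://api_servers; }
    location /static/ { proxy_pass http://static_servers; }
}
```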
Load balancers continuously poll each backend server (e.g., GET /health every 5 seconds). If a server fails N checks in a row, it's removed from the pool. When it recovers, it's added back. This is the key mechanism for high availability.
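The check-counting logic can be sketched as follows (the class name and threshold are illustrative, not from any particular product):

```python
class HealthTracker:
    """Track one backend: mark it down after `threshold`
    consecutive failed health checks, back up on the first
    successful check."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0
        self.healthy = True

    def record(self, check_passed: bool) -> None:
        if check_passed:
            self.failures = 0
            self.healthy = True       # recovered: rejoin the pool
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.healthy = False  # remove from the pool

tracker = HealthTracker(threshold=2)
tracker.record(False)   # 1 failure: still in the pool
tracker.record(False)   # 2 consecutive failures: marked down
assert not tracker.healthy
tracker.record(True)    # one success: back in the pool
assert tracker.healthy
```

Requiring N consecutive failures (rather than one) keeps a single dropped packet or slow response from ejecting a healthy server.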
**Active-passive:** One load balancer is active; a standby takes over if it fails. Simple, but the passive node's capacity sits idle.

**Active-active:** Multiple load balancers share traffic, often via DNS round-robin or anycast routing. Higher throughput and no wasted capacity.
The load balancer itself can become a single point of failure. The active-passive and active-active patterns above address this; common implementations include a floating virtual IP managed by VRRP (e.g., keepalived) and DNS-level failover.
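As a concrete example of active-passive failover, two load balancer nodes can share a floating virtual IP via keepalived's VRRP support. A minimal sketch, where the interface name and addresses are assumptions:

```
vrrp_instance VI_1 {
    state MASTER            # the standby node uses state BACKUP
    interface eth0          # assumed NIC name
    virtual_router_id 51
    priority 100            # standby is configured with a lower priority
    virtual_ipaddress {
        10.0.0.100          # clients connect to this floating VIP
    }
}
```

If the master stops sending VRRP advertisements, the backup claims the virtual IP and traffic fails over without a DNS change.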