System Design Fundamentals: Load Balancing

A load balancer distributes incoming network traffic across multiple servers so no single server becomes a bottleneck. It's the entry point to virtually every scalable system.

What is a Load Balancer?

A load balancer sits in front of a pool of servers and routes each incoming request to one of them. Its goals are:

  1. Distribute traffic evenly to prevent hotspots
  2. Eliminate single points of failure — if a server dies, traffic routes elsewhere
  3. Enable horizontal scaling — add more servers without changing clients

Load Balancing Algorithms

Round Robin

Requests are sent to each server in turn: server 1, server 2, server 3, server 1, ...

  • Simple and predictable
  • Doesn't account for server capacity or current load
  • Best for: Stateless servers with similar specs and request costs
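The rotation described above can be sketched in a few lines of Python (the server names are placeholders):

```python
from itertools import cycle

# Round robin: hand each request to the next server in a fixed rotation.
servers = ["server1", "server2", "server3"]
rotation = cycle(servers)

def route() -> str:
    return next(rotation)

# Six requests walk through the pool twice, in order.
assignments = [route() for _ in range(6)]
```

Note that every server is treated identically, which is exactly why the algorithm struggles when machines or requests are uneven.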

Weighted Round Robin

Servers get a weight proportional to their capacity. A server with weight 3 receives three times the traffic of a server with weight 1.

  • Handles heterogeneous server pools
  • Still doesn't account for real-time load
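A minimal way to realize weights is to repeat each server in the rotation according to its weight; the weights below are illustrative:

```python
# Weighted round robin: expand each server into the rotation by its weight.
weights = {"big": 3, "small": 1}

def build_rotation(weights: dict[str, int]) -> list[str]:
    # A server with weight 3 appears 3 times per cycle.
    return [server for server, w in weights.items() for _ in range(w)]

rotation = build_rotation(weights)
# "big" receives 3 of every 4 requests in each cycle.
```

Production balancers (e.g. nginx's smooth weighted round robin) interleave the picks rather than sending bursts to one server, but the per-cycle proportions are the same.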

Least Connections

New requests go to the server with the fewest active connections.

  • Better than round robin when request duration varies significantly
  • Best for: Long-lived connections (WebSockets, streaming)
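A sketch of the bookkeeping involved, assuming the balancer can observe when connections open and close:

```python
# Least connections: track active connections per server and route new
# requests to the least-loaded one.
active = {"server1": 0, "server2": 0, "server3": 0}

def route() -> str:
    server = min(active, key=active.get)  # fewest active connections wins
    active[server] += 1
    return server

def finish(server: str) -> None:
    active[server] -= 1

route()              # all tied at 0, so server1 is picked
route()              # server1 is busy, so server2 is picked
finish("server1")    # server1 drops back to 0 connections
```

Because the count reflects requests still in flight, a server stuck on one slow request naturally stops receiving new work, which is what plain round robin cannot do.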

IP Hash

The client's IP address is hashed to consistently route that client to the same server.

  • Provides session persistence ("sticky sessions")
  • Problem: With naive modulo hashing, adding or removing a server remaps most clients to different servers; consistent hashing reduces this disruption
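The naive version hashes the client address modulo the pool size; a rough sketch:

```python
import hashlib

# IP hash: hash the client address to a server index so the same client
# always lands on the same server (while the pool is stable).
servers = ["server1", "server2", "server3"]

def route(client_ip: str) -> str:
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same IP always maps to the same server while the pool is unchanged;
# changing len(servers) changes the modulus and remaps most clients.
assert route("203.0.113.7") == route("203.0.113.7")
```

This is why consistent hashing exists: it keeps most client-to-server assignments stable when the pool changes.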

Least Response Time

Requests go to the server with the lowest combination of active connections and response time.

  • Most sophisticated of the common algorithms
  • Requires the load balancer to track response times
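One plausible scoring heuristic (the formula and numbers here are illustrative, not a specific vendor's algorithm) combines the two signals multiplicatively:

```python
# Least response time: score each server by its active connections and a
# moving average of observed response time; lowest score wins.
stats = {
    "server1": {"conns": 2, "avg_ms": 40.0},
    "server2": {"conns": 1, "avg_ms": 120.0},
    "server3": {"conns": 3, "avg_ms": 25.0},
}

def route() -> str:
    # Heuristic: (active connections + 1) * average response time.
    return min(stats, key=lambda s: (stats[s]["conns"] + 1) * stats[s]["avg_ms"])
```

With these numbers server3 wins (4 × 25 = 100) despite having the most connections, because it answers much faster.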

Layer 4 vs. Layer 7 Load Balancing

| Feature | Layer 4 (Transport) | Layer 7 (Application) |
|---|---|---|
| Operates on | TCP/UDP | HTTP/HTTPS |
| Sees request content | No | Yes |
| Routing flexibility | Limited | High (URL, headers, cookies) |
| Performance | Faster | Slightly slower |
| SSL termination | No | Yes |
| Example | AWS NLB | AWS ALB, Nginx, HAProxy |

L7 load balancers can route /api/* to API servers and /static/* to file servers, enabling sophisticated architectures.
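The core of such path-based routing is a longest-prefix match; sketched in Python rather than a real proxy config (the pool names are made up):

```python
# L7 routing: choose a backend pool by the longest matching path prefix.
routes = {
    "/api/": "api-servers",
    "/static/": "file-servers",
    "/": "web-servers",  # default pool
}

def pick_pool(path: str) -> str:
    match = max((prefix for prefix in routes if path.startswith(prefix)), key=len)
    return routes[match]
```

In practice this logic lives in the proxy's configuration (nginx `location` blocks, ALB listener rules) rather than application code, but the matching idea is the same.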

Health Checks

Load balancers continuously poll each backend server (e.g., GET /health every 5 seconds). If a server fails N checks in a row, it's removed from the pool. When it recovers, it's added back. This is the key mechanism for high availability.
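The fail-N-times-then-evict logic can be sketched as follows (the threshold of 3 is illustrative):

```python
# Health checks: evict a server after N consecutive failed probes and
# restore it on the next successful probe.
FAIL_THRESHOLD = 3

class Pool:
    def __init__(self, servers: list[str]):
        self.failures = {s: 0 for s in servers}
        self.healthy = set(servers)

    def record_check(self, server: str, ok: bool) -> None:
        if ok:
            self.failures[server] = 0
            self.healthy.add(server)   # recovered servers rejoin the pool
        else:
            self.failures[server] += 1
            if self.failures[server] >= FAIL_THRESHOLD:
                self.healthy.discard(server)

pool = Pool(["server1", "server2"])
for _ in range(3):
    pool.record_check("server2", ok=False)
# server2 is now out of rotation; one passing probe restores it.
```

Requiring N consecutive failures (rather than one) prevents a single dropped packet from flapping a healthy server out of the pool.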

Common Architectures

Active-Passive (Failover)

One load balancer is active; a standby takes over if it fails. Simple but wastes the passive node's capacity.

Active-Active

Multiple load balancers share traffic, often with DNS round-robin or anycast routing. Higher throughput and no wasted capacity.

Load Balancer as a SPOF

The load balancer itself can become a single point of failure. Solutions:

  • Run multiple load balancers with a floating IP (e.g., keepalived with VRRP)
  • Use a managed load balancer service (AWS ALB, GCP Load Balancer) with built-in redundancy
  • DNS-level load balancing as an outer layer

Interview Tips

  • Default to L7 load balancing for web services — the flexibility is usually worth it
  • Sticky sessions solve stateful session problems but hurt horizontal scaling. Push state to a shared cache (Redis) instead
  • Mention health checks when discussing high availability — they're the mechanism that makes failover actually work
  • For global systems, mention Global Load Balancing / GeoDNS to route users to the nearest region