Load balancing refers to the process of efficiently distributing incoming network traffic across a group of servers.
A load balancer is the component that performs this distribution across a cluster of servers, and it is typically used to improve the performance and availability of applications and websites. When designing a load balancer, there are several factors to take into account, such as traffic patterns, server capacity, and failover scenarios.
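To make the failover concern concrete, here is a minimal Python sketch of a balancer that only routes to servers whose most recent health check passed. The `HealthAwareBalancer` class, its `mark` method, and the server names are illustrative assumptions for this example, not any particular product's API.

```python
import random

class HealthAwareBalancer:
    """Routes requests only to servers currently believed to be healthy."""

    def __init__(self, servers):
        # servers: dict of server name -> health flag,
        # updated by periodic health checks
        self.servers = servers

    def mark(self, name, healthy):
        """Record the result of a health check for one server."""
        self.servers[name] = healthy

    def pick(self):
        """Choose a healthy server at random; fail loudly if none remain."""
        healthy = [name for name, ok in self.servers.items() if ok]
        if not healthy:
            raise RuntimeError("no healthy servers available")
        return random.choice(healthy)

balancer = HealthAwareBalancer({"app-1": True, "app-2": True, "app-3": True})
balancer.mark("app-2", False)   # health check failed: app-2 leaves the rotation
print(balancer.pick())          # returns app-1 or app-3
```

A failed server stays out of rotation until a later health check marks it healthy again, which is the basic failover behavior the design needs to account for.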
There are three main types of load balancers: hardware, virtual, and software. Hardware load balancers are physical appliances that sit between the client and the server. Virtual load balancers are deployed on a hypervisor, while software load balancers are installed on a server operating system.
When designing a load balancer, it's important to select the right type for your needs. For example, if you need high availability, then you'll want to choose a hardware or virtual load balancer over a software one. If cost is a major concern, then a software load balancer might be the best option.
There are four main algorithms used by load balancers: round robin, least connections, least time, and weighted round robin. Round robin is the most common: it simply cycles through the list of servers in order, starting over once it reaches the end. The least connections algorithm sends traffic to the server with the fewest active connections. The least time algorithm is similar to least connections but also factors in each server's response time, favoring the server that is currently responding fastest. Weighted round robin works like regular round robin but lets some servers receive more traffic than others based on their capacity or other factors.
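As a rough illustration of how these selection strategies differ, here is a minimal Python sketch of each. The `Server` class and its fields (`weight`, `active_connections`, `avg_response_ms`) are assumptions made for the example, not part of any real load balancer's interface.

```python
import itertools

class Server:
    def __init__(self, name, weight=1):
        self.name = name
        self.weight = weight              # relative capacity, used by weighted round robin
        self.active_connections = 0       # updated as requests start and finish
        self.avg_response_ms = 0.0        # rolling average response time, used by least time

class RoundRobin:
    """Cycle through the servers in order, wrapping around at the end."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class WeightedRoundRobin:
    """Like round robin, but higher-weight servers appear more often in the cycle."""
    def __init__(self, servers):
        expanded = [s for s in servers for _ in range(s.weight)]
        self._cycle = itertools.cycle(expanded)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Send the request to the server with the fewest active connections."""
    def __init__(self, servers):
        self.servers = servers

    def pick(self):
        return min(self.servers, key=lambda s: s.active_connections)

class LeastTime:
    """Prefer the fastest-responding server, using connection count as a tiebreaker."""
    def __init__(self, servers):
        self.servers = servers

    def pick(self):
        return min(self.servers, key=lambda s: (s.avg_response_ms, s.active_connections))
```

In practice the connection counts and response times would be updated by the balancer as requests begin and complete; the sketch only shows the selection step each algorithm performs.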
Load balancers are an essential part of any large-scale system design. They frequently come up during system design interviews, and it's important to know when to use them and when not to.