Consistent Hashing | System Design Library

The Problem With Simple Hashing

With simple modulo hashing (shard = hash(key) % N), adding or removing one server causes nearly all keys to map to different servers — requiring a massive data migration.

If you go from N=3 to N=4 servers, about 75% of all cached keys need to be moved.

How Consistent Hashing Works

Imagine a circular ring (hash ring) with values from 0 to 2³²-1.

Map servers onto the ring: Hash each server's name/IP to a point on the ring
Map keys onto the ring: Hash each key to a point on the ring
Find the responsible server: Walk clockwise from the key's position; the first server you encounter is responsible for that key

        Node A (hash=10)
       /
  0 ——————————— Ring ——————————— 
       \                    Node B (hash=70)
        \ Key X (hash=50) →  Node B
         \                  /
          Node C (hash=90)

Adding or Removing Nodes

When a node is added:

It takes over some keys from its clockwise neighbor
Only the keys between the new node and its predecessor move
~1/N of total keys are redistributed (instead of ~100%)

When a node is removed:

Its keys are transferred to its clockwise successor
Only ~1/N keys are affected

Virtual Nodes (Vnodes)

A problem with basic consistent hashing: with few servers, the ring partitions may be uneven, creating hotspots.

Solution: Virtual nodes. Each physical server is assigned multiple positions on the ring (e.g., 100–200 virtual nodes per server). This:

Creates much more even key distribution
Allows servers with more capacity to have more vnodes
Makes node addition/removal even smoother

Cassandra, DynamoDB, and Chord all use virtual nodes.

Real-World Uses

| System | Use of Consistent Hashing | |---|---| | Cassandra | Partitioning data across nodes | | DynamoDB | Key-based partitioning | | Memcached/Redis Cluster | Distributing cache keys | | Nginx | Consistent upstream routing | | CDNs | Routing requests to cache nodes |

Interview Tips