Consistent hashing minimizes data redistribution when nodes are added or removed from a distributed system. It's used in distributed caches, databases, and load balancers.
With simple modulo hashing (`shard = hash(key) % N`), adding or removing one server changes the shard of nearly every key, forcing a massive data migration.
If you go from N=3 to N=4 servers, about 75% of all cached keys need to be moved: a key stays put only when `hash(key) % 3 == hash(key) % 4`, which holds for just 3 of every 12 hash values.
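The rehash cost is easy to verify empirically. A small sketch (the key names and sample size are arbitrary, chosen for illustration):

```python
# Measure how many keys change shards when a modulo-sharded
# cluster grows from 3 to 4 servers.
import hashlib

def shard(key: str, n: int) -> int:
    # Use a stable hash; Python's built-in hash() is salted per process.
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return h % n

keys = [f"key-{i}" for i in range(100_000)]
moved = sum(1 for k in keys if shard(k, 3) != shard(k, 4))
print(f"{moved / len(keys):.1%} of keys moved")  # roughly 75% move
```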
Imagine a circular ring (the hash ring) with positions from 0 to 2³²−1. Each node is hashed to a point on the ring, and each key is stored on the first node found moving clockwise from the key's point.
Nodes on the ring: Node A (hash=10), Node B (hash=70), Node C (hash=90).
Key X (hash=50) lands between Node A and Node B, so it belongs to Node B, the first node clockwise.
When a node is added: only the keys between the new node and its predecessor on the ring move to the new node; on average that is K/N keys, for K keys and N nodes.
When a node is removed: its keys move to the next node clockwise, and every other key stays where it is.
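The ring mechanics above can be sketched in Python with a sorted list and binary search. The `HashRing` class and helper names are my own for illustration, not from any particular library:

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Map a string to a point on the 0 .. 2**32 - 1 ring.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")

class HashRing:
    def __init__(self, nodes=()):
        self._ring = []  # sorted list of (ring point, node name)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        bisect.insort(self._ring, (_hash(node), node))

    def remove(self, node: str) -> None:
        self._ring.remove((_hash(node), node))

    def lookup(self, key: str) -> str:
        # First node at or after the key's point, wrapping past the top.
        i = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[i % len(self._ring)][1]
```

With `ring = HashRing(["A", "B", "C"])`, adding node "D" relocates only the keys that now fall in D's arc; every other key's `lookup` result is unchanged, which is the whole point of the technique.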
A problem with basic consistent hashing: with few servers (one ring position each), the arcs between nodes can be very uneven, so some servers receive far more keys than others, creating hotspots.
Solution: Virtual nodes. Each physical server is assigned multiple positions on the ring (e.g., 100–200 virtual nodes per server). This:
- evens out the key distribution, since each server's share is averaged over many small arcs;
- smooths rebalancing, because a joining or leaving server exchanges keys with many servers instead of just one neighbor;
- lets heterogeneous servers take proportional load, by giving bigger machines more virtual nodes.
Cassandra, DynamoDB, and Chord all use virtual nodes.
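A sketch of how virtual nodes change the picture; the `VNodeRing` class and the `replicas` parameter are illustrative names, not any system's real API:

```python
import bisect
import hashlib
from collections import Counter

def _hash(value: str) -> int:
    # Map a string to a point on the 0 .. 2**32 - 1 ring.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")

class VNodeRing:
    """Ring where each physical node owns `replicas` points."""

    def __init__(self, nodes=(), replicas: int = 150):
        self.replicas = replicas
        self._ring = []  # sorted (ring point, physical node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # One ring entry per virtual node, e.g. "A#0", "A#1", ...
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def lookup(self, key: str) -> str:
        i = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[i % len(self._ring)][1]

ring = VNodeRing(["A", "B", "C"], replicas=200)
counts = Counter(ring.lookup(f"k{i}") for i in range(30_000))
print(counts)  # each node's share should be close to one third
```

Because each server's share is the sum of many small arcs, the law of large numbers pulls every share toward 1/N; with a single position per server, one unlucky arc can dominate.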
| System | Use of Consistent Hashing |
|---|---|
| Cassandra | Partitioning data across nodes |
| DynamoDB | Key-based partitioning |
| Memcached/Redis Cluster | Distributing cache keys |
| Nginx | Consistent upstream routing |
| CDNs | Routing requests to cache nodes |