Database DesignConsistent Hashing

Consistent hashing minimizes data redistribution when nodes are added or removed from a distributed system. It's used in distributed caches, databases, and load balancers.

The Problem With Simple Hashing

With simple modulo hashing (shard = hash(key) % N), adding or removing one server causes nearly all keys to map to different servers — requiring a massive data migration.

If you go from N=3 to N=4 servers, about 75% of all cached keys need to be moved.

How Consistent Hashing Works

Imagine a circular ring (hash ring) with values from 0 to 2³²-1.

  1. Map servers onto the ring: Hash each server's name/IP to a point on the ring
  2. Map keys onto the ring: Hash each key to a point on the ring
  3. Find the responsible server: Walk clockwise from the key's position; the first server you encounter is responsible for that key
        Node A (hash=10)
       /
  0 ——————————— Ring ——————————— 
       \                    Node B (hash=70)
        \ Key X (hash=50) →  Node B
         \                  /
          Node C (hash=90)

Adding or Removing Nodes

When a node is added:

  • It takes over some keys from its clockwise neighbor
  • Only the keys between the new node and its predecessor move
  • ~1/N of total keys are redistributed (instead of ~100%)

When a node is removed:

  • Its keys are transferred to its clockwise successor
  • Only ~1/N keys are affected

Virtual Nodes (Vnodes)

A problem with basic consistent hashing: with few servers, the ring partitions may be uneven, creating hotspots.

Solution: Virtual nodes. Each physical server is assigned multiple positions on the ring (e.g., 100–200 virtual nodes per server). This:

  • Creates much more even key distribution
  • Allows servers with more capacity to have more vnodes
  • Makes node addition/removal even smoother

Cassandra, DynamoDB, and Chord all use virtual nodes.

Real-World Uses

| System | Use of Consistent Hashing | |---|---| | Cassandra | Partitioning data across nodes | | DynamoDB | Key-based partitioning | | Memcached/Redis Cluster | Distributing cache keys | | Nginx | Consistent upstream routing | | CDNs | Routing requests to cache nodes |

Interview Tips

  • Consistent hashing solves the "resharding problem" — know this explanation cold
  • When asked about distributed caches (like Memcached/Redis Cluster), mention consistent hashing as the key distribution mechanism
  • Virtual nodes solve the uneven distribution problem — mention them to show depth of knowledge
  • The core insight: only K/N keys need to move (K = total keys, N = number of nodes) when a node is added/removed, compared to moving almost all keys with modulo hashing