lucb1e 8 minutes ago

Cool, didn't know of it.

In a nutshell, assuming I understood it correctly: to find the storage location of a file, instead of doing the intuitive pattern of hash(filename)%num_servers (which would change completely every time the list of servers changes), you do something just a tiny bit more complicated:

    sorted(serverList, key=lambda serverID: hash(filename + serverID))[0]
If you add a server to the list, it may become the new highest-priority location for a given filename, in which case that file has to move, but that only happens for about 1 in length(serverList) files, and it makes the load balance equally across all servers again. And if the first entry (sortedList[0]) becomes unavailable, you fail over to the second one at [1], etc.
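Spelled out a bit more in Python (SHA-256 here is just a stand-in for whatever hash the real system uses, and the server names are made up):

    import hashlib

    def rendezvous_order(filename, server_list):
        # Sort servers by hash(filename + serverID); index 0 is the primary,
        # index 1 is the first failover, and so on.
        def score(server_id):
            return int(hashlib.sha256((filename + server_id).encode()).hexdigest(), 16)
        return sorted(server_list, key=score)

    servers = ["server-a", "server-b", "server-c"]
    ordered = rendezvous_order("photo.jpg", servers)
    primary, first_failover = ordered[0], ordered[1]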
jongjong 23 minutes ago

I remember reading about Rendezvous Hashing years ago and found it simple, efficient and elegant. I was shocked that everyone was raving about consistent hashing instead, since it's a more complex solution.

I implemented a Node.js library to go with my WebSocket library:

https://www.npmjs.com/package/skeleton-rendezvous

Another interesting point is that both Consistent Hashing and Rendezvous Hashing can be extended in different ways to provide more even distribution between nodes and to reduce the number of keys which need to be moved when adding and removing nodes.

The extended version of consistent hashing involves creating additional virtual buckets/sites around the ring to give each site more coverage.
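A toy sketch of the virtual-bucket idea (not any particular library's code; SHA-256 and the replica count are just placeholders I picked):

    import bisect
    import hashlib

    def h(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def build_ring(sites, replicas=100):
        # Each site gets `replicas` virtual positions spread around the ring
        return sorted((h(f"{site}#{i}"), site) for site in sites for i in range(replicas))

    def lookup(ring, key):
        # Walk clockwise from hash(key) to the next virtual position, wrapping around
        points = [point for point, _ in ring]
        index = bisect.bisect(points, h(key)) % len(ring)
        return ring[index][1]

    ring = build_ring(["site-1", "site-2", "site-3"])
    print(lookup(ring, "some-key"))

More virtual positions per site smooths out the coverage, at the cost of a bigger ring to search.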

The extended version of Rendezvous Hashing involves creating a kind of hierarchy to allow you to add more sites whilst only incurring an O(log n) performance penalty.
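A simplified sketch of the hierarchy idea (this isn't the library's actual code, just the general shape: split the sites into a small fixed fan-out of groups, rendezvous-hash over the groups, then recurse into the winner):

    import hashlib

    def h(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def pick_site(key, sites, fanout=4):
        # Base case: few enough sites that a direct rendezvous comparison is cheap
        if len(sites) <= fanout:
            return min(sites, key=lambda site: h(key + site))
        # Split into at most `fanout` groups, rendezvous-hash the group labels,
        # then descend: O(fanout) hashes per level, O(log n) levels.
        group_size = -(-len(sites) // fanout)  # ceiling division
        groups = [sites[i:i + group_size] for i in range(0, len(sites), group_size)]
        winner = min(range(len(groups)), key=lambda g: h(f"{key}:group-{g}"))
        return pick_site(key, groups[winner], fanout)

    sites = [f"site-{i}" for i in range(100)]
    print(pick_site("some-key", sites))

That keeps the lookup cost around O(fanout * log n) hashes per key; the part an actual implementation has to be careful about is how the grouping changes when sites are added or removed.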

I found the extension of the Rendezvous approach more elegant and easier to implement as well.

For my solution, I wrote tests to show that it achieves O(log n) performance with respect to the number of sites, and the allocation difference between the smallest and largest site was typically less than 15%. With 100 sites and 40000 keys, the difference between the smallest and largest site was 30%, but that's because we're only talking about 400 keys per site on average. It's not so much about the large number of sites as it is about the average number of keys per site: as you add more keys per site, the distribution becomes more even.
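Roughly the kind of check I mean, using plain (non-hierarchical) rendezvous hashing and SHA-256, so the exact numbers won't match the library's:

    import hashlib

    def h(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def spread(num_sites, num_keys):
        sites = [f"site-{i}" for i in range(num_sites)]
        counts = {site: 0 for site in sites}
        for k in range(num_keys):
            winner = min(sites, key=lambda site: h(f"key-{k}" + site))
            counts[winner] += 1
        smallest, biggest = min(counts.values()), max(counts.values())
        return smallest, biggest, (biggest - smallest) / (num_keys / num_sites)

    # The relative gap between the smallest and biggest site shrinks
    # as the average number of keys per site grows (brute force, so it's slow).
    print(spread(100, 40_000))    # ~400 keys per site
    print(spread(100, 200_000))   # ~2,000 keys per site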

This last fact was highly relevant to my use case, since each of my WebSocket servers (sites) could handle 10k to 20k concurrent connections, so the distribution ends up very even.

It got me wondering whether the reason consistent hashing became so popular is that it provides more even distribution in scenarios where you have a relatively small average number of keys per site. In the old days, most servers (e.g. PHP, CGI) struggled to handle 1000 concurrent connections, but with the advent of Node.js, Golang, etc., some servers can support many tens of thousands of concurrent connections. So maybe the balance shifted but the habits did not?