Chapter 6 Building a Hash Table

What we’ll cover

So far we’ve been using the Ruby Hash class as the main storage mechanism for the key/value pairs received through the SET command. We also use it for the secondary dictionary necessary to implement the TTL related options of the SET command. We store the expiration timestamp of keys with TTLs, which allows us to know whether a key is expired or not.

Redis is written in C, which does not provide a collection similar to Ruby’s Hash. In C, you only get one collection out of the box, arrays.

Redis implements its own dictionary collection, in dict.c. Because the dictionary data structure is so central to how Redis functions, we will replace the use of the Ruby Hash class with a Dict class we will build from scratch.

We will also add the DEL command, the ability to delete keys is really important to Redis. Having the DEL command will also allow us to easily play with our Dict class to make sure that it handles deletion operations correctly.

We are not adding any new features in this chapter, beside the DEL command, we’re rewriting a key part of the system with lower level elements. Given that Redis’ dict data structure relies on arrays, we will still use the Ruby Array class. We could have reimplemented the Array class, and you’ll find an example in Appendix A, but arrays in C are not specific to Redis. On the other hand, the structure defined in dict.c is.

Let’s get to it.

Maps, Dictionaries, Associative Arrays

This chapter covers hash tables, which is one way of implementing a data structure commonly called Map, Dictionary or Associative Array. I will use the term “Dictionary” as I find that “Map” can be confusing, especially when working with languages providing a map function/method, such as Ruby! From my experience the term associative array, while very explicit, is not as common.

The basic definition of such data structure is one that holds zero or more key/value pairs, where a key cannot appear more than once.

A key operation (pun intended!) of a dictionary is the ability to retrieve a value given a key. The returned value is either empty if no element with such keys exists in the map, or the value mapped to the key if it exists.

Some definitions also include the ability to add, update or delete key/value pairs, which is not provided in immutable versions, where such operations would result in the creation of a new dictionary. The immutable versions will implement similar operations returning a new structure instead of modifying it.

There are multiple ways of implementing a data structure providing these operations. A naive and fairly inefficient version could be to use an array where each element is a key value pair:

def set(map, key, value)
  map.each do |pair|
    pair_key = pair[0]
    # Override the value if the key is already present
    if key == pair_key
      pair[1] = value
      return pair
    end
  end
  pair = [key, value]
  map << pair
  pair
end

def lookup(map, key)
  map.each do |pair_key, pair_value|
    return pair_value if key == pair_key
  end
  return
end

listing 6.1: A basic dictionary using an array

Given these functions, set & lookup, we could use them with an array:

map = []
set(map, "key-1", "value-1") # => ["key-1", "value-1"]
set(map, "key-2", "value-2") # => ["key-2", "value-2"]
set(map, "key-2", "value-3") # => ["key-2", "value-3"]

lookup(map, "key-1") # => "value-1"
lookup(map, "key-2") # => "value-2"
lookup(map, "key-3") # => nil

This approach works from an API standpoint, but it would show performance issues as we keep adding elements to the array. Because we must prevent duplicated keys, we need to iterate through the whole array every time we attempt to add a new pair if the key is not already present. This set implementation is an O(n) operation, where n is the number of pairs in the dictionary. The amount of time required to add an element is proportional to the size of the collection.

A lookup might not always require a complete scan of the array, if we’re lucky and find the key before the end, but it might, in the worst case scenario. The lookup operation is therefore also an O(n) operation.

For Redis, which should be able to handle hundreds of thousand of keys, even millions, and potentially billions, these performance issues are not acceptable.

One common implementation that addresses these performance issues is a hash table. Another possible implementation is a tree map, which uses a tree structure to store elements. For instance, the Java TreeMap class uses a Red-Black tree to maintain the underlying tree balanced. One of the benefits of a tree map compared to a hash table is that it stores elements in order, whereas a hash table does not.

In the next section we will learn how hash tables implement these operations in a more time efficient manner.

Before moving on and abandoning this implementation, it’s really important to note that while this implementation would not perform well with large collections, it might actually be one of the most efficient options for very small collections, such as with one or two pairs, thanks to its simplicity. If the array is small, finding an element requires very few steps and little memory overhead.

As a matter of fact, the Scala standard library does something similar for dictionaries with up to four pairs, it has special case classes meant to handle these fixed sized dictionaries, allowing them to be really fast as there’s no need for hashing or anything else.

Hash Tables

Hash tables are available in many programming languages as part of their standard libraries. Python has dict, Java has HashMap, Scala has Map, Elixir has Map, Rust has HashMap, Ruby’s Hash class is a hash table implementation too. You get it, they’re almost everywhere.

Hash tables can be implemented in different ways, the wikipedia article shows a few different examples. The one we’ll explore in this chapter uses a collision resolution called separate chaining. But why do we need collision resolution? To answer this we first need to look at the central element of a hash table, its hash function.

A hash function must obey a few properties, one of the most important ones being determinism, in other words, identical inputs should result in identical outputs. To explain why, let’s look at how the hashing function is used through a pseudo code implementation of a hash table:

function new_node(key, value, next_node)
    return Node(key, value, next_node)

function update_node(node, new_value)
    return Node(node.key, new_value)

function create_hash_table()
    table = allocate_array_of_arbitrary_initial_size()
    return table

function add_or_update_key_value_pair(table, key, value)
    hash = hash_function(key)
    index = hash % table.size
    node = table[index]
    if node == null
        table[index] = new_node(key, value, null)
    else
        while node != null && node.key != key
            node = node.next_node

        if node.nil?
            existing_node = table[index]
            table[index] = new_node(key, value, existing_node)
        else
            update_node(node, value)

function lookup_key(table, key)
    hash = hash_function(key)
    index = hash % table.size
    if table[index] == null
        return null
    else
        node = table[index]
        while node != null
            if node.key == key
                return key
            else
                node = node.next_node

        return null

listing 6.2: Pseudo-code hash table

The new_node function acts as the entry point of a linked list. A node contains a key, a value, and a next node value. If the next node value is null, the element is the last one in the list.

Prepending - a fancy word for “adding at the beginning” - an element to such list is done by first creating a single node list and then a second one, with the next_node value set to the first one:

node1 = new_node(key1, value1, null)
node2 = new_node(key2, value2, node1)

In this example node1 is a list with a single node, and node2 is a list with two nodes. The first node is the one with the key key2 and the value value2, its next_node value is equal to node1, which has the key key1 and value value1, it does not have a next_node value and is the last element of the list.

update_node works with an existing node and changes its value. It is a useful function when we find an existing pair with a matching key in add_or_update_key_value_pair. We explore this other function in more details below.

create_hash_table does only one thing, it allocates an array of arbitrary size. We purposefully do not define this function here. The size is not really important, as long as it creates a non empty array. The implementation of the allocation is also not really relevant to this example. Most operating systems provide such features, it’s therefore fair to assume that it would use the allocation operations provided by the operating system. malloc is a function in the C standard library that does provide the ability to manually allocate memory.

add_or_update_key_value_pair does more work and let’s walk through it, one line at a time. It takes three parameters, the table we want to insert the pair into, the key and the value. We first call hash_function with key. We’ll dive deeper into what an implementation of hash_function looks like later, but for now, let’s assume it returns an integer. Because the hash function is unaware of the size of the array, the returned value might be larger than the size of the array.

We use the modulo operation to convert the hash value returned by hash_function into a number between 0 and table.size - 1. We can now use the result of the modulo operation as an index. That’s why we have the create_hash_table function, to make sure that table is initialized with empty slots. These slots are often called buckets in hash table lingo.

If the bucket is empty, then we create a new node, add it to the bucket and we’re done. If the bucket is not empty, then things are more complicated.

There are two distinct cases to consider if there is already an item at the location obtained through the hash function. One of the nodes in the bucket might have the same key, in this case we want to override its value with the new value. This the case where we want to update an element in the array instead of adding it.

The other one is that the nodes already present might all have a different key, in which case we want to keep all the existing nodes and add the new one. This the case where we want to add a new pair, and is called a collision.

Let’s illustrate with an example. Let’s set the initial size of the array to 4, all the buckets are empty:

table = [nil, nil, nil, nil]

Let’s define a hash function, that returns the length of the input string:

function hash_function(string)
    return string.length

If we first call add_or_update_key_value_pair with the key "a" and the value "b", hash_function will return 1, the length of the string "a". 1 % 4 returns 1, so we add the pair at index 1:

table = [nil, Node("a", "b", nil), nil, nil]

Let’s call add_or_update_key_value_pair with the pair "cd"/"e", the length of "cd" is 2, 2 % 4 is 2, we insert the pair at index 2:

table = [nil, Node("a", "b", nil), Node("cd", "e", nil), nil]

Let’s now call add_or_update_key_value_pair with "fg"/"h". The length of "fg” is 2, but there’s already a pair at index 2. Because we want to keep all pairs we need a solution to resolve this collision. There are different strategies available to us here, and the one we’re going to use is called “separate chaining”.

The essence of this strategy is that each bucket contains a linked list of values. So in the previous example, we insert the new element at the beginning of the list at index 2. Note that prepending an element to a linked list is an O(1) operation, it takes the same amount of time regardless of the size of the list. This is the list once the operation is completed:

table = [nil, Node("a", "b", nil), Node("fg", "h", "Node("cd", "e", nil)), nil]

lookup_key is very similar to add_or_update_key_value_pair. We use the same process to find which bucket the key should be in. If the bucket is empty, we return null, the key is not present in the dictionary. On the other hand, if the bucket is not empty, we need to look through the list until we find the node with the key argument.

If we don’t find any, the key is not present in the dictionary.

Trying to avoid collisions

The hash_function we used in the previous works well as an example because of its simplicity but it would not be practical in the real world. To keep hash tables efficient, we want to reduce the number of collisions as much as possible. This is because iterating through the linked list is inefficient, if there are a lot of collisions, it could take a long time to loop through all the items in the bucket.

This is where the uniformity property of a hash function is really important. Uniformity helps reduce the likelihood of collision. In the previous example, if a hypothetical hash function had returned the values 1, 2 & 3, respectively, instead of 1, 2 & 2, there wouldn’t have been any conflicts.

Collisions are also related to the size of the underlying array. Regardless of the uniformity of the hash function, if the underlying array has a size n, storing n + 1 items cannot happen without at least one collision.

One approach would be to start by allocating a very large amount of memory, but this can be wasteful, because there could be a lot of memory allocated, but unused. Many hash table implementation have mechanisms to adjust the size as needed, and it turns out that Redis does this too, as we’ll see in the next section.

A good hash function that provides uniformity means that both operations add/update & lookup have an O(1) time complexity, meaning that the number of steps is always the same regardless of the number of elements already present. We first hash the value, transform it to an index and use the matching bucket.

On the other hand, a bad hash function without uniformity would make these operations O(n). In the absolute worst case scenario, all keys would land in the same bucket, and the number of operations required would depend on the number of elements already present in the linked list in the bucket.

Back to determinism

Now that we know how the result of a hash function is used, that is, it determines the location of a key/value pair in the underlying array, let’s go back to the determinism element of a hash function.

Let’s demonstrate why we need determinism by showing what would happen with a hash function that is not deterministic.

In Ruby, each object is given an object id, in the following examples, the two variables str1 & str2 are different instances, each holding the same value, and are therefore considered equal, but have different object_id values:

str1 = "hello"
str2 = "hello"
str1.object_id # => 180
str2.object_id # => 200
# The object_id values might be different on your machine, they are assigned at runtime
# and will therefore differ if you've created more Ruby objects beforehand for instance
str1 == str2 # => true

Let’s define a wrong hash function, which returns its input’s object_id:

def hash_function(object)
  object.object_id
end

Let’s manually walk through a small example, let’s start by creating a hash table of size 3 and add the pair a-key/a-value to it. Let’s re-use the same object_id from the previous example, and assume that a-key would have returned 180. 180 % 3 = 0, so we insert the new node at index 0:

table = [Node("a-key", "a-value", nil), nil, nil]

And let’s now call the lookup function with a different string holding the same value, and, reusing the previous example data again, assume that its object id is 200, 200 % 3 = 2. The lookup would look at the bucket at index 2, find a nil value and return nil, whereas the table does contain a pair with the key a-key.

A deterministic hash function prevents this.

Common Hash Functions

In order for a hash table implementation to be efficient, it needs a good hash function. Hash functions come in different flavors, as shown on wikipedia:

Cyclic redundancy checks
Checksums
Universal hash function families
Non-cryptographic
Keyed cryptographic
Unkeyed cryptographic

Some of the functions in the “Unkeyed cryptographic hash functions” category are pretty common. MD5 used to be very common to verify the integrity of a file downloaded over the internet. You would download the file, compute the md5 of the file locally and compare it against the md5 published by the author of the file. It is common to see sha256 used instead nowadays. This is what the Downloads page on ruby-lang.org does!

For a long time sha1 was the default algorithm used by git to hash commits and other objects. It now supports multiple algorithms such as sha256. This change was required after researchers proved that it was possible to forge two different inputs resulting in the same sha1 hash.

Redis uses SipHash which is in the “Keyed cryptographic hash functions” category. We will look closer at the SipHash algorithm below.

All Ruby objects implement a hash method, which happens to use the Siphash algorithm, the same algorithm Redis uses!

str1 = "hello"
str2 = "hello"
# Note that the hash value is partially computed from a random value and will therefore be different
# on your machine
# It will also be different if you restart irb
str1.hash # => 2242191710387986831
str2.hash # => 2242191710387986831

Now that we know what a hash function is, how it used to implement a hash table, let’s look at how things work in Redis.

How does Redis do it?

Redis uses three data structures to implement a dictionary, dict, dictht & dictEntry, the following diagram, from wjin.org, shows how they each relate to each other:

Diagram of Redis’ dict data structure

It’s important to note that dictionaries are used in multiple places in the Redis codebase, but there are two main ones for each database, the one holding all the top-level key/value pairs, such as the ones added with SET and other commands creating pairs, and the expires dictionary, used to store key TTLs.

If you’re not used to C, don’t worry too much about it for now, we’re not going to look too closely at pointers and other C specific features.

Our implementation supports a single database, but Redis can handle multiple databases. A database in Redis represents a set of key/value pairs it is defined as the following C struct:

typedef struct redisDb {
    dict *dict;                 /* The keyspace for this DB */
    dict *expires;              /* Timeout of keys with a timeout set */
    dict *blocking_keys;        /* Keys with clients waiting for data (BLPOP)*/
    dict *ready_keys;           /* Blocked keys that received a PUSH */
    dict *watched_keys;         /* WATCHED keys for MULTI/EXEC CAS */
    int id;                     /* Database ID */
    long long avg_ttl;          /* Average TTL, just for stats */
    unsigned long expires_cursor; /* Cursor of the active expire cycle. */
    list *defrag_later;         /* List of key names to attempt to defrag one by one, gradually. */
} redisDb;

listing 6.3: The C struct for redisDB defined in dict.c

We will ignore all the fields but the first two for now. We can see that the two fields, dict & expires are both of the same type: dict.

dict is defined in dict.h:

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    unsigned long iterators; /* number of iterators currently running */
} dict;

listing 6.4: The C struct for dict

Once again, in order to keep things as simple as possible, we will ignore some fields, specifically, privdata & iterators.

The dictType struct is used to configure the behavior of a dict instance, such as using a different hash function for instance. It is defined as:

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

listing 6.5: The C struct for dictType defined in dict.c

The syntax used in this struct is different because the members are function pointers. That’s about as far as we’ll go with C in this chapter. Redis does this to allow a form of configuration of a dict instance. It has the ability to create two dictionaries with potentially two different hash function implementation. We don’t need this level of flexibility at the moment so we will not implement these features for now.

The most interesting element of the dict struct for us is the dictht array. ht here stands for Hash Table. ht[2] means that the struct member is named ht and is an array of size two. Essentially, each dict instance has two hash tables, ht[0] & ht[1].

dictht is defined as follows:

/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

listing 6.6: The C struct for dictht defined in dict.c

The comment tells us why a dict has two tables, for rehashing. To explain rehashing, we first need to explain the first member of dictht: dictEntry **table. The double star syntax, a pointer to pointer, is not that interesting to us at the moment, it is one way of defining an array of dictEntry with dynamic size, one that can be set at runtime. We then need to look at the dictEntry struct:

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

listing 6.7: The C struct for dictEntry defined in dict.c

dictEntry is a linked list, a common term for a structure like this one is “a node”. It contains a key, key, a value, v and a link to the next element in the list, next.

Because dictEntry **table is an array with a dynamic size, we also need the size member. used is a counter, that starts at 0 and that is incremented when items are added, and decremented when items are removed.

sizemask is an integer value, which is initialized at 0 if size is also 0, but is otherwise always set to size - 1.

To understand the need for the sizemask member, let’s look back at our pseudo code implementation from above. We can see that a very common operation is to use hash_value % array_size. This operation converts a value, potentially larger than the array size, to one that is between 0 and size - 1, allowing us to use the result as an index for the underlying array.

The modulo operation, %, is not that costly, but it does require a few steps, an integer division, followed by a multiplication and a subtraction: c - (c/m*m). Let confirm with an example:

c = 469513
m = 143317
c % m # => 39562
c - (c/m*m) # => 39562

Given how crucial this operation is to the performance of a hash table, every operation relies on it to find the bucket index, it is valuable to attempt to optimize it.

It turns out that if, and only if, the modulus (the second part of the modulo operation, b in a % b, the size of the array in the hash table) is a power of two, then the modulo can be computed in a single operation with the bitwise AND/& operator:

a & (b-1)

Let’s illustrate this with a few examples, if the backing array of the hash table is of size 4, then:

0 % 4 = 0, 0 & 3 = 0
1 % 4 = 1, 1 & 3 = 1
2 % 4 = 2, 2 & 3 = 2
3 % 4 = 3, 3 & 3 = 3
4 % 4 = 0, 4 & 3 = 0
5 % 4 = 1, 5 & 3 = 1
...

It might help to visualize the number as binary, let’s use 4 bit integers for readability. The binary representation of 4 is 0100 and 3 is 0011, so 0 & 3 can be represented as:

The following shows the visualization of 1 & 3, 2 & 3, 3 & 3, 4 & 3 & 5 & 3.

 0001 (1)  |  0010 (2) |  0011 (3) |  0100 (4) |  0101 (5)
&0011      | &0011     | &0011     | &0011     | &0011
    =      |     =     |     =     |     =     |     =
 0001 (1)  |  0010 (2) |  0011 (3) |  0000 (0) |  0001 (1)

In order to take advantage of this property, Redis always picks a size that is a power of two for the backing array. By setting sizemask to size - 1, Redis can efficiently compute the index of any keys once it obtained its hash value. This is a part of the code for dictFind:

dictEntry *dictFind(dict *d, const void *key)
{
    // ...
    h = dictHashKey(d, key);
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        // ...
    }
    return NULL;
}

listing 6.8: excerpt of dictFind

h is the value returned by the hash function and d->ht[table].sizemask is how Redis accesses the sizemask value for its hash table. idx is the index indicating the location of the bucket. Redis then looks into the array to inspect the bucket with he = d->ht[table].table[idx] (he stands for hash entry).

Rehashing

Now that we looked at the data structures that Redis uses to implement its dict type, we need to look at the rehashing process. A new dictionary in Redis is always empty, the backing table, the table member, is set to NULL and the size, sizemask and used members are all set to 0:

// https://github.com/antirez/redis/blob/6.0.0/src/dict.c#L102-L108
static void _dictReset(dictht *ht)
{
    ht->table = NULL;
    ht->size = 0;
    ht->sizemask = 0;
    ht->used = 0;
}

// https://github.com/antirez/redis/blob/6.0.0/src/dict.c#L121-L131
int _dictInit(dict *d, dictType *type,
        void *privDataPtr)
{
    _dictReset(&d->ht[0]);
    _dictReset(&d->ht[1]);
    d->type = type;
    d->privdata = privDataPtr;
    d->rehashidx = -1;
    d->iterators = 0;
    return DICT_OK;
}

listing 6.9: C code for _dictReset & _dictInit

Whenever Redis adds a new key/value pair to a dictionary, it first checks if the dictionary should be expanded. The main reason causing a dict to expand is if the number of items in it, the used member, is greater than or equal to the size of the dict, the size member. This will always be true for an empty dictionary since both are initialized to 0. This will also be true every time the number of items reaches the size of the dict. When the dict is of size 4, once 4 items are added, the next addition will trigger a resize.

This is necessary to limit the likelihood of collisions, because as we discussed, collisions slow down the hash table. If the array contains four buckets and the table needs to store five items, it will at best have one collision. If we resize the array to contain eight buckets, then it is possible to store the five items without any collisions.

As mentioned earlier, in order to take advantage of the “fast modulo for a power of two value through bitwise AND” property, Redis will always choose a power of two for the size. The smallest non empty size is 4, and it will grow through power of twos from there on: 8, 16, 32, 64 and so on. All the way up to LONG_MAX + 1, 9,223,372,036,854,775,808, also written as 9.223372036854776e+18 in the exponential notation. That’s 9.2 billion billions, Yes it’s a huge number!

Why LONG_MAX + 1

Redis uses LONG_MAX + 1 as the maximum value for a dict size. LONG_MAX is a constant defined in the C header file limits.h and is set to 9,223,372,036,854,775,807. We can see that it’s not a power of two by looking at the right most digit, 7, it’s not even, so we already know that it’s not a power of two. All power of two are even numbers since by definition, we obtain them by multiplying 2s.

This big number is a 64-bit integer, where all bits are 1s, except the first one:

0111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111

The first bit is a 0 because LONG_MAX is a signed integer, that is it can be negative of positive. Signed integers use the first bit to determine the sign, 0 for positive, 1 for negative. The other bound is LONG_MIN, set to -9,223,372,036,854,775,808, which has the following binary representation:

1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

This representation of signed numbers is called Two’s complement, there are other representations but two’s complement is a very common one.

The only larger number that can fit in a 64-bit integer, is the unsigned max value, where all bits, even the first one, are used to encode the integer value, it’s called ULONG_MAX, U stands for usigned here, and is set to: 18,446,744,073,709,551,615, 2^64 - 1 . As we did before, we can see that it’s not a power of two, 5, the last digit, is not even. This is its binary representation, 64 1s:

1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111

It means that if we want to use power of two numbers, then the largest one we can get to is 2^63. The max value we can encode in a 64 bit integer is 2^64 - 1, one short, so we have to stop one power of two below, hence setting the upper bound to LONG_MAX + 1, aka, 2^63.

Back in Chapter 4 we talked about Big O notation and time complexity. The bottom line being that since Redis processes incoming commands sequentially, a slow operation would effectively back the queue. You can think of it as someone taking a long time to go through checkout at a grocery store. The longer they take, the more likely it is that the queue of customers waiting in line increases.

Resizing a hash table is essentially an O(n) operation, the time it takes to do it is proportional to n, the number of elements in the hash table. In other words, the more elements in the table, the longer it’ll take to resize it. And as we just saw, Redis hash tables can get big, really big! Forcing all the clients in the queue to wait while we resize the table is far from desirable.

Enter rehashing!

Rehashing is the process Redis uses to incrementally, in small steps, resize the table, while still allowing other operations to be processed, and this is why it uses two hash tables per dictionary. Let’s look at how rehashing works through an example.

Note that resizing the array is technically not necessary to store the items given that each entry in the array, the buckets, are linked lists. This means that even an array of size 4 could store millions and billions of key/value pairs. The problem is that the performance would suffer drastically, iterating through that many items in a linked list would take a very long time. With millions of items, it could easily take multiple seconds.

The Redis server starts, the main dict is initialized with two hash tables, both empty
The server receives a SET command, it needs to expand the dictionary.
It finds the next power of two that will be enough to fit all the elements, it needs to store one element, so it uses the minimum value 4.
It allocates the main array, ht[0], with a size of 4, and adds the first key/value pair
The second, third & fourth values are added without any issues. used is now set to 4
A fifth SET command is received, Redis decides to resize the dict.
The resize process allocates a new table, big enough to store all the items, for which the size is a power of two. It selects the next power of two, 8.
The new table is assigned to the secondary table, the rehashing one, ht[1]
The dict is now a rehashing state. In this state, all new keys are added to the rehashing table.
The dict has now 2 tables, where 4 keys are in the first table and 1 is in the second one.
While in this state, many operations, such as key lookups will look at both tables. For instance a GET command will first look at the first table, and if it doesn’t find the item, will look at the rehashing table, and only if the items can’t be find in either table, will return NULL.
While in rehashing state, many commands, such as key lookups or key additions, will perform a single step of rehashing. The server_cron time event we looked at in Chapter 3 also attempts to rehash dictionaries that needs to.
The rehashing process starts at index 0, looks at the first table, and if it finds an item, moves it to the second table. It then moves to index 1, until it iterated through the entire table
Once it’s done, it makes the rehashing table the primary, resets the rehashing table to an empty table and exits the rehashing state

This process allows Redis to resize dictionaries in small steps, while not preventing clients to send commands in the meantime.

Rehashing is also used to reduce the size of a dictionary. If the number of keys, the used member, goes below 1/10th of the size value, that is, only 10% of the dictionary is used, Redis will try to find a better size to fit the keys. That is, it will find the smallest power of two that is greater than or equal to used. This is also performed in server_cron, as a time event and prevents Redis from unnecessarily using memory.

If you want to dig deeper in the Redis implementation, here are few interesting functions you can start with:

The SipHash hash function

The last thing we need to look at before building our own hash table is the hash function. Redis has been using the SipHash algorithm since version 5.0. Before that it had been using the MurmurHash2 algorithm as of version 2.5. And before that, it used a simple version from Dan Bernstein called djb2.

The SipHash algorithm is described in this paper: SipHash: a fast short-input PRF paper.

One of the benefits of SipHash is that it offers strong protection against attacks such as hash flooding.

The implementation of the SipHash algorithm is quite complicated. The one used by Redis is in the siphash.c file and a Ruby implementation is provided in Appendix B. What is important to note is that Siphash requires a key, usually coming from random bytes, to compute a hash value.

This means that unlike md5 or sha1, which always return the same value for the same input, siphash will return the same value, if, and only if, the key is the same.

This is essentially a simplified explanation of how the hash flooding protection works. If Redis were to use md5 as its hashing function, I could try to guess what the hash value used to compute the index would be. Let’s see that with an example:

The md5 hash for the string a is the string 0cc175b9c0f1b6a831c399e269772661:

Digest::MD5.hexdigest("a") # => 0cc175b9c0f1b6a831c399e269772661

The result is a 32 character string representing a 128-bit (16 bytes) result. Because most CPUs use 64-bit integers as their largest types, the result we just saw is actually the hex representation of two 64 bit integers. Let’s illustrate this with the pack and unpack method.

The string is a hex string, so we need to look at each pair of characters. We call hex on each pair, which returns the integer value. For instance '00'.hex returns 0, 'ff'.hex returns 255, the maximum value of an 8-bit integer - a byte. We then call .pack('c16') which returns a string representing all the bits concatenated together. We use 'c16' because the result of .map(&:hex) is an array of 16 bytes.

Finally .unpack('QQ') looks at the string of bytes and tries to convert to two 64 bit integers. We use 'QQ', which is identical to 'Q2', because a string of 16 bytes can be unpacked to two 64-bit integer. One 64-bit integer is composed of 8 bytes - a byte contains 8 bits, so 8 bytes contain 64 bits - so 16 bytes can be unpacked to two 64-bit integers.

bytes = "0cc175b9c0f1b6a831c399e269772661".scan(/../).map(&:hex).pack('c16') # => "\f\xC1u\xB9\xC0\xF1\xB6\xA81\xC3\x99\xE2iw&a"
bytes.unpack("QQ") # => [12157170054180749580, 7000413967451013937]

The previous code was adapted from this blog post

A simpler example might help illustrate this:

"ffff0000ffff0000ffff0000ffff0000".scan(/../).map(&:hex)
=> [255, 255, 0, 0, 255, 255, 0, 0, 255, 255, 0, 0, 255, 255, 0, 0]

[255, 255, 0, 0, 255, 255, 0, 0, 255, 255, 0, 0, 255, 255, 0, 0].pack('c16')
=> "\xFF\xFF\x00\x00\xFF\xFF\x00\x00\xFF\xFF\x00\x00\xFF\xFF\x00\x00"

"\xFF\xFF\x00\x00\xFF\xFF\x00\x00\xFF\xFF\x00\x00\xFF\xFF\x00\x00".unpack('QQ')
=> [281470681808895, 281470681808895]

We can play with pack and unpack a little bit more to confirm that these two 64-bit integers we got are the two elements of the md5 result:

[12157170054180749580].pack("Q").unpack("H16")
=> ["0cc175b9c0f1b6a8"]
[7000413967451013937].pack("Q").unpack("H16")
=> ["31c399e269772661"]

12157170054180749580 represents the first 64 bits of the md5 value, by calling .pack('Q') we convert it to a string representing all these bits back to back, and convert it back to a string of 16 hex characters with .unpack('H16'). We can confirm that 0cc175b9c0f1b6a is the first half of 0cc175b9c0f1b6a831c399e269772661 and that 31c399e269772661 is the second half.

We can also look at the actual 64 bits with unpack('B64')

[12157170054180749580].pack('Q').unpack('B64')
=> ["0000110011000001011101011011100111000000111100011011011010101000"]
[7000413967451013937].pack('Q').unpack('B64')
=> ["0011000111000011100110011110001001101001011101110010011001100001"]

Back to our hypothetical use of md5 as a hash function in Redis. Given that we would only use a single integer to apply the modulo to, we could pick either the first one or last one, let’s arbitrarily pick the second one, just because.

If I sent the command SET a-key a-value, the hash value of a-key is the 64 bit integer 7000413967451013937. This knowledge can be used to forge special requests and maximize the chances of collisions, potentially causing performance issues to the hash table.

With a keyed algorithm such as Siphash, it’s impossible to infer what the hash value would be if the server uses random bytes as the key. We can demonstrate this with Ruby, which also uses Siphash by running "a".hash in an irb shell, closing and reopening irb, the hash value will be different. This is because Ruby initializes random bytes at startup that it then uses to compute siphash values.

Siphash was added to Redis in this commit. SipHash returns a 64 bit integer, so we can use the value directly instead of going trough all the steps we had to go through with md5.

Our own `Dict` class

`Dict`, `HashTable` & `DictEntry`

We previously looked at the main data structures used in the Redis code base to implement a hash table. We will reimplement a simplified version of those, let’s start with the two small ones, DictEntry & HashTable:

module BYORedis
  class DictEntry

    attr_accessor :next, :value
    attr_reader :key

    def initialize(key, value)
      @key = key
      @value = value
      @next = nil
    end
  end
end

listing 6.10: The DictEntry class

module Redis
  class HashTable

    attr_reader :table, :size, :sizemask
    attr_accessor :used

    def initialize(size)
      @table = size == 0 ? nil : Array.new(size)
      @size = size
      @sizemask = size == 0 ? 0 : size - 1
      @used = 0
    end

    def empty?
      @size == 0
    end

    def each
      return unless @table

      @table.each
    end
  end
end

listing 6.11: The HashTable class

@table in HashTable will contain instances of DictEntry, which is our linked list implementation to handle collisions.

We could have defined DictEntry as a Struct, but the explicit class has one small benefit, it allows us to not define a setter for the key attribute.

As already discussed, because Ruby is Ruby, where most things are mutable, it does not mean that a key attribute will always be the same, but at least it prevents reassignment. The value can still be changed by updating the value in place:

dict_entry = BYORedis::DictEntry.new("a-key", "a-value") # => nil
dict_entry # => #<BYORedis::DictEntry:0x00007fd577a17c60 @key="a-key", @value="a-value", @next=nil>
dict_entry.key.gsub!("a", "A") # => "A-key"
dict_entry # => #<BYORedis::DictEntry:0x00007fd577a17c60 @key="A-key", @value="a-value", @next=nil>
dict_entry.key # => "A-key"

Let’s now create the main class, Dict:

require_relative './siphash'
require_relative './dict_entry'
require_relative './hash_table'

module BYORedis
  class Dict

    INITIAL_SIZE = 4
    MAX_SIZE = 2**63

    attr_reader :hash_tables

    def initialize
      @hash_tables = [ HashTable.new(0), HashTable.new(0) ]
      @rehashidx = -1
    end

    # ...
  end
end

listing 6.12: The Dict class

Our Dict class does not require any arguments, it is initialized with two empty hash tables, and a rehash index set to -1, indicating that it is not in a rehashing state. The rehashing index is used to keep track of the progress throughout rehashing. Its value will change from 0 to @hash_tables[0].size - 1 as we rehash the table, and reset to -1 when completed.

The first method we need to add to the Dict class is add, it will be used to add an element to a dictionary. This is needed for the SET command. This method always expects the called to check that the key is not already present and will fail if it finds a matching key.

# dict.rb
module BYORedis
  class Dict

    # ...

    def add(key, value)
      index = key_index(key)

      # Only happens if we didn't check the presence before calling this method
      return nil if index == -1

      rehash_step if rehashing?

      hash_table = rehashing? ? rehashing_table : main_table
      entry = hash_table.table[index]

      entry = entry.next while entry && entry.key != key

      if entry.nil?
        entry = DictEntry.new(key, value)
        entry.next = hash_table.table[index]
        hash_table.table[index] = entry
        hash_table.used += 1
      else
        raise "Unexpectedly found an entry with same key when trying to add #{ key } / #{ value }"
      end
    end
    alias []= add
  end
end

listing 6.13: The add method in the Dict class

The method is very similar to the pseudo code we looked at earlier in add_or_update_key_value_pair. We first obtain the index for the key, aka the location of the bucket the key should go into. We then perform a rehash step if we’re in rehashing state.

We select which table the key will end up in depending on whether or not we are in rehashing state. If we are, the key will be added to the rehashing table, otherwise it is added to the main table.

The next step is to inspect the bucket at the index position, if it is not empty we look through all the items in the bucket to check if one of the entries has the same key.

If we find an entry with the same key, we don’t need to create a new entry, we can just update the existing entry with the new value. The number of elements in the dictionary does not change.

On the other hand, if we didn’t find an entry with the same key, we need to create a new entry for the new key/value pair. First, we instantiate DictEntry with key & value, then we need to add it to the linked list.

There are essentially two valid options for us here, either append the element to the list or prepend it. Prepending to a linked list is a constant time operation O(1), whereas appending is proportional to the size of list, it is an O(n) operation. Let’s illustrate with an example, we’ll start by creating a list with three elements

entry1 = BYORedis::DictEntry.new(1, 'value-1') # => nil
entry2 = BYORedis::DictEntry.new(2, 'value-2') # => nil
entry3 = BYORedis::DictEntry.new(3, 'value-3') # => nil
entry1.next = entry2 # => nil
entry2.next = entry3 # => nil
entry1 # => #<BYORedis::DictEntry:0x00007fd57403dbc0 @key=1, @value="value-1", @next=#<BYORedis::DictEntry:0x00007fd577a3eb58 @key=2, @value="value-2", @next=#<BYORedis::DictEntry:0x00007fd5779c3d90 @key=3, @value="value-3", @next=nil>>>

Now that the list is created, let’s assume we only have a single variable for the list, list, which is equal to entry1 in the previous example, prepending a new node is done as follows:

new_entry = BYORedis::DictEntry.new(99, 'value-99') # => nil
new_entry.next = list # => nil
list = new_entry # => nil
list # => #<BYORedis::DictEntry:0x00007fd5779b8878 @key=99, @value="value-99", @next=#<BYORedis::DictEntry:0x00007fd57403dbc0 @key=1, @value="value-1", @next=#<BYORedis::DictEntry:0x00007fd577a3eb58 @key=2, @value="value-2", @next=#<BYORedis::DictEntry:0x00007fd5779c3d90 @key=3, @value="value-3", @next=nil>>>>

This process can be repeated regardless of the size of the list. On the other hand, to append an element, we first need to find the last element, here is how we do it:

new_entry = BYORedis::DictEntry.new(99, 'value-99') # => nil
current_entry = list # => nil
current_entry = current_entry.next while current_entry.next != nil # => nil
current_entry.next = new_entry # => nil
list # => #<BYORedis::DictEntry:0x00007fd57403dbc0 @key=1, @value="value-1", @next=#<BYORedis::DictEntry:0x00007fd577a3eb58 @key=2, @value="value-2", @next=#<BYORedis::DictEntry:0x00007fd5779c3d90 @key=3, @value="value-3", @next=#<BYORedis::DictEntry:0x00007fd5779b8878 ...>>>>

The third line iterates through all the elements in the list. The more elements, the longer it’ll take. Some linked list implementations work around this issue by always maintaining a pointer to the last element.

Back to our collision handling, even though appending has a worst time complexity compared to prepending, O(n) vs O(1), given that we need to iterate through all the items in the list anyway, to make sure the key does not already exist, we could have kept a variable pointing at the last element.

Redis chooses the prepend option with the assumption that it might be common for recently added items to be accessed more frequently. If this assumption holds true, future lookups would find the key/value pair early in the list, instead of having to iterate through the end of the list.

The lines entry.next = hash_table.table[index] & hash_table.table[index] = entry insert the new entry as the head of the list. The first one makes the new entry point at the old head of the list. If it was null, then the next value will still be null and the new entry will be the only element in the list. If it wasn’t null, then next in the new entry now points at the old head, which itself, points at the rest of the list. We’re not losing anything, great! The second line is necessary so that the head of list in the bucket points at the new entry.

The last line, hash_table.used += 1, increments the counter of how many items are in the dictionary.

Finally, we use the Ruby alias keyword to create an alias for add to []=. This allows us to use a Dict instance similarly to how we use a Ruby Hash:

dict = Dict.new
# The following two are equivalent, the same method is called, Dict#add:
dict.set(1, 2)
dict[1] = 2

We used a few private methods that we should now look more closely at: main_table, rehashing_table & key_index

# dict.rb
module BYORedis
  class Dict

    # ...
    private

    def main_table
      @hash_tables[0]
    end

    def rehashing_table
      @hash_tables[1]
    end

    def key_index(key)
      expand_if_needed
      hash = SipHash.digest(RANDOM_BYTES, key)
      index = nil

      iterate_through_hash_tables_unless_rehashing do |hash_table|
        index = hash & hash_table.sizemask
        entry = hash_table.table[index]
        while entry
          # The key is already present in the hash so there's no valid index where to add it
          if entry.key == key
            return -1
          else
            entry = entry.next
          end
        end
      end

      index
    end

    # In the Redis codebase, they extensively use the following pattern:
    # for (table = 0; table <= 1; table++) {
    #   ...
    #   if (!dictIsRehashing(d)) break;
    # }
    # This is common for many operations, such as finding or deleting an item in the dict,
    # we first need to look at the main table, the first table, but we haven't found in the
    # first one, we should look in the rehashing table, the second one, but only if we're in
    # the process of rehashing.
    # Taking advantage of Ruby blocks, we can write this helper method instead
    def iterate_through_hash_tables_unless_rehashing
      @hash_tables.each do |hash_table|
        yield hash_table
        break unless rehashing?
      end
    end
  end
end

listing 6.14: helper methods in the Dict class

main_table and rehashing_table are used as aliases of @hash_tables[0] and @hash_tables[1]. key_index is used to return the index representing the location of the bucket for the given key. It first calls expand_if_needed, which we’ll explore in the next section about rehashing. Once the dictionary has been resized if needed, it computes the hash value using the SipHash class. The code for the siphash algorithm is available in Appendix B.

Once we obtained the hash value, we need to convert it to an index within the boundaries of the backing array, with the modulo operation, or as we discussed earlier, the bitwise AND operation. Before doing that, we need to take into account which table the bucket should go into, if we’re not in a rehashing state, it should go in the main table, if we are, it should go to the rehashing table.

This process of first inspecting the main table, and the rehashing table, only if we’re in rehashing state is so common that we added a helper method for that, iterate_through_hash_tables_unless_rehashing.

This method replaces the common pattern in the Redis C codebase using a for loop and a conditional break statement at the end of the first iteration. We instead leverage the Ruby block syntax to always yield back the main table to the caller, and only yield the rehashing table if we’re in a rehashing state.

The implication for key_index is that if we’re in a rehashing state, we’ll first find the index in the first table, but iterate_through_hash_tables_unless_rehashing will yield a second time and index will end up being an index for the rehashing table.

key_index is used to find the index of the bucket where a new entry should be added and it therefore does not make sense to return a value if an entry already exists with the same key. In such cases we’ll need to update the existing entry, which we’ll cover soon. This is necessary because of the rehashing process, which is explained below as well. In short, during rehashing, keys will be present in two different tables, the main one and the rehashing, and clearly separating the process of adding a new entry and updating an existing one allows to guarantees keys are unique.

The digest method in the SipHash class requires a 16-byte key composed of random bytes. Redis does generates these with the getRandomBytes function, which attempts to use /dev/urandom/ and defaults to a weaker seed based on the current time and the pid of the server if /dev/urandom is not accessible. Ruby’s SecureRandom module provides a random_bytes method which uses /dev/urandom under the hood, so let’s use it:

# server.rb
require 'socket'
require 'logger'
require 'strscan'
require 'securerandom'

LOG_LEVEL = ENV['DEBUG'] ? Logger::DEBUG : Logger::INFO
RANDOM_BYTES = SecureRandom.bytes(16)
# ...

listing 6.15: Initialization of the random byte in server.rb

Now that we implemented the set method, and its alias, []=, we need to add the get method, which will be used by the GET command, to retrieve an element from a dictionary based on its key.

module BYORedis
  class Dict

    def used
      main_table.used + rehashing_table.used
    end

    def get_entry(key)
      return if main_table.used == 0 && rehashing_table.used == 0

      rehash_step if rehashing?

      hash = SipHash.digest(RANDOM_BYTES, key)

      iterate_through_hash_tables_unless_rehashing do |hash_table|
        index = hash & hash_table.sizemask

        entry = hash_table.table[index]

        while entry
          return entry if entry.key == key

          entry = entry.next
        end
      end

      nil
    end

    def get(key)
      get_entry(key)&.value
    end
    alias [] get

    def include?(key)
      !get_entry(key).nil?
    end

    def each
      return if main_table.used == 0 && rehashing_table.used == 0

      start_index = rehashing? ? @rehashidx : 0
      main_table.table[start_index..-1].each do |bucket|
        next if bucket.nil?

        until bucket.nil?
          yield bucket.key, bucket.value
          bucket = bucket.next
        end
      end
      return unless rehashing?

      rehashing_table.each do |bucket|
        next if bucket.nil?

        until bucket.nil?
          yield bucket.key, bucket.value
          bucket = bucket.next
        end
      end
    end
  end
end

listing 6.16: get, include? and each method in Dict.rb

The get method is a small wrapper around the lower level get_entry method. This lower level method is necessary to support nil as values. Without it, calling dict.get(key) would return nil if the key is not present, but also if it is present, but the value is nil. With get_entry, we will only receive nil when the key is not present, and will receive an instance of DictEntry otherwise.

get_entry starts with an early return statement if both tables are empty. If that’s the case, there’s no need to continue, we know the key is not present in the table.

The next step is similar to add, we perform a single rehash step if we’re in a rehashing state. The approach allows Redis to incrementally work its way through the rehashing process, without affecting too much the performance of other operations. A single rehashing step does not require a lot of work, and will have a negligible impact on the performance of get, but it has the important benefits of advancing the rehashing process.

Once again, we follow a pattern similar to the pseudo code lookup_key from earlier in the chapter, we start by computing the hash value of the key. Once we have the hash value, we first need to look at the main table, and if we’re in a rehashing state, at the second table as well. We do this with the same helper method as previously, iterate_through_hash_tables_unless_rehashing. hash & hash_table.sizemask returns the location of the bucket for the key. There might be more than one item in the bucket, because of potential collisions, so we need to iterate through all of them and compare their key with the key argument. We do this with the while loop. If we do find a matching key, we abort early and return the value associated with that key.

If we inspected all the element in the main table bucket, and potentially in the rehashing table bucket, and did not find any matches, we return nil. The key is not in the hash table.

Similarly to what we did with add and []=, we alias get to [], which allows us to use the same syntax we use for Hash instances:

dict = Dict.new
# The following are equivalent, the same method is called:
dict.get(1)
dict[1]

There is a use case where we only care about the presence of a key in a dictionary, but not about the value, this is what the include? method does. We do not need to reimplement the logic, we reuse the get method, discard its value and return a boolean instead.

Now that we have the get_entry method, we can define the set method which performs the “add or update” operation, and update the []= alias to point at set and not at add anymore.

module BYORedis
  class Dict
    # ...
    def set(key, value)
      entry = get_entry(key)
      if entry
        entry.value = value
      else
        add(key, value)
      end
    end
    alias []= set
    # ...
  end
end

listing 6.17: The set method in Dict.rb aliased to []=

Finally we add the each method, which will yield all the key/value pairs in the hash table. This time we do not use the iterate_through_hash_tables_unless_rehashing. This is because we’re using a small optimization technique to avoid iterating over buckets we know are empty.

If we are in rehashing state, then we know that all buckets between 0 and the value in @rehashidx excluded have been rehashed and the values are empty. In a table of size 16, if @rehashidx is set to 5, it means that the buckets at index 0, 1, 2, 3 & 4 have been migrated and we don’t need to inspect them.

For the rehashing table, we iterate through the backing array with the each method on HashTable, which itself delegates each to its Array instance. There is no need to yield back empty buckets, so we continue to the next bucket if we find an empty one. For each non empty bucket, we iterate through the linked list of entries and yield all of them.

Resizing and rehashing

When Dict is initialized, it creates two instances of HashTable, each with an empty array. We need to implement the rehashing mechanism to allow the tables to grow and shrink as needed. The add method we discussed in the previous section starts by calling expand_if_needed:

# dict.rb
module BYORedis
  class Dict

    INITIAL_SIZE = 4
    # ...

    def expand_if_needed
      return if rehashing?

      if main_table.empty?
        expand(INITIAL_SIZE)
      elsif main_table.used >= main_table.size
        expand(main_table.used * 2)
      end
    end
    # ...
  end
end

listing 6.18: expand_if_needed in the Dict class

If we are already in the process of rehashing the dict, then we can abort early, the process will continue incrementally, there’s nothing else we should do until rehashing is over. If the table is empty we call expand with INITIAL_SIZE, which is set to 4, otherwise we call it with a capacity set to twice the current size of the dictionary.

Note that the used member can be greater than the size member. size determines the size of the array in the hash table, in other words how many buckets are in the table, and each bucket can contain more than one value because it uses a linked list to store elements. This means that we might increase by more than twice the current size, since we rely on the number of items in the dict, not the current size, to determine the new size.

Let’s now take a look at the expand method:

# dict.rb
def expand(size)
  return if rehashing? || main_table.used > size

  real_size = next_power(size)

  return if real_size == main_table.size

  new_hash_table = HashTable.new(real_size)

  # Is this the first initialization? If so it's not really a rehashing
  # we just set the first hash table so that it can accept keys.
  if main_table.table.nil?
    @hash_tables[0] = new_hash_table
  else
    @hash_tables[1] = new_hash_table
    @rehashidx = 0
  end
end

def next_power(size)
  # Ruby has practically no limit to how big an integer can be, because under the hood the
  # Integer class allocates the necessary resources to go beyond what could fit in a 64 bit
  # integer.
  # That being said, let's still copy what Redis does, since it makes sense to have an
  # explicit limit about how big our Dicts can get
  i = INITIAL_SIZE
  return MAX_SIZE if size >= MAX_SIZE

  loop do
    return i if i >= size

    i *= 2
  end
end

def rehashing?
  @rehashidx != -1
end

listing 6.19: expand, next_power and rehashing? methods in the Dict class

Similarly to resize, if we’re already in the process of rehashing, we can abort early. We also abort early if the number of items in the array is greater than the new size. In this case, there’s no point in resizing the table, it would be too small.

We use the next_power method to find the next power of two that is greater or equal to the new size. This method is used to maintain the size of the array as of power of two to leverage the bitwise AND operation to compute a modulo.

We then perform another sanity check, if the “real size”, the power of two, is equal to the current size, then there’s no point in going through a rehashing process, the table is already at the desired size.

Now that we performed all the checks, we create a new hash table of the desired size and make it the rehashing table by assigning it to @hash_tables[1]. We do have to consider the case where this is the first call to expand, in which case both tables are empty, and there’s nothing to rehash so we directly assign the newly created hash table to the main table, @hash_tables[0].

If there are elements to rehash, we flag the dict as being in a rehashing state by setting @rehashidx to 0. This instance variable acts as a pointer to the next element that should be rehashed in the main table. By being different than -1, the dict is now considered in rehashing state. This is what the rehashing? method does.

In the case where there are elements to rehash, the expand method does not actually rehash any of them. It only prepares the dictionary for rehashing. The actual rehashing is performed in two different places.

Many operations, such as set and get, will perform a single rehashing step before performing the rest of the operation. The other place where rehashing happens is through the server_cron time event we discussed in Chapter 3.

# dict.rb
def rehash_step
  rehash(1)
end

def rehash(n)
  empty_visits = n * 10
  return 0 unless rehashing?

  while n > 0 && main_table.used != 0
    n -= 1
    entry = nil

    while main_table.table[@rehashidx].nil?
      @rehashidx += 1
      empty_visits -= 1
      return 1 if empty_visits == 0
    end

    entry = main_table.table[@rehashidx]

    while entry
      next_entry = entry.next
      idx = SipHash.digest(RANDOM_BYTES, entry.key) & rehashing_table.sizemask

      entry.next = rehashing_table.table[idx]
      rehashing_table.table[idx] = entry
      main_table.used -= 1
      rehashing_table.used += 1
      entry = next_entry
    end
    main_table.table[@rehashidx] = nil
    @rehashidx += 1
  end

  # Check if we already rehashed the whole table
  if main_table.used == 0
    @hash_tables[0] = rehashing_table
    @hash_tables[1] = HashTable.new(0)
    @rehashidx = -1
    0
  else
    # There's more to rehash
    1
  end
end

listing 6.20: rehashing related methods in the Dict class

rehash_step calls rehash with the parameter 1. The parameter to rehash dictates how many items it will rehash, that is, move from the main table to the rehashing one. Let’s look at the method one line at a time.

We start by initializing the empty_visits variables to n * 10. n, the method parameter is the number of elements we want to move to the rehashing table. Because of the nature of a hash table, when iterating through the buckets, from the one at index 0 until the one at index size - 1, we don’t know how many empty buckets we’ll find. In order to prevent any potential slowness in the rehashing process, Redis uses an upper bound for the number of empty visits.

This is especially important when reducing the size of a hash table, which we’ll explore next. In this case the table we’re rehashing might be very scarcely populated and there might be many consecutive empty buckets. By using empty_visits, we ensure that the rehashing step has a known upper bound.

The next step is a sanity check, if we are not rehashing, we should exit right away, there’s nothing else to do. Note that we’re using the same semantics as Redis here, a return value of 0 indicates that all elements are rehashed, 1 indicates that there are more elements to rehash.

The while loop is the main piece of the method, it iterates as long a n is greater than 0 and as long as there are items left in the main table. The second part of the condition is a quick shortcut we attempt to take at the end of the rehashing process. After rehashing the last element in the main table, there might still be a few extra empty buckets in the array, but since we know the table does not have any more elements left by looking at the used variable, we stop instead of iterating through the last empty buckets.

The first step of the loop is to decrement n, this is a very important part of the loop that dictates the number of iterations. We also initialize an entry variable that will be used to iterate through the entries of a bucket.

@rehashidx is initialized at 0 when the rehashing process starts, so we use it as the index to look through the buckets in the main table. As long as we find empty buckets, we increment the empty_visits variable and increment @rehashidx to keep moving along in the main table. If empty_visits reaches 0, we stop the rehashing process, we’re already spent too much time iterating through empty buckets. Note that the next call to rehash will be able to continue where we left off since it will use @rehashidx.

This inner while loop will stop as soon as we encounter a non empty bucket, and we set entry to the first entry of the bucket when that happens. As we already established, we might be dealing with more than one entries, so we use another inner while loop to iterate through all the entries in the linked list.

For each of these entries, we first store the following entry in the next_entry variable and then compute the index of the entry in the rehashing table.

The next few lines perform the actual move of the entry from one table to the other, similarly to how we inserted an entry in the add method earlier. We make the next attribute of the entry variable point at the bucket in the rehashing table with entry.next = rehashing_table.table[idx]. This might either set the value to nil if the bucket in the rehashing table was empty, or would otherwise be the head of the list if it was not empty.

rehashing_table.table[idx] = entry is taking care of making the bucket in the rehashing table point at the entry variable, making it the new head of the list.

We then decrement used in the main table and increment it in the rehashing table.

Finally, in order to keep iterating through the entries in the bucket from the main table, we set entry to next_entry, which will repeat these steps with the next entry in the list.

Once we’ve moved all the entries in the bucket, one rehashing step is completed, we explicitly empty the bucket with main_table.table[@rehashidx] = nil and increment @rehashidx so that the next iteration of the outer while loop continues iterating through the main table.

Once n rehashing steps have been performed, we check a few final details. If used reached 0 in the main table, we’re done with the rehashing process, the main table is empty. We set @hash_tables[0] to the new rehashing table, effectively promoting it from rehashing table to main table, and reset the rehashing table to an empty table. Resetting @rehashidx to -1 marks the end of the rehashing process.

We return 1 otherwise, indicating that the rehashing process is not done.

The use of rehash_step makes sure that rehashing keeps happening, one step at a time, but large tables might require many rehash steps, up to millions of steps. Redis uses the server_cron time event to perform rehashing in larger steps, in order to speed up the rehashing process. This is especially useful if the server is idle for a while. If Redis would only perform rehashing steps when receiving commands, it would not make any progress while idle. This approach prevents this and takes advantage of the idle time to do so cleanup work.

Let’s add a call to a new method, databases_cron in server_cron in server.rb:

# server.rb

def server_cron
  # ...

  databases_cron

  1000 / DEFAULT_FREQUENCY
end

def databases_cron
  @data_store.resize if ht_needs_resize(@data_store)
  @expires.resize if ht_needs_resize(@expires)

  @data_store.rehash_milliseconds(1)
  @expires.rehash_milliseconds(1)
end

def slots(dict)
  dict.hash_tables[0].size + dict.hash_tables[1].size
end

def size(dict)
  dict.hash_tables[0].used + dict.hash_tables[1].used
end

def ht_needs_resize(dict)
  size = slots(dict)
  used = size(dict)

  size > Dict::INITIAL_SIZE && ((used * 100) / size < HASHTABLE_MIN_FILL)
end

listing 6.21: databases_cron method in the Server class

databases_cron performs two operations on the two dictionaries in the Server class, @data_store, that holds all the key/value pairs, and @expires which keeps track of the keys with TTLs. For each of these dictionaries, if first calls resize, which we’ve explored in the previous section, only if it thinks the dictionary needs to be resized.

A dictionary needs to be resized if the number of items in the dictionary is less than 10% of the size of the dictionary. Beside the EX & PX options, we have not yet implemented commands to delete keys, such as the DEL command, but we will soon. Once the DEL command is implemented, key/value pairs can be removed from a dictionary, meaning that a dictionary that grew to a given size might become too big at some point. Let’s look at an example:

dict = BYORedis::Dict.new
dict.hash_tables[0].size # => 0
dict.hash_tables[1].size # => 0
dict.add("1", "2") # => 1
dict.hash_tables[0].size # => 4
# We're calling to_s because Dict only works with strings and i is an Integer
100.times { |i| dict.add(i.to_s, i.to_s) } # => 100
dict.hash_tables[0].size # => 128

In the previous examples, the main table grew to a size of 128, if we were to delete all items but one, we would end up with only one non empty bucket, while the other 127 would be empty.

With small values like those it might not seem like a big difference, but as we saw earlier, Redis’ dict can grow to billions of keys, so efficiently using memory is important.

If the dictionaries need resizing, calling resize will find a good size for the dictionary and start the rehashing process.

The next step is to call rehash_milliseconds. We’ve already explained why it’s important for Redis to maintain an upper bound to all the operations that blocks other clients in the queue. By calling rehash_milliseconds(1), Redis tries to do as much rehashing as possible and stops once one millisecond has elapsed. This means that in the most pessimistic scenario possible, a client would send a command right after Redis enters the rehash_milliseconds(1) call, effectively blocking this client for at least one millisecond. This behavior is considered acceptable by Redis, but it can be disabled through the activerehashing config value. We have not yet implemented any form of configuration so we will assume that all clients of our server are ok with this behavior.

# dict.rb
def rehash_milliseconds(millis)
  start = Time.now.to_f * 1000
  rehashes = 0
  while rehash(100) == 1
    rehashes += 100
    time_elapsed = Time.now.to_f * 1000 - start

    break if time_elapsed > millis
  end
  rehashes
end

def resize
  return if rehashing?

  minimal = main_table.used
  minimal = INITIAL_SIZE if minimal < INITIAL_SIZE

  expand(minimal)
end

listing 6.22: rehash_milliseconds and resize methods in the Dict class

No more `Hash` & `{}`

Equipped with our Dict class, we can now remove all instances of Hash and replace them with Dict. We lose a little bit syntax wise, we cannot use the Hash literal syntax with { key: value }, but overall, things are not that different:

# server.rb
module BYORedis
  class Server

    COMMANDS = Dict.new
    COMMANDS.set('command', CommandCommand)
    COMMANDS.set('del', DelCommand)
    COMMANDS.set('get', GetCommand)
    COMMANDS.set('set', SetCommand)
    COMMANDS.set('ttl', TtlCommand)
    COMMANDS.set('pttl', PttlCommand)
    # ...

    def initialize
      @logger = Logger.new(STDOUT)
      @logger.level = LOG_LEVEL

      @clients = []
      @data_store = Dict.new
      @expires = Dict.new

      # ...
    end

    # ...
  end
end

listing 6.23: Replacing usages of Hash with Dict in the Server class

The SetCommand class was using a Hash to store the configuration values of the possible options. We replace it similarly to what we did for the COMMANDS constant in server.rb:

# set_command.rb
module BYORedis
  class SetCommand
    OPTIONS = Dict.new
    OPTIONS.set(
      'ex',
      CommandOptionWithValue.new(
        'expire',
        ->(value) { validate_integer(value) * 1000 },
      )
    )
    OPTIONS.set(
      'px',
      CommandOptionWithValue.new(
        'expire',
        ->(value) { validate_integer(value) },
      )
    )
    OPTIONS.set(
      'xx', CommandOption.new('presence')
    )
    OPTIONS.set(
      'nx', CommandOption.new('presence')
    )
    OPTIONS.set(
      'keepttl', CommandOption.new('expire')
    )

    # ...
    def initialize(data_store, expires, args)
      # ...
      @options = Dict.new
    end
    # ...
  end
end

listing 6.24: Replacing usages of Hash with Dict in the SetCommand class

Changing @clients to a Dict

Up until now the @clients in the Server class was an Array, which worked fine for small list of clients but would be problematic for largest lists of clients. When we process the results of IO.select, we use the returned sockets to find the matching clients within the connected clients. This is an O(n) operation with an array. We can use our new Dict class to turn it into an O(1) operation. It’s worth nothing that the performance will be worse for a small number of clients given the overhead of the Dict class, but the performance won’t change regardless of the number of connected clients.

Each socket has a fileno which returns the numeric file descriptor of the socket. This number will be different for each socket and we can therefore use it as a key to identify sockets in the Dict.

module BYORedis
  class Server
    # ...
    Client = Struct.new(:socket, :buffer, :blocked_state) do
      attr_reader :id

      def initialize(socket)
        @id = socket.fileno.to_s
        self.socket = socket
        self.buffer = ''
      end
    end
    # ...
    def initialize
      @logger = Logger.new(STDOUT)
      @logger.level = LOG_LEVEL

      @clients = Dict.new
      # ...
    end
    # ...
    def client_sockets
      sockets = []
      @clients.each { |_, client| sockets << client.socket }
      sockets
    end
    # ...
    def process_poll_events(sockets)
      sockets.each do |socket|
        begin
          if socket.is_a?(TCPServer)
            socket = @server.accept

            @clients[socket.fileno.to_s] = Client.new(socket)
          elsif socket.is_a?(TCPSocket)
            client = @clients[socket.fileno.to_s]
            client_command_with_args = socket.read_nonblock(1024, exception: false)
            if client_command_with_args.nil?
              @clients.delete(socket.fileno.to_s)
              socket.close
            elsif client_command_with_args == :wait_readable
              # There's nothing to read from the client, we don't have to do anything
              next
            elsif client_command_with_args.empty?
              @logger.debug "Empty request received from #{ socket }"
            else
              # ...
            end
          else
            raise "Unknown socket type: #{ socket }"
          end
        rescue Errno::ECONNRESET
          @clients.delete(socket.fileno.to_s)
        rescue IncompleteCommand
          # Not clearing the buffer or anything
          next
        rescue ProtocolError => e
          socket.write e.serialize
          socket.close
          @clients.delete(socket.fileno.to_s)
        end
      end
    end
    # ...
  end
end

Adding the `DEL` command

As mentioned earlier in this chapter, the only option we currently have for keys to be deleted is to set them with a TTL with the EX & PX options. It will be more convenient to manually test the behavior of our new Dict class with a more explicit option. The DEL command is very useful in that aspect, it accepts one or more keys as its arguments and attempts to delete them. It returns an integer representing the number of keys that were deleted:

module BYORedis
  class DelCommand

    def initialize(data_store, expires, args)
      @data_store = data_store
      @expires = expires
      @args = args
    end

    def call
      if @args.empty?
        RESPError.new("ERR wrong number of arguments for 'GET' command")
      else
        keys = @args
        deleted_count = 0
        keys.each do |key|
          entry = @data_store.delete(key)
          if entry != nil
            # If there was a key, then we need to delete its TTL if it had one:
            @expires.delete(key)
            deleted_count += 1
          end
        end

        RESPInteger.new(deleted_count)
      end
    end

    def self.describe
      [
        'del',
        -2, # arity
        # command flags
        [ RESPSimpleString.new('write') ],
        1, # position of first key in argument list
        -1, # position of last key in argument list
        1, # step count for locating repeating keys
        # acl categories: https://github.com/antirez/redis/blob/6.0.0/src/server.c#L161-L166
        [ '@keyspace', '@write', '@slow' ].map { |s| RESPSimpleString.new(s) },
      ]
    end
  end
end

listing 6.25: The new DelCommand class

The DelCommand class implements the behavior of the command, as well as defining the data for the COMMAND command, but it mainly relies on a non existing method on the Dict class, delete. Let’s add it:

# dict.rb
def delete(key)
  return if main_table.used == 0 && rehashing_table.used == 0

  rehash_step if rehashing?

  hash_key = SipHash.digest(RANDOM_BYTES, key)
  iterate_through_hash_tables_unless_rehashing do |hash_table|
    index = hash_key & hash_table.sizemask
    entry = hash_table.table[index]
    previous_entry = nil

    while entry
      if entry.key == key
        if previous_entry
          previous_entry.next = entry.next
        else
          hash_table.table[index] = entry.next
        end
        hash_table.used -= 1
        return entry
      end
      previous_entry = entry
      entry = entry.next
    end
  end
end

listing 6.26: The delete method in the Dict class

The logic in delete is similar to the add method. We start with a shortcut we’ve already seen, if the table is empty, there’s no need to go any further, we can stop right there.

We perform a rehashing step if needed, to keep the rehashing process moving and then we get the hash value for the key. The key we’re attempting to delete might be in the main table or the rehashing table, so we use our helper method iterate_through_hash_tables_unless_rehashing to help with this.

For each hash table we look at, we obtain the index and inspect the bucket at that location. If the bucket is empty, then there’s nothing else to do, the key is not present in this table. If the bucket is not empty, we need to iterate through all the entries and compare the keys of these entries with the method argument. If there’s a match we found the entry we need to delete, this is what the while loop is doing.

If there is a match, the if entry.key == key condition, we need to handle two cases, whether or not the entry we want to delete is the first entry in the bucket. We use the previous_entry variable to keep track of the previous entry as we iterate through the entry list in the bucket. Once we find a match, this allows us to know if the entry we’re trying to delete was the first one or not, if previous_entry is nil, then it is the first one.

In this case the deleting process is to make the head of the bucket, hash_table.table[index], point at the next element in the list. If entry was the only element, its next value would be nil, and the bucket would now be empty. If entry was the head of list with more than one element, then the bucket now starts with what was the second element in the list, the entry which next is to in entry.

If previous_entry is not nil, then entry is not at the head of the list and the deletion process requires us to change previous_entry.next to now skip over entry since we want to remove it from the list, and point to the entry in entry.next. This entry might be nil or not, and it does not matter to us. If it was nil, then it means that entry was the last element in the list, and by changing previous_entry.next to nil, previous_entry is now the last entry. If it wasn’t nil, then previous_entry now points to the element entry is pointing to, and no entries point at entry anymore, it’s now floating on its own and is not part of the list anymore.

In a language with manual memory management, such as C, we would also need to explicitly free the memory, but we’re in Ruby, so we’ll let the garbage collector do its job.

Whether or not the entry was the first one in the bucket, it’s now deleted, we decrement the used counter and return.

Conclusion

In this chapter we dived into hash tables, how Redis uses them and how to implement one in Ruby, in order to replace the Hash class.

In the next Chapter we will add new commands to handle a new data type, Lists. See you there.

Appendix A: Array from scratch in Ruby

This implementation use the fiddle gem which is packaged alongside Ruby. Fiddle provides an interface to libffi, which is a C library that allows code in one language to call code in another language. Here we use the Fiddle::Pointer class with the malloc method to explicitly allocate memory.

This implementation only works with strings and does not support resizing.

PTR_SIZE is constant set to ::Fiddle::SIZEOF_LONG, which is 8 on most modern machines. This size is in bytes, and since a byte holds 8 bits, we can see that a long is 64 bits here.

When initialized, we call Fiddle::Pointer.malloc(PTR_SIZE * size) to allocate enough memory to store n pointers, where n is the given size.

Adding an element, beside the sanity checks around the size and the type requires the following step:

We obtain a pointer to the given string with Pointer.to_ptr(str)
We compute the location of the next element in the array and save it in offset. If the array is empty it’s gonna be 0, if there’s one element, it’ll be 8, etc …
We use the []= notation of the Pointer class to write the 8 bytes representing the address of the given string, which is what the value of ptr.ref. This is similar to the & operator used to return the address of variable.
We increment the size by one.

The get method is similar, we transform the index to an offset by multiplying it by the size of a pointer, 8. And we read a whole pointer, 8 bytes, from there, with the [] method of the Pointer class.

We use .unpack('Q') on the result because the [] method of Pointer returns a string of 8 bytes, representing the 64-bit integer holding the address of the string stored in the array. But we need the actual integer value, to pass to Pointer.new, and that’s what unpack('Q') does, it “unpacks” the string to a long long, a 64-bit integer, and returns that as an array, so we grab the first element with [0].

to_s on Pointer is smart enough to read all the bytes until it finds a null byte and return the result as a Ruby string, great, so we don’t have to deal with that!

require 'fiddle'

class BYOArray < BasicObject

  PTR_SIZE = ::Fiddle::SIZEOF_LONG

  def initialize(max_size)
    @max_size = max_size
    @current_size = 0
    @beginning_address = ::Fiddle::Pointer.malloc(PTR_SIZE * max_size)
  end

  def add(str)
    ::Kernel.raise 'Array is full' if @current_size == @max_size
    ::Kernel.raise 'Expected a string' unless str.is_a?(::String)

    ptr = ::Fiddle::Pointer.to_ptr(str)
    offset = @current_size * PTR_SIZE # 0 at first, then 8, 16, etc ...

    @beginning_address[offset, PTR_SIZE] = ptr.ref

    @current_size += 1
    self
  end

  def get(i)
    return nil if i < 0 || i >= @current_size

    address = @beginning_address[i * PTR_SIZE, PTR_SIZE].unpack('Q')[0]
    ::Fiddle::Pointer.new(address).to_s
  end

  def to_s
    "Size: #{ @current_size }"
  end
end

listing 6.27: A Ruby implementation of an Array structure

ary = BYOArray.new(2)
::Kernel.puts ary.add("foo")
::Kernel.puts ary.add("bar")

::Kernel.puts ary.get(0)

::Kernel.puts ary.get(1)
::Kernel.puts ary.get(2)
::Kernel.puts ary.add("bar")

Note that this is a very very rough sketch of how Fiddle can be used to allocate memory from Ruby in an array-like manner. Among probably other issues, a huge problem with this implementation is that the memory allocated in the constructor is never de-allocated. Fiddle exposes a free method, but since Ruby objects are deallocated by the Garbage Collector, it is non-trivial to figure out how and when to call free. The ObjectSpace module has a define_finalizer method which can be used to define a proc that will be called when the object is garbage collected. This StackOverflow answer shows an example of how it can be used:

class MyClass
  def initialize
    ObjectSpace.define_finalizer(
      self,
      self.class.method(:finalize).to_proc
    )
  end

  def MyClass.finalize(id)
    puts "Object #{id} dying at #{Time.new}"
  end
end

listing 6.28: Using define_finalizer to register a callback on object destruction

Appendix B: A Ruby implementation of SipHash

The following is adapted from the siphash gem. The gem implements the 2-4 version, and the version below implements the same one Redis uses, 1-2. These two numbers, named c & d in the siphash paper represent the number of compression steps and finalization steps. Variants with higher number of compression and finalization steps are supposed to provide a higher security at the cost of being slower. The following is a quote from the Redis implementation:

We use SipHash 1-2. This is not believed to be as strong as the suggested 2-4 variant, but AFAIK there are not trivial attacks against this reduced-rounds version, and it runs at the same speed as Murmurhash2 that we used previously, why the 2-4 variant slowed down Redis by a 4% figure more or less.

# Credit to https://github.com/emboss/siphash-ruby/blob/master/lib/siphash.rb
class SipHash

  # Ruby's Integer class allows numbers by going passed the max value of a 64 bit integer,
  # by encoding the value across multiple 64-bit integers under the hood. In order to make
  # sure that the values we deal with stay within the 64-bit range, we use the following
  # constant as the right operand of an AND bitwise operation
  MASK_64 = 0xffffffffffffffff

  def self.digest(key, msg, compress_rounds: 1, finalize_rounds: 2)
    new(key, msg, compress_rounds, finalize_rounds).digest
  end

  def initialize(key, msg, compress_rounds, finalize_rounds)
    @msg = msg
    @compress_rounds = compress_rounds
    @finalize_rounds = finalize_rounds

    # These are the four 64-bit integers of internal state. The initial values are based on the
    # arbitrary string: "somepseudorandomlygeneratedbytes"
    # "somepseu".split('').map(&:ord).map { |b| b.to_s(16) }.join # => "736f6d6570736575"
    # The 64-bit value is 8317987319222330741, its binary representation is:
    # 0111 0011 0110 1111 0110 1101 0110 0101 0111 0000 0111 0011 0110 0101 0111 0101
    # Which we can obtain with
    # "736f6d6570736575".scan(/../).map { |h| h.hex } # => [115, 111, 109, 101, 112, 115, 101, 117]
    # [115, 111, 109, 101, 112, 115, 101, 117].pack('c8') # => "somepseu"
    # "somepseu".unpack('Q>') # => 8317987319222330741
    # '%064b' % 8317987319222330741 # =>
    # 0111 0011 0110 1111 0110 1101 0110 0101 0111 0000 0111 0011 0110 0101 0111 0101
    #
    # Note that we used 'Q>' which tells unpack to assume big-endianness. Using the default of
    # Q< would have returned the same 8 bytes but in the opposite order, from right to left:
    # "somepseu".unpack('Q<') # => 8459294401660546931
    # '%064b' % 8459294401660546931 # =>
    # 0111 0101 0110 0101 0111 0011 0111 0000 0110 0101 0110 1101 0110 1111 0111 0011
    # These are the same bytes but in different order, the character s is the following 8 bits:
    # ('%08b' % 's'.ord).scan(/..../).join(' ') # => "0111 0011"
    # And the character o is:
    # ('%08b' % 'o'.ord).scan(/..../).join(' ') # => "0110 1111"
    #
    # We can see that in the big endian version, these two bytes are on the left side, and
    # they're on the right side with the little endian version
    #
    # "dorandom".split('').map(&:ord).map { |b| b.to_s(16) }.join # => "646f72616e646f6d"
    # "lygenera".split('').map(&:ord).map { |b| b.to_s(16) }.join # => "6c7967656e657261"
    # "tedbytes".split('').map(&:ord).map { |b| b.to_s(16) }.join # => "7465646279746573"
    @v0 = 0x736f6d6570736575
    @v1 = 0x646f72616e646f6d
    @v2 = 0x6c7967656e657261
    @v3 = 0x7465646279746573

    # The key argument is a 16 byte string, which we want to unpack to two 64-bit integers.
    # A byte contains 8 bits, so one 64-bit integer is composed of 8 bytes, and one 16 byte
    # string can be unpacked to two 64-bit integers.
    # The first line grabs the first 8 bytes with the slice method, and calls unpack with the
    # Q< argument. Q means that Ruby will attempt to unpack as a long, and < is the explicit
    # way of telling it to unpack it as little endian, which is the default on many modern CPUs
    k0 = key.slice(0, 8).unpack('Q<')[0]
    # This line does the same thing with the last 8 bytes of the key
    k1 = key.slice(8, 8).unpack('Q<')[0]

    # The ^ Ruby operator is the XOR bitwise operation, a ^= b is equivalent to a = a ^ b
    # These four lines initialize the four 64-bit integers of internal state with the two
    # parts of the secret, k0 and k1
    @v0 ^= k0
    @v1 ^= k1
    @v2 ^= k0
    @v3 ^= k1
  end

  def digest
    iter = @msg.size / 8

    # Compression step
    iter.times do |i|
      m = @msg.slice(i * 8, 8).unpack('Q<')[0]
      compress(m)
    end

    # Compression of the last characters
    m = last_block(iter)
    compress(m)

    # Finalization step
    finalize

    # Digest result
    @v0 ^ @v1 ^ @v2 ^ @v3
  end

  private

  # This function might look a little bit complicated at first, it implements this section of
  # the compression part of paper, 2.2:
  # where m w-1 includes the last 0 through 7 bytes of m followed by null bytes and ending with a
  # byte encoding the positive integer b mod 256
  def last_block(iter)
    # We initialize last as a 64-bit integer where the leftmost byte, the 8 bits to the left are
    # set to the length of the input. 8 bits can only encode a value up to 255, so if the length
    # is more than 255, the extra bites are discarded, for a length of 11, last would look like
    # 0000 1011 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 000 00000
    # 11 is encoded as 1011 as binary, calling << 56 on it pushes it 56 bits to the left
    last = (@msg.size << 56) & MASK_64

    left = @msg.size % 8
    off = iter * 8

    # At this point, we've applied the compress step to all the 8 byte chunks in the input
    # string, but if the length was not a multiple of 8, there might be between 1 & 7 more bytes
    # we have not compressed yet.
    # For instance, if the string was 'hello world', the length is 11. The digest method would
    # have computed an iter value of 1, because 11 / 8 => 1, and called compress once, with the
    # first 8 bytes, obtained with the slice method, 'hello world'.slice(0, 8) => 'hello wo'
    # We still need to compress the last 3 bytes, 'rld'
    # In this example, left will be 3, and off 8
    # Note that this case/when statement is written to optimize for readability, there are ways
    # to express the same instructions with less code
    case left
    when 7
      last |= @msg[off + 6].ord << 48
      last |= @msg[off + 5].ord << 40
      last |= @msg[off + 4].ord << 32
      last |= @msg[off + 3].ord << 24
      last |= @msg[off + 2].ord << 16
      last |= @msg[off + 1].ord << 8
      last |= @msg[off].ord
    when 6
      last |= @msg[off + 5].ord << 40
      last |= @msg[off + 4].ord << 32
      last |= @msg[off + 3].ord << 24
      last |= @msg[off + 2].ord << 16
      last |= @msg[off + 1].ord << 8
      last |= @msg[off].ord
    when 5
      last |= @msg[off + 4].ord << 32
      last |= @msg[off + 3].ord << 24
      last |= @msg[off + 2].ord << 16
      last |= @msg[off + 1].ord << 8
      last |= @msg[off].ord
    when 4
      last |= @msg[off + 3].ord << 24
      last |= @msg[off + 2].ord << 16
      last |= @msg[off + 1].ord << 8
      last |= @msg[off].ord
    when 3
      # In the example documented above, this is the branch of the case/when we would
      # end up in. last is initially set to:
      # 0000 1011 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 000 00000
      # @msg[off + 2] is the character d, calling ord returns its integer value, 100
      # We shift it 16 bits to the left, effectively inserting as the 6th byte starting
      # from the left, or third from the right:
      # 0000 1011 0000 0000 0000 0000 0000 0000 0000 0000 0110 0100 0000 0000 0000 0000
      last |= @msg[off + 2].ord << 16
      # @msg[off + 1]is the character l, ord returns 108, and we insert it as the 7th byte
      # from the left, or second from the right:

      last |= @msg[off + 1].ord << 8
      # Finally, @msg[off] is the character r, with the ord value 100, and we insert it as the
      # byte from the left, or first from the right:
      # 0000 1011 0000 0000 0000 0000 0000 0000 0000 0000 0110 0100 0110 1100 0111 0010
      last |= @msg[off].ord
    when 2
      last |= @msg[off + 1].ord << 8
      last |= @msg[off].ord
    when 1
      last |= @msg[off].ord
    when 0
      last
    else
      raise "Something unexpected happened with the r value: #{ r }, should be between 0 & 7"
    end

    # Last is now a 64 bit integer containing the length of the input and the last bytes, if any:
    last
  end

  # rotl64 is the left rotation, also called circular shift:
  # https://en.wikipedia.org/wiki/Circular_shift
  def rotl64(num, shift)
    ((num << shift) & MASK_64) | (num >> (64 - shift))
  end

  # This is the main step of the siphash algorithm. A big difference with the C implementation
  # in the Redis codebase is the use of & MASK_64. As explained above, it is used to apply an
  # upper bounds to the results as Ruby would allow them to go past the max value of 64-bit
  # integer: 2^64 - 1
    def sip_round
    @v0 = (@v0 + @v1) & MASK_64
    @v2 = (@v2 + @v3) & MASK_64
    @v1 = rotl64(@v1, 13)
    @v3 = rotl64(@v3, 16)
    @v1 ^= @v0
    @v3 ^= @v2
    @v0 = rotl64(@v0, 32)
    @v2 = (@v2 + @v1) & MASK_64
    @v0 = (@v0 + @v3) & MASK_64
    @v1 = rotl64(@v1, 17)
    @v3 = rotl64(@v3, 21)
    @v1 ^= @v2
    @v3 ^= @v0
    @v2 = rotl64(@v2, 32)
  end

  def compress(m)
    @v3 ^= m
    @compress_rounds.times { sip_round }
    @v0 ^= m
  end

  def finalize
    @v2 ^= 0xff
    @finalize_rounds.times { sip_round }
  end
end

listing 6.29: A Ruby implementation of the siphash 1-2 algorithm

Contents