Redis offers a classic implementation of hash tables. To handle hash collisions, Redis employs chained hashing, which links data with the same hash value in a list rather than expanding the table. This ensures that all data remains accessible within the hash table. To minimize the performance impact of rehashing, Redis uses a incremental rehashing strategy, which distributes the workload of rehashing over time to reduce system overhead.

In this article, I’ll guide you through the core design principles and implementation details of Redis hash tables. By understanding how Redis manages hash collisions and optimizes rehashing, you’ll gain the tools to implement high-performance hash tables when handling large datasets in practical applications.

How does Redis Implement Chained Hashing?

Before we dive into Redis’ implementation of chained hashing, it’s essential to first understand the structure of Redis hash tables and the cause of hash collisions as data volume increases. This context will help clarify how chained hashing effectively addresses these collisions.

What is a Hash Collision?

At its core, a hash table is an array where each element represents a hash bucket. The first element is bucket 0, the second is bucket 1, and so on. When a key-value pair is inserted, the key is processed by a hash function, and the result is taken modulo the number of elements in the array. This gives the index (or bucket) where the key-value pair will be stored.

For example, as shown in the diagram below, after hashing **<font style="background-color:rgb(249, 250, 251);">key1</font>**, it maps to bucket 1. Similarly, **<font style="background-color:rgb(249, 250, 251);">key3</font>** and **<font style="background-color:rgb(249, 250, 251);">key16</font>** map to buckets 7 and 4, respectively.

From the diagram, we see that there are 16 keys to store, but the hash table only has 8 available slots. This leads to a situation where multiple keys are assigned to the same bucket, resulting in a hash collision.

In practical use, it’s often difficult to predict exactly how much data will need to be stored in a hash table. Creating an oversized hash table upfront can lead to wasted space if the data volume remains low. For this reason, hash tables are typically initialized with a smaller size, with the expectation that the number of keys (keyspace) will eventually outgrow the initial table size.

This mismatch between keyspace and hash table size inevitably causes different keys to be mapped to the same bucket by the hash function, resulting in hash collisions. If each bucket can only store a single key-value pair, this limits the hash table’s ability to manage large datasets effectively. For example, if both **<font style="background-color:rgb(249, 250, 251);">key3</font>** and **<font style="background-color:rgb(249, 250, 251);">key100</font>** are mapped to bucket 5, and bucket 5 can only hold one key, one of the keys would fail to be stored in the hash table.

To resolve hash collisions, two common solutions are used:

  1. Chaining, which I’ll introduce next, allows multiple keys to be stored in the same bucket by linking them in a list. However, it’s important to ensure that chains don’t become too long, as this can degrade the hash table’s performance.
  2. Rehashing, which redistributes the keys when the chain length exceeds a certain threshold. Rehashing can be costly, but Redis mitigates this with incremental rehashing, which spreads the rehashing process over time.

Let’s now explore how chained hashing is designed and implemented to handle hash collisions.

How is Chained Hashing Designed and Implemented?

Chained hashing is a technique where keys mapped to the same bucket in a hash table are linked together using a linked list. Let’s explore how Redis implements chained hashing and why it’s effective in resolving hash collisions.

First, we need to understand how the hash table is implemented in Redis. The source code related to Redis’s hash table can be found in two main files: **<font style="background-color:rgb(249, 250, 251);">dict.h</font>** and **<font style="background-color:rgb(249, 250, 251);">dict.c</font>**. The **<font style="background-color:rgb(249, 250, 251);">dict.h</font>** file defines the structure of the hash table, hash entries, and various operation functions, while **<font style="background-color:rgb(249, 250, 251);">dict.c</font>** contains the actual implementation of these operations.

In **<font style="background-color:rgb(249, 250, 251);">dict.h</font>**, the hash table is defined as a two-dimensional array (**<font style="background-color:rgb(249, 250, 251);">dictEntry **table</font>**), where each element is a pointer to a hash entry (**<font style="background-color:rgb(249, 250, 251);">dictEntry</font>**). The following code snippet shows how the hash table is defined in **<font style="background-color:rgb(249, 250, 251);">dict.h</font>**.

1
2
3
4
5
6
7
8
/* This is our hash table structure. Every dictionary has two of this as we
* implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
dictEntry **table;
unsigned long size;
unsigned long sizemask;
unsigned long used;
} dictht;

To implement chained hashing, Redis’s **<font style="background-color:rgb(249, 250, 251);">dictEntry</font>** structure is designed to include a pointer not only to the key and value but also to the next hash entry. This is what creates the linked list for handling collisions. As shown in the code, the **<font style="background-color:rgb(249, 250, 251);">dictEntry</font>** structure includes a pointer, **<font style="background-color:rgb(249, 250, 251);">*next</font>**, which points to another **<font style="background-color:rgb(249, 250, 251);">dictEntry</font>** structure—this is how Redis implements chained hashing.

1
2
3
4
5
6
7
8
9
10
typedef struct dictEntry {
void *key;
union {
void *val;
uint64_t u64;
int64_t s64;
double d;
} v;
struct dictEntry *next;
} dictEntry;

In addition to the **<font style="background-color:rgb(249, 250, 251);">next</font>** pointer for chaining, there’s another interesting detail worth noting. The value in the key-value pair is defined by a union, **<font style="background-color:rgb(249, 250, 251);">v</font>**, in the **<font style="background-color:rgb(249, 250, 251);">dictEntry</font>** structure. This union includes a pointer to the actual value (**<font style="background-color:rgb(249, 250, 251);">*val</font>**), as well as fields for storing an unsigned 64-bit integer, a signed 64-bit integer, or a **<font style="background-color:rgb(249, 250, 251);">double</font>** value.

The reason I’m pointing this out is that this design choice is a clever memory-saving technique. When the value is an integer or a double, which are 64-bit types, there’s no need for an additional pointer to store the value. Instead, the value can be stored directly within the key-value structure, reducing the need for extra memory.

At this point, you should have a solid understanding of how Redis implements chained hashing. But you might still be wondering how this helps resolve hash collisions.

Let me explain using an example: both **<font style="background-color:rgb(249, 250, 251);">key3</font>** and **<font style="background-color:rgb(249, 250, 251);">key100</font>** are mapped to bucket 5 in the hash table. With chained hashing, bucket 5 doesn’t just store **<font style="background-color:rgb(249, 250, 251);">key3</font>** or **<font style="background-color:rgb(249, 250, 251);">key100</font>**—instead, it uses a linked list to connect **<font style="background-color:rgb(249, 250, 251);">key3</font>** and **<font style="background-color:rgb(249, 250, 251);">key100</font>**, as shown in the diagram below. As more keys are mapped to bucket 5, they are all linked together in a list, thus managing the collisions.

When we want to search for **<font style="background-color:rgb(249, 250, 251);">key100</font>**, we start by computing its hash value and finding that it maps to bucket 5. We then traverse the linked list in bucket 5, comparing each key until we find **<font style="background-color:rgb(249, 250, 251);">key100</font>**. This is how chained hashing allows us to resolve collisions and locate the desired entry.

However, there is a drawback to chained hashing. As the linked list in a single bucket grows longer, the time it takes to search for a key in that bucket increases, which can degrade the overall performance of the hash table.

So, is there a way to minimize the performance impact? Yes! That brings us to the next topic: the design and implementation of rehashing.

How does Redis Implement Rehashing?

Rehashing in Redis essentially refers to expanding the size of the hash table. The basic approach Redis takes to implement rehashing is as follows:

  1. Redis uses two hash tables alternately during rehashing. As I mentioned earlier, Redis defines the hash table using the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictht</font> structure in <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dict.h</font>. However, when actually using the hash table, Redis also defines a <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dict</font> structure in the same file. This structure contains an array (<font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[2]</font>) with two hash tables: <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> and <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font>. The <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dict</font> structure is defined in the code as follows:

1
2
3
4
5
6
7
typedef struct dict {
dictType *type;
void *privdata;
dictht ht[2];
long rehashidx; /* rehashing not in progress if rehashidx == -1 */
int16_t pauserehash; /* If >0 rehashing is paused (<0 indicates coding error) */
} dict;

  1. **During normal operation, all key-value pairs are written to **<font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font>.
  2. When rehashing occurs, key-value pairs are gradually migrated from <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> to <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font>.
  3. After migration is complete, the memory space of <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> is released, and the address of <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font> is assigned to <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font>. Then, the size of <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font> is reset to zero. At this point, Redis resumes normal operations, with <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> handling requests and <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font> reserved for the next rehash.

To help you understand this alternating process between <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> and <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font>, I’ve included a diagram below.

Now that you have a basic understanding of how Redis alternates between two hash tables to implement rehashing, let’s discuss some important questions that need to be addressed during rehashing. I believe there are three key issues to consider:

  1. When should rehashing be triggered?
  2. How much should the hash table be expanded?
  3. How should rehashing be carried out?

Next, I’ll walk you through the code implementation for each of these questions, so you can clearly see how Redis handles these aspects of rehashing.

When Should Rehashing Be Triggered?

To determine when rehashing is needed, Redis uses the function <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">_dictExpandIfNeeded</font>. Let’s first look at the conditions within this function that trigger resizing, and then explore where this function is called in the Redis codebase.

There are three conditions defined in the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">_dictExpandIfNeeded</font> function that can trigger expansion:

  1. Condition 1: The size of <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> is zero.
  2. Condition 2: The number of elements in <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> exceeds its current size, and the hash table can be resized.
  3. Condition 3: The number of elements in <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> is a multiple of the table size, defined by <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dict_force_resize_ratio</font> (which defaults to 5).

Here’s a snippet of the code defining these conditions in the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">_dictExpandIfNeeded</font> function:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
/* Expand the hash table if needed */
static int _dictExpandIfNeeded(dict *d)
{
/* Incremental rehashing already in progress. Return. */
if (dictIsRehashing(d)) return DICT_OK;

/* If the hash table is empty expand it to the initial size. */
if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

/* If we reached the 1:1 ratio, and we are allowed to resize the hash
* table (global setting) or we should avoid it but the ratio between
* elements/buckets is over the "safe" threshold, we resize doubling
* the number of buckets. */
if (!dictTypeExpandAllowed(d))
return DICT_OK;
if ((dict_can_resize == DICT_RESIZE_ENABLE &&
d->ht[0].used >= d->ht[0].size) ||
(dict_can_resize != DICT_RESIZE_FORBID &&
d->ht[0].used / d->ht[0].size > dict_force_resize_ratio))
{
return dictExpand(d, d->ht[0].used + 1);
}
return DICT_OK;
}

For the first condition, if the hash table is empty, Redis simply sets the table to its initial size, which is part of initialization, not rehashing.

Conditions two and three, however, correspond to rehashing scenarios. In both cases, Redis compares the number of elements in the hash table (<font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">d->ht[0].used</font>) with the table’s current size (<font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">d->ht[0].size</font>). This ratio is known as the load factor. Redis triggers rehashing when the load factor is either equal to or greater than 1, or greater than 5.

When the load factor exceeds 5, it indicates that the hash table is severely overloaded and needs to be resized immediately. If the load factor is 1 or greater, Redis checks the value of the variable <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dict_can_resize</font> to determine if resizing is currently allowed.

You may wonder, what exactly is the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dict_can_resize</font> variable? This variable is controlled by two functions: <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictEnableResize</font> and <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictDisableResize</font>, which enable and disable hash table resizing, respectively:

1
2
3
4
5
6
7
void dictEnableResize(void) {
dict_can_resize = 1;
}

void dictDisableResize(void) {
dict_can_resize = 0;
}

These functions are wrapped within <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">updateDictResizePolicy</font>. This function enables or disables rehashing based on whether Redis is currently running an RDB snapshot or AOF rewrite process. If neither process is running, rehashing is allowed:

1
2
3
4
5
6
void updateDictResizePolicy(void) {
if (server.rdb_child_pid == -1 && server.aof_child_pid == -1)
dictEnableResize();
else
dictDisableResize();
}

Now that we understand the conditions for triggering rehashing in <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">_dictExpandIfNeeded</font>, let’s look at where this function is called in Redis. By examining the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dict.c</font> file, we can see that <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">_dictExpandIfNeeded</font> is invoked by the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">_dictKeyIndex</font> function, which is itself called by <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictAddRaw</font>. The <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictAddRaw</font> function is then called by three key functions:

  1. <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictAdd</font>: Adds a key-value pair to the hash table.
  2. <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictReplace</font>: Adds a key-value pair or replaces an existing one.
  3. <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictAddOrFind</font>: Calls <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictAddRaw</font> directly.

Thus, every time Redis adds or modifies a key-value pair, it checks whether rehashing is necessary. The following diagram illustrates the call relationships leading to <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">_dictExpandIfNeeded</font>.

In summary, the key functions responsible for triggering rehashing in Redis are <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">_dictExpandIfNeeded</font> and <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">updateDictResizePolicy</font>. The former checks the load factor and whether resizing is allowed, while the latter enables or disables resizing based on RDB and AOF processes.

Next, we’ll explore the second major question in rehashing: how much should Redis expand the hash table?

How Much Should the Hash Table Be Expanded?

In Redis, when resizing the hash table space due to rehashing, it calls the **<font style="background-color:rgb(249, 250, 251);">dictExpand</font>** function. This function takes two parameters: the hash table to be resized (**<font style="background-color:rgb(249, 250, 251);">dict *d</font>**) and the target size (**<font style="background-color:rgb(249, 250, 251);">unsigned long size</font>**). The prototype of the **<font style="background-color:rgb(249, 250, 251);">dictExpand</font>** function is shown below:

1
int dictExpand(dict *d, unsigned long size);

To determine whether resizing is needed for a hash table, we use the **<font style="background-color:rgb(249, 250, 251);">_dictExpandIfNeeded</font>** function. Once resizing is deemed necessary, Redis expands the hash table during rehashing by doubling its current size (**<font style="background-color:rgb(249, 250, 251);">size * 2</font>**) if the current table’s used space is **<font style="background-color:rgb(249, 250, 251);">size</font>**, as illustrated in the following code snippet:

1
dictExpand(d, d->ht[0].used + 1);

The actual resizing process in the **<font style="background-color:rgb(249, 250, 251);">dictExpand</font>** function is managed by the **<font style="background-color:rgb(249, 250, 251);">_dictNextPower</font>** function. This function continuously doubles the hash table’s initial size (**<font style="background-color:rgb(249, 250, 251);">DICT_HT_INITIAL_SIZE</font>**) until it reaches the target size, as shown in the code snippet below:

1
2
3
4
5
6
7
8
9
10
11
12
/* Our hash table capability is a power of two */
static unsigned long _dictNextPower(unsigned long size)
{
unsigned long i = DICT_HT_INITIAL_SIZE;

if (size >= LONG_MAX) return LONG_MAX + 1LU;
while(1) {
if (i >= size)
return i;
i *= 2;
}
}

Now, let’s delve into Redis’s approach to addressing the third issue: how to execute rehashing. Essentially, this involves Redis implementing a incremental rehashing design.

How is Incremental Rehashing Implemented?

First, let’s clarify why we need incremental rehashing in the first place. When a hash table undergoes rehashing, the hash table space expands, and keys that were mapped to certain positions may now need to be relocated to new ones. This means many keys must be copied from their old positions to new ones. During this key copying process, Redis’s main thread is unable to handle other requests, causing the rehashing process to block the main thread and lead to performance overhead.

To mitigate this overhead, Redis introduced the concept of incremental rehashing.

In simple terms, incremental rehashing means Redis doesn’t copy all the keys from the current hash table to the new one in a single operation. Instead, it copies them in batches, with each operation only transferring keys from one bucket at a time. This limits the duration of each copying operation, thereby reducing its impact on the main thread.

So, how is incremental rehashing implemented in the code? There are two key functions involved: <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictRehash</font> and <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">_dictRehashStep</font>.

Let’s start with the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictRehash</font> function, which performs the actual key copying. It takes two input parameters: the global hash table (which consists of the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dict</font> structure holding <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> and <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font>) and the number of buckets to be rehashed.

The logic of the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictRehash</font> function is divided into two parts:

  1. First, it runs a loop, copying the keys in each bucket according to the specified number of buckets <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">n</font>. Of course, if all the data in <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> has already been transferred, the copying loop stops.
  2. Second, after copying <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">n</font> buckets, the function checks whether all the data in <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> has been migrated. If so, the space for <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> is released. Since Redis always references <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> when handling requests, once the rehashing process is complete and the data is fully in <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font>, Redis assigns <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font> to <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font>, ensuring that other parts of the code can continue functioning as expected. After this, the size of <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font> is reset to zero in preparation for the next rehash. At the same time, the global hash table’s <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">rehashidx</font> variable is set to -1, indicating that the rehash process has finished (I’ll explain the purpose of the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">rehashidx</font> variable shortly).

Here’s a diagram that illustrates the primary steps of the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictRehash</font> function.

You can also check the code snippet below to better understand its logic.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
/* Performs N steps of incremental rehashing. Returns 1 if there are still
* keys to move from the old to the new hash table, otherwise 0 is returned.
*
* Note that a rehashing step consists in moving a bucket (that may have more
* than one key as we use chaining) from the old to the new hash table, however
* since part of the hash table may be composed of empty spaces, it is not
* guaranteed that this function will rehash even a single bucket, since it
* will visit at max N*10 empty buckets in total, otherwise the amount of
* work it does would be unbound and the function may block for a long time. */
int dictRehash(dict *d, int n) {
int empty_visits = n*10; /* Max number of empty buckets to visit. */
unsigned long s0 = d->ht[0].size;
unsigned long s1 = d->ht[1].size;
if (dict_can_resize == DICT_RESIZE_FORBID || !dictIsRehashing(d)) return 0;
if (dict_can_resize == DICT_RESIZE_AVOID &&
((s1 > s0 && s1 / s0 < dict_force_resize_ratio) ||
(s1 < s0 && s0 / s1 < dict_force_resize_ratio)))
{
return 0;
}

while(n-- && d->ht[0].used != 0) {
dictEntry *de, *nextde;

/* Note that rehashidx can't overflow as we are sure there are more
* elements because ht[0].used != 0 */
assert(d->ht[0].size > (unsigned long)d->rehashidx);
while(d->ht[0].table[d->rehashidx] == NULL) {
d->rehashidx++;
if (--empty_visits == 0) return 1;
}
de = d->ht[0].table[d->rehashidx];
/* Move all the keys in this bucket from the old to the new hash HT */
while(de) {
uint64_t h;

nextde = de->next;
/* Get the index in the new hash table */
h = dictHashKey(d, de->key) & d->ht[1].sizemask;
de->next = d->ht[1].table[h];
d->ht[1].table[h] = de;
d->ht[0].used--;
d->ht[1].used++;
de = nextde;
}
d->ht[0].table[d->rehashidx] = NULL;
d->rehashidx++;
}

/* Check if we already rehashed the whole table... */
if (d->ht[0].used == 0) {
zfree(d->ht[0].table);
d->ht[0] = d->ht[1];
_dictReset(&d->ht[1]);
d->rehashidx = -1;
return 0;
}

/* More to rehash... */
return 1;
}

Now that we’ve covered the core functionality of the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictRehash</font> function, let’s explore how incremental rehashing works by migrating data at the bucket level. This is where the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">rehashidx</font> variable in the global hash table structure comes into play.

The <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">rehashidx</font> variable indicates which bucket is currently being migrated. For instance, when <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">rehashidx</font> is 0, the first bucket in <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> is being migrated; when it’s 1, the second bucket is being migrated, and so on.

The main loop in the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictRehash</font> function checks whether the bucket pointed to by <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">rehashidx</font> is empty. If it is, <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">rehashidx</font> is incremented, and the next bucket is examined.

But is it possible for multiple consecutive buckets to be empty? Yes, it is. In such cases, incremental rehashing doesn’t just keep incrementing <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">rehashidx</font> indefinitely. The reason is that during rehashing, the Redis main thread is unable to handle other requests. Therefore, Redis introduces a variable called <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">empty_visits</font> to track how many empty buckets have been checked. After a certain number of empty buckets have been examined, the current round of rehashing stops, allowing Redis to process external requests and avoid performance degradation. You can see this logic in the code snippet below.

1
2
3
4
while(d->ht[0].table[d->rehashidx] == NULL) {
d->rehashidx++;
if (--empty_visits == 0) return 1;
}

If the bucket that <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">rehashidx</font> points to contains data for migration, Redis will take each hash entry from that bucket, recalculate its new position in <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font> based on the size of <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font>, and insert the hash entry into the appropriate bucket in <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font>. Each time a hash entry is migrated, the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">used</font> variables in <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> and <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font> (which track how many entries each table holds) are updated—<font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">used</font> in <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[0]</font> decreases by one, while <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">used</font> in <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">ht[1]</font> increases by one. Once all data in the current bucket is migrated, <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">rehashidx</font> is incremented, and the process continues with the next bucket. The code snippet below illustrates this migration process.

By now, we’ve covered the complete logic of the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictRehash</font> function. We know that <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictRehash</font> migrates hash entries on a bucket-by-bucket basis, with the number of buckets processed being determined by the loop count parameter <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">n</font>. Whenever Redis performs a rehash operation, it eventually calls the <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictRehash</font> function.

Next, let’s look at the second key function involved in incremental rehashing: <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">_dictRehashStep</font>, which performs rehashing one bucket at a time. According to Redis’s source code, there are five functions that call <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">_dictRehashStep</font>, which in turn calls <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictRehash</font> to carry out the rehashing process. These functions are: <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictAddRaw</font>, <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictGenericDelete</font>, <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictFind</font>, <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictGetRandomKey</font>, and <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictGetSomeKeys</font>.

The functions <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictAddRaw</font> and <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictGenericDelete</font> are used to add and delete key-value pairs in Redis, while the other three functions are for querying. The diagram below shows the relationships between these functions.

It’s important to note that regardless of whether you’re adding, deleting, or querying, when these five functions call <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">_dictRehashStep</font>, the loop count parameter <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">n</font> passed to <font style="color:rgb(55, 65, 81);background-color:rgb(249, 250, 251);">dictRehash</font> is always set to 1, as shown in the following code:

1
2
3
4
static void _dictRehashStep(dict *d) {
// Passes 1 as the loop count parameter to `dictRehash`, meaning that after migrating one bucket, normal operations resume
if (d->iterators == 0) dictRehash(d, 1);
}

This means that after migrating a single bucket, the hash table is free to handle regular add, delete, or query operations. This is how incremental rehashing is implemented at the code level.

Conclusion

Creating a high-performance hash table is not only a priority for Redis but also a crucial goal in the development of many computer systems. To build an efficient hash table, two key issues need to be addressed: hash collisions and the overhead of rehashing.

In this article, we explored the structure of Redis’s hash table, the implementation of chained hashing, and the design and implementation of incremental rehashing. Redis’s hash table structure is unique—each hash entry contains a pointer, enabling chained hashing. Additionally, Redis uses a global hash table that contains two hash tables, a design that facilitates data migration from one table to another during rehashing.

The incremental rehashing mechanism in Redis is a general method for hash table expansion and is worth studying. Its core idea is to migrate only a limited number of buckets at a time, preventing performance issues that arise from migrating all buckets in one go. Once you understand the concept and implementation of incremental rehashing, you can apply it to your own hash table designs.