Redis HyperLogLog
HyperLogLog is a probabilistic data structure used to estimate the cardinality of a set. As a probabilistic data structure, HyperLogLog trades perfect accuracy for efficient space utilization.
Redis added the HyperLogLog data structure in version 2.8.9.
HyperLogLog in Redis is an algorithm used for cardinality estimation. Its main advantage is that, even when the number of input elements is extremely large, the memory required to calculate the cardinality remains fixed and small.
In Redis, each HyperLogLog key only consumes 12 KB of memory, yet it can estimate the cardinality of up to 2^64 unique elements and provide a standard error of 0.81%. This is in stark contrast to traditional data structures like sets, where memory usage increases as the number of elements grows.
However, since HyperLogLog only computes the cardinality based on the input elements without storing them, it cannot return the individual elements like a set can.
What is Cardinality?
For example, consider the dataset {1, 3, 5, 7, 5, 7, 8}. The cardinality of this set is {1, 3, 5, 7, 8}, meaning there are 5 unique elements. Cardinality estimation is the process of quickly calculating the number of unique elements with a small margin of error.
Example
The following example demonstrates how HyperLogLog works:
1 | 127.0.0.1:6379> PFADD hll_key redis |
Basic Commands
The table below lists the basic Redis HyperLogLog commands:
No. | Command and Description |
---|---|
1 | PFADD key element [element ...] Adds the specified elements to the HyperLogLog. |
2 | PFCOUNT key [key ...] Returns the cardinality estimate for the given HyperLogLog(s). |
3 | PFMERGE destkey sourcekey [sourcekey ...] Combines multiple HyperLogLogs into one. |