Talk about Redis binary array Bitmap
- 2021-10-27 07:37:37
- OfStack
It hasn't been updated for a long time. When the company was caching attention/fans, I chose Bitmap. At that time, it was the first time I saw this data array form. SETBIT, GETBIT and BITCOUNT were used. Let's not talk about how to use the commands. Today, let's take a closer look at the implementation and principle of these three commands.
Considerations for selecting bitmap:
Realization of bit group
The concerned object and the concerned person in the concerned relationship requirements are 0-tens of millions of data objects. When storing this corresponding relationship, using bitmap as a bit group obviously saves storage space compared with set of uid, and the memory of redis is very precious, which is worth considering.
The bit group can be roughly expressed as a binary string such as 0101010000100000...... 0100. The string object implementation in Redis can be seen in SDS string 1 of Redis, and the data structure of SDS is binary safe, so Redis can use strings to represent bit groups. So, according to the above, the bit group is in the form of a string: buf [0] buf [1]... so that one byte is stored.
SETBIT and GETBIT
Implementation of GETBIT:
# Return Bit group bitarray In offset On the offset 2 Binary bit (byte*8+bit) Value of
getbit <bitarray> <offset>
# Byte
byte = offset / 8
# Bit
bit = (offset mod 8) + 1
# You can see O(1)
Implementation of SETBIT:
# Will Bit group bitarray In offset On the offset 2 The value of the binary bit is set to value
setbit <bitarray> <offset> <value>
# Calculation save 2 How many binary bits do you need Byte
len = [offset / 8] + 1
# Squid ? 2 The data structure used by the binary bit group is sds , and sds The record length is len , normal expansion, the same space pre-allocation The extension bit is `00000`
# Byte
byte = offset / 8
# Bit
bit = (offset mod 8) + 1
# Record (byte*8+bit) Upper oldvalue And assigns a new value, returning the oldvalue
Implementation of Bitcount
BITCOUNT counts the number of values in the positioning array with a value of 1, that is, counting Hamming weight (see Leetcode 191,338), which is actually an old problem. Look at several algorithms and the practice of redis.
#1. Rough traversal O(n)
class Solution(object):
def hammingWeight(self, n):
rst = 0
mask = 1
for i in range(32):
if n & mask :
rst += 1
mask = mask << 1
return rst
#2. Look-up table method
# The principle of look-up table method lies in establishing N Single digit group, if it is 8 Bit, that is, reduce the number of queries /8 The larger the table is built, the less the number of queries is
# The typical operation of changing space for time reminds me of it Counting sort O(n+k)
# Memory considerations: It can be calculated here 1 Under 8 Bit table lookup consumes hundreds of bytes of memory, 16 Bit look-up table is 100 Kb , 32 Bit is 10 Multiple G In practice, the server can only accept 16 Bit
# CPU Consider: The larger the watch length, miss The greater the rate. And max For 16 It can not solve the problem of table lookup efficiency of large arrays
# This algorithm O(n/m) S(m) m<=16
#3.variable-precision SWAR Algorithm O(1)
#4.Redis
# redis Unprocessed 2 Binary number >= 128 , using variable-precision SWAR
# redis Unprocessed 2 Binary number < 128 , using Look-up table method
Redis-High Performance bitmap
Real-time indicator
Redis bitmap can be used to achieve real-time metrics quickly and simply.
Traditionally, index data is generated by batch job. However, bitmap of redis supports real-time index calculation and has extremely high space utilization. For example, 128 million users, counting daily UV in real time, only occupy 16MB memory space, and take 50ms on mbp.
bitset
Think of it as an array of zeros and ones. Each bit in bitset can be 0 or 1, and offset is used to indicate the position of bit in the array. Support bit operation between multiple bitset, such as AND or NOR.
Group statistics
The population statistics of bitset represent the number of bit with a data of 1 in bitset. Using bitset for population statistics is very efficient. For example, bitset with 1 billion bit has set data in 90% of its space, and it takes only 10 or several hundred ms to make population statistics on mbp.
redis bitmap
bitmap is binary data, and the data of specific offset position bit can be set by set key offset val instruction, and the time complexity is O (1).
Daily users
Count the number of active users for a concern (a page or an operation).
key rule: Concern name + day timestamp
val: Create an bitmap with a width of the current number of users. id of each user is used as offset, and this user ID is a record ID, which cannot be an userID generated by specific rules.
When the user accesses the focus, the user IDoffset location data is set to 1 for the specific bitset. After that, the group statistics are carried out on the bitset, which is the daily users of the focus.
Active users with different time dimensions can be stored in the way of concern name + timestamp.
For example, the number of users playing music per hour, key is defined as play_yyyyMMddhh; The number of users who play music every day, key is defined as play_yyyyMMdd.
When you need to count the number of users in a large time range, you can first combine multiple bitset, and then make group statistics, such as counting the number of users in one week and one month.
Performance comparison
128 million users use bitset to record daily activities, and use daily activities to collect statistics on 7 days and 15 days.
PERIODTIME (MS)
Daily50.2
Weekly392.0
Monthly1624.8
Characteristic analysis
The only user who accesses the concern in the cycle dimension and binds the mobile phone number, and the user bitset who accesses the concern in the cycle dimension by using the user bitset bound to the mobile phone number
According to the rolling statistics of only one user in the last n day, the union of daily users bitset in the last n day is obtained
Instructions involved
Population statistics using bitcount key
The intersection union uses bitop and/or dest key1 keyn
Use setbit key offset 1 for user IDoffset setting data
java bitset.cardinality()/and(bitset)/or(bitset)