Talk about Redis binary array Bitmap

  • 2021-10-27 07:37:37
  • OfStack

It hasn't been updated for a long time. When the company was caching attention/fans, I chose Bitmap. At that time, it was the first time I saw this data array form. SETBIT, GETBIT and BITCOUNT were used. Let's not talk about how to use the commands. Today, let's take a closer look at the implementation and principle of these three commands.

Considerations for selecting bitmap:

Realization of bit group

The concerned object and the concerned person in the concerned relationship requirements are 0-tens of millions of data objects. When storing this corresponding relationship, using bitmap as a bit group obviously saves storage space compared with set of uid, and the memory of redis is very precious, which is worth considering.

The bit group can be roughly expressed as a binary string such as 0101010000100000...... 0100. The string object implementation in Redis can be seen in SDS string 1 of Redis, and the data structure of SDS is binary safe, so Redis can use strings to represent bit groups. So, according to the above, the bit group is in the form of a string: buf [0] buf [1]... so that one byte is stored.

SETBIT and GETBIT

Implementation of GETBIT:


  #  Return   Bit group  bitarray  In  offset  On the offset 2 Binary bit (byte*8+bit) Value of 
  getbit <bitarray> <offset>
  #  Byte 
  byte = offset / 8  
  #  Bit 
  bit = (offset mod 8) + 1
  #  You can see  O(1)

Implementation of SETBIT:


  #  Will   Bit group  bitarray  In offset  On the offset 2 The value of the binary bit is set to  value
  setbit <bitarray> <offset> <value>
  #  Calculation save 2 How many binary bits do you need   Byte 
  len = [offset / 8] + 1 
  #  Squid  ? 2 The data structure used by the binary bit group is  sds  , and  sds  The record length is len  , normal expansion, the same space pre-allocation   The extension bit is `00000`
  #  Byte 
  byte = offset / 8  
  #  Bit 
  bit = (offset mod 8) + 1 
  #  Record  (byte*8+bit)  Upper  oldvalue  And assigns a new value, returning the  oldvalue

Implementation of Bitcount

BITCOUNT counts the number of values in the positioning array with a value of 1, that is, counting Hamming weight (see Leetcode 191,338), which is actually an old problem. Look at several algorithms and the practice of redis.


#1.  Rough traversal  O(n)

class Solution(object):
    def hammingWeight(self, n):
        rst = 0
        mask = 1
        for i in range(32):
            if n & mask :
                rst += 1
            mask = mask << 1
        return  rst
        
#2.  Look-up table method  
#  The principle of look-up table method lies in establishing N Single digit group, if it is 8 Bit, that is, reduce the number of queries /8  The larger the table is built, the less the number of queries is 
#  The typical operation of changing space for time reminds me of it   Counting sort O(n+k) 
#  Memory considerations: It can be calculated here 1 Under  8 Bit table lookup consumes hundreds of bytes of memory, 16 Bit look-up table is 100 Kb , 32 Bit is 10 Multiple G In practice, the server can only accept 16 Bit 
# CPU Consider: The larger the watch length, miss  The greater the rate. And  max  For 16  It can not solve the problem of table lookup efficiency of large arrays 
#  This algorithm  O(n/m) S(m) m<=16 

#3.variable-precision SWAR  Algorithm  O(1)

#4.Redis 
# redis  Unprocessed 2 Binary number  >= 128 , using variable-precision SWAR
# redis  Unprocessed 2 Binary number  < 128 , using   Look-up table method 

Redis-High Performance bitmap

Real-time indicator

Redis bitmap can be used to achieve real-time metrics quickly and simply.

Traditionally, index data is generated by batch job. However, bitmap of redis supports real-time index calculation and has extremely high space utilization. For example, 128 million users, counting daily UV in real time, only occupy 16MB memory space, and take 50ms on mbp.

bitset

Think of it as an array of zeros and ones. Each bit in bitset can be 0 or 1, and offset is used to indicate the position of bit in the array. Support bit operation between multiple bitset, such as AND or NOR.

Group statistics

The population statistics of bitset represent the number of bit with a data of 1 in bitset. Using bitset for population statistics is very efficient. For example, bitset with 1 billion bit has set data in 90% of its space, and it takes only 10 or several hundred ms to make population statistics on mbp.

redis bitmap

bitmap is binary data, and the data of specific offset position bit can be set by set key offset val instruction, and the time complexity is O (1).

Daily users

Count the number of active users for a concern (a page or an operation).

key rule: Concern name + day timestamp

val: Create an bitmap with a width of the current number of users. id of each user is used as offset, and this user ID is a record ID, which cannot be an userID generated by specific rules.

When the user accesses the focus, the user IDoffset location data is set to 1 for the specific bitset. After that, the group statistics are carried out on the bitset, which is the daily users of the focus.

Active users with different time dimensions can be stored in the way of concern name + timestamp.

For example, the number of users playing music per hour, key is defined as play_yyyyMMddhh; The number of users who play music every day, key is defined as play_yyyyMMdd.

When you need to count the number of users in a large time range, you can first combine multiple bitset, and then make group statistics, such as counting the number of users in one week and one month.

Performance comparison

128 million users use bitset to record daily activities, and use daily activities to collect statistics on 7 days and 15 days.
PERIODTIME (MS)
Daily50.2
Weekly392.0
Monthly1624.8

Characteristic analysis

The only user who accesses the concern in the cycle dimension and binds the mobile phone number, and the user bitset who accesses the concern in the cycle dimension by using the user bitset bound to the mobile phone number

According to the rolling statistics of only one user in the last n day, the union of daily users bitset in the last n day is obtained

Instructions involved

Population statistics using bitcount key

The intersection union uses bitop and/or dest key1 keyn

Use setbit key offset 1 for user IDoffset setting data


java bitset.cardinality()/and(bitset)/or(bitset)

Related articles: