The analysis of bitmap processing mass data and its implementation method

  • 2020-04-02 01:01:31
  • OfStack

【 what is a bit-map 】
The so-called bit-map is to mark the corresponding Value of an element with a Bit Bit, and the Key is the element. With the use of bits to store data, there are significant savings in storage space.
If we don't know what a bit-map is yet, let's look at a specific example. Let's say we want to sort 5 elements (4,7,2,5,3) from 0 to 7 (assuming they don't repeat). Then we can use the bit-map method to achieve the purpose of sorting. To represent 8 Numbers, we only need 8 bits (1Bytes). First, we set the space of 1Byte and set all the bits of these Spaces to 0(as shown in the figure below)
< img Alt = "" border = 0 SRC =" / / files.jb51.net/file_images/article/201305/201305290939013.gif ">  
Then we go through the five elements, so the first element is 4, so we set the position of 4 to 1 (so we can do p+(I /8)|(0x01) < < (I %8)) of course, the operation here involves big-ending and little-ending, which is considered as big-ending). Since it starts from zero, the fifth position should be one (as shown in the figure below) :  

< img Alt = "" border = 0 SRC =" / / files.jb51.net/file_images/article/201305/201305290939014.gif ">
Then the second element 7 is processed, the eighth position is 1, and then the third element is processed, until all the elements are processed at last, the corresponding position is 1, at this time, the state of the memory Bit is as follows:  
< img Alt = "" border = 0 SRC =" / / files.jb51.net/file_images/article/201305/201305290939015.gif ">

Then we now go through the Bit area and print out the number (2,3,4,5,7) of the Bit that is one, so that we can sort. The following code shows the use of a BitMap: sort.

//Define 8 bits & NBSP in each Byte; & have spent
#include  The < memory.h >   
#define BYTESIZE 8  
void SetBit(char *p, int posi)   
{   
    for(int i=0; i  The <  (posi/BYTESIZE); i++)   
    {   
        p++;   
    }   

    *p = *p|(0x01 < < (posi%BYTESIZE));//Assign the Bit value 1& PI;
    return;   
}   

void BitMapSortDemo()   
{   
    //For the sake of simplicity, we don't consider negative Numbers & NBSP;
    int num[] = {3,5,2,10,6,12,8,14,9};   

    //The value of BufferLen is determined by the maximum value of the data to be sorted.
    //The maximum value in the order is 14, so you only need 2 Bytes(16 bits)& PI;
    //That's it. & have spent & have spent
    const int BufferLen = 2;   
    char *pBuffer = new char[BufferLen];   

    //Set all Bit positions to zero, otherwise the result is unpredictable. & have spent
    memset(pBuffer,0,BufferLen);   
    for(int i=0;i The < 9;i++)   
    {   
        //First place the corresponding Bit Bit as 1& PI;
        SetBit(pBuffer,num[i]);   
    }   

    //Output sorting result & NBSP; & have spent
    for(int i=0;i The < BufferLen;i++)//Processing one Byte & NBSP; at a time;
    {   
        for(int j=0;j The < BYTESIZE;j++)//Handles each Bit Bit & PI in the byte;
        {   
            //To determine whether the bit is 1 or not, output it. The judgment here is rather stupid. & have spent
            //First get the mask of the j-th bit (0x01 < < j), and put the & PI in the memory region;
            //Bit and this mask do and operate. Finally, determine whether the mask matches the processed & NBSP;
            //The result is the same & NBSP;
            if((*pBuffer&(0x01 < < j)) == (0x01 < < j))   
            {   
                printf("%d ",i*BYTESIZE + j);   
            }   
        }   
        pBuffer++;   
    }   
}   

int _tmain(int argc, _TCHAR* argv[])   
{   
    BitMapSortDemo();   
    return 0;   
}  

[scope of application]
Data can be quickly found, weighed, deleted, generally the data range is less than 10 times of int
[basic principles and key points]
Use a bit array to indicate whether certain elements, such as an 8-bit phone number, exist
【 extend 】
Bloom filter can be seen as an extension of bit-map
[example of problem]
1) given that a file contains some phone Numbers, each of which is an 8-digit number, count the number of different Numbers.
Eight bits at most 99 999 999, about 99m bits, about 10 megabytes of memory. (it can be understood as the digits from 0 to 99 999 999, each digit corresponds to a Bit Bit, so only 99M bits are needed == 1.2mbytes, in this way, it USES a small memory of 1.2m to represent all 8-digit phones)
2) find out the number of non-repeating integers among 250 million integers. The memory space is not enough to hold the 250 million integers.
The bit-map is extended to represent a number by 2 bits, 0 means no occurrence, 1 means one occurrence, 2 means two or more occurrences. When traversing these Numbers, if the value of the corresponding position is 0, it is set as 1. If it's 1, set it to 2; If it's 2, it stays the same. Or instead of representing it with 2 bits, we can simulate it with two bit-maps, which is the same thing.
C language implementation of bitmap

bitmap.h  

  
#ifndef _BITMAP_H_  
#define _BITMAP_H_  

  
int bitmap_init(int size, int start);  

  
int bitmap_set(int index);  

  
int bitmap_get(int i);  

  
int bitmap_data(int index);  

  
int bitmap_free();  

#endif  

bitmap.c  

#include <stdio.h>  
#include <stdlib.h>  
#include <string.h>  
#include "bitmap.h"  

unsigned char *g_bitmap = NULL;  
int g_size = 0;  
int g_base = 0;  

int bitmap_init(int size, int start)  
{  
    g_bitmap = (char *)malloc((size/8+1)*sizeof(char));  
    if(g_bitmap == NULL)  
        return 0;  
    g_base = start;  
    g_size = size/8+1;  
    memset(g_bitmap, 0x0, g_size);  
    return 1;  
}  

int bitmap_set(int index)  
{  
    int quo = (index-g_base)/8 ;  
    int remainder = (index-g_base)%8;  
    unsigned char x = (0x1<<remainder);  
    if( quo > g_size)  
        return 0;  
    g_bitmap[quo] |= x;  
    return 1;   
}  

int bitmap_get(int i)  
{  
    int quo = (i)/8 ;  
    int remainder = (i)%8;  
    unsigned char x = (0x1<<remainder);  
    unsigned char res;  
    if( quo > g_size)  
        return -1;  
    res = g_bitmap[quo] & x;  
    return res > 0 ? 1 : 0;   
}  

int bitmap_data(int index)  
{  
    return (index + g_base);  
}  

int bitmap_free()  
{  
    free(g_bitmap);  
}  

 The test program bitmap_test.c :   

#include <stdio.h>  
#include "bitmap.h"  

int main()  
{  
    int a[] = {5,8,7,6,3,1,10,78,56,34,23,12,43,54,65,76,87,98,89,100};  
    int i;  
    bitmap_init(100, 0);  
    for(i=0; i<20; i++)  
        bitmap_set(a[i]);  
    for(i=0; i<100; i++)  
    {  
        if(bitmap_get(i) > 0 )  
            printf("%d ", bitmap_data(i));  
    }  
    printf("/n");  
    bitmap_free();  
    return 0;  
}  


Related articles: