C++ implementation bitmap sorting example

  • 2020-04-02 02:38:29
  • OfStack

In the book "pearls in programming", a bitmap sorting method not mentioned in the introduction to algorithms is mentioned. This sorting method is to achieve the goal of time-space compromise and win-win by sacrificing spatial efficiency to pursue temporal efficiency (linear time). This article briefly introduces the idea of bitmap sorting in the form of an example.

I. problem description

        1. Input: a file containing at most 10 million non-negative integers

        2. Characteristics: each number is less than 10000000 non-negative integer; No repeating Numbers; There is no correlation between the data.

        3. Constraints: (1) up to 1MB of memory space is available; (2) adequate disk space; Running time a few minutes at most, preferably linear time.
       
        4. Output: sequence of integers in ascending order.

Two, bitmap sorting ideas

Due to the large number of data records to be sorted, we simply use the common sorting method is inefficient in time, and the running time will be very long. And the memory space is limited (limited to about 1MB), so we can't read all the integers into memory at once (if each integer is stored in 7 bytes, then 1MB can only hold about 143,000 digits). Of course we can read the input file multiple times and sort it multiple times, but a better solution is to use bitmap sort, which USES a limited 1MB of memory space and only sorts it once.

1. According to the largest number in the set to be sorted, a bit array is created to represent the integer in the set to be sorted;

2. The corresponding position in the digit bit array in the set to be sorted is set to 1, and the other positions are set to 0;

For sorting collection,2,3,5,8,13 {1}, for example, can be expressed as: 0-1-1-1-0-1-0-0-1-0-0-0-0 to 1

The sorting process can be naturally divided into three steps:

Step 1: set all the bits to 0;

Step 2: set each corresponding bit to 1 by reading in each integer in the file.

Step 3: verify each bit. If the bit is 1, output the corresponding integer.

Note: bitmap sort USES a binary bit instead of an integer to represent a 0 or 1, which can greatly reduce the amount of memory required. The premise of using bitmap sort is to know the maximum number in the sequence to be sorted. The disadvantage of bitmap sorting is that you still have to reserve a bit for some Numbers that haven't been seen before. Therefore, bitmap sorting is more suitable for keyword - intensive sequences, such as a city's telephone number.

The pseudo-code is as follows:


 
  for i = [0, n) 
    bit[i] = 0 
 
  for each i in the input file 
    bit[i] = 1 
 
  for i = [0, n) 
    if bit[i] == 1 
      write i on the output file 

Performance: time complexity up to O(n), 1MB containing 8*1024*1024 bits, memory required 10000000/(8*1024*1024)= 1.20mb, if not strictly limited can be considered to meet the requirements.

Three, bitmap sorting implementation

When sorting bitmaps, we need to consider: given a number, how to find its corresponding bitmap position, the method is to find the number of bytes corresponding to the number, and then find the number of bits corresponding to the number. Such as:


unsigned char bitmap[2]; 
 

A byte has eight bits, and a 5 is the fifth bit of the 0th byte. 14 represents the sixth bit of the first byte.

To simplify bit processing here, we use the bitset container of the C++ standard library. Bitset is a data structure for a collection of bits provided by C++ that allows us to use bits like arrays, with access to bits with specified subscripts. Like other containers, bitset is a template class. The specific bitset method can be viewed in STD ::bitset reference.

Next, we use the bitset container for bitmap sorting:


 
#include<bitset> 
#include<iostream> 
using namespace std; 
 
#define MAX 20 
 
int main() 
{ 
  int arr[10] = {5,1,2,13,7,10,0,20,16,9}; 
 
  bitset<MAX+1> bit; 
   
   
  for(int i=0; i<10; ++i) 
  { 
    bit.set(arr[i]); 
     
  } 
 
   
  for(int i=0; i<MAX+1; ++i) 
  { 
     
    if(bit.test(i)) 
    { 
      cout << i << " "; 
    } 
  } 
  cout << endl; 
} 

Output: 0, 1, 2, 5, 7, 9, 10, 13, 16, 20


Related articles: