Python implementation of bitmap data structure details

2020-04-02 13:26:48
OfStack

Bitmap is a very common data structure, such as in Bloom Filter. Sort without repeating integers and so on. Bitmaps are typically implemented based on arrays, where each element can be viewed as a series of binary Numbers, all of which form a larger binary collection. For Python, integer types are signed by default, so an integer has 31 bits available.

Bitmap implementation ideas

Bitmap is used to operate on each bit. For example, if a Python array contains four 32-bit signed integers, the total number of available bits is 4 * 31 = 124. If you want to operate on the 90th bit, you first get the number of elements in the array of operations, then the corresponding bit index, and then the operation.

< img border = 0 SRC = "/ / files.jb51.net/file_images/article/201403/20140328224020.jpg" >

The figure above shows a 32-bit integer. In Python, the default is a signed type, and the highest bit is a signed bit, which bitmap cannot use. The left is the highest, the right is the lowest, the lowest is the 0th bit.

Bitmap is used to operate on each bit. For example, if a Python array contains four 32-bit signed integers, the total number of available bits is 4 * 31 = 124. If you want to operate on the 90th bit, you first get the number of elements in the array of operations, then the corresponding bit index, and then the operation.

Initialize the bitmap

The bitmap needs to be initialized first. In the case of 90, since a single integer can only use 31 bits, dividing 90 by 31 and rounding up will tell you how many array elements are needed. The code is as follows:


#!/usr/bin/env python
#coding: utf8
class Bitmap(object):
 def __init__(self, max):
  self.size = int((max + 31 - 1) / 31) # Take up the whole 
if __name__ == '__main__':
 bitmap = Bitmap(90)
 print ' Need to be  %d  An element. ' % bitmap.size


$ python bitmap.py
 Need to be  3  An element.

Evaluates the index in the array

Calculating the index in the array is the same as calculating the size of the array. But instead of calculating the maximum number, replace it with any integer that needs to be stored. One difference, however, is that the index evaluated in the array is rounded down, so you need to modify the implementation of the calcElemIndex method. The code is changed as follows:


#!/usr/bin/env python
#coding: utf8
class Bitmap(object):
 def __init__(self, max):
  self.size  = self.calcElemIndex(max, True)
  self.array = [0 for i in range(self.size)]
 def calcElemIndex(self, num, up=False):
  '''up for True Is rounded up ,  Otherwise, it is rounded down '''
  if up:
   return int((num + 31 - 1) / 31) # Take up the whole 
  return num / 31
if __name__ == '__main__':
 bitmap = Bitmap(90)
 print ' An array of need  %d  An element. ' % bitmap.size
 print '47  Should be stored in the  %d  On the array elements. ' % bitmap.calcElemIndex(47)


$ python bitmap.py
 An array of need  3  An element. 
47  Should be stored in the  1  On the array elements.

So it's important to get the largest integer, or you might create an array that doesn't hold some data.

Evaluates a bit index in an array element

A bit index in an array element can be obtained by a modulus operation. The bit index can be obtained by modulating the integer to be stored with 31. The code is changed as follows:


#!/usr/bin/env python
#coding: utf8
class Bitmap(object):
 def __init__(self, max):
  self.size  = self.calcElemIndex(max, True)
  self.array = [0 for i in range(self.size)]
 def calcElemIndex(self, num, up=False):
  '''up for True Is rounded up ,  Otherwise, it is rounded down '''
  if up:
   return int((num + 31 - 1) / 31) # Take up the whole 
  return num / 31
 def calcBitIndex(self, num):
  return num % 31
if __name__ == '__main__':
 bitmap = Bitmap(90)
 print ' An array of need  %d  An element. ' % bitmap.size
 print '47  Should be stored in the  %d  On the array elements. ' % bitmap.calcElemIndex(47)
 print '47  Should be stored in the  %d  Number of array elements  %d  The throne. ' % (bitmap.calcElemIndex(47), bitmap.calcBitIndex(47),)

Don't forget to start at the 0th place.

Buy 1 operation

The binary bit defaults to 0, and a location of 1 indicates that data is stored in that bit. The code is changed as follows:


#!/usr/bin/env python
#coding: utf8
class Bitmap(object):
 def __init__(self, max):
  self.size  = self.calcElemIndex(max, True)
  self.array = [0 for i in range(self.size)]
 def calcElemIndex(self, num, up=False):
  '''up for True Is rounded up ,  Otherwise, it is rounded down '''
  if up:
   return int((num + 31 - 1) / 31) # Take up the whole 
  return num / 31
 def calcBitIndex(self, num):
  return num % 31
 def set(self, num):
  elemIndex = self.calcElemIndex(num)
  byteIndex = self.calcBitIndex(num)
  elem      = self.array[elemIndex]
  self.array[elemIndex] = elem | (1 << byteIndex)
if __name__ == '__main__':
 bitmap = Bitmap(90)
 bitmap.set(0)
 print bitmap.array

Since we start at the 0th bit, if we need to store a 0, we need to place the 0th bit at 1.

Qing 0 operation

To discard stored data at a location of 0. The code is as follows:


#!/usr/bin/env python
#coding: utf8
class Bitmap(object):
 def __init__(self, max):
  self.size  = self.calcElemIndex(max, True)
  self.array = [0 for i in range(self.size)]
 def calcElemIndex(self, num, up=False):
  '''up for True Is rounded up ,  Otherwise, it is rounded down '''
  if up:
   return int((num + 31 - 1) / 31) # Take up the whole 
  return num / 31
 def calcBitIndex(self, num):
  return num % 31
 def set(self, num):
  elemIndex = self.calcElemIndex(num)
  byteIndex = self.calcBitIndex(num)
  elem      = self.array[elemIndex]
  self.array[elemIndex] = elem | (1 << byteIndex)
 def clean(self, i):
  elemIndex = self.calcElemIndex(i)
  byteIndex = self.calcBitIndex(i)
  elem      = self.array[elemIndex]
  self.array[elemIndex] = elem & (~(1 << byteIndex))
if __name__ == '__main__':
 bitmap = Bitmap(87)
 bitmap.set(0)
 bitmap.set(34)
 print bitmap.array
 bitmap.clean(0)
 print bitmap.array
 bitmap.clean(34)
 print bitmap.array

Clear 0 and set 1 are reciprocal operations.

Test if someone is 1

The purpose of determining whether someone is 1 is to fetch the previously stored data. The code is as follows:


#!/usr/bin/env python
#coding: utf8
class Bitmap(object):
 def __init__(self, max):
  self.size  = self.calcElemIndex(max, True)
  self.array = [0 for i in range(self.size)]
 def calcElemIndex(self, num, up=False):
  '''up for True Is rounded up ,  Otherwise, it is rounded down '''
  if up:
   return int((num + 31 - 1) / 31) # Take up the whole 
  return num / 31
 def calcBitIndex(self, num):
  return num % 31
 def set(self, num):
  elemIndex = self.calcElemIndex(num)
  byteIndex = self.calcBitIndex(num)
  elem      = self.array[elemIndex]
  self.array[elemIndex] = elem | (1 << byteIndex)
 def clean(self, i):
  elemIndex = self.calcElemIndex(i)
  byteIndex = self.calcBitIndex(i)
  elem      = self.array[elemIndex]
  self.array[elemIndex] = elem & (~(1 << byteIndex))
 def test(self, i):
  elemIndex = self.calcElemIndex(i)
  byteIndex = self.calcBitIndex(i)
  if self.array[elemIndex] & (1 << byteIndex):
   return True
  return False
if __name__ == '__main__':
 bitmap = Bitmap(90)
 bitmap.set(0)
 print bitmap.array
 print bitmap.test(0)
 bitmap.set(1)
 print bitmap.test(1)
 print bitmap.test(2)
 bitmap.clean(1)
 print bitmap.test(1)


$ python bitmap.py
[1, 0, 0]
True
True
False
False

Next, implement a sort that does not repeat the array. Given that the largest element of an unordered array of non-negative integers is 879, sort it naturally. The code is as follows:


#!/usr/bin/env python
#coding: utf8
class Bitmap(object):
 def __init__(self, max):
  self.size  = self.calcElemIndex(max, True)
  self.array = [0 for i in range(self.size)]
 def calcElemIndex(self, num, up=False):
  '''up for True Is rounded up ,  Otherwise, it is rounded down '''
  if up:
   return int((num + 31 - 1) / 31) # Take up the whole 
  return num / 31
 def calcBitIndex(self, num):
  return num % 31
 def set(self, num):
  elemIndex = self.calcElemIndex(num)
  byteIndex = self.calcBitIndex(num)
  elem      = self.array[elemIndex]
  self.array[elemIndex] = elem | (1 << byteIndex)
 def clean(self, i):
  elemIndex = self.calcElemIndex(i)
  byteIndex = self.calcBitIndex(i)
  elem      = self.array[elemIndex]
  self.array[elemIndex] = elem & (~(1 << byteIndex))
 def test(self, i):
  elemIndex = self.calcElemIndex(i)
  byteIndex = self.calcBitIndex(i)
  if self.array[elemIndex] & (1 << byteIndex):
   return True
  return False
if __name__ == '__main__':
 MAX = 879
 suffle_array = [45, 2, 78, 35, 67, 90, 879, 0, 340, 123, 46]
 result       = []
 bitmap = Bitmap(MAX)
 for num in suffle_array:
  bitmap.set(num)

 for i in range(MAX + 1):
  if bitmap.test(i):
   result.append(i)
 print ' The original array is :    %s' % suffle_array
 print ' The sorted array is : %s' % result

With bitmap implemented, it is very easy to use it for sorting. Other languages can implement bitmap as well, but for statically typed languages, such as C/Golang, because you can declare unsigned integers directly, the available bits become 32 bits, just change the 31 in the code above to 32, please note.