Python binary search and bisect module details

  • 2020-05-24 05:43:07
  • OfStack

preface

In fact, the internal implementation of Python's list (list) is an array, that is, a linear table. Looking for elements in a list can be used list.index() Method with a time complexity of O(n). For large data, 2 points of search can be used for optimization.

2 points search requires the object to be in order, the basic principle is as follows:

1. Starting from the middle element of the array, if the middle element happens to be the element to be looked for, the search process ends;

2. If a particular element of 1 is greater than or less than the middle element, look in the half of the array that is greater than or less than the middle element, and compare it with the first element from the middle element.

3. If the array is empty in step 1, it cannot be found.

Two-point search is also called binary search, and the algorithm reduces the search range by one and a half for each comparison, and its time complexity is O(logn).

We respectively use recursion and loop to achieve 2-point search:


def binary_search_recursion(lst, value, low, high): 
 if high < low: 
 return None
 mid = (low + high) / 2 
 if lst[mid] > value: 
 return binary_search_recursion(lst, value, low, mid-1) 
 elif lst[mid] < value: 
 return binary_search_recursion(lst, value, mid+1, high) 
 else: 
 return mid 

def binary_search_loop(lst,value): 
 low, high = 0, len(lst)-1 
 while low <= high: 
 mid = (low + high) / 2 
 if lst[mid] < value: 
 low = mid + 1 
 elif lst[mid] > value: 
 high = mid - 1
 else:
 return mid 
 return None

Next, perform a performance test on the two implementations:


if __name__ == "__main__":
 import random
 lst = [random.randint(0, 10000) for _ in xrange(100000)]
 lst.sort()

 def test_recursion():
 binary_search_recursion(lst, 999, 0, len(lst)-1)

 def test_loop():
 binary_search_loop(lst, 999)

 import timeit
 t1 = timeit.Timer("test_recursion()", setup="from __main__ import test_recursion")
 t2 = timeit.Timer("test_loop()", setup="from __main__ import test_loop")

 print "Recursion:", t1.timeit()
 print "Loop:", t2.timeit()

The implementation results are as follows:


Recursion: 3.12596702576
Loop: 2.08254289627

You can see that loops are more efficient than recursion.

bisect module

Python has one bisect module for maintaining ordered lists. The bisect module implements an algorithm for inserting elements into an ordered list. In some cases, this is more efficient than repeatedly sorting a list or constructing a large list to sort again. Bisect is a two-point sorting method, which inserts an element into the proper location of an ordered list, eliminating the need to maintain an ordered list every time sort is called.

Here is a simple example:


import bisect
import random

random.seed(1)

print'New Pos Contents'
print'--- --- --------'

l = []
for i in range(1, 15):
 r = random.randint(1, 100)
 position = bisect.bisect(l, r)
 bisect.insort(l, r)
 print'%3d %3d' % (r, position), l

Output results:


New Pos Contents
--- --- --------
 14 0 [14]
 85 1 [14, 85]
 77 1 [14, 77, 85]
 26 1 [14, 26, 77, 85]
 50 2 [14, 26, 50, 77, 85]
 45 2 [14, 26, 45, 50, 77, 85]
 66 4 [14, 26, 45, 50, 66, 77, 85]
 79 6 [14, 26, 45, 50, 66, 77, 79, 85]
 10 0 [10, 14, 26, 45, 50, 66, 77, 79, 85]
 3 0 [3, 10, 14, 26, 45, 50, 66, 77, 79, 85]
 84 9 [3, 10, 14, 26, 45, 50, 66, 77, 79, 84, 85]
 44 4 [3, 10, 14, 26, 44, 45, 50, 66, 77, 79, 84, 85]
 77 9 [3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
 1 0 [1, 3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]

The functions provided by the Bisect module are:

bisect.bisect_left(a,x, lo=0, hi=len(a)) :

Find the index that inserted x into the ordered list a. lo and hi are used to specify the interval of the list, and the default is to use the entire list. If x already exists, insert to its left. The return value is index.

bisect.bisect_right(a,x, lo=0, hi=len(a))

bisect.bisect(a, x,lo=0, hi=len(a)) :

These two functions are similar to bisect_left, but if x already exists, insert to its right.

bisect.insort_left(a,x, lo=0, hi=len(a)) :

Insert x into the ordered list a. This is the same as a.insert (bisect.bisect_left (a,x, lo, hi), x).

bisect.insort_right(a,x, lo=0, hi=len(a))

bisect.insort(a, x,lo=0, hi=len(a)) :

Similar to insort_left, but if x already exists, insert to its right.

The functions provided by the Bisect module can be divided into two categories: bisect* is only used to look up index, without actual insertion; insort* is used for actual insertion.

A typical application of this module is to calculate score grades:


def grade(score,breakpoints=[60, 70, 80, 90], grades='FDCBA'):
 i = bisect.bisect(breakpoints, score)
 return grades[i]

print [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]

Execution results:


['F', 'A', 'C', 'C', 'B', 'A', 'A']

Similarly, we can use bisect module to achieve 2-point lookup:


def binary_search_bisect(lst, x):
 from bisect import bisect_left
 i = bisect_left(lst, x)
 if i != len(lst) and lst[i] == x:
 return i
 return None

Let's test the performance of it with recursion and loop implementation of 2-point lookup:


Recursion: 4.00940990448
Loop: 2.6583480835
Bisect: 1.74922895432

You can see that it's a little bit faster than the loop implementation, about a half a step faster than the recursive implementation.

Python's famous data processing library, numpy, also has a function for 2-point lookups numpy.searchsorted , the usage is basically the same as bisect, except that if you want to insert to the right, you need to set the parameter side='right', for example:


>>> import numpy as np
>>> from bisect import bisect_left, bisect_right
>>> data = [2, 4, 7, 9]
>>> bisect_left(data, 4)
1
>>> np.searchsorted(data, 4)
1
>>> bisect_right(data, 4)
2
>>> np.searchsorted(data, 4, side='right')
2

So, let's compare the performance of 1 again:


if __name__ == "__main__":
 import random
 lst = [random.randint(0, 10000) for _ in xrange(100000)]
 lst.sort()

 def test_recursion():
 binary_search_recursion(lst, 999, 0, len(lst)-1)

 def test_loop():
 binary_search_loop(lst, 999)

 import timeit
 t1 = timeit.Timer("test_recursion()", setup="from __main__ import test_recursion")
 t2 = timeit.Timer("test_loop()", setup="from __main__ import test_loop")

 print "Recursion:", t1.timeit()
 print "Loop:", t2.timeit()
0

You can find numpy.searchsorted The efficiency is very low, which is not on the same order of magnitude as bisect. So searchsorted is not good for searching normal arrays, but it is for searching numpy.ndarray Is quite fast:


if __name__ == "__main__":
 import random
 lst = [random.randint(0, 10000) for _ in xrange(100000)]
 lst.sort()

 def test_recursion():
 binary_search_recursion(lst, 999, 0, len(lst)-1)

 def test_loop():
 binary_search_loop(lst, 999)

 import timeit
 t1 = timeit.Timer("test_recursion()", setup="from __main__ import test_recursion")
 t2 = timeit.Timer("test_loop()", setup="from __main__ import test_loop")

 print "Recursion:", t1.timeit()
 print "Loop:", t2.timeit()
1

numpy.searchsorted Multiple values can be searched simultaneously:


if __name__ == "__main__":
 import random
 lst = [random.randint(0, 10000) for _ in xrange(100000)]
 lst.sort()

 def test_recursion():
 binary_search_recursion(lst, 999, 0, len(lst)-1)

 def test_loop():
 binary_search_loop(lst, 999)

 import timeit
 t1 = timeit.Timer("test_recursion()", setup="from __main__ import test_recursion")
 t2 = timeit.Timer("test_loop()", setup="from __main__ import test_loop")

 print "Recursion:", t1.timeit()
 print "Loop:", t2.timeit()
2

conclusion

The above is the whole content of this article, I hope the content of this article can help you to learn or use python. If you have any questions, you can leave a message to communicate.


Related articles: