Python implements string matching algorithm code example

  • 2020-06-15 09:20:06
  • OfStack

Problems with string matching

In Python, there are two ways to find the existence of a substring in a long string: 1 is the find() function of str, find() function only returns the starting position of the substring matched, if not, -1; 2 is the findall function of the re module, which returns all matched substrings.

But if you use the findall function, you need to be aware of the special characters that exist in the string

Brute force string matching:

Align the pattern with m (pattern length) characters before the text, then match each character pair from left to right until all matches or a mismatched character is encountered. In the latter case, the mode moves 1 bit to the right.

The code is as follows:


def string_match(string, sub_str): 
 #  Brute force string matching  
 for i in range(len(string)-len(sub_str)+1): 
  index = i  # index Point to the 1 Three characters to be compared  
  for j in range(len(sub_str)): 
   if string[index] == sub_str[j]: 
    index += 1 
   else: 
    break 
   if index-i == len(sub_str): 
    return i 
 return -1 

if __name__ == "__main__": 
 print(string_match("adbcbdc", "dc")) 

The worst case, the algorithm belongs to Θ (nm), in fact, the average efficiency of the algorithm is better than the worst efficiency. In fact when find random text, it belongs to the efficiency of linear Θ (n).

Horspool algorithm:

The Horsepool algorithm is a simplified version of the Boyer-ES31en algorithm, which is also a typical example of one space for one time. The algorithm allocates the beginning characters of the pattern P to the text T, compares the last character of the pattern, and moves the pattern backward if the comparison attempt fails. The comparison is from right to left on each try.

In the brute-force algorithm, each movement of the pattern is 1 character. The core idea of Horspool algorithm is to improve the movement of the pattern matching window by using space to exchange time. Unlike a brute-force algorithm, the pattern matches from right to left and is stored in the table by pre-calculating the distance of each move.

The code is as follows:


__author__ = 'Wang' 
from collections import defaultdict 
def shift_table(pattern): 
 #  generate  Horspool  Algorithm's moving table  
 #  The current detection character is c , the mode length is m 
 #  If the current c Not included before the pattern m-1 In characters, the length of the move mode m 
 #  In other cases move the one on the far right c To the end of the pattern 1 The distance between two characters  
 table = defaultdict(lambda: len(pattern)) 
 for index in range(0, len(pattern)-1): 
  table[pattern[index]] = len(pattern) - 1 - index 
 return table 
def horspool_match(pattern, text): 
 #  implementation  horspool  String matching algorithm  
 #  Match successful, return pattern in text At the beginning of; Otherwise returns  -1 
 table = shift_table(pattern) 
 index = len(pattern) - 1 
 while index <= len(text) - 1: 
  print("start matching at", index) 
  match_count = 0 
  while match_count < len(pattern) and pattern[len(pattern)-1-match_count] == text[index-match_count]: 
   match_count += 1 
  if match_count == len(pattern): 
   return index-match_count+1 
  else: 
   index += table[text[index]] 
 return -1 

if __name__ == "__main__": 
 print(horspool_match("barber", "jim_saw_me_in_a_barbershopp")) 

Obviously, the worst Horspool algorithm efficiency belong to Θ (nm). In looking for the random text, it belongs to the efficiency of linear Θ (n). Although the efficiency types are the same, the Horspool algorithm is, on average, much faster than the brute force algorithm.

conclusion

That's it for the Python string matching algorithm code example. Those who are interested can continue to see this site:

Python implementation scheduling algorithm code details

Graph traversal of Python algorithm

Python programming to implement ant colony algorithm details

If there is any deficiency, please let me know. Thank you for your support!


Related articles: