python realizes finding the longest loop substring length

2020-07-21 08:57:38
OfStack

Given a string, find its longest textual substring length, such as the input string '35534321'. Its longest textual substring is '3553', so return 4.

The easiest way to do this is to enumerate all the substrings, and then 11 to determine if it is palindrome, returning the longest length of the substring. Needless to say, enumeration implementation time is intolerable. So is there an efficient way to find an echo substring? The answer, of course, is the central extension method, which selects an element as the center and then diverges outward to find the largest loop substring with that element as the center. However, a new problem arises. The length of the hui substring can be either cardinal or even, and there is no central element for the even length hui substring. Is there a way to classify even - and odd-length substrings as class 1, using central extension? It's the manacher algorithm that inserts special characters into the original string, such as '#3#5#5#3#4#3#2#1#'. Now we can just use the center extension for the new string, and the radius is the length of the substring.

Now that the implementation idea is clear, first transform the string '35534321' -- > '#3#5#5#3#4#3#2#1#', and then find the length of the longest loopback substring centered around each element. The python implementation is given below:


#!/usr/bin/python
# -*- coding: utf-8 -*-

def max_substr(string):
  s_list = [s for s in string]
  string = '#' + '#'.join(s_list) + '#'
  max_length = 0
  length = len(string)
  for index in range(0, length):
    r_length = get_length(string, index)
    if max_length < r_length:
      max_length = r_length
  return max_length

def get_length(string, index):
  #  Circulation and index For the center of the longest back text string 
  length = 0
  r_ = len(string)
  for i in range(1,index+1):
    if index+i < r_ and string[index-i] == string[index+i]:
      length += 1
    else:
      break
  return length

if __name__ == "__main__":
  result = max_substr("35534321")
  print result

The function has been realized, and there is no bug after testing, but let's calm down and think 1. Is there room for optimization for the current solution? According to the present solution, we have found the maximum textual substring of each element center in '35534321'. When we go to '4', we already know that the current length of the longest context substring max_length is 4, so we figured out that the 4-centered length of the longest context substring is 3, it's smaller than max_length, so we don't update max_length. In other words, we did a poor job of calculating the maximum length of the 4-centered loop string. Since this is the place where we want to optimize, the longest back to an element and no more than max_length text string length, we don't need to calculate its longest back to the text string in traversing a new element, we must first determine to it as the center of the length of the text string can transcend max_length, if not more than, keep lixia 1 elements. The following is the optimized implementation:


#!/usr/bin/python
# -*- coding: utf-8 -*-

def max_substr(string):
  s_list = [s for s in string]
  string = '#' + '#'.join(s_list) + '#'
  max_length = 0
  length = len(string)
  for index in range(0, length):
    r_length = get_length2(string, index, max_length)
    if max_length < r_length:
      max_length = r_length
  return max_length

def get_length2(string, index, max_length):
  #  Find the longest string based on the known longest string 
  # 1. center + The maximum radius is beyond the string range , return
  r_ = len(string)
  if index + max_length > r_:
    return max_length

  # 2. You can't go beyond the maximum radius , return
  l_string = string[index - max_length + 1 : index + 1]
  r_string = string[index : index + max_length]
  if l_string != r_string[::-1]:
    return max_length

  # 3. Calculate the new maximum radius 
  result = max_length
  for i in range(max_length, r_):
    if index-i >= 0 and index+i < r_ and string[index-i] == string[index+i]:
      result += 1
    else:
      break
  return result - 1

if __name__ == "__main__":
  result = max_substr("35534321")
  print result

So how much does the speed increase? Taking 1000 strings of '1' as an example, the algorithm execution time before optimization is 0.239018201828 and after optimization is 0.0180191993713, which increases the speed by about 10 times


/usr/bin/python /Users/hakuippei/PycharmProjects/untitled/the_method_of_programming.py
0.239018201828
0.0180191993713

Here is another example:


#!usr/bin/env python
#encoding:utf-8

'''
__Author__: More cold city 
 Function: Find the longest subsequence 
'''

def slice_window(one_str,w=1):
  '''
   Sliding window function 
  '''
  res_list=[]
  for i in range(0,len(one_str)-w+1):
    res_list.append(one_str[i:i+w])
  return res_list


def is_huiwen(one_str_list): 
  '''
   The input 1 A list of strings to determine if they are palindromes  
  ''' 
  if len(one_str_list)==1: 
    return True  
  else: 
    half=len(one_str_list)/2 
    if len(one_str_list)%2==0: 
      first_list=one_str_list[:half] 
      second_list=one_str_list[half:] 
    else: 
      first_list=one_str_list[:half] 
      second_list=one_str_list[half+1:] 
    if first_list==second_list[::-1]: 
      return True  
    else: 
      return False 


def find_longest_sub_palindrome_str(one_str):
  '''
   The main function , Look for the longest contextual subsequence 
  '''
  all_sub=[]
  for i in range(1,len(one_str)):
    all_sub+=slice_window(one_str,i)
  all_sub.append(one_str)
  new_list=[]
  for one in all_sub:
    if is_huiwen(list(one)):
      new_list.append(one)
  new_list.sort(lambda x,y:cmp(len(x),len(y)),reverse=True)
  print new_list[0]


if __name__ == '__main__':
  one_str_list=['uabcdcbaop','abcba','dmfdkgbbfdlg','mnfkabcbadk']
  for one_str in one_str_list:
    find_longest_sub_palindrome_str(one_str)

The results are as follows:


abcdcba 
abcba 
bb 
abcba 
[Finished in 0.3s]