Pattern matching algorithm of deep string of general algorithm and KMP algorithm

  • 2020-04-02 01:02:04
  • OfStack

The string positioning operation, often called pattern matching of strings, is one of the most important operations in various processing systems.
The most naive algorithm of pattern matching is backtracking method, that is, the pattern string matches the main string character by character. When the pattern string does not match the main string, the main string backtracks to the next position where the matching with the pattern string begins. The pattern string backtracks to the first position and continues to match. The time complexity of the algorithm is O (m*n), and the algorithm is as follows:

//For the pattern matching algorithm of the naive string, S is the main string and T is the pattern string, that is, to find whether there is the same string with T in S
int Index(char *S, char *T, int pos)//Pos records which bit to start matching can be directly replaced by 0
{
 int i=pos, j=0; 
 while(i <strlen(S) && j <strlen(T))//Ensure that the length of the string is not exceeded
 {
  if (S[i] == T[j])
      { ++i; ++j;} //If they are the same, continue the backward comparison
  else 
      {i = i-j+1; j =0;} //If not, go back and look it up again
 }
 if (j == strlen(T))
  return i-strlen(T); //If the match is successful, the index at the same starting position in S as the string T is returned
 else return 0; //If the match is unsuccessful, return 0
}

Time complexity of O (m * n) is a little big, so people found the KMP algorithm, the core idea is: when a mismatch occurs, the string is not back, pattern string back to the position of "appropriate", where appropriate, only related with the pattern string, so we can calculate the pattern of each character in the string, when mismatch occurred, should go back to where. The overall time complexity of the algorithm is O(m+m).
The algorithm is as follows:

void GetNext(char* T, int *next)
{
 int i=1,j=0; 
 next[1]=0;
 while( i < strlen(T) )
 { 
  if (j == 0 || T[i] == T[j])
  {
    ++i; ++j; 
    next[i] = j;
  } 
  else j = next[j];
 }
} 
int KMP(char* S, char* T, int pos)
{
 int i = pos, j = 1;
 while (i)
 {
  if (S[i] == T[j])
  { 
   ++ i;  ++ j;
  } 
  else 
   j = next[j]; 
 }
 if (j > strlen(T)) 
  return i-T[0];
 else 
  return 0; 
} 

The operation of next is not optimal, because he did not consider the case of aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab, so there will be a large number of 1 in front, such algorithm complexity is no different from the original naive algorithm. So change it a little bit:

void GetNextEx(char *T, int *next)
{
 int i=1,j=0; next[1] = 0;
 while(i < strlen(T))
 {
  if (j == 0 || T[i] == T[j])
  {
   ++i; ++j;
   if (T[i] == T[j])
    next[i] = next[j];  //Reduce the number of retreats
   else   next[i] = j;  //Same as the algorithm above, next[I]=j
  }
  else j = next[j]; 
 } 
} 


Related articles: