Pattern matching algorithm of deep string of general algorithm and KMP algorithm
- 2020-04-02 01:02:04
- OfStack
The string positioning operation, often called pattern matching of strings, is one of the most important operations in various processing systems.
The most naive algorithm of pattern matching is backtracking method, that is, the pattern string matches the main string character by character. When the pattern string does not match the main string, the main string backtracks to the next position where the matching with the pattern string begins. The pattern string backtracks to the first position and continues to match. The time complexity of the algorithm is O (m*n), and the algorithm is as follows:
Time complexity of O (m * n) is a little big, so people found the KMP algorithm, the core idea is: when a mismatch occurs, the string is not back, pattern string back to the position of "appropriate", where appropriate, only related with the pattern string, so we can calculate the pattern of each character in the string, when mismatch occurred, should go back to where. The overall time complexity of the algorithm is O(m+m).
The algorithm is as follows:
The operation of next is not optimal, because he did not consider the case of aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab, so there will be a large number of 1 in front, such algorithm complexity is no different from the original naive algorithm. So change it a little bit:
The most naive algorithm of pattern matching is backtracking method, that is, the pattern string matches the main string character by character. When the pattern string does not match the main string, the main string backtracks to the next position where the matching with the pattern string begins. The pattern string backtracks to the first position and continues to match. The time complexity of the algorithm is O (m*n), and the algorithm is as follows:
//For the pattern matching algorithm of the naive string, S is the main string and T is the pattern string, that is, to find whether there is the same string with T in S
int Index(char *S, char *T, int pos)//Pos records which bit to start matching can be directly replaced by 0
{
int i=pos, j=0;
while(i <strlen(S) && j <strlen(T))//Ensure that the length of the string is not exceeded
{
if (S[i] == T[j])
{ ++i; ++j;} //If they are the same, continue the backward comparison
else
{i = i-j+1; j =0;} //If not, go back and look it up again
}
if (j == strlen(T))
return i-strlen(T); //If the match is successful, the index at the same starting position in S as the string T is returned
else return 0; //If the match is unsuccessful, return 0
}
Time complexity of O (m * n) is a little big, so people found the KMP algorithm, the core idea is: when a mismatch occurs, the string is not back, pattern string back to the position of "appropriate", where appropriate, only related with the pattern string, so we can calculate the pattern of each character in the string, when mismatch occurred, should go back to where. The overall time complexity of the algorithm is O(m+m).
The algorithm is as follows:
void GetNext(char* T, int *next)
{
int i=1,j=0;
next[1]=0;
while( i < strlen(T) )
{
if (j == 0 || T[i] == T[j])
{
++i; ++j;
next[i] = j;
}
else j = next[j];
}
}
int KMP(char* S, char* T, int pos)
{
int i = pos, j = 1;
while (i)
{
if (S[i] == T[j])
{
++ i; ++ j;
}
else
j = next[j];
}
if (j > strlen(T))
return i-T[0];
else
return 0;
}
The operation of next is not optimal, because he did not consider the case of aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab, so there will be a large number of 1 in front, such algorithm complexity is no different from the original naive algorithm. So change it a little bit:
void GetNextEx(char *T, int *next)
{
int i=1,j=0; next[1] = 0;
while(i < strlen(T))
{
if (j == 0 || T[i] == T[j])
{
++i; ++j;
if (T[i] == T[j])
next[i] = next[j]; //Reduce the number of retreats
else next[i] = j; //Same as the algorithm above, next[I]=j
}
else j = next[j];
}
}