Java implementation of the Sunday algorithm sample sharing

  • 2020-04-01 02:45:12
  • OfStack

The two most famous string matching algorithms are KMP algorithm (knuth-moris-pratt) and BM algorithm (boyer-moore). Both algorithms have linear search time in the worst case. However, in practice, KMP algorithm is not much faster than the simplest C library function STRSTR (), while BM algorithm is often 3-5 times faster than KMP algorithm (not in practice). However, BM algorithm is not the fastest algorithm yet. Here, we introduce a search algorithm Sunday algorithm which is faster than BM algorithm.

The idea of the Sunday algorithm is very similar to the idea of bad characters in BM algorithm. The only difference is that after the Sunday algorithm fails to match, it takes the character of the last position of the corresponding part of the Pattern string in the target string to match the bad character. When a match fails, it is determined whether the characters at the current offset +Pattern string length +1 (assuming the position K) in the parent string exist in the Pattern string. If it exists, align the position with the character in the Pattern string, and match again from scratch. If it does not exist, move the Pattern string backwards, align it with the character at the parent k+1, and then match. Repeat the above operation until it is found, or the mother string is found. I wrote a small example to implement this algorithm.

In the code, two string matching algorithms are implemented, one is Sunday, the other is a common way to move a bit each time, the efficiency of the two is compared in the main function, are nanosecond level. The detailed steps of the algorithm have been commented in the code. With respect to BM algorithm, we will compare and analyze it next time when it is empty.


import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class SundySearch {
    String text = null;
    String pattern = null;
    int currentPos = 0;
    
    List<Integer> matchedPosList = new LinkedList<Integer>();

    
    Map<Character, Integer> map = new HashMap<Character, Integer>();
    public SundySearch(String text, String pattern) {
        this.text = text;
        this.pattern = pattern;
        this.initMap();
    };
    
    private void initMap() {
        for (int i = 0; i < pattern.length(); i++) {
            this.map.put(pattern.charAt(i), i);
        }
    }
    
    public List<Integer> normalMatch() {
        //The match fails. Keep going
        if (!matchFromSpecialPos(currentPos)) {
            currentPos += 1;
            if ((text.length() - currentPos) < pattern.length()) {
                return matchedPosList;
            }
            normalMatch();
        } else {
            //Match successful, record the position
            matchedPosList.add(currentPos);
            currentPos += 1;
            normalMatch();
        }

        return matchedPosList;
    }
    
    public List<Integer> sundayMatch() {
        //If there is no match
        if (!matchFromSpecialPos(currentPos)) {
            //If the K character in the Text does not appear in the Pattern string, the entire Pattern string length is skipped
            if ((currentPos + pattern.length() + 1) < text.length()
                    && !map.containsKey(text.charAt(currentPos + pattern.length() + 1))) {
                currentPos += pattern.length();
            }else {
                //If the K character in the Text appears in the Pattern string, align the position of the K character in the Text with the position of the last occurrence of the K character in the Pattern string
                if ((currentPos + pattern.length() + 1) > text.length()) {
                    currentPos += 1;
                } else {
                    currentPos += pattern.length() - (Integer) map.get(text.charAt(currentPos + pattern.length()));
                }
            }

            //The match completes, returning the initial displacement of all successful matches
            if ((text.length() - currentPos) < pattern.length()) {
                return matchedPosList;
            }

            sundayMatch();
        }else {
            //Match successfully forward one bit and then match again
            matchedPosList.add(currentPos);
            currentPos += 1;
            sundayMatch();
        }
        return matchedPosList;
    }
    
    public boolean matchFromSpecialPos(int pos) {
        if ((text.length()-pos) < pattern.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            if (text.charAt(pos + i) == pattern.charAt(i)) {
                if (i == (pattern.length()-1)) {
                    return true;
                }
                continue;
            } else {
                break;
            }
        }

        return false;
    }
    public static void main(String[] args) {
        SundySearch sundySearch = new SundySearch("hello  ...   adolf  adfsadfklf adf234masdfsdfdsfdsfdsffwerwrewrerwerwersdf2666sdflsdfk", "adf");

        long begin = System.nanoTime();
        System.out.println("NormalMatch:" + sundySearch.normalMatch());
        System.out.println("NormalMatch:" + (System.nanoTime() - begin));

        begin = System.nanoTime();
        System.out.println("SundayMatch:" + sundySearch.sundayMatch());
        System.out.println("SundayMatch:" + (System.nanoTime() - begin));
    }
}


Related articles: