Java regular expressions: matching and replacing multiple strings

2020-04-01 01:31:14
OfStack

Using regular expressions in Java is relatively simple:
1. Compile the regular expression string into a Pattern object;

2. Create a Matcher that matches the given input against this pattern;

3. Perform operations through the Matcher object, which has a rich set of methods that can be combined for more powerful processing.
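A minimal sketch of these three steps: compile a digit pattern, create a Matcher for some input, then call `find()` and `group()` on it. The class `ThreeSteps` and the helper `firstNumber` are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeSteps {
    // Returns the first run of digits in the input, or null if there is none
    static String firstNumber(String input) {
        // Step 1: compile the regular expression into a Pattern object
        Pattern pattern = Pattern.compile("\\d+");
        // Step 2: create a Matcher for the given input
        Matcher matcher = pattern.matcher(input);
        // Step 3: work through the Matcher: find() looks for the next match, group() returns it
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("order 66 shipped in 2020")); // 66
        System.out.println(firstNumber("no digits here"));          // null
    }
}
```

In real code the Pattern would usually be compiled once and kept in a `static final` field: Pattern objects are immutable and safe to share between threads, while Matcher objects are not.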


import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexReplaceDemo {

    public static void main(String[] args) {
        // Data source for the keywords to be replaced
        Map<String, String> tokens = new HashMap<>();
        tokens.put("cat", "Garfield");
        tokens.put("beverage", "coffee");

        // A template with Velocity-style ${...} placeholders
        String template = "${cat} really needs some ${beverage}.";
        // Build the pattern \$\{(key1|key2|...)\}; the original used Apache Commons Lang's
        // StringUtils.join, while String.join (Java 8+) needs no extra dependency
        String patternString = "\\$\\{(" + String.join("|", tokens.keySet()) + ")\\}";

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(template);

        // Two methods: appendReplacement copies the text before each match plus its replacement,
        // and appendTail copies whatever follows the last match
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement keeps a '$' or '\' inside the value from being interpreted
            matcher.appendReplacement(sb, Matcher.quoteReplacement(tokens.get(matcher.group(1))));
        }
        matcher.appendTail(sb);

        // out: Garfield really needs some coffee.
        System.out.println(sb.toString());

        // In a replacement string, '\' and '$' are special ($1 means "group 1");
        // Matcher.quoteReplacement removes that special meaning
        matcher.reset();
        // out: cat really needs some beverage.
        System.out.println(matcher.replaceAll("$1"));
        // out: $1 really needs some $1.
        System.out.println(matcher.replaceAll(Matcher.quoteReplacement("$1")));

        // Extract the name before the '@' of an email address. Email validation rules vary widely;
        // writing a pattern that fits your own requirements is what matters
        String emailPattern = "^([a-z0-9_\\.\\-\\+]+)@([\\da-z\\.\\-]+)\\.([a-z\\.]{2,6})$";
        pattern = Pattern.compile(emailPattern);
        matcher = pattern.matcher("test@qq.com");
        // Validate the address; out: true
        System.out.println(matcher.find());
        // Keep only the part before '@'; out: test
        System.out.println(matcher.replaceAll("$1"));

        // Extract attribute values from a tag
        String temp = "<meta-data android:name=\"appid\" android:value=\"joy\"></meta-data>";
        pattern = Pattern.compile("android:(name|value)=\"(.+?)\"");
        matcher = pattern.matcher(temp);
        while (matcher.find()) {
            // out: appid, then joy (one per line)
            System.out.println(matcher.group(2));
        }
    }
}
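On Java 9 or later, the same `${...}` substitution can be written with `Matcher.replaceAll(Function<MatchResult, String>)`, which runs the find/append loop for you. The sketch below (the names `TemplateFill` and `fill` are hypothetical) also wraps each key in `Pattern.quote`, so a key such as `a.b` is matched literally rather than as a regex, and passes each value through `Matcher.quoteReplacement`, so values containing `$` or `\` are inserted unchanged.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TemplateFill {
    // Replaces every ${key} whose key is in the map; needs Java 9+ for replaceAll(Function)
    static String fill(String template, Map<String, String> tokens) {
        if (tokens.isEmpty()) {
            return template; // no keys, nothing to replace
        }
        // Pattern.quote makes each key literal, e.g. "a.b" no longer matches "axb"
        String keys = tokens.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile("\\$\\{(" + keys + ")\\}");
        // The function's result is still treated as a replacement string,
        // so quoteReplacement keeps '$' and '\' in values literal
        return pattern.matcher(template)
                .replaceAll(m -> Matcher.quoteReplacement(tokens.get(m.group(1))));
    }

    public static void main(String[] args) {
        Map<String, String> tokens = Map.of("cat", "Garfield", "beverage", "coffee", "price", "$5");
        // Garfield really needs some coffee for $5.
        System.out.println(fill("${cat} really needs some ${beverage} for ${price}.", tokens));
    }
}
```

Placeholders whose key is not in the map, such as `${dog}`, are left untouched, because the pattern only alternates over known keys.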

Basics that are easy to forget

[...] any one character inside the brackets

[^...] any one character not inside the brackets

. any character other than a line terminator, roughly equivalent to [^\n]

\w any word character, equivalent to [a-zA-Z0-9_]

\W any non-word character, equivalent to [^a-zA-Z0-9_]

\s any whitespace character, equivalent to [ \t\n\x0B\f\r]

\S any non-whitespace character, equivalent to [^ \t\n\x0B\f\r]

\d any digit, equivalent to [0-9]

\D any character other than a digit, equivalent to [^0-9]

[\b] a literal backspace character (a special case)
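A small sketch for checking the character-class entries above against real input; `CharClassDemo` and `findAll` are illustrative names, and the expected results are noted in the comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Collects every substring of input that regex matches, left to right
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "id_7 = [x9]\tend";
        System.out.println(findAll("[xyz]\\d", s));      // [x9]: one of x, y, z, then a digit
        System.out.println(findAll("\\w+", s));          // [id_7, x9, end]: '_' is a word character
        System.out.println(findAll("\\d", s));           // [7, 9]
        System.out.println(findAll("[^\\w\\s]", s));     // the three characters =, [ and ]
        System.out.println(findAll("\\s", s).size());    // 3: two spaces and a tab
        System.out.println(findAll("a.c", "abc\na\nc")); // [abc]: '.' does not match '\n'
    }
}
```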

{n,m} matches the previous item at least n times but no more than m times

{n,} matches the previous item n or more times

{n} matches the previous item exactly n times

? matches the previous item 0 or 1 times, i.e. the previous item is optional; equivalent to {0,1}

+ matches the previous item 1 or more times, equivalent to {1,}

* matches the previous item 0 or more times, equivalent to {0,}
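The quantifiers can be tried with `Matcher.matches()`, which requires the whole input to match. The sketch also contrasts greedy `+` with the lazy `+?` used in the `android:(name|value)` example earlier; `QuantifierDemo`, `whole`, and `firstGroup` are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // True only if the entire input matches (matches() is anchored at both ends)
    static boolean whole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Group 1 of the first match found by find(), or null
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(whole("\\d{3}", "123"));    // true: exactly 3 digits
        System.out.println(whole("\\d{3}", "1234"));   // false
        System.out.println(whole("\\d{2,4}", "1234")); // true: 2 to 4 digits
        System.out.println(whole("\\d{2,}", "1"));     // false: at least 2 digits
        System.out.println(whole("colou?r", "color")); // true: the u is optional
        System.out.println(whole("ab+", "a"));         // false: + needs at least one b
        System.out.println(whole("ab*", "a"));         // true: * allows zero b
        // Greedy .+ runs as far as it can; lazy .+? stops at the first possible '>'
        System.out.println(firstGroup("<(.+)>", "<a><b>"));  // a><b
        System.out.println(firstGroup("<(.+?)>", "<a><b>")); // a
    }
}
```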

| alternation: matches either the subexpression on the left or the subexpression on the right of the symbol

(...) grouping: combines several items into a single unit that can be modified by *, +, ?, | and so on; the characters matched by the group are also remembered for later reference

\n (for example \1, \2) matches the same characters that were matched by group n. A group is a subexpression in parentheses (possibly nested); groups are numbered by counting left parentheses from left to right
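Alternation, group numbering by left parentheses, and a `\1` backreference (written `"\\1"` in Java source), in one sketch. `GroupDemo` and its helper methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // (ab|cd) is one unit, so + repeats the whole alternation
    static boolean repeatsAbOrCd(String input) {
        return Pattern.matches("(ab|cd)+", input);
    }

    // Groups are numbered by their left parenthesis: 1 = ((\d+)-(\d+)), 2 = first (\d+), 3 = second (\d+)
    static List<String> rangeParts(String input) {
        Matcher m = Pattern.compile("((\\d+)-(\\d+))").matcher(input);
        List<String> parts = new ArrayList<>();
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                parts.add(m.group(i));
            }
        }
        return parts;
    }

    // \1 must match the same text that group 1 captured, so this finds doubled words
    static List<String> doubledWords(String input) {
        Matcher m = Pattern.compile("\\b(\\w+) \\1\\b").matcher(input);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(repeatsAbOrCd("abcdab"));                // true
        System.out.println(rangeParts("range 10-20"));              // [10-20, 10, 20]
        System.out.println(doubledWords("this is is a test test")); // [is, test]
    }
}
```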

^ matches the beginning of the string, and in multi-line mode the beginning of each line

$ matches the end of the string, and in multi-line mode the end of each line

\b matches a word boundary, that is, the position between a \w character and a \W character, or between a \w character and the start or end of the string (note: [\b] matches a backspace)

\B matches any position that is not a word boundary
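In Java, the multi-line behavior of `^` and `$` is switched on with the `Pattern.MULTILINE` flag. A minimal sketch comparing both modes and the two word-boundary assertions; `AnchorDemo` and `findAll` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // All matches of regex in input, compiled with the given flags (0 = no flags)
    static List<String> findAll(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "cat food\ncatalog\nbobcat";
        // By default ^ and $ refer to the whole input
        System.out.println(findAll("^\\w+", 0, text));                 // [cat]
        System.out.println(findAll("\\w+$", 0, text));                 // [bobcat]
        // With MULTILINE they also match at each line break
        System.out.println(findAll("^\\w+", Pattern.MULTILINE, text)); // [cat, catalog, bobcat]
        System.out.println(findAll("\\w+$", Pattern.MULTILINE, text)); // [food, catalog, bobcat]
        // \bcat\b: "cat" as a whole word; \Bcat: "cat" preceded by another word character
        System.out.println(findAll("\\bcat\\b", 0, text).size());      // 1 (the first word)
        System.out.println(findAll("\\Bcat", 0, text).size());         // 1 (inside "bobcat")
    }
}
```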

Digression

On email validation: when I needed to validate email addresses, I used to search online for a regular expression and paste it into my program. That is actually the wrong approach, because different providers require different email formats; registering a 163 mailbox and a QQ mailbox, for example, involves different format rules. Looking for one regular expression that covers every email format is therefore a mistake; the right approach is to write a pattern that matches your own requirements.
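To illustrate the point: the email pattern used earlier only allows lowercase letters, so it rejects `Test@QQ.com` unless it is compiled with `Pattern.CASE_INSENSITIVE`. The `QQ_NUMBER` rule below is a purely hypothetical in-house requirement (a numeric local part at `qq.com`), not any provider's official format; it only shows what tailoring a pattern to your own needs can look like. `EmailCheck` and `ok` are illustrative names.

```java
import java.util.regex.Pattern;

class EmailCheck {
    // The article's pattern, unchanged: [a-z] admits only lowercase letters
    static final Pattern ARTICLE =
            Pattern.compile("^([a-z0-9_\\.\\-\\+]+)@([\\da-z\\.\\-]+)\\.([a-z\\.]{2,6})$");
    // Same pattern, but letters match regardless of case
    static final Pattern ARTICLE_CI =
            Pattern.compile(ARTICLE.pattern(), Pattern.CASE_INSENSITIVE);
    // Hypothetical in-house rule: 5 to 12 digits, no leading zero, only at qq.com
    static final Pattern QQ_NUMBER = Pattern.compile("^[1-9]\\d{4,11}@qq\\.com$");

    static boolean ok(Pattern p, String email) {
        return p.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(ok(ARTICLE, "test@qq.com"));    // true
        System.out.println(ok(ARTICLE, "Test@QQ.com"));    // false: uppercase letters
        System.out.println(ok(ARTICLE_CI, "Test@QQ.com")); // true
        System.out.println(ok(QQ_NUMBER, "12345@qq.com")); // true
        System.out.println(ok(QQ_NUMBER, "test@qq.com"));  // false under the hypothetical rule
    }
}
```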