In-depth analysis of regular expressions in Java
- 2020-04-01 01:43:37
- OfStack
1. Regex (regular expression): regular expressions (used instead of StringTokenizer) are a sharp tool for string processing. They became popular on Unix, and Perl is particularly well known for its regex support.
They are mainly used for string matching, searching, and replacement. For example, matching an IP address (each part less than 256) is easy with a regular expression; other typical uses are pulling a large number of email addresses out of web pages (as spammers do) and extracting links from web pages. The java.util.regex package contains Pattern (a compiled regular expression) and Matcher (the result of matching a string against a Pattern).
System.out.println("abc".matches("..."));//Each "." represents a character
System.out.println("ab54564654sbg48746bshj".replaceAll("[0-9]", "-"));//Replaces every digit with "-"
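The remark above that an IP address with every part below 256 is easy to match can be made concrete. The following is a minimal sketch rather than a definitive validator: the class name Ipv4Sketch, the helper isIpv4, and the decision to reject leading zeros are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class Ipv4Sketch {
    // One part in the range 0-255, without leading zeros (an assumed rule).
    private static final String OCTET = "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";

    // Four parts separated by literal dots; compiled once and reused.
    private static final Pattern IPV4 = Pattern.compile(OCTET + "(\\." + OCTET + "){3}");

    // Hypothetical helper: true only if the whole string is a dotted IPv4 address.
    static boolean isIpv4(String s) {
        return IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"192.168.0.1", "256.1.1.1", "192.168.0.aaa"};
        for (String s : samples) {
            System.out.println(s + " -> " + isIpv4(s));
        }
    }
}
```

Spelling out the numeric ranges as alternatives is what lets a pure regex enforce the upper bound; parsing each part as an integer would be an equally reasonable design.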
2. Pattern and Matcher
Pattern p = Pattern.compile("[a-z]{3}");
Matcher m = p.matcher("ggs");//Creates a matcher that will match the given input against this pattern. Internally this builds a finite-state automaton (see compiler theory)
//The argument of matcher() is actually a CharSequence (an interface); String implements that interface, so a String can be passed polymorphically
System.out.println(m.matches());//true; "ggss" would not match, because matches() must match the entire input
//"ggs".matches("[a-z]{3}") gives the same result, but a precompiled Pattern is more efficient when reused, and Pattern and Matcher provide much more functionality
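The efficiency point is easiest to see when one pattern is applied to many inputs. Below is a small sketch under that assumption; the class name PatternReuseSketch and the helper isThreeLower are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class PatternReuseSketch {
    // Compiled once; a Pattern is immutable, so it can be shared and reused.
    private static final Pattern THREE_LOWER = Pattern.compile("[a-z]{3}");

    // Accepts any CharSequence, not only String.
    static boolean isThreeLower(CharSequence input) {
        return THREE_LOWER.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"ggs", "ggss", "GGS"};
        for (String s : samples) {
            // String.matches would recompile the regex on every call.
            System.out.println(s + " -> " + isThreeLower(s));
        }
    }
}
```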
3. Metacharacters in regex: ". * + ?". (Editor shortcuts: Ctrl+Shift+"/" comments out a block and Ctrl+Shift+"\" uncomments it.)
"a".matches(".");//true, "." matches any single character, even a Chinese character
"aa".matches("aa");//true, an ordinary string can also be used as a regular expression
"aaaa".matches("a*");//true, "*" means zero or more times
"".matches("a*");//true
"aaa".matches("a?");//false, "?" means zero or one time
"".matches("a?");//true
"a".matches("a?");//true
"544848154564113".matches("\\d{3,100}");//true
//This is the simplest IP check; it cannot reject parts greater than 255
"192.168.0.aaa".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}");//false, "aaa" is not numeric
"192".matches("[0-2][0-9][0-9]");//true
4. [abc] matches any one of the characters a, b, c; [^abc] matches any single character other than a, b, c (there must still be one character, so an empty string gives false); [a-zA-Z] is equivalent to "[a-z]|[A-Z]"; [A-Z&&[ABS]] matches a capital letter that is also one of A, B, S (an intersection).
//Inside a character class, "|" and a single "&" are just literal characters, so "|" and "||" behave the same (a plain union); only "&&" means intersection
System.out.println("C".matches("[A-Z&&[ABS]]"));//false
System.out.println("C".matches("[A-Z&[ABS]]"));//true
System.out.println("A".matches("[A-Z&&[ABS]]"));//true
System.out.println("A".matches("[A-Z&[ABS]]"));//true
System.out.println("C".matches("[A-Z|[ABS]]"));//true
System.out.println("C".matches("[A-Z||[ABS]]"));//true
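The results above can be probed a little further by testing the operator characters themselves as input. This is only an illustrative sketch; the class name CharClassOperatorsSketch and the helper m are hypothetical.

```java
class CharClassOperatorsSketch {
    // Thin wrapper so each check reads as m(input, regex).
    static boolean m(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // "&&" intersects: only A, B and S remain.
        System.out.println(m("C", "[A-Z&&[ABS]]"));
        System.out.println(m("S", "[A-Z&&[ABS]]"));
        // A single "&" or "|" inside the brackets is an ordinary member of the class.
        System.out.println(m("&", "[A-Z&[ABS]]"));
        System.out.println(m("|", "[A-Z|[ABS]]"));
        // With "&&" the ampersand is an operator, so "&" itself is not matched.
        System.out.println(m("&", "[A-Z&&[ABS]]"));
    }
}
```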
5. \w word character: [a-zA-Z_0-9], handy for user name matching; \s whitespace character: [ \t\n\x0B\f\r]; \S non-whitespace character: [^\s]; \W non-word character: [^\w].
" \n\t\r".matches("\\s{4}");//true
" ".matches("\\S");//false
"a_8".matches("\\w{3}");//true
//"+" means one or more times
"abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+");//true
System.out.println("\\".matches("\\\\"));//true: the regex \\ matches one literal backslash
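Since \w is suggested for user name matching, a short sketch may help. The length limits of 3 to 16 characters, the class name UsernameSketch and the helper isValidUsername are assumptions for illustration, not rules taken from the article.

```java
import java.util.regex.Pattern;

class UsernameSketch {
    // Assumed rule: 3 to 16 word characters (letters, digits, underscore).
    private static final Pattern USERNAME = Pattern.compile("\\w{3,16}");

    static boolean isValidUsername(String name) {
        return USERNAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidUsername("a_8"));      // three word characters
        System.out.println(isValidUsername("ab"));       // too short under the assumed rule
        System.out.println(isValidUsername("bad name")); // a space is not a word character
    }
}
```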
6. POSIX character classes (US-ASCII only)
\p{Lower} Lowercase alphabetic character: [a-z]; \p{Upper} Uppercase alphabetic character: [A-Z]; \p{ASCII} All ASCII: [\x00-\x7F]; \p{Alpha} Alphabetic character: [\p{Lower}\p{Upper}]; \p{Digit} Decimal digit: [0-9].
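A quick way to get a feel for these classes is to test single characters against them. The sketch below assumes the default mode, in which the classes cover US-ASCII only; the class name PosixClassSketch and the helper is are hypothetical.

```java
class PosixClassSketch {
    // Builds \p{Name} and tests whether the one-character string belongs to it.
    static boolean is(String className, String ch) {
        return ch.matches("\\p{" + className + "}");
    }

    public static void main(String[] args) {
        System.out.println(is("Lower", "a"));
        System.out.println(is("Upper", "A"));
        System.out.println(is("Digit", "5"));
        // U+00E9 (e with acute accent) is a letter, but it is outside US-ASCII.
        System.out.println(is("Alpha", "\u00e9"));
        System.out.println(is("ASCII", "\u00e9"));
    }
}
```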
7. Boundary matchers
^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
"hello world".matches("^h.*");//true, ^ is the beginning of a line
"hello world".matches(".*ld$");//true, $ is the end of a line
"hello world".matches("^h[a-z]{1,3}o\\b.*");//true, \b is a word boundary
"helloworld".matches("^h[a-z]{1,3}o\\b.*");//false, there is no word boundary after "hello"
" \n".matches("^[\\s&&[^\\n]]*\\n$");//true; detects a blank line: only whitespace other than newline, followed by a newline
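8. find, lookingAt, start and end. After a successful find() (or lookingAt()), m.start() and m.end() return the start and end positions of that match; if there was no match, calling them throws an IllegalStateException.
Pattern p = Pattern.compile("\\d{3,5}");
String s = "133-34444-333-00";
Matcher m = p.matcher(s);
m.matches();//false: matches() tries to match the entire string
m.reset();//Resets the matcher so that find() starts from the beginning of the input
m.find();//true, "133"
m.find();//true, "34444"
m.find();//true, "333"; find() looks for the next subsequence of the input that matches the pattern
m.find();//false, "00" has only two digits
m.lookingAt();//true; lookingAt() always starts at the beginning of the input
m.lookingAt();//true
m.lookingAt();//true
m.lookingAt();//true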
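To see start() and end() at work, the positions of every match can be collected in a loop. This is a minimal sketch using the same pattern and input as above; the class name FindSpansSketch and the helper spans are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindSpansSketch {
    // Returns one "start-end:text" entry per match found by find().
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // start() is inclusive, end() is exclusive.
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [0-3:133, 4-9:34444, 10-13:333]
        System.out.println(spans("\\d{3,5}", "133-34444-333-00"));
    }
}
```

Calling start() or end() only inside the loop body guarantees that a match exists, which avoids the IllegalStateException.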
9. String substitution
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestRegexReplacement {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);//The second parameter makes the match case-insensitive
        Matcher m = p.matcher("Java java hxsyl Ilovejava java JaVaAcmer");
        while(m.find()) {
            System.out.println(m.group());//Prints each occurrence of "java", ignoring case
        }
        String s = m.replaceAll("Java");//String also has a replaceAll method
        System.out.println(s);
        m.reset();//Required: replaceAll has changed the matcher's state, and the find loop below depends on it
        StringBuffer sb = new StringBuffer();
        int i = 0;
        while(m.find()) {
            i++;
            //(i & 1) is an int, so it cannot be used directly as a boolean condition
            if((i&1)==1) {
                m.appendReplacement(sb, "Java");//Odd occurrences become "Java"
            }else {
                m.appendReplacement(sb, "java");//Even occurrences become "java"
            }
        }
        m.appendTail(sb);//Appends the rest of the input after the last match
        System.out.println(sb);//Without the reset above, only "Acmer" is printed
    }
}
10. Grouping
Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");
String s = "123aaa-77878bb-646dd-00";
Matcher m = p.matcher(s);
while(m.find()) {
    System.out.println(m.group());//The whole match, e.g. "123aa"
    System.out.println(m.group(1));//Group 1: the digits of each match
    System.out.println(m.group(2));//Group 2: the letters of each match
}
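Groups are also available in replacement strings, where $1 and $2 refer to what the first and second groups matched. The sketch below reuses the pattern and input from the grouping example; the class name GroupReplaceSketch and the helper swapGroups are hypothetical.

```java
class GroupReplaceSketch {
    // Puts the two letters in front of the digits for every match.
    static String swapGroups(String input) {
        return input.replaceAll("(\\d{3,5})([a-z]{2})", "$2$1");
    }

    public static void main(String[] args) {
        // prints aa123a-bb77878-dd646-00
        System.out.println(swapGroups("123aaa-77878bb-646dd-00"));
    }
}
```

Text that is not part of a match, such as the third "a" and the trailing "-00", is copied through unchanged.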
11. Extracting email addresses from a web page
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class EmailSpider {
    public static void main(String[] args) {
        //try-with-resources closes the reader; FileNotFoundException is a subclass of IOException
        try (BufferedReader br = new BufferedReader(new FileReader("F:\\regex.html"))) {
            String line;
            while((line=br.readLine())!=null) {
                solve(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    private static void solve(String line) {
        //A regex that does not do its job causes no compile error, because to the compiler it is just a string
        Pattern p = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");
        Matcher m = p.matcher(line);
        while(m.find()) {
            System.out.println(m.group());//Prints each email address found on this line
        }
    }
}
12. Code statistics
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
public class CoderCount {
    static long normalLines = 0;
    static long commentLines = 0;
    static long whiteLines = 0;
    public static void main(String[] args) {
        File f = new File("D:\\share\\src");
        File[] codeFiles = f.listFiles();//null if the directory does not exist
        if(codeFiles == null) {
            return;
        }
        for(File child : codeFiles){
            if(child.getName().matches(".*\\.java$")) {
                solve(child);
            }
        }
        System.out.println("normalLines:" + normalLines);
        System.out.println("commentLines:" + commentLines);
        System.out.println("whiteLines:" + whiteLines);
    }
    private static void solve(File f) {
        BufferedReader br = null;
        boolean comment = false;//true while inside a multi-line block comment
        try {
            br = new BufferedReader(new FileReader(f));
            String line = "";
            while((line = br.readLine()) != null) {
                /*
                 * Some comment lines have a tab in front of them, so trim first.
                 * Do not call trim() directly on the result of readLine() as the
                 * book does: the last read returns null, which causes a
                 * NullPointerException.
                 */
                line = line.trim();
                //readLine() has already removed the line terminator from the string
                if(line.matches("^[\\s&&[^\\n]]*$")) {
                    whiteLines ++;
                } else if (line.startsWith("/*") && !line.endsWith("*/")) {
                    //First line of a block comment that continues on later lines
                    commentLines ++;
                    comment = true;
                } else if (line.startsWith("/*") && line.endsWith("*/")) {
                    //Block comment that opens and closes on the same line
                    commentLines ++;
                } else if (true == comment) {
                    commentLines ++;
                    if(line.endsWith("*/")) {
                        comment = false;
                    }
                } else if (line.startsWith("//")) {
                    commentLines ++;
                } else {
                    normalLines ++;
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if(br != null) {
                try {
                    br.close();
                    br = null;
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
13. Quantifiers
These include ?, * and +. By default every quantifier is greedy; appending "?" makes it reluctant, and appending "+" makes it possessive (exclusive).
//The grouping is added only to make the result easier to inspect
Pattern p = Pattern.compile("(.{3,10})+[0-9]");
String s = "aaaa5bbbb6";//The length is ten
Matcher m = p.matcher(s);
if(m.find()) {
    System.out.println(m.start() + "----" + m.end());
}else {
    System.out.println("Not match!");
}
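The three quantifier modes are easiest to compare side by side on the same input. The following sketch uses the simplified patterns (.{3,10})[0-9], (.{3,10}?)[0-9] and (.{3,10}+)[0-9], which are chosen here for illustration; the class name QuantifierModesSketch and the helper firstSpan are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierModesSketch {
    // Returns "start-end" of the first match, or "no match".
    static String firstSpan(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() + "-" + m.end() : "no match";
    }

    public static void main(String[] args) {
        String s = "aaaa5bbbb6";
        // Greedy: takes as much as possible, then gives one character back for [0-9].
        System.out.println(firstSpan("(.{3,10})[0-9]", s));   // prints 0-10
        // Reluctant: takes as little as possible and grows until [0-9] fits.
        System.out.println(firstSpan("(.{3,10}?)[0-9]", s));  // prints 0-5
        // Possessive: takes as much as possible and never gives anything back.
        System.out.println(firstSpan("(.{3,10}+)[0-9]", s));  // prints no match
    }
}
```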
14. Supplement (non-capturing groups)
//A non-capturing construct does not save what it matches as a group. (?=a) is a zero-width lookahead: it only checks that the next character is "a" without consuming it, so the "a" is still part of what .{3} matches
Pattern p = Pattern.compile("(?=a).{3}");
String s = "444a66b";
Matcher m = p.matcher(s);
while(m.find()) {
    System.out.println(m.group());//a66
}
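It can help to contrast the lookahead (?=a) with a plain non-capturing group (?:a) and an ordinary capturing group (a). The sketch below is illustrative only; the class name LookaheadSketch and the helpers firstMatch and groupCount are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadSketch {
    // Text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Number of capturing groups declared in the pattern.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String s = "444a66b";
        // Lookahead consumes nothing, so .{3} starts at the "a".
        System.out.println(firstMatch("(?=a).{3}", s) + " groups=" + groupCount("(?=a).{3}"));
        // (?:a) consumes the "a", so .{3} matches the three characters after it.
        System.out.println(firstMatch("(?:a).{3}", s) + " groups=" + groupCount("(?:a).{3}"));
        // (a) consumes the "a" and also records it as group 1.
        System.out.println(firstMatch("(a).{3}", s) + " groups=" + groupCount("(a).{3}"));
    }
}
```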
15. Back references
Pattern p = Pattern.compile("(\\d\\d)\\1");//\1 must repeat exactly what group 1 matched
String s = "1212";
Matcher m = p.matcher(s);
System.out.println(m.matches());//true
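A common practical use of back references is spotting a word typed twice in a row. The following is a rough sketch of that idea; the class name BackReferenceSketch, the helper repeatedWords and the sample sentence are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackReferenceSketch {
    // A word, some whitespace, then the same word again (\1), as whole words.
    private static final Pattern DOUBLED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Returns each word that appears twice in a row.
    static List<String> repeatedWords(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = DOUBLED.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [is, test]
        System.out.println(repeatedWords("this is is a test test"));
    }
}
```

The \b boundaries keep the "is" at the end of "this" from pairing with the following "is".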
16. Shorthand for flags
By default "." does not match a newline. Of the flags, it is enough to remember CASE_INSENSITIVE and its shorthand; in the words of the documentation, "case-insensitive matching can also be enabled via the embedded flag expression (?i)".