Fixing a Java Regular Expression That Fails to Match

  • 2021-08-21 20:42:35
  • OfStack

As shown below:


String str = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?><Response xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"><Header ShouldRecordPerformanceTime=\"false\" Timestamp=\"2018-6-25 21:24:03\" RequestID=\"2c4d0b24-fd48-4a92-a2d8-c66793df2059\" ResultCode=\"Success\" AssemblyVersion=\"2.9.5.0\" RequestBodySize=\"0\" SerializeMode=\"Xml\" RouteStep=\"1\" Environment=\"pro\" /><SSPATResponse><Result>0</Result><FareDetail /><Price>0</Price><ErrCode>102</ErrCode><DetailInfo>Send:APPLOCK\n" +
    "Rev:\n" +
    " Available resource locked successfully , 60  If no instruction is entered within seconds, the resource will be Buk Withdraw \n" +
    "Send:IG\n" +
    "Rev:\n" +
    "NO PNR\n" +
    "Send:\n" +
    "SS:AA186/N/27JUN18/PEKORD/NN1;\n" +
    "Rev:\n" +
    "AA 186 N 27JUN PEKORD NN1 WL OPEN \n" +
    "UNABLE TO SELL.PLEASE CHECK THE AVAILABILITY WITH \"AV\" AGAIN\n" +
    "Send:IG\n" +
    "Rev:</DetailInfo><PatOfficeno>SHA717</PatOfficeno></SSPATResponse><ResponseStatus><Timestamp xmlns=\"http://soa.ctrip.com/common/types/v1\">2018-06-25T21:24:03.4535624+08:00</Timestamp><Ack xmlns=\"http://soa.ctrip.com/common/types/v1\">Success</Ack></ResponseStatus></Response>";
 
String regex = "<DetailInfo>((.|\\n)*?)</DetailInfo>";

Here str is the input string to match and regex is the regular expression.

The goal is to extract the content of the <DetailInfo> tag.

The match succeeded in local tests but failed in production.

It was really puzzling...

Later, I carefully compared the str received in production with the str I had copied into my local test, and found a subtle difference.

The line separator in the str received in production is \r\n, but after copying and pasting it locally it became \n.

My regular expression only handles \n (by default, . does not match \r either), so this bug appeared.

A reminder about the differences between systems: the line separator on Windows is \r\n, while on Linux it is \n.

To cover all environments you can use System.lineSeparator() instead (note that it returns the separator of the platform the JVM runs on, which may differ from the one in the incoming data). Of course, you can also write the expression like this:


<DetailInfo>((.|\\n|\\r\\n)*?)</DetailInfo>
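
As an alternative to listing every line separator by hand, Pattern.DOTALL makes . match line terminators as well. The following is only a minimal sketch: the class name DetailInfoExtractor and its methods are hypothetical, and the short test strings stand in for the real response shown earlier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DetailInfoExtractor {
    // DOTALL makes '.' match every character, including \r and \n.
    private static final Pattern DETAIL_INFO =
            Pattern.compile("<DetailInfo>(.*?)</DetailInfo>", Pattern.DOTALL);

    // The original expression: '.' skips line terminators and only \n is added back.
    private static final Pattern LF_ONLY =
            Pattern.compile("<DetailInfo>((.|\\n)*?)</DetailInfo>");

    /** Returns the tag content, or null when the tag is not found. */
    static String extract(String xml) {
        Matcher m = DETAIL_INFO.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    /** Reports whether the original \n-only expression finds the tag. */
    static boolean lfOnlyFinds(String xml) {
        return LF_ONLY.matcher(xml).find();
    }

    public static void main(String[] args) {
        String unix = "<DetailInfo>Send:IG\nRev:</DetailInfo>";
        String windows = "<DetailInfo>Send:IG\r\nRev:</DetailInfo>";

        System.out.println(lfOnlyFinds(unix));        // true
        System.out.println(lfOnlyFinds(windows));     // false: \r is not covered
        System.out.println(extract(windows) != null); // true
    }
}
```

With this approach the expression no longer depends on which separator the sender used.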

Supplement: pitfalls in Java regular expression matching

Today, while checking whether a string contains a certain pattern, I used String.matches(regex) directly, but it never matched, even though the same expression worked in many online regex tools. After tracking down the problem, here is a summary to avoid stepping into the same pit again.

1. Premise

There are two ways in Java to check whether a string contains a match for a pattern:

Method 1:


String.matches(regex);

Reading the source code shows that this method simply calls Pattern.matches(regex, str), which in turn calls Pattern.compile(regex).matcher(input).matches(). Matcher.matches() attempts to match the entire region against the pattern. If the match succeeds, more information can be obtained through the start, end and group methods.

In other words, the method behaves as if ^ and $ were added around the expression (^regex$): it is a full match against the whole string.

It does not match substrings. If you want to test for a substring with this method, the expression itself has to cover the whole string, for example .*regex.*
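
The full-match behaviour can be checked with a small sketch. The class name FullMatchDemo and the helper containsViaMatches are hypothetical; the (?s) flag in the helper is an assumption added so that . can also cross line breaks.

```java
public class FullMatchDemo {
    /**
     * Substring test built on String.matches: the expression is wrapped so
     * that it covers the whole input. (?s) lets '.' match line breaks too.
     */
    static boolean containsViaMatches(String input, String regex) {
        return input.matches("(?s).*(?:" + regex + ").*");
    }

    public static void main(String[] args) {
        String s = "ErrCode=102";

        System.out.println(s.matches("\\d+"));             // false: digits are only part of s
        System.out.println(s.matches("ErrCode=\\d+"));     // true: the pattern covers all of s
        System.out.println(containsViaMatches(s, "\\d+")); // true: wrapped in .* on both sides
    }
}
```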

Method 2:


Pattern.compile(regex).matcher(str).find()

Matcher.find() looks for the next subsequence of the input that matches the pattern, so it matches substrings.

If you do not want a full match, use Matcher.find().
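
A minimal sketch of a substring check with Matcher.find(), which also reads the match position through group, start and end. The class name FindDemo and the helper contains are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindDemo {
    /** Returns true when any part of input matches regex. */
    static boolean contains(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String s = "<ErrCode>102</ErrCode>";

        System.out.println(contains(s, "\\d+")); // true
        System.out.println(s.matches("\\d+"));   // false: matches() needs the whole string

        Matcher m = Pattern.compile("\\d+").matcher(s);
        if (m.find()) {
            // group() is the matched text; start()/end() are its offsets in s.
            System.out.println(m.group() + " at [" + m.start() + ", " + m.end() + ")");
            // prints: 102 at [9, 12)
        }
    }
}
```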

2. Attached source code

1. String.matches(regex)

String.matches(regex)


public boolean matches(String regex) {
    return Pattern.matches(regex, this);
}

Pattern.matches(regex, this)

public static boolean matches(String regex, CharSequence input) {
  Pattern p = Pattern.compile(regex);
  Matcher m = p.matcher(input);
  return m.matches();
}

2. Matcher.find()

Pattern.compile


public static Pattern compile(String regex) {
    return new Pattern(regex, 0);
}

Pattern.matcher

public Matcher matcher(CharSequence input) {
    if (!compiled) {
      synchronized(this) {
        if (!compiled)
          compile();
      }
    }
    Matcher m = new Matcher(this, input);
    return m;
}

Matcher.find()


public boolean find() {
    int nextSearchIndex = last;
    if (nextSearchIndex == first)
      nextSearchIndex++;
    // If next search starts before region, start it at region
    if (nextSearchIndex < from)
      nextSearchIndex = from;
    // If next search starts beyond region then it fails
    if (nextSearchIndex > to) {
      for (int i = 0; i < groups.length; i++)
        groups[i] = -1;
      return false;
    }
    return search(nextSearchIndex);
}
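
The source above shows that find() resumes from where the previous match ended, so calling it in a loop walks through every match in the input. A minimal sketch, with the hypothetical class FindAllDemo and helper findAll:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllDemo {
    /** Collects every non-overlapping match of regex in input, in order. */
    static List<String> findAll(String input, String regex) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        // Each call to find() continues after the end of the previous match.
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String log = "Send:APPLOCK\nSend:IG\nSend:";
        System.out.println(findAll(log, "Send:\\w*"));
        // prints: [Send:APPLOCK, Send:IG, Send:]
    }
}
```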

3. Summary

Each method has its advantages and disadvantages; choose according to your needs.

Matcher.find() is more convenient if you only need to know whether a string contains a match for a pattern.

