Java uses regular expressions to extract data

  • 2020-05-26 08:33:11
  • OfStack

What is a regular expression

A regular expression is a notation used for pattern matching and substitution. It is a text pattern composed of ordinary characters (such as the letters a to z) and special characters (metacharacters), and it describes one or more strings to match when searching a body of text. The regular expression acts as a template that matches a character pattern against the string being searched.
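As a small illustration of ordinary characters combined with metacharacters, the sketch below uses only the standard java.util.regex.Pattern class and String.replaceAll. The class name RegexBasics and the helpers isWordThenDigits and maskDigits are hypothetical names chosen for this example, not part of the article's program.

```java
import java.util.regex.Pattern;

class RegexBasics {
    // Matching: [a-z]+ means one or more lowercase letters,
    // \d+ means one or more digits. Pattern.matches requires
    // the WHOLE input to fit the pattern.
    static boolean isWordThenDigits(String s) {
        return Pattern.matches("[a-z]+\\d+", s);
    }

    // Substitution: replace every digit with '#'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    public static void main(String[] args) {
        System.out.println(isWordThenDigits("abc123")); // true
        System.out.println(isWordThenDigits("123abc")); // false
        System.out.println(maskDigits("a1b22"));        // a#b##
    }
}
```

Note that the backslash is doubled inside the Java string literal: the regular expression \d is written as "\\d" in source code.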

Java uses regular expressions to extract data

Regular expressions in Java have a wide range of uses. Previously, we needed to split one large 3 MB txt file into several smaller files. Written in C#, this would be very simple, and the code would be quite short.

The file-splitting logic is not discussed in detail here; the main point is how to use regular expressions to group the contents of one large string:

For example, suppose you have a text file named endlist.txt with the following contents:


1300102,北京市
1300103,北京市
1300104,北京市
1300105,北京市
1300106,北京市
1300107,北京市
1300108,北京市
1300109,北京市
1300110,北京市
1300111,北京市
1300112,北京市
1300113,北京市
1300114,北京市
1300115,北京市
1300116,北京市
1300117,北京市
1300118,北京市
1300119,北京市

The seven digits are the first seven digits of a phone number, and the Chinese characters after the comma name the place the number belongs to. The goal is to group these lines by their first three digits (130, 131, 132, ...) and write each group into its own file: 130.txt, 131.txt, 132.txt, and so on.
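The mapping from one input line to its output file can be sketched on its own before looking at the full program. This is a minimal illustration, not the article's code: the class name TargetFile and the helper targetFileName are hypothetical, the second sample line (prefix 131) is invented for the example, and the place name is written with Unicode escapes so the source stays ASCII-only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TargetFile {
    // Seven digits, a comma, then the place name.
    // Group 1 captures the 3-digit prefix.
    private static final Pattern LINE = Pattern.compile("(\\d{3})\\d{4},.*");

    // Returns the output file name for one line (without its newline),
    // or null if the line does not fit the expected format.
    static String targetFileName(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group(1) + ".txt" : null;
    }

    public static void main(String[] args) {
        // "\u5317\u4eac\u5e02" is the place name from the sample data
        System.out.println(targetFileName("1300102,\u5317\u4eac\u5e02")); // 130.txt
        System.out.println(targetFileName("1310005,\u5317\u4eac\u5e02")); // 131.txt
        System.out.println(targetFileName("not a record"));               // null
    }
}
```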


import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.regex.*;

public class SplitByPrefix {
 public static void main(String[] args) {
  String childPath = "src/endlist.txt";
  StringBuilder buffer = new StringBuilder();
  // Read the whole input file into memory as one string
  try (BufferedReader br = new BufferedReader(
    new InputStreamReader(new FileInputStream(childPath), StandardCharsets.UTF_8))) {
   int s;
   while ((s = br.read()) != -1) {
    buffer.append((char) s);
   }
  } catch (IOException e) {
   e.printStackTrace();
  }
  String data = buffer.toString();
  // One list per prefix, 130 through 139
  Map<String, List<String>> resultMap = new HashMap<>();
  for (int i = 0; i < 10; i++) {
   resultMap.put("13" + i, new ArrayList<>());
  }
  // Group 1: the 3-digit prefix.
  // Group 2: the remaining 4 digits, a comma, Chinese characters, and a newline.
  Pattern pattern = Pattern.compile("(\\d{3})(\\d{4},[\u4e00-\u9fa5]*\\n)");
  Matcher matcher = pattern.matcher(data);
  while (matcher.find()) {
   List<String> bucket = resultMap.get(matcher.group(1));
   if (bucket != null) { // skip prefixes outside 130..139
    bucket.add(matcher.group(2));
   }
  }
  // Write each non-empty list to its own file, e.g. src/130.txt
  for (int i = 0; i < 10; i++) {
   List<String> tempList = resultMap.get("13" + i);
   if (tempList.isEmpty()) {
    continue;
   }
   try (Writer writer = new OutputStreamWriter(
     new FileOutputStream("src/13" + i + ".txt"), StandardCharsets.UTF_8)) {
    for (String line : tempList) {
     writer.append(line);
    }
   } catch (IOException e) {
    e.printStackTrace();
   }
  }
 }
}

The Pattern.compile call uses the regular expression "(\\d{3})(\\d{4},[\u4e00-\u9fa5]*\\n)". Each pair of parentheses () defines a capturing group. Group indexes start at 1, and index 0 represents the entire match. This expression therefore has two groups: the first group matches 3 digits, and the second group matches 4 digits, a comma, any number of Chinese characters, and 1 newline character. The extraction itself happens in the while (matcher.find()) loop, which uses group 1 to pick the list and stores group 2 in it. Because the pattern ends in \\n, a final line without a trailing newline is not matched.
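To make the group numbering concrete, here is a minimal sketch that applies the same pattern to a single sample line and returns groups 0, 1, and 2. The class name GroupDemo and the helper firstMatchGroups are hypothetical names introduced for this illustration; the place name is written with Unicode escapes so the source stays ASCII-only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Same expression as in the article: two capturing groups.
    static final Pattern PATTERN =
        Pattern.compile("(\\d{3})(\\d{4},[\u4e00-\u9fa5]*\\n)");

    // Returns {group 0, group 1, group 2} for the first match,
    // or null if the pattern is not found anywhere in the input.
    static String[] firstMatchGroups(String data) {
        Matcher m = PATTERN.matcher(data);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = firstMatchGroups("1300102,\u5317\u4eac\u5e02\n");
        System.out.println("group 0 = " + g[0].trim()); // the entire match
        System.out.println("group 1 = " + g[1]);        // 130
        System.out.println("group 2 = " + g[2].trim()); // last 4 digits, comma, place name
    }
}
```

One consequence of this pattern is that the comma must be followed directly by Chinese characters and then a newline. A line with a space or Latin text after the comma is not matched at all, so firstMatchGroups returns null for it.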

Conclusion

That is all for this article. I hope it is of some help in your study or work. If you have questions, you can leave a message to discuss them.

