Python String Splitting: A Comparison of str.split and re.split

  • 2021-07-22 10:02:03
  • OfStack

1. str.split does not support regular expressions or multiple delimiters, and it does not collapse consecutive spaces. For example, splitting on a single space gives the following result when the string contains two spaces in a row.


>>> s1="aa bb  cc"
>>> s1.split(' ')
['aa', 'bb', '', 'cc']

Therefore, str.split is only suitable for splitting on a simple, fixed separator.
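
For comparison, calling str.split() with no argument treats any run of whitespace as one separator, so it does not produce the empty string seen above. A minimal sketch; the variable names by_space and by_whitespace are only illustrative:

```python
s1 = "aa bb  cc"  # two spaces between "bb" and "cc"

# An explicit separator splits at every occurrence, so the doubled
# space produces an empty string in the result.
by_space = s1.split(' ')

# With no argument, a run of whitespace counts as a single separator
# and leading/trailing whitespace is ignored.
by_whitespace = s1.split()

print(by_space)       # ['aa', 'bb', '', 'cc']
print(by_whitespace)  # ['aa', 'bb', 'cc']
```

This only helps when the delimiter is whitespace; for other delimiters, or a mix of them, a regular expression is still needed.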

2. re.split supports regular expressions and multiple delimiter characters


>>> import re
>>> line = "abc aa;bb,cc | dd(xx).xxx 12.12'\txxxx"
>>> print(line)
abc aa;bb,cc | dd(xx).xxx 12.12'	xxxx
Split on a single space:
>>> re.split(r' ',line)
['abc', 'aa;bb,cc', '|', 'dd(xx).xxx', "12.12'\txxxx"]
The same split with the space placed inside a character class []:
>>> re.split(r'[ ]',line)
['abc', 'aa;bb,cc', '|', 'dd(xx).xxx', "12.12'\txxxx"]
Split on any whitespace character: \s matches whitespace (for ASCII text, equivalent to [ \t\n\r\f\v]), and \S matches any non-whitespace character ([^ \t\n\r\f\v]):
>>> re.split(r'[\s]',line)
['abc', 'aa;bb,cc', '|', 'dd(xx).xxx', "12.12'", 'xxxx']
Split on any of several delimiter characters:
>>> re.split(r'[;,]',line)
['abc aa', 'bb', "cc | dd(xx).xxx 12.12'\txxxx"]
>>> re.split(r'[;,\s]',line)
['abc', 'aa', 'bb', 'cc', '|', 'dd(xx).xxx', "12.12'", 'xxxx']
Wrap the pattern in capturing parentheses to keep the delimiters in the result:
>>> re.split(r'([;])',line)
['abc aa', ';', "bb,cc | dd(xx).xxx 12.12'\txxxx"]
To leave the delimiter out of the result while still grouping, make the group non-capturing with ?:
>>> re.split(r'(?:;)',line)
['abc aa', "bb,cc | dd(xx).xxx 12.12'\txxxx"]
