Python example: splitting sentences on any punctuation mark

  • 2021-07-18 08:24:49
  • OfStack

Problem

Given a paragraph made up of short sentences that may be separated by arbitrary punctuation marks, extract all of the short sentences.

Solution

Use the re.split function with a regular expression that lists every separator as an alternative, so that all the short sentences are split out in a single pass.


import re

# Alternation of single-character separators: ASCII punctuation first,
# then full-width (CJK) punctuation and a plain space.
pattern = r',|\.|/|;|\'|`|\[|\]|<|>|\?|:|"|\{|\}|\~|!|@|#|\$|%|\^|&|\(|\)|-|=|\_|\+|,|。|、|;|‘|’|“|”|·|!| |…|(|)'
# One 'b' between every pair of separators, so each piece of the result should be 'b'.
test_text = 'b,b.b/b;b\'b`b[b]b<b>b?b:b"b{b}b~b!b@b#b$b%b^b&b(b)b-b=b_b+b,b。b、b;b‘b’b“b”b·b!b b…b(b)b'
result_list = re.split(pattern, test_text)
print(result_list)

The output is:


['b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b']

As the output shows, every b has been extracted.
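
On real text, separators often appear back to back (for example "..." or a comma followed by a space), and re.split then returns empty strings between them. The following is a minimal sketch of one way to handle that, not part of the original example: it builds a character class with re.escape instead of a long alternation, treats a run of separators as a single split point, and drops empty pieces. The names SEPARATORS, SPLIT_RE and split_sentences are hypothetical, and the separator set is only an assumption that should be adjusted to the text at hand.

```python
import re

# Assumed separator set: ASCII punctuation, common full-width punctuation, and a space.
SEPARATORS = ',./;\'`[]<>?:"{}~!@#$%^&()-=_+,。、;‘’“”·! …()'

# re.escape makes every character literal, so '-', ']' and '^' are safe inside [...].
# The trailing '+' treats a run of consecutive separators as one split point.
SPLIT_RE = re.compile('[' + re.escape(SEPARATORS) + ']+')


def split_sentences(text):
    """Split text on runs of separators and drop empty pieces."""
    return [piece for piece in SPLIT_RE.split(text) if piece]


print(split_sentences('first,second;;third。fourth!'))
# ['first', 'second', 'third', 'fourth']
```

Because a space is in the separator set, as in the example above, words separated by spaces are split apart as well; remove the space from SEPARATORS if whole phrases should be kept together.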

