Simple text string processing in Python

  • 2020-07-21 08:57:05
  • OfStack

This article shows how to do simple text string processing in Python. The details are as follows:

To break a text string into words, use Python's str.split() method. Let's see how this works in practice.


# split() with no arguments splits on whitespace
mySent = 'This book is the best book on python!'
print(mySent.split())

Output:


['This', 'book', 'is', 'the', 'best', 'book', 'on', 'python!']
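A small sketch of how split() treats whitespace may help here. Called without arguments, it splits on any run of whitespace (spaces, tabs, newlines) and never returns empty strings; called with an explicit ' ' it splits at every single space. The sample string below is made up for illustration and is not from the article.

```python
# Illustrative input (an assumption): a double space and a newline.
sample = 'This  book is\nthe best'

# No argument: any run of whitespace counts as one delimiter.
default_split = sample.split()

# Explicit single-space delimiter: consecutive spaces yield empty strings,
# and the newline is not treated as a delimiter.
space_split = sample.split(' ')

print(default_split)  # ['This', 'book', 'is', 'the', 'best']
print(space_split)    # ['This', '', 'book', 'is\nthe', 'best']
```

For multi-line text such as an email, the no-argument form is usually the more convenient of the two.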

As you can see, the split works well, but punctuation stays attached to the words ('python!'). Regular expressions can handle this: split on a delimiter that is any run of characters other than letters, digits, or underscores.


import re
# \W+ matches one or more characters that are not letters, digits, or underscores
reg = re.compile(r'\W+')
mySent = 'This book is the best book on python!'
listof = reg.split(mySent)
print(listof)

The output is:


['This', 'book', 'is', 'the', 'best', 'book', 'on', 'python', '']
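A note on the quantifier, offered as a hedged sketch: a pattern such as \W* can also match the empty string, and since Python 3.7 re.split() splits on empty matches too, which breaks the sentence into single characters. A pattern with \W+ (one or more non-word characters) produces the word list shown above on both older and newer interpreters. The comparison below assumes Python 3.7 or later.

```python
import re

mySent = 'This book is the best book on python!'

# One or more non-word characters: splits only at real separators.
plus_tokens = re.split(r'\W+', mySent)

# Zero or more non-word characters: on Python 3.7+ this also splits at
# every empty match, i.e. between individual letters.
star_tokens = re.split(r'\W*', mySent)

print(plus_tokens)
print(len(plus_tokens), len(star_tokens))
```

Either way, the trailing '!' leaves an empty string at the end of the list.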

You now have a list of words, but the empty string at the end needs to be removed.

You can check the length of each string and keep only those with a length greater than 0.


import re
reg = re.compile(r'\W+')
mySent = 'This book is the best book on python!'
listof = reg.split(mySent)
# Keep only the non-empty tokens
new_list = [tok for tok in listof if len(tok) > 0]
print(new_list)

The output is:


['This', 'book', 'is', 'the', 'best', 'book', 'on', 'python']
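As an alternative sketch (not the approach used in this article), re.findall() with the pattern \w+ collects the words directly instead of splitting on what lies between them, so no empty strings appear and no filtering is needed.

```python
import re

mySent = 'This book is the best book on python!'

# \w+ matches each run of letters, digits, or underscores.
words = re.findall(r'\w+', mySent)
print(words)  # ['This', 'book', 'is', 'the', 'best', 'book', 'on', 'python']
```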

Finally, notice that the first letter of the sentence is capitalized. We want every word in one uniform case. Python strings provide .lower() to convert to lowercase and .upper() to convert to uppercase.


import re
reg = re.compile(r'\W+')
mySent = 'This book is the best book on python!'
listof = reg.split(mySent)
# Lowercase each token and drop the empty ones
new_list = [tok.lower() for tok in listof if len(tok) > 0]
print(new_list)

The output is:


['this', 'book', 'is', 'the', 'best', 'book', 'on', 'python']
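The three steps so far (split, drop empty strings, lowercase) can be wrapped in one reusable function. This is a minimal sketch; the name tokenize is hypothetical and not part of the article or of any library.

```python
import re

# Compile once; \W+ matches runs of non-word characters.
_SPLITTER = re.compile(r'\W+')

def tokenize(text):
    """Hypothetical helper: split text into lowercase word tokens."""
    # "if tok" drops the empty strings left at the edges of the split.
    return [tok.lower() for tok in _SPLITTER.split(text) if tok]

print(tokenize('This book is the best book on python!'))
```

Compiling the pattern once at module level avoids recompiling it on every call, which matters only when the function is applied to many strings.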

The same steps work on a full email. The content of email.txt is:


Hi Peter,

With Jose out of town, do you want to
meet once in a while to keep things
going and do some interesting stuff?

Let me know
Eugene


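import re
reg = re.compile(r'\W+')
# Read the whole file; "with" closes it when done
with open('email.txt') as f:
    email = f.read()
# "tokens" avoids shadowing the built-in list
tokens = reg.split(email)
new_txt = [tok.lower() for tok in tokens if len(tok) > 0]
print(new_txt)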

Output:

['hi', 'peter', 'with', 'jose', 'out', 'of', 'town', 'do', 'you', 'want', 'to', 'meet', 'once', 'in', 'a', 'while', 'to', 'keep', 'things', 'going', 'and', 'do', 'some', 'interesting', 'stuff', 'let', 'me', 'know', 'eugene']
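To try the email example without creating email.txt by hand, the sketch below writes the same text to a temporary file first and then reads it back. The temporary-file step and the variable names are assumptions added for illustration; the splitting and lowercasing are the steps described in this article.

```python
import os
import re
import tempfile

EMAIL_TEXT = """Hi Peter,

With Jose out of town, do you want to
meet once in a while to keep things
going and do some interesting stuff?

Let me know
Eugene
"""

# Write the sample email to a temporary file (a stand-in for email.txt).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(EMAIL_TEXT)
    path = tmp.name

try:
    # Read the file back; "with" closes it automatically.
    with open(path) as f:
        email = f.read()
finally:
    os.remove(path)  # clean up the temporary file

tokens = [tok.lower() for tok in re.split(r'\W+', email) if tok]
print(len(tokens))  # 29
print(tokens[:4])   # ['hi', 'peter', 'with', 'jose']
```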


I hope this article helps you with your Python programming.

