String handling tips in Python

2020-05-10 18:25:52
OfStack

1. How to split a string with multiple delimiters?

The actual case

We want to split a string into different character segments according to the delimiter. The string contains many different delimiters, such as:


s = 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd'

Here ',', ';', '|' and '\t' are all delimiters. How do we split on all of them?

The solution

Call the str.split() method repeatedly, one delimiter at a time


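# Python 2 only: map() is eager there. In Python 3 map() is lazy, so t would stay empty.
def mySplit(s, ds):
    res = [s]
    for d in ds:
        t = []
        # Split every current piece on d and collect the parts in t
        map(lambda x: t.extend(x.split(d)), res)
        res = t
    # Drop the empty strings left by adjacent delimiters
    return [x for x in res if x]

s = 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd'
result = mySplit(s, ';,|\t')
print(result)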


C:\Users\Administrator>C:\Python\Python27\python.exe E:\python-intensive-training\s2.py
['asd', 'aad', 'dasd', 'dasd', 'sdasd', 'asd', 'Adas', 'sdasd', 'Asdasd', 'd', 'asd']
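
The function above depends on Python 2, where map() runs eagerly. A minimal sketch of the same one-delimiter-at-a-time idea for Python 3 might look like this; the name split_multi is ours for illustration and does not come from the original article:

```python
def split_multi(s, delimiters):
    """Split s on every character in delimiters, one delimiter at a time."""
    parts = [s]
    for d in delimiters:
        next_parts = []
        for piece in parts:
            # str.split() returns a list, so extend() flattens it into next_parts
            next_parts.extend(piece.split(d))
        parts = next_parts
    # Adjacent delimiters produce empty strings; drop them
    return [p for p in parts if p]


s = 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd'
result = split_multi(s, ';,|\t')
print(result)
```

An explicit inner loop replaces map() here because map() used only for its side effects never runs in Python 3 unless something consumes it.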

Use the regular-expression function re.split() to split the string in a single pass


>>> import re
>>> re.split('[,;\t|]+','asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd')
['asd', 'aad', 'dasd', 'dasd', 'sdasd', 'asd', 'Adas', 'sdasd', 'Asdasd', 'd', 'asd']

2. How do I determine if the string a begins or ends with the string b?

The actual case

If a directory contains the following files:


quicksort.c graph.py heap.java install.sh stack.cpp ......

Now we need to grant execute permission to the files whose names end in .sh or .py

The solution

Use the string methods startswith() and endswith(); both accept a tuple of candidate strings


>>> import os, stat
>>> os.listdir('./')
['heap.java', 'quicksort.c', 'stack.cpp', 'install.sh', 'graph.py']
>>> [name for name in os.listdir('./') if name.endswith(('.sh','.py'))]
['install.sh', 'graph.py']
>>> os.chmod('install.sh', os.stat('install.sh').st_mode | stat.S_IXUSR)


[root@iZ28i253je0Z t]# ls -l install.sh
-rwxr--r-- 1 root root 0 Sep 15 18:13 install.sh
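
The session above changes a single file by hand. As a rough sketch, the filter and the chmod call could be combined into one helper; the function name make_executable and its default suffixes are our own assumptions, not part of the original:

```python
import os
import stat


def make_executable(directory, suffixes=('.sh', '.py')):
    """Add the owner-execute bit to files in directory whose names end with one of suffixes."""
    changed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # endswith() accepts a tuple and returns True if any suffix matches
        if os.path.isfile(path) and name.endswith(suffixes):
            mode = os.stat(path).st_mode
            # OR-ing keeps the existing permission bits and adds owner-execute
            os.chmod(path, mode | stat.S_IXUSR)
            changed.append(name)
    return changed
```

Note that suffixes must be a tuple; passing a list to endswith() raises TypeError.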

3. How to adjust the format of text in a string?

The actual case

The log file of a piece of software uses the date format yyyy-mm-dd:


2016-09-15 18:27:26 statu unpacked python3-pip:all
2016-09-15 19:27:26 statu half-configured python3-pip:all
2016-09-15 20:27:26 statu installd python3-pip:all
2016-09-15 21:27:26 configure asdasdasdas:all python3-pip:all

We need to convert the dates to the American format mm/dd/yyyy, for example 2016-09-15 --> 09/15/2016. How do we do it?

The solution

Use the regular-expression function re.sub() to do the string substitution

Use regular-expression capture groups to capture each part of the date, then rearrange those groups in the replacement string.


>>> log = '2016-09-15 18:27:26 statu unpacked python3-pip:all'
>>> import re
>>> # Refer to the groups by position
>>> re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', log)
'09/15/2016 18:27:26 statu unpacked python3-pip:all'
>>> # Refer to the groups by name
>>> re.sub(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', r'\g<month>/\g<day>/\g<year>', log)
'09/15/2016 18:27:26 statu unpacked python3-pip:all'
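
re.sub() replaces every match, not just the first, so the same call works on a whole log at once. A small sketch under that assumption; the names DATE_RE and to_us_dates are ours for illustration:

```python
import re

# yyyy-mm-dd with named groups, compiled once so it can be reused
DATE_RE = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')


def to_us_dates(text):
    """Rewrite every yyyy-mm-dd date in text as mm/dd/yyyy."""
    return DATE_RE.sub(r'\g<month>/\g<day>/\g<year>', text)


log = (
    '2016-09-15 18:27:26 statu unpacked python3-pip:all\n'
    '2016-09-15 19:27:26 statu half-configured python3-pip:all\n'
)
converted = to_us_dates(log)
print(converted)
```

Named groups cost a few more characters than \1, \2, \3, but the replacement string stays readable if the pattern is later reordered.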

4. How to concatenate multiple small strings into one large string?

The actual case

When designing a network program, we defined a custom protocol on top of UDP that passes a series of parameters to the server in a fixed order:


hwDetect: "<0112>"
gxDepthBits: "<32>"
gxResolution: "<1024x768>"
gxRefresh: "<60>"
fullAlpha: "<1>"
lodDist: "<100.0>"
DistCull: "<500.0>"

In the program, we collect the parameters into a list in order:


["<0112>", "<32>", "<1024x768>", "<60>", "<1>", "<100.0>", "<500.0>"]

Finally, we need to join the parameters into a single packet for sending:


"<0112><32><1024x768><60><1><100.0><500.0>"

The solution

Iterate over the list, concatenating each string in turn with the '+' operator


>>> result = ''
>>> for n in ["<0112>","<32>","<1024x768>","<60>","<1>","<100.0>","<500.0>"]:
...     result += n
...
>>> result
'<0112><32><1024x768><60><1><100.0><500.0>'

Use the str.join() method to concatenate all the strings in the list in one call


>>> ''.join(["<0112>","<32>","<1024x768>","<60>","<1>","<100.0>","<500.0>"])
'<0112><32><1024x768><60><1><100.0><500.0>'

If the list contains numbers, convert each item with str() inside a generator expression passed to join().
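
A minimal sketch of that conversion; the sample list below is hypothetical, since the original example is not available:

```python
# Hypothetical sample values mixing strings and numbers
fields = ['<', 1024, 'x', 768, '>']

# join() only accepts strings, so each item is converted with str() first.
# The generator expression avoids building a temporary list.
packet = ''.join(str(x) for x in fields)
print(packet)
```

Calling ''.join(fields) directly would raise TypeError as soon as it reached the first integer.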

5. How to align a string left, right, or center?

The actual case

A dictionary stores a series of property values:


{'ip':'127.0.0.1','blog': 'www.anshengme.com','title': 'Hello world','port': '80'}

In the program, we want to print its contents in the following aligned format. How do we do it?


port  : 80
blog  : www.anshengme.com
ip    : 127.0.0.1
title : Hello world

The solution

Use the string methods str.ljust(), str.rjust() and str.center() to left-align, right-align, or center


>>> info = {'ip':'127.0.0.1','blog': 'www.anshengme.com','title': 'Hello world','port': '80'}
>>> # Maximum length of the dictionary keys
>>> max(map(len, info.keys()))
5
>>> w = max(map(len, info.keys()))
>>> for k in info:
...     print(k.ljust(w), ':', info[k])
...
port  : 80
blog  : www.anshengme.com
ip    : 127.0.0.1
title : Hello world

Use the built-in format() function with a format spec such as '<20', '>20' or '^20' to do the same thing


>>> for k in info:
...     print(format(k,'^'+str(w)), ':',info[k])
...
port  : 80
blog  : www.anshengme.com
 ip   : 127.0.0.1
title : Hello world
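
The same format specs also work inside f-strings (Python 3.6+), where the width can be filled in from a variable. A short sketch assuming the info dictionary from this section:

```python
info = {'ip': '127.0.0.1', 'blog': 'www.anshengme.com',
        'title': 'Hello world', 'port': '80'}

# Width of the longest key, so every key is padded to the same column
w = max(len(k) for k in info)

# '<' left-aligns and '>' right-aligns; {w} is substituted into the spec at run time
left = [f'{k:<{w}} : {v}' for k, v in info.items()]
right = [f'{k:>{w}} : {v}' for k, v in info.items()]

print('\n'.join(left))
print('\n'.join(right))
```

On Python 3.7 and later the rows come out in insertion order, so the order may differ from the older output shown above.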

6. How do I remove unwanted characters from a string?

The actual case

Strip the extra whitespace around text entered by the user, for example: anshengm.com@gmail.com

Filter out the '\r' in text edited on Windows: hello word\r\n

Remove the Unicode combining characters (tone marks) from the text: 'ní hǎo, chī fàn'

The solution

Use the string methods strip(), lstrip() and rstrip() to remove characters from the ends of a string


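
A brief sketch of the three methods; the padded input below is a hypothetical sample built from the e-mail address in the case description:

```python
# Hypothetical sample input with whitespace on both sides
raw = '   anshengm.com@gmail.com   '

both = raw.strip()     # whitespace removed from both ends
left = raw.lstrip()    # only leading whitespace removed
right = raw.rstrip()   # only trailing whitespace removed

# An argument is treated as a set of characters to strip, not as a prefix or suffix
dashes = '---abc+++'.strip('-+')

print(repr(both), repr(left), repr(right), repr(dashes))
```

These methods only touch the ends of the string; whitespace in the middle is left alone.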

To delete the character at a fixed position, use slicing plus concatenation


>>> s = 'abc:123'  # assumed sample value; the original definition is missing
>>> s[:3] + s[4:]
'abc123'

Use the string method replace() or the regular-expression function re.sub() to delete characters at any position


>>> s = '\tabc\t123\txyz'
>>> s.replace('\t', '')
'abc123xyz'

Use re.sub() to delete several different characters at once


>>> import re
>>> s = '\tabc\t123\txyz\ropq'  # assumed sample value; the original definition is missing
>>> re.sub('[\t\r]', '', s)
'abc123xyzopq'

Use the string method translate() to map or delete several different characters at the same time


>>> # Python 2: string.maketrans() builds the mapping table (Python 3 uses str.maketrans())
>>> import string
>>> s = 'abc123xyz'
>>> s.translate(string.maketrans('abcxyz','xyzabc'))
'xyz123abc'


>>> # Python 2: the second argument lists the characters to delete
>>> s = '\rasd\t23\bAds'
>>> s.translate(None, '\r\t\b')
'asd23Ads'


# python2.7
>>> i = u'ní hǎo, chī fàn'
>>> i
u'ni\u0301 ha\u030co, chi\u0304 fa\u0300n'
>>> i.translate(dict.fromkeys([0x0301, 0x030c, 0x0304, 0x0300]))
u'ni hao, chi fan'
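
The translate() examples above are written for Python 2. A possible Python 3 sketch follows. It swaps in a different technique for the tone marks: instead of listing the four code points by hand, it decomposes the text with unicodedata.normalize('NFD', ...) and drops every combining character. The function name remove_tone_marks is ours for illustration:

```python
import unicodedata

# str.maketrans(from, to, delete): map some characters and delete others in one pass
table = str.maketrans('abcxyz', 'xyzabc', '\r\t\b')
swapped = '\rabc\t123\bxyz'.translate(table)


def remove_tone_marks(text):
    """Drop combining marks after decomposing text to NFD."""
    decomposed = unicodedata.normalize('NFD', text)
    # combining() returns 0 for base characters and non-zero for combining marks
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


print(swapped)
print(remove_tone_marks('ní hǎo, chī fàn'))
```

Normalizing first means the function works whether the input arrives precomposed or already decomposed, at the cost of also removing any other combining marks in the text.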

Conclusion

That covers the string-handling tips in Python collected here. Each one is presented as a case, a solution, and an example, which should be a useful reference when learning or using Python.
