String handling tips in Python
- 2020-05-10 18:25:52
- OfStack
1. How to split a string with multiple delimiters?
The actual case
We want to split a string into fields, but the string uses several different delimiters, for example:
```python
s = 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd'
```
Here ',', ';', '|' and '\t' are all delimiters. How do we handle them?
The solution
Call str.split() repeatedly, handling one delimiter at a time
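```python
# Works in Python 2 and Python 3
def mySplit(s, ds):
    res = [s]
    for d in ds:
        t = []
        # Split every piece produced so far on delimiter d.
        # A plain loop is used because map() is lazy in Python 3
        # and would never run these calls.
        for x in res:
            t.extend(x.split(d))
        res = t
    # Drop the empty strings produced by adjacent delimiters such as ',,'
    return [x for x in res if x]

s = 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd'
result = mySplit(s, ';,|\t')
print(result)
```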
```
C:\Users\Administrator>C:\Python\Python27\python.exe E:\python-intensive-training\s2.py
['asd', 'aad', 'dasd', 'dasd', 'sdasd', 'asd', 'Adas', 'sdasd', 'Asdasd', 'd', 'asd']
```
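Another option, not taken from the original article, is to map every delimiter to a single one with str.translate() and then split once. This is a minimal Python 3 sketch, assuming all delimiters are single characters; the function name split_any is hypothetical:

```python
def split_any(s, delimiters):
    """Split s on any single-character delimiter in `delimiters`."""
    if not delimiters:
        return [s] if s else []
    first = delimiters[0]
    # Map every delimiter character to the first one (both strings must
    # have the same length for str.maketrans), then split on it once.
    table = str.maketrans(delimiters, first * len(delimiters))
    # Adjacent delimiters such as ',,' leave empty strings; drop them.
    return [part for part in s.translate(table).split(first) if part]


s = 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd'
print(split_any(s, ';,|\t'))
```

Because translate() walks the string only once, this avoids rebuilding the intermediate list for every delimiter.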
Use the regular-expression function re.split() to split the string in a single pass
```python
>>> import re
>>> re.split('[,;\t|]+', 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd')
['asd', 'aad', 'dasd', 'dasd', 'sdasd', 'asd', 'Adas', 'sdasd', 'Asdasd', 'd', 'asd']
```
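When the delimiters are not known in advance, the character class can be built from them. The sketch below assumes at least one single-character delimiter is given; multi_split is a hypothetical name. re.escape() keeps characters such as ']', '^' or '-' literal, and the empty fields that a leading or trailing delimiter produces are filtered out:

```python
import re


def multi_split(s, delimiters):
    """Split s on any of the given single-character delimiters."""
    # re.escape() protects characters that are special inside [...],
    # such as ']', '^', '-' or a backslash.
    pattern = '[' + ''.join(re.escape(d) for d in delimiters) + ']+'
    # A leading or trailing delimiter yields an empty field; drop those.
    return [field for field in re.split(pattern, s) if field]


print(multi_split('a-b^c]d', '-^]'))
print(multi_split(';asd;aad|dasd,,x;', ';,|\t'))
```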
2. How do I determine if the string a begins or ends with the string b?
The actual case
Suppose a directory contains the following files:
quicksort.c graph.py heap.java install.sh stack.cpp ......
We need to add execute permission to the files whose names end in .sh or .py.
The solution
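Use the string methods str.startswith() and str.endswith(). Both also accept a tuple of prefixes or suffixes, so a single call can test several endings: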
```python
>>> import os, stat
>>> os.listdir('./')
['heap.java', 'quicksort.c', 'stack.cpp', 'install.sh', 'graph.py']
>>> [name for name in os.listdir('./') if name.endswith(('.sh', '.py'))]
['install.sh', 'graph.py']
>>> os.chmod('install.sh', os.stat('install.sh').st_mode | stat.S_IXUSR)
```
```
[root@iZ28i253je0Z t]# ls -l install.sh
-rwxr--r-- 1 root root 0 Sep 15 18:13 install.sh
```
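The session above changes one file by hand. A minimal sketch that applies the same idea to every matching file, assuming a POSIX system and that only the owner's execute bit (stat.S_IXUSR) should be added; add_user_exec is a hypothetical helper name:

```python
import os
import stat


def add_user_exec(directory, suffixes=('.sh', '.py')):
    """Add the owner's execute bit to regular files ending in `suffixes`."""
    changed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # str.endswith() accepts a tuple, so one call checks every suffix.
        if name.endswith(suffixes) and os.path.isfile(path):
            mode = os.stat(path).st_mode
            # OR in the new bit so the existing permissions are kept.
            os.chmod(path, mode | stat.S_IXUSR)
            changed.append(name)
    return sorted(changed)
```

On the directory listed above, add_user_exec('./') should return ['graph.py', 'install.sh']. Joining the directory with each name lets the function work on a directory other than the current one.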
3. How to reformat text inside a string?
The actual case
In a software log file, dates are written as yyyy-mm-dd:
```
2016-09-15 18:27:26 statu unpacked python3-pip:all
2016-09-15 19:27:26 statu half-configured python3-pip:all
2016-09-15 20:27:26 statu installd python3-pip:all
2016-09-15 21:27:26 configure asdasdasdas:all python3-pip:all
```
We need to change these dates to the American format mm/dd/yyyy, for example 2016-09-15 --> 09/15/2016. How can we do this?
The solution
Use the regular-expression function re.sub() to do the substitution. Capture the year, month and day with capture groups, then put them back in a new order in the replacement string, referring to each group either by its number or by its name.
```python
>>> log = '2016-09-15 18:27:26 statu unpacked python3-pip:all'
>>> import re
>>> # Refer to the groups by number
>>> re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', log)
'09/15/2016 18:27:26 statu unpacked python3-pip:all'
>>> # Refer to named groups
>>> re.sub(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', r'\g<month>/\g<day>/\g<year>', log)
'09/15/2016 18:27:26 statu unpacked python3-pip:all'
```
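re.sub() also accepts a function as the replacement; it is called once per match with the match object. A minimal sketch, assuming the whole log is held in one string; DATE and to_us_date are hypothetical names:

```python
import re

# Compiled once; the raw string keeps '\d' away from Python's own escapes.
DATE = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')


def to_us_date(match):
    """Build the replacement text for one matched date."""
    return '{month}/{day}/{year}'.format(**match.groupdict())


log = ('2016-09-15 18:27:26 statu unpacked python3-pip:all\n'
       '2016-09-15 19:27:26 statu half-configured python3-pip:all')
# re.sub() replaces every match and calls to_us_date once for each.
print(DATE.sub(to_us_date, log))
```

A function is handy when the new text needs logic, for instance validating the date, that a template string such as r'\2/\3/\1' cannot express.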
4. How to concatenate multiple small strings into one large string?
The actual case
When writing a network program, we defined a custom UDP-based protocol that sends a series of parameters to the server in a fixed order:
```
hwDetect: "<0112>"
gxDepthBits: "<32>"
gxResolution: "<1024x768>"
gxRefresh: "<60>"
fullAlpha: "<1>"
lodDist: "<100.0>"
DistCull: "<500.0>"
```
In the program, we collect the parameters into a list in order:
```python
params = ["<0112>", "<32>", "<1024x768>", "<60>", "<1>", "<100.0>", "<500.0>"]
```
Finally, we need to join the parameters into a single packet to send:
"<0112><32><1024x768><60><1><100.0><500.0>"
The solution
Iterate over the list and concatenate the strings one by one with the + operator:
```python
>>> result = ''
>>> for n in ["<0112>", "<32>", "<1024x768>", "<60>", "<1>", "<100.0>", "<500.0>"]:
...     result += n
...
>>> result
'<0112><32><1024x768><60><1><100.0><500.0>'
```
Use str.join() to concatenate all the strings in the list in one call. It is generally preferred over repeated +, which can build a new intermediate string at every step:
```python
>>> ''.join(["<0112>", "<32>", "<1024x768>", "<60>", "<1>", "<100.0>", "<500.0>"])
'<0112><32><1024x768><60><1><100.0><500.0>'
```
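In Python 3, sockets send bytes, so the joined packet has to be encoded before it goes out over UDP. A minimal sketch, assuming the parameters are ASCII only; build_packet, send_params and the server address are hypothetical:

```python
import socket


def build_packet(params):
    """Join the parameter strings in order and encode them for the wire."""
    return ''.join(params).encode('ascii')


def send_params(params, host, port):
    """Send one UDP datagram containing all parameters (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_packet(params), (host, port))


params = ["<0112>", "<32>", "<1024x768>", "<60>", "<1>", "<100.0>", "<500.0>"]
print(build_packet(params))
# send_params(params, '127.0.0.1', 9999)  # hypothetical server address
```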
If the list contains numbers, convert each element with str() in a generator expression before joining, because str.join() accepts only strings.
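A minimal sketch of that conversion. The mixed list and the pack_values helper (which adds the angle brackets itself, assuming the raw values are stored without them) are hypothetical:

```python
def pack_values(values):
    """Wrap each value in angle brackets after converting it to text."""
    # str.join() only accepts strings, so str() converts ints and floats.
    return ''.join('<' + str(v) + '>' for v in values)


print(''.join(str(x) for x in ['abc', 123, 45, 'xyz']))
print(pack_values(['0112', 32, '1024x768', 60, 1, 100.0, 500.0]))
```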
5. How to align a string left, right, or center?
The actual case
A dictionary stores a series of property values:
```python
info = {'ip': '127.0.0.1', 'blog': 'www.anshengme.com', 'title': 'Hello world', 'port': '80'}
```
We want to print its contents in the following aligned format. How can we do this?
```
ip    : 127.0.0.1
blog  : www.anshengme.com
title : Hello world
port  : 80
```
The solution
Use the string methods str.ljust(), str.rjust() and str.center() to left-align, right-align or center a string within a given width:
```python
>>> info = {'ip':'127.0.0.1','blog': 'www.anshengme.com','title': 'Hello world','port': '80'}
>>> # Length of the longest key
>>> max(map(len, info.keys()))
5
>>> w = max(map(len, info.keys()))
>>> for k in info:
...     print(k.ljust(w), ':', info[k])
...
port  : 80
blog  : www.anshengme.com
ip    : 127.0.0.1
title : Hello world
```
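All three methods take the same arguments, a width and an optional fill character, so they can be swapped freely. A small sketch, assuming Python 3.7+ so the dictionary keeps insertion order; align_keys is a hypothetical name:

```python
def align_keys(info, how='left', fill=' '):
    """Return 'key : value' lines with the keys padded to a common width."""
    width = max(len(k) for k in info)
    # ljust, rjust and center share the signature (width, fillchar=' ').
    pad = {'left': str.ljust, 'right': str.rjust, 'center': str.center}[how]
    return [pad(k, width, fill) + ' : ' + v for k, v in info.items()]


info = {'ip': '127.0.0.1', 'port': '80'}
for how in ('left', 'right', 'center'):
    print(align_keys(info, how, '.'))
```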
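Alternatively, use the built-in format() function with a format spec such as '<20', '>20' or '^20' (left-aligned, right-aligned or centered in a field 20 characters wide) to get the same effect: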
```python
>>> for k in info:
...     print(format(k, '^' + str(w)), ':', info[k])
...
port  : 80
blog  : www.anshengme.com
 ip   : 127.0.0.1
title : Hello world
```
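In Python 3.6+ the same format spec can be written inside an f-string, with the alignment and width supplied as nested fields at run time. A minimal sketch; format_table is a hypothetical name, and the output order assumes Python 3.7+ dictionaries:

```python
def format_table(info, align='<'):
    """Render 'key : value' lines; align is '<', '>' or '^'."""
    width = max(map(len, info))
    # Nested replacement fields build the spec, e.g. '>5', at run time.
    return '\n'.join(f'{k:{align}{width}} : {v}' for k, v in info.items())


info = {'ip': '127.0.0.1', 'blog': 'www.anshengme.com',
        'title': 'Hello world', 'port': '80'}
print(format_table(info, '>'))
```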
6. How do I remove unwanted characters from a string?
The actual case
Strip the extra whitespace surrounding user input, such as an e-mail address: anshengm.com@gmail.com
Remove the '\r' that some Windows editors leave in text: hello world\r\n
Remove Unicode combining characters (the pinyin tone marks) from text: 'ní hǎo, chī fàn'
The solution
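Use the string methods strip(), lstrip() and rstrip() to remove characters from both ends, the left end or the right end of a string. Without an argument they remove whitespace; with an argument they remove any of the characters it contains.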
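A short sketch of the three methods on hypothetical inputs modelled on the cases listed above. Note that the argument is a set of characters, not a prefix or suffix:

```python
# Hypothetical inputs modelled on the cases listed above.
email = '   anshengm.com@gmail.com  \n'
line = 'hello world\r\n'
padded = '--==abc==--'

print(repr(email.strip()))         # both ends, any whitespace
print(repr(line.rstrip('\r\n')))   # right end only, just '\r' and '\n'
print(repr(padded.strip('-=')))    # any mix of '-' and '=' at both ends
print(repr(padded.lstrip('-=')))   # left end only
```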
To delete a character at a fixed position, combine slicing and concatenation:
```python
>>> s = 'abc:123'  # example input (assumed)
>>> s[:3] + s[4:]
'abc123'
```
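The slice pattern s[:i] + s[i + 1:] silently gives a wrong result for negative positions: with i = -1 it becomes s[:-1] + s[0:]. A minimal sketch of a helper that normalizes the index first; remove_at is a hypothetical name:

```python
def remove_at(s, i):
    """Return s without the character at position i (negative i allowed)."""
    if i < 0:
        i += len(s)          # turn -1 into the last index, and so on
    if not 0 <= i < len(s):
        raise IndexError('string index out of range')
    return s[:i] + s[i + 1:]


print(remove_at('abc:123', 3))
print(remove_at('abc:123', -1))
```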
To delete a character wherever it appears, use str.replace() or the regular-expression function re.sub():
```python
>>> s = '\tabc\t123\txyz'
>>> s.replace('\t', '')
'abc123xyz'
```
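str.replace() also takes an optional third argument that limits how many occurrences are replaced, and calls can be chained to remove several characters. A brief sketch on the same example string (the second input is hypothetical):

```python
s = '\tabc\t123\txyz'
# The optional count argument limits how many occurrences are replaced.
print(repr(s.replace('\t', '', 1)))   # only the first tab is removed
# Chained calls remove several different characters, one per call.
print(repr('a\tb\rc'.replace('\t', '').replace('\r', '')))
```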
re.sub() can delete several different characters in one call:
```python
>>> import re
>>> s = '\tabc\t123\txyz\ropq\r'  # example input (assumed)
>>> re.sub('[\t\r]', '', s)
'abc123xyzopq'
```
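To reuse this with an arbitrary set of characters, the character class can be built with re.escape(), so that characters such as ']' or '^' stay literal. A minimal sketch; remove_chars is a hypothetical name:

```python
import re


def remove_chars(s, chars):
    """Delete every occurrence of any character in `chars` from s."""
    if not chars:
        return s
    # re.escape() keeps characters such as ']' or '^' literal inside [...].
    return re.sub('[' + re.escape(chars) + ']', '', s)


print(remove_chars('\tabc\t123\txyz\ropq\r', '\t\r'))
print(remove_chars('a^b]c-d', '^]-'))
```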
The string method translate() can map or delete several different characters at the same time:
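```python
>>> # Python 3: str.maketrans() replaces Python 2's string.maketrans()
>>> s = 'abc123xyz'
>>> s.translate(str.maketrans('abcxyz', 'xyzabc'))
'xyz123abc'
>>> # To delete characters, pass them as the third argument of str.maketrans()
>>> s = '\rasd\t23\bAds'
>>> s.translate(str.maketrans('', '', '\r\t\b'))
'asd23Ads'
```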
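In Python 3, str.maketrans() can also map and delete in one table, and its dictionary form can replace a character with a longer string or with None to delete it. A short sketch on hypothetical inputs:

```python
# str.maketrans(x, y, z): map x[i] -> y[i] and delete every character in z.
table = str.maketrans('abc', 'xyz', '\t\r')
print(repr('\tabc\r123'.translate(table)))

# The dict form can map a character to a longer string, or to None (delete).
table = str.maketrans({'\t': '    ', '\r': None})
print(repr('a\tb\r'.translate(table)))
```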
```python
# python2.7
>>> i = u'ní hǎo, chī fàn'
>>> i
u'ni\u0301 ha\u030co, chi\u0304 fa\u0300n'
>>> i.translate(dict.fromkeys([0x0301, 0x030c, 0x0304, 0x0300]))
u'ni hao, chi fan'
```
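Listing code points by hand only works when you know which marks occur. A more general Python 3 sketch, using the standard unicodedata module: decompose the text with NFD, which splits precomposed letters such as 'ǎ' into a base letter plus a combining mark, then drop every character that unicodedata.combining() reports as a combining mark. The function name strip_marks is hypothetical:

```python
import unicodedata


def strip_marks(text):
    """Remove combining marks (e.g. pinyin tone marks) from text."""
    # NFD turns precomposed letters into base letter + combining mark,
    # so both precomposed and already-decomposed input are handled.
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_marks('ni\u0301 ha\u030co, chi\u0304 fa\u0300n'))
print(strip_marks('n\u00ed h\u01ceo, ch\u012b f\u00e0n'))
```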
Conclusion
The above are the Python string-handling tips collected for you. Each tip walks through a practical case, a solution and an example, so it should serve as a useful reference when you learn or use Python.