Python string processing example details
- 2020-06-01 10:15:26
- OfStack
1. Split a string with multiple delimiters
1. How to split a string with multiple delimiters
Problem: we want to split a string into different fields according to the delimiter. The string contains many different delimiters, such as:
s = "ab;cd|efg|hi,jkl|mn\topq;rst,uvw\txyz"
Here the characters ; | , and \t are all delimiters. How do we handle them?
Method 1: call the str.split() method repeatedly, handling one delimiter at a time
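s = "ab;cd|efg|hi,jkl|mn\topq;rst,uvw\txyz"

def mySplit(s, ds):
    res = [s]
    for d in ds:
        t = []
        # Split every piece collected so far on the delimiter d.
        # A plain loop is used because map() is lazy in Python 3 and
        # would never call t.extend().
        for x in res:
            t.extend(x.split(d))
        res = t
    return res

print(mySplit(s, ';|,\t'))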
Output:
['ab', 'cd', 'efg', 'hi', 'jkl', 'mn', 'opq', 'rst', 'uvw', 'xyz']
Method 2: use the regular expression method re.split() to split the string in a single pass
import re
s = "ab;cd|efg|hi,jkl|mn\topq;rst,uvw\txyz"
print(re.split(r'[;|,\t]+', s))
Output:
['ab', 'cd', 'efg', 'hi', 'jkl', 'mn', 'opq', 'rst', 'uvw', 'xyz']
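The two methods differ when delimiters sit next to each other. The following is a minimal sketch, assuming Python 3; the helper name split_each and the input string messy are made up for illustration and do not come from the original. Repeated str.split() leaves an empty string between adjacent delimiters, while the + quantifier in the regular expression treats a run of delimiters as one.

```python
import re

def split_each(s, delimiters):
    """Split s on every character in delimiters, one delimiter at a time."""
    pieces = [s]
    for d in delimiters:
        next_pieces = []
        for piece in pieces:
            next_pieces.extend(piece.split(d))
        pieces = next_pieces
    return pieces

# Hypothetical input with adjacent delimiters (";|" and ",,")
messy = "ab;|cd,,ef"

by_split = split_each(messy, ";|,")
by_regex = re.split(r"[;|,]+", messy)

print(by_split)  # ['ab', '', 'cd', '', 'ef']
print(by_regex)  # ['ab', 'cd', 'ef']
```

If the empty strings are unwanted in Method 1, they can be filtered out afterwards, for example with [x for x in by_split if x].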
2. Adjust the format of text in a string
1. How to determine whether string a starts or ends with string b
Problem: a directory in the file system contains a series of files: a.py, quicksort.c, stack.cpp, b.sh. Write a program that adds the user-execute permission to all the .sh and .py files.
Solution: use the string methods str.startswith() and str.endswith() (note: pass a tuple when matching against several candidates)
In [1]: import os
# List the files in the current directory whose names end with .sh or .py
In [2]: [name for name in os.listdir('.') if name.endswith(('.py','.sh'))]
Out[2]: ['b.sh', 'a.py']
In [3]: import stat
# View the permissions of a.py
In [4]: os.stat('a.py').st_mode
Out[4]: 33204
# Convert the mode to octal, the form in which permissions are usually shown
In [5]: oct(os.stat('a.py').st_mode)
Out[5]: '0100664'
# Change the file permissions, adding the user-execute permission
In [6]: os.chmod('a.py',os.stat('a.py').st_mode | stat.S_IXUSR)
In [7]: ll
total 0
-rwxrw-r-- 1 yangyang 0 May 9 14:48 a.py*
-rw-rw-r-- 1 yangyang 0 May 9 14:48 b.sh
-rw-rw-r-- 1 yangyang 0 May 9 14:48 quicksort.c
-rw-rw-r-- 1 yangyang 0 May 9 14:48 stack.cpp
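The session above changes only a.py. To cover every .sh and .py file, as the problem asks, the same calls can be put in a loop. This is a minimal sketch, assuming Python 3 on a POSIX system; the function name add_user_exec and the temporary demo directory are illustrative, not part of the original.

```python
import os
import stat
import tempfile

def add_user_exec(directory, suffixes=(".sh", ".py")):
    """Add the user-execute bit to files in directory whose names end with one of suffixes."""
    changed = []
    for name in sorted(os.listdir(directory)):
        # endswith() accepts a tuple and returns True if any suffix matches
        if name.endswith(suffixes):
            path = os.path.join(directory, name)
            os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
            changed.append(name)
    return changed

# Demonstration in a throwaway directory so that no real files are touched
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.py", "quicksort.c", "stack.cpp", "b.sh"):
        with open(os.path.join(demo_dir, name), "w"):
            pass
    changed = add_user_exec(demo_dir)
    # Record which files now carry the user-execute bit
    is_exec = {
        n: bool(os.stat(os.path.join(demo_dir, n)).st_mode & stat.S_IXUSR)
        for n in os.listdir(demo_dir)
    }

print(changed)  # ['a.py', 'b.sh']
```

Note that endswith() requires a tuple here; passing a list raises TypeError.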
2. How to adjust the format of text in a string
Problem: in the log file of a piece of software, dates are written in the "yyyy-mm-dd" format:
2017-05-08 09:12:48 status half-configured passwd:amd64 1:4.2-3.1ubuntu5.2
2017-05-08 09:12:48 status installed passwd:amd64 1:4.2-3.1ubuntu5.2
2017-05-08 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
2017-05-08 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
2017-05-08 09:12:48 status half-configured passwd:amd64 1:4.2-3.1ubuntu5.2
2017-05-08 09:12:48 status installed passwd:amd64 1:4.2-3.1ubuntu5.2
2017-05-08 09:12:48 startup packages configure
09:12:48 startup packages configure
We want to change the dates in it to the US format "mm/dd/yyyy" (2017-05-08 ==> 05/08/2017). How can we do this?
Solution: use the regular expression method re.sub() for the substitution. Capture each part of the date with a capture group, then reorder the groups in the replacement string.
In [1]: import re
In [2]: log = open('/var/log/dpkg.log').read()
# (\d{4}) matches four digits and is capture group 1, so the replacement refers to it as \1; the r prefix keeps the backslashes from being treated as string escapes
In [3]: print(re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', log))
05/08/2017 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status half-configured passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status installed passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 startup packages configure
# You can also give each capture group a name and refer to it by name rather than by position
In [5]: print(re.sub(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', r'\g<month>/\g<day>/\g<year>', log))
05/08/2017 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status half-configured passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status installed passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 startup packages configure
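The session above reads /var/log/dpkg.log, which may not exist on every machine. The following self-contained sketch, assuming Python 3, applies the same two substitutions to two lines copied from the log excerpt in the problem statement; the variable names are illustrative.

```python
import re

# Two sample lines copied from the log excerpt in the text
log = (
    "2017-05-08 09:12:48 status installed passwd:amd64 1:4.2-3.1ubuntu5.2\n"
    "2017-05-08 09:12:48 startup packages configure\n"
)

date_re = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

# Positional references: \1 is the year, \2 the month, \3 the day.
# Named groups are still numbered, so both styles work on one pattern.
by_number = date_re.sub(r"\2/\3/\1", log)

# Named references do the same thing but read more clearly
by_name = date_re.sub(r"\g<month>/\g<day>/\g<year>", log)

print(by_name)
```

The version string 1:4.2-3.1ubuntu5.2 is left alone because it does not match the four-two-two digit pattern.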
3. String concatenation
1. How to concatenate multiple small strings into a single large string
Problem: in a program we collect the parameters, in order, into a list: ["<0112>", "<32>", "<1024x768>", "<60>"]. The parameters must be joined into a single datagram for sending: "<0112><32><1024x768><60>"
Solutions:
Method 1: iterate over the list, concatenating each string in turn with the "+" operator
In [1]: pl = ["<0112>", "<32>","<1024x768>","<60>" ]
In [2]: s = ''
# This approach creates many temporary strings, which wastes resources
In [3]: for p in pl:
   ...:     s = s + p
   ...:     print(s)
...:
<0112>
<0112><32>
<0112><32><1024x768>
<0112><32><1024x768><60>
In [4]: s
Out[4]: '<0112><32><1024x768><60>'
Method 2: use the str.join() method to concatenate all the strings in the list in one call
In [5]: ''.join(pl)
Out[5]: '<0112><32><1024x768><60>'
Given a list l = ['abc', 123, 45, 'xyz'], how do we concatenate it when 123 and 45 are not strings?
In [6]: l = ['abc',123,45,'xyz']
# Use a generator expression, which uses less memory than a list comprehension
In [7]: (str(x) for x in l)
...:
Out[7]: <generator object <genexpr> at 0x7fe3cadef550>
In [8]: ''.join(str(x) for x in l)
Out[8]: 'abc12345xyz'
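The conversion to str is needed because str.join() accepts only strings. A minimal sketch, assuming Python 3, that shows the error raised without the conversion and two equivalent ways to convert; the variable names are illustrative.

```python
params = ["<0112>", "<32>", "<1024x768>", "<60>"]
datagram = "".join(params)

mixed = ["abc", 123, 45, "xyz"]

# join() raises TypeError on non-string items, so each item is converted first
error = None
try:
    "".join(mixed)
except TypeError as exc:
    error = str(exc)

joined = "".join(map(str, mixed))            # map(str, ...) converts lazily
with_sep = ",".join(str(x) for x in mixed)   # any separator string works

print(datagram)  # <0112><32><1024x768><60>
print(joined)    # abc12345xyz
print(with_sep)  # abc,123,45,xyz
```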
4. Center alignment of strings
1. How to align a string left, right, or center
Problem: a dictionary stores a series of property values
{
"loDist":100.0,
"smartCull":0.04,
"farclip":477
}
How can a program print these values in a neatly aligned format?
Solutions:
Method 1: use the string methods str.ljust(), str.rjust(), and str.center() to align left, right, and center
Method 2: use the format() function, passing a format spec such as '<20', '>20', or '^20' to do the same thing
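Both methods can be sketched on the dictionary from the problem. This is an illustrative example, assuming Python 3.7 or later (where dictionaries keep insertion order); the padding width is taken from the longest key rather than fixed at 20.

```python
d = {"loDist": 100.0, "smartCull": 0.04, "farclip": 477}

# Width of the longest key, so that every key is padded to the same length
width = max(len(k) for k in d)

lines = []
for key, value in d.items():
    # ljust() pads on the right; format(key, '<9') would give the same result
    lines.append(key.ljust(width) + " : " + str(value))

print("\n".join(lines))
# loDist    : 100.0
# smartCull : 0.04
# farclip   : 477

# The three alignments side by side, using a width of 9
left = format("abc", "<9")
right = format("abc", ">9")
center = format("abc", "^9")
```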
2. Remove unwanted characters from a string
Problem:
1. Strip the extra whitespace before and after user input: 'nick@gmail.com'
2. Filter out the '\r' in text: 'hello world\r\n'
3. Remove Unicode combining characters (tone marks) from text: u'z '; u'
Solutions:
Method 1: use the string methods strip(), lstrip(), and rstrip() to remove characters from both ends, the left end, or the right end
Method 2: to delete a single character at a fixed position, use slicing plus concatenation
Method 3: use the string replace() method or the regular expression method re.sub() to delete characters at arbitrary positions
Method 4: use the string translate() method, which can delete several different characters at once
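The four methods can be sketched as follows, assuming Python 3. The sample values (the padded e-mail address, "abc:123", and the accented word "zhào") are assumptions chosen for illustration, since the original samples did not survive intact. In Python 3, translate() takes a table that maps code points to None to delete them.

```python
import re
import unicodedata

# Hypothetical sample values, chosen for illustration
user_input = "  nick@gmail.com  "
stripped = user_input.strip()                  # Method 1

text = "abc:123"
no_colon = text[:3] + text[4:]                 # Method 2: drop the character at index 3

windows_line = "hello world\r\n"
no_cr = windows_line.replace("\r", "")         # Method 3 with str.replace()
no_ws = re.sub(r"[\r\n\t]", "", "a\tb\r\nc")   # Method 3 with re.sub()

# Method 4: a table mapping code points to None deletes those characters
delete_table = {ord(ch): None for ch in "\t\r\n"}
translated = "a\tb\r\nc".translate(delete_table)

# Combining marks: decompose first, then drop every combining code point
accented = "zhào"
decomposed = unicodedata.normalize("NFD", accented)
no_tones = "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(stripped)   # nick@gmail.com
print(no_tones)   # zhao
```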