Python string processing examples explained

  • 2020-06-01 10:15:26
  • OfStack


1. Split a string with multiple delimiters

1. How to split a string with multiple delimiters

Problem: we want to split a string into fields, but the string contains several different delimiters, for example:


s = "ab;cd|efg|hi,jkl|mn\topq;rst,uvw\txyz"

Here ';', '|', ',' and '\t' are all delimiters. How should we handle them?

Method 1: call str.split() repeatedly, one delimiter at a time


s = "ab;cd|efg|hi,jkl|mn\topq;rst,uvw\txyz"

def mySplit(s, ds):
    res = [s]
    for d in ds:
        t = []
        # Split every piece produced so far on the current delimiter
        for x in res:
            t.extend(x.split(d))
        res = t
    return res

print(mySplit(s, ';|,\t'))

Output:
['ab', 'cd', 'efg', 'hi', 'jkl', 'mn', 'opq', 'rst', 'uvw', 'xyz']
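One detail worth knowing: mySplit keeps empty strings when two delimiters sit next to each other (for example "a;;b"). The sketch below, written for Python 3 with the hypothetical name my_split_nonempty, uses the same loop and simply filters out the empty fields at the end, which brings its behavior closer to the '+' in the regular expression of Method 2.

```python
def my_split_nonempty(s, ds):
    # Same idea as mySplit: split on each delimiter in turn
    res = [s]
    for d in ds:
        t = []
        for x in res:
            t.extend(x.split(d))
        res = t
    # Drop the empty fields produced by adjacent delimiters such as ";;"
    return [x for x in res if x]


print(my_split_nonempty("a;;b|c", ';|'))  # ['a', 'b', 'c']
```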

Method 2: use re.split() from the re module to split on all the delimiters in a single pass


import re

s = "ab;cd|efg|hi,jkl|mn\topq;rst,uvw\txyz"

print(re.split(r'[;|,\t]+', s))

Output:
['ab', 'cd', 'efg', 'hi', 'jkl', 'mn', 'opq', 'rst', 'uvw', 'xyz']
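If the delimiters arrive as a string (like the ds argument of mySplit), the character class can be built from it. This is a small sketch assuming Python 3; split_multi is a hypothetical helper name. re.escape() is used so that characters which are special inside a class, such as '-', ']' or '\\', are treated literally.

```python
import re


def split_multi(s, ds):
    # Build "[...]+" from the delimiter string; re.escape keeps
    # characters like '-', ']' or '\\' from changing the class meaning
    pattern = '[' + re.escape(ds) + ']+'
    return re.split(pattern, s)


s = "ab;cd|efg|hi,jkl|mn\topq;rst,uvw\txyz"
print(split_multi(s, ';|,\t'))
print(split_multi("a-b]c", "-]"))  # ['a', 'b', 'c']
```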

2. Checking prefixes/suffixes and reformatting text in strings

1. How to check whether string a starts or ends with string b

Problem: a directory contains the files a.py, quicksort.c, stack.cpp and b.sh. Write a program that gives the user execute permission on every .py and .sh file.

Solution: use the string methods str.startswith() and str.endswith(). To check against several prefixes or suffixes at once, pass them as a tuple.
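A quick illustration of the tuple argument, assuming Python 3 and using one of the file names from the problem. Note that only a str or a tuple of str is accepted; a list raises TypeError.

```python
name = "quicksort.c"

# A tuple checks several suffixes (or prefixes) in one call
print(name.endswith(('.c', '.cpp')))        # True
print(name.startswith(('stack', 'quick')))  # True
print(name.endswith(('.py', '.sh')))        # False

# A list is rejected: the argument must be a str or a tuple of str
try:
    name.endswith(['.c', '.cpp'])
    list_accepted = True
except TypeError:
    list_accepted = False
print(list_accepted)  # False
```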


In [1]: import os

# List the files in the current directory that end with .py or .sh
In [2]: [name for name in os.listdir('.') if name.endswith(('.py', '.sh'))]
Out[2]: ['b.sh', 'a.py']

In [3]: import stat

# View the permission bits of a.py
In [4]: os.stat('a.py').st_mode
Out[4]: 33204

# Convert the mode to octal, the form in which permissions are usually shown
In [5]: oct(os.stat('a.py').st_mode)
Out[5]: '0o100664'

# Add the user (owner) execute bit to the existing permissions
In [6]: os.chmod('a.py', os.stat('a.py').st_mode | stat.S_IXUSR)

In [7]: ll
total 0
-rwxrw-r-- 1 yangyang 0 May  9 14:48 a.py*
-rw-rw-r-- 1 yangyang 0 May  9 14:48 b.sh
-rw-rw-r-- 1 yangyang 0 May  9 14:48 quicksort.c
-rw-rw-r-- 1 yangyang 0 May  9 14:48 stack.cpp
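The session above changes only a.py by hand. A minimal sketch (Python 3, POSIX permissions assumed; add_user_exec is a hypothetical name) that applies the same chmod to every .py and .sh file in a directory:

```python
import os
import stat


def add_user_exec(directory='.', suffixes=('.py', '.sh')):
    # Add the owner's execute bit (stat.S_IXUSR) to every regular file in
    # `directory` whose name ends with one of `suffixes`.
    # Returns the sorted list of file names that were changed.
    changed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if name.endswith(suffixes) and os.path.isfile(path):
            mode = os.stat(path).st_mode
            os.chmod(path, mode | stat.S_IXUSR)
            changed.append(name)
    return sorted(changed)
```

Calling add_user_exec('.') in the example directory would be expected to return ['a.py', 'b.sh']. The os.path.isfile() check skips directories whose names happen to end in .py or .sh.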

2. How to adjust the format of text in a string

Problem: a program's log file (the sample below is from the dpkg log) writes dates in the format "yyyy-mm-dd":


2017-05-08 09:12:48 status half-configured passwd:amd64 1:4.2-3.1ubuntu5.2
2017-05-08 09:12:48 status installed passwd:amd64 1:4.2-3.1ubuntu5.2
2017-05-08 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
2017-05-08 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
2017-05-08 09:12:48 status half-configured passwd:amd64 1:4.2-3.1ubuntu5.2
2017-05-08 09:12:48 status installed passwd:amd64 1:4.2-3.1ubuntu5.2
2017-05-08 09:12:48 startup packages configure

We want to convert these dates to the US format "mm/dd/yyyy", e.g. 2017-05-08 ==> 05/08/2017. How can we do this?

Solution: use re.sub() to do the substitution. Capture the year, month and day in separate regular-expression groups, then put the groups back in a different order in the replacement string.


In [1]: import re

In [2]: log = open('/var/log/dpkg.log').read()
# (\d{4}) matches 4 digits and is capture group 1, so it is referenced as \1 in the replacement; the r prefix keeps the backslashes from being treated as string escapes
In [3]: print(re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', log))
05/08/2017 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status half-configured passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status installed passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 startup packages configure

# You can also name each capture group and refer to the groups by name instead of by number
In [5]: print(re.sub(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', r'\g<month>/\g<day>/\g<year>', log))
05/08/2017 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status unpacked passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status half-configured passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 status installed passwd:amd64 1:4.2-3.1ubuntu5.2
05/08/2017 09:12:48 startup packages configure
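re.sub() also accepts a function instead of a replacement string: it is called with each match object, and its return value is inserted in place of the match. This helps when the new text needs more logic than reordering groups. A minimal Python 3 sketch using two lines of the log above; to_us_date is an illustrative name.

```python
import re

log = """2017-05-08 09:12:48 status installed passwd:amd64 1:4.2-3.1ubuntu5.2
2017-05-08 09:12:48 startup packages configure"""

pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')


def to_us_date(m):
    # m is a match object; rebuild the date as mm/dd/yyyy from the named groups
    return '{month}/{day}/{year}'.format(**m.groupdict())


result = pattern.sub(to_us_date, log)
print(result)
```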

3. String concatenation

1. How to concatenate multiple small strings into a single large string

Problem: a program collects parameters, in order, into a list: ["<0112>", "<32>", "<1024x768>", "<60>"]. The parameters must be concatenated into a single datagram for sending: "<0112><32><1024x768><60>"

Solutions:

Method 1: iterate over the list and append each string in turn with the "+" operator


In [1]: pl = ["<0112>", "<32>","<1024x768>","<60>" ]

In [2]: s = ''

# Each + creates a new string, so every intermediate result is a throwaway copy; on long lists this wastes memory and time
In [3]: for p in pl:
   ...:     s = s + p
   ...:     print(s)
   ...:
<0112>
<0112><32>
<0112><32><1024x768>
<0112><32><1024x768><60>
In [4]: s
Out[4]: '<0112><32><1024x768><60>'

Method 2: use str.join() to concatenate all the strings in the list in one call


In [5]: ''.join(pl)
Out[5]: '<0112><32><1024x768><60>'

Suppose the list contains non-string items, e.g. l = ['abc', 123, 45, 'xyz']. How do we join 123 and 45 in as strings? Convert each item with str() first:


In [6]: l = ['abc',123,45,'xyz']

# A generator expression yields the converted items one at a time instead of building a separate list first
In [7]: (str(x) for x in l)
Out[7]: <generator object <genexpr> at 0x7fe3cadef550>

In [8]: ''.join(str(x) for x in l)
Out[8]: 'abc12345xyz'
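An equivalent form uses map(str, l), and the string that join() is called on becomes the separator between items. A short Python 3 sketch on the same list; the '|' separator is just an example.

```python
l = ['abc', 123, 45, 'xyz']

# map(str, l) applies str() to every item, much like the generator expression
joined = ''.join(map(str, l))
print(joined)  # abc12345xyz

# The string join() is called on is placed between the items
print('|'.join(map(str, l)))  # abc|123|45|xyz

# Without the conversion, join() raises TypeError at the first non-string item
try:
    ''.join(l)
except TypeError as e:
    print('TypeError:', e)
```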

4. Aligning strings and removing unwanted characters

1. How to align a string left, right, or center

Problem: a dictionary stores a series of property values:


{
 "loDist":100.0,
 "smartCull":0.04,
 "farclip":477
}

How can a program print them in a neat, aligned format?

Solutions:

Method 1: use str.ljust(), str.rjust() and str.center() to left-align, right-align or center a string

Method 2: use the format() function with specifications such as '<20', '>20' and '^20' to do the same thing
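A minimal sketch of both methods on the dictionary above, assuming Python 3.7+ (so the dict keeps insertion order) and padding each key to the length of the longest key rather than a fixed 20.

```python
props = {"loDist": 100.0, "smartCull": 0.04, "farclip": 477}

# Pad each key to the length of the longest key (9, for "smartCull")
width = max(len(k) for k in props)

# Method 1: str.ljust() left-aligns the key in a field of `width` characters
lines1 = [k.ljust(width) + ' : ' + str(v) for k, v in props.items()]

# Method 2: format() with the '<' spec does the same ('>' right-aligns, '^' centers)
lines2 = [format(k, f'<{width}') + ' : ' + str(v) for k, v in props.items()]

print('\n'.join(lines1))
# loDist    : 100.0
# smartCull : 0.04
# farclip   : 477

# The other alignments, each method side by side
print(repr('abc'.rjust(6)), repr(format('abc', '>6')))         # '   abc' '   abc'
print(repr('abc'.center(7, '*')), repr(format('abc', '*^7')))  # '**abc**' '**abc**'
```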



2. How to remove unwanted characters from a string

Problem:

1. Strip extra whitespace typed before and after user input, such as the e-mail address 'nick@gmail.com'

2. Remove the '\r' from text such as 'hello world\r\n'

3. Remove combining Unicode characters (tone marks) from text

Solutions:

Method 1: the string methods strip(), lstrip() and rstrip() remove characters from both ends, the left end or the right end

Method 2: to delete a single character at a fixed position, use slicing plus concatenation

Method 3: str.replace() or re.sub() deletes characters at arbitrary positions

Method 4: str.translate() can delete many different characters at the same time
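The four methods can be sketched as follows in Python 3. The sample strings are made up for illustration, and the tone-mark removal (decompose with unicodedata.normalize('NFD', ...), then delete the combining characters via translate()) is one common approach, not necessarily the original author's; strip_marks is a hypothetical name.

```python
import re
import unicodedata

# Method 1: strip() removes whitespace from both ends
# (lstrip()/rstrip() handle only one end)
user_input = '   nick@gmail.com  \n'   # hypothetical sample input
email = user_input.strip()
print(repr(email))  # 'nick@gmail.com'

# Method 2: slicing + concatenation drops the character at a fixed index
s = 'abc:123'
no_colon = s[:3] + s[4:]
print(no_colon)  # abc123

# Method 3a: str.replace() removes a character wherever it occurs
line = 'hello world\r\n'
no_cr = line.replace('\r', '')
print(repr(no_cr))  # 'hello world\n'

# Method 3b: re.sub() removes every character in a class at once
cleaned = re.sub(r'[\r\t]', '', 'a\tb\rc')
print(repr(cleaned))  # 'abc'

# Method 4: str.translate() deletes several characters in one pass;
# the third argument of str.maketrans() lists the characters to delete
deleted = 'a\tb\rc\n'.translate(str.maketrans('', '', '\t\r\n'))
print(repr(deleted))  # 'abc'


def strip_marks(u):
    # Decompose accented letters (NFD) so each tone mark becomes a separate
    # combining character, then map every combining character to None
    decomposed = unicodedata.normalize('NFD', u)
    marks = {ord(c): None for c in decomposed if unicodedata.combining(c)}
    return decomposed.translate(marks)


print(strip_marks('nǐ hǎo'))  # ni hao
```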


