Commonly Encountered Functions for Python Data Cleaning: re.sub, bytes, and string

  • 2021-07-26 08:14:16
  • OfStack

re.sub

A more powerful replacement function than str.replace: it replaces the parts of a string that match a regular expression with repl.

re.sub(pattern, repl, string, count=0, flags=0)

Returns the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string with repl. If the pattern is not found, string is returned unchanged.

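If repl is a string, any backslash escapes in it are processed: \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes of ASCII letters, such as \j, were left alone in older Python versions but raise an error since Python 3.7. A backreference such as \6 is replaced with the substring matched by group 6 in the pattern. For example: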


>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
...  r'static PyObject*\npy_\1(void)\n{',
...  'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'

If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument and returns the replacement string. For example:


>>> def dashrepl(matchobj):
...  if matchobj.group(0) == '-': return ' '
...  else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'
>>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
'Baked Beans & Spam'

The pattern may be a string or a compiled pattern object.

The optional argument count is the maximum number of pattern occurrences to replace; it must be a non-negative integer. If it is omitted or zero, all occurrences are replaced.
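As a quick illustration of count and of passing a compiled pattern, the sketch below collapses runs of whitespace. The sample text and variable names are made up for this example.

```python
import re

text = "a  b   c    d"

# Replace every run of whitespace with a single space.
all_replaced = re.sub(r"\s+", " ", text)

# count=2 stops after the first two matches, scanning left to right.
first_two = re.sub(r"\s+", " ", text, count=2)

# A compiled pattern object can be passed in place of the pattern string.
ws = re.compile(r"\s+")
via_compiled = re.sub(ws, " ", text)

print(all_replaced)   # a b c d
print(first_two)      # a b c    d
print(via_compiled)   # a b c d
```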

class bytes([source[, encoding[, errors]]])

Returns a new bytes object, an immutable sequence of integers in the range 0 <= x < 256. The main difference between bytes and bytearray is that a bytes object cannot be modified after it is created, whereas a bytearray can. Apart from mutability, the two types are used in the same way, and the constructor arguments are interpreted as for bytearray().

Examples


>>> a = bytes("abs", 'utf-8')
>>> print(a)
b'abs'
>>> b = bytes(1)
>>> print(b)
b'\x00'
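To make the mutability difference concrete, here is a minimal sketch that builds both types from the same string and then tries to change the first element of each. The variable names are illustrative only.

```python
data = bytes("abs", "utf-8")
buf = bytearray("abs", "utf-8")

# Indexing either type yields an integer in range(256).
first = data[0]  # 97, the code for "a"

# A bytearray can be changed in place.
buf[0] = ord("A")

# A bytes object cannot; item assignment raises TypeError.
try:
    data[0] = ord("A")
    bytes_is_mutable = True
except TypeError:
    bytes_is_mutable = False

print(first, buf, bytes_is_mutable)  # 97 bytearray(b'Abs') False
```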

class bytearray([source[, encoding[, errors]]])

Returns a new array of bytes. The bytearray class is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most of the methods of the bytes type; see Bytes and Bytearray Operations.

The optional source parameter can be used to initialize the array in several different ways:

If it is a string, you must also give the encoding (and, optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().

If it is an integer, the array will have that size and will be initialized with null bytes.

If it is an object that conforms to the buffer interface, a read-only buffer of the object is used to initialize the byte array.

If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.
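The initialization rules above can be sketched as follows; the values are arbitrary and chosen only to show each case.

```python
from_text = bytearray("abc", "utf-8")   # string plus encoding
zeros = bytearray(3)                    # integer: three null bytes
from_buffer = bytearray(b"\x01\x02")    # object supporting the buffer interface
from_ints = bytearray([72, 105])        # iterable of integers in range(256)
empty = bytearray()                     # no argument: size 0

# A string without an encoding is rejected.
try:
    bytearray("abc")
    missing_encoding_ok = True
except TypeError:
    missing_encoding_ok = False

# Integers outside 0 <= x < 256 are rejected.
try:
    bytearray([256])
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False

print(from_text, zeros, from_buffer, from_ints, empty)
print(missing_encoding_ok, out_of_range_ok)  # False False
```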

bytes.strip([chars]) & bytearray.strip([chars])

Returns a copy of the sequence with the specified leading and trailing bytes removed. The chars argument is a binary sequence specifying the set of byte values to remove; the name refers to the fact that this method is usually used with ASCII characters. If omitted or None, chars defaults to removing ASCII whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped:


>>> b' spacious '.strip()
b'spacious'
>>> b'www.example.com'.strip(b'cmowz.')
b'example'
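In a data-cleaning setting, strip() is often applied to raw lines read in binary mode. The sketch below uses a made-up input line and shows that chars acts as a set of byte values and that bytearray behaves the same way.

```python
raw = b"\t  42,ok\r\n"

# With no argument, leading and trailing ASCII whitespace is removed.
cleaned = raw.strip()

# chars is a set of byte values, not a prefix: every leading or trailing
# byte found in b"xy" is removed, in any order.
trimmed = b"xyxhelloyx".strip(b"xy")

# bytearray has the same method and returns a new bytearray.
arr = bytearray(b"--id--").strip(b"-")

print(cleaned)  # b'42,ok'
print(trimmed)  # b'hello'
print(arr)      # bytearray(b'id')
```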

string.punctuation

The string of ASCII characters that are considered punctuation in the C locale: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
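One common way to combine string.punctuation with re.sub is to strip punctuation from text. This is a minimal sketch, not the only approach; the sample sentence is made up, and re.escape is used so that characters such as ] and \ are safe inside a character class.

```python
import re
import string

text = "Hello, world! (data-cleaning: 100%)"

# Build a character class that matches any single punctuation character.
punct_class = "[" + re.escape(string.punctuation) + "]"

# Replace each punctuation character with the empty string.
no_punct = re.sub(punct_class, "", text)

print(no_punct)  # Hello world datacleaning 100
```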

