The way Python handles control characters in text files

  • 2020-05-26 09:22:56
  • OfStack

Control characters

Control characters (also called non-printing characters) appear in text but represent control functions rather than visible symbols. Examples include LF (line feed), CR (carriage return), FF (form feed), DEL (delete), BS (backspace) and BEL (bell). Others are specific to data communication, such as SOH (start of heading), EOT (end of transmission) and ACK (acknowledge).
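Several of these characters have Python string escapes, so their code points are easy to check. A small sketch; the `CONTROL_CODES` dictionary is only an illustration, not part of any library:

```python
# Code points (ASCII values) of the control characters named above.
# CONTROL_CODES is an illustrative name, not a library object.
CONTROL_CODES = {
    "SOH": 0x01,       # start of heading
    "EOT": 0x04,       # end of transmission
    "ACK": 0x06,       # acknowledge
    "BEL": ord("\a"),  # bell, 7
    "BS": ord("\b"),   # backspace, 8
    "LF": ord("\n"),   # line feed, 10
    "FF": ord("\f"),   # form feed, 12
    "CR": ord("\r"),   # carriage return, 13
    "DEL": 0x7F,       # delete, 127
}

for name, code in CONTROL_CODES.items():
    # str.isprintable() is False for every one of them
    print(f"{name:>3} = {code:3d} (0x{code:02X}) printable={chr(code).isprintable()}")
```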

There are two sets of control characters:

7-bit ASCII defines 33 control characters: codes 0 to 31 (0x00-0x1F) and code 127 (0x7F).

For the ASCII-compatible 8-bit ISO/IEC 8859-1, another 32 codes, 128 to 159 (0x80-0x9F), are defined as control characters by ISO/IEC 6429.
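The two sets can be written out as code point ranges. As a cross-check, this sketch uses the standard `unicodedata` module: Unicode's "Cc" (control) general category should contain exactly these 65 code points. Scanning the whole code space takes a moment but needs nothing beyond the standard library.

```python
import unicodedata

# 7-bit ASCII controls: 0x00-0x1F plus DEL (0x7F)
ascii_controls = list(range(0x00, 0x20)) + [0x7F]
# 8-bit controls from ISO/IEC 6429: 0x80-0x9F
c1_controls = list(range(0x80, 0xA0))

print(len(ascii_controls), len(c1_controls))  # 33 32

# Every code point whose Unicode general category is "Cc" (control)
cc = [cp for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Cc"]
print(cc == ascii_controls + c1_controls)  # True
```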

Control character list: http://

Python solutions for removing control characters (not all of them individually verified):

Solution 1:

# Keeps only printable ASCII (codes 32-126); also drops \t, \n and all non-ASCII text
strip_control_characters = lambda s: "".join(i for i in s if 31 < ord(i) < 127)
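Filtering on 31 < ord(i) < 127 throws away accented and CJK letters together with the control characters. If that text must survive, one alternative (a different technique from the one above, using the standard `unicodedata` module) is to drop only characters in the Unicode category "Cc", which covers both control sets described earlier. `strip_cc` and its `keep` parameter are hypothetical names for this sketch:

```python
import unicodedata

def strip_cc(s, keep="\t\n\r"):
    """Remove Unicode control characters (category "Cc") from s.

    Characters listed in `keep` (by default tab, LF and CR) are preserved.
    """
    return "".join(
        ch for ch in s
        if ch in keep or unicodedata.category(ch) != "Cc"
    )

text = "héllo\x07 wörld\x85\n"   # BEL (C0) and NEL (C1) mixed into the text
print(repr(strip_cc(text)))      # 'héllo wörld\n'
```

For the same input, the ASCII-range filter would return 'hllo wrld', losing the accented letters and the newline.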

Solution 2:

import re

# Characters illegal in XML 1.0: most C0 controls, surrogates (U+D800-U+DFFF;
# properly decoded Python 3 text never contains them) and U+FFFE/U+FFFF
RE_XML_ILLEGAL = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]')
# ASCII control characters (including tab, LF and CR) plus DEL
RE_ASCII_CONTROL = re.compile(r'[\x01-\x1F\x7F]')

def strip_control_characters(str_input):
    if str_input:
        # unicode invalid characters
        str_input = RE_XML_ILLEGAL.sub('', str_input)
        # ascii control characters
        str_input = RE_ASCII_CONTROL.sub('', str_input)
    return str_input
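The first regular expression in this solution targets characters that XML 1.0 forbids, while tab, LF and CR are legal in XML. If the goal is only to make text safe to embed in an XML document, that single pass may be enough. A sketch under that assumption, checked by parsing the result with the standard `xml.etree.ElementTree` module; `XML_ILLEGAL`, `xml_safe` and the sample string are illustrative:

```python
import re
import xml.etree.ElementTree as ET

# XML 1.0 forbids C0 controls other than tab, LF and CR, plus surrogate
# code points and U+FFFE/U+FFFF.
XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")

def xml_safe(text):
    """Drop XML-illegal characters but keep tab, LF and CR."""
    return XML_ILLEGAL.sub("", text)

raw = "line one\x01\nline two\x1b"        # SOH and ESC mixed into the text
root = ET.fromstring("<note>%s</note>" % xml_safe(raw))
print(repr(root.text))                    # 'line one\nline two'
```

Parsing the uncleaned string instead raises `ET.ParseError`, because the parser rejects the raw \x01.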

Solution 3:

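import json
import re

# C0 controls (0-31), DEL (127) and C1 controls (128-159)
control_chars = ''.join(map(chr, list(range(0, 32)) + list(range(127, 160))))
control_char_re = re.compile('[%s]' % re.escape(control_chars))

def remove_control_chars(s):
    return control_char_re.sub('', s)
# original_json is an illustrative input: a raw BEL makes json.loads fail
original_json = '{"name": "a\x07b"}'
cleaned_json = remove_control_chars(original_json)
obj = json.loads(cleaned_json)  # original used simplejson.loads (same interface)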
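Deleting the characters is not the only option when the input is JSON. Python's standard `json` module rejects raw control characters inside strings by default, but `json.loads` also accepts `strict=False`, which lets them through unchanged. A sketch comparing the two behaviours; the sample string is made up:

```python
import json

raw = '{"name": "a\tb\x07c"}'  # raw TAB and BEL inside a JSON string

try:
    json.loads(raw)            # strict=True is the default
except json.JSONDecodeError as exc:
    print("strict parse failed:", exc)

# Keep the control characters instead of deleting them
obj = json.loads(raw, strict=False)
print(repr(obj["name"]))       # 'a\tb\x07c'
```

Which approach fits depends on whether downstream code can cope with control characters; `strict=False` preserves the data, while stripping guarantees it is gone.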