Solving the efficiency problem of converting a Python 3 integer array to bytes

  • 2021-09-20 21:00:48
  • OfStack

Yesterday, while working on a CTF challenge, I ran into an image XOR problem. The task was as follows:

Read an image file and XOR it byte by byte with a key. The core code can be simplified to:


#coding:utf-8
'''
 @DateTime: 2017-11-25 13:51:33
 @Version: 1.0
 @Author: Unname_Bao
'''
import six  # third-party compatibility library; provides int2byte

# 32-byte XOR key
key = b'\xdcd~\xb6^g\x11\xe1U7R\x18!+9d\xdcd~\xb6^g\x11\xe1U7R\x18!+9d'
with open('flag.encrypted', 'rb') as f:
    c = f.read()
flag = b''
# Slow: every += builds a new bytes object and copies everything so far
for i in range(len(c)):
    flag += six.int2byte(key[i % 32] ^ c[i])
with open('flag.png', 'wb') as f:
    f.write(flag)

Then I ran into an efficiency problem: after more than 10 minutes the script still had not produced a result. At first I suspected the type conversion. Since I was in a hurry, I rewrote it in C++ to solve the challenge and did not think about it any further.

Today, with some free time, I found that the earlier code had a serious problem:

Memory allocation problem

Since flag.encrypted is 6.47 MB, my script keeps appending to the end of a bytes object, but that ignores how bytes actually works.

The result ends up more than 6 million bytes long, and along the way the memory already allocated is repeatedly too small, so new memory has to be allocated again and again. Because bytes is immutable in Python, every += creates a new object and copies everything accumulated so far, which is far less efficient than push_back on a C++ vector.
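A rough way to see this cost is to time the append loop on inputs of growing size. The sketch below is not part of the original script: `xor_concat` and `time_once` are hypothetical helper names, the key is made up, and the sizes are kept small on purpose. If each += really copies the accumulated data, doubling the input should tend to more than double the time, although exact numbers will vary by machine and interpreter.

```python
import time


def xor_concat(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key, appending one byte at a time."""
    out = b''
    for i, b in enumerate(data):
        # bytes is immutable, so this builds a new object on every pass
        out += bytes([b ^ key[i % len(key)]])
    return out


def time_once(n: int) -> float:
    """Return the seconds taken to process n zero bytes (illustrative only)."""
    data = bytes(n)
    key = b'\x01\x02\x03\x04'  # made-up key for the demonstration
    start = time.perf_counter()
    xor_concat(data, key)
    return time.perf_counter() - start


if __name__ == '__main__':
    for n in (20_000, 40_000, 80_000):
        print(n, round(time_once(n), 3))
```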

Moreover, Python's list, str, and bytes have no reserve function, so capacity cannot be reserved in advance (this is one point where Python's design really deserves a complaint when it comes to speed optimization).

Therefore, the only option is a different approach: first allocate a list of the required size, fill it in, and then convert it to bytes.

The code is as follows:


#coding:utf-8
'''
 @DateTime: 2017-11-26 14:09:29
 @Version: 2.0
 @Author: Unname_Bao
'''
# 32-byte XOR key
key = b'\xdcd~\xb6^g\x11\xe1U7R\x18!+9d\xdcd~\xb6^g\x11\xe1U7R\x18!+9d'
with open('flag.encrypted', 'rb') as f:
    c = f.read()
# Allocate the whole list once, then overwrite each slot in place
flag = [0] * len(c)
for i in range(len(c)):
    flag[i] = key[i % 32] ^ c[i]
# Convert the list of integers (0-255) to bytes in a single step
flag = bytes(flag)
with open('flag.png', 'wb') as f:
    f.write(flag)

Written this way, the task finishes almost instantly, though it is still much slower than C++, which is unavoidable.

Supplement: bytes in Python 2 and Python 3
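Two other ways to get the same effect can be sketched as follows. They are not from the original script: `xor_bytearray` and `xor_generator` are hypothetical names, and a short in-memory value stands in for the contents of flag.encrypted. `bytearray(n)` creates a zero-filled mutable buffer of length n, which is close in spirit to reserving space up front, and `bytes()` also accepts any iterable of integers in the range 0 to 255.

```python
def xor_bytearray(data: bytes, key: bytes) -> bytes:
    """XOR with a repeating key using a buffer allocated once."""
    out = bytearray(len(data))  # zero-filled, mutable, fixed length
    klen = len(key)
    for i, b in enumerate(data):
        out[i] = b ^ key[i % klen]
    return bytes(out)


def xor_generator(data: bytes, key: bytes) -> bytes:
    """XOR with a repeating key by feeding a generator to bytes()."""
    klen = len(key)
    return bytes(b ^ key[i % klen] for i, b in enumerate(data))


data = b'hello world'        # stand-in for the file contents
key = b'\xdc\x64\x7e'        # made-up short key
enc = xor_bytearray(data, key)

# Both variants agree, and XOR with the same key restores the input
assert xor_generator(data, key) == enc
assert xor_bytearray(enc, key) == data
```

Which variant is faster depends on the interpreter and the data size, so it is worth timing both on the real file before choosing one.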


>>> s = '编程'   # "programming"
>>> print s
编程
>>> s
'\xe7\xbc\x96\xe7\xa8\x8b'
>>>

In Python 2, evaluating a string variable directly at the prompt shows its bytes: each \xNN escape is the hexadecimal value of one byte of the encoded string (the data is binary in essence; this is not a memory address). In Python 2, bytes and str are the same thing.

Why have bytes at all? Because all data is ultimately stored in binary, data has to be converted to binary (bytes) before it can be transmitted. In addition, Python 2 has a separate data type, unicode: decoding a str yields a unicode object.


>>> s
'\xe8\xb7\xaf\xe9\xa3\x9e' # UTF-8 encoded bytes
>>> s.decode('utf-8')
u'\u8def\u98de' # unicode: the code points in the Unicode table
>>> print(s.decode('utf-8'))
路飞 # the unicode characters themselves ("Luffy")

The reason is that the default encoding of Python 2 is ASCII. Unicode support was wanted later in order to handle multiple languages, but turning the existing ASCII-based str directly into Unicode was very difficult, so Guido van Rossum (nicknamed "Uncle Turtle" in the Chinese Python community) simply introduced a new type called unicode. Put plainly, in Python 2 you have to convert a string to the unicode type in memory first.

In 2008, Python 3 was released, bringing major changes:

1. Strings became Unicode, and the default source file encoding became UTF-8.

2. str and bytes are clearly distinguished. str holds Unicode text, while bytes is plain binary data. Importantly, in Python 3 only str (Unicode) is displayed as readable characters; data in any other encoding is shown as bytes, which effectively forces you to work with Unicode.
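The distinction in point 2 can be illustrated with a short Python 3 sketch. The sample string reuses the text from the Python 2 session above; the variable names are only for illustration.

```python
text = '编程'                    # str: a sequence of Unicode code points
raw = text.encode('utf-8')       # bytes: the UTF-8 encoding of that text

print(type(text).__name__, len(text))   # str 2
print(type(raw).__name__, len(raw))     # bytes 6
print(raw)                              # b'\xe7\xbc\x96\xe7\xa8\x8b'
print(raw.decode('utf-8') == text)      # True

# Mixing the two types raises an error instead of converting implicitly
try:
    text + raw
except TypeError as exc:
    print('TypeError:', exc)
```

Moving between the two types always goes through an explicit encode or decode call, which is what makes the encoding in use visible in the code.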

Finally, a tip: nearly every encoding problem in Python comes down to an encoding setting being wrong somewhere.

Common causes of encoding errors are:

Default encoding of Python interpreter

Python source file encoding

Encoding used by Terminal

Language setting of operating system
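When tracking down which of these settings is off, it can help to print what the running interpreter actually sees. The following is a minimal Python 3 sketch using the standard `sys` and `locale` modules; `encoding_report` is a hypothetical helper name. The source file encoding is not reported here, since it is set by the coding declaration at the top of each file (UTF-8 by default in Python 3).

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encodings the current interpreter is using."""
    return {
        'interpreter default': sys.getdefaultencoding(),
        'filesystem': sys.getfilesystemencoding(),
        # stdout may be replaced by an object without an encoding attribute
        'stdout (terminal)': getattr(sys.stdout, 'encoding', None),
        # derived from the operating system's locale settings
        'locale preferred': locale.getpreferredencoding(False),
    }


for name, value in encoding_report().items():
    print(name, ':', value)
```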

