A detailed explanation of Python's small data pool and code block caching mechanism
- 2021-10-25 07:19:04
- OfStack
Preface
Apart from the "Summary", the rest of this article records my own process of working things out, tested on Python 3.7.5. I do not know where this topic is covered in the official documentation and have not found it so far. If you know, please leave a comment. Thank you!
Summary:
Within the same code block, the code block caching mechanism applies;
Across different code blocks, the small data pool (intern) mechanism applies;
Note that in interactive mode, each command entered is a code block of its own;
The intern mechanism is implemented simply: the interpreter maintains a string pool, which is a dictionary. At compile time, if the string already exists in the pool, no new string is created and the previously created string object is returned directly.
If it is not yet in the pool, a string object is constructed first and then added to the pool so that it can be reused next time;
Strings of length 0 and 1 are always interned;
String interning happens at compile time;
An interned string must consist only of ASCII letters, digits and underscores;
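The dictionary-based pool described in the summary can be sketched as follows. This is only a simplified model for illustration: the `InternPool` class and its `intern` method are hypothetical names, not CPython's actual implementation.

```python
class InternPool:
    """Hypothetical, simplified model of a string intern pool."""

    def __init__(self):
        # Maps a string value to the single object kept for that value.
        self._pool = {}

    def intern(self, s):
        # setdefault returns the stored object if an equal key exists;
        # otherwise it stores s and returns s itself.
        return self._pool.setdefault(s, s)


pool = InternPool()

# Build two equal strings at run time so that they start as separate objects.
first = "".join(["py", "thon"])
second = "".join(["py", "thon"])

kept_first = pool.intern(first)
kept_second = pool.intern(second)

print(first == second)            # compares values
print(kept_first is kept_second)  # compares object identity after pooling
```

The second lookup finds the existing entry, so both names end up pointing at the object that was stored first.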
1. Caching mechanism for code blocks
Python programs are constructed from code blocks. A block is a piece of Python program text that is executed as a unit.
Code block: a module, a function, a class and a file are each a code block;
Interactive mode: when you start the Python interpreter in cmd, every command you enter is a code block;
When Python executes a statement that initializes an object within a code block, it checks whether an object with that value already exists in the block and, if so, reuses it;
When the code block caching mechanism applies, only one such object exists in memory, that is, the id is the same;
The code block caching mechanism applies to: int (float), str, bool;
int (float): any number is reused within the same code block;
bool: True and False are single objects (with the integer values 1 and 0) and are always reused;
str: within the same code block, only one string object exists in memory for each distinct value:
s1 = 'janes@ ! #*ewq'
s2 = 'janes@ ! #*ewq'
print(s1 is s2) # True
a1 = 'janes45613256132!@#$%#^%@$%' * 1
b1 = 'janes45613256132!@#$%#^%@$%' * 1
print(a1 is b1) # True
s1 = 'hah_' * 6
s2 = 'hah_' * 6
print(s1 is s2) # True
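One way to look at the code block cache is to compile a small piece of source yourself and inspect the constants stored in the resulting code object. The sketch below assumes CPython; the source text and the `<demo>` file name are made up for illustration.

```python
# Two pairs of assignments compiled together form one code block (one code object).
source = (
    "s1 = 'janes@ ! #*ewq'\n"
    "s2 = 'janes@ ! #*ewq'\n"
    "n1 = 1000\n"
    "n2 = 1000\n"
)
code = compile(source, "<demo>", "exec")

# Count how many times each value is stored in the block's constant table.
string_count = sum(1 for c in code.co_consts if c == "janes@ ! #*ewq")
int_count = sum(1 for c in code.co_consts if c == 1000)
print(code.co_consts)

# Run the block and compare the identities of the resulting objects.
namespace = {}
exec(code, namespace)
same_string = namespace["s1"] is namespace["s2"]
same_int = namespace["n1"] is namespace["n2"]
print(same_string, same_int)
```

Because each value appears only once in `co_consts`, both names in a pair are bound to the same object when the block runs.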
2. Small data pools
Python automatically caches the integers from -5 to 256. When you assign one of these integers to a variable, no new object is created; the already cached object is used;
Python also keeps one copy of every string that meets the interning rules in the string intern pool. When you assign such a string to a variable, no new object is created; the object already in the pool is used;
The bool values are True and False. No matter how many variables point to True or False, only one of each exists in memory;
The small data pool applies only to int, str and bool (floats are not pooled, as the cmd test at the end of this article shows);
The small data pool is the caching mechanism that applies across different code blocks.
# cmd: although the small integers -5~256 are not in the same code block, the small data pool mechanism applies to them
>>> a = 245
>>> b = 245
>>> a is b # True
# Strings of length 0 and 1 are always interned;
# String interning happens at compile time;
# An interned string must consist only of ASCII letters, digits and underscores;
>>> s1 = '@'
>>> s2 = '@'
>>> s1 is s2 # True
>>> s1 = ''
>>> s2 = ''
>>> s1 is s2 # True
>>> s1 = 'a_b_c'
>>> s2 = 'a_b_c'
>>> s1 is s2 # True
>>> s1 = 'a b_c'
>>> s2 = 'a b_c'
>>> s1 is s2 # False
>>> s1 = 'a_b_c' * 1
>>> s2 = 'a_b_c' * 1
>>> s1 is s2 # True
>>> s1 = 'abd_d23' * 3
>>> s2 = 'abd_d23' * 3
>>> s1 is s2 # True
>>> a, b = "some_thing!", "some_thing!"
>>> a is b # False
>>> a, b = "some_thing", "some_thing"
>>> a is b # True
a1 = 1000
b1 = 1000
a1 is b1 # True
class C1(object):
    a = 100
    b = 100
    c = 1000
    d = 1000
class C2(object):
    a = 100
    b = 1000
print(C1.a is C1.b) # True
print(C1.a is C2.a) # True
print(C1.c is C1.d) # True
print(C1.c is C2.b) # False
3. Advantages and disadvantages
Advantages: strings with the same value (such as identifiers) are taken directly from the pool, which avoids frequent creation and destruction, improves efficiency and saves memory;
Disadvantages: operations such as concatenating or modifying strings are affected in performance;
Because strings are immutable, a string cannot be modified in place and a new object has to be created each time, which is why join() is recommended instead of + when concatenating many strings;
join() first computes the total length of all the strings and then copies them one by one, creating only one new object;
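The difference between the two concatenation styles can be shown with a short sketch. The word list is made up for illustration; both approaches give the same result and differ only in how many intermediate objects are built along the way.

```python
words = ["small", "data", "pool"]

# Repeated + produces a new intermediate string at every step.
joined_with_plus = ""
for word in words:
    joined_with_plus = joined_with_plus + word

# join() produces the final string in a single call.
joined_with_join = "".join(words)

print(joined_with_plus == joined_with_join)
```

For a handful of short strings the difference is negligible; it matters mainly when many pieces are concatenated in a loop.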
Small integer object pool
To avoid frequently allocating and freeing memory for integers, Python uses a small integer object pool. Python defines small integers as the range [-5, 256]; these integer objects are created in advance and are never garbage collected;
In a Python program, no matter where such an integer appears in the LEGB scopes, all integers in this range refer to the same object;
# Python 3.7.5, IPython 7.18.1
a = -5
b = -5
a is b # True
a = -6
b = -6
a is b # False
a = 256
b = 256
a is b # True
a = 257
b = 257
a is b # False
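The session above depends on how the shell splits input into code blocks. A sketch that avoids this is to create the integers at run time with `int()`, so the compiler cannot merge them as constants. It assumes CPython, and it uses 1000000 rather than 257 for the uncached case so that it sits well outside the small integer range.

```python
# int() on a string creates the value at run time, so the two results
# are not constants of the same code block.
a = int("256")
b = int("256")
c = int("1000000")
d = int("1000000")

small_shared = a is b                 # both values lie inside the cached range
large_shared = c is d                 # both values lie far outside it
low_edge = int("-5") is int("-5")     # lower end of the cached range
print(small_shared, large_shared, low_edge)
```

Values inside the cached range always come back as the pre-created object, while each large value created at run time is a fresh object.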
Large integer object pool
In the cmd terminal, each assignment of a large integer is a separate code block, so a new object is created every time. In PyCharm, each run loads the whole file into memory as a single unit, so within one code block there is effectively a large integer object pool, and equal large integers are the same object;
c and d are in the same code block (the body of C1), while C1.c and C2.b belong to different code blocks, so they are not the same object;
# cmd terminal
a = 1000
b = 1000
a is b # False
--------------------
class C1(object):
    a = 100
    b = 100
    c = 1000
    d = 1000
class C2(object):
    a = 100
    b = 1000
print(C1.a is C1.b) # True
print(C1.a is C2.a) # True
print(C1.c is C1.d) # True (is there a large integer pool in cmd as well? The class body is loaded as one code block, so equal values share the same address)
print(C1.c is C2.b) # False
# In an editor such as PyCharm
a = 1000
b = 1000
a is b # True
--------------------
class C1(object):
    a = 100
    b = 100
    c = 1000
    d = 1000
class C2(object):
    a = 100
    b = 1000
print(C1.a is C1.b) # True
print(C1.a is C2.a) # True
print(C1.c is C1.d) # True
print(C1.c is C2.b) # False
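That each class body is a code block of its own can be checked by compiling a small source string and looking at the nested code objects. This is a sketch assuming CPython; the source text and the `<demo>` file name are made up for illustration. The comparison across the two classes is deliberately left out, since as far as I can tell its result can differ between Python versions.

```python
import types

source = (
    "class C1(object):\n"
    "    c = 1000\n"
    "    d = 1000\n"
    "class C2(object):\n"
    "    b = 1000\n"
)
module_code = compile(source, "<demo>", "exec")

# Each class body is compiled into its own nested code object.
class_blocks = {
    const.co_name: const
    for const in module_code.co_consts
    if isinstance(const, types.CodeType)
}
print(sorted(class_blocks))

# Count how many times 1000 is stored in each class body's constant table.
c1_count = sum(1 for c in class_blocks["C1"].co_consts if c == 1000)
c2_count = sum(1 for c in class_blocks["C2"].co_consts if c == 1000)
print(c1_count, c2_count)

# Run the module code and compare the two attributes defined inside C1.
namespace = {}
exec(module_code, namespace)
same_inside_c1 = namespace["C1"].c is namespace["C1"].d
print(same_inside_c1)
```

C1 stores 1000 once and shares it between c and d, while C2 has its own constant table with its own entry for 1000.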
String resident mechanism
To improve the efficiency and performance of string handling, the Python interpreter uses the intern (string residence) technique at compile time. What is the intern mechanism? String objects with the same value are stored only once, in a shared string pool. Because the stored object is shared, it must not be changed, which is why strings have to be immutable objects (integers are immutable objects too). Floating-point numbers do not seem to be pooled in this way;
Simple principle:
The intern mechanism is implemented simply: the interpreter maintains a string pool, which is a dictionary. At compile time, if the string already exists in the pool, no new string is created and the previously created string object is returned directly. If it is not yet in the pool, a string object is constructed first and then added to the pool for next time;
However, the interpreter applies the intern mechanism selectively. In some scenarios interning happens automatically, and in others it has to be requested manually. Consider the following common scenarios:
# cmd: floating-point numbers are not cached
a = 1.0
b = 1.0
a is b # False
# cmd: not all strings are interned; only strings made up of underscores, digits and letters (identifier-like strings) are interned
s1="hello"
s2="hello"
s1 is s2 # True
# If the string contains spaces, interning is not enabled by default
s1="hell o"
s2="hell o"
s1 is s2 # False
s1 = "hell!*o"
s2 = "hell!*o"
print(s1 is s2) # False
# If a string is longer than 20 characters, interning is not used. Many articles online say this, and that up to 20 characters the result is True, but when I tried it on my own 3.7/8.5 versions there seemed to be no such limit. I don't know whether Python has been updated or whether something else is going on...
s1 = "a" * 20
s2 = "a" * 20
s1 is s2 # True
s1 = "a" * 21
s2 = "a" * 21
s1 is s2 # True
s1 = "ab" * 10
s2 = "ab" * 10
s1 is s2 # True
s1 = "ab" * 11
s2 = "ab" * 11
s1 is s2 # True
# 'kz' + 'c' has already become 'kzc' at compile time, whereas in s1 + 'c' s1 is a variable, so the concatenation happens at run time and the result is not interned
'kz' + 'c' is 'kzc' # True
s1 = 'kz'
s2 = 'kzc'
s1+'c' is 'kzc' # False
# In the PyCharm editor, identical strings always give True, even when the string is not made up only of underscores, digits and letters
s1 = "hell o"
s2 = "hell o"
print(s1 is s2) # True
s1 = "hell!*o"
s2 = "hell!*o"
print(s1 is s2) # True
s1 = "a" * 20
s2 = "a" * 20
print(s1 is s2) # True
s1 = "a" * 21
s2 = "a" * 21
print(s1 is s2) # True
s1 = "ab" * 10
s2 = "ab" * 10
print(s1 is s2) # True
s1 = "ab" * 11
s2 = "ab" * 11
print(s1 is s2) # True
'kz' + 'c' is 'kzc' # True
s1 = 'kz'
s2 = 'kzc'
s1+'c' is 'kzc' # False
# In the editor, floats are cached too
a = 1.0
b = 1.0
a is b # True
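For strings built at run time, such as s1 + 'c' above, interning can be requested manually with `sys.intern` from the standard library. The sketch below assumes CPython. It also inspects the compiled constants to see the compile-time folding of 'kz' + 'c' mentioned in the comments above; the source strings and the `<demo>` file name are made up for illustration.

```python
import sys

# Strings built at run time are separate objects even when their values are equal.
s1 = "".join(["hell", " o"])
s2 = "".join(["hell", " o"])
distinct_before = s1 is s2

# sys.intern returns the one pooled object for a given value.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
shared_after = i1 is i2

# 'kz' + 'c' is folded into the constant 'kzc' when the code is compiled,
# while s1 + 'c' is left as a run-time operation.
folded = compile("x = 'kz' + 'c'", "<demo>", "exec")
not_folded = compile("s1 = 'kz'\nx = s1 + 'c'", "<demo>", "exec")
folded_has_kzc = "kzc" in folded.co_consts
not_folded_has_kzc = "kzc" in not_folded.co_consts

print(distinct_before, shared_after)
print(folded_has_kzc, not_folded_has_kzc)
```

Manual interning is mainly useful when many equal strings are created at run time and compared or used as dictionary keys often.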