Detailed explanation of Python small data pool and code block cache mechanism

  • 2021-10-25 07:19:04
  • OfStack

Table of contents
Preface
Summary
1. Caching mechanism for code blocks
2. Small data pools
3. Advantages and disadvantages
Small integer object pool
Large integer object pool
String resident mechanism
Simple principle

Preface

Apart from the Summary, the rest of this article records my own learning process. Everything was tested on Python 3.7.5. I have not been able to find where the official documentation covers this topic; if you know, please leave a comment. Thank you!

Summary:

Within the same code block, the code block cache mechanism applies;
Across different code blocks, the resident mechanism of the small data pool applies;
Note that in interactive mode, each command entered is its own code block;

The intern (resident) mechanism is implemented very simply: the interpreter maintains a string pool, which is a dictionary structure. At compile time, if the string already exists in the pool, no new string is created and the previously created string object is returned directly.
If the string has not been added to the pool yet, a string object is constructed first and then added to the pool so that it can be found next time;
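
The pool described above can be sketched in a few lines of Python. This is only a toy model under the assumption that the pool is an ordinary dict; the names `_pool` and `toy_intern` are made up for illustration and are not part of the interpreter.

```python
# Toy model of a string pool: a dict that maps each string value to
# the first object that was registered with that value.
_pool = {}

def toy_intern(s):
    """Return the pooled object equal to s, adding s to the pool if it is new."""
    return _pool.setdefault(s, s)

# Build two equal strings at run time so they start as separate objects.
first = "".join(["py", "thon"])
second = "".join(["py", "thon"])

same_before = first is second                          # two separate objects
same_after = toy_intern(first) is toy_intern(second)   # both resolve to `first`
print(same_before, same_after)
```

After the first call, every later lookup with an equal value returns the object that was stored first, which is the behaviour the text attributes to the real pool.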

Strings of length 0 or 1 are always interned;
String interning happens when the program is compiled;
A string that is interned automatically must consist of ASCII letters, digits and underscores;

1. Caching mechanism for code blocks

A Python program is constructed from code blocks. A block is a piece of Python program text that is executed as a unit.
Code block: a module, a function, a class and a file are each one code block;
Interactive mode: when you start the Python interpreter in cmd, every command you enter is a code block;

When Python executes a statement that initializes an object within a code block, it first checks whether that value already exists in the block and, if so, reuses it;
When the code block cache applies, only one such object exists in memory, that is, the id is the same;
The code block cache applies to: int (float), str, bool;

int (float): within the same code block, any number with the same value is reused;
bool: True and False exist as single objects (equal to 1 and 0 when used as dictionary keys) and are reused;
str: within the same code block, only one string with a given value exists in memory:


s1 = 'janes@ ! #*ewq'
s2 = 'janes@ ! #*ewq'
print(s1 is s2) # True

a1 = 'janes45613256132!@#$%#^%@$%' * 1
b1 = 'janes45613256132!@#$%#^%@$%' * 1
print(a1 is b1) # True

s1 = 'hah_' * 6
s2 = 'hah_' * 6
print(s1 is s2) # True
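
A way to look at the code block cache without depending on where the code is typed is to compile a snippet yourself and inspect the constants of the resulting code object. This is a small sketch assuming CPython; the file name "<demo>" is just a placeholder.

```python
# Compile a snippet as one code block and look at its constants table.
source = "s1 = 'janes@ ! #*ewq'\ns2 = 'janes@ ! #*ewq'\n"
code = compile(source, "<demo>", "exec")

# Both assignments refer to a single entry in co_consts.
occurrences = code.co_consts.count("janes@ ! #*ewq")

# Run the block and compare the two names it defined.
namespace = {}
exec(code, namespace)
shared = namespace["s1"] is namespace["s2"]
print(occurrences, shared)
```

Because both names are loaded from the same constant, they point to the same object, even though this string contains characters that would not be interned automatically.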

2. Small data pools

Python automatically caches the integers from -5 to 256. When you assign one of these integers to a variable, no new object is created; the already cached object is used instead;
For strings that meet certain rules, Python keeps one copy in the string resident pool. When you assign such a string to a variable, no new object is created; the object already in the pool is used instead;
The bool values are True and False. No matter how many variables point to True or False, there is only one of each in memory;

The small data pool only covers int (float), str and bool;
The small data pool is the caching mechanism that applies across different code blocks.
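
To imitate "each command is its own code block" inside a single script, each assignment can be compiled and executed separately. The helpers `run_block` and `shared_across_blocks` below are hypothetical names for this sketch. On CPython, 245, 'a_b_c' and True should come out shared; the result for the other literals depends on the interpreter version, so no output is claimed for them.

```python
def run_block(source):
    """Compile and execute source as its own code block; return its namespace."""
    namespace = {}
    exec(compile(source, "<block>", "exec"), namespace)
    return namespace

def shared_across_blocks(literal):
    """Assign the same literal in two separately compiled blocks and compare identity."""
    left = run_block("x = " + literal)["x"]
    right = run_block("x = " + literal)["x"]
    return left is right

for literal in ["245", "1000", "'a_b_c'", "'a b_c'", "True"]:
    print(literal, shared_across_blocks(literal))
```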


# cmd: although small integers from -5 to 256 are not in the same code block, the small data pool mechanism applies to them
>>> a = 245
>>> b = 245
>>> a is b # True

# Strings of length 0 or 1 are always interned;
# String interning happens when the program is compiled;
# A string that is interned automatically must consist of ASCII letters, digits and underscores;
>>> s1 = '@'
>>> s2 = '@'
>>> s1 is s2 # True

>>> s1 = ''
>>> s2 = ''
>>> s1 is s2 # True

>>> s1 = 'a_b_c'
>>> s2 = 'a_b_c'
>>> s1 is s2 # True

>>> s1 = 'a b_c'
>>> s2 = 'a b_c'
>>> s1 is s2 # False

>>> s1 = 'a_b_c' * 1
>>> s2 = 'a_b_c' * 1
>>> s1 is s2 # True

>>> s1 = 'abd_d23' * 3
>>> s2 = 'abd_d23' * 3
>>> s1 is s2 # True

>>> a, b = "some_thing!", "some_thing!"
>>> a is b # False

>>> a, b = "some_thing", "some_thing"
>>> a is b # True

# Run as a file (one code block): equal large integers are the same object
a1 = 1000
b1 = 1000
a1 is b1 # True

class C1(object): 
   a = 100
   b = 100
   c = 1000
   d = 1000
 
 
class C2(object):
   a = 100
   b = 1000

print(C1.a is C1.b)  # True
print(C1.a is C2.a)  # True
print(C1.c is C1.d)  # True
print(C1.c is C2.b)  # False

3. Advantages and disadvantages

Advantages: strings with the same value (such as identifiers) are taken directly from the pool, which avoids frequent creation and destruction, improves efficiency and saves memory;

Disadvantages: operations that modify strings, such as concatenation, cost performance;
Because strings are immutable, a string cannot be modified in place and every change has to create a new object. This is why, when concatenating many strings, join() is recommended instead of +;
join() first computes the total length of all the strings and then copies them one by one, creating only one new object;
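
The two ways of concatenating can be put side by side as follows. This is only an illustration with made-up sample data; it shows that both give the same text and does not measure speed.

```python
parts = ["small", "data", "pool"] * 1000

# Repeated +: conceptually, each step produces a new string object.
joined_by_plus = ""
for part in parts:
    joined_by_plus = joined_by_plus + part

# join(): one pass over the parts, one result object.
joined_by_join = "".join(parts)

print(joined_by_plus == joined_by_join, len(joined_by_join))
```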

Small integer object pool

To avoid repeatedly allocating and freeing memory for integers, Python uses a pool of small integer objects. Python defines small integers as the range [-5, 256]; these integer objects are created in advance and are never garbage collected;
In a Python program, regardless of which scope (LEGB) the integer appears in, all integers in this range use the same object;


# Python 3.7.5, IPython 7.18.1
a = -5
b = -5
a is b # True

a = -6
b = -6
a is b # False

a = 256
b = 256
a is b # True

a = 257
b = 257
a is b # False
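
The boundaries can also be probed without typing literals, by creating the integers at run time. This is a sketch that assumes CPython; `is_cached` is a hypothetical helper name. On the 3.7.5 used in this article, the loop is expected to report True from -5 through 256 and False for -6 and 257.

```python
def is_cached(n):
    """True if two integers parsed independently at run time are the same object."""
    text = str(n)
    return int(text) is int(text)

# Parsing from text avoids compile-time constants, so only the
# interpreter's small integer pool can make the two results identical.
for n in (-6, -5, 0, 256, 257):
    print(n, is_cached(n))
```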

Large integer object pool

In the cmd terminal, each command is its own code block, so every assignment of a large integer creates a new object. In PyCharm, the whole file is loaded into memory and treated as one unit each time it runs, so within one code block there is effectively a large integer object pool, and equal large integers are the same object;
C1.c and C1.d are in the same code block, while C1.c and C2.b belong to different code blocks (the two class bodies), so they are not the same object;


# cmd terminal
a = 1000
b = 1000
a is b # False
--------------------
class C1(object): 
   a = 100
   b = 100
   c = 1000
   d = 1000
 
 
class C2(object):
   a = 100
   b = 1000

print(C1.a is C1.b)  # True
print(C1.a is C2.a)  # True
print(C1.c is C1.d)  # True: does cmd also have a large integer pool? The class is loaded as one block, so equal values share one address?
print(C1.c is C2.b)  # False

# In PyCharm and other editors
a = 1000
b = 1000
a is b # True
--------------------
class C1(object): 
   a = 100
   b = 100
   c = 1000
   d = 1000
 
 
class C2(object):
   a = 100
   b = 1000

print(C1.a is C1.b)  # True
print(C1.a is C2.a)  # True
print(C1.c is C1.d)  # True
print(C1.c is C2.b)  # False
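
The claim that each class body is its own code block can be checked by compiling a trimmed version of the classes above and listing the nested code objects. This is a sketch assuming CPython; whether the 1000 in C1 and the 1000 in C2 end up as the same object may differ between interpreter versions, so it is not asserted here.

```python
import types

source = '''
class C1(object):
    c = 1000
    d = 1000

class C2(object):
    b = 1000
'''
module_code = compile(source, "<demo>", "exec")

# Each class body is compiled to its own code object, stored among
# the constants of the enclosing module code object.
class_blocks = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]
block_names = [c.co_name for c in class_blocks]

# Within the C1 block, both assignments share one constant 1000.
c1_count = class_blocks[0].co_consts.count(1000)

namespace = {}
exec(module_code, namespace)
same_in_c1 = namespace["C1"].c is namespace["C1"].d
print(block_names, c1_count, same_in_c1)
```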

String resident mechanism

To make string handling more efficient, the Python interpreter applies intern (string resident) technology at compile time. What is the intern mechanism? String objects with the same value are stored only once, in a shared string pool. Because the stored copy is shared, it must not be changed, which is one reason strings have to be immutable objects (integers are immutable objects too). Floating-point numbers are not handled this way;

Simple principle:

The intern mechanism is implemented very simply: the interpreter maintains a string pool, which is a dictionary structure. At compile time, if the string already exists in the pool, no new string is created and the previously created string object is returned directly. If the string has not been added to the pool yet, a string object is constructed first and then added to the pool for next time;
However, the interpreter is selective about when it applies interning. Some scenarios use it automatically, while others need it to be enabled manually. Consider the following common scenarios:


# cmd: floating-point numbers are not cached
a = 1.0
b = 1.0
a is b # False

# cmd: not every string uses the intern mechanism; only strings made up of underscores, digits and letters (identifier-like strings) are interned
s1="hello"
s2="hello"
s1 is s2 # True

# If the string contains a space, the intern mechanism is not enabled by default
s1="hell o"
s2="hell o"
s1 is s2 # False

s1 = "hell!*o"
s2 = "hell!*o"
print(s1 is s2) # False

# If a string is longer than 20 characters, the intern mechanism is not enabled. Many articles online say this,
# and within that limit the result is indeed True, but when I tried it myself on versions 3.7/8.5
# there seemed to be no limit. I don't know whether Python has been updated or something else is going on...
s1 = "a" * 20
s2 = "a" * 20
s1 is s2 # True

s1 = "a" * 21
s2 = "a" * 21
s1 is s2 # True

s1 = "ab" * 10
s2 = "ab" * 10
s1 is s2 # True

s1 = "ab" * 11
s2 = "ab" * 11
s1 is s2 # True

# 'kz' + 'c' has already become 'kzc' at compile time, while in s1 + 'c' the name s1 is a variable, so the concatenation happens at run time and the result is not interned
'kz' + 'c' is 'kzc' # True

s1 = 'kz'
s2 = 'kzc'
s1+'c' is 'kzc' # False

# PyCharm editor: as long as the strings are the same, the result is True, even for strings that are not made up only of underscores, digits and letters
s1 = "hell o"
s2 = "hell o"
print(s1 is s2) # True

s1 = "hell!*o"
s2 = "hell!*o"
print(s1 is s2) # True

s1 = "a" * 20
s2 = "a" * 20
print(s1 is s2) # True

s1 = "a" * 21
s2 = "a" * 21
print(s1 is s2) # True

s1 = "ab" * 10
s2 = "ab" * 10
print(s1 is s2) # True

s1 = "ab" * 11
s2 = "ab" * 11
print(s1 is s2) # True

'kz' + 'c' is 'kzc' # True

s1 = 'kz'
s2 = 'kzc'
s1+'c' is 'kzc' # False

# In the editor, floats are also cached
a = 1.0
b = 1.0
a is b # True
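
For strings built at run time, such as the s1 + 'c' case above, interning can be requested manually with the standard library function sys.intern. A minimal sketch; the variable names are only for illustration.

```python
import sys

prefix = "kz"
built = prefix + "c"               # concatenated at run time
again = "".join(["k", "z", "c"])   # another run-time string with the same value

equal_values = built == again
interned_same = sys.intern(built) is sys.intern(again)
print(equal_values, interned_same)
```

sys.intern returns the pooled object for a given value, so two equal strings always map to the same object once both have been passed through it.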
