Memory leaks in Python and how to use the gc module

  • 2020-04-02 13:53:48
  • OfStack

Python manages memory mainly through object reference counting, and its automatic garbage collection is built on top of reference counting.
Thanks to automatic garbage collection, many beginners assume they never have to worry about memory leaks. But a closer look at the Python documentation for the __del__() method shows that there are clouds even on these sunny days. Here is an excerpt from the documentation:

Some common situations that may prevent the reference count of an object from going to zero include: circular references between objects (e.g., a doubly linked list or a tree data structure with parent and child pointers); a reference to the object on the stack frame of a function that caught an exception (the traceback stored in sys.exc_traceback keeps the stack frame alive); or a reference to the object on the stack frame that raised an unhandled exception in interactive mode (the traceback stored in sys.last_traceback keeps the stack frame alive).

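As you can see, circular references between objects that define a __del__() method are the main cause of memory leaks.
Note also that circular references between Python objects that have no __del__() method can be garbage collected automatically. (This describes Python 2, which the examples in this article use; since Python 3.4, cycles containing objects with __del__() are collected as well.)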
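
The point about cycles without __del__() can be checked directly. What follows is a minimal sketch, assuming Python 3 syntax; the Node class and the make_cycle() helper are hypothetical names introduced only for illustration, and the weak reference is there just to observe whether the object still exists.

```python
import gc
import weakref


class Node(object):
    """A plain object with no __del__() method (hypothetical example class)."""
    pass


def make_cycle():
    # Build a two-object reference cycle and return a weak reference to one
    # member; a weak reference does not keep its target alive.
    a = Node()
    b = Node()
    a.partner = b
    b.partner = a
    return weakref.ref(a)


gc.disable()                          # keep automatic collection out of the way
probe = make_cycle()
alive_before = probe() is not None    # reference counting alone cannot free a cycle
found = gc.collect()                  # the cycle detector finds and frees the pair
alive_after = probe() is not None
gc.enable()

print('alive before collect:', alive_before)
print('alive after collect:', alive_after)
```

The cycle survives the end of make_cycle() because each object still references the other, and it disappears only once gc.collect() runs.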

How do you know if an object has a memory leak?

Method 1. When you think an object should have been destroyed (that is, nothing should reference it any more), call sys.getrefcount(obj) to get its reference count and compare it with the references you can account for. Note that the value returned is one higher than you might expect, because the call itself holds a temporary reference to its argument, so it never returns 0. If the count is higher than expected, something still references obj, and reference counting alone cannot release it at this point.
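
As a rough illustration of Method 1, the sketch below (Python 3 syntax, with a hypothetical Payload class) compares reference counts relative to a baseline instead of relying on absolute values, since the absolute number can differ between interpreter versions.

```python
import sys


class Payload(object):
    """Hypothetical example class."""
    pass


obj = Payload()
base = sys.getrefcount(obj)       # includes the temporary reference held by the call

alias = obj                       # one more reference to the same object
with_alias = sys.getrefcount(obj)

del alias                         # drop the extra reference again
after_del = sys.getrefcount(obj)

print('extra references while alias exists:', with_alias - base)
print('extra references after del alias:', after_del - base)
```

Working with differences like this makes it easier to spot a reference you did not expect, which is usually the sign that something else is still holding the object.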

Method 2. You can also use Python's gc module to see the details of objects that cannot be collected.


First, take a look at a test script that behaves normally:


#--------------- code begin --------------
# -*- coding: utf-8 -*-
import gc
import sys

class CGcLeak(object):
  def __init__(self):
    self._text = '#'*10

  def __del__(self):
    # Defining __del__() is what makes a reference cycle uncollectable here.
    pass

def make_circle_ref():
  _gcleak = CGcLeak()
#  _gcleak._self = _gcleak # test_code_1
  print '_gcleak ref count0:%d' % sys.getrefcount(_gcleak)
  # Unbind the local name; any later use raises UnboundLocalError.
  del _gcleak
  try:
    print '_gcleak ref count1:%d' % sys.getrefcount(_gcleak)
  except UnboundLocalError:
    print '_gcleak is invalid!'

def test_gcleak():
  # Enable automatic garbage collection.
  gc.enable()
  # Set the garbage collection debugging flags.
  gc.set_debug(gc.DEBUG_COLLECTABLE | gc.DEBUG_UNCOLLECTABLE | \
    gc.DEBUG_INSTANCES | gc.DEBUG_OBJECTS)

  print 'begin leak test...'
  make_circle_ref()

  print 'begin collect...'
  _unreachable = gc.collect()
  print 'unreachable object num:%d' % _unreachable
  print 'garbage object num:%d' % len(gc.garbage)

if __name__ == '__main__':
  test_gcleak()

#--------------- code end ----------------

In test_gcleak(), after the garbage collector debug flags are set, gc.collect() runs a collection. The script then prints the number of unreachable objects found by that collection and the number of uncollectable garbage objects held in the entire interpreter.

gc.garbage is a list whose items are objects that the garbage collector found to be unreachable (that is, garbage) but could not free. The documentation describes it as: A list of objects which the collector found to be unreachable but could not be freed (uncollectable objects).
In general, the objects in gc.garbage are objects that are part of a reference cycle. Because Python does not know a safe order in which to call the __del__() methods of the objects in the cycle, the objects stay in gc.garbage forever, resulting in a memory leak. If you know a safe order, break the reference cycle yourself and then execute del gc.garbage[:] to empty the garbage list.

The output of the code above is as follows (lines beginning with # are annotations added by the author):


#-----------------------------------------
begin leak test...
# The reference count of the variable _gcleak is 2.
_gcleak ref count0:2
# After del, _gcleak is no longer an accessible (valid) variable.
_gcleak is invalid!
# Start garbage collection.
begin collect...
# The number of unreachable garbage objects found in this collection is 0.
unreachable object num:0
# The number of garbage objects in the entire interpreter is 0.
garbage object num:0
#-----------------------------------------

This shows that the reference count of the _gcleak object is correct and that no memory leaks have occurred.

If you uncomment the test_code_1 statement in make_circle_ref():


_gcleak._self = _gcleak

that is, make _gcleak hold a reference to itself so that it forms a cycle, then run the code again, the output becomes:


#-----------------------------------------
begin leak test...
_gcleak ref count0:3
_gcleak is invalid!
begin collect...
# Found a garbage object that cannot be collected: address 012AA090, type CGcLeak.
gc: uncollectable <CGcLeak 012AA090>
gc: uncollectable <dict 012AC1E0>
unreachable object num:2
# !! The number of garbage objects that cannot be collected is 1, causing a memory leak!
garbage object num:1
#-----------------------------------------

As you can see, the <CGcLeak 012AA090> object has leaked. The extra dict reported as garbage is the instance dictionary of the leaked _gcleak object. Printing that dictionary gives:


{'_self': <__main__.CGcLeak object at 0x012AA090>, '_text': '##########'}
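
The steps of looking into gc.garbage, breaking the cycles and emptying the list can be sketched as below. This is only an approximation under stated assumptions: it uses Python 3 syntax, where cycles like the ones in this article are collected automatically, so the gc.DEBUG_SAVEALL flag is used to make the collector keep unreachable objects in gc.garbage instead of freeing them. The Node class and make_cycle() helper are hypothetical names for illustration.

```python
import gc


class Node(object):
    """Hypothetical example class; it deliberately has no __del__()."""
    pass


def make_cycle():
    # Two objects that reference each other and are otherwise unreachable.
    a = Node()
    b = Node()
    a.partner = b
    b.partner = a


gc.collect()                       # start from a clean state
del gc.garbage[:]
gc.set_debug(gc.DEBUG_SAVEALL)     # assumption: keep unreachable objects in gc.garbage
make_cycle()
gc.collect()
gc.set_debug(0)                    # back to normal behavior

# Pick our own objects out of the garbage list.
leaked = [o for o in gc.garbage if isinstance(o, Node)]
leaked_count = len(leaked)

# Break the cycles explicitly, then empty the list so that nothing keeps
# the objects alive any longer.
for node in leaked:
    node.partner = None
node = None
del leaked
del gc.garbage[:]

remaining = len(gc.garbage)
print('Node objects found in gc.garbage:', leaked_count)
print('objects left in gc.garbage:', remaining)
```

Emptying the list matters because gc.garbage itself holds strong references: objects stay alive for as long as they are listed there, even after their cycles have been broken.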

Besides an object that references itself, circular references among multiple objects can also cause memory leaks. A simple example:


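#--------------- code begin --------------
# These definitions replace CGcLeak and make_circle_ref() in the script above;
# the imports and test_gcleak() stay the same.
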
class CGcLeakA(object):
  def __init__(self):
    self._text = '#'*10

  def __del__(self):
    pass

class CGcLeakB(object):
  def __init__(self):
    self._text = '*'*10

  def __del__(self):
    pass

def make_circle_ref():
  _a = CGcLeakA()
  _b = CGcLeakB()
  _a._b = _b # test_code_2
  _b._a = _a # test_code_3
  print 'ref count0:a=%d b=%d' % \
    (sys.getrefcount(_a), sys.getrefcount(_b))
#  _b._a = None  # test_code_4
  del _a
  del _b
  try:
    print 'ref count1:a=%d' % sys.getrefcount(_a)
  except UnboundLocalError:
    print '_a is invalid!'
  try:
    print 'ref count2:b=%d' % sys.getrefcount(_b)
  except UnboundLocalError:
    print '_b is invalid!'

#--------------- code end ----------------

The output of this test is:


#-----------------------------------------
begin leak test...
ref count0:a=3 b=3
_a is invalid!
_b is invalid!
begin collect...
gc: uncollectable <CGcLeakA 012AA110>
gc: uncollectable <CGcLeakB 012AA0B0>
gc: uncollectable <dict 012AC1E0>
gc: uncollectable <dict 012AC0C0>
unreachable object num:4
garbage object num:2
#-----------------------------------------

You can see that both the _a and _b objects have leaked. Since the two reference each other in a cycle, the garbage collector does not know how to collect them, i.e., it does not know which object's __del__() method to call first.

The memory leak can be avoided by breaking the circular reference in any one of the following ways:

1. Comment out the test_code_2 statement in make_circle_ref();
2. Comment out the test_code_3 statement in make_circle_ref();
3. Uncomment the test_code_4 statement in make_circle_ref().

The output then becomes:


#-----------------------------------------
begin leak test...
ref count0:a=2 b=3 # Note: the counts printed here vary depending on which option is used.
_a is invalid!
_b is invalid!
begin collect...
unreachable object num:0
garbage object num:0
#-----------------------------------------
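
The effect of the third option (clearing one side of the cycle before the names are deleted) can also be observed without the debug output. The sketch below is an illustration in Python 3 syntax; the classes A and B and the run() helper are hypothetical, automatic collection is switched off so that only reference counting is at work, and a weak reference is used to see whether the object was freed.

```python
import gc
import weakref


class A(object):
    """Hypothetical stand-in for CGcLeakA."""
    pass


class B(object):
    """Hypothetical stand-in for CGcLeakB."""
    pass


def run(break_cycle):
    # Returns True if the A instance was freed by reference counting alone.
    a = A()
    b = B()
    a.b = b
    b.a = a                    # a and b now reference each other
    probe = weakref.ref(a)     # observes a without keeping it alive
    if break_cycle:
        b.a = None             # same idea as test_code_4
    del a
    del b
    return probe() is None


gc.disable()                   # rely on reference counting only
freed_with_break = run(True)
freed_without_break = run(False)
gc.enable()
gc.collect()                   # clean up the cycle left by run(False)

print('freed when the cycle is broken:', freed_with_break)
print('freed when the cycle is kept:', freed_without_break)
```

Once the cycle is broken, both objects are released as soon as their names are deleted, so the collector has nothing left to do.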

Conclusion: Python's gc module is powerful. For example, setting gc.set_debug(gc.DEBUG_LEAK) helps you check for memory leaks caused by circular references. If leak checking is done during development and you can guarantee that there are no leaks at release time, you can improve performance by lengthening Python's garbage collection interval, or even by turning automatic garbage collection off.
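
Lengthening the collection interval or turning the collector off might look roughly like the sketch below (Python 3 syntax). The factor of 10 is an arbitrary value chosen for illustration, not a recommendation, and the sketch restores the original settings at the end.

```python
import gc

original = gc.get_threshold()      # tuple of the current collection thresholds
print('current thresholds:', original)

# Lengthen the interval: allow ten times more allocations before a
# collection of the youngest generation is triggered (arbitrary factor).
gc.set_threshold(original[0] * 10, original[1], original[2])
raised = gc.get_threshold()
print('raised thresholds:', raised)

# Or switch automatic collection off entirely; reference counting still
# frees every object that is not part of a cycle.
gc.disable()
enabled_while_off = gc.isenabled()
print('collector enabled:', enabled_while_off)

# Restore the previous settings so the rest of the program is unaffected.
gc.enable()
gc.set_threshold(*original)
```

With the collector off, any remaining reference cycle is never reclaimed, which is why this only makes sense after leak checking has shown that the program creates none.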

