Tutorial on using profilers to improve the performance of Python programs

  • 2020-05-05 11:24:00
  • OfStack

It would be unwise to start optimizing without first thinking about Knuth's famous dictum: premature optimization is the root of all evil. Still, code written quickly to add features can turn out ugly and slow, and then you need to optimize with care. This article is about that situation.

So here are some useful tools and patterns for optimizing Python quickly. The purpose is simple: find bottlenecks fast, fix them, and verify that you actually fixed them.
Write a test

Before you start tuning, write a high-level test that demonstrates the original code is slow. You may need a minimal data set that reproduces the slowness reliably. Usually one or two runs whose times are measured in seconds are enough to show an improvement.

It is also necessary to have some basic tests to ensure that your optimizations do not change the behavior of the original code. If you run the tests many times while tuning, you can also use them as a rough benchmark for the code.
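As a sketch of that idea (the function and the five-second budget here are illustrative, not from the original article), a minimal behavior-plus-speed test might look like this:

```python
import time

def expensive_function():
    # stand-in for the real code under test
    return sum(x ^ x ^ x for x in range(100000))

# behavior test: pin down the expected result before optimizing
expected = expensive_function()

# speed test: fail loudly if the function blows its time budget
start = time.time()
result = expensive_function()
elapsed = time.time() - start

assert result == expected, 'optimization changed the behavior!'
assert elapsed < 5.0, 'expensive_function exceeded its time budget'
```

Keep the expected value pinned down before you touch anything, so a behavior change fails immediately rather than surfacing later.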

So now, let's look at the optimization tool.
Simple timer

Timers are simple and one of the most flexible ways to record execution time. You can put one anywhere, with minimal side effects. Rolling your own timer is easy, and you can customize it to work exactly the way you want. For example, a simple timing decorator looks like this:


import time
 
def timefunc(f):
    def f_timer(*args, **kwargs):
        start = time.time()
        result = f(*args, **kwargs)
        end = time.time()
        print(f.__name__, 'took', end - start, 'seconds')
        return result
    return f_timer
 
def get_number():
    for x in range(5000000):
        yield x
 
@timefunc
def expensive_function():
    for x in get_number():
        i = x ^ x ^ x
    return 'some result!'
 
# prints something like "expensive_function took 0.72583088875 seconds"
result = expensive_function()

Of course, you can make it more powerful as a context manager, adding checkpoints and other features:
 


import time
 
class timewith:
    def __init__(self, name=''):
        self.name = name
        self.start = time.time()
 
    @property
    def elapsed(self):
        return time.time() - self.start
 
    def checkpoint(self, name=''):
        print('{timer} {checkpoint} took {elapsed} seconds'.format(
            timer=self.name,
            checkpoint=name,
            elapsed=self.elapsed,
        ).strip())
 
    def __enter__(self):
        return self
 
    def __exit__(self, type, value, traceback):
        self.checkpoint('finished')
 
def get_number():
    for x in range(5000000):
        yield x
 
def expensive_function():
    for x in get_number():
        i = x ^ x ^ x
    return 'some result!'
 
# prints something like:
# fancy thing done with something took 0.582462072372 seconds
# fancy thing done with something else took 1.75355315208 seconds
# fancy thing finished took 1.7535982132 seconds
with timewith('fancy thing') as timer:
    expensive_function()
    timer.checkpoint('done with something')
    expensive_function()
    expensive_function()
    timer.checkpoint('done with something else')
 
# or directly
timer = timewith('fancy thing')
expensive_function()
timer.checkpoint('done with something')

Timers do require you to do some digging. Wrap the higher-level functions first, determine where the bottleneck is, then drill down into that function, repeating as you go. When you find the offending code, fix it, and measure again to make sure it is fixed.

A quick tip: don't forget the handy timeit module! It is more useful for benchmarking small pieces of code than for actual investigation.
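For instance (the snippet timed here is arbitrary), timeit can benchmark a one-liner in isolation:

```python
import timeit

# run the snippet 10,000 times and report the total wall-clock time
total = timeit.timeit('sum(range(100))', number=10000)
print('total:', total, 'seconds, per call:', total / 10000)
```

Because timeit repeats the snippet many times, it smooths out the noise that a single timer measurement would pick up.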

      Timer advantages: easy to understand and implement, very easy to compare before and after a change, and the technique works in many languages. Timer disadvantages: sometimes a bit too simplistic for very complex code, and you may spend more time inserting and moving timing code than fixing the problem!

Built-in profiler

Using the built-in profiler is like bringing out a cannon: very powerful, but a little unwieldy, and sometimes complicated to use and interpret.

You can learn more in the documentation for the cProfile module, but the basics are simple: you enable and disable the profiler, and it records all function calls and execution times, then compiles and prints the output for you. A simple decorator looks like this:
 


import cProfile
 
def do_cprofile(func):
    def profiled_func(*args, **kwargs):
        profile = cProfile.Profile()
        try:
            profile.enable()
            result = func(*args, **kwargs)
            profile.disable()
            return result
        finally:
            profile.print_stats()
    return profiled_func
 
def get_number():
    for x in range(5000000):
        yield x
 
@do_cprofile
def expensive_function():
    for x in get_number():
        i = x ^ x ^ x
    return 'some result!'
 
# perform profiling
result = expensive_function()

If you run the above code, you should see something like this printed in the terminal:
 


5000003 function calls in 1.626 seconds
 
   Ordered by: standard name
 
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   5000001    0.571    0.000    0.571    0.000 timers.py:92(get_number)
         1    1.055    1.055    1.626    1.626 timers.py:96(expensive_function)
         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

As you can see, it gives you the number of times each function is called and how long each took, but it misses crucial information: which lines inside a function make it so slow?

Still, this is a good starting point for basic optimization. Sometimes the answer requires even less effort to find. I often use it as a first pass to see which functions are slow or called too many times before digging deeper.
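One way to squeeze more out of the built-in profiler is to sort its report; the standard-library pstats module can rank functions by cumulative time and truncate the output. A minimal sketch, reusing the example functions from above:

```python
import cProfile
import io
import pstats

def get_number():
    for x in range(100000):
        yield x

def expensive_function():
    for x in get_number():
        i = x ^ x ^ x
    return 'some result!'

# profile one call of expensive_function
profile = cProfile.Profile()
profile.enable()
expensive_function()
profile.disable()

# sort by cumulative time and show only the top five entries
stream = io.StringIO()
stats = pstats.Stats(profile, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
print(stream.getvalue())
```

Sorting by 'cumulative' surfaces the functions at the top of the call chain; sorting by 'tottime' instead highlights where time is spent directly.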

      Built-in profiler advantages: no extra dependencies and fairly fast; very useful for quick high-level checks. Built-in profiler disadvantages: relatively limited information that often requires further digging, and the report is a bit unwieldy, especially for complex code.

Line Profiler

If the built-in profiler is a cannon, then line_profiler can be considered an ion cannon: very heavyweight and powerful.

In this case, we'll use the excellent line_profiler library. For ease of use, we'll again wrap it in a decorator, which is also a simple way to avoid accidentally shipping profiling code to production.
 


try:
    from line_profiler import LineProfiler
 
    def do_profile(follow=()):
        def inner(func):
            def profiled_func(*args, **kwargs):
                try:
                    profiler = LineProfiler()
                    profiler.add_function(func)
                    for f in follow:
                        profiler.add_function(f)
                    profiler.enable_by_count()
                    return func(*args, **kwargs)
                finally:
                    profiler.print_stats()
            return profiled_func
        return inner
 
except ImportError:
    def do_profile(follow=()):
        "Helpful if you accidentally leave it in production!"
        def inner(func):
            def nothing(*args, **kwargs):
                return func(*args, **kwargs)
            return nothing
        return inner
 
def get_number():
    for x in range(5000000):
        yield x
 
@do_profile(follow=[get_number])
def expensive_function():
    for x in get_number():
        i = x ^ x ^ x
    return 'some result!'
 
result = expensive_function()

If you run the above code, you can see the following report:
 


Timer unit: 1e-06 s
 
File: test.py
Function: get_number at line 43
Total time: 4.44195 s
 
Line #      Hits       Time  Per Hit  % Time  Line Contents
==============================================================
    43                                        def get_number():
    44   5000001    2223313      0.4    50.1      for x in range(5000000):
    45   5000000    2218638      0.4    49.9          yield x
 
File: test.py
Function: expensive_function at line 47
Total time: 16.828 s
 
Line #      Hits       Time  Per Hit  % Time  Line Contents
==============================================================
    47                                        def expensive_function():
    48   5000001   14090530      2.8    83.7      for x in get_number():
    49   5000000    2737480      0.5    16.3          i = x ^ x ^ x
    50         1          0      0.0     0.0      return 'some result!'

As you can see, the report is very detailed and gives you complete insight into how the code runs. Unlike the built-in cProfile, it accounts for time spent in core language constructs such as loops, and it shows the time spent on each individual line.

These details make it much easier to understand the inside of a function. And if you are investigating a third-party library, you can import it directly and pass its functions to the decorator for analysis.

A quick tip: just decorate your test function and pass the problem function in the follow argument.

        Line Profiler advantages: very direct and detailed reports, and the ability to follow functions in third-party libraries. Line Profiler disadvantages: the instrumentation makes your code run much slower than it really does, and it is an extra dependency.

Summary and best practices

You should use the simpler tools for basic checks on test cases, and the slower but more detailed line_profiler to drill down into functions.

Nine times out of ten, you will find that a loop in some function, or an inappropriate data structure, consumes 90 percent of the time. The tools above are perfect for finding it.

If the code still feels too slow at that point, you may need to reach for your own secret weapons, such as comparing different attribute-access techniques or trimming redundant checks. You can also fall back on the following:

1. Tolerate the slowness, or cache the results

2. Rethink the entire implementation

3. Use more optimized data structures

4. Write a C extension
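To illustrate point 3 (the sizes and the lookup value here are arbitrary), swapping a list for a set turns a linear membership scan into a constant-time hash lookup:

```python
import timeit

haystack_list = list(range(10000))
haystack_set = set(haystack_list)

# membership test for the worst-case element: the last one in the list
list_time = timeit.timeit(lambda: 9999 in haystack_list, number=1000)
set_time = timeit.timeit(lambda: 9999 in haystack_set, number=1000)

print('list lookup:', list_time, 'set lookup:', set_time)
```

On any reasonable machine the set lookups finish orders of magnitude faster, because `in` on a list scans every element while `in` on a set hashes once.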

Note that optimizing code can be a guilty pleasure! Speeding up your Python code in the right ways is fun, but be careful not to break the logic. Readable code is more important than fast code: make it correct first, then optimize.
