Python yield and generators explained, from simple to deep

  • 2020-05-27 06:21:58
  • OfStack

preface

This article introduces yield and generators in detail. It covers what a generator is, how to create one, the features of generators, basic and advanced use cases, and the pitfalls to watch out for when using them. It does not cover enhanced generators (PEP 342), which will be covered in a later article.

generator basics

In Python, whenever a function definition contains a yield expression, you have actually defined a generator function, and calling that generator function returns a generator. This is different from a normal function call. For example:


def gen_generator():
    yield 1

def gen_value():
    return 1

if __name__ == '__main__':
    ret = gen_generator()
    print ret, type(ret) # <generator object gen_generator at 0x02645648> <type 'generator'>
    ret = gen_value()
    print ret, type(ret) # 1 <type 'int'>

As the code above shows, the gen_generator function returns a generator instance.

Generators have the following special features:

• They follow the iterator protocol, which requires implementing the __iter__ and next methods (in Python 3, next is named __next__).

• They can be entered and exited multiple times, pausing execution of the function body in between.

Here is some test code:


>>> def gen_example():
...     print 'before any yield'
...     yield 'first yield'
...     print 'between yields'
...     yield 'second yield'
...     print 'no yield anymore'
...
>>> gen = gen_example()
>>> gen.next() # first call to next
before any yield
'first yield'
>>> gen.next() # second call to next
between yields
'second yield'
>>> gen.next() # third call to next
no yield anymore
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Calling gen_example prints nothing, which shows that the function body has not started executing yet. When the generator's next method is called, the generator runs until it reaches a yield expression. It returns the value of that yield expression and then suspends at that point. That is why the first call to next prints the first line and returns 'first yield'. Suspending means the function's local variables, instruction pointer and runtime state are saved, and they are restored on the next call to next. After the second call to next, the generator is paused at the last yield. Calling next() again runs the rest of the body and raises a StopIteration exception.
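To make the suspend/resume behaviour concrete, here is a small sketch. It uses Python 3 syntax, so it calls the built-in next() instead of gen.next(). The name counter is just for illustration. The local variable n survives between calls because the generator's frame is suspended, not discarded:

```python
def counter(start):
    n = start
    while n < start + 3:
        # execution pauses here; n is kept in the suspended frame
        yield n
        n += 1
    # falling off the end makes next() raise StopIteration

gen = counter(10)
first = next(gen)   # runs up to the first yield
second = next(gen)  # resumes right after the yield, n still in scope
third = next(gen)
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)  # 10 11 12 True
```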

The for statement automatically catches the StopIteration exception. For that reason, the most common way to consume a generator (and, in fact, any iterator) is a for loop:


def generator_example():
    yield 1
    yield 2

if __name__ == '__main__':
    for e in generator_example():
        print e
    # output: 1 2
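Roughly speaking, the for loop above performs the following steps by hand. This is a simplified sketch of the iteration protocol in Python 3, where the method is called __next__ and the built-in next() calls it. The helper name manual_for is hypothetical:

```python
def generator_example():
    yield 1
    yield 2

def manual_for(iterable):
    """Collect items the way a for loop would, but by hand."""
    items = []
    it = iter(iterable)          # calls __iter__(); a generator returns itself
    while True:
        try:
            item = next(it)      # calls __next__() (it.next() in Python 2)
        except StopIteration:    # the for statement swallows this exception
            break
        items.append(item)
    return items

print(manual_for(generator_example()))  # [1, 2]
```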

What is the difference between a generator function and a normal function?

(1) A function starts running from its first line on every call, whereas a generator resumes from where it last yielded.

(2) A function call returns one value (or one group of values) once, whereas a generator can produce values multiple times.

(3) A function can be called any number of times, whereas a generator instance cannot be resumed once it has yielded its last value or executed return.
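Point (3) is easy to trip over. Here is a minimal Python 3 sketch, with an illustrative name squares. Once a generator instance is exhausted, iterating over it again produces nothing. To start over, call the generator function again to get a fresh instance:

```python
def squares(n):
    for i in range(n):
        yield i * i

gen = squares(3)
first_pass = list(gen)    # consumes the generator: [0, 1, 4]
second_pass = list(gen)   # already exhausted: []
fresh = list(squares(3))  # a new call creates a new generator: [0, 1, 4]
print(first_pass, second_pass, fresh)
```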

Using yield in a function and then calling that function is one way to create a generator. Another common way is to use a generator expression, for example:


>>> gen = (x * x for x in xrange(5))
>>> print gen
<generator object <genexpr> at 0x02655710>
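A generator expression behaves much like an anonymous generator function. The sketch below uses Python 3, where range replaces xrange, and gen_squares is a hypothetical name. It shows that the two forms produce the same values. It also shows that a generator expression passed as the only argument to a function such as sum needs no extra parentheses:

```python
def gen_squares(n):
    # equivalent generator function for (x * x for x in range(n))
    for x in range(n):
        yield x * x

from_expr = list(x * x for x in range(5))
from_func = list(gen_squares(5))
total = sum(x * x for x in range(5))  # sole argument: the parentheses are shared

print(from_expr, from_func, total)  # [0, 1, 4, 9, 16] [0, 1, 4, 9, 16] 30
```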

generator applications

basic generator applications

The most important reason to use a generator is that it can produce and "return" results on demand, instead of computing all of them at once. Sometimes you don't even know in advance what "all the return values" are.

For example, consider the following code:


RANGE_NUM = 100
for i in [x*x for x in xrange(RANGE_NUM)]: # method 1: iterate over a list
    # do something, for example:
    print i

for i in (x*x for x in xrange(RANGE_NUM)): # method 2: iterate over a generator
    # do something, for example:
    print i

In the code above, the two for statements produce the same output. In the source, the only difference between them is square brackets versus parentheses. The first method builds a list, while the second creates a generator object. As RANGE_NUM grows, the list built by the first method grows with it, and so does its memory footprint. The generator object in the second method does not grow.
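One rough way to see this difference is sys.getsizeof, which reports the size of the container object itself but not of the objects it refers to. In the Python 3 sketch below, the list grows with the range size while the generator object stays the same size. Exact byte counts depend on the interpreter version and platform, so none are shown:

```python
import sys

def sizes(range_num):
    as_list = [x * x for x in range(range_num)]
    as_gen = (x * x for x in range(range_num))
    return sys.getsizeof(as_list), sys.getsizeof(as_gen)

small_list, small_gen = sizes(100)
big_list, big_gen = sizes(100000)

print(small_list, big_list)  # the list size grows with the range size
print(small_gen, big_gen)    # the generator size does not
```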

Let's look at an example that can "return" an infinite number of times:


def fib():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a+b

This generator can produce an infinite number of "return values", and the caller decides when to stop iterating.
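Because fib never stops on its own, the caller has to bound the iteration, for example with itertools.islice or with break. Here is a Python 3 sketch; the fib body is repeated so the snippet runs on its own:

```python
from itertools import islice

def fib():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

first_ten = list(islice(fib(), 10))  # take exactly 10 values, then stop
print(first_ten)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

# or stop on a condition with break
for n in fib():
    if n > 100:
        break
    last_small = n
print(last_small)  # 89
```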

advanced generator applications

Use case 1:

Generators can be used to produce data streams. A generator does not return its values immediately; it produces each value only when that value is needed, which amounts to an active pull model. For example, suppose there is a log file in which each line is one record. People in different departments may process each record in different ways, but we can provide one common data stream that is generated on demand.


def gen_data_from_file(file_name):
    # yield the file one line at a time instead of reading it all into memory
    with open(file_name) as f:
        for line in f:
            yield line

def gen_words(line):
    for word in (w for w in line.split() if w.strip()):
        yield word

def count_words(file_name):
    word_map = {}
    for line in gen_data_from_file(file_name):
        for word in gen_words(line):
            if word not in word_map:
                word_map[word] = 0
            word_map[word] += 1
    return word_map

def count_total_chars(file_name):
    total = 0
    for line in gen_data_from_file(file_name):
        total += len(line)
    return total

if __name__ == '__main__':
    print count_words('test.txt'), count_total_chars('test.txt')

The example above comes from a PyCon talk in 2008. gen_words and gen_data_from_file are the data producers, while count_words and count_total_chars are the data consumers. As you can see, data is pulled only when it is needed rather than prepared in advance. The expression (w for w in line.split() if w.strip()) inside gen_words also produces a generator.
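To see that the consumer really pulls data one record at a time, the sketch below records when each stage runs. It is written in Python 3 and reads from an in-memory list instead of test.txt, so it is self-contained. The names gen_lines, gen_words_traced, count_words_traced and trace are hypothetical:

```python
trace = []

def gen_lines(lines):
    # producer: stands in for gen_data_from_file
    for line in lines:
        trace.append('read: ' + line)
        yield line

def gen_words_traced(line):
    for word in line.split():
        yield word

def count_words_traced(lines):
    # consumer: pulls one line, processes it, then pulls the next
    word_map = {}
    for line in gen_lines(lines):
        for word in gen_words_traced(line):
            word_map[word] = word_map.get(word, 0) + 1
        trace.append('counted: ' + line)
    return word_map

result = count_words_traced(['a b', 'b c'])
print(result)  # {'a': 1, 'b': 2, 'c': 1}
print(trace)   # ['read: a b', 'counted: a b', 'read: b c', 'counted: b c']
```

The interleaved trace shows that the second line is read only after the first one has been fully counted.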

Use case 2:

In some scenarios, a task needs to run one part of its logic and then wait: for a period of time, for an asynchronous result or for some state. After that, it continues with another part of the logic. For example, in a microservice architecture, service A may run some logic, request data from service B, and then continue executing. Or, in game programming, a skill may be split into several stages: play one action (effect), wait for a while, then continue. In situations like these, where we must wait but do not want to block, we usually use a callback. Here is a simple example:


def do(a):
    print 'do', a
    CallBackMgr.callback(5, lambda a = a: post_do(a))

def post_do(a):
    print 'post_do', a

Here CallBackMgr registers a timer that calls the lambda after 5 seconds. Notice that one piece of logic has been split into two functions, and the context (here, a) has to be passed along explicitly. Let's rewrite this example with yield, where the value yielded is the time to wait:


@yield_dec
def do(a):
    print 'do', a
    yield 5
    print 'post_do', a

For this to work we need to implement a YieldManager. The yield_dec decorator registers the do generator with the YieldManager, which calls the generator's next method after 5 seconds. The yield version implements the same functionality as the callback version, but it reads much more clearly.

Here is a simple implementation for your reference:


# -*- coding:utf-8 -*-
# import Timer
import types
import time

class YieldManager(object):
    def __init__(self, tick_delta = 0.01):
        # maps each suspended generator to the time at which it should resume
        self.generator_dict = {}
        # self._tick_timer = Timer.addRepeatTimer(tick_delta, lambda: self.tick())

    def tick(self):
        cur = time.time()
        # items() returns a list copy in Python 2, so removing entries while looping is safe
        for gene, t in self.generator_dict.items():
            if cur >= t:
                self._do_resume_generator(gene, cur)

    def _do_resume_generator(self, gene, cur):
        try:
            self.on_generator_execute(gene, cur)
        except StopIteration:
            # the generator has finished: stop tracking it
            self.remove_generator(gene)
        except Exception as e:
            print 'unexpected error', type(e)
            self.remove_generator(gene)

    def add_generator(self, gen, deadline):
        self.generator_dict[gen] = deadline

    def remove_generator(self, gene):
        del self.generator_dict[gene]

    def on_generator_execute(self, gen, cur_time = None):
        # run the generator up to its next yield; the yielded value is the delay in seconds
        t = gen.next()
        cur_time = cur_time or time.time()
        self.add_generator(gen, t + cur_time)

g_yield_mgr = YieldManager()

def yield_dec(func):
    def _inner_func(*args, **kwargs):
        gen = func(*args, **kwargs)
        if type(gen) is types.GeneratorType:
            # start the generator immediately; it runs until its first yield
            g_yield_mgr.on_generator_execute(gen)

        return gen
    return _inner_func

@yield_dec
def do(a):
    print 'do', a
    yield 2.5
    print 'post_do', a
    yield 3
    print 'post_do again', a

if __name__ == '__main__':
    do(1)
    for i in range(1, 10):
        print 'simulate a timer, %s seconds passed' % i
        time.sleep(1)
        g_yield_mgr.tick()
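The implementation above depends on wall-clock time and time.sleep, which makes it slow to experiment with. What follows is a hypothetical Python 3 variant of the same idea, not the author's code. It keeps a virtual clock and a heapq priority queue ordered by wake-up time, so time can be advanced instantly and you can check when each generator resumes. The class name VirtualYieldManager and its methods start and advance are illustrative assumptions:

```python
import heapq
import itertools

class VirtualYieldManager:
    """Resume generators when a simulated clock reaches the time they asked for."""

    def __init__(self):
        self.now = 0.0
        self._queue = []               # heap of (wake_time, seq, generator)
        self._seq = itertools.count()  # tie-breaker so generators are never compared

    def start(self, gen):
        self._resume(gen)

    def _resume(self, gen):
        try:
            delay = next(gen)          # run to the next yield; value = seconds to wait
        except StopIteration:
            return                     # generator finished: drop it
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), gen))

    def advance(self, seconds):
        """Move the clock forward, resuming every generator whose time has come."""
        target = self.now + seconds
        while self._queue and self._queue[0][0] <= target:
            wake_time, _, gen = heapq.heappop(self._queue)
            self.now = wake_time
            self._resume(gen)
        self.now = target

log = []

def do(a):
    log.append(('do', a))
    yield 2.5
    log.append(('post_do', a))
    yield 3
    log.append(('post_do again', a))

mgr = VirtualYieldManager()
mgr.start(do(1))
mgr.advance(2)
after_2s = list(log)  # only 'do' has run: still waiting for t = 2.5
mgr.advance(1)
after_3s = list(log)  # 'post_do' ran at t = 2.5
mgr.advance(3)
after_6s = list(log)  # 'post_do again' ran at t = 5.5
print(after_6s)
```

Using a heap instead of scanning a dict on every tick means each advance only touches the generators that are actually due.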

Points to note:

(1) yield does not nest: calling a generator function from inside another generator does not yield its values!


def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            visit(elem) # visit is a generator function: this only creates a generator and never iterates it
        else:
            yield elem

if __name__ == '__main__':
    for e in visit([1, 2, (3, 4), 5]):
        print e

The code above tries to visit every element of a nested sequence. We expect the output 1, 2, 3, 4, 5, but the actual output is only 1, 2, 5. The reason is that visit is itself a generator function, so the recursive call visit(elem) merely returns a generator object, and the code never iterates over that generator instance. The fix is to iterate over this temporary generator and yield each of its values:


def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            for e in visit(elem): # iterate over the nested generator
                yield e           # and re-yield each of its values
        else:
            yield elem

Alternatively, in Python 3.3 and later you can use yield from, a syntax added by PEP 380:


def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            yield from visit(elem) # delegate to the nested generator (Python 3.3+)
        else:
            yield elem

(2) Using return in a generator function

The Python documentation explicitly states that a generator function may use return. When execution reaches the return statement, the generator raises StopIteration.


def gen_with_return(range_num):
    if range_num < 0:
        return
    else:
        for i in xrange(range_num):
            yield i

if __name__ == '__main__':
    print list(gen_with_return(-1)) # []
    print list(gen_with_return(1))  # [0]

However, in Python 2 a return statement inside a generator function cannot carry a value:


def gen_with_return(range_num):
    if range_num < 0:
        return 0 # returning a value is not allowed in a Python 2 generator
    else:
        for i in xrange(range_num):
            yield i

In Python 2, the code above fails with: SyntaxError: 'return' with argument inside generator
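That restriction applies to Python 2 only. Since Python 3.3 (PEP 380), a generator may return a value. The value is attached to the StopIteration exception as its value attribute, and a yield from expression evaluates to it. A plain for loop or list() simply discards it. Here is a small Python 3 sketch; delegate is an illustrative name:

```python
def gen_with_return(range_num):
    if range_num < 0:
        return 0          # legal since Python 3.3; becomes StopIteration.value
    for i in range(range_num):
        yield i
    return range_num      # also returned when the loop finishes

def delegate(range_num):
    # yield from re-yields every value and evaluates to the sub-generator's return value
    result = yield from gen_with_return(range_num)
    yield ('returned', result)

gen = gen_with_return(-1)
try:
    next(gen)
except StopIteration as exc:
    returned = exc.value

print(returned)           # 0
print(list(delegate(2)))  # [0, 1, ('returned', 2)]
```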

conclusion

