Generators and yield in Python, explained in detail

  • 2020-04-02 14:29:47
  • OfStack

List comprehensions and generator expressions

When we create a list, we create an object that we can iterate over:


>>> squares = [n * n for n in range(3)]
>>> for i in squares:
...     print i
...
0
1
4

This way of creating a list is common and is called a list comprehension. Iterables such as lists, strings, and files are easy to use, but all of their values are stored in memory at once, which is wasteful when there are many values.

A generator expression performs the same computation, but produces its results lazily as you iterate over it. The syntax is the same as a list comprehension, except that the square brackets are replaced by parentheses:


>>> squares = (n * n for n in range(3))
>>> for i in squares:
...     print i
...
0
1
4

Instead of building the whole sequence in memory, this creates a generator object that produces each value on demand as it is iterated.
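The difference is easy to see directly (a minimal sketch in Python 3 syntax; the exact byte counts vary by platform, but the ordering does not):

```python
import sys

# A list comprehension materializes every value up front.
squares_list = [n * n for n in range(1000000)]

# A generator expression only stores its iteration state.
squares_gen = (n * n for n in range(1000000))

print(sys.getsizeof(squares_list))  # several megabytes
print(sys.getsizeof(squares_gen))   # a few hundred bytes at most
```

The generator's size stays constant no matter how many values it will eventually produce, because none of them exist until they are requested.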

So, are there other ways to generate a generator?

Example: Fibonacci sequence

For example, if we need to generate the first 10 terms of the Fibonacci sequence, we can write:


def fib(n):
    result = []
    a = 1
    b = 1
    result.append(a)
    for i in range(n - 1):
        a, b = b, a + b
        result.append(a)
    return result

if __name__ == '__main__':
    print fib(10)

This function works fine when n is small, but problems arise when n is large: building a list of many thousands of numbers just to iterate over it is not a good idea.

So the requirement becomes: write a function that produces an iterable object, or rather, one that returns its values one at a time instead of all at once.

This seems to contradict our intuition about normal Python functions: execution starts at the first line and ends at a return statement, an uncaught exception, or the end of the function body (which implicitly returns None):


def fib(n):
    a = 1
    b = 1
    for i in range(n - 1):
        a, b = b, a + b
        return a

if __name__ == '__main__':
    print fib(10)

>>>
1    # execution stops after returning the first value; the rest are never produced

Once the function returns control to the caller, it is all over: the work done in the function and the data stored in its local variables are lost, and the next call starts from scratch. The function gets only one chance to return results, so it must return them all at once. At least, that is what we usually assume. But what if it were not so? Meet the remarkable yield:

def fib(n):
    a = 1
    yield a
    b = 1
    for i in range(n - 1):
        a, b = b, a + b
        yield a

if __name__ == '__main__':
    for i in fib(10):
        print i
>>>
1
1
2
3
5
8
13
21
34
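In Python 3 syntax the same generator can even be written without an upper bound, letting the caller decide how many values to take (itertools.islice slices any iterator):

```python
from itertools import islice

def fib():
    # State lives in a and b between yields; the sequence never ends.
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

first_ten = list(islice(fib(), 10))
print(first_ten)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Because values are produced on demand, an infinite generator costs nothing until someone asks it for the next value.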

Generators

Defining a generator in Python is simple: any function that uses the yield keyword is a generator function, and it produces a sequence of values:


def countdown(n):
    while n > 0:
        yield n
        n -= 1

if __name__ == '__main__':
    for i in countdown(10):
        print i

Calling a generator function returns a generator. Note that generators are a special kind of iterator. As an iterator, a generator must implement certain methods, one of which is next() (renamed __next__() in Python 3). As with any iterator, we can fetch the next value with the next() method (in Python 3, the built-in next() function):

>>> c = countdown(10)
>>> c.next()
10
>>> c.next()
9

Each time it is called, the generator returns one value to the caller; the yield inside the generator performs this handoff. The easiest way to remember what yield does is to think of it as a special kind of return for generator functions. When next() is called, the generator function executes statements until it reaches a yield; at that point the generator's "state" is frozen, the values of all local variables are preserved, and the position of the next line to execute is recorded, until next() is called again and execution resumes just after the yield.
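The freezing and resuming described above can be watched directly (a small sketch in Python 3 syntax; steps() is an illustrative name, and the print calls mark where execution pauses):

```python
def steps():
    print("before first yield")   # runs on the first next()
    yield 1
    print("between yields")       # runs only when the generator is resumed
    yield 2
    print("after last yield")     # runs on the final next(), then StopIteration

g = steps()
print(next(g))  # prints "before first yield", then 1
print(next(g))  # prints "between yields", then 2
```

A third next(g) would print "after last yield" and then raise StopIteration, since the function body has run to its end.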

next() cannot be called indefinitely: when the iteration is exhausted, a StopIteration exception is raised. To end a generator before it is exhausted, use its close() method.


>>> c.next()
1
>>> c.next()
StopIteration
>>> c = countdown(10)
>>> c.next()
10
>>> c.close()
>>> c.next()
StopIteration
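In Python 3 the same session uses the built-in next() instead of the .next() method, and a for loop or list() absorbs the StopIteration for you (this sketch re-declares countdown() so it is self-contained):

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

c = countdown(3)
print(next(c), next(c), next(c))  # 3 2 1
# One more next(c) would raise StopIteration; list() handles it silently:
print(list(countdown(3)))         # [3, 2, 1]

c = countdown(10)
next(c)
c.close()  # end the generator early
try:
    next(c)
except StopIteration:
    print("generator closed")
```

This is why a for loop over a generator never shows the StopIteration: the loop protocol catches it and simply stops.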

Coroutines and yield expressions

yield has an even more powerful form: it can appear as an expression on the right-hand side of an assignment, accepting a value, or do both at once.


def recv():
    print 'Ready'
    while True:
        n = yield
        print 'Go %s' % n

>>> c = recv()
>>> c.next()
Ready
>>> c.send(1)
Go 1
>>> c.send(2)
Go 2

A function that uses yield in this way is called a coroutine. Here, the initial call to next() is necessary so that the coroutine advances to its first yield expression. At that point the coroutine suspends, waiting for a value to be sent to it via the generator object's send() method. The value passed to send() becomes the result of the yield expression inside the coroutine.

The coroutine runs indefinitely; it can be closed explicitly with the close() method.

If the yield expression is given a value, the coroutine can receive a value and emit one with the same yield statement:


def split_line():
    print 'ready to split'
    result = None
    while True:
        line = yield result
        result = line.split()

>>> s = split_line()
>>> s.next()
ready to split
>>> s.send('1 2 3')
['1', '2', '3']
>>> s.send('a b c')
['a', 'b', 'c']

Note: it is important to understand the order of execution here. The first next() call runs the coroutine up to `yield result`, which emits the current value of result, None. In each subsequent send() call, the received value is bound to line and split into result. The return value of send() is the value emitted by the next yield expression reached. In other words, send() passes a value into one yield expression, but its return value comes from the next yield expression, not from the one that received the sent value.
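The same ordering in a minimal Python 3 sketch (echo() is an illustrative name, not from the article):

```python
def echo():
    result = None
    while True:
        received = yield result        # emit result, then wait for send()
        result = 'echo: %s' % received

e = echo()
print(next(e))       # None: runs up to the first yield, which emits None
print(e.send('hi'))  # 'echo: hi': send()'s return value comes from the NEXT yield
```

Each send() thus sees the result computed from its own input, but only because the loop brings execution back around to the same yield.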

If you want to start a coroutine with send(), you must first send the value None, because no yield expression is yet waiting to receive a value; otherwise an exception is raised.


>>> s = split_line()
>>> s.send('1 2 3')
TypeError: can't send non-None value to a just-started generator
>>> s = split_line()
>>> s.send(None)
ready to split

Using generators and coroutines

At first glance, it may not seem obvious how to use generators and coroutines to solve practical problems. But generators and coroutines are especially useful for solving certain problems in systems, networks, and distributed computing. In fact, yield has become one of Python's most powerful keywords.

For example, to create a file pipeline:


import os

def default_next(func):
    # Decorator that advances a new coroutine to its first yield,
    # so callers can send() to it immediately.
    def start(*args, **kwargs):
        f = func(*args, **kwargs)
        f.next()
        return f
    return start

@default_next
def find_files(target):
    # Receive a directory each time, walk it, and send every file path on.
    while True:
        topdir = yield
        for path, dirname, filelist in os.walk(topdir):
            for filename in filelist:
                target.send(os.path.join(path, filename))

@default_next
def opener(target):
    while True:
        name = yield
        f = open(name)
        target.send(f)

@default_next
def catch(target):
    while True:
        f = yield
        for line in f:
            target.send(line)

@default_next
def printer():
    while True:
        line = yield
        print line

Then connect these coroutines to create a data flow processing pipeline:

finder = find_files(opener(catch(printer())))
finder.send(toppath)

Execution of the program is driven entirely by sending data into the first coroutine, find_files(), and the pipeline remains active until close() is explicitly called.

In short, generators are very powerful, and coroutines can be used to achieve a form of concurrency. In some kinds of applications, a task scheduler plus a set of generators or coroutines can implement cooperative user-space multithreading, i.e. greenlets. The power of yield really shows in cooperative multitasking and asynchronous I/O.
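As a hedged sketch of that last point (Python 3 syntax; `task` and `run` are illustrative names, not a real library), a toy round-robin scheduler can interleave plain generators, with each yield acting as a voluntary context switch:

```python
from collections import deque

log = []

def task(name, steps):
    # A toy task: each yield hands control back to the scheduler.
    for _ in range(steps):
        log.append(name)   # do one unit of "work"
        yield

def run(tasks):
    # Round-robin scheduler: resume each generator in turn until all finish.
    queue = deque(tasks)
    while queue:
        t = queue.popleft()
        try:
            next(t)
            queue.append(t)   # still running: back of the queue
        except StopIteration:
            pass              # finished: drop it

run([task("A", 3), task("B", 2)])
print(log)  # ['A', 'B', 'A', 'B', 'A']
```

Real event loops are far more elaborate, but the core trick is the same: tasks suspend themselves at yield, and a loop decides who runs next.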
