Python Advanced: Generator Lazy Version Iterator Explanation

  • 2021-07-06 11:16:22
  • OfStack

Talking from containers and iterative objects

All containers are iterative (iterable), and the iterator provides an next method. iter () returns 1 iterator, which is traversed by the next () function.


def is_iterable(param):
try: 
iter(param) 
return True
except TypeError:
return False
params = [
1234,
'1234',
[1, 2, 3, 4],
set([1, 2, 3, 4]),
{1:1, 2:2, 3:3, 4:4},
(1, 2, 3, 4)
]
for param in params:
print('{} is iterable? {}'.format(param, is_iterable(param)))
##########  Output  ##########
# 1234 is iterable? False
# 1234 is iterable? True
# [1, 2, 3, 4] is iterable? True
# {1, 2, 3, 4} is iterable? True
# {1: 1, 2: 2, 3: 3, 4: 4} is iterable? True
# (1, 2, 3, 4) is iterable? True

All data structures except numbers are iterative.

What is a generator

The generator is the lazy version of the iterator. Example:


import os
import psutil

# Displays the current  python  Memory size occupied by program 
def show_memory_info(hint):
pid = os.getpid()
p = psutil.Process(pid)

info = p.memory_full_info()
memory = info.uss / 1024. / 1024
print('{} memory used: {} MB'.format(hint, memory))

def test_iterator():
show_memory_info('initing iterator')
list_1 = [i for i in range(100000000)]
show_memory_info('after iterator initiated')
print(sum(list_1))
show_memory_info('after sum called')

def test_generator():
show_memory_info('initing generator')
list_2 = (i for i in range(100000000))
show_memory_info('after generator initiated')
print(sum(list_2))
show_memory_info('after sum called')

test_iterator()
test_generator()
%time test_iterator()
%time test_generator()

#########  Output  ##########

initing iterator memory used: 48.9765625 MB
after iterator initiated memory used: 3920.30078125 MB
4999999950000000
after sum called memory used: 3920.3046875 MB
Wall time: 17 s
initing generator memory used: 50.359375 MB
after generator initiated memory used: 50.359375 MB
4999999950000000
after sum called memory used: 50.109375 MB
Wall time: 12.5 s

[i for i in range (100000000)] declares an iterator, and each element is saved to memory after being generated, which takes up a huge amount of memory. (i for i in range (100000000)) initializes a generator. It can be seen that the generator does not occupy a lot of memory like iterator 1. Compared with test_iterator (), test_generator () saves the process of generating 100 million elements at one time. When next () is called, the next variable will be generated.

What tricks can the generator play

There is an identity in mathematics, (1 + 2 + 3 +... + n) ^ 2 = 1 ^ 3 + 2 ^ 3 + 3 ^ 3 +... + n ^ 3, expressed in the following code


def generator(k):
i = 1
while True:
yield i ** k
i += 1

gen_1 = generator(1)
gen_3 = generator(3)
print(gen_1)
print(gen_3)

def get_sum(n):
sum_1, sum_3 = 0, 0
for i in range(n):
next_1 = next(gen_1)
next_3 = next(gen_3)
print('next_1 = {}, next_3 = {}'.format(next_1, next_3))
sum_1 += next_1
sum_3 += next_3
print(sum_1 * sum_1, sum_3)

get_sum(8)

##########  Output  ##########

# <generator object generator at 0x000001E70651C4F8>
# <generator object generator at 0x000001E70651C390>
# next_1 = 1, next_3 = 1
# next_1 = 2, next_3 = 8
# next_1 = 3, next_3 = 27
# next_1 = 4, next_3 = 64
# next_1 = 5, next_3 = 125
# next_1 = 6, next_3 = 216
# next_1 = 7, next_3 = 343
# next_1 = 8, next_3 = 512
# 1296 1296

The function generator (), which returns a generator that pauses and takes i ** k as the return value of next () when running to yield i ** k. Each time next (gen) is called, the paused program starts and executes, and the value of i is remembered and continues to accumulate until next_1 is 8 and next_3 is 512.

A closer look at this example reveals that the iterator is a finite set and the generator can be an infinite set. Calling next (), the generator will automatically generate new elements according to the operation, and then return to you, very convenient.

Let's look at another problem: Given an list and a specified number, find the position of this number in list:


# Conventional writing 
def index_normal(L, target):
result = []
for i, num in enumerate(L):
if num == target:
result.append(i)
return result
print(index_normal([1, 6, 2, 4, 5, 2, 8, 6, 3, 2], 2))
##########  Output  ##########
[2, 5, 9]
# Generator writing 
def index_generator(L, target):
for i, num in enumerate(L):
if num == target:
yield i
print(list(index_generator([1, 6, 2, 4, 5, 2, 8, 6, 3, 2], 2)))
#########  Output  ##########
[2, 5, 9]

Look at another example:

Find sub-sequence: Given two strings a and b, find whether the string a is a sub-sequence of the string b, so-called sub-sequence, that is, one sequence is contained in another sequence and the order is 1

Algorithm: Two pointers are used to point to the headers of two strings, and then move back to find the same value. If one of the pointers runs through the whole string and there is no same value, it is not a sub-sequence


def is_subsequence(a, b):
b = iter(b)
return all(i in b for i in a)
print(is_subsequence([1, 3, 5], [1, 2, 3, 4, 5]))
print(is_subsequence([1, 4, 3], [1, 2, 3, 4, 5]))
#########  Output  ##########
True
False

The following code is an evolved version of the above code


def is_subsequence(a, b):
b = iter(b)
print(b)

gen = (i for i in a)
print(gen)

for i in gen:
print(i)

gen = ((i in b) for i in a)
print(gen)

for i in gen:
print(i)

return all(((i in b) for i in a))

print(is_subsequence([1, 3, 5], [1, 2, 3, 4, 5]))
print(is_subsequence([1, 4, 3], [1, 2, 3, 4, 5]))

##########  Output  ##########

# <list_iterator object at 0x000001E7063D0E80>
# <generator object is_subsequence.<locals>.<genexpr> at 0x000001E70651C570>
# 1
# 3
# 5
# <generator object is_subsequence.<locals>.<genexpr> at 0x000001E70651C5E8>
# True
# True
# True
# False
# <list_iterator object at 0x000001E7063D0D30>
# <generator object is_subsequence.<locals>.<genexpr> at 0x000001E70651C5E8>
# 1
# 4
# 3
# <generator object is_subsequence.<locals>.<genexpr> at 0x000001E70651C570>
# True
# True
# False
# False

First, iter (b) turns b into an iterator. The purpose is to implement the next function internally, and (i for i in a) will produce a generator, as well as ((i in b) for i in a). Then (i in b) is equal to:


while True:
val = next(b)
if val == i:
yield True

This is a very clever use of the generator's features, and when the next () function runs, it saves the current pointer. For example, the following example


b = (i for i in range(5))
print(2 in b)
print(4 in b)
print(3 in b)
##########  Output  ##########
True
True
False

Related articles: