A Ramble on Iterators in Python

  • 2020-04-02 14:33:53
  • OfStack

The problem arises when you loop in Python. Anyone familiar with Python knows that it has no C-style for loop like other languages; you can only loop with for ... in. The most typical pattern is to generate a list with the range function and then iterate over it with for in, as follows:


#!/usr/bin/env python
for i in range(10):
    print i

The meaning of the code is easy to understand: range generates a list, and traversing that list with for in gives the same effect as for (i = 0; i < n; i++); the range function is documented at https://docs.python.org/2/library/functions.html#range. The problem is that the list range produces is held entirely in memory, so when the loop count is large it consumes a great deal of memory. To measure range's memory usage, I ran six tests, using range to produce lists of length 100, 10,000, 100,000, 1,000,000, 10,000,000, and 100,000,000, and recorded the memory footprint of each:


Test code           Memory
range(100)          2.0 MB
range(10000)        2.2 MB
range(100000)       3.8 MB
range(1000000)      19.5 MB
range(10000000)     168.5 MB
range(100000000)    1465.8 MB

As you can see, the memory footprint grows steeply as the length increases, so range should obviously be avoided when performing large loops.
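For comparison, in Python 3 the range object is itself lazy (it behaves like Python 2's xrange), and sys.getsizeof makes the contrast easy to see. A small Python 3 sketch (note that getsizeof counts only the list structure itself, not the integer objects it holds, so real footprints are even larger):

```python
import sys

# A materialized list grows with its length...
for n in (100, 10000, 1000000):
    print(n, sys.getsizeof(list(range(n))))

# ...while the lazy range object stays the same tiny size regardless of n.
print(sys.getsizeof(range(100)), sys.getsizeof(range(100000000)))
```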

To solve this problem, Python provides another function, xrange(), which behaves much like range but uses far less memory; the relevant documentation is at https://docs.python.org/2/library/functions.html#range. In my tests, an xrange object's memory usage barely changes no matter how large the argument is. So how is xrange implemented internally, and why does its memory behavior differ so much from range's? To test my conjecture, I first implemented zrange, a class similar to xrange, in Python:


#!/usr/bin/env python
class zrange(object):
    def __init__(self, stop):
        self.__pointer = 0   # internal pointer: the current position
        self.stop = stop

    def __iter__(self):
        return self

    def next(self):  # in Python 3 this method is named __next__
        if self.__pointer >= self.stop:
            raise StopIteration
        else:
            self.__pointer = self.__pointer + 1
            return self.__pointer - 1

test = zrange(10000000)
for i in test:
    print i

The result is the same as with xrange. A memory test of zrange shows that, like xrange, the size of the argument has little effect on memory usage. So what is the difference between these objects and range?
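For readers on Python 3, here is a minimal sketch of the same idea with the protocol method renamed __next__ (zrange3 is a hypothetical name used here for illustration):

```python
class zrange3:
    """Lazy, pointer-based iteration -- a Python 3 port of the zrange idea."""
    def __init__(self, stop):
        self._pointer = 0        # internal pointer: current position
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):          # Python 2's next() became __next__ in Python 3
        if self._pointer >= self.stop:
            raise StopIteration
        value = self._pointer
        self._pointer += 1
        return value

print(list(zrange3(5)))  # prints [0, 1, 2, 3, 4]
```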

As mentioned above, range produces a list, while both the custom zrange and the built-in xrange produce an object. Objects like xrange and zrange are called iterable objects: they give the outside world a way to traverse their elements without worrying about the internal implementation. In the zrange implementation above, the key is the internal pointer __pointer, which records the current position; each subsequent access is handled according to the pointer's state.
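Not covered above, but worth noting: a generator function achieves the same lazy behavior without managing the pointer by hand, because yield remembers the current position automatically. A sketch (grange is a hypothetical name for illustration):

```python
def grange(stop):
    # yield suspends the function here; the position is saved implicitly,
    # and the next request for a value resumes right after the yield.
    i = 0
    while i < stop:
        yield i
        i += 1

print(list(grange(5)))  # prints [0, 1, 2, 3, 4]
```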

In Python and other languages, many similar mechanisms access an object's contents iteratively, such as reading the contents of a file:


#!/usr/bin/env python
f = open('zrange.py','r')
while True:
    line = f.readline()
    if not line:
        break
    print line.strip()
f.close()

We all know that readline uses fewer resources than readlines. In fact, readline and readlines relate to each other much as xrange relates to range: readline records the current position with a file pointer and advances it on each access. With file objects you can also adjust the pointer manually with seek, to skip over content or read it again.
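To illustrate the file pointer and seek, here is a small self-contained Python 3 sketch (it writes its own temporary file rather than reading zrange.py):

```python
import os
import tempfile

# Create a small temporary file so the example stands alone.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("first\nsecond\nthird\n")

with open(path, "r") as f:
    print(f.readline().strip())  # reads one line, advancing the file pointer
    pos = f.tell()               # remember the pointer's current position
    print(f.readline().strip())  # reads 'second'
    f.seek(pos)                  # move the pointer back...
    print(f.readline().strip())  # ...and 'second' is read again
```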

It can be said that in an iterator implementation, the internal pointer is the key both to saving resources and to making iteration work correctly.
