Tutorial on using the combinatorial functions in Python's itertools module

  • 2020-05-09 18:50:36
  • OfStack

Understanding new concepts

The idea of iterators was introduced in Python 2.2. Well, that is not entirely true; the "germ" of the idea was already present in the older function xrange() and the file method xreadlines(). Python 2.2 generalized the concept in many of its internal implementations and, by introducing the yield keyword, made it much easier to program custom iterators (the advent of yield transforms functions into generators, which in turn return iterators).

The motivation behind iterators is twofold. Processing data as a sequence is often the simplest way to do it, and a sequence processed in linear order often does not need to exist all at once.

The x*() precursors provide a clear example of these principles. If you want to perform an operation a few thousand times, your program may take a while to run, but it generally should not need a lot of memory to do so. Likewise, for many kinds of files, you can process them line by line without having to store the entire file in memory. All sorts of other sequences are best processed lazily as well; they may depend on data arriving incrementally over a channel, or on computations performed step by step.

Most of the time, iterators are used in for loops, just like real sequences. An iterator provides a .next() method that can be called explicitly, but the vast majority of the time what you will see is:


for x in iterator:
  do_something_with(x)

The loop terminates when a StopIteration exception is raised by the behind-the-scenes call to iterator.next(). By the way, an actual sequence can be converted to an iterator by calling iter(seq) - this saves no memory, but it can be useful with the functions discussed below.
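
What the for loop does behind the scenes can be sketched directly in terms of .next() and StopIteration. This is a minimal sketch; seq and do_something_with() are the placeholders from the loop above:


it = iter(seq)             # works on real sequences and iterators alike
while True:
  try:
    x = it.next()          # what the for loop calls for you
  except StopIteration:    # raised when the iterator is exhausted
    break
  do_something_with(x)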

Python's evolving split personality

Python's view of functional programming (FP) is somewhat contradictory. On the one hand, many Python developers disparage the traditional FP functions map(), filter(), and reduce(), usually recommending "list comprehensions" instead. Yet the full itertools module is made up of functions of exactly the same kind, except that they operate on "lazy sequences" (iterators) rather than completed sequences (lists, tuples). Moreover, Python 2.3 has no syntax for "iterator comprehensions," which would seem to share the same motivation as list comprehensions.

My guess is that Python will eventually grow some form of iterator comprehension, but that depends on finding a natural syntax for it. In the meantime, the itertools module gives us a good number of useful combinatorial functions. Roughly speaking, each of these functions accepts a few arguments (usually including a basic iterator) and returns a new iterator. For example, the functions ifilter(), imap(), and izip() are directly equivalent to the built-in functions that lack the i prefix.
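
To make the correspondence concrete, here is a small sketch (my own example, not from the module documentation) showing that forcing the lazy i-versions with list() recovers exactly what the eager built-ins return:


from itertools import ifilter, imap, izip
odd = lambda n: n % 2                    # sample predicate
nums = [1, 2, 3, 4, 5]
# each i-function returns a lazy iterator; list() forces it
assert list(ifilter(odd, nums)) == filter(odd, nums)    # [1, 3, 5]
assert list(imap(abs, [-2, -1])) == map(abs, [-2, -1])  # [2, 1]
assert list(izip('ab', nums)) == zip('ab', nums)        # [('a', 1), ('b', 2)]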

Missing equivalent functions

There is no ireduce() in itertools, although it would be natural to have one; a possible Python implementation is:
Listing 1. Sample implementation of ireduce()


def ireduce(func, iterable, init=None):
  # with no initial value, seed the accumulator with the first element
  if init is None:
    iterable = iter(iterable)
    curr = iterable.next()
  else:
    curr = init
  # yield every intermediate result, not just the final one as reduce() does
  for x in iterable:
    curr = func(curr, x)
    yield curr
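
For instance, a quick interactive check of the running results ireduce() yields, with and without an initial value:


>>> from operator import add
>>> list(ireduce(add, [1, 2, 3, 4]))
[3, 6, 10]
>>> list(ireduce(add, [1, 2, 3, 4], init=10))
[11, 13, 16, 20]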

The use cases for ireduce() are similar to those for reduce(). For example, suppose you want to sum a list of numbers contained in a large file, but stop as soon as a condition is met. You can monitor the running total with code like:
Listing 2. Add and total the number of columns


from operator import add
from itertools import *
nums = open('numbers')             # a file containing one number per line
condition = lambda tot: tot < 100  # example predicate: stop once the total hits 100
for tot in takewhile(condition, ireduce(add, imap(int, nums))):
  print "total =", tot

A more realistic example might be feeding an event stream to a stateful object, such as a GUI widget. But even the simple example above shows the FP flavor of iterator combinators.

Basic iterator factories

All the functions in itertools can be implemented easily as generators in pure Python. The point of including the module in Python 2.3+ is to provide standard behaviors and names for some useful functions. Although programmers could write their own versions, everyone would actually create slightly incompatible variants. The module also implements the iterator combinators in efficient C code, so using the itertools functions will be a bit faster than writing your own combinators. The standard documentation shows an equivalent pure-Python implementation of each itertools function, so there is no need to repeat those in this article.

The functions in itertools are basic enough - and distinctly named enough - that it probably makes sense to import all the names from the module. The function enumerate(), for example, might plausibly live in itertools, but it is a built-in in Python 2.3+. Notably, you can easily express enumerate() in terms of itertools functions:


from itertools import *
enumerate = lambda iterable: izip(count(), iterable)
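
If the lambda is doing its job, it behaves just like the 2.3 built-in:


>>> list(enumerate('abc'))
[(0, 'a'), (1, 'b'), (2, 'c')]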

Let's first take a look at a few itertools functions that create iterators entirely from scratch rather than building on other iterators. times() returns an iterator that yields the same object multiple times. This capability is mildly useful in itself, and it provides a nice alternative to abusing xrange() and an index variable simply to repeat an operation. That is, instead of:


for i in xrange(1000):
  do_something()

You can now use the more neutral:


for _ in times(1000):
  do_something()

If times() is called with only one argument, it simply repeats None. The function repeat() is similar to times(), but it returns the same object unboundedly. Such an iterator is useful both in loops with an independent break condition and inside combinators such as izip() and imap().
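
A note and a sketch here: times() appeared in the itertools pre-releases this article describes, but it did not survive into the final Python 2.3 release, where repeat() with an optional count covers the same ground. A minimal sketch of both uses of repeat():


from itertools import izip, repeat

# bounded repeat(): the released-2.3 spelling of times(3)
for _ in repeat(None, 3):
  print "hello"

# unbounded repeat() inside a combinator: izip() stops when the
# finite iterable is exhausted, so the pairing terminates safely
print list(izip(repeat('tag'), [1, 2, 3]))  # [('tag', 1), ('tag', 2), ('tag', 3)]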

The function count() is something like a cross between repeat() and xrange(). count() returns consecutive integers unboundedly (starting at an optional argument). However, count() does not currently support overflow into longs correctly, so you may still want to use xrange(n, sys.maxint); that is not truly unbounded, but for most purposes it amounts to the same thing. Like repeat(), count() is especially useful inside other iterator combinators.
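
A sketch contrasting the two (islice(), covered below, is used only to take a finite prefix of the unbounded count()):


import sys
from itertools import count, islice

print list(islice(count(5), 3))               # [5, 6, 7] - truly unbounded
print list(islice(xrange(5, sys.maxint), 3))  # [5, 6, 7] - bounded, but huge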

Combinatorial functions

We have already mentioned several of the actual combinatorial functions in itertools in passing. ifilter(), izip(), and imap() act just as you would expect from their corresponding sequence functions. ifilterfalse() is convenient so that you do not need to negate a predicate function in a lambda or def (and it saves significant function-call overhead). But in terms of functionality, you could define ifilterfalse() as (roughly speaking - a None predicate is handled differently):


def ifilterfalse(predicate, iterable):
  # keep the elements the predicate rejects (ifilter() keeps the ones it accepts)
  return ifilter(lambda x: not predicate(x), iterable)

The functions dropwhile() and takewhile() partition an iterator based on a predicate. dropwhile() skips elements as long as the predicate holds, then yields everything that follows; takewhile() yields elements as long as the predicate holds, terminating as soon as it fails. dropwhile() discards an unspecified number of initial elements, so iteration may not begin until after some delay. takewhile() begins iterating immediately, but terminates the iterator as soon as the passed predicate becomes false.
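
A small illustration of the partitioning (my own example): with an is-odd predicate, takewhile() keeps the odd prefix and dropwhile() discards it:


>>> from itertools import takewhile, dropwhile
>>> odd = lambda n: n % 2
>>> list(takewhile(odd, [1, 3, 5, 8, 9, 2]))   # stops at the first even number
[1, 3, 5]
>>> list(dropwhile(odd, [1, 3, 5, 8, 9, 2]))   # yields from the first even number on
[8, 9, 2]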

The function islice() is basically the iterator version of list slicing. You can specify a start, a stop, and a step, just as with a regular slice. If a start is given, that many elements are discarded before the returned iterator yields its first element. This is another place where I think Python could be improved - it would be better if iterators simply recognized slices, as lists do (as a synonym for the behavior of islice()).
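
For example, slicing the unbounded count() exactly as you would slice a list:


>>> from itertools import islice, count
>>> list(islice(count(), 4, 20, 5))   # compare range(4, 20, 5)
[4, 9, 14, 19]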

The final function, starmap(), differs slightly from imap(). If the function passed as an argument takes more than one argument, the iterable you pass should yield tuples of the right size. Basically this is the same as imap() with several iterables passed in, except that the collection of iterables has been combined with izip() in advance.
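
The equivalence can be sketched like this (my own example): passing two iterables to imap() gives the same results as pre-zipping them for starmap():


>>> from operator import add
>>> from itertools import imap, izip, starmap
>>> list(imap(add, [1, 2, 3], [10, 20, 30]))
[11, 22, 33]
>>> list(starmap(add, izip([1, 2, 3], [10, 20, 30])))
[11, 22, 33]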

In-depth discussion

The functions included in itertools make a good start. If nothing else, having just these functions available makes it easier for Python programmers to use and combine iterators. Generally speaking, the widespread use of iterators is undoubtedly important to the future of Python. But beyond what has been included so far, I would like to make a few suggestions for future updates to the module. These functions can be used immediately and easily - though of course the names or interface details would differ if they were ever included.

A potentially general category that is missing is functions that take multiple iterables as arguments and then yield individual elements from each of them. This is in contrast to izip(), which yields tuples of elements, and imap(), which yields values computed from the basic elements. The two obvious arrangements to my mind are chain() and weave(). The first is similar in effect to sequence concatenation (but lazy). That is, where with completed sequences you might use, for example:


for x in ('a','b','c') + (1, 2, 3):
  do_something(x)

For iterables in general, you can use:


for x in chain(iter1, iter2, iter3):
  do_something(x)

The Python implementation is:
Listing 3. Sample implementation of chain()


def chain(*iterables):
  # yield every element of each iterable, in order, lazily
  for iterable in iterables:
    for item in iterable:
      yield item

You might also combine several iterables by interleaving them. There is no built-in syntax that does the same thing for sequences, but weave() itself also works fine on completed sequences. A possible implementation follows (Magnus Lie Hetland proposed a similar function on comp.lang.python):
Listing 4. Sample implementation of weave()


def weave(*iterables):
  "Intersperse several iterables, until all are exhausted"
  iterables = map(iter, iterables)    # normalize everything to an iterator
  while iterables:
    # loop over a copy so that exhausted iterators can be removed safely
    for it in tuple(iterables):
      try:
        yield it.next()
      except StopIteration:
        iterables.remove(it)

Let me demonstrate the behavior of weave(), since it is not obvious from the implementation:


>>> for x in weave('abc', xrange(4), [10, 11, 12, 13, 14]):
...   print x,
...
a 0 10 b 1 11 c 2 12 3 13 14

Even after one of the iterators is exhausted, the remaining iterators keep yielding values, so every available value is eventually produced.

I will propose just one more plausible itertools function, one inspired mostly by functional-programming ways of framing problems. There is a certain symmetry between icompose() and the proposed ireduce(). Where ireduce() feeds a (lazy) sequence of values to a function, yielding each successive result, icompose() applies a (lazy) sequence of functions to a value, each function operating on the return value of the one before it. ireduce() might be used to feed a sequence of events to a long-lived object; icompose() might instead repeatedly pass an object to a mutator function that returns a new object. The first is a fairly traditional OOP way of thinking about events, while the second is closer to FP.

Here is a possible icompose() implementation:
Listing 5. Sample implementation of icompose()


def icompose(functions, value):
  "Apply each function in turn, feeding every result to the next function"
  for f in functions:
    value = f(value)
    yield value
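
A quick demonstration, repeatedly applying a trivial increment function (using repeat() with a count, as in the released 2.3):


>>> from itertools import repeat
>>> inc = lambda n: n + 1
>>> list(icompose(repeat(inc, 4), 0))
[1, 2, 3, 4]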

Conclusion

Iterators - thought of as lazy sequences - are a powerful concept that opens new styles of Python programming. There is, though, a subtle difference between thinking of an iterator merely as a data source and thinking of it as a sequence. Neither conception is intrinsically more true than the other, but the latter opens the way to a combinatorial shorthand for manipulating programmatic events. The combinatorial functions in itertools (and especially additions like the ones I have suggested) come close to a declarative style of programming. To my mind, these declarative styles are less error-prone and more powerful.

