Detailed usage of the itertools module in Python

2020-04-02 14:07:12
OfStack

This article walks through the itertools module in Python with examples. The details are as follows:

The itertools module contains functions that create efficient iterators for looping over data in a variety of ways. Every iterator returned by a function in this module can be used with for loops and with other constructs that consume iterators, such as generators and generator expressions.

chain(iter1, iter2, ..., iterN):

Given a group of iterables (iter1, iter2, ..., iterN), this function creates a new iterator that links them together. The returned iterator produces items from iter1 until it is exhausted, then items from iter2, and so on until the items in iterN are exhausted.


from itertools import chain
test = chain('AB', 'CDE', 'F')
for el in test:
  print(el)

A
B
C
D
E
F

chain.from_iterable(iterables):

An alternate constructor for chain, where iterables is a single iterable that produces the sequence of iterables to link. The result is the same as that produced by the following generator:


>>> def f(iterables):
...   for x in iterables:
...     for y in x:
...       yield y
...
>>> test = f('ABCDEF')
>>> next(test)
'A'


>>> from itertools import chain
>>> test = chain.from_iterable('ABCDEF')
>>> next(test)
'A'
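
The string 'ABCDEF' above is itself an iterable of one-character strings, which hides what from_iterable does. A minimal sketch with a nested list may make the flattening clearer; the sample data is made up for illustration:

```python
from itertools import chain

# Hypothetical sample data: a list of lists to be flattened by one level.
nested = [[1, 2], [3], [], [4, 5]]

# from_iterable takes ONE iterable whose items are themselves iterables.
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5]
```

This is equivalent to chain(*nested), except that the outer iterable is consumed lazily rather than unpacked up front.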

combinations(iterable, r):

Creates an iterator that returns all subsequences of length r from iterable. The items in each returned subsequence appear in the same order as in the input iterable:


>>> from itertools import combinations
>>> test = combinations([1,2,3,4], 2)
>>> for el in test:
...   print(el)
...
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)

count([n]):

Creates an iterator that generates consecutive integers starting at n. If n is omitted, counting starts at 0. Note that in the Python 2 implementation described here this iterator does not support long integers: once the count exceeds sys.maxint, the counter overflows and continues from -sys.maxint-1.
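
Because count() never stops on its own, it is usually paired with something that limits it. The following is a small sketch that uses islice for that purpose:

```python
from itertools import count, islice

# Take only the first three values of an otherwise endless counter.
first_three = list(islice(count(5), 3))
print(first_three)  # [5, 6, 7]

# With no argument, counting starts at 0.
from_zero = list(islice(count(), 3))
print(from_zero)  # [0, 1, 2]
```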

cycle(iterable):

Creates an iterator that loops over the elements of iterable again and again. Internally it saves a copy of each element, and once iterable is exhausted it returns the elements from that copy, repeating indefinitely.
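
As an illustration, the sketch below cycles through a short list and stops after seven items. The list contents are arbitrary sample values, and islice is used only to keep the loop finite:

```python
from itertools import cycle, islice

# cycle() is endless, so limit it with islice().
colors = cycle(['red', 'green', 'blue'])
first_seven = list(islice(colors, 7))
print(first_seven)
# ['red', 'green', 'blue', 'red', 'green', 'blue', 'red']
```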

dropwhile(predicate, iterable):

Creates an iterator that discards items from iterable as long as predicate(item) is True. Once the predicate returns False, the iterator produces that item and every subsequent item in iterable, without testing the predicate again.


def dropwhile(predicate, iterable):
  # dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1
  iterable = iter(iterable)
  for x in iterable:
    if not predicate(x):
      yield x
      break
  for x in iterable:
    yield x
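
One possible use of the library function is skipping a leading block of comment lines. This is only a sketch, and the sample lines are invented for the example:

```python
from itertools import dropwhile

# Hypothetical input: header comments followed by data.
lines = ['# header', '# more header', 'data 1', '# not a header', 'data 2']

# Only the LEADING lines that start with '#' are dropped.
body = list(dropwhile(lambda s: s.startswith('#'), lines))
print(body)  # ['data 1', '# not a header', 'data 2']
```

The later '# not a header' line is kept because the predicate is no longer consulted after the first item that fails it.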

groupby(iterable[, key]):

Creates an iterator that groups consecutive items produced by iterable. Grouping works by looking for runs of duplicate items.

If iterable produces the same item on several consecutive iterations, those items form a group. Applied to a sorted list, the groups therefore correspond to the unique items in the list. key, if supplied, is a function that is applied to each item; when present, its return value is compared between successive items instead of the items themselves. The returned iterator produces tuples (key, group), where key is the key value for the group and group is an iterator that produces all of the items in that group.
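
The description above can be sketched as follows. The first call shows that only consecutive duplicates are grouped; the second uses a key function on input that is already sorted. The word list is sample data chosen for illustration:

```python
from itertools import groupby

# Without a key: runs of equal consecutive items become groups.
runs = [(key, list(group)) for key, group in groupby('AAABBCAA')]
print(runs)
# [('A', ['A', 'A', 'A']), ('B', ['B', 'B']), ('C', ['C']), ('A', ['A', 'A'])]

# With a key function, on input already sorted by that key.
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
by_letter = dict((key, list(group))
                 for key, group in groupby(words, key=lambda w: w[0]))
print(by_letter['b'])  # ['banana', 'blueberry']
```

Note that 'A' appears twice in the first result: groupby does not sort its input, so sort by the same key first if each key should occur only once.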

ifilter(predicate, iterable):
Creates an iterator that produces only the items in iterable for which predicate(item) is True. If predicate is None, it returns all the items in iterable that evaluate as True.


ifilter(lambda x: x%2, range(10)) --> 1 3 5 7 9

ifilterfalse(predicate, iterable):
Creates an iterator that produces only the items in iterable for which predicate(item) is False. If predicate is None, it returns all the items in iterable that evaluate as False.


ifilterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8

imap(function, iter1, iter2, iter3, ..., iterN):
Creates an iterator that produces the items function(i1, i2, ..., iN), where i1, i2, ..., iN come from the iterators iter1, iter2, ..., iterN. If function is None, tuples of the form (i1, i2, ..., iN) are returned. Iteration stops as soon as one of the supplied iterators stops producing values.


>>> from itertools import *
>>> d = imap(pow, (2,3,10), (5,2,3))
>>> for i in d: print(i)
...
32
9
1000

>>> d = imap(pow, (2,3,10), (5,2))
>>> for i in d: print(i)
...
32
9

>>> d = imap(None, (2,3,10), (5,2))
>>> for i in d: print(i)
...
(2, 5)
(3, 2)

islice(iterable, [start,] stop [, step]):
Creates an iterator that produces items in a manner similar to what the slice iterable[start:stop:step] would return. The first start items are skipped, iteration stops at the position specified by stop, and step specifies the stride used to skip items. Unlike slices, negative values may not be used for start, stop, or step. If start is omitted, iteration begins at 0; if step is omitted, a step of 1 is used.


import sys

def islice(iterable, *args):
   # islice('ABCDEFG', 2) --> A B
   # islice('ABCDEFG', 2, 4) --> C D
   # islice('ABCDEFG', 2, None) --> C D E F G
   # islice('ABCDEFG', 0, None, 2) --> A C E G
   s = slice(*args)
   it = iter(xrange(s.start or 0, s.stop or sys.maxint, s.step or 1))
   nexti = next(it)
   for i, element in enumerate(iterable):
     if i == nexti:
       yield element
       nexti = next(it)
 
# If start is None, then iteration starts at zero. If step is None, then the step defaults to one.
# Changed in version 2.5: accept None values for default start and step.
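
A generator cannot be sliced with square brackets, which is where the library islice is useful. The sketch below assumes a hypothetical squares() generator written only for this example:

```python
from itertools import islice

def squares():
    # Hypothetical endless generator: 0, 1, 4, 9, 16, ...
    n = 0
    while True:
        yield n * n
        n += 1

# Items at positions 2, 5 and 8 (start=2, stop=10, step=3).
picked = list(islice(squares(), 2, 10, 3))
print(picked)  # [4, 25, 64]
```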

izip(iter1, iter2, ..., iterN):
Creates an iterator that produces tuples (i1, i2, ..., iN), where i1, i2, ..., iN come from the iterators iter1, iter2, ..., iterN. Iteration stops as soon as one of the supplied iterators stops producing values. The values produced are the same as those of the built-in zip() function.


def izip(*iterables):
   # izip('ABCD', 'xy') --> Ax By
   iterables = map(iter, iterables)
   while iterables:
     yield tuple(map(next, iterables))

izip_longest(iter1, iter2, ..., iterN, fillvalue=None):
The same as izip(), except that iteration continues until all of the input iterables iter1, iter2, and so on are exhausted. Missing values for the iterables that run out first are filled with None, unless a different value is specified with the fillvalue keyword argument.


from itertools import chain, izip, repeat

def izip_longest(*args, **kwds):
   # izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
   fillvalue = kwds.get('fillvalue')
   def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
     yield counter()     # yields the fillvalue, or raises IndexError
   fillers = repeat(fillvalue)
   iters = [chain(it, sentinel(), fillers) for it in args]
   try:
     for tup in izip(*iters):
       yield tup
   except IndexError:
     pass

permutations(iterable[, r]):

Creates an iterator that returns all permutations of length r of the items in iterable. If r is omitted, each permutation has the same length as the number of items in iterable:


def permutations(iterable, r=None):
   # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
   # permutations(range(3)) --> 012 021 102 120 201 210
   pool = tuple(iterable)
   n = len(pool)
   r = n if r is None else r
   if r > n:
     return
   indices = range(n)
   cycles = range(n, n-r, -1)
   yield tuple(pool[i] for i in indices[:r])
   while n:
     for i in reversed(range(r)):
       cycles[i] -= 1
       if cycles[i] == 0:
         indices[i:] = indices[i+1:] + indices[i:i+1]
         cycles[i] = n - i
       else:
         j = cycles[i]
         indices[i], indices[-j] = indices[-j], indices[i]
         yield tuple(pool[i] for i in indices[:r])
         break
     else:
       return
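
For comparison with the equivalent code above, here is a short usage sketch of the library function itself, with arbitrary sample inputs:

```python
from itertools import permutations

# All ordered pairs drawn from 'ABC' without reusing a position.
pairs = [''.join(p) for p in permutations('ABC', 2)]
print(pairs)  # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']

# With r omitted, every full-length ordering is produced.
full = list(permutations([1, 2, 3]))
print(len(full))  # 6
```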

product(iter1, iter2, ..., iterN, repeat=1):

Creates an iterator that produces tuples representing the Cartesian product of the items in iter1, iter2, and so on. repeat is a keyword argument that specifies how many times the input sequence is repeated.


def product(*args, **kwds):
   # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
   # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
   pools = map(tuple, args) * kwds.get('repeat', 1)
   result = [[]]
   for pool in pools:
     result = [x+[y] for x in result for y in pool]
   for prod in result:
     yield tuple(prod)
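
A brief usage sketch of the library function, with made-up inputs, shows that product behaves like nested for loops in which the rightmost input varies fastest:

```python
from itertools import product

# Equivalent to: for a in 'AB': for b in [1, 2]: yield (a, b)
pairs = list(product('AB', [1, 2]))
print(pairs)  # [('A', 1), ('A', 2), ('B', 1), ('B', 2)]

# repeat=2 multiplies the input by itself.
bits = [''.join(t) for t in product('01', repeat=2)]
print(bits)  # ['00', '01', '10', '11']
```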

repeat(object[, times]):
Creates an iterator that produces object over and over. times, if supplied, specifies the repeat count; if times is omitted, the object is returned indefinitely.


def repeat(object, times=None):
   # repeat(10, 3) --> 10 10 10
   if times is None:
     while True:
       yield object
   else:
     for i in xrange(times):
       yield object
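
A small sketch of the library function follows. The second call relies on zip() stopping at its shortest input, so the endless repeat('x') is safe there; the values are arbitrary:

```python
from itertools import repeat

# A bounded repeat.
three = list(repeat('ab', 3))
print(three)  # ['ab', 'ab', 'ab']

# An unbounded repeat supplying a constant to zip().
labeled = list(zip(range(3), repeat('x')))
print(labeled)  # [(0, 'x'), (1, 'x'), (2, 'x')]
```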

starmap(func, iterable):
Creates an iterator that produces the values func(*item), where item comes from iterable. This is only valid if the items produced by iterable are suitable for calling the function in this way.


def starmap(function, iterable):
   # starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000
   for args in iterable:
     yield function(*args)

takewhile(predicate, iterable):
Creates an iterator that produces items from iterable as long as predicate(item) is True. As soon as the predicate evaluates to False, iteration stops immediately.


def takewhile(predicate, iterable):
   # takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4
   for x in iterable:
     if predicate(x):
       yield x
     else:
       break
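
The difference between takewhile and ordinary filtering is easy to miss, so the following sketch contrasts the two on the same made-up list:

```python
from itertools import takewhile

readings = [3, 8, 12, 7, 2]  # sample values for illustration

# Stops at 12, the first item that fails the test.
below_ten = list(takewhile(lambda x: x < 10, readings))
print(below_ten)  # [3, 8]

# Plain filtering keeps every passing item, including later ones.
filtered = [x for x in readings if x < 10]
print(filtered)  # [3, 8, 7, 2]
```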

tee(iterable[, n]):
Creates n independent iterators from iterable and returns them as an n-tuple; n defaults to 2. The function works with any iterable object. To clone the original iterator, however, the items it produces are cached and shared by all of the newly created iterators. Be careful not to use the original iterator iterable after calling tee(); otherwise the caching mechanism may not work correctly.


import collections

def tee(iterable, n=2):
  it = iter(iterable)
  deques = [collections.deque() for i in range(n)]
  def gen(mydeque):
    while True:
      if not mydeque:       # when the local deque is empty
        newval = next(it)    # fetch a new value and
        for d in deques:    # load it to all the deques
          d.append(newval)
      yield mydeque.popleft()
  return tuple(gen(d) for d in deques)

# Once tee() has made a split, the original iterable should not be used anywhere else; otherwise,
# the iterable could get advanced without the tee objects being informed.
# This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored).
# In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
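
One common use of tee() is walking over overlapping pairs of items. The sketch below uses the library tee; the helper name pairwise is chosen here for illustration only:

```python
from itertools import tee

def pairwise(iterable):
    # Split into two independent iterators over the same items.
    a, b = tee(iterable)
    # Advance the second one by a single step (no error if empty).
    next(b, None)
    # Pair each item with its successor.
    return zip(a, b)

pairs = list(pairwise([1, 2, 3, 4]))
print(pairs)  # [(1, 2), (2, 3), (3, 4)]
```

Because the two iterators stay only one item apart, the internal cache remains small here, which is the situation where tee() is a better fit than list().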

I hope this article serves as a useful reference for anyone learning Python programming.