Tutorial on using the Python standard library's collections package

  • 2020-05-30 20:32:10
  • OfStack

preface

Python provides us with four basic data structures: list, tuple, dict and set. However, when dealing with large amounts of data, these four data structures are obviously too single 1. For example, as an array, list is inefficient to insert in some cases, and sometimes we also need to maintain an orderly dict. Therefore, we will use the collections package provided by the Python standard library, which provides a number of useful collection classes. Mastering these collection classes will not only make our code more Pythonic, but also improve the running efficiency of our program.

defaultdict

defaultdict(default_factory) default_factory is added on top of ordinary dict to automatically generate value of the corresponding type when key does not exist. default_factory parameter can be specified as list, set, int and other legal types.

We now have the following set of list. Although we have 5 sets of data, we only have 3 kinds of color after careful observation, but each type of color corresponds to multiple values. Now we want to convert this list to 1 dict, and this key of dict corresponds to 1 color, and value of dict corresponds to 1 list to store multiple values of color. We can use defaultdict(list) To solve the problem.


>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...  d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

The above is equivalent to:


>>> d = {}
>>> for k, v in s:
...  d.setdefault(k, []).append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

If you don't want to have duplicate elements, consider using it defaultdict(set) . The difference between set and list is that the same elements are not allowed in set.


>>> from collections import defaultdict
>>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
>>> d = defaultdict(set)
>>> for k, v in s:
...  d[k].add(v)
...
>>> sorted(d.items())
[('blue', {2, 4}), ('red', {1, 3})]

OrderedDict

dict before Python3.6 is disordered, but in some cases we need to keep the order of dict. At this time, we can use OrderedDict, which is an subclass of dict, but keeps the order of dict on the basis of dict. Let's look at the usage method below.


>>> # regular unsorted dictionary
>>> d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
>>> # dictionary sorted by key
>>> OrderedDict(sorted(d.items(), key=lambda t: t[0]))
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
>>> # dictionary sorted by value
>>> OrderedDict(sorted(d.items(), key=lambda t: t[1]))
OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])
>>> # dictionary sorted by length of the key string
>>> OrderedDict(sorted(d.items(), key=lambda t: len(t[0])))
OrderedDict([('pear', 1), ('apple', 4), ('orange', 2), ('banana', 3)])

use popitem(last=True) The method allows us to delete key-value in dict (first in first out) order, that is, to delete the last inserted key-value pair, and key-value in dict (first in first out) if last=False.


>>> d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
>>> # dictionary sorted by key
>>> d = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
>>> d
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
>>> d.popitem()
('pear', 1)
>>> d.popitem(last=False)
('apple', 4)

use move_to_end(key, last=True) To change the key-value order of the ordered OrderedDict objects, we can insert any one key-value of the ordered OrderedDict objects into the beginning or end of the dictionary.


>>> d = OrderedDict.fromkeys('abcde')
>>> d
OrderedDict([('a', None), ('b', None), ('c', None), ('d', None), ('e', None)])
>>> d.move_to_end('b')
>>> d
OrderedDict([('a', None), ('c', None), ('d', None), ('e', None), ('b', None)])
>>> ''.join(d.keys())
'acdeb'
>>> d.move_to_end('b', last=False)
>>> ''.join(d.keys())
'bacde'

deque

The advantage of list storing data is that it is fast to find elements by index, but slow to insert and delete elements because list is based on arrays. deque is a bidirectional list for efficient insert and delete operations, suitable for queues and stacks, and thread-safe.

list only provides the append and pop methods to insert/delete elements from the end of list, while deque adds the appendleft/popleft methods to allow us to insert/delete elements efficiently at the beginning of an element. Moreover, the algorithm complexity of append or pop elements at both ends of the queue using deque is about O(1), but for operations where list objects change the list length and data position, for example pop(0) and insert(0, v) The complexity of the operation is as high as O(n).


>>> from collections import deque
>>> dq = deque(range(10), maxlen=10)
>>> dq
deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
>>> dq.rotate(3)
>>> dq
deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)
>>> dq.rotate(-4)
>>> dq
deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], maxlen=10)
>>> dq.appendleft(-1)
>>> dq
deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
>>> dq.extend([11, 22, 33])
>>> dq
deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33], maxlen=10)
>>> dq.extendleft([10, 20, 30, 40])
>>> dq
deque([40, 30, 20, 10, 3, 4, 5, 6, 7, 8], maxlen=10)

Counter

Count is used to count the number of occurrences of related elements.


>>> from collections import Counter
>>> ct = Counter('abracadabra')
>>> ct
Counter({'a': 5, 'r': 2, 'b': 2, 'd': 1, 'c': 1})
>>> ct.update('aaaaazzz')
>>> ct
Counter({'a': 10, 'z': 3, 'r': 2, 'b': 2, 'd': 1, 'c': 1})
>>> ct.most_common(2)
[('a', 10), ('z', 3)]
>>> ct.elements()
<itertools.chain object at 0x7fbaad4b44e0>

namedtuple

use namedtuple(typename, field_names) Name the elements in tuple to make the program more readable.


>>> from collections import namedtuple
>>> City = namedtuple('City', 'name country population coordinates')
>>> tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
>>> tokyo
City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))
>>> tokyo.population
36.933
>>> tokyo.coordinates
(35.689722, 139.691667)
>>> tokyo[1]
'JP'

>>> City._fields
('name', 'country', 'population', 'coordinates')
>>> LatLong = namedtuple('LatLong', 'lat long')
>>> delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))
>>> delhi = City._make(delhi_data)
>>> delhi._asdict()
OrderedDict([('name', 'Delhi NCR'), ('country', 'IN'), ('population', 21.935),
   ('coordinates', LatLong(lat=28.613889, long=77.208889))])
>>> for key, value in delhi._asdict().items():
  print(key + ':', value)
name: Delhi NCR
country: IN
population: 21.935
coordinates: LatLong(lat=28.613889, long=77.208889)

ChainMap

ChainMap can be used to merge multiple dictionaries.


>>> d = {}
>>> for k, v in s:
...  d.setdefault(k, []).append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
0

>>> d = {}
>>> for k, v in s:
...  d.setdefault(k, []).append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
1

From the above del['elephant'] The error message can be seen that the operation ChainMap for changing the key value will only be in the first dictionary self.maps[0][key] Search, the newly added key value pair will also be added to the first dictionary. Let's improve ChainMap to solve this problem:


>>> d = {}
>>> for k, v in s:
...  d.setdefault(k, []).append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
2

You can use new_child to deepcopy1 ChainMap:


>>> from collections import ChainMap
>>> a = {'a': 'A', 'c': 'C'}
>>> b = {'b': 'B', 'c': 'D'}
>>> m = ChainMap({'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'})
>>> m
ChainMap({'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'})
>>> m['c']
'C'
>>> m.maps
[{'c': 'C', 'a': 'A'}, {'c': 'D', 'b': 'B'}]
>>> a['c'] = 'E'
>>> m['c']
'E'
>>> m
ChainMap({'c': 'E', 'a': 'A'}, {'c': 'D', 'b': 'B'})

>>> d = {}
>>> for k, v in s:
...  d.setdefault(k, []).append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
4

UserDict

Now let's improve the dictionary by converting key to str when querying the dictionary:


>>> d = {}
>>> for k, v in s:
...  d.setdefault(k, []).append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
5

Explanation 1:

isinstance(key, str) is required in s 194en__. Please consider why. Since assuming 1 key does not exist, this will result in infinite recursion, self[str(key)] calling again. S 205en__ must also be implemented as k in d will be called, but note that it will not call s 209en__ even if the lookup fails. One more detail about s/s s s/s is that we are not using them defaultdict(list)0 Because the str(key) in self , as this will result in a recursive call to s 213en__.

Another point to emphasize here is that in Python2.x, dict.keys () returns an list, which means that k in my_list must traverse list. Python3.x has been optimized for dict.keys () with higher performance. It will return an view as set1, refer to the official documentation for details.

The above example can be overwritten with UserDict, and all key is stored as str, which is a more common and concise way of writing:


>>> d = {}
>>> for k, v in s:
...  d.setdefault(k, []).append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
6

UserDict is a subclass of MutableMapping and Mapping. It inherits two important methods, MutableMapping. update and Mapping. get.

conclusion


Related articles: