A simple guide to defaultdict and namedtuple in Python's collections module

  • 2020-04-02 14:46:48
  • OfStack

Python has a number of built-in data types, such as int, str, list, tuple, and dict. The collections module adds several more on top of these: namedtuple, defaultdict, deque, Counter, OrderedDict, and others. Two of the most useful are defaultdict, a subclass of dict, and namedtuple, which produces subclasses of tuple.
First, defaultdict

  1. Introduction

With Python's built-in dict, accessing a key as d[key] raises a KeyError when the key does not exist. With defaultdict, once you supply a default factory, looking up a nonexistent key calls the factory and stores its result as the value for that key.

To use it, pass a factory function (default_factory) to the constructor. defaultdict(default_factory) builds a dict-like object whose default values are generated by calling that factory function with no arguments.
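
As a minimal sketch of the idea above, the snippet below uses int as the factory so that missing keys start at 0, and a lambda to show that any zero-argument callable works. The sample words and the names word_counts and grades are made up for illustration.

```python
from collections import defaultdict

# int() returns 0, so every missing key starts at 0.
word_counts = defaultdict(int)
for word in ['a', 'b', 'a', 'c', 'a']:
    word_counts[word] += 1

# Any zero-argument callable works as a factory, including a lambda.
grades = defaultdict(lambda: 'N/A')
grades['wu'] = 'A'
missing_grade = grades['unknown']  # inserts 'unknown' with the value 'N/A'

print(dict(word_counts))
print(missing_grade)
```

Note that simply reading grades['unknown'] adds that key to the dictionary, which is worth keeping in mind when the set of keys matters.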

  2. Example

Here is an example of using defaultdict:
 


In [1]: from collections import defaultdict
 
In [2]: s = [('xiaoming', 99), ('wu', 69), ('zhangsan', 80), ('lisi', 96), ('wu', 100), ('yuan', 98), ('xiaoming', 89)]
 
In [3]: d = defaultdict(list)
 
In [4]: for k, v in s:
  ...:   d[k].append(v)
  ...:  
 
In [5]: d
Out[5]: defaultdict(<type 'list'>, {'lisi': [96], 'xiaoming': [99, 89], 'yuan': [98], 'zhangsan': [80], 'wu': [69, 100]})
 
In [6]: for k, v in d.items():
  ...:   print('%s: %s' % (k, v))
  ...:  
lisi: [96]
xiaoming: [99, 89]
yuan: [98]
zhangsan: [80]
wu: [69, 100]

Readers familiar with Python will notice that defaultdict(list) is used much like dict.setdefault(key, []). The code above can be written with setdefault as follows:
 


s = [('xiaoming', 99), ('wu', 69), ('zhangsan', 80), ('lisi', 96), ('wu', 100), ('yuan', 98), ('xiaoming', 89)]
d = {}
 
for k, v in s:
  d.setdefault(k, []).append(v)
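
To check that the two approaches agree, this sketch builds both groupings from the same data and compares them. The names by_default and by_setdefault are illustrative only.

```python
from collections import defaultdict

s = [('xiaoming', 99), ('wu', 69), ('zhangsan', 80), ('lisi', 96),
     ('wu', 100), ('yuan', 98), ('xiaoming', 89)]

by_default = defaultdict(list)
by_setdefault = {}
for k, v in s:
    by_default[k].append(v)
    by_setdefault.setdefault(k, []).append(v)

# A defaultdict compares equal to a plain dict with the same items.
same = (by_default == by_setdefault)
print(same)
```

One practical difference: setdefault(k, []) constructs a new empty list on every call, even when the key already exists, whereas defaultdict calls its factory only for missing keys.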

  3. Principle

The examples above cover the basic use of defaultdict. To understand how it works, we can call help(defaultdict). The help text printed by the Python console shows that the default-value behavior is implemented mainly through the __missing__(key) method. If the factory function is not None, the default value is produced by calling it, as follows:
 


def __missing__(self, key):
  # Called by __getitem__ for missing key
  if self.default_factory is None:
    raise KeyError((key,))
  self[key] = value = self.default_factory()
  return value

From the above, we can find a few things to pay attention to:

A). __missing__ is called by __getitem__ when the key is found not to exist. Therefore, defaultdict only generates a default value for d[key] or d.__getitem__(key). d.get(key) does not trigger the factory: it returns None (or the fallback passed to get) and does not insert the key;

B). defaultdict is implemented mainly through the __missing__ method. Therefore, we can build our own defaultdict-like class by implementing this method on a dict subclass:


In [1]: class MyDefaultDict(dict):
  ...:   def __missing__(self, key):
  ...:     self[key] = 'default'
  ...:     return 'default'
  ...:  
 
In [2]: my_default_dict = MyDefaultDict()
 
In [3]: my_default_dict
Out[3]: {}
 
In [4]: print(my_default_dict['test'])
default
 
In [5]: my_default_dict
Out[5]: {'test': 'default'}
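
Point A) can be checked directly. The sketch below, with illustrative variable names, contrasts get() and the in operator with subscript access, and also shows what happens when no factory is given.

```python
from collections import defaultdict

d = defaultdict(list)

# get() and the `in` operator do not go through __missing__,
# so they never create the key.
got = d.get('a')
present_after_get = 'a' in d

# Subscript access calls __missing__, which stores and returns a new list.
value = d['a']
present_after_index = 'a' in d

# With no factory, a defaultdict behaves like a plain dict on missing keys.
plain = defaultdict()
try:
    plain['x']
    raised = False
except KeyError:
    raised = True

print(got, present_after_get, value, present_after_index, raised)
```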

  4. Version

defaultdict was added in Python 2.5 and is not available in older versions. Knowing how it works, though, we can implement one ourselves:


# http://code.activestate.com/recipes/523034/
try:
  from collections import defaultdict
except ImportError:
  class defaultdict(dict):
 
    def __init__(self, default_factory=None, *a, **kw):
      if (default_factory is not None and
        not hasattr(default_factory, '__call__')):
        raise TypeError('first argument must be callable')
      dict.__init__(self, *a, **kw)
      self.default_factory = default_factory
 
    def __getitem__(self, key):
      try:
        return dict.__getitem__(self, key)
      except KeyError:
        return self.__missing__(key)
 
    def __missing__(self, key):
      if self.default_factory is None:
        raise KeyError(key)
      self[key] = value = self.default_factory()
      return value
 
    def __reduce__(self):
      if self.default_factory is None:
        args = tuple()
      else:
        args = self.default_factory,
      return type(self), args, None, None, iter(self.items())
 
    def copy(self):
      return self.__copy__()
 
    def __copy__(self):
      return type(self)(self.default_factory, self)
 
    def __deepcopy__(self, memo):
      import copy
      return type(self)(self.default_factory, copy.deepcopy(list(self.items())))
 
    def __repr__(self):
      return 'defaultdict(%s, %s)' % (self.default_factory, dict.__repr__(self))

Second, namedtuple

A namedtuple is mainly used to produce data objects whose elements can be accessed by name. It is often used to improve code readability, especially when working with tuple-shaped data. When the positions in a tuple carry distinct meanings, a namedtuple usually makes the code easier to read and more Pythonic than a plain tuple. For example:


from collections import namedtuple
 
# The variable name and the first argument to namedtuple are usually the same, but they may differ
Student = namedtuple('Student', 'id name score')
# or: Student = namedtuple('Student', ['id', 'name', 'score'])
 
students = [(1, 'Wu', 90), (2, 'Xing', 89), (3, 'Yuan', 98), (4, 'Wang', 95)]
 
for s in students:
  stu = Student._make(s)
  print(stu)
 
# Output:
# Student(id=1, name='Wu', score=90)
# Student(id=2, name='Xing', score=89)
# Student(id=3, name='Yuan', score=98)
# Student(id=4, name='Wang', score=95)

In the example above, Student is a namedtuple type. Like plain tuples, its instances can be indexed by position and are read-only, but each value can also be accessed by field name, which makes it much clearer what each value represents.
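
A short sketch of these properties, reusing the Student type from the example, follows. The variable names are illustrative, and _replace and _asdict are standard namedtuple methods.

```python
from collections import namedtuple

Student = namedtuple('Student', 'id name score')
stu = Student(1, 'Wu', 90)

by_index = stu[1]       # positional access, as with a plain tuple
by_name = stu.name      # attribute access by field name
sid, name, score = stu  # unpacking still works

# Fields are read-only: assigning to one raises AttributeError.
try:
    stu.score = 95
    mutable = True
except AttributeError:
    mutable = False

# _replace returns a new instance with the given fields changed.
updated = stu._replace(score=95)
as_dict = stu._asdict()

print(by_index, by_name, mutable, updated, dict(as_dict))
```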

