Four very practical Python built-in data structures
- 2021-11-02 01:52:26
- OfStack
defaultdict
Named Tuple
Counter
array
Besides the built-in list, Python also provides a typed native array, array, similar to arrays in C.
Because a list can hold objects of any type while an array stores values of one specified native type, an array has a smaller memory footprint than a list when the amount of data is large.
Unlike a C array, an array does not need a fixed size when it is created, and it supports most of the common operations that list supports. In that respect, Python's array is closer to C++'s vector.
from array import array
l = list(range(100))
# fromlist is an instance method, so pass the list to the constructor instead
a = array('i', l)
print(l.__sizeof__(), a.__sizeof__())
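The comparison above counts only the list object itself, which holds pointers, not the int objects those pointers refer to. The following is a rough sketch of a fuller comparison, assuming CPython and using sys.getsizeof; the variable names are illustrative, and the exact byte counts depend on the platform and Python version, so none are shown.

```python
import sys
from array import array

n = 1000
numbers = list(range(n))
typed = array('i', numbers)

# sys.getsizeof(list) covers the list object and its pointer slots only,
# so add the size of each int object for a rough total.
# This overcounts a little, because CPython shares small int objects.
list_bytes = sys.getsizeof(numbers) + sum(sys.getsizeof(x) for x in numbers)

# The array stores raw C values inline, so its own size is the whole story.
array_bytes = sys.getsizeof(typed)

print(list_bytes, array_bytes)
print(typed.itemsize * len(typed))  # bytes taken by the raw elements alone
```

Even without counting the int objects, the list's pointer slots alone usually take more room than the array's elements, because a pointer is typically wider than a C int.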
At present, array has two limitations. First, it only supports integers, floating-point numbers, and Unicode characters; unlike C++'s vector, it cannot hold arbitrary types. Second, specifying the element type is somewhat cumbersome: we must use a one-character type code instead of a simple name such as int or float.
a = array('i')
a.append(1)
a.append(4)
Type code | C Type | Python Type | Minimum size in bytes |
'b' | signed char | int | 1 |
'B' | unsigned char | int | 1 |
'u' | wchar_t | Unicode character | 2 |
'h' | signed short | int | 2 |
'H' | unsigned short | int | 2 |
'i' | signed int | int | 2 |
'I' | unsigned int | int | 2 |
'l' | signed long | int | 4 |
'L' | unsigned long | int | 4 |
For more detailed information, please refer to: https://docs.python.org/3.8/library/array.html
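As a small sketch of how type codes behave in practice, the example below creates an integer array and a floating-point array, applies a few list-like operations, and shows what happens when a value does not match the declared type. The type code 'd' (C double) comes from the official array documentation rather than from the table above, and the variable names are only illustrative.

```python
from array import array

ints = array('i', [1, 4])
floats = array('d', [1.5, 2.5])   # 'd' is a C double

# itemsize reports the actual bytes per element on this platform
print(ints.typecode, ints.itemsize)
print(floats.typecode, floats.itemsize)

# list-like operations work as usual
ints.extend([9, 16])
print(ints[1:3])        # slicing returns another array
print(ints.tolist())    # [1, 4, 9, 16]

# a value of the wrong type is rejected
try:
    ints.append(2.5)
except TypeError as exc:
    print("TypeError:", exc)

# so is a value that does not fit in the C type
small = array('b')      # signed char, one byte
try:
    small.append(200)
except OverflowError as exc:
    print("OverflowError:", exc)
```

The sizes in the table are minimums, so itemsize is the reliable way to find out what a given platform actually uses.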
defaultdict
A C++ map automatically constructs a value for a new key using the value type's default constructor, whereas Python's built-in dict raises a KeyError when a non-existent key is read (assigning to a new key is fine). This is because Python does not know the type of the value, so it cannot construct a default for us.
defaultdict requires us to supply a default factory, typically a type such as list or int, at construction time, and then initializes values automatically as needed. This lets us implement many tasks with very little code.
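To make the difference concrete, here is a minimal sketch contrasting a plain dict with a few defaultdict variants. The keys and the 'unknown' default are made up for illustration; the factory can be any callable that takes no arguments.

```python
import collections

plain = {}
try:
    plain['missing']
except KeyError as exc:
    print("KeyError:", exc)

counts = collections.defaultdict(int)     # int() returns 0
groups = collections.defaultdict(list)    # list() returns []
labels = collections.defaultdict(lambda: 'unknown')  # hypothetical custom default

counts['a'] += 1
groups['a'].append('apple')
print(counts['a'], groups['a'], labels['a'])

# Note: merely reading a missing key stores the default under that key
_ = counts['never set']
print('never set' in counts)  # True
```

The last two lines show a side effect worth keeping in mind: looking up a key is enough to add it, so use the in operator or get() when you only want to check for a key.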
The following code compares defaultdict and dict on two tasks: grouping students by the first letter of their last names, and counting how many times each grade occurs.
import collections
students = ['Zhang San', 'Li Si', 'Zhou liu', 'Chen qi', 'Cheng ba']
# using defaultdict
dd = collections.defaultdict(list)
for s in students:
    key = s[0]
    dd[key].append(s)
print(dd)
# using original dict (method 1)
od = {}
for s in students:
    key = s[0]
    if key not in od:
        od[key] = []
    od[key].append(s)
print(od)
scores = ['A', 'B', 'C', 'A', 'A', 'B', 'C', 'B', 'A', 'A']
# using defaultdict
dd = collections.defaultdict(int)
for s in scores:
    dd[s] += 1
print(dd)
# using original dict (method 2)
od = {}
for s in scores:
    if s not in od:
        od[s] = 1
    else:
        od[s] += 1
print(od)
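If you prefer to stay with a plain dict, the explicit membership checks can be shortened with two built-in dict methods. This is only an alternative sketch, not part of the comparison above: setdefault handles the grouping case and get handles the counting case. The names grouped and tally are illustrative.

```python
students = ['Zhang San', 'Li Si', 'Zhou liu', 'Chen qi', 'Cheng ba']
scores = ['A', 'B', 'C', 'A', 'A', 'B', 'C', 'B', 'A', 'A']

grouped = {}
for s in students:
    # setdefault returns the existing list, or stores and returns a new one
    grouped.setdefault(s[0], []).append(s)
print(grouped)
# {'Z': ['Zhang San', 'Zhou liu'], 'L': ['Li Si'], 'C': ['Chen qi', 'Cheng ba']}

tally = {}
for s in scores:
    # get returns 0 for a grade that has not been seen yet
    tally[s] = tally.get(s, 0) + 1
print(tally)
# {'A': 5, 'B': 3, 'C': 2}
```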
Named Tuple
In practice, we often need a small data structure that bundles a group of related values, such as the latitude and longitude of a geographic coordinate, the RGB values of a color, or the top-left and bottom-right corners of a rectangle, or something more complex such as the set of parameters used to construct a window.
We usually have three ways to implement this: a class, a plain tuple, or a dict.
namedtuple from collections builds a simple named type for us directly, which is like writing a small class by hand, only quicker and more convenient.
Note that collections.namedtuple is a factory function: it creates a type, not an instance of that type. When you create the type, you specify the name of each attribute, which can then be accessed with dot (.) notation; access by index is also supported. A named tuple also provides the _asdict method, which converts its values into a dict.
# class
class Rect:
    def __init__(self, x1, y1, x2, y2):
        self.x1 = x1
        self.y1 = y1
        self.x2 = x2
        self.y2 = y2
def area_class(r):
    w = r.x2 - r.x1
    h = r.y2 - r.y1
    return w*h
r1 = Rect(1,3,5,5)
print(r1)
# <__main__.Rect object at 0x7fde252a87f0>
# to show its content, we need to implement __repr__(self) or __str__(self)
print(area_class(r1))
# tuple
def area_tuple(r):
    w = r[2]-r[0]
    h = r[3]-r[1]
    return w*h
r2 = (1,3,5,5)
print(r2)
# (1, 3, 5, 5)
print(area_tuple(r2))
# dict
def area_dict(r):
    w = r["x2"] - r["x1"]
    h = r["y2"] - r["y1"]
    return w*h
r3 = {"x1":1, "y1":3, "x2":5, "y2":5}
print(r3)
# {'x1': 1, 'y1': 3, 'x2': 5, 'y2': 5}
print(area_dict(r3))
# named tuple
import collections
Rectangle = collections.namedtuple("Rectangle", ["x1", "y1", "x2", "y2"])
def area_namedtuple(r):
    w = r.x2 - r.x1
    h = r.y2 - r.y1
    return w*h
r4 = Rectangle(1,3,5,5)
print(r4)
# Rectangle(x1=1, y1=3, x2=5, y2=5)
x1,y1,x2,y2 = r4
print(x1,y1,x2,y2)
# 1 3 5 5
print(area_namedtuple(r4))
print(area_class(r4)) # work with "." grammar
print(area_tuple(r4)) # work with index
print(area_dict(r4._asdict())) # work with dict
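A few more named tuple helpers are worth knowing. The sketch below reuses the Rectangle type from the example above and shows _fields, _make, and _replace, all of which are documented namedtuple members; the variable names are illustrative. Because a named tuple is still a tuple, its fields cannot be reassigned.

```python
import collections

Rectangle = collections.namedtuple("Rectangle", ["x1", "y1", "x2", "y2"])
r = Rectangle(1, 3, 5, 5)

print(Rectangle._fields)        # ('x1', 'y1', 'x2', 'y2')

# fields are read-only, like any tuple
try:
    r.x2 = 10
except AttributeError as exc:
    print("AttributeError:", exc)

# _replace builds a new instance with some fields changed
wider = r._replace(x2=10)
print(r)       # Rectangle(x1=1, y1=3, x2=5, y2=5)
print(wider)   # Rectangle(x1=1, y1=3, x2=10, y2=5)

# _make builds an instance from any iterable of the right length
unit = Rectangle._make([0, 0, 1, 1])
print(unit)    # Rectangle(x1=0, y1=0, x2=1, y2=1)

# it still compares equal to a plain tuple with the same values
print(r == (1, 3, 5, 5))  # True
```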
Counter
As the name implies, Counter counts elements; it also lives in the collections module. According to the official Python documentation, it is a subclass of dict.
It accepts an iterable, such as a list or range, or a mapping, such as a dict or defaultdict, and counts the elements in it.
One notable point is that Counter does not treat negative numbers specially: counts are allowed to go negative under certain operations, as the examples later show.
from collections import Counter
c = Counter() # a new, empty counter
c = Counter('gallahad') # a new counter from an iterable
print(c)
# Counter({'a': 3, 'l': 2, 'g': 1, 'h': 1, 'd': 1})
c = Counter({'red': 4, 'blue': 2}) # a new counter from a mapping
print(c)
# Counter({'red': 4, 'blue': 2})
c = Counter(cats=4, dogs=8) # a new counter from keyword args
print(c)
# Counter({'dogs': 8, 'cats': 4})
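As a short sketch of everyday use, the grade list from the defaultdict section can be counted in one call. The example also shows that looking up an element that was never counted returns 0 without adding it, and that update adds further observations to an existing counter.

```python
from collections import Counter

scores = ['A', 'B', 'C', 'A', 'A', 'B', 'C', 'B', 'A', 'A']
c = Counter(scores)
print(c)
# Counter({'A': 5, 'B': 3, 'C': 2})

# a missing element counts as 0 and is not stored
print(c['D'])     # 0
print('D' in c)   # False

# update adds counts from another iterable (or mapping)
c.update(['C', 'D'])
print(c['C'], c['D'])  # 3 1
```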
In addition to basic counting, it also supports some commonly used related functions. For example:
- Sort by frequency (most_common([n])). n is optional and selects the n most frequent elements together with their counts. By default, all elements are returned.
- Expand elements by count (elements()). It returns each element repeated as many times as its count; elements whose count is less than one are skipped. The result does not preserve the original sequence order: equal elements are emitted consecutively, and distinct elements follow the order in which they were first encountered, a property provided by OrderedDict and, since Python 3.7, by dict.
- Subtract one Counter from another (subtract(c)). It subtracts the counts in the second counter from the corresponding counts in the first. Elements that appear only in the second counter are treated as having a count of 0 in the first.
c = Counter(a=4, b=2, c=0, d=-2)
sorted(c.elements())
# ['a', 'a', 'a', 'a', 'b', 'b']
Counter('abracadabra').most_common(3)
# [('a', 5), ('b', 2), ('r', 2)]
c1 = Counter(a=4, b=2, d=-2)
c2 = Counter(a=1, b=2, c=3, d=4)
c1.subtract(c2)
c1
# Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})
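The subtract method keeps zero and negative counts, as the output above shows. For contrast, here is a brief sketch of the Counter arithmetic operators, which drop every result that is not positive. It reuses the same two counters; the names diff and kept are illustrative.

```python
from collections import Counter

c1 = Counter(a=4, b=2, d=-2)
c2 = Counter(a=1, b=2, c=3, d=4)

# the - operator returns a new Counter and keeps only positive counts
diff = c1 - c2
print(diff)
# Counter({'a': 3})

# subtract works in place and keeps zero and negative counts
kept = Counter(c1)   # copy, so c1 stays unchanged
kept.subtract(c2)
print(kept)
# Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})

# unary + strips zero and negative counts from an existing counter
print(+kept)
# Counter({'a': 3})

# + adds counts element by element
print(c1 + c2)
# Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2})
```

Which form to use depends on whether negative counts carry meaning in your data: subtract preserves them, while the operators treat the counter as a multiset.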
For more reference information, please refer to the official documents:
https://docs.python.org/3/library/collections.html