Four very practical python built in data structures

2021-11-02 01:52:26
OfStack

Directory array
defaultdict
Named Tuple
Counter

array

Python not only implements arrays using the built-in list, but also supports the specified type of native array array like the C language.
Obviously, because list can store various types of objects, while array only stores one specified native type, native array has smaller memory footprint than list when the amount of data is large.
Moreover, unlike C, array is not limited in size when defined, and it supports various common functions supported by list. In contrast, array of Python is more like vector of C + +.


from array import array
l = list(range(100))
a = array.fromlist(l)

print(l.__sizeof__(), a.__sizeof__())

At present, array has two limitations. First, it only supports integers, decimals, and unicode characters, and does not support multiple data types like vector for C + +. In addition, it is troublesome to specify types at present, so we need to use abbreviations corresponding to types instead of simple methods such as int and float.


a = array('i')
a.append(1)
a.append(4)

Type code	C Type	Python Type	Minimum size in bytes
'b'	signed char	int	1
'B'	unsigned char	int	1
'u'	wchar_t	Unicode character	2
'h'	signed short	int	2
'H'	unsigned short	int	2
'i'	signed int	int	2
'I'	unsigned int	int	2
'l'	signed long	int	4
'L'	unsigned long	int	4

For more detailed information, please refer to: https://docs.python.org/3. 8/library/array.html

defaultdict

The map of C + + automatically constructs a value with the default constructor of value type for the new key, while the default dict of Python throws an exception (except for assignment) for access to the non-existent key. This is because Python doesn't know the type of value, so it can't be constructed by default for us.
defaultdict requires us to specify a type at construction time, and then automatically initializes value as needed. In this way, we can use simple code to achieve many functions.

In the following code, I compare the function of grouping students according to the first letters of their last names with defaultdict and dict, and the function of classifying and counting.


import collections
students = ['Zhang San', 'Li Si', 'Zhou liu', 'Chen qi', 'Cheng ba']
# using defaultdict
dd = collections.defaultdict(list)
for s in students:
	key = s[0]
	dd[key].append(s)
print(dd)
# using original dict (method 1)
od = {}
for s in students:
	key = s[0]
	if key not in do:
		od[key] = []
	od[key].append(s)
print(od)

scores = ['A', 'B', 'C', 'A', 'A', 'B', 'C', 'B', 'A', 'A']
# using defaultdict
dd = collections.defaultdict(int)
for s in scores :
	dd[s] += 1
print(dd)
# using original dict (method 2)
od = collections.defaultdict(int)
for s in scores :
	if s not in do:
		do[s] = 1
	else:
		do[s] += 1
print(od)

Named Tuple

In programming practice, we often need to create a small data structure to integrate a set of related data, such as latitude and longitude of geographical coordinates, RGB values of colors or upper left and lower right coordinates of rectangular boxes, and complex ones such as constructing a set of parameters of a window.
In practice, we usually have three implementation methods:

Create 1 class for every 1 such data structure. The advantage is that data members can be accessed directly by name, and complex access logic and data manipulation are supported. The disadvantage is that you need to write corresponding classes and necessary functions, and manage files and reference relationships. Use tuple. The advantages are simple writing and high memory use efficiency. The disadvantage is that it can only be accessed by subscripts, which is poor in readability and prone to errors. Use dict, and use str as the name for the attribute. The advantage is that writing is relatively simple, and the names of variables are preserved. The disadvantage is that it is troublesome to use string to represent the name, and every 1 structure should save the string as the name, which wastes space.

nametuple of collections can directly construct a simple type with a name for us, which is similar to handwriting an class conveniently and quickly.
It should be noted that collections. nametuple is an factory function, which is used to help us create a type, not a concrete object of this type. When you create a type, you can specify the name of each attribute, which is then accessed using. And it also supports accessing using subscripts. At the same time, Named Tuple also supports the _ asdict function to convert the internal value into an dict.


# class
class Rect:
	def __init__(self, x1, y1, x2, y2):
		self.x1 = x1
		self.y1 = y1
		self.x2 = x2
		self.y2 = y2
		
def area_class(r):
	w = r.x2 - r.x1
	h = r.y2 - r.y1
	return w*h

r1 = Rect(1,3,5,5)
# <__main__.Rect object at 0x7fde252a87f0>
# to show its content, we need to implement __repr__(self) or __str__(self)
print(area_class(r1))

# tuple
def area_tuple(r):
	w = r[2]-r[0]
	h = r[3]-r[1]
	return w*h

r2 = (1,3,5,5)
print(r2)
# (1, 3, 5, 5)
print(area_tuple(r2))

# dict
def area_dict(r):
	w = r["x2"] - r["x1"]
	h = r["y2"] - r["y1"]
	return w*h

r3 = {"x1":1, "y1":3, "x2":5, "y2":5}
print(r3)
# {'x1': 1, 'y1': 3, 'x2': 5, 'y2': 5}
print(area_tuple(r3))

# named tuple
import collections
Rectangle = collections.namedtuple("Rectangle", ["x1", "y1", "x2", "y2"])

def area_namedtuple(r):
	w = r.x2 - r.x1
	y = r.y2 - r.y1
	return w*h

r4 = Rectangle(1,3,5,5)
print(r4)
# Rectangle(x1=1, y1=3, x2=5, y2=5)
x1,y2,x2,y2 = r4
print(x1,y2,x2,y2)
# 1 3 5 5
print(area_namedtuple(r4))
print(area_class(r4)) # work with "." grammar
print(area_tuple(r4)) # work with index
print(area_dict(r4._asdict())) # work with dict

Counter

As the name implies, Counter is used to count elements, and it is also in the package collections. According to the official documentation of Python, it is a subclass of dict type.
Enter a type of iterable, such as list, range, or a type of mapping, such as dict, defaultdict. Then Counter counts the elements in it.
What is more special is that Counter does not do special treatment for negative numbers, that is to say, it is allowed to test negative under special operations, and we will have examples later.


c = Counter()                           # a new, empty counter
c = Counter('gallahad')                 # a new counter from an iterable
print(c)
# Counter({'a': 3, 'l': 2, 'g': 1, 'h': 1, 'd': 1})
c = Counter({'red': 4, 'blue': 2})      # a new counter from a mapping
print(c)
# Counter({'red': 4, 'blue': 2})
c = Counter(cats=4, dogs=8)             # a new counter from keyword args
print(c)
# Counter({'dogs': 8, 'cats': 4})

In addition to the basic counting function, it also supports 1 common related functions. For example:

Sort by frequency (most_common ([n])). Where n is an optional input, indicating the most frequent elements of the previous n and their frequency. All elements are returned by default. Output the element itself at frequency (elements ()). It will return the elements themselves, but the order of the elements is not the original, and the same elements will be output continuously. Between different elements, output in the order in which they appear, this is a feature provided by OrderedDict and dict after 3.7. Subtract the two Counter (substract (c)). It subtracts the number of occurrences of the corresponding element in the second counter from the first counter. For elements that appear only in the second coutner, they appear 0 times in the first counter by default.


c = Counter(a=4, b=2, c=0, d=-2)
sorted(c.elements())
# ['a', 'a', 'a', 'a', 'b', 'b']
Counter('abracadabra').most_common(3)
# [('a', 5), ('b', 2), ('r', 2)]

c1 = Counter(a=4, b=2, d=-2)
c2 = Counter(a=1, b=2, c=3, d=4)
c1.subtract(c2)
c1
# Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})

For more reference information, please refer to the official documents:

https://docs.python.org/3/library/collections.html

These are the 4 very practical python built-in data structure details, more information about python built-in data structure please pay attention to other related articles on this site!