An In-Depth Understanding of NumPy: Concise Tutorial, Arrays (Part 1)

  • 2020-05-19 05:06:26
  • OfStack

My current job is to bring NumPy to Pyston (a Python compiler/interpreter implemented by Dropbox). In the course of this work I dug into the NumPy source code, learned how it is implemented, and submitted PRs to fix bugs in NumPy. While working with the NumPy source code and the NumPy developers, I found that most Chinese NumPy tutorials today are translations of, or are based on, English material, and as a result they contain a number of omissions and inaccuracies. For example, almost all Chinese documents translate the broadcast feature of NumPy arrays as "广播" (the word used for radio and TV broadcasting). One of the NumPy developers commented: "broadcast is a compound -- native English speakers can see that it's "broad" + "cast" = "cast (scatter, distribute) broadly"; I guess "cast (scatter, distribute) broadly" probably is closer to the meaning." With this in mind, I plan to write a series of tutorials based on my knowledge of NumPy, both from using it and from reading its source code.

NumPy array

The NumPy array is a multidimensional array object called ndarray. It consists of two parts:

  • the actual data
  • metadata that describes the data

Many operations, such as reshaping, transposing, or slicing, work only on the metadata and leave the underlying data unchanged.
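
The following sketch illustrates this with reshape and transpose. It relies on reshape returning a view when the input is contiguous, and uses np.shares_memory to report whether two arrays use the same data buffer. Arithmetic such as a * 2, by contrast, allocates new data.

```python
import numpy as np

a = np.arange(6)          # data buffer holding 0, 1, 2, 3, 4, 5
b = a.reshape(2, 3)       # new shape (metadata only), same buffer
c = b.T                   # transpose: again only the metadata changes

print(b.shape, c.shape)                                # (2, 3) (3, 2)
print(np.shares_memory(a, b), np.shares_memory(a, c))  # True True

# Since the buffer is shared, a write through one view is visible in the others.
b[0, 0] = 100
print(a[0], c[0, 0])      # 100 100

# Arithmetic, on the other hand, produces a new array with its own data.
d = a * 2
print(np.shares_memory(a, d))                          # False
```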

A few things to know about the NumPy array:

  • Indexing starts at 0.
  • All elements in a NumPy array must have the same type.
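
A short illustration of both points: the first element has index 0, and when the input mixes integers and floats, NumPy converts every element to one common type (here float64) instead of storing mixed types.

```python
import numpy as np

a = np.array([10, 20, 30])
print(a[0], a[2])    # 10 30: the first element has index 0

# Mixed ints and floats are converted to one common element type.
m = np.array([1, 2.5, 3])
print(m.dtype)       # float64
print(m[0] == 1.0)   # True: the integer 1 is now stored as a float
```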

NumPy array properties

Before going into the details of NumPy arrays, let's look at their basic properties. The number of dimensions of an array is called its rank: a 1-dimensional array has rank 1, a 2-dimensional array has rank 2, and so on (this is not the same as the rank of a matrix in linear algebra). In NumPy each dimension is called an axis, so the rank is simply the number of axes. For example, a 2-dimensional array can be viewed as a 1-dimensional array whose elements are themselves 1-dimensional arrays: the first axis (axis 0) runs along the outer array, and the second axis (axis 1) runs along the inner arrays.
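
The relationship between axes and rank can be seen on a small 2-dimensional array. Summing along an axis collapses that axis, which is a convenient way to see the direction each axis runs in.

```python
import numpy as np

# A 2-D array: an outer array whose two elements are 1-D rows.
m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.ndim)          # 2: two axes, so the rank is 2
print(m[1])            # [4 5 6]: one step along axis 0 selects a whole row

# Summing along an axis collapses that axis.
print(m.sum(axis=0))   # [5 7 9]  (down the columns, along axis 0)
print(m.sum(axis=1))   # [ 6 15]  (across each row, along axis 1)
```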

The most important attributes of an ndarray object are:

  • ndarray.ndim: the number of dimensions (axes) of the array, which is its rank. The most common case is a 2-dimensional array (a matrix).
  • ndarray.shape: the dimensions of the array, as a tuple of integers giving the size of the array along each dimension. For a matrix with n rows and m columns, shape is (n, m). The length of the shape tuple is therefore the number of dimensions, ndim.
  • ndarray.size: the total number of elements in the array, equal to the product of the elements of shape.
  • ndarray.dtype: an object describing the type of the elements in the array. A dtype can be created or specified using standard Python types; NumPy also provides its own data types (see Data types in NumPy below).
  • ndarray.itemsize: the size in bytes of each element of the array. For example, an array with elements of type float64 has an itemsize of 8 (64 bits / 8 bits per byte), and an array of type int32 has an itemsize of 4 (32 / 8).
  • ndarray.data: the buffer containing the actual elements of the array. Since elements are normally accessed through indexing, this attribute is rarely needed.
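
The attributes above can be inspected on an array created with zeros. The sketch also prints nbytes (size * itemsize), an extra attribute that is not in the list.

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.float64)

print(a.ndim)      # 2
print(a.shape)     # (3, 4): 3 rows, 4 columns
print(a.size)      # 12 == 3 * 4
print(a.dtype)     # float64
print(a.itemsize)  # 8 bytes per element (64 bits / 8)
print(a.nbytes)    # 96 == size * itemsize

b = np.array([1, 2, 3], dtype=np.int32)
print(b.itemsize)  # 4 bytes per element (32 bits / 8)
```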

Create an array

There are many ways to create an array. The simplest is the array function, which creates an array from a regular Python list or tuple. The dtype of the new array is deduced from the types of the elements in the sequence:


>>> from numpy import *     
>>> a = array( [2,3,4] )    
>>> a 
  array([2, 3, 4]) 
>>> a.dtype          # int32 or int64, depending on the platform
  dtype('int32') 
>>> b = array([1.2, 3.5, 5.1])    
>>> b.dtype 
  dtype('float64') 

When using the array function, the data must be passed as a single sequence, such as a list in square brackets; calling array with several numbers as separate arguments is an error.


>>> a = array(1,2,3,4)  #  error  
>>> a = array([1,2,3,4]) #  correct  
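
A tuple works just as well as a list, because what matters is that the data arrives as a single sequence. The exception raised for separate arguments has changed between NumPy versions, so this sketch catches both TypeError and ValueError rather than assuming one.

```python
import numpy as np

t = np.array((1, 2, 3, 4))   # a tuple is a single sequence too
print(t)                     # [1 2 3 4]

failed = False
try:
    np.array(1, 2, 3, 4)     # four separate arguments instead of one sequence
except (TypeError, ValueError) as exc:
    # The exception type and message depend on the NumPy version.
    failed = True
    print("error:", exc)
print(failed)                # True
```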

A sequence of sequences is turned into a 2-dimensional array, a sequence of sequences of sequences into a 3-dimensional array, and so on:


>>> b = array( [ (1.5,2,3), (4,5,6) ] )   
>>> b 
  array([[ 1.5, 2. , 3. ], 
      [ 4. , 5. , 6. ]]) 

The element type can also be specified explicitly when the array is created:


>>> c = array( [ [1,2], [3,4] ], dtype=complex) 
>>> c 
  array([[ 1.+0.j, 2.+0.j], 
     [ 3.+0.j, 4.+0.j]]) 

Often the elements of an array are unknown at first, but its size is known. NumPy therefore provides several functions that create arrays filled with placeholder content. They minimize the need to grow an array later, which is an expensive operation.

The function zeros creates an array full of zeros, ones creates an array full of ones, and empty creates an array whose initial content is arbitrary and depends on the state of memory. By default, the dtype of the created array is float64.

The number of bytes occupied by each element can be checked with d.dtype.itemsize:


>>> d = zeros((3,4)) 
>>> d.dtype 
dtype('float64') 
>>> d 
array([[ 0., 0., 0., 0.], 
    [ 0., 0., 0., 0.], 
    [ 0., 0., 0., 0.]]) 
>>> d.dtype.itemsize 
8 

These functions also accept a dtype argument to set the element type:


>>> ones( (2,3,4), dtype=int16 ) # Manually specify the element type in the array  
   array([[[1, 1, 1, 1], 
       [1, 1, 1, 1], 
       [1, 1, 1, 1]], 
    
       [[1, 1, 1, 1], 
       [1, 1, 1, 1], 
       [1, 1, 1, 1]]], dtype=int16) 
>>> empty((2,3))     # uninitialized: the values shown will vary
   array([[ 2.65565858e-316,  0.00000000e+000,  0.00000000e+000], 
       [ 0.00000000e+000,  0.00000000e+000,  0.00000000e+000]]) 

NumPy provides the function arange, which works like Python's built-in range but returns an array:


>>> arange(10, 30, 5) 
  array([10, 15, 20, 25]) 

The result is an arithmetic sequence that starts at 10 and increases in steps of 5; the end value 30 is excluded. arange accepts floating-point arguments as well as integers:


>>> arange(0,2,0.5) 
  array([ 0. , 0.5, 1. , 1.5]) 

When arange is used with floating-point arguments, it is generally not possible to predict how many elements it will return, because of the limited precision of floating-point numbers. It is therefore usually better to use linspace, which takes the number of elements you want as an argument instead of a step size. linspace is used as follows; it is described in more detail in the section on universal functions.


>>> linspace(-1, 0, 5) 
    array([-1. , -0.75, -0.5 , -0.25, 0. ]) 
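
The difference is easy to see with a step of 0.1. The arange documentation states that, for floating-point arguments, the length of the result is ceil((stop - start) / step); here rounding makes that quotient slightly larger than 3, so an element close to 1.3 appears even though the stop value is normally excluded. linspace takes the number of elements directly and includes both end points by default.

```python
import numpy as np

# arange: length = ceil((stop - start) / step), which is sensitive to rounding.
r = np.arange(1, 1.3, 0.1)
print(len(r))        # 4: the last element is about 1.3, although stop=1.3 should be excluded
print(r[-1] > 1.29)  # True

# linspace: the number of elements is given explicitly, and by default
# both end points are included.
s = np.linspace(1, 1.3, 4)
print(len(s), s[0], s[-1])   # 4 1.0 1.3
```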

The elements of an array are accessed by index. You can access a single element by putting its index in square brackets, or access several elements at once with a slice. Slicing is covered in more detail in a later section.
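
A minimal example of both kinds of access, indexing a single element and taking a slice (negative indices, which count from the end, are also allowed):

```python
import numpy as np

a = np.arange(10) * 10   # 0, 10, 20, ..., 90

print(a[3])     # 30: a single element (indices start at 0)
print(a[-1])    # 90: negative indices count from the end
print(a[2:5])   # [20 30 40]: a slice; the stop index 5 is excluded
```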

Data types in NumPy

For scientific computing, Python's built-in integer, floating-point, and complex types are not enough, so NumPy adds many more data types, listed below.

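Basic data types in NumPy

Name                   Description
bool                   Boolean (True or False) stored in one byte
intp                   Integer whose size depends on the platform (usually int32 or int64)
int8                   Integer stored in one byte, -128 to 127
int16                  Integer, -32768 to 32767
int32                  Integer, -2 ** 31 to 2 ** 31 - 1
int64                  Integer, -2 ** 63 to 2 ** 63 - 1
uint8                  Unsigned integer, 0 to 255
uint16                 Unsigned integer, 0 to 65535
uint32                 Unsigned integer, 0 to 2 ** 32 - 1
uint64                 Unsigned integer, 0 to 2 ** 64 - 1
float16                Half-precision float: 16 bits, 1 sign bit, 5 exponent bits, 10 mantissa bits
float32                Single-precision float: 32 bits, 1 sign bit, 8 exponent bits, 23 mantissa bits
float64 or float       Double-precision float: 64 bits, 1 sign bit, 11 exponent bits, 52 mantissa bits
complex64              Complex number, real and imaginary parts each stored as a 32-bit float
complex128 or complex  Complex number, real and imaginary parts each stored as a 64-bit float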
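
The ranges and bit layouts in the table can be checked from Python with np.iinfo (integer limits) and np.finfo (floating-point parameters, where nexp is the number of exponent bits and nmant the number of mantissa bits).

```python
import numpy as np

# Integer limits
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # -128 127
print(np.iinfo(np.int32).max == 2**31 - 1)            # True
print(np.iinfo(np.uint16).max)                        # 65535

# Floating-point layout: (exponent bits, mantissa bits)
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.nexp, info.nmant)
# float16 5 10
# float32 8 23
# float64 11 52

# Storage size in bytes
print(np.dtype(np.bool_).itemsize)       # 1
print(np.dtype(np.complex64).itemsize)   # 8: two 32-bit floats
print(np.dtype(np.intp).itemsize)        # 4 or 8, depending on the platform
```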

Each NumPy data type has a conversion function of the same name, and Python's built-in bool and float can be used in the same way:


>>> float64(42) 
  42.0 
>>> int8(42.0) 
  42 
>>> bool(42) 
  True 
>>> bool(42.0) 
  True 
>>> float(True) 
  1.0 
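
The conversion functions above work on single values. For a whole array, the astype method returns a converted copy; converting floats to integers truncates toward zero.

```python
import numpy as np

f = np.array([1.7, -2.3, 3.0])
i = f.astype(np.int32)    # float -> int truncates toward zero
print(i)                  # [ 1 -2  3]
print(i.dtype)            # int32

flags = np.array([0, 2, 0]).astype(bool)   # nonzero -> True
print(flags)              # [False  True False]
```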

Many NumPy functions accept an optional argument that specifies the data type.
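
For example, arange and zeros both take a dtype keyword argument; when it is omitted, NumPy picks a default (float64 for zeros, and a type derived from the arguments for arange).

```python
import numpy as np

a = np.arange(7, dtype=np.uint16)    # explicit element type
print(a)         # [0 1 2 3 4 5 6]
print(a.dtype)   # uint16

z = np.zeros(3)                      # dtype omitted: defaults to float64
print(z.dtype)   # float64
```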

The output array

When an array is printed, NumPy displays it in a layout similar to nested lists:

  • The last axis is printed from left to right.
  • The second-to-last axis is printed from top to bottom.
  • The remaining axes are also printed from top to bottom, with each slice separated from the next by a blank line.

As a result, a 1-dimensional array is printed as a row, a 2-dimensional array as a matrix, and a 3-dimensional array as a list of matrices.
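
The layout rules can be seen by printing arrays of one, two, and three dimensions. reshape is used here only to build the multi-dimensional examples from arange.

```python
import numpy as np

a = np.arange(6)                    # 1-D
b = np.arange(12).reshape(4, 3)     # 2-D
c = np.arange(24).reshape(2, 3, 4)  # 3-D

print(a)   # printed as a single row: [0 1 2 3 4 5]
print(b)   # printed as a matrix of 4 rows and 3 columns
print(c)   # printed as two 3x4 matrices separated by a blank line
```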

The reshape method, which gives an array a new shape, will be covered in the next article.

If an array is too large to print in full, NumPy automatically skips the middle and prints only the beginning and the end.
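
For instance, an array of 10,000 integers exceeds the default threshold of 1000 elements, so only a few items at each end are printed, with "..." in between (how many is controlled by the edgeitems print option).

```python
import numpy as np

big = np.arange(10000)
print(big)                                  # only the first and last few values, joined by "..."
print(np.get_printoptions()["threshold"])   # 1000 unless changed
print("..." in str(big))                    # True
```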

You can disable this behavior and force NumPy to print the entire array by raising the threshold print option with set_printoptions:


import sys
set_printoptions(threshold=sys.maxsize)  # older tutorials used threshold='nan', which recent NumPy versions reject

With this setting, every element of the array is printed.
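
If the full output is only needed temporarily, np.printoptions (available in NumPy 1.15 and later) applies the setting inside a with block and restores the previous options afterwards. The sketch uses np.array2string to capture the printed text so that the two cases can be compared.

```python
import sys
import numpy as np

big = np.arange(2000)

# Inside the block, summarization is disabled.
with np.printoptions(threshold=sys.maxsize):
    full_text = np.array2string(big)

# After the block, the previous options (default threshold 1000) are back.
short_text = np.array2string(big)

print("..." in full_text)    # False: every element was printed
print("..." in short_text)   # True: the middle is skipped again
```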

