Building a neural network framework with numpy


Contents

  • Application method and design ideas of the neural network framework
  • Project introduction
  • Introduction to the Framework

Application method and design ideas of the neural network framework

The framework is essentially a hand-written imitation of pytorch, built to learn the basic algorithms of neural networks: forward propagation, back propagation, the various layers and the various activation functions. It is programmed in an object-oriented style, so the ideas are easy to follow, and students who want to learn about neural networks can use it as a reference. The overall structure of the code is clear, although admittedly there are some ugly parts and some clumsy imitation of pytorch.

Project introduction

MINST_recognition:

Handwritten digit recognition, using the MNIST dataset.

Accuracy reaches 93% after 30 epochs of training; after about 500 epochs the accuracy stops improving.

RNN_sin_to_cos:

A recurrent neural network (RNN) is used to predict the cos curve from the sin curve.

At present there is still a bug and it cannot be trained properly.

Introduction to the Framework

The code related to the framework is placed in the mtorch folder.

Usage workflow

As in pytorch, you need to define your own neural network, loss function, gradient descent optimizer, and so on.

In each round of training, a batch of samples is first obtained and fed into your network to get its output. The predicted results and the expected results are then passed to the loss function to compute the loss, the gradients are computed through the loss, and finally the optimizer updates the network's parameters.

This is easier to understand in combination with the code.

The following is the main training code for handwritten digit recognition on the MNIST dataset:


	# Define the neural network
	class DigitModule(Module):
	    def __init__(self):
	        # the layers are evaluated in the order defined here
	        sequential = Sequential([
	            layers.Linear2(in_dim=ROW_NUM * COLUM_NUM, out_dim=16, coe=2),
	            layers.Relu(16),
	            layers.Linear2(in_dim=16, out_dim=16, coe=2),
	            layers.Relu(16),
	            layers.Linear2(in_dim=16, out_dim=CLASS_NUM, coe=1),
	            layers.Sigmoid(CLASS_NUM)
	        ])
	        super(DigitModule, self).__init__(sequential)
	
	
	module = DigitModule()  # create the model
	loss_func = SquareLoss(backward_func=module.backward)  # define the loss function
	optimizer = SGD(module, lr=learning_rate)  # define the optimizer
	
	
	for i in range(EPOCH_NUM):  # train for EPOCH_NUM epochs in total
	    training_loss = 0  # accumulate the loss of the current epoch (optional)
	    for data in train_loader:  # iterate over all samples; train_loader is an iterable holding the whole dataset
	        imgs, targets = data  # split each batch into images and labels
	        outputs = module(imgs)  # feed the sample inputs into the network
	        loss = loss_func(outputs, targets, transform=True)  # calculate the loss
	        training_loss += loss.value
	        loss.backward()  # calculate gradients through back propagation
	        optimizer.step()  # adjust the network's parameters through the optimizer
	    if i % TEST_STEP == 0:  # every TEST_STEP epochs, evaluate the current model
	        show_effect(i, module, loss_func, test_loader, i // TEST_STEP)
	        print("{} epochs finished, loss on train set = {}".format(i, training_loss))

Next I will introduce these classes one by one. They are modeled on pytorch, and each has the same name and role as its pytorch counterpart:

Class Module

Unlike pytorch, there can only be a single Sequential, in which the layers of the network and their order are defined; the Sequential is then passed to the Module constructor.
Forward propagation: calls the Sequential's forward propagation.
Back propagation: calls the Sequential's back propagation.
For now, most of this class's functionality is identical to that of Sequential; it is just a thin shell kept so that usage looks the same as in pytorch.
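
A minimal sketch of what such a wrapper might look like (an illustrative assumption, not the author's exact code; only the delegation to Sequential comes from the description above):

	class Module:  # hypothetical sketch of the wrapper described above
	    def __init__(self, sequential):
	        self.sequential = sequential  # the single Sequential holding all layers
	
	    def __call__(self, x):
	        return self.forward(x)
	
	    def forward(self, x):
	        # forward propagation just calls the Sequential's forward pass
	        return self.sequential.forward(x)
	
	    def backward(self, output_gradient):
	        # back propagation just calls the Sequential's backward pass
	        return self.sequential.backward(output_gradient)
	
	    def step(self, lr):
	        # parameter updates are delegated to the Sequential as well
	        self.sequential.step(lr)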

Loss function

There are several loss functions; the constructor must be given the back propagation function of the network you defined.
Calling the loss function returns an object of the Loss class, which records the loss value.
Gradients are computed through back propagation by calling the Loss object's .backward() method.
Internal mechanism:
Internally, it simply calls the back propagation function of the network you defined.
This is another clumsy imitation of pytorch and not really necessary; it would be just as good to call backward directly through the Module.
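
A sketch of how this mechanism could look, assuming a square loss. The Loss object stores the loss value and the output gradient, and triggers the network's back propagation when .backward() is called (the exact fields and the transform argument of the real SquareLoss are not reproduced here):

	import numpy as np
	
	class Loss:  # hypothetical sketch of the Loss object described above
	    def __init__(self, value, gradient, backward_func):
	        self.value = value                # scalar loss value
	        self.gradient = gradient          # gradient of the loss w.r.t. the network output
	        self.backward_func = backward_func
	
	    def backward(self):
	        # delegates to the network's back propagation (e.g. module.backward)
	        self.backward_func(self.gradient)
	
	class SquareLoss:  # hypothetical sketch, not the framework's exact SquareLoss
	    def __init__(self, backward_func):
	        self.backward_func = backward_func
	
	    def __call__(self, outputs, targets):
	        diff = outputs - targets
	        value = 0.5 * np.mean(np.sum(diff ** 2, axis=-1))  # mean squared error over the batch
	        gradient = diff / outputs.shape[0]                 # matching gradient w.r.t. the outputs
	        return Loss(value, gradient, self.backward_func)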

Optimizer:

At present only stochastic gradient descent (SGD) is implemented.
The constructor takes your own Module. Once the gradients have been computed, calling optimizer.step() updates the parameter values of every layer inside the Module.
Internal mechanism:
Since SGD is currently the only algorithm, this too is just a thin wrapper for now.
step() simply calls Module.step(), Module calls Sequential.step(), and Sequential finally calls each internal layer's Layer.step() to perform the update.
The gradient values themselves are computed during loss.backward() and stored in each layer.
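
A sketch of the optimizer under these assumptions; step() only forwards the learning rate to Module.step(), because the gradients are already stored inside the layers:

	class SGD:  # hypothetical sketch of the optimizer described above
	    def __init__(self, module, lr):
	        self.module = module
	        self.lr = lr
	
	    def step(self):
	        # gradients were already computed and stored in each layer by loss.backward();
	        # here we only apply the update, layer by layer, via Module -> Sequential -> Layer
	        self.module.step(self.lr)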

Class Layer

There are many different kinds of layers.

What they have in common:
Forward propagation:

Takes one input, performs the forward computation, and produces one output.
Saves the input so it can be reused during back propagation.

Back propagation:

Takes the gradient of the forward-propagation output, computes the gradients of its own parameters (such as w and b in Linear), and saves them.
Its return value is the gradient with respect to the forward-propagation input, which lets the previous layer (if there is one) continue the back propagation.
In this way, different layers can be assembled arbitrarily without breaking forward or back propagation.

.step() method:

Updates the layer's own parameter values (or does nothing, as in activation and pooling layers). A minimal fully connected layer following this contract is sketched after this list.
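
The following is an illustrative sketch written against that contract, assuming numpy only; it is not the framework's own Linear2 (the coe initialization coefficient, for example, is ignored here):

	import numpy as np
	
	class Linear:  # hypothetical minimal layer following the Layer contract above
	    def __init__(self, in_dim, out_dim):
	        # simple random initialization of the parameters
	        self.w = np.random.randn(in_dim, out_dim) * np.sqrt(2.0 / in_dim)
	        self.b = np.zeros(out_dim)
	        self.gradients = {"w": None, "b": None}
	        self.x = None  # cached input for back propagation
	
	    def forward(self, x):
	        self.x = x                  # save the input for backward
	        return x @ self.w + self.b  # (batch, in_dim) -> (batch, out_dim)
	
	    def backward(self, output_gradient):
	        # gradients of this layer's own parameters, stored until step()
	        self.gradients["w"] = self.x.T @ output_gradient
	        self.gradients["b"] = output_gradient.sum(axis=0)
	        # gradient w.r.t. the input, returned so the previous layer can continue
	        return output_gradient @ self.w.T
	
	    def step(self, lr):
	        # plain gradient descent update; an activation layer would do nothing here
	        self.w -= lr * self.gradients["w"]
	        self.b -= lr * self.gradients["b"]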

Class Sequential

This class also inherits from Layer and can itself be used as a Layer.

It assembles multiple layers into one, in order, and runs them in sequence during forward and back propagation.

Its forward and backward methods make this clear:


	def forward(self, x):
	    out = x
	    for layer in self.layers:
	        out = layer(out)
	    return out
	
	def backward(self, output_gradient):
	    layer_num = len(self.layers)
	    delta = output_gradient
	    for i in range(layer_num - 1, -1, -1):
	        # traverse the layers in reverse order, propagating the gradient backwards
	        delta = self.layers[i].backward(delta)
	    return delta  # gradient w.r.t. the input, so Sequential can itself act as a Layer
	
	def step(self, lr):
	    for layer in self.layers:
	        layer.step(lr)

Class RNN: recurrent neural network layer

It inherits from Layer, but is described separately because its content is more involved.

The RNN is built from a fully connected layer (Linear) and an activation layer.

Forward propagation


	def forward(self, inputs):
	    """
	    :param inputs: inputs = (h0, x), h0.shape == (batch, out_dim), x.shape == (seq, batch, in_dim)
	    :return: outputs: outputs.shape == (seq, batch, out_dim)
	    """
	    h = inputs[0]  # the input consists of two parts: the initial hidden state h0 and the sequence X
	    X = inputs[1]
	    if X.shape[2] != self.in_dim or h.shape[1] != self.out_dim:
	        # check whether the input shapes are valid
	        raise ShapeNotMatchException(self, "forward: wrong shape: h0 = {}, X = {}".format(h.shape, X.shape))
	
	    self.seq_len = X.shape[0]  # length of the time series
	    self.inputs = X  # save the input for use in back propagation
	    output_list = []  # collect the output at each time step
	    for x in X:
	        # iterate over the input along the time dimension
	        # x.shape == (batch, in_dim), h.shape == (batch, out_dim)
	        h = self.activation(self.linear(np.c_[h, x]))
	        output_list.append(h)
	    self.outputs = np.stack(output_list, axis=0)  # convert the list into a single array
	    return self.outputs

Back propagation


	def backward(self, output_gradient):
	    """
	    :param output_gradient: shape == (seq, batch, out_dim)
	    :return: input_gradient
	    """
	    if output_gradient.shape != self.outputs.shape:
	        # we expect a gradient of shape (seq, batch, out_dim)
	        raise ShapeNotMatchException(self, "__backward: expected {}, but we got "
	                                           "{}".format(self.outputs.shape, output_gradient.shape))
	
	    input_gradients = []
	    # each time step contributes a weight gradient; their sum (or average) is the total weight gradient
	    weight_gradients = np.zeros(self.linear.weights_shape())
	    bias_gradients = np.zeros(self.linear.bias_shape())
	    batch_size = output_gradient.shape[1]
	
	    # total_gradient: during forward propagation x and h are concatenated into one large matrix,
	    # so back propagation computes the gradient of that large matrix and then splits it into x_grad and h_grad
	    total_gradient = np.zeros((batch_size, self.out_dim + self.in_dim))
	    h_gradient = None
	
	    # traverse the time steps in reverse, computing the gradients at each one
	    for i in range(self.seq_len - 1, -1, -1):
	        # forward propagation order: x, h -> z -> h
	        # so the back propagation order is: h_grad -> z_grad -> x_grad, h_grad, w_grad, b_grad
	
	        # %%%%%%%%%%%%%% averaged version %%%%%%%%%%%%%%%%%%%%%%%
	        # h_gradient = (output_gradient[i] + total_gradient[:, 0:self.out_dim]) / 2
	        # %%%%%%%%%%%%%% non-averaged version %%%%%%%%%%%%%%%%%%%%%%%
	        # h_grad at this time step has two parts: the gradient from the output
	        # and the gradient coming back from the next time step
	        h_gradient = output_gradient[i] + total_gradient[:, 0:self.out_dim]
	
	        # w_grad and b_grad are computed inside linear.backward(), so there is no need to compute them by hand
	        z_gradient = self.activation.backward(h_gradient)  # compute z_grad
	        total_gradient = self.linear.backward(z_gradient)  # gradient of the concatenated [h, x] matrix
	
	        # total_gradient holds the gradients of both h and x, shape == (batch, out_dim + in_dim)
	        x_gradient = total_gradient[:, self.out_dim:]
	
	        input_gradients.append(x_gradient)
	        weight_gradients += self.linear.gradients["w"]
	        bias_gradients += self.linear.gradients["b"]
	
	    # %%%%%%%%%%%%%%%%%% averaged version %%%%%%%%%%%%%%%%%%%%%%%
	    # self.linear.set_gradients(w=weight_gradients / self.seq_len, b=bias_gradients / self.seq_len)
	    # %%%%%%%%%%%%%%%%%% non-averaged version %%%%%%%%%%%%%%%%%%%%%%%
	    self.linear.set_gradients(w=weight_gradients, b=bias_gradients)  # store the accumulated gradients
	
	    list.reverse(input_gradients)  # input_gradients was built in reverse time order, so reverse it before output
	    print("sum(weight_gradients) = {}".format(np.sum(weight_gradients)))
	
	    # np.stack turns the list into a single array
	    return np.stack(input_gradients), h_gradient

That covers the details of building a neural network framework with numpy. For more about neural networks with numpy, please see the other related articles on this site!

