Solving some problems with PyTorch model replication

  • 2021-09-16 07:19:58
  • OfStack

Direct assignment


model2 = model1  # both names now refer to the same model object

This only copies the reference, so when model2 is updated the weights of model1 are updated as well, which is not what we want.
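A minimal illustration of the aliasing problem, using a small stand-in model (the names here are illustrative):


import torch
import torch.nn as nn

# Plain assignment makes both names point to the same object, so the weights are shared.
model1 = nn.Linear(4, 2)
model2 = model1

with torch.no_grad():
    model2.weight.zero_()

print(model2 is model1)              # True: same object
print(model1.weight.sum().item())    # 0.0: changing model2 changed model1 too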

A commenter suggested that the following could be used instead:


import copy

model2 = copy.deepcopy(model1)

to make a deep copy. I don't have a PyTorch environment at hand, so I haven't tested it; if anyone has tried it, please let me know whether it works.
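A quick check of the deepcopy approach (a sketch with a stand-in model; this is not tested in the original post):


import copy
import torch
import torch.nn as nn

model1 = nn.Linear(4, 2)
model2 = copy.deepcopy(model1)

with torch.no_grad():
    model2.weight.zero_()

# If the copy is independent, model1's weights are untouched by the change above.
print(torch.equal(model1.weight, model2.weight))   # False (almost surely)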

My original method:

Any model replication you need can be done with the following method.


torch.save(model, "net_params.pkl")      # save the whole model to disk
model5 = Cnn(3, 10)                      # Cnn is the model class defined elsewhere
model5 = torch.load('net_params.pkl')    # the loaded copy replaces model5

Written this way, updating the copy does not affect the weights of the original model.

Supplement: some pitfalls encountered while training PyTorch models (continuously updated)

Training a model can be broken into several parts, as follows.

Data preprocessing

When getting started, you practice on the MNIST handwritten digit dataset first.

PyTorch provides a data-loading module with classes such as Dataset, TensorDataset and DataLoader for building a data input pipeline.

I previously used dataset.from_generator() in TensorFlow, and PyTorch is similar. So far I know of two ways to do it.

The first way is to inherit from PyTorch's Dataset class and override its methods. As shown below, you end up with a DataLoader generator.


from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __getitem__(self, index):
        return self.data[index], self.labels[index]

    def __len__(self):
        return len(self.labels)

train_dataset = MyDataset(train_data, train_label)
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=1,
                          shuffle=True)

The second way is conversion: first turn the prepared data into PyTorch tensors, then wrap them in a TensorDataset, and finally build a DataLoader from it.


import torch
from torch.utils.data import TensorDataset, DataLoader

X = torch.from_numpy(train_data).float()
Y = torch.from_numpy(train_label).float()
train_dataset = TensorDataset(X, Y)

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=1,
                          shuffle=True)
                          # num_workers=2)
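Either way, the resulting loader can be iterated directly; a quick sanity check of the batch shapes (reusing the names above) might look like this:


# Grab one batch, inspect its shape, then stop.
for data, labels in train_loader:
    print(data.size(), labels.size())
    break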

Model definition


import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)

        self.fc1 = nn.Linear(400, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        relu = F.relu(self.conv1(x))
        x = F.max_pool2d(relu, (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except batch_size
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

To train a model you first need to define a network structure, as above: a forward-propagation network. It contains convolution layers, fully connected layers, max-pooling layers, ReLU non-linear activation layers (my own naming) and a view operation that flattens a multi-dimensional feature map into a one-dimensional vector.

In nn.Conv2d(in_channels, out_channels, kernel_size), the first argument is the input depth, the second is the output depth, and the third is the size of the convolution kernel.

In F.max_pool2d(input, kernel_size), the second argument is the pooling window size (the stride defaults to the window size).

nn.Linear(in_features, out_features)

x.view() does the flattening; it is essentially equivalent to numpy's reshape, so you need to work out the flattened size yourself.
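To see why fc1 takes 400 inputs, it helps to trace the shapes for a 1-channel 28x28 MNIST image; a small sketch, assuming the Net defined above:


import torch

net = Net()
x = torch.randn(1, 1, 28, 28)   # one grayscale 28x28 image, batch size 1
# conv1 (kernel 3): 28 -> 26;  pool (2x2): 26 -> 13
# conv2 (kernel 3): 13 -> 11;  pool (2):   11 -> 5
# flattened size: 16 * 5 * 5 = 400, which matches nn.Linear(400, 120)
print(net(x).size())            # torch.Size([1, 10])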

Definition of loss function


import torch.optim as optim
 
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

Once the model is defined, giving it an input produces an output. The difference between the outputs and the targets then has to be measured, and that is what the loss function describes.
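For instance, one loss computation on a dummy batch of four images could look like this (a sketch reusing net and criterion from above):


import torch

outputs = net(torch.randn(4, 1, 28, 28))   # shape (4, 10): one score per class
targets = torch.tensor([3, 2, 6, 2])       # class indices, shape (4,)
loss = criterion(outputs, targets)
print(loss.item())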

Training network


for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

The code above comes from the official tutorial. What we need to do is learn the ideas behind it.

1. First, the number of epochs is 2, and each epoch goes through the whole training set once. running_loss is accumulated within each epoch; every 2000 mini-batches the average loss is computed and printed, and running_loss is reset to 0.

2. Then training proceeds mini-batch by mini-batch. Before computing the loss of each mini-batch, the gradients held by the optimizer are zeroed to prevent gradients from different mini-batches accumulating together. The update happens in two steps: first compute the loss and back-propagate it to every layer with loss.backward(), then update the weights with optimizer.step().

Save model


PATH = '...'
torch.save(net.state_dict(), PATH)
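Loading the weights back later follows the usual state_dict pattern (a sketch; PATH is wherever you saved them):


net = Net()
net.load_state_dict(torch.load(PATH))
net.eval()   # switch to evaluation mode before inference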

Summary of pitfalls

In general the process is just the steps above, but when doing it myself I ran into many problems, the most important being that I was unclear about the requirements on tensors as they propagate, which led to many mistakes.

The first is the input data. PyTorch expects image batches in the shape (BATCH_SIZE, CHANNELS, IMG_H, IMG_W), so the data has to be adjusted when it is generated to follow the BCHW convention.
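For example, MNIST data stored as flat rows of 784 values would need a reshape roughly like this (the variable names are illustrative):


import numpy as np
import torch

flat = np.random.rand(64, 784).astype(np.float32)     # stand-in for 64 MNIST rows
images = torch.from_numpy(flat).view(-1, 1, 28, 28)   # -> (64, 1, 28, 28), i.e. BCHW
print(images.size())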

Error messages such as "RuntimeError: Expected object of type Double but got scalar type Float for argument #2 'mat2'" also appear frequently.

These dtype mismatches can be fixed with conversions such as x.double(), y.float(), z.long(), and so on.
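A minimal illustration of such a dtype mismatch and its fix (illustrative names):


import numpy as np
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)                      # parameters are float32 by default
x = torch.from_numpy(np.random.rand(3, 4))   # numpy float64 becomes a double tensor

# layer(x) would raise the Double-vs-Float error; converting the input fixes it:
out = layer(x.float())
print(out.dtype)                             # torch.float32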

RuntimeError: multi-target not supported. This error comes from the loss function; for classification problems cross entropy is the first choice.


criterion = nn.CrossEntropyLoss()
loss = criterion(outputs, labels.long())  # this is the line that raises the error

This line does not raise an error when batch_size = 1, but it does as soon as batch_size > 1.

Looking at other people's code, it is basically written the same way as the official tutorial, using the official MNIST data interface, roughly as shown below. I didn't want to go that route at first, because it means the data format is encapsulated and hidden from view, but fiddling with it myself was getting expensive, so I gave it a try. It really is worth it!


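A typical version of that official MNIST loading code, using torchvision (a sketch; the exact transform and paths in the original may differ):


import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])

trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                      download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=4, shuffle=True)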

Printing a batch from this generator and looking at its size, I found it was different from what I had written myself. With batch_size=4, data.size() is 4*1*28*28 in both cases, which matches; but labels.size() differs: I had written one-hot vectors, so mine was 4*10, while here it is just 4.

Printing the labels confirms it: sure enough, each label is a single class index, e.g. tensor([3, 2, 6, 2]).

The model's outputs, however, are still 4*10. It turns out that nn.CrossEntropyLoss() handles that conversion itself, so it reports multi-target not supported because labels.size() is wrong: each target should be a single number, but a one-hot label gives it 10 numbers, as if each sample had been assigned 10 targets.
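To make the shape requirement concrete, here is a small sketch; F.one_hot is only used to construct the wrong-shaped labels:


import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()
outputs = torch.randn(4, 10)                 # (batch, classes) scores
labels = torch.tensor([3, 2, 6, 2])          # correct targets: class indices, shape (4,)
print(criterion(outputs, labels))            # works

one_hot = F.one_hot(labels, num_classes=10)  # shape (4, 10)
# criterion(outputs, one_hot) is what triggers "multi-target not supported"
# on older PyTorch versions: the target must be one class index per sample.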

After slightly modifying my own generator accordingly, the problem went away.

However, if you want to fetch data more flexibly, you still need to implement a few methods on the object yourself. With PyTorch's DataLoader and enumerate you simply pass over all the data once, and if you use iter() to get an iterator, next() does not keep producing training data the way a TensorFlow generator does.

For example, with the form above, DataLoader gives you a generator, and generator objects in Python are mainly defined by the magic methods __next__ and __iter__.

The __iter__ method allows the instance to be used as an iterable object, but it does not matter much if you leave it out; the more important one is the __next__ method.

After writing the __next__ method, the out-of-range problem is gone and you can cycle through the data continuously. Of course, you can also raise StopIteration to terminate the iteration instead (the commented-out line below shows where).


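A minimal sketch of such a hand-rolled generator (illustrative, not the original code): __next__ wraps around instead of running out of batches, and the commented line marks where StopIteration could be raised to stop after one pass instead.


import torch

class DataGenerator:
    def __init__(self, data, labels, batch_size):
        self.data = data
        self.labels = labels
        self.batch_size = batch_size
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index + self.batch_size > len(self.labels):
            # raise StopIteration     # uncomment to stop after one pass instead
            self.index = 0            # wrap around and keep producing batches
        batch = slice(self.index, self.index + self.batch_size)
        self.index += self.batch_size
        return self.data[batch], self.labels[batch]

# usage: keeps yielding batches indefinitely, like a TensorFlow generator
gen = DataGenerator(torch.randn(100, 1, 28, 28), torch.randint(0, 10, (100,)), 4)
inputs, labels = next(gen)
print(inputs.size(), labels.size())   # torch.Size([4, 1, 28, 28]) torch.Size([4])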
