Solving a problem with TensorFlow model parameter saving and loading


Finally found the cause of the bug! Writing it down here; it came from not being familiar enough with the platform.

Q: Why is it that when two model objects are trained separately, each from its own file, and then loaded and run together from one file, one of the trained models always loads correctly while the other fails to load? Training and then reloading a model directly within the same file never fails, and, even stranger, loading a model through the object defined in that same file doesn't fail either.

model.py contains both ModelV and ModelP, while modelV.py and modelP.py contain only ModelV and ModelP respectively and are used to train each model on its own; the trained checkpoints are then loaded back in model.py (a sketch of the separate training step follows the listing):


# -*- coding: utf8 -*-

from __future__ import print_function

import tensorflow as tf


class ModelV():

    def __init__(self):
        self.v1 = tf.Variable(66, name="v1")
        self.v2 = tf.Variable(77, name="v2")
        self.save_path = "model_v/model.ckpt"
        self.init = tf.global_variables_initializer()
        self.saver = tf.train.Saver()
        self.sess = tf.Session()

    def train(self):
        self.sess.run(self.init)
        print('v2', self.v2.eval(self.sess))

        self.saver.save(self.sess, self.save_path)
        print("ModelV saved.")

    def predict(self):
        all_vars = tf.trainable_variables()
        for v in all_vars:
            print(v.name)
        self.saver.restore(self.sess, self.save_path)
        print("ModelV restored.")
        print('v2', self.v2.eval(self.sess))
        print('------------------------------------------------------------------')


class ModelP():

    def __init__(self):
        self.p1 = tf.Variable(88, name="p1")
        self.p2 = tf.Variable(99, name="p2")
        self.save_path = "model_p/model.ckpt"
        self.init = tf.global_variables_initializer()
        self.saver = tf.train.Saver()
        self.sess = tf.Session()

    def train(self):
        self.sess.run(self.init)
        print('p2', self.p2.eval(self.sess))

        self.saver.save(self.sess, self.save_path)
        print("ModelP saved.")

    def predict(self):
        all_vars = tf.trainable_variables()
        for v in all_vars:
            print(v.name)
        self.saver.restore(self.sess, self.save_path)
        print("ModelP restored.")
        print('p2', self.p2.eval(self.sess))
        print('---------------------------------------------------------------------')


if __name__ == '__main__':
    v = ModelV()
    p = ModelP()
    v.predict()
    # v.train()
    p.predict()
    # p.train()
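The separate training scripts themselves are not reproduced in the post. As a minimal sketch (the file name and import below are assumptions, not from the original), one way to reproduce the setup is to instantiate only one model per process and call its train() method, so that each checkpoint is written from a graph that contains only that model's variables:

# train_v.py -- hypothetical helper script, assuming model.py (above) is importable.
# Only ModelV() is instantiated here, so this process's graph contains just v1 and
# v2, and model_v/model.ckpt therefore stores only those two variables.
from model import ModelV

if __name__ == '__main__':
    v = ModelV()
    v.train()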

The key word here is the "global" in tf.global_variables_initializer()! Although v1/v2 and p1/p2 are created and defined separately inside ModelV and ModelP, to the tf module they are all global variables. You can list every variable with the code below, and you will find that when ModelP and ModelV are built together in one file, both objects see exactly the same set of variables after initialization, and that is the crux of the problem:


all_vars = tf.trainable_variables()
for v in all_vars:
    print(v.name)

The error output is shown below. You can swap the order in which ModelP and ModelV are instantiated and watch how the error messages change:


v1:0
v2:0
p1:0
p2:0
ModelV restored.
v2 77
v1:0
v2:0
p1:0
p2:0
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key v2 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key v1 not found in checkpoint

In fact, when each model is trained from its own file, the saved checkpoint is correct, because the graph then contains only v1/v2 or only p1/p2. But when both models are built in one file, the checkpoint actually holds all four variables v1, v2, p1 and p2, because a Saver created without arguments saves every variable by default. Likewise, Saver.restore() called on a Saver with no variable list tries to load a value for every variable it knows about from the globally defined graph. In ModelV.predict the restore looks up its keys in model_v's checkpoint, finds v1 and v2 there, and succeeds; in ModelP.predict, however, v1 and v2 cannot be found in model_p's checkpoint, only p1 and p2, so the 'not found in checkpoint' error is raised. What was puzzling at first is that the first restore did not complain about p1 and p2 being missing from model_v's checkpoint; the likely explanation is that ModelV's Saver is constructed in __init__ before ModelP (and therefore p1/p2) exists, so it only tracks v1 and v2, whereas ModelP's Saver is constructed after all four variables have been created.
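To see exactly which keys each checkpoint contains, the checkpoint files can be inspected directly. A small sketch using tf.train.NewCheckpointReader, assuming the checkpoint paths used above:

import tensorflow as tf

# Print the variable keys stored in each checkpoint. A checkpoint written while
# only one model existed in the graph contains only that model's variables.
for path in ["model_v/model.ckpt", "model_p/model.ckpt"]:
    reader = tf.train.NewCheckpointReader(path)
    print(path, sorted(reader.get_variable_to_shape_map().keys()))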

Saver.save() and Saver.restore() are a pair: they save and load only the model's parameter values. So how does the loading side know the structure of the model? You have to define it yourself, and the definition has to match the saved model for the load to work.
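Put differently, a fresh script can restore a checkpoint only after rebuilding matching variables. A minimal sketch, assuming model_v/model.ckpt was saved as above:

import tensorflow as tf

# Rebuild variables with the same names and shapes as the saved ones;
# restore() then fills in their values, so no initializer needs to be run.
v1 = tf.Variable(0, name="v1")
v2 = tf.Variable(0, name="v2")
saver = tf.train.Saver()

with tf.Session() as sess:
    saver.restore(sess, "model_v/model.ckpt")
    print(sess.run([v1, v2]))  # the saved values, e.g. [66, 77]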

If you want to load the model structure and the parameter values directly, without defining the model yourself, use tf.train.import_meta_graph:


from __future__ import print_function

import tensorflow as tf

sess = tf.Session()

# Load the graph structure, i.e. the model's parameter variables, ops, etc.
new_saver = tf.train.import_meta_graph("model_v/model.ckpt.meta")
print("ModelV construct")
all_vars = tf.trainable_variables()
for v in all_vars:
    print(v.name)
    # print(v.name, v.eval(sess))  # v is not initialized yet, so it cannot be evaluated
# Load the values of the model's parameter variables
new_saver.restore(sess, tf.train.latest_checkpoint('model_v/'))
print("ModelV restored.")
all_vars = tf.trainable_variables()
for v in all_vars:
    print(v.name, v.eval(sess))

After the structure is loaded, i.e. the model's parameter variables and so on, the variables exist in the graph, but their values cannot be read yet because nothing has been assigned to them; only after the subsequent restore() do they receive their saved values.
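This can be verified explicitly: right after import_meta_graph the variables are still uninitialized, and only after restore() can their values be read, here fetched by tensor name. A small sketch, assuming the same model_v checkpoint:

import tensorflow as tf

with tf.Session() as sess:
    new_saver = tf.train.import_meta_graph("model_v/model.ckpt.meta")
    # All variables still lack values at this point.
    print(sess.run(tf.report_uninitialized_variables()))  # e.g. [b'v1' b'v2']

    new_saver.restore(sess, tf.train.latest_checkpoint("model_v/"))
    # Now the values can be read, e.g. by looking a tensor up by name.
    v2 = tf.get_default_graph().get_tensor_by_name("v2:0")
    print(sess.run(v2))  # 77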

The fix for the error above is the improved version of model.py below. tf.train.Saver() can take a parameter: the list of variables you want it to save. Without that argument it saves all variables, which includes everything returned by tf.trainable_variables(), and tf.trainable_variables() is collected globally from tf; so simply construct a tf.train.Saver over the right variable list both when saving and when loading. That way each model saves and loads exactly its own parameters.


# -*- coding: utf8 -*-

from __future__ import print_function

import tensorflow as tf


class ModelV():

    def __init__(self):
        self.v1 = tf.Variable(66, name="v1")
        self.v2 = tf.Variable(77, name="v2")
        self.save_path = "model_v/model.ckpt"
        self.init = tf.global_variables_initializer()

        self.sess = tf.Session()

    def train(self):
        # Saver restricted to this model's own variables
        saver = tf.train.Saver([self.v1, self.v2])
        self.sess.run(self.init)
        print('v2', self.v2.eval(self.sess))

        saver.save(self.sess, self.save_path)
        print("ModelV saved.")

    def predict(self):
        saver = tf.train.Saver([self.v1, self.v2])
        all_vars = tf.trainable_variables()
        for v in all_vars:
            print(v.name)

        v_vars = [v for v in all_vars if v.name == 'v1:0' or v.name == 'v2:0']
        saver.restore(self.sess, self.save_path)
        print("ModelV restored.")
        for v in v_vars:
            print(v.name, v.eval(self.sess))
        print('v2', self.v2.eval(self.sess))
        print('------------------------------------------------------------------')


class ModelP():

    def __init__(self):
        self.p1 = tf.Variable(88, name="p1")
        self.p2 = tf.Variable(99, name="p2")
        self.save_path = "model_p/model.ckpt"
        self.init = tf.global_variables_initializer()
        self.sess = tf.Session()

    def train(self):
        saver = tf.train.Saver([self.p1, self.p2])
        self.sess.run(self.init)
        print('p2', self.p2.eval(self.sess))

        saver.save(self.sess, self.save_path)
        print("ModelP saved.")

    def predict(self):
        saver = tf.train.Saver([self.p1, self.p2])
        all_vars = tf.trainable_variables()
        p_vars = [v for v in all_vars if v.name == 'p1:0' or v.name == 'p2:0']
        for v in all_vars:
            print(v.name)
            # print(v.name, v.eval(self.sess))
        saver.restore(self.sess, self.save_path)
        print("ModelP restored.")
        for p in p_vars:
            print(p.name, p.eval(self.sess))
        print('p2', self.p2.eval(self.sess))
        print('----------------------------------------------------------')


if __name__ == '__main__':
    v = ModelV()
    p = ModelP()
    v.predict()
    # v.train()
    p.predict()
    # p.train()
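As a side note, the var_list passed to tf.train.Saver does not have to be a plain list; it can also be a dict that maps checkpoint keys to variables, which helps when the in-graph variable names differ from the names stored in the checkpoint. A brief sketch (the variable name used here is made up for illustration):

import tensorflow as tf

# The in-graph variable is called "restored_v2", but it is saved/loaded under
# the checkpoint key "v2", so it matches the v2 entry in model_v/model.ckpt.
restored_v2 = tf.Variable(0, name="restored_v2")
saver = tf.train.Saver({"v2": restored_v2})

with tf.Session() as sess:
    saver.restore(sess, "model_v/model.ckpt")
    print(sess.run(restored_v2))  # 77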

Summary: it is best to construct each Saver with an explicit list of Variables, so that saving and loading operate on exactly the intended parameters.
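When a model has many variables, listing them by hand gets tedious. One common alternative, not used in the code above and shown here only as a variant sketch, is to create each model's variables under their own variable scope and collect them for the Saver by scope name:

import tensorflow as tf

# Variables created inside tf.variable_scope("model_v") get names prefixed with
# "model_v/", so the whole group can be gathered with a single collection call.
with tf.variable_scope("model_v"):
    v1 = tf.get_variable("v1", initializer=66)
    v2 = tf.get_variable("v2", initializer=77)

v_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="model_v")
saver = tf.train.Saver(v_vars)  # saves/restores only this model's variables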

