Handling NaN in a NumPy matrix: replacing missing values with the mean

  • 2021-01-14 06:13:09
  • OfStack

Although we could replace every NaN with zero, that is a bad idea unless we know what the values mean. If they are temperatures in Kelvin, for example, zero is a physically extreme reading, so filling the gaps with zero would badly distort the data.
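
As a rough illustration of that point, here is a small sketch with made-up Kelvin readings (the values and variable names are hypothetical, chosen only for this example). Filling the gap with zero drags the mean far below any real reading, while ignoring the NaN does not.

```python
import numpy as np

# Hypothetical temperature readings in Kelvin; the middle one is missing
temps = np.array([290.0, np.nan, 300.0])

# Option 1: replace NaN with zero, then average
zero_filled = np.where(np.isnan(temps), 0.0, temps)
mean_zero_filled = zero_filled.mean()    # the gap is counted as 0 K

# Option 2: average only the readings that are present
mean_ignoring_nan = np.nanmean(temps)

print(mean_zero_filled, mean_ignoring_nan)
```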

Here we instead replace each missing value with the mean of the non-NaN values in the same column.


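import numpy as np

# np.matrix provides the .A attribute (plain ndarray view) used below
datMat = np.matrix([[1, 2, 3], [4, np.nan, 6]])
numFeat = np.shape(datMat)[1]  # number of columns (features)
for i in range(numFeat):
  # mean of the values in column i that are not NaN
  meanVal = np.mean(datMat[np.nonzero(~np.isnan(datMat[:, i].A))[0], i])
  # set the NaN values in column i to that mean
  datMat[np.nonzero(np.isnan(datMat[:, i].A))[0], i] = meanVal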
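
The same column-wise fill can also be sketched on a plain ndarray with np.nanmean, which computes a mean while ignoring NaN. This is an alternative formulation rather than the listing above: the helper name fill_nan_with_col_mean is made up for illustration, and it assumes every column contains at least one non-NaN value.

```python
import numpy as np

def fill_nan_with_col_mean(data):
    """Return a float copy of `data` with each NaN replaced by its column mean.

    Hypothetical helper; assumes each column has at least one non-NaN value.
    """
    arr = np.array(data, dtype=float)      # copy, so the input is left untouched
    col_means = np.nanmean(arr, axis=0)    # per-column mean, NaNs ignored
    rows, cols = np.where(np.isnan(arr))   # positions of the NaN entries
    arr[rows, cols] = col_means[cols]      # fill each NaN with its column's mean
    return arr

filled = fill_nan_with_col_mean([[1, 2, 3], [4, np.nan, 6]])
print(filled)
```

Working on a copy keeps the original data available, which is useful if you later want to compare different fill strategies.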
