NumPy: handling NaN in a matrix by replacing it with the column mean
- 2021-01-14 06:13:09
- OfStack
Although we could replace every NaN with zero, doing so without knowing what the values mean is a bad idea. If they are temperatures in Kelvin, for example, zero means absolute zero, so filling the gaps with it would badly distort the data.
Instead, we replace each missing value with the mean of the values in the same column (feature) that are not NaN.
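The NaN entries have to be excluded before averaging, because a single NaN makes an ordinary mean NaN as well. The sketch below shows this on one column; the column values are illustrative, and a boolean mask is used here in place of the nonzero()-based indexing in the code that follows.

```python
import numpy as np

# One feature column with a missing entry (illustrative data)
col = np.array([2.0, np.nan, 5.0])

naive_mean = np.mean(col)                  # any NaN turns the result into NaN
valid_mean = np.mean(col[~np.isnan(col)])  # mean of the non-NaN values only
print(naive_mean, valid_mean)
```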
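```python
import numpy as np

# NaN is spelled np.nan; the matrix holds floats so it can store NaN
datMat = np.matrix([[1, 2, 3], [4, np.nan, 6]], dtype=float)
numFeat = np.shape(datMat)[1]  # number of columns (features)
for i in range(numFeat):
    # row indices of the values in column i that are not NaN
    validRows = np.nonzero(~np.isnan(datMat[:, i].A))[0]
    meanVal = np.mean(datMat[validRows, i])
    # row indices of the NaN values in column i: set them to the mean
    nanRows = np.nonzero(np.isnan(datMat[:, i].A))[0]
    datMat[nanRows, i] = meanVal
print(datMat)
```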
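The same column-mean replacement can also be written without an explicit loop. The sketch below is an alternative, not the method used above: it swaps in np.nanmean (which ignores NaN when averaging) and np.where (which picks the column mean wherever the NaN mask is true). The helper name fill_nan_with_column_mean is hypothetical, and it works on a plain ndarray copy rather than modifying a matrix in place.

```python
import numpy as np

def fill_nan_with_column_mean(data):
    """Hypothetical helper: return a float copy of `data` with each NaN
    replaced by the mean of the non-NaN values in its column."""
    arr = np.array(data, dtype=float)     # copy as a plain ndarray; input is untouched
    col_means = np.nanmean(arr, axis=0)   # per-column mean, ignoring NaN
    # the row of column means broadcasts against the 2-D NaN mask
    return np.where(np.isnan(arr), col_means, arr)

filled = fill_nan_with_column_mean([[1, 2, 3], [4, np.nan, 6]])
print(filled)
```

One caveat applies to both versions: if a column contains only NaN, there is no mean to use, so np.nanmean returns NaN for it (with a RuntimeWarning) and those entries stay NaN.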