An example of a naive Bayes classifier implemented in Python

  • 2020-06-23 01:00:33
  • OfStack

This article presents an example of a naive Bayes classifier implemented in Python, shared for your reference. The details are as follows:

I needed a naive Bayes classifier for work, so I wrote one.

Laplace-style smoothing is applied to attribute values that never appear together with a class, so that their conditional probability is not zero and the whole product of conditional probabilities does not collapse to zero.
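
As a reference point, textbook add-one (Laplace) smoothing estimates P(value | tag) as (count + 1) / (tag total + number of distinct values); the class below takes the simpler route of substituting a count of 1 for value/tag pairs that never occur, before normalizing. The small sketch below only illustrates the textbook formula; the function and argument names are illustrative and are not part of the classifier code.


def smoothed_probability(count, tag_total, n_values):
  #  count: how often this feature value was seen together with the tag
  #  tag_total: how many training samples carry this tag
  #  n_values: how many distinct values this feature can take
  return (count + 1) / (tag_total + n_values)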

The fundamentals of naive Bayes are easy to find on the web, so I won't repeat them here; I'll just attach the code.



class NBClassify(object):
  def __init__(self, fillNa = 1):
    # fillNa: count used in place of a missing (zero) count, so no probability is exactly zero
    self.fillNa = fillNa
  def train(self, trainSet):
    #  Calculate the prior probability of each class
    #  Count how many times each tag (class label) occurs in the training set
    dictTag = {}
    for subTuple in trainSet:
      dictTag[str(subTuple[1])] = dictTag.get(str(subTuple[1]), 0) + 1
    #  Prior probability of each tag
    tagProbablity = {}
    totalFreq = sum(dictTag.values())
    for key, value in dictTag.items():
      tagProbablity[key] = value / totalFreq
    # print(tagProbablity)
    self.tagProbablity = tagProbablity
    ##############################################################################
    #  Calculate the conditional probability of each feature value
    #  First collect, for every feature, the values observed and their frequencies,
    #  e.g. {feature1: {value1: 5, value2: 1}, feature2: {value1: 1, value2: 5}}
    dictFeaturesBase = {}
    for subTuple in trainSet:
      for key, value in subTuple[0].items():
        if key not in dictFeaturesBase.keys():
          dictFeaturesBase[key] = {value:1}
        else:
          if value not in dictFeaturesBase[key].keys():
            dictFeaturesBase[key][value] = 1
          else:
            dictFeaturesBase[key][value] += 1
    # dictFeaturesBase = {
    #   'occupation': {'farmer': 1, 'teacher': 2, 'construction worker': 2, 'nurse': 1},
    #   'symptom': {'sneeze': 3, 'headache': 3}
    #   }
    dictFeatures = {}.fromkeys([key for key in dictTag])
    for key in dictFeatures.keys():
      dictFeatures[key] = {}.fromkeys([key for key in dictFeaturesBase])
    for key, value in dictFeatures.items():
      for subkey in value.keys():
        value[subkey] = {}.fromkeys([x for x in dictFeaturesBase[subkey].keys()])
    # dictFeatures = {
    #  'cold': {'symptom': {'sneeze': None, 'headache': None}, 'occupation': {'nurse': None, 'farmer': None, 'construction worker': None, 'teacher': None}},
    #  'concussion': {'symptom': {'sneeze': None, 'headache': None}, 'occupation': {'nurse': None, 'farmer': None, 'construction worker': None, 'teacher': None}},
    #  'allergy': {'symptom': {'sneeze': None, 'headache': None}, 'occupation': {'nurse': None, 'farmer': None, 'construction worker': None, 'teacher': None}}
    #  }
    #  Count how often each feature value occurs within each tag
    for subTuple in trainSet:
      for key, value in subTuple[0].items():
        count = dictFeatures[subTuple[1]][key][value]
        dictFeatures[subTuple[1]][key][value] = 1 if count is None else count + 1
    # print(dictFeatures)
    #  Feature values that never occur with a tag are still None; replace them with
    #  a small count (fillNa, default 1) so their probability is small rather than zero
    for tag, featuresDict in dictFeatures.items():
      for featureName, fetureValueDict in featuresDict.items():
        for featureKey, featureValues in fetureValueDict.items():
          if featureValues is None:
            fetureValueDict[featureKey] = self.fillNa
    #  Turn the per-tag frequencies into conditional probabilities P(feature value | tag)
    for tag, featuresDict in dictFeatures.items():
      for featureName, fetureValueDict in featuresDict.items():
        totalCount = sum(fetureValueDict.values())
        for featureKey, featureValues in fetureValueDict.items():
          fetureValueDict[featureKey] = featureValues / totalCount
    self.featuresProbablity = dictFeatures
    ##############################################################################
  def classify(self, featureDict):
    resultDict = {}
    #  Score each tag: prior probability times the product of its conditional probabilities
    for key, value in self.tagProbablity.items():
      iNumList = []
      for f, v in featureDict.items():
        if self.featuresProbablity[key][f][v]:
          iNumList.append(self.featuresProbablity[key][f][v])
      conditionPr = 1
      for iNum in iNumList:
        conditionPr *= iNum
      resultDict[key] = value * conditionPr
    #  Compare the scores and return the tag with the highest one
    resultList = sorted(resultDict.items(), key=lambda x:x[1], reverse=True)
    return resultList[0][0]
if __name__ == '__main__':
  trainSet = [
    ({" symptoms ":" sneezing ", " professional ":" The nurse "}, " Catch a cold  "),
    ({" symptoms ":" sneezing ", " professional ":" The farmer "}, " allergy  "),
    ({" symptoms ":" Have a headache ", " professional ":" Construction workers "}, " A concussion "),
    ({" symptoms ":" Have a headache ", " professional ":" Construction workers "}, " Catch a cold  "),
    ({" symptoms ":" sneezing ", " professional ":" Teachers' "}, " Catch a cold  "),
    ({" symptoms ":" Have a headache ", " professional ":" Teachers' "}, " A concussion "),
  ]
  monitor = NBClassify()
  # trainSet is something like that [(featureDict, tag), ]
  monitor.train(trainSet)
  #  A construction worker who is sneezing --
  #  which diagnosis is most likely for him?
  result = monitor.classify({"symptom": "sneeze", "occupation": "construction worker"})
  print(result)
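
As a quick hand check (using the translated labels above): classify scores each tag with its prior probability times the product of the learned conditional probabilities, so for a sneezing construction worker the scores work out roughly as in the sketch below and "cold" is returned. The numbers are read off the counts in trainSet, with never-seen value/tag pairs filled in with a count of 1 as in train; the variable names here are illustrative only.


# Rough recomputation of the scores classify() produces for the query above
prior = {"cold": 3/6, "allergy": 1/6, "concussion": 2/6}
p_sneeze = {"cold": 2/3, "allergy": 1/2, "concussion": 1/3}  # P(symptom=sneeze | tag)
p_worker = {"cold": 1/4, "allergy": 1/4, "concussion": 1/4}  # P(occupation=construction worker | tag)
for tag in prior:
  print(tag, prior[tag] * p_sneeze[tag] * p_worker[tag])
# cold ~ 0.083, allergy ~ 0.021, concussion ~ 0.028, so "cold" has the highest score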

For more detail on the naive Bayes algorithm itself, see an earlier article on this site: https://www.ofstack.com/article/129903.htm.


I hope this article has been helpful for your Python programming.

