ID3 decision tree function example based on Python

  • 2020-06-19 10:58:41
  • OfStack

An example of ID3 decision tree based on Python is presented. To share for your reference, specific as follows:

The ID3 algorithm is a kind of decision tree based on Occam's razor principle, which is to do more with less. ID3 algorithm, i.e. Iterative Dichotomiser 3, iteration 2 cross tree 3, is a decision tree algorithm invented by Ross Quinlan. The basis of this algorithm is the Occam razor principle mentioned above.

The following example is a decision tree based on ID3 idea to determine whether Marine biological data is fish


# coding=utf-8
import operator
from math import log
import time
def createDataSet():
  dataSet = [[1, 1, 'yes'],
        [1, 1, 'yes'],
        [1, 0, 'no'],
        [0, 1, 'no'],
        [0, 1, 'no'],
        [0,0,'maybe']]
  labels = ['no surfaceing', 'flippers']
  return dataSet, labels
#  Calculate the Shannon entropy 
def calcShannonEnt(dataSet):
  numEntries = len(dataSet)
  labelCounts = {}
  for feaVec in dataSet:
    currentLabel = feaVec[-1]
    if currentLabel not in labelCounts:
      labelCounts[currentLabel] = 0
    labelCounts[currentLabel] += 1
  shannonEnt = 0.0
  for key in labelCounts:
    prob = float(labelCounts[key]) / numEntries
    shannonEnt -= prob * log(prob, 2)
  return shannonEnt
def splitDataSet(dataSet, axis, value):
  retDataSet = []
  for featVec in dataSet:
    if featVec[axis] == value:
      reducedFeatVec = featVec[:axis]
      reducedFeatVec.extend(featVec[axis + 1:])
      retDataSet.append(reducedFeatVec)
  return retDataSet
def chooseBestFeatureToSplit(dataSet):
  numFeatures = len(dataSet[0]) - 1 #  Because at the end of the data set 1 Is the label 
  baseEntropy = calcShannonEnt(dataSet)
  bestInfoGain = 0.0
  bestFeature = -1
  for i in range(numFeatures):
    featList = [example[i] for example in dataSet]
    uniqueVals = set(featList)
    newEntropy = 0.0
    for value in uniqueVals:
      subDataSet = splitDataSet(dataSet, i, value)
      prob = len(subDataSet) / float(len(dataSet))
      newEntropy += prob * calcShannonEnt(subDataSet)
    infoGain = baseEntropy - newEntropy
    if infoGain > bestInfoGain:
      bestInfoGain = infoGain
      bestFeature = i
  return bestFeature
#  Because we recursively build the decision tree based on the cost of the attributes, there may be some cases where the last attributes run out, but the classification 
#  If the calculation is still incomplete, the node classification will be calculated by majority vote 
def majorityCnt(classList):
  classCount = {}
  for vote in classList:
    if vote not in classCount.keys():
      classCount[vote] = 0
    classCount[vote] += 1
  return max(classCount)
def createTree(dataSet, labels):
  classList = [example[-1] for example in dataSet]
  if classList.count(classList[0]) == len(classList): #  If the categories are the same, the division is stopped 
    return classList[0]
  if len(dataSet[0]) == 1: #  All the features are used up 
    return majorityCnt(classList)
  bestFeat = chooseBestFeatureToSplit(dataSet)
  bestFeatLabel = labels[bestFeat]
  myTree = {bestFeatLabel: {}}
  del (labels[bestFeat])
  featValues = [example[bestFeat] for example in dataSet]
  uniqueVals = set(featValues)
  for value in uniqueVals:
    subLabels = labels[:] #  The contents of the original list were copied in order not to change 1 Under the 
    myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet,
                                bestFeat, value), subLabels)
  return myTree
def main():
  data, label = createDataSet()
  t1 = time.clock()
  myTree = createTree(data, label)
  t2 = time.clock()
  print myTree
  print 'execute for ', t2 - t1
if __name__ == '__main__':
  main()

The operation results are as follows:


{'no surfaceing': {0: {'flippers': {0: 'maybe', 1: 'no'}}, 1: {'flippers': {0: 'no', 1: 'yes'}}}}
execute for 0.0103958394532

Finally, we can test the script 1. If we want to draw the generated decision tree with an image, we only need to define an plottree function in the script.

For more information about Python, please refer to Python Data Structure and Algorithm Tutorial, Python Encryption and Decryption Algorithm and Skills Summary, Python Coding skills Summary, Python Function Use Skills Summary, Python String Manipulation Skills Summary and Python Introduction and Advanced Classic Tutorial.

I hope this article has been helpful in Python programming.


Related articles: