Extend the tree given a path, e.g., ((outlook, sunny), ...).
an existing path in the tree ((feature, value), ...)
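For illustration only, such a path could be held as a list of (feature index, value index) pairs; the encoding below is an assumption, not the class's actual representation.

    // Outlook (feature 0) = Sunny (value 0), then Humidity (feature 2) = High (value 1).
    val path: List[(Int, Int)] = List((0, 0), (2, 1))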
This implementation of ID3 does not support continuous data.
the data vector to classify
Given a data vector z, classify it returning the class number (0, ..., k-1) by following a decision path from the root to a leaf.
the data vector to classify
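A minimal sketch of such a root-to-leaf walk, assuming a simple hypothetical node type (a Leaf holds the class number; an Internal node holds the feature it tests and one child per feature value); this is illustrative, not the class's actual node structure.

    sealed trait Node
    case class Leaf(classNum: Int) extends Node                          // class number 0 .. k-1
    case class Internal(feature: Int, children: Map[Int, Node]) extends Node

    // Branch on z(feature) at each internal node until a leaf is reached.
    def classify(root: Node, z: Array[Int]): Int = root match {
      case Leaf(c)           => c
      case Internal(f, kids) => classify(kids(z(f)), z)
    }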
Extract a column from the matrix, filtering out rows that are not on the path.
the feature to consider (e.g., 2 (Humidity))
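A sketch of the idea under assumed representations (rows of integer-coded values, a path as (feature, value) pairs, classifications in y); the name and signature are hypothetical.

    // Keep only rows that agree with every (feature, value) pair on the path,
    // then return column f paired with each surviving row's classification.
    def dataset(x: Array[Array[Int]], y: Array[Int], f: Int,
                path: Seq[(Int, Int)]): Array[(Int, Int)] =
      x.indices.toArray
       .filter(i => path.forall { case (feat, value) => x(i)(feat) == value })
       .map(i => (x(i)(f), y(i)))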
Given a k-dimensional probability vector, compute its entropy (a measure of disorder).
the probability vector (e.g., (0, 1) -> 0, (.5, .5) -> 1)
http://en.wikipedia.org/wiki/Entropy_%28information_theory%29
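A minimal sketch of the entropy formula, H(p) = -sum_i p_i * log2(p_i) with 0 * log2(0) taken as 0; the probability vector is assumed to be an Array[Double].

    // Entropy (base-2 log) of a discrete probability vector; zero terms are skipped
    // since 0 * log2(0) is taken as 0.
    def entropy(p: Array[Double]): Double =
      -p.filter(_ > 0.0).map(pi => pi * math.log(pi) / math.log(2.0)).sum

    // entropy(Array(0.0, 1.0)) == 0.0;  entropy(Array(0.5, 0.5)) == 1.0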
Show the flaw by printing the error message.
the method where the error occurred
the error message
Given a feature column (e.g., 2 (Humidity)) and a value (e.g., 1 (High)), use the frequency of occurrence of the value for each classification (e.g., 0 (no), 1 (yes)) to estimate k probabilities. Also, determine the fraction of training cases where the feature has this value (e.g., the fraction where Humidity is High = 7/14).
one of the possible values for this feature (e.g., 1 (High))
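A sketch of that computation under the same assumed column representation (feature value paired with classification for each training case); the helper name and signature are hypothetical.

    // For cases where the feature takes the given value, estimate the probability
    // of each class 0 .. k-1, and report the fraction of all cases with that value.
    def frequency(col: Array[(Int, Int)], value: Int, k: Int): (Double, Array[Double]) = {
      val matching = col.filter(_._1 == value)
      val fraction = matching.length.toDouble / col.length
      val prob = Array.tabulate(k) { c =>
        if (matching.isEmpty) 0.0 else matching.count(_._2 == c).toDouble / matching.length
      }
      (fraction, prob)
    }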
Compute the information gain due to using the values of a feature/attribute to distinguish the training cases (e.g., how well Humidity, with its values Normal and High, indicates whether one will play tennis).
the feature to consider (e.g., 2 (Humidity))
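This is the standard ID3 gain: the entropy of the class distribution before the split minus the entropy after the split, weighted by how often each feature value occurs. A self-contained sketch under the same assumed column representation:

    // gain = H(classes) - sum over values v of P(f = v) * H(classes | f = v)
    def gain(col: Array[(Int, Int)], values: Range, k: Int): Double = {
      def h(p: Array[Double]): Double =                                  // base-2 entropy
        -p.filter(_ > 0.0).map(pi => pi * math.log(pi) / math.log(2.0)).sum
      def classProb(rows: Array[(Int, Int)]): Array[Double] =
        Array.tabulate(k)(c => rows.count(_._2 == c).toDouble / rows.length)
      val before = h(classProb(col))
      val after = values.map { v =>
        val rows = col.filter(_._1 == v)
        if (rows.isEmpty) 0.0
        else (rows.length.toDouble / col.length) * h(classProb(rows))
      }.sum
      before - after
    }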
Find the most frequent classification.
array of discrete classifications
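A one-line sketch (ties broken arbitrarily):

    // Most frequent classification in the array.
    def mode(y: Array[Int]): Int = y.groupBy(identity).maxBy(_._2.length)._1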
Train the decision tree.
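At a high level, ID3 training is the usual recursion: if the remaining cases all share one class, emit a leaf; if no features remain, emit a leaf with the most frequent class; otherwise split on the feature with the highest information gain and recurse on each of its values. The sketch below reuses the hypothetical Node, gain, and mode helpers from the earlier sketches and is not the class's actual code.

    // rows: indices of the remaining training cases; feats: features not yet used.
    def buildTree(x: Array[Array[Int]], y: Array[Int], vc: Array[Int],
                  rows: Seq[Int], feats: Set[Int]): Node = {
      val classes = rows.map(i => y(i))
      val k = y.max + 1
      if (classes.distinct.size == 1) Leaf(classes.head)                 // pure: one class left
      else if (feats.isEmpty) Leaf(mode(classes.toArray))                // no features left
      else {
        val best = feats.maxBy(f => gain(rows.map(i => (x(i)(f), y(i))).toArray, 0 until vc(f), k))
        val children = (0 until vc(best)).map { v =>
          val sub = rows.filter(i => x(i)(best) == v)
          v -> (if (sub.isEmpty) Leaf(mode(classes.toArray))
                else buildTree(x, y, vc, sub, feats - best))
        }.toMap
        Internal(best, children)
      }
    }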
This class implements a Decision Tree classifier using the ID3 algorithm. The classifier is trained using a data matrix x and a classification vector y. Each data vector in the matrix is classified into one of k classes numbered 0, ..., k-1. Each column in the matrix represents a feature (e.g., Humidity). The vc array gives the number of distinct values per feature (e.g., 2 for Humidity).
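As an illustration of that layout, a few rows of the classic play-tennis data could be coded as integers as follows; the encoding (and the row subset) is hypothetical, chosen only to show the shapes of x, y, and vc.

    // Features: Outlook {0 Sunny, 1 Overcast, 2 Rain}, Temp {0 Hot, 1 Mild, 2 Cool},
    //           Humidity {0 Normal, 1 High}, Wind {0 Weak, 1 Strong}.
    val x: Array[Array[Int]] = Array(
      Array(0, 0, 1, 0),        // Sunny,    Hot,  High, Weak
      Array(0, 0, 1, 1),        // Sunny,    Hot,  High, Strong
      Array(1, 0, 1, 0),        // Overcast, Hot,  High, Weak
      Array(2, 1, 1, 0))        // Rain,     Mild, High, Weak
    val y:  Array[Int] = Array(0, 0, 1, 1)      // classifications: 0 = no, 1 = yes
    val vc: Array[Int] = Array(3, 3, 2, 2)      // distinct values per feature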