scalation.analytics

DecisionTreeC45

class DecisionTreeC45 extends ClassifierInt

The DecisionTreeC45 class implements a Decision Tree classifier using the C4.5 algorithm. The classifier is trained using a data matrix 'x' and a classification vector 'y'. Each data vector in the matrix is classified into one of 'k' classes numbered '0, ..., k-1'. Each column in the matrix represents a feature (e.g., Humidity). The 'vc' array gives the number of distinct values per feature (e.g., 2 for Humidity).

Linear Supertypes
ClassifierInt, Error, Classifier, AnyRef, Any

Instance Constructors

  1. new DecisionTreeC45(x: MatrixI, y: VectorI, fn: Array[String], isCont: Array[Boolean], k: Int, cn: Array[String], vc: VectorI = null)

    x

    the data vectors stored as rows of a matrix

    y

    the class array, where y_i = class for row i of the matrix x

    fn

    the names for all features/variables

    isCont

    boolean flags indicating whether the corresponding feature is continuous

    k

    the number of classes

    cn

    the names for all classes

    vc

    the value count array indicating number of distinct values per feature
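
The 'vc' argument can be derived from the data itself. Below is a minimal sketch of that derivation, using plain Scala arrays in place of ScalaTion's MatrixI and VectorI (an assumption for illustration; the real constructor takes ScalaTion types):

```scala
// Sketch: derive the value-count array 'vc' from raw data.
// Plain Scala collections stand in for ScalaTion's MatrixI/VectorI.
val x = Array(                 // rows = training cases, columns = features
  Array(0, 1),                 // e.g., Outlook = 0, Humidity = 1
  Array(1, 0),
  Array(2, 1))

// vc(j) = number of distinct values appearing in feature column j
val vc = (0 until x(0).length).map(j => x.map(_(j)).distinct.length)
// vc == Vector(3, 2): 3 Outlook values, 2 Humidity values
```

Passing `vc = null` instead falls back to `vc_default`, which assumes binary features (see `vc_default` below).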

Type Members

  1. class Node extends AnyRef

    Class that contains information for a tree node.

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def buildTree(opt: (Int, Double)): Unit

    Given the next most distinguishing feature/attribute, extend the decision tree.

    opt

    the optimal feature and its gain

  6. def calThreshold(f: Int): Unit

    Given a continuous feature, adjust its threshold to improve gain.

    f

    the feature index to consider
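
The documentation does not spell out how candidate thresholds are generated; a common C4.5-style approach, sketched below as an assumption, is to try the midpoints between consecutive distinct sorted values of the column and keep whichever maximizes gain:

```scala
// Sketch (assumed, not ScalaTion's actual implementation): candidate
// thresholds for a continuous feature are the midpoints between
// consecutive distinct sorted values; each would then be scored by gain.
val humidity   = Array(65.0, 70.0, 70.0, 75.0, 80.0)   // a continuous column
val sortedVals = humidity.distinct.sorted               // 65, 70, 75, 80
val candidates = sortedVals.zip(sortedVals.tail).map { case (a, b) => (a + b) / 2 }
// candidates: 67.5, 72.5, 77.5
```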

  7. def classify(z: VectorD): (Int, String)

    Given a data vector z, classify it returning the class number (0, ..., k-1) by following a decision path from the root to a leaf.

    z

    the data vector to classify (some continuous features)

    Definition Classes
    DecisionTreeC45 → ClassifierInt → Classifier
  8. def classify(z: VectorI): (Int, String)

    Given a data vector z, classify it returning the class number (0, ..., k-1) by following a decision path from the root to a leaf.

    z

    the data vector to classify (purely discrete features)

    Definition Classes
    DecisionTreeC45 → Classifier
  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. def flaw(method: String, message: String): Unit

    Show the flaw by printing the error message.

    method

    the method where the error occurred

    message

    the error message

    Definition Classes
    Error
  14. def frequency(fCol: VectorI, value: Int, cont: Boolean = false, thres: Double = 0): (Double, VectorD)

    Given a feature column (e.g., 2 (Humidity)) and a value (e.g., 1 (High)), use the frequency of occurrence of the value for each classification (e.g., 0 (no), 1 (yes)) to estimate k probabilities. Also, determine the fraction of training cases where the feature has this value (e.g., fraction where Humidity is High = 7/14).

    fCol

    a feature column to consider (e.g., Humidity)

    value

    one of the possible values for this feature (e.g., 1 (High))

    cont

    indicates whether the feature is continuous

    thres

    threshold for continuous feature
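
The computation can be sketched in plain Scala for the discrete case. The data below mirrors the classic play-tennis example from the description (7 of 14 cases with Humidity = High); the array names are illustrative, not ScalaTion's:

```scala
// Sketch of what 'frequency' computes for a discrete feature value:
// the fraction of cases matching the value, and the class probabilities
// estimated among the matching cases.
val humidity = Array(1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1)  // 1 = High
val play     = Array(0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0)  // 1 = yes
val k        = 2                                                 // classes

val rows  = humidity.indices.filter(humidity(_) == 1)           // matching cases
val frac  = rows.length.toDouble / humidity.length              // 7/14 = 0.5
val probs = (0 until k).map(c => rows.count(play(_) == c).toDouble / rows.length)
// probs: 4/7 for class 0 (no), 3/7 for class 1 (yes)
```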

  15. def gain(f: Int): Double

    Compute the information gain due to using the values of a feature/attribute to distinguish the training cases (e.g., how well does Humidity with its values Normal and High indicate whether one will play tennis).

    f

    the feature to consider (e.g., 2 (Humidity))
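
The description implies the standard entropy-based information gain, gain(f) = H(y) - Σ_v (|y_v| / |y|) · H(y_v), though the exact formula is not spelled out here. A self-contained sketch under that assumption:

```scala
// Sketch of entropy-based information gain (assumed standard formula;
// not ScalaTion's actual implementation).

// H(y): Shannon entropy (base 2) of a class vector over k classes
def entropy(y: Seq[Int], k: Int): Double =
  (0 until k).map { c =>
    val p = y.count(_ == c).toDouble / y.length
    if (p == 0.0) 0.0 else -p * math.log(p) / math.log(2)
  }.sum

// gain = entropy before the split minus weighted entropy after
// splitting on each distinct value of the feature column
def gain(col: Seq[Int], y: Seq[Int], k: Int): Double =
  entropy(y, k) - col.distinct.map { v =>
    val yv = y.indices.collect { case i if col(i) == v => y(i) }
    (yv.length.toDouble / y.length) * entropy(yv, k)
  }.sum
```

A perfectly predictive feature recovers the full entropy of y (gain 1.0 for a balanced binary y), while an uninformative one yields gain 0.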

  16. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  17. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  18. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  19. val m: Int

    the number of data vectors in training-set (# rows)

    Attributes
    protected
    Definition Classes
    ClassifierInt
  20. val md: Double

    the training-set size as a Double

    Attributes
    protected
    Definition Classes
    ClassifierInt
  21. val n: Int

    the number of features/variables (# columns)

    Attributes
    protected
    Definition Classes
    ClassifierInt
  22. val nd: Double

    the feature-set size as a Double

    Attributes
    protected
    Definition Classes
    ClassifierInt
  23. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  24. def nextXY(f: Int, value: Int): (MatrixI, VectorI)

    Return the new x matrix and y array for the next step of constructing the decision tree.

    f

    the feature index

    value

    one of the feature's values

  25. final def notify(): Unit

    Definition Classes
    AnyRef
  26. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  27. def printTree: Unit

    Print out the decision tree using Breadth First Search (BFS).
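
A BFS traversal of this kind can be sketched with a minimal node type (ScalaTion's `Node` carries more fields; `TNode` and `printBFS` here are illustrative names):

```scala
// Sketch of a breadth-first tree walk like the one printTree performs:
// visit the root, then all depth-1 nodes, then depth-2, and so on.
case class TNode(label: String, children: List[TNode] = Nil)

def printBFS(root: TNode): List[String] = {
  val out   = scala.collection.mutable.ListBuffer[String]()
  val queue = scala.collection.mutable.Queue(root)
  while (queue.nonEmpty) {
    val nd = queue.dequeue()
    out += nd.label               // "print" the node in visit order
    queue ++= nd.children         // enqueue children for the next level
  }
  out.toList
}

val tree  = TNode("Outlook", List(TNode("Humidity"), TNode("Wind")))
val order = printBFS(tree)        // List("Outlook", "Humidity", "Wind")
```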

  28. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  29. def test(xx: MatrixI, yy: VectorI): Double

    Test the quality of the training with a test-set and return the fraction of correct classifications.

    xx

    the integer-valued test vectors stored as rows of a matrix

    yy

    the test classification vector, where yy_i = class for row i of xx

    Definition Classes
    ClassifierInt
  30. def toString(): String

    Definition Classes
    AnyRef → Any
  31. def train(): Unit

    Train the classifier, i.e., determine which feature provides the most information gain and select it as the root of the decision tree.

    Definition Classes
    DecisionTreeC45 → Classifier
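
The root-selection step described above amounts to an argmax over per-feature gains. A sketch with made-up gain values (the classic play-tennis figures, here hard-coded for illustration rather than computed from data):

```scala
// Sketch of root selection: evaluate gain for each feature, pick the max.
// Gains below are illustrative constants, not computed by this snippet.
val gains = Vector(0.247, 0.029, 0.152, 0.048)  // Outlook, Temp, Humidity, Wind

val (bestGain, bestFeature) = gains.zipWithIndex.maxBy(_._1)
// bestFeature == 0 (Outlook), so Outlook would become the root
```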
  32. def vc_default: VectorI

    Return default values for binary input data (value count (vc) set to 2).

    Definition Classes
    ClassifierInt
  33. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  34. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  35. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. val x: MatrixI

    the data vectors stored as rows of a matrix

  37. val y: VectorI

    the class array, where y_i = class for row i of the matrix x

Inherited from ClassifierInt

Inherited from Error

Inherited from Classifier

Inherited from AnyRef

Inherited from Any

Ungrouped