DecisionTree

Compute the accuracy of the classification, i.e., the fraction of correct classifications. Note, the correct classifications tp_i are in the main diagonal of the confusion matrix.

Attributes

Inherited from:: FitC
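
As a rough illustration of the description above, accuracy can be read directly off a confusion matrix by dividing the sum of the main diagonal by the sum of all entries. The sketch below uses a plain array of rows rather than ScalaTion's matrix type, and the name accuracyOf is hypothetical.

```scala
// Accuracy = (sum of the main diagonal) / (sum of all entries).
// The matrix is a plain Array of rows here, not ScalaTion's matrix type.
def accuracyOf(cmat: Array[Array[Int]]): Double = {
  val correct = cmat.indices.map(i => cmat(i)(i)).sum   // correct classifications tp_i
  val total   = cmat.map(_.sum).sum                     // all classified instances
  if (total == 0) 0.0 else correct.toDouble / total     // empty matrix: report 0 (assumption)
}
```

For example, a 2-by-2 matrix with rows (5, 1) and (2, 2) has 7 correct classifications out of 10.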

Add multiple child nodes to the tree via branches from node 'n'.

Value parameters

n: the parent node
vc: the branch value and child node, repeatable

Attributes

Inherited from:: DecisionTree

Add child node c to the tree via branch v from node n.

Value parameters

c: the child node
n: the parent node
v: the branch value from the parent node

Attributes

Inherited from:: DecisionTree

Add the root node to the tree.

Value parameters

r: the root node of the tree

Attributes

Inherited from:: DecisionTree

Perform backward elimination to find the least predictive variable to remove from the existing model, returning the variable to eliminate, the new parameter vector and the new Quality of Fit (QoF). May be called repeatedly.

Value parameters

cols: the columns of matrix x currently included in the existing model
first: first variable to consider for elimination (default 1, which assumes the intercept x_0 will be in any model)
idx_q: index of Quality of Fit (QoF) to use for comparing quality

Attributes

See also: Fit for index of QoF measures.
Inherited from:: Classifier

Perform backward elimination to find the least predictive variables to remove from the full model, returning the variables left and the new Quality of Fit (QoF) measures for all steps.

Value parameters

cross: whether to include the cross-validation QoF measure
first: first variable to consider for elimination
idx_q: index of Quality of Fit (QoF) to use for comparing quality

Attributes

See also: Fit for index of QoF measures.
Inherited from:: Classifier

Of all the pruning candidates, find the one with the least gain.

Value parameters

can: the nodes that are candidates for pruning

Attributes

Inherited from:: DecisionTree

Build a sub-model that is restricted to the given columns of the data matrix. Override for models that support feature selection.

Value parameters

x_cols: the columns that the new model is restricted to

Attributes

Inherited from:: Classifier

Calculate the entropy of the tree as the weighted average over the list of nodes (defaults to leaves).

Value parameters

nodes: the nodes to compute the weighted entropy over

Attributes

Inherited from:: DecisionTree
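
One plausible reading of "weighted average" is that each node's entropy is weighted by the share of instances that reach it. The sketch below assumes that reading, represents each node only by its per-class instance counts, and measures entropy in bits; the names nodeEntropy and weightedEntropy are hypothetical.

```scala
// Entropy (in bits) of one node, given its per-class instance counts.
def nodeEntropy(counts: Array[Int]): Double = {
  val n = counts.sum.toDouble
  if (n == 0.0) 0.0
  else counts.filter(_ > 0).map { c =>
    val p = c / n
    -p * math.log(p) / math.log(2.0)       // -p log2(p)
  }.sum
}

// Weighted average entropy over a list of nodes,
// each weighted by its share of the total instance count.
def weightedEntropy(nodes: Seq[Array[Int]]): Double = {
  val total = nodes.map(_.sum).sum.toDouble
  if (total == 0.0) 0.0
  else nodes.map(cnt => cnt.sum / total * nodeEntropy(cnt)).sum
}
```

A pure leaf contributes zero, so a tree whose leaves are all pure has a weighted entropy of zero.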

Find candidate nodes that may be pruned, i.e., those that are parents of leaf nodes, restricted to those that don't have any children that are themselves internal nodes.

Attributes

Inherited from:: DecisionTree
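
The candidate rule can be sketched on a deliberately minimal tree. The TNode class below is a hypothetical stand-in (ScalaTion's Node class carries more information), and pruneCandidates is an illustrative name.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical minimal node type: a name plus a list of children.
final class TNode(val name: String) {
  val children = ArrayBuffer.empty[TNode]
  def isLeaf: Boolean = children.isEmpty
}

// A node is a pruning candidate if it is internal and every child is a leaf.
def pruneCandidates(root: TNode): List[TNode] = {
  def visit(n: TNode): List[TNode] =
    if (n.isLeaf) Nil                                   // leaves cannot be pruned
    else if (n.children.forall(_.isLeaf)) List(n)       // all children are leaves
    else n.children.toList.flatMap(visit)               // look deeper in the tree
  visit(root)
}
```

A node with a mix of leaf and internal children is skipped, and only its internal descendants are examined.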

Given a continuous data vector z, classify it returning the class number (0, ..., k-1) with the highest relative posterior probability. Return the best class, its name and its relative probability.

Value parameters

z: the data vector to classify

Attributes

Inherited from:: Classifier

Given a discrete data vector z, classify it returning the class number (0, ..., k-1) with the highest relative posterior probability. Return the best class, its name and its relative probability.

Value parameters

z: the data vector to classify

Attributes

Inherited from:: Classifier

Clear the total cumulative confusion matrix.

Attributes

Inherited from:: FitC

Compare the actual class y vector versus the predicted class yp vector, returning the confusion matrix cmat, which for k = 2 is

          yp  0    1
           ----------
    y  0  | tn   fp |
       1  | fn   tp |
           ----------

Note: ScalaTion's confusion matrix is Actual × Predicted, but to swap the position of actual y (rows) with predicted yp (columns) simply use cmat.transpose, the transpose of cmat.

Value parameters

y_: the actual class values/labels for full (y) or test (y_e) dataset
yp: the predicted class values/labels

Attributes

See also: www.dataschool.io/simple-guide-to-confusion-matrix-terminology
Inherited from:: FitC
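
A minimal sketch of how such an Actual × Predicted matrix could be tallied, using plain arrays instead of ScalaTion's vector and matrix types. The function name confusionMatrix and the explicit class count k are assumptions made for this illustration.

```scala
// Build a k-by-k confusion matrix with actual classes as rows
// and predicted classes as columns.
def confusionMatrix(y: Array[Int], yp: Array[Int], k: Int): Array[Array[Int]] = {
  require(y.length == yp.length, "actual and predicted vectors must have equal length")
  val cmat = Array.ofDim[Int](k, k)
  for (i <- y.indices) cmat(y(i))(yp(i)) += 1   // row = actual, column = predicted
  cmat
}
```

With k = 2, entry (0, 0) counts tn, (0, 1) counts fp, (1, 0) counts fn and (1, 1) counts tp, matching the layout described above.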

Contrast the actual class y_ vector versus the predicted class yp vector.

Value parameters

y_: the actual class values/labels for full (y) or test (y_e) dataset
yp: the predicted class values/labels

Attributes

Inherited from:: FitC

Attributes

Inherited from:: Classifier

Diagnose the health of the model by computing the Quality of Fit (QoF) measures, from the error/residual vector and the predicted & actual responses. Requires the actual and predicted responses to be non-negative integers. Must override when there are negative responses.

Value parameters

y_: the actual response/output vector to use (test/full)
yp: the predicted response/output vector (test/full)

Attributes

Inherited from:: FitC

Diagnose the health of the model by computing the Quality of Fit (QoF) measures, from the error/residual vector and the predicted & actual responses. For some models the instances may be weighted.

Value parameters

w: the weights on the instances (defaults to null)
y_: the actual response/output vector to use (test/full)
yp: the predicted response/output vector (test/full)

Attributes

Definition Classes: FitC -> FitM
Inherited from:: FitC

Compute the F1-measure, i.e., the harmonic mean of the precision and recall.

Value parameters

p: the precision
r: the recall

Attributes

Inherited from:: FitC
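
The harmonic mean of precision p and recall r is 2pr / (p + r). A small sketch follows; the name f1Measure and the choice to return 0 when both inputs are 0 are assumptions of this example, not documented behaviour.

```scala
// F1 = harmonic mean of precision p and recall r.
def f1Measure(p: Double, r: Double): Double =
  if (p + r == 0.0) 0.0                 // avoid 0/0 when both are zero (assumption)
  else 2.0 * p * r / (p + r)
```

Because it is a harmonic mean, the result is pulled toward the smaller of the two inputs.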

Compute the micro-F1-measure vector, i.e., the harmonic mean of the precision and recall.

Attributes

Inherited from:: FitC

Return the Quality of Fit (QoF) measures corresponding to the labels given above in the fitLabel method.

Attributes

Inherited from:: FitC

Return the labels for the Quality of Fit (QoF) measures. Override to add additional QoF measures.

Attributes

Inherited from:: FitC

Return the Quality of Fit (QoF) vector micro-measures, i.e., measures for each class.

Attributes

Inherited from:: FitC

Perform forward selection to find the most predictive variable to add to the existing model, returning the variable to add and the new model. May be called repeatedly.

Value parameters

cols: the columns of matrix x currently included in the existing model
idx_q: index of Quality of Fit (QoF) to use for comparing quality

Attributes

See also: Fit for index of QoF measures.
Inherited from:: Classifier
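
The idea of a single forward-selection step can be sketched independently of any particular model: try each column not yet included, score the enlarged column set, and keep the best. In the sketch below the score function stands in for fitting a model and reading off the chosen QoF measure; it assumes higher scores are better, and forwardStep is a hypothetical name.

```scala
// One greedy forward-selection step: try each unused column and keep the one
// whose addition yields the highest score (higher is assumed to be better).
def forwardStep(cols: Set[Int], nCols: Int, score: Set[Int] => Double): Option[(Int, Double)] = {
  val candidates = (0 until nCols).filterNot(cols.contains)
  if (candidates.isEmpty) None                                      // nothing left to add
  else Some(candidates.map(j => (j, score(cols + j))).maxBy(_._2))  // best column and its score
}
```

Calling such a step repeatedly, each time with the enlarged column set, yields the usual greedy forward-selection loop.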

Perform forward selection to find the most predictive variables to have in the model, returning the variables added and the new Quality of Fit (QoF) measures for all steps.

Value parameters

cross: whether to include the cross-validation QoF measure
idx_q: index of Quality of Fit (QoF) to use for comparing quality

Attributes

See also: Fit for index of QoF measures.
Inherited from:: Classifier

Return the feature/variable names.

Attributes

Inherited from:: Classifier

Return the used data matrix x. Mainly for derived classes where x is expanded from the given columns in x_, e.g., SymbolicRegression.quadratic adds squared columns.

Attributes

Inherited from:: Classifier

Return the used response vector y. Mainly for derived classes where y is transformed, e.g., TranRegression, Regression4TS.

Attributes

Inherited from:: Classifier

Return the help string that describes the Quality of Fit (QoF) measures provided by the FitC class. Override to correspond to fitLabel.

Attributes

Inherited from:: FitC

Return the hyper-parameters.

Attributes

Inherited from:: Classifier

Compute Cohen's kappa coefficient that measures agreement between actual y and predicted yp classifications.

Value parameters

y_: the actual response/output vector to use (test/full)
yp: the predicted response/output vector (test/full)

Attributes

See also: en.wikipedia.org/wiki/Cohen%27s_kappa
Inherited from:: FitC
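
Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following sketch works on plain integer arrays; the name cohenKappa and the explicit class count k are assumptions of this illustration. When p_e equals 1 the ratio is undefined and this sketch returns NaN.

```scala
// Cohen's kappa from actual labels y and predicted labels yp over classes 0 until k.
def cohenKappa(y: Array[Int], yp: Array[Int], k: Int): Double = {
  require(y.length == yp.length && y.nonEmpty, "need two non-empty vectors of equal length")
  val n  = y.length.toDouble
  val po = y.indices.count(i => y(i) == yp(i)) / n          // observed agreement
  val pe = (0 until k).map { c =>
    (y.count(_ == c) / n) * (yp.count(_ == c) / n)          // chance agreement on class c
  }.sum
  (po - pe) / (1.0 - pe)
}
```

A value of 1 indicates perfect agreement, while 0 indicates agreement no better than chance.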

Given a continuous data vector z, classify it returning the class number (0, ..., k-1) with the highest relative posterior probability. Return the best class, its name and its relative log-probability. This method adds "positive log probabilities" to avoid underflow. To recover the relative probability, compute 2^(-q), where q is a plog.

Value parameters

z: the data vector to classify

Attributes

Inherited from:: Classifier
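
Taking the recovery formula 2^(-q) at face value, a "positive log probability" is q = -log2(p), so multiplying probabilities becomes adding plogs. The helper names below (plog, fromPlog, jointPlog) are hypothetical and only illustrate the arithmetic.

```scala
// Positive log probability: q = -log2(p), non-negative for p in (0, 1].
def plog(p: Double): Double = -math.log(p) / math.log(2.0)

// Recover the probability from a plog: p = 2^(-q).
def fromPlog(q: Double): Double = math.pow(2.0, -q)

// A product of probabilities becomes a sum of plogs, which avoids underflow.
def jointPlog(probs: Seq[Double]): Double = probs.map(plog).sum
```

Note that under this convention the class with the highest probability is the one with the smallest plog.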

Given a discrete data vector z, classify it returning the class number (0, ..., k-1) with the highest relative posterior probability. Return the best class, its name and its relative log-probability. This method adds "positive log probabilities" to avoid underflow. To recover the relative probability, compute 2^(-q), where q is a plog.

Value parameters

z: the data vector to classify

Attributes

Inherited from:: Classifier

Determine whether all the children of node n are leaf nodes.

Value parameters

n: the node in question

Attributes

Inherited from:: DecisionTree

Attributes

Inherited from:: Classifier

Predict the integer value of y = f(z) by computing the product of the class probabilities p_y and all the conditional probabilities P(X_j = z_j | y = c) and returning the class with the highest relative probability. This method adds "positive log probabilities" to avoid underflow. To recover the relative probability, compute 2^(-q), where q is a plog.

Value parameters

z: the new vector to predict

Attributes

Inherited from:: Classifier

As part of tree pruning, turn an internal node into a leaf.

Value parameters

n: the node to turn into a leaf (pruning all nodes below it)

Attributes

Inherited from:: DecisionTree

Return the number of terms/parameters in the model, e.g., b_0 + b_1 x_1 + b_2 x_2 has three terms.

Attributes

Inherited from:: Classifier

Compute the micro-precision, micro-recall and micro-specificity vectors which have elements for each class i in {0, 1, ... k-1}. Precision is the fraction classified as true that are actually true. Recall (sensitivity) is the fraction of the actually true that are classified as true. Specificity is the fraction of the actually false that are classified as false. Note, for k = 2, ordinary precision p, recall r and specificity s will correspond to the last elements in the pv, rv and sv micro vectors.

Attributes

Inherited from:: FitC
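
The per-class definitions can be sketched from a k-by-k confusion matrix whose rows are actual classes and whose columns are predicted classes, treating class i as "true" and all other classes as "false". The function name microPRS, the plain-array types, and the choice to report 0 for an empty denominator are assumptions of this example.

```scala
// Per-class precision, recall and specificity from a k-by-k confusion matrix
// (rows = actual classes, columns = predicted classes).
def microPRS(cmat: Array[Array[Int]]): (Array[Double], Array[Double], Array[Double]) = {
  val k     = cmat.length
  val total = cmat.map(_.sum).sum
  def ratio(num: Int, den: Int): Double = if (den == 0) 0.0 else num.toDouble / den
  val pv = new Array[Double](k)
  val rv = new Array[Double](k)
  val sv = new Array[Double](k)
  for (i <- 0 until k) {
    val tp = cmat(i)(i)                                  // actual i, predicted i
    val fn = cmat(i).sum - tp                            // actual i, predicted other
    val fp = (0 until k).map(j => cmat(j)(i)).sum - tp   // actual other, predicted i
    val tn = total - tp - fn - fp                        // everything else
    pv(i) = ratio(tp, tp + fp)
    rv(i) = ratio(tp, tp + fn)
    sv(i) = ratio(tn, tn + fp)
  }
  (pv, rv, sv)
}
```

For k = 2, the entries at index 1 are the ordinary precision, recall and specificity for the positive class.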

Predict the value of y = f(z) by evaluating the model equation. Single output models return Double, while multi-output models return VectorD.

Value parameters

z: the new vector to predict

Attributes

Inherited from:: Classifier

Predict the value of vector y = f(x_) using matrix x_.

Value parameters

x_: the matrix to use for making predictions, one for each row

Attributes

Inherited from:: Classifier

Auxiliary predict method facilitating recursion for VectorI.

Value parameters

n: the current node in the tree
z: the data vector to classify

Attributes

Inherited from:: DecisionTree
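
Recursive classification on a decision tree amounts to following, at each internal node, the branch that matches the data vector's value for that node's feature until a leaf is reached. The node types below (DNode, Leaf, Split) and the fallback label for unseen branch values are hypothetical simplifications, not ScalaTion's actual classes.

```scala
// Hypothetical node types: a leaf carries a class label; an internal node tests
// one feature and has one child per branch value.
sealed trait DNode
final case class Leaf(label: Int) extends DNode
final case class Split(feature: Int, branches: Map[Int, DNode], default: Int) extends DNode

// Recursively follow the branch matching z's value for the node's feature.
def classifyRec(n: DNode, z: Array[Int]): Int = n match {
  case Leaf(label) => label
  case Split(f, branches, default) =>
    branches.get(z(f)) match {
      case Some(child) => classifyRec(child, z)   // descend along the matching branch
      case None        => default                 // unseen branch value: fall back (assumption)
    }
}
```

The recursion depth is bounded by the height of the tree, which for discrete features is at most the number of features.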

Auxiliary classify method facilitating recursion for VectorD.

Value parameters

n: the current node in the tree
z: the data vector to classify

Attributes

Inherited from:: DecisionTree

Print the decision tree using 'prinT' method from Node class.

Attributes

Inherited from:: DecisionTree

Compute Efron's pseudo R-squared value. Override to use McFadden's, etc.

Value parameters

p1: the first parameter
p2: the second parameter

Attributes

Inherited from:: FitC

Attributes

Inherited from:: FitM

Return the coefficient of determination (R^2). Must call diagnose first.

Attributes

Inherited from:: FitM

Return a basic report on a trained and tested model.

Value parameters

ftVec: the vector of qof values produced by the FitC trait

Attributes

Definition Classes: Classifier -> Model
Inherited from:: Classifier

Return a basic report on a trained and tested multi-variate model.

Value parameters

ftMat: the matrix of qof values produced by the Fit trait

Attributes

Inherited from:: Model

Return the vector of residuals/errors.

Attributes

Inherited from:: Classifier

Perform feature selection to find the most predictive variables to have in the model, returning the variables added and the new Quality of Fit (QoF) measures for all steps.

Value parameters

cross: whether to include the cross-validation QoF measure
idx_q: index of Quality of Fit (QoF) to use for comparing quality
tech: the feature selection technique to apply

Attributes

See also: Fit for index of QoF measures.
Inherited from:: Classifier

Return the sum of the squares for error (sse). Must call diagnose first.

Attributes

Inherited from:: FitM

Perform stepwise regression to find the most predictive variables to have in the model, returning the variables left and the new Quality of Fit (QoF) measures for all steps. At each step it calls forwardSel and backwardElim and takes the best of the two actions. Stops when neither action yields improvement.

Value parameters

cross: whether to include the cross-validation QoF measure
idx_q: index of Quality of Fit (QoF) to use for comparing quality

Attributes

See also: Fit for index of QoF measures.
Inherited from:: Classifier

Test/evaluate the model's Quality of Fit (QoF) and return the predictions and QoF vectors. This may include the importance of its parameters (e.g., if 0 is in a parameter's confidence interval, it is a candidate for removal from the model). Extending traits and classes should implement various diagnostics for the test and full (training + test) datasets.

Value parameters

x_: the testing/full data/input matrix (impl. classes may default to x)
y_: the testing/full response/output vector (impl. classes may default to y)

Attributes

Inherited from:: Classifier

Return the indices for the test-set.

Value parameters

n_test: the size of test-set
rando: whether to select indices randomly or in blocks

Attributes

See also: scalation.mathstat.TnT_Split
Inherited from:: Classifier
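
The difference between random and block selection can be illustrated with a small stand-alone function. This is only a sketch: the name testIndices, the seed parameter, and the choice of the last n_test rows as the block are assumptions, and scalation.mathstat.TnT_Split may behave differently.

```scala
import scala.util.Random

// Pick nTest of the n row indices for the test set, either at random
// or as one contiguous block (here: the last nTest rows, an assumption).
def testIndices(n: Int, nTest: Int, rando: Boolean, seed: Long = 0L): IndexedSeq[Int] = {
  require(0 <= nTest && nTest <= n, "test-set size must be between 0 and n")
  if (rando) new Random(seed).shuffle((0 until n).toVector).take(nTest).sorted
  else (n - nTest) until n
}
```

The remaining indices then form the training set.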

Return the confusion matrix for k = 2 as a tuple (tn, fp, fn, tp).

Value parameters

con: the confusion matrix (defaults to cmat)

Attributes

Inherited from:: FitC

Return a copy of the total cumulative confusion matrix tcmat and clear tcmat.

Attributes

Inherited from:: FitC

Train the model 'y_ = f(x_) + e' on a given dataset, by optimizing the model parameters in order to minimize error '||e||' or maximize log-likelihood 'll'.

Value parameters

x_: the training/full data/input matrix (impl. classes may default to x)
y_: the training/full response/output vector (impl. classes may default to y)

Attributes

Inherited from:: Classifier

The train2 method should work like the train method, but should also optimize hyper-parameters (e.g., shrinkage or learning rate). Only implementing classes needing this capability should override this method.

Value parameters

x_: the training/full data/input matrix (defaults to full x)
y_: the training/full response/output vector (defaults to full y)

Attributes

Inherited from:: Classifier

Train and test the predictive model y_ = f(x_) + e and report its QoF and plot its predictions.

Value parameters

x_: the training/full data/input matrix (defaults to full x)
xx: the testing/full data/input matrix (defaults to full x)
y_: the training/full response/output vector (defaults to full y)
yy: the testing/full response/output vector (defaults to full y)

Attributes

Inherited from:: Classifier

Attributes

Inherited from:: Classifier

Compute the Variance Inflation Factor (VIF) for each variable to test for multi-collinearity by regressing x_j against the rest of the variables. A VIF over 50 indicates that over 98% of the variance of x_j can be predicted from the other variables, so x_j may be a candidate for removal from the model. Note: override this method to use a superior regression technique.

Value parameters

skip: the number of columns of x at the beginning to skip in computing VIF

Attributes

Inherited from:: Classifier
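
The threshold quoted above follows from the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination obtained by regressing x_j on the other variables. The one-line helper below (a hypothetical name) only checks that arithmetic and does not perform the regression itself.

```scala
// Variance Inflation Factor from the R^2 of regressing x_j on the other variables.
def vifFromRSq(rSq: Double): Double = 1.0 / (1.0 - rSq)
```

An R^2 of 0.98 gives a VIF of about 50, which matches the 98% figure in the description.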

DecisionTree_ID3
