The Classifier trait provides a framework for multiple predictive analytics techniques, e.g., NaiveBayes. The input x is multi-dimensional [1, x_1, ..., x_k]. Fitting estimates p_y, the response probability mass function (pmf), which serves as the analog of the parameter vector.
Value parameters
- cname: the names/labels for each class
- fname: the feature/variable names (if null, use x_j's)
- hparam: the hyper-parameters for the model
- k: the number of classes (categorical response values)
- x: the input/data m-by-n matrix
- y: the response/output m-vector (class values where y(i) = class for instance i)
Attributes
- Companion: object
- Known subtypes: class BaggingTrees, class RandomForest, class DecisionTree_C45, class DecisionTree_C45wp, class DecisionTree_ID3, class DecisionTree_ID3wp, class HiddenMarkov, class KNN_Classifier, class LinDiscAnalyis, class NaiveBayes, class NaiveBayesR, class NeuralNet_Class_3L, class NullModel, class SimpleLDA, class SimpleLogisticRegression, class LogisticRegression, class SupportVectorMachine, class TANBayes
Members list
Type members
Classlikes
The BestStep is used to record the best improvement step found so far.
Value parameters
- col: the column/variable to ADD/REMOVE for this step
- mod: the model including selected features/variables for this step
- qof: the Quality of Fit (QoF) for this step
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Value members
Abstract methods
Test the predictive model y_ = f(x_) + e and return its predictions and QoF vector. Testing may be in-sample (on the full dataset) or out-of-sample (on the testing set) as determined by the parameters passed in. Note: train must be called before test.
Value parameters
- x_: the testing/full data/input matrix (defaults to full x)
- y_: the testing/full response/output vector (defaults to full y)
Concrete methods
Perform backward elimination to find the least predictive variable to remove from the existing model, returning the variable to eliminate, the new parameter vector and the new Quality of Fit (QoF). May be called repeatedly.
Value parameters
- cols: the columns of matrix x currently included in the existing model
- first: the first variable to consider for elimination (default 1, assuming the intercept x_0 will be in any model)
- idx_q: the index of the Quality of Fit (QoF) measure to use for comparing quality
Attributes
- See also: Fit for index of QoF measures.
Perform backward elimination to find the least predictive variables to remove from the full model, returning the variables left and the new Quality of Fit (QoF) measures for all steps.
Value parameters
- cross: whether to include the cross-validation QoF measure
- first: the first variable to consider for elimination
- idx_q: the index of the Quality of Fit (QoF) measure to use for comparing quality
Attributes
- See also: Fit for index of QoF measures.
Build a sub-model that is restricted to the given columns of the data matrix. Override for models that support feature selection.
Value parameters
- x_cols: the columns that the new model is restricted to
Given a discrete data vector z, classify it, returning the class number (0, ..., k-1) with the highest relative posterior probability. Return the best class, its name and its relative probability.
Value parameters
- z: the data vector to classify
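The selection rule described above (best class, its name, its relative probability) can be sketched as follows. This is only an illustration: `classifyBest` and its `relProb` argument are hypothetical names, not part of the documented API, and the relative posteriors are assumed to be computed elsewhere.

```scala
// Hypothetical helper (not the library's API): pick the class with the
// highest relative posterior probability.
// relProb(c) is an unnormalized posterior for class c; cname holds the class labels.
def classifyBest(relProb: Array[Double], cname: Array[String]): (Int, String, Double) =
  require(relProb.nonEmpty && relProb.length == cname.length,
          "need one label per class")
  val best = relProb.indices.maxBy(relProb(_))   // index of the largest relative probability
  (best, cname(best), relProb(best))
```

Because only the ordering matters, the values in `relProb` do not need to sum to one.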
Given a continuous data vector z, classify it, returning the class number (0, ..., k-1) with the highest relative posterior probability. Return the best class, its name and its relative probability.
Value parameters
- z: the data vector to classify
Perform forward selection to find the most predictive variable to add to the existing model, returning the variable to add and the new model. May be called repeatedly.
Value parameters
- cols: the columns of matrix x currently included in the existing model
- idx_q: the index of the Quality of Fit (QoF) measure to use for comparing quality
Attributes
- See also: Fit for index of QoF measures.
Perform forward selection to find the most predictive variables to have in the model, returning the variables added and the new Quality of Fit (QoF) measures for all steps.
Value parameters
- cross: whether to include the cross-validation QoF measure
- idx_q: the index of the Quality of Fit (QoF) measure to use for comparing quality
Attributes
- See also: Fit for index of QoF measures.
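A greedy forward-selection loop of the kind described above might look like the sketch below. It is not the library's implementation: `forwardSelAllSketch` and the `qof` scoring function are hypothetical, larger QoF values are assumed to be better, and the loop is assumed to stop as soon as no remaining variable improves the QoF.

```scala
// Hypothetical sketch of greedy forward selection (not the library's code).
// qof scores a set of column indices; larger is assumed to be better.
def forwardSelAllSketch(nCols: Int, qof: Set[Int] => Double): (List[Int], List[Double]) =
  var cols  = Set(0)                        // start with the intercept column x_0
  var added = List.empty[Int]               // variables added, most recent first
  var qofs  = List(qof(cols))               // QoF after each step, most recent first
  var improved = true
  while improved do
    val candidates = (1 until nCols).filterNot(cols.contains)
    if candidates.isEmpty then improved = false
    else
      val j = candidates.maxBy(c => qof(cols + c))   // most predictive variable to add
      val q = qof(cols + j)
      if q > qofs.head then
        cols  = cols + j
        added = j :: added
        qofs  = q :: qofs
      else improved = false                 // assumption: stop when nothing improves
  (added.reverse, qofs.reverse)
```

Passing the scoring function as a parameter keeps the sketch independent of any particular model or QoF measure.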
Return the feature/variable names.
Return the used data matrix x. Mainly for derived classes where x is expanded from the given columns in x_, e.g., SymbolicRegression.quadratic adds squared columns.
Return the used response vector y. Mainly for derived classes where y is transformed, e.g., TranRegression, Regression4TS.
Return the hyper-parameters.
Given a discrete data vector z, classify it, returning the class number (0, ..., k-1) with the highest relative posterior probability. Return the best class, its name and its relative log-probability. This method adds "positive log probabilities" (plogs) to avoid underflow. To recover the relative probability from a plog q, compute 2^(-q).
Value parameters
- z: the data vector to classify
Given a continuous data vector z, classify it, returning the class number (0, ..., k-1) with the highest relative posterior probability. Return the best class, its name and its relative log-probability. This method adds "positive log probabilities" (plogs) to avoid underflow. To recover the relative probability from a plog q, compute 2^(-q).
Value parameters
- z: the data vector to classify
Predict the integer value of y = f(z) by computing the product of the class probabilities p_y and all the conditional probabilities P(X_j = z_j | y = c) and returning the class with the highest relative probability. This method adds "positive log probabilities" (plogs) to avoid underflow. To recover the relative probability from a plog q, compute 2^(-q).
Value parameters
- z: the new vector to predict
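To make the plog idea concrete, here is a minimal sketch, assuming a plog is the negated base-2 logarithm (which is what the 2^(-q) recovery rule implies). The names `plog`, `lpredict`, `recover` and the `cond` table are hypothetical; `cond(c)(j)` stands for P(X_j = z_j | y = c), already looked up for the vector z.

```scala
// Positive log probability: plog(p) = -log2(p), so a smaller plog means more probable.
def plog(p: Double): Double = -math.log(p) / math.log(2.0)

// Hypothetical sketch: p_y(c) is the prior of class c and cond(c)(j) is the
// conditional probability of feature j's observed value given class c.
// Summing plogs replaces multiplying many small probabilities.
def lpredict(p_y: Array[Double], cond: Array[Array[Double]]): (Int, Double) =
  val q = p_y.indices.map(c => plog(p_y(c)) + cond(c).map(plog).sum)
  val best = q.indices.minBy(q(_))      // lowest plog = highest relative probability
  (best, q(best))

// Recover the relative probability from a plog q.
def recover(q: Double): Double = math.pow(2.0, -q)
```

With many features the raw product can underflow to zero for every class, whereas the plog sums stay in a comfortable range and preserve the ordering of the classes.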
Return the number of terms/parameters in the model, e.g., b_0 + b_1 x_1 + b_2 x_2 has three terms.
Return the analog of the vector of parameter values: the estimate of the response pmf.
Predict the value of y = f(z) by evaluating the model equation. Single output models return Double, while multi-output models return VectorD.
Value parameters
- z: the new vector to predict
Predict the integer value of y = f(z) by selecting the most probable class. Override as needed.
Value parameters
- z: the new vector to predict
Predict the value of vector y = f(x_) using matrix x_.
Value parameters
- x_: the matrix to use for making predictions, one prediction for each row
Return a basic report on a trained and tested model.
Value parameters
- ftVec: the vector of QoF values produced by the FitC trait
Return the vector of residuals/errors.
Perform feature selection to find the most predictive variables to have in the model, returning the variables added and the new Quality of Fit (QoF) measures for all steps.
Value parameters
- cross: whether to include the cross-validation QoF measure
- idx_q: the index of the Quality of Fit (QoF) measure to use for comparing quality
- tech: the feature selection technique to apply
Attributes
- See also: Fit for index of QoF measures.
Perform stepwise regression to find the most predictive variables to have in the model, returning the variables left and the new Quality of Fit (QoF) measures for all steps. At each step it calls forwardSel and backwardElim and takes the best of the two actions. Stops when neither action yields improvement.
Value parameters
- cross: whether to include the cross-validation QoF measure
- idx_q: the index of the Quality of Fit (QoF) measure to use for comparing quality
Attributes
- See also: Fit for index of QoF measures.
Test/evaluate the model's Quality of Fit (QoF) and return the predictions and QoF vectors. This may include the importance of its parameters (e.g., if 0 is in a parameter's confidence interval, it is a candidate for removal from the model). Extending traits and classes should implement various diagnostics for the test and full (training + test) datasets.
Value parameters
- x_: the testing/full data/input matrix (impl. classes may default to x)
- y_: the testing/full response/output vector (impl. classes may default to y)
Return the indices for the test-set.
Value parameters
- n_test: the size of the test-set
- rando: whether to select indices randomly or in blocks
Attributes
- See also: scalation.mathstat.TnT_Split
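The two selection modes can be illustrated with the sketch below. It does not reproduce the logic in scalation.mathstat.TnT_Split: the function name, the extra `m` and `seed` parameters, and the choice of the last n_test rows as the block are all assumptions made for the example.

```scala
import scala.util.Random

// Hypothetical sketch (not the library's code): choose n_test of the m row
// indices, either at random or as one contiguous block.
def testIndicesSketch(m: Int, n_test: Int, rando: Boolean, seed: Long = 0L): IndexedSeq[Int] =
  require(0 <= n_test && n_test <= m, "n_test must be between 0 and m")
  if rando then
    // shuffle all row indices, keep the first n_test, return them in ascending order
    new Random(seed).shuffle((0 until m).toIndexedSeq).take(n_test).sorted
  else
    (m - n_test) until m        // assumption: the block is the last n_test rows
```

Block selection keeps neighboring rows together, which matters when the row order carries meaning; random selection is the usual choice otherwise.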
Train a classification model y_ = f(x_) + e where x_ is the data/input matrix and y_ is the response/output vector. These arguments default to the full dataset x and y, but may be restricted to a training dataset. Training involves estimating the model parameters or pmf. This implementation simply computes the class/prior probabilities. Most models will need to override this method.
Value parameters
- x_: the training/full data/input matrix (defaults to full x)
- y_: the training/full response/output vector (defaults to full y)
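Computing the class/prior probabilities amounts to taking relative class frequencies in the response vector. The sketch below assumes class values are integers in 0 until k; `priorPmf` is a hypothetical helper using plain arrays, not the library's method or vector types.

```scala
// Hypothetical helper (not the library's API): estimate the class prior pmf p_y
// from a response vector y whose entries are class values in 0 until k.
def priorPmf(y: Array[Int], k: Int): Array[Double] =
  require(y.nonEmpty, "y must contain at least one instance")
  val counts = new Array[Int](k)              // counts(c) = number of instances of class c
  for c <- y do counts(c) += 1
  counts.map(_.toDouble / y.length)           // relative frequencies, summing to 1

val p_y = priorPmf(Array(0, 1, 1, 0, 1), k = 2)   // two of five in class 0, three in class 1
```

A model that only knows these priors always predicts the most frequent class, which is why most classifiers need to override the default training step.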
Train the model y_ = f(x_) + e on a given dataset by optimizing the model parameters in order to minimize the error ||e|| or maximize the log-likelihood ll.
Value parameters
- x_: the training/full data/input matrix (impl. classes may default to x)
- y_: the training/full response/output vector (impl. classes may default to y)
The train2 method should work like the train method, but should also optimize hyper-parameters (e.g., shrinkage or learning rate). Only implementing classes needing this capability should override this method.
Value parameters
- x_: the training/full data/input matrix (defaults to full x)
- y_: the training/full response/output vector (defaults to full y)
Train and test the predictive model y_ = f(x_) + e and report its QoF and plot its predictions.
Value parameters
- x_: the training/full data/input matrix (defaults to full x)
- xx: the testing/full data/input matrix (defaults to full x)
- y_: the training/full response/output vector (defaults to full y)
- yy: the testing/full response/output vector (defaults to full y)
Compute the Variance Inflation Factor (VIF) for each variable to test for multi-collinearity by regressing x_j against the rest of the variables. A VIF over 50 indicates that over 98% of the variance of x_j can be predicted from the other variables, so x_j may be a candidate for removal from the model. Note: override this method to use a superior regression technique.
Value parameters
- skip: the number of columns of x at the beginning to skip in computing VIF
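The link between the VIF threshold of 50 and the 98% figure follows from the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other variables. The sketch below only encodes that relation; `vif` and `rSqFromVif` are illustrative names, and the regression that produces R_j^2 is assumed to be done elsewhere.

```scala
// VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 from regressing x_j
// on the remaining variables (computed elsewhere).
def vif(rSq: Double): Double =
  require(rSq >= 0.0 && rSq < 1.0, "R^2 must be in [0, 1)")
  1.0 / (1.0 - rSq)

// Inverse relation: the share of x_j's variance explained by the other variables.
def rSqFromVif(v: Double): Double =
  require(v >= 1.0, "VIF is at least 1")
  1.0 - 1.0 / v
```

For example, `rSqFromVif(50.0)` evaluates to 1 - 1/50 = 0.98, which is the 98% quoted above.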
Inherited fields
The optional reference to an ontological concept