Predictor

scalation.modeling.Predictor
See the Predictor companion object
trait Predictor(x: MatrixD, y: VectorD, var fname: Array[String], hparam: HyperParameter) extends Model

The Predictor trait provides a framework for multiple predictive analytics techniques, e.g., Regression. The input x is multi-dimensional, [1, x_1, ..., x_k]. Training fits the parameter vector b in, for example, the regression equation y = b dot x + e = b_0 + b_1 * x_1 + ... + b_k * x_k + e.

Value parameters

fname

the feature/variable names (if null, use x_j's)

hparam

the hyper-parameters for the model

x

the input/data m-by-n matrix (augment with a first column of ones to include intercept in model)

y

the response/output m-vector
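
The note on x about augmenting with a first column of ones can be illustrated with a minimal sketch. It uses plain Scala arrays rather than MatrixD, and the helper name addOnesColumn is hypothetical (it is not part of ScalaTion).

```scala
// Prepend a column of ones to a row-major data matrix so that the
// first parameter b_0 plays the role of the intercept.
// Hypothetical helper on plain arrays; not a ScalaTion method.
def addOnesColumn(x: Array[Array[Double]]): Array[Array[Double]] =
  x.map(row => 1.0 +: row)

@main def onesDemo(): Unit =
  val x  = Array(Array(2.0, 3.0), Array(4.0, 5.0))   // 2-by-2 data matrix
  val xa = addOnesColumn(x)                          // 2-by-3 augmented matrix
  xa.foreach(row => println(row.mkString("[", ", ", "]")))
```

Each augmented row then has the form [1, x_1, ..., x_k] used in the regression equation above.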

Attributes

Companion
object
Supertypes
trait Model
class Object
trait Matchable
class Any

Members list

Type members

Classlikes

case class BestStep(col: Int, qof: VectorD, mod: Predictor & Fit)

The BestStep is used to record the best improvement step found so far.

Value parameters

col

the column/variable to ADD/REMOVE for this step

mod

the model including selected features/variables for this step

qof

the Quality of Fit (QoF) for this step

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

Value members

Abstract methods

def buildModel(x_cols: MatrixD): Predictor & Fit

Build a sub-model that is restricted to the given columns of the data matrix. Must be implemented for models that support feature selection; otherwise, use NoBuildModel.

Value parameters

x_cols

the columns that the new model is restricted to

def test(x_: MatrixD, y_: VectorD): (VectorD, VectorD)

Test the predictive model y_ = f(x_) + e and return its predictions and QoF vector. Testing may be in-sample (on the full dataset) or out-of-sample (on the testing set) as determined by the parameters passed in. Note: must call train before test.

Value parameters

x_

the testing/full data/input matrix (defaults to full x)

y_

the testing/full response/output vector (defaults to full y)

def train(x_: MatrixD, y_: VectorD): Unit

Train a predictive model y_ = f(x_) + e where x_ is the data/input matrix and y_ is the response/output vector. These arguments default to the full dataset x and y, but may be restricted to a training dataset. Training involves estimating the model parameters b.

Value parameters

x_

the training/full data/input matrix (defaults to full x)

y_

the training/full response/output vector (defaults to full y)

Concrete methods

def backwardElim(cols: LinkedHashSet[Int], idx_q: Int, first: Int): BestStep

Perform backward elimination to find the least predictive variable to remove from the existing model, returning the variable to eliminate, the new parameter vector and the new Quality of Fit (QoF). May be called repeatedly.

Value parameters

cols

the columns of matrix x currently included in the existing model

first

the first variable to consider for elimination (default 1, which assumes the intercept x_0 will be in any model)

idx_q

index of Quality of Fit (QoF) to use for comparing quality

Attributes

See also

Fit for index of QoF measures.

def backwardElimAll(idx_q: Int, first: Int, cross: Boolean): (LinkedHashSet[Int], MatrixD)

Perform backward elimination to find the least predictive variables to remove from the full model, returning the variables left and the new Quality of Fit (QoF) measures for all steps.

Value parameters

cross

whether to include the cross-validation QoF measure

first

first variable to consider for elimination

idx_q

index of Quality of Fit (QoF) to use for comparing quality

Attributes

See also

Fit for index of QoF measures.

def crossValidate(k: Int, rando: Boolean): Array[Statistic]
def forwardSel(cols: LinkedHashSet[Int], idx_q: Int): BestStep

Perform forward selection to find the most predictive variable to add to the existing model, returning the variable to add and the new model. May be called repeatedly.

Value parameters

cols

the columns of matrix x currently included in the existing model

idx_q

index of Quality of Fit (QoF) to use for comparing quality

Attributes

See also

Fit for index of QoF measures.
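
To make the idea of a single forward-selection step concrete, here is a minimal sketch in plain Scala. It is not ScalaTion's implementation: the Step record, the forwardStep function and the score callback are hypothetical stand-ins, with score taking the place of "build a model on these columns and read one QoF measure", and it assumes a larger score is better.

```scala
import scala.collection.mutable.LinkedHashSet

// Hypothetical record of one selection step (not ScalaTion's BestStep).
final case class Step(col: Int, qof: Double)

// One greedy forward-selection step: try each column not yet in `cols`,
// score the enlarged column set, and return the candidate with the highest score.
// Returns None when every column is already in the model.
def forwardStep(cols: LinkedHashSet[Int], nCols: Int,
                score: Set[Int] => Double): Option[Step] =
  val candidates = (0 until nCols).filterNot(cols.contains)
  if candidates.isEmpty then None
  else Some(candidates.map(j => Step(j, score(cols.toSet + j))).maxBy(_.qof))

// Toy score (assumption for illustration): columns 2 and 3 are the informative ones.
def toyScore(s: Set[Int]): Double = s.count(Set(2, 3).contains).toDouble

@main def forwardDemo(): Unit =
  val cols = LinkedHashSet(0)                 // start with the intercept column only
  for _ <- 1 to 2 do                          // call the step repeatedly
    forwardStep(cols, 4, toyScore).foreach { s =>
      cols += s.col
      println(s"added column ${s.col}, score = ${s.qof}")
    }
```

Calling the step in a loop, as in forwardDemo, mirrors the "may be called repeatedly" usage described above.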

def forwardSelAll(idx_q: Int, cross: Boolean): (LinkedHashSet[Int], MatrixD)

Perform forward selection to find the most predictive variables to have in the model, returning the variables added and the new Quality of Fit (QoF) measures for all steps.

Value parameters

cross

whether to include the cross-validation QoF measure

idx_q

index of Quality of Fit (QoF) to use for comparing quality

Attributes

See also

Fit for index of QoF measures.

Return the best model found from feature selection.

def getFname: Array[String]

Return the feature/variable names.

def getX: MatrixD

Return the used data matrix x. Mainly for derived classes where x is expanded from the given columns in x_, e.g., SymbolicRegression.quadratic adds squared columns.

def getY: VectorD

Return the used response vector y. Mainly for derived classes where y is transformed, e.g., TranRegression, ARX.

Return the hyper-parameters.

def importance(cols: Array[Int], rSq: MatrixD): Array[(Int, Double)]

Return the relative importance of selected variables, ordered highest to lowest, rescaled so the highest is one.

Value parameters

cols

the selected columns/features/variables

rSq

the matrix of R^2 values (standing in for sse)
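
The "ordered highest to lowest, rescaled so the highest is one" part can be sketched as follows. This is only an illustration of the ordering and rescaling: how the raw per-variable scores are derived from the R^2 matrix is not stated here, so they are taken as given inputs, and rescaleImportance is a hypothetical name.

```scala
// Order (column, rawScore) pairs from highest to lowest score and rescale
// so that the top score becomes 1.0. The raw scores are assumed inputs.
def rescaleImportance(raw: Array[(Int, Double)]): Array[(Int, Double)] =
  if raw.isEmpty then raw
  else
    val sorted = raw.sortBy(p => -p._2)       // highest score first
    val top    = sorted.head._2
    if top == 0.0 then sorted                 // avoid dividing by zero
    else sorted.map { case (j, v) => (j, v / top) }

@main def importanceDemo(): Unit =
  val raw = Array((1, 0.2), (2, 0.4), (3, 0.1))   // illustrative scores only
  rescaleImportance(raw).foreach(println)
```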

def numTerms: Int

Return the number of terms/parameters in the model, e.g., b_0 + b_1 x_1 + b_2 x_2 has three terms.

Return the vector of parameter/coefficient values.

def predict(z: VectorD): Double

Predict the value of y = f(z) by evaluating the formula y = b dot z, e.g., (b_0, b_1, b_2) dot (1, z_1, z_2). Must override when using transformations, e.g., ExpRegression.

Value parameters

z

the new vector to predict
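
The formula y = b dot z can be checked with a small self-contained sketch. It uses plain arrays instead of VectorD, and the parameter values are illustrative rather than fitted.

```scala
// Dot product of a parameter vector b and an input vector z of equal length.
def dot(b: Array[Double], z: Array[Double]): Double =
  require(b.length == z.length, "b and z must have the same length")
  b.zip(z).map { case (bj, zj) => bj * zj }.sum

@main def predictDemo(): Unit =
  val b = Array(1.0, 2.0, 3.0)    // (b_0, b_1, b_2), illustrative values
  val z = Array(1.0, 4.0, 5.0)    // (1, z_1, z_2): the leading 1 pairs with the intercept b_0
  println(dot(b, z))              // 1*1 + 2*4 + 3*5 = 24.0
```

A model that transforms the response would need to apply the inverse transformation after the dot product, which is why such models override predict.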

def predict(x_: MatrixD): VectorD

Predict the value of vector y = f(x_, b), e.g., x_ * b for Regression. May override for efficiency.

Value parameters

x_

the matrix to use for making predictions, one for each row

def resetBest(): Unit

Reset the best-step to its default.

Return the vector of residuals/errors.

def selectFeatures(tech: SelectionTech, idx_q: Int, cross: Boolean): (LinkedHashSet[Int], MatrixD)

Perform feature selection to find the most predictive variables to have in the model, returning the variables added and the new Quality of Fit (QoF) measures for all steps.

Value parameters

cross

whether to include the cross-validation QoF measure

idx_q

index of Quality of Fit (QoF) to use for comparing quality

tech

the feature selection technique to apply

Attributes

See also

Fit for index of QoF measures.

def stepRegressionAll(idx_q: Int, cross: Boolean): (LinkedHashSet[Int], MatrixD)

Perform stepwise regression to find the most predictive variables to have in the model, returning the variables left and the new Quality of Fit (QoF) measures for all steps. At each step it calls forwardSel and backwardElim and takes the best of the two actions. Stops when neither action yields improvement.

Value parameters

cross

whether to include the cross-validation QoF measure

idx_q

index of Quality of Fit (QoF) to use for comparing quality

Attributes

See also

Fit for index of QoF measures.

inline def testIndices(n_test: Int, rando: Boolean): IndexedSeq[Int]

Return the indices for the test-set.

Value parameters

n_test

the size of test-set

rando

whether to select indices randomly or in blocks

Attributes

See also

scalation.mathstat.TnT_Split

def train2(x_: MatrixD, y_: VectorD): Unit

The train2 method should work like the train method, but should also optimize hyper-parameters (e.g., shrinkage or learning rate). Only implementing classes needing this capability should override this method.

Value parameters

x_

the training/full data/input matrix (defaults to full x)

y_

the training/full response/output vector (defaults to full y)

def trainNtest(x_: MatrixD, y_: VectorD)(xx: MatrixD, yy: VectorD): (VectorD, VectorD)

Train and test the predictive model y_ = f(x_) + e, report its QoF, and plot its predictions. Return the predictions and QoF. FIX: currently must be overridden if y is transformed (see TranRegression).

Value parameters

x_

the training/full data/input matrix (defaults to full x)

xx

the testing/full data/input matrix (defaults to full x)

y_

the training/full response/output vector (defaults to full y)

yy

the testing/full response/output vector (defaults to full y)

def validate(rando: Boolean, ratio: Double)(idx: IndexedSeq[Int]): VectorD
def vif(skip: Int): VectorD

Compute the Variance Inflation Factor (VIF) for each variable to test for multi-collinearity by regressing x_j against the rest of the variables. A VIF over 50 indicates that over 98% of the variance of x_j can be predicted from the other variables, so x_j may be a candidate for removal from the model. Note: override this method to use a superior regression technique.

Value parameters

skip

the number of columns of x at the beginning to skip in computing VIF
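
The 50 versus 98% relationship follows from the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other variables. The sketch below only does that arithmetic; the function names are hypothetical and no regression is performed.

```scala
// VIF for a variable whose regression on the other variables
// has coefficient of determination r2 (standard definition).
def vifFromRSq(r2: Double): Double =
  require(r2 >= 0.0 && r2 < 1.0, "r2 must be in [0, 1)")
  1.0 / (1.0 - r2)

// Inverse relation: the R^2 implied by a given VIF.
def rSqFromVif(vif: Double): Double =
  require(vif >= 1.0, "vif must be at least 1")
  1.0 - 1.0 / vif

@main def vifDemo(): Unit =
  println(rSqFromVif(50.0))   // approximately 0.98: a VIF of 50 means about 98% of the variance is explained
  println(vifFromRSq(0.9))    // approximately 10: a lower R^2 gives a smaller VIF
```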

Inherited methods

def report(ftMat: MatrixD): String

Return a basic report on a trained and tested multi-variate model.

Value parameters

ftMat

the matrix of qof values produced by the Fit trait

Attributes

Inherited from:
Model
def report(ftVec: VectorD): String

Return a basic report on a trained and tested model.

Value parameters

ftVec

the vector of qof values produced by the Fit trait

Attributes

Inherited from:
Model

Inherited fields

var modelConcept: URI

The optional reference to an ontological concept

Attributes

Inherited from:
Model
var modelName: String

The name for the model (or modeling technique).

Attributes

Inherited from:
Model