abstract class PredictorMat extends Fit with Predictor

The PredictorMat abstract class supports multiple-predictor analytics, such as Regression. In this case, 'x' is multi-dimensional, [1, x_1, ..., x_k]. The parameter vector 'b' is fit in, for example, the regression equation

y = b dot x + e = b_0 + b_1 * x_1 + ... + b_k * x_k + e
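
A minimal sketch of the prediction side of this equation, using plain Scala arrays in place of ScalaTion's VectoD (the object and method names are hypothetical, not ScalaTion's API): the raw inputs are prefixed with a 1 so that b_0 acts as the intercept.

```scala
// Hypothetical sketch: plain arrays stand in for ScalaTion's VectoD.
object LinearPredictSketch {
  /** Dot product of two equal-length vectors. */
  def dot(u: Array[Double], v: Array[Double]): Double = {
    require(u.length == v.length, "vectors must have the same length")
    u.indices.map(i => u(i) * v(i)).sum
  }

  /** Predict y = b_0 + b_1 * z_1 + ... + b_k * z_k for raw inputs z_1..z_k
   *  by prepending the constant 1 that pairs with the intercept b_0. */
  def predict(b: Array[Double], z: Array[Double]): Double =
    dot(b, 1.0 +: z)
}
```

For example, b = (1, 2, 3) and z = (4, 5) give 1 + 2 * 4 + 3 * 5 = 24.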

Note: the "protected val" constructor arguments are required by ResponseSurface.

Linear Supertypes
Predictor, Model, Fit, Error, QoF, AnyRef, Any

Instance Constructors

  1. new PredictorMat(x: MatriD, y: VectoD, fname: Strings, hparam: HyperParameter)

    x

    the input/data m-by-n matrix (augment with a first column of ones to include an intercept in the model)

    y

    the response/output m-vector

    fname

    the feature/variable names (if null, use x_j's)

    hparam

    the hyper-parameters for the model
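
    Assuming a plain Array[Array[Double]] stands in for MatriD, augmenting 'x' with a first column of ones might look like the following sketch ('AugmentSketch' and 'addOnesColumn' are hypothetical names):

```scala
// Hypothetical sketch: Array[Array[Double]] stands in for ScalaTion's MatriD.
object AugmentSketch {
  /** Return a copy of the m-by-n matrix x with a first column of ones
   *  prepended, giving an m-by-(n+1) matrix whose column 0 pairs with b_0. */
  def addOnesColumn(x: Array[Array[Double]]): Array[Array[Double]] =
    x.map(row => 1.0 +: row)
}
```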

Abstract Value Members

  1. abstract def buildModel(x_cols: MatriD): PredictorMat

    Build a sub-model that is restricted to the given columns of the data matrix.

    x_cols

    the columns that the new model is restricted to

  2. abstract def train(x_: MatriD = x, y_: VectoD = y): PredictorMat

    Train a predictive model 'y_ = f(x_) + e' where 'x_' is the data/input matrix and 'y_' is the response/output vector. These arguments default to the full dataset 'x' and 'y', but may be restricted to a training dataset. Training involves estimating the model parameters 'b'.

    x_

    the training/full data/input matrix (defaults to full x)

    y_

    the training/full response/output vector (defaults to full y)

    Definition Classes
    PredictorMat → Model
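
    PredictorMat leaves 'train' abstract, so how 'b' is estimated depends on the subclass. As one illustration, a least-squares regression could solve the normal equations (x^T x) b = x^T y. The sketch below does this with Gaussian elimination on plain arrays; it is not ScalaTion's implementation, which may use a different factorization (e.g., QR or Cholesky), and the names are hypothetical.

```scala
// Hypothetical sketch of ordinary least squares via the normal equations.
object OlsSketch {
  type Mat = Array[Array[Double]]

  /** Solve the square system a * b = c by Gaussian elimination with partial pivoting. */
  def solve(a0: Mat, c0: Array[Double]): Array[Double] = {
    val n = c0.length
    val a = a0.map(_.clone)                      // work on copies
    val c = c0.clone
    for (p <- 0 until n) {
      // swap in the row with the largest pivot magnitude for stability
      val best = (p until n).maxBy(i => math.abs(a(i)(p)))
      val tr = a(p); a(p) = a(best); a(best) = tr
      val tc = c(p); c(p) = c(best); c(best) = tc
      require(math.abs(a(p)(p)) > 1e-12, "singular system (collinear columns?)")
      for (i <- p + 1 until n) {                 // eliminate below the pivot
        val f = a(i)(p) / a(p)(p)
        for (j <- p until n) a(i)(j) -= f * a(p)(j)
        c(i) -= f * c(p)
      }
    }
    val b = new Array[Double](n)                 // back substitution
    for (i <- n - 1 to 0 by -1) {
      val s = (i + 1 until n).map(j => a(i)(j) * b(j)).sum
      b(i) = (c(i) - s) / a(i)(i)
    }
    b
  }

  /** Estimate b minimizing ||y - x b||^2 by solving (x^T x) b = x^T y. */
  def train(x: Mat, y: Array[Double]): Array[Double] = {
    val n = x(0).length
    val xtx = Array.tabulate(n, n)((j, k) => x.map(r => r(j) * r(k)).sum)
    val xty = Array.tabulate(n)(j => x.indices.map(i => x(i)(j) * y(i)).sum)
    solve(xtx, xty)
  }
}
```

    Fitting the four points (0, 1), (1, 3), (2, 5), (3, 7), with a column of ones for the intercept, recovers b close to (1, 2). Forming x^T x squares the condition number of 'x', which is why factorization-based solvers are usually preferred for ill-conditioned data.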

Concrete Value Members

  1. def analyze(x_: MatriD = x, y_: VectoD = y, x_e: MatriD = x, y_e: VectoD = y): PredictorMat

    Analyze a dataset with this model, using ordinary training via the 'train' method.

    x_

    the training/full data/input matrix

    y_

    the training/full response/output vector

    x_e

    the test/full data/input matrix

    y_e

    the test/full response/output vector

    Definition Classes
    PredictorMat → Predictor

  2. def backwardElim(cols: Set[Int], index_q: Int = index_rSqBar, first: Int = 1): (Int, PredictorMat)

    Perform backward elimination to find the least predictive variable to remove from the existing model, returning the variable to eliminate and the new model. May be called repeatedly.

    cols

    the columns of matrix x currently included in the existing model

    index_q

    index of Quality of Fit (QoF) to use for comparing quality

    first

    first variable to consider for elimination (the default of 1 assumes the intercept x_0 will be in every model)

    See also

    Fit for index of QoF measures.
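
    The greedy step described above can be sketched as follows, assuming higher QoF values are better (as with adjusted R^2, the default 'index_rSqBar'). The 'qof' callback is a hypothetical stand-in for building and evaluating a sub-model (e.g., via 'buildModel'); ScalaTion's own implementation is not reproduced here.

```scala
// Hypothetical sketch of one backward-elimination step.
object BackwardElimSketch {
  /** @param cols   the columns currently in the model
   *  @param first  the first column eligible for removal (1 keeps the intercept x_0)
   *  @param qof    scores a candidate column set; higher is better
   *  @return the column whose removal leaves the best-scoring model, or -1 if none is eligible
   */
  def backwardElimStep(cols: Set[Int], first: Int, qof: Set[Int] => Double): Int = {
    val candidates = cols.filter(_ >= first)
    if (candidates.isEmpty) -1
    else candidates.maxBy(j => qof(cols - j))   // score the model without column j
  }
}
```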

  3. def backwardElimAll(index_q: Int = index_rSqBar, first: Int = 1, cross: Boolean = true): (Set[Int], MatriD)

    Perform backward elimination to find the least predictive variables to remove from the full model, returning the variables left and the new Quality of Fit (QoF) measures for all steps.

    index_q

    index of Quality of Fit (QoF) to use for comparing quality

    first

    first variable to consider for elimination

    cross

    whether to include the cross-validation QoF measure

    See also

    Fit for index of QoF measures.

  4. def corrMatrix(xx: MatriD = x): MatriD

    Return the correlation matrix for the columns in data matrix 'xx'.

    xx

    the data matrix whose correlation matrix is sought

    Definition Classes
    PredictorMat → Predictor

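
    A sketch of the Pearson correlation matrix of the columns of a plain Array[Array[Double]] (a stand-in for MatriD; names hypothetical). A constant column, such as the intercept column of ones, has zero variance, so its correlations come out as NaN in this sketch; how ScalaTion's 'corrMatrix' treats such columns is not stated in this documentation.

```scala
// Hypothetical sketch: Pearson correlations between the columns of xx.
object CorrSketch {
  def corrMatrix(xx: Array[Array[Double]]): Array[Array[Double]] = {
    val m = xx.length
    val n = xx(0).length
    val cols  = Array.tabulate(n)(j => xx.map(_(j)))   // column j as an array
    val means = cols.map(c => c.sum / m)
    // centered cross-product sum; the 1/(m-1) factors cancel in the ratio below
    def cov(j: Int, k: Int): Double =
      (0 until m).map(i => (cols(j)(i) - means(j)) * (cols(k)(i) - means(k))).sum
    Array.tabulate(n, n)((j, k) => cov(j, k) / math.sqrt(cov(j, j) * cov(k, k)))
  }
}
```
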
  5. def crossValidate(k: Int = 10, rando: Boolean = true): Array[Statistic]

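
    No description is given for 'crossValidate'. Assuming it performs 'k'-fold cross-validation and that 'rando' controls whether rows are shuffled before being assigned to folds, the fold assignment could be sketched as below; each fold would then serve once as the test set while the other k - 1 folds train the model. 'KFoldSketch', 'folds', and 'seed' are hypothetical.

```scala
import scala.util.Random

// Hypothetical sketch of k-fold index assignment (not ScalaTion's crossValidate).
object KFoldSketch {
  /** Split row indices 0 until m into k disjoint folds whose sizes differ by at most 1.
   *  When rando is true the indices are shuffled first, using the given seed. */
  def folds(m: Int, k: Int, rando: Boolean, seed: Long = 0L): Seq[Seq[Int]] = {
    require(k >= 2 && k <= m, "need 2 <= k <= m")
    val idx = if (rando) new Random(seed).shuffle((0 until m).toVector)
              else (0 until m).toVector
    // position p goes to fold p % k
    (0 until k).map(f => idx.indices.filter(_ % k == f).map(idx))
  }
}
```
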
  6. def diagnose(e: VectoD, yy: VectoD, yp: VectoD, w: VectoD = null, ym_: Double = noDouble): Unit

    Diagnose the health of the model by computing the Quality of Fit (QoF) measures, from the error/residual vector and the predicted & actual responses. For some models the instances may be weighted.

    e

    the m-dimensional error/residual vector (yy - yp)

    yy

    the actual response/output vector to use (test/full)

    yp

    the predicted response/output vector (test/full)

    w

    the weights on the instances (defaults to null)

    ym_

    the mean of the actual response/output vector to use (training/full)

    Definition Classes
    Fit → QoF
    See also

    Regression_WLS

  7. def eval(ym: Double, y_e: VectoD, yp: VectoD): PredictorMat

    Compute the error (difference between actual and predicted) and useful diagnostics for the test dataset. Requires predicted responses to be passed in.

    ym

    the mean of the training/full actual response/output vector

    y_e

    the test/full actual response/output vector

    yp

    the test/full predicted response/output vector

  8. def eval(x_e: MatriD = x, y_e: VectoD = y): PredictorMat

    Compute the error (difference between actual and predicted) and useful diagnostics for the test dataset.

    x_e

    the test/full data/input matrix (defaults to full x)

    y_e

    the test/full response/output vector (defaults to full y)

    Definition Classes
    PredictorMat → Model

  9. def f_(z: Double): String

    Format a double value.

    z

    the double value to format

    Definition Classes
    QoF
  10. def fit: VectoD

    Return the Quality of Fit (QoF) measures corresponding to the labels given by the 'fitLabel' method. Note: if 'sse > sst', the model introduces errors and 'rSq' may be negative; otherwise, R^2 ('rSq') ranges from 0 (weak) to 1 (strong). Override to add more quality of fit measures.

    Definition Classes
    Fit → QoF

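
    To make the note about a negative 'rSq' concrete, the sketch below computes 'sse', 'sst', and 'rSq' with the standard definition rSq = 1 - sse / sst. That definition is an assumption about how Fit computes it, and the sketch uses the mean of 'yy' rather than a separately supplied 'ym_'. Names are hypothetical.

```scala
// Hypothetical sketch of basic QoF measures from actual and predicted responses.
object QofSketch {
  /** Return (sse, sst, rSq) with rSq = 1 - sse / sst. */
  def qof(yy: Array[Double], yp: Array[Double]): (Double, Double, Double) = {
    val mean = yy.sum / yy.length
    val sse  = yy.indices.map(i => math.pow(yy(i) - yp(i), 2)).sum  // error sum of squares
    val sst  = yy.map(v => math.pow(v - mean, 2)).sum               // total sum of squares
    (sse, sst, 1.0 - sse / sst)
  }
}
```

    For yy = (1, 2, 3) and the reversed predictions yp = (3, 2, 1), sse = 8 exceeds sst = 2, giving rSq = 1 - 8 / 2 = -3.
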
  11. def fitLabel: Seq[String]

    Return the labels for the Quality of Fit (QoF) measures. Override to add additional QoF measures.

    Definition Classes
    Fit → QoF

  12. def fitMap: Map[String, String]

    Build a map of quality of fit measures (use of LinkedHashMap makes it ordered).

    Definition Classes
    QoF
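
    A sketch of an ordered label-to-value map using Scala's 'scala.collection.mutable.LinkedHashMap', which iterates in insertion order. The four-decimal formatting stands in for 'f_', whose actual format is not specified here; the labels and names are illustrative.

```scala
import java.util.Locale
import scala.collection.mutable.LinkedHashMap

// Hypothetical sketch: pair each QoF label with its formatted value, preserving order.
object FitMapSketch {
  def fitMap(labels: Seq[String], values: Seq[Double]): LinkedHashMap[String, String] = {
    val lm = LinkedHashMap[String, String]()
    for ((l, v) <- labels.zip(values))
      lm += (l -> "%.4f".formatLocal(Locale.US, v))  // fixed locale for a '.' decimal point
    lm
  }
}
```
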
  13. final def flaw(method: String, message: String): Unit
    Definition Classes
    Error
  14. def forwardSel(cols: Set[Int], index_q: Int = index_rSqBar): (Int, PredictorMat)

    Perform forward selection to find the most predictive variable to add to the existing model, returning the variable to add and the new model. May be called repeatedly.

    cols

    the columns of matrix x currently included in the existing model

    index_q

    index of Quality of Fit (QoF) to use for comparing quality

    Definition Classes
    PredictorMat → Predictor
    See also

    Fit for index of QoF measures.
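
    One forward-selection step tries each column not yet in the model and keeps the one whose addition scores best. The 'qof' callback is a hypothetical stand-in for building and evaluating the enlarged sub-model, and higher scores are assumed to be better; this is a sketch, not ScalaTion's code.

```scala
// Hypothetical sketch of one forward-selection step.
object ForwardSelSketch {
  /** @param cols  the columns currently in the model
   *  @param n     the total number of columns in the data matrix x
   *  @param qof   scores a candidate column set; higher is better
   *  @return the column whose addition gives the best-scoring model, or -1 if all are in use
   */
  def forwardSelStep(cols: Set[Int], n: Int, qof: Set[Int] => Double): Int = {
    val candidates = (0 until n).filterNot(cols.contains)
    if (candidates.isEmpty) -1
    else candidates.maxBy(j => qof(cols + j))   // score the model with column j added
  }
}
```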

  15. def forwardSelAll(index_q: Int = index_rSqBar, cross: Boolean = true): (Set[Int], MatriD)

    Perform forward selection to find the most predictive variables to have in the model, returning the variables added and the new Quality of Fit (QoF) measures for all steps.

    index_q

    index of Quality of Fit (QoF) to use for comparing quality

    cross

    whether to include the cross-validation QoF measure

    See also

    Fit for index of QoF measures.

  16. def getX: MatriD

    Return the 'used' data matrix 'x'. Mainly for derived classes where 'x' is expanded from the given columns in 'x_', e.g., QuadRegression adds squared columns.

    Definition Classes
    PredictorMat → Predictor

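
    For instance, a quadratic expansion could append the square of each non-intercept column. This sketch illustrates the idea only; the column layout QuadRegression actually uses may differ, and the names are hypothetical.

```scala
// Hypothetical sketch: [1, x_1, ..., x_k] becomes [1, x_1, ..., x_k, x_1^2, ..., x_k^2].
object QuadExpandSketch {
  def expand(x: Array[Array[Double]]): Array[Array[Double]] =
    x.map(row => row ++ row.drop(1).map(v => v * v))  // skip the intercept column
}
```
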
  17. def getY: VectoD

    Return the 'used' response vector 'y'. Mainly for derived classes where 'y' is transformed, e.g., TranRegression, Regression4TS.

    Definition Classes
    PredictorMat → Predictor

  18. def help: String

    Return the help string that describes the Quality of Fit (QoF) measures provided by the Fit class. Override to correspond to 'fitLabel'.

    Definition Classes
    Fit → QoF

  19. def hparameter: HyperParameter

    Return the hyper-parameters.

    Definition Classes
    PredictorMat → Model

  20. def ll(ms: Double = mse0, s2: Double = sig2e, m2: Int = m): Double

    The log-likelihood function times -2. Override as needed.

    ms

    raw Mean Squared Error

    s2

    MLE estimate of the population variance of the residuals

    Definition Classes
    Fit
    See also

    www.stat.cmu.edu/~cshalizi/mreg/15/lectures/06/lecture-06.pdf

    www.wiley.com/en-us/Introduction+to+Linear+Regression+Analysis%2C+5th+Edition-p-9780470542811 Section 2.11
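
    Under the usual assumption of normally distributed residuals with variance s2, -2 times the log-likelihood of m residuals is m ln(2 pi s2) + sse / s2 = m ln(2 pi s2) + m * ms / s2, where ms = sse / m. A sketch with parameter names mirroring 'll' (this is the textbook Gaussian formula, not necessarily the exact expression Fit uses):

```scala
// Hypothetical sketch of -2 * Gaussian log-likelihood.
object LogLikSketch {
  /** @param ms  raw mean squared error (sse / m)
   *  @param s2  variance of the residuals
   *  @param m   number of residuals/instances
   */
  def ll(ms: Double, s2: Double, m: Int): Double =
    m * math.log(2.0 * math.Pi * s2) + m * ms / s2
}
```

    When s2 equals the MLE estimate ms, the expression reduces to m (ln(2 pi ms) + 1).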

  21. val modelConcept: URI

    An optional reference to an ontological concept

    Definition Classes
    Model
  22. def modelName: String

    An optional name for the model (or modeling technique)

    Definition Classes
    Model
  23. def mse_: Double

    Return the mean of squares for error (sse / df._2). Must call diagnose first.

    Definition Classes
    Fit
  24. def parameter: VectoD

    Return the vector of parameter/coefficient values.

    Definition Classes
    PredictorMat → Model

  25. def predict(z: MatriD = x): VectoD

    Predict the value of 'y = f(z)' by evaluating the formula 'y = b dot z', for each row of matrix 'z'.

    z

    the new matrix to predict

    Definition Classes
    PredictorMat → Predictor

  26. def predict(z: VectoD): Double

    Predict the value of 'y = f(z)' by evaluating the formula 'y = b dot z', e.g., '(b_0, b_1, b_2) dot (1, z_1, z_2)'.

    z

    the new vector to predict

    Definition Classes
    PredictorMat → Predictor

  27. def predict(z: VectoI): Double

    Given a new discrete data/input vector 'z', predict the 'y'-value of 'f(z)'.

    z

    the vector to use for prediction

    Definition Classes
    Predictor
  28. def report: String

    Return a basic report on the trained model.

    Definition Classes
    PredictorMat → Model
    See also

    'summary' method for more details

  29. def resetDF(df_update: PairD): Unit

    Reset the degrees of freedom to the new updated values. For some models, the degrees of freedom are not known until after the model is built.

    df_update

    the updated degrees of freedom (model, error)

    Definition Classes
    Fit
  30. def residual: VectoD

    Return the vector of residuals/errors.

    Definition Classes
    PredictorMat → Predictor

  31. def reverse(a: MatriD): MatriD

    Return a matrix that is in reverse row order of the given matrix 'a'.

    a

    the given matrix

  32. def stepRegressionAll(index_q: Int = index_rSqBar, cross: Boolean = true): (Set[Int], MatriD)

    Perform stepwise regression to find the most predictive variables to have in the model, returning the variables left and the new Quality of Fit (QoF) measures for all steps. At each step it calls 'forwardSel' and 'backwardElim' and takes the best of the two actions. Stops when neither action yields improvement.

    index_q

    index of Quality of Fit (QoF) to use for comparing quality

    cross

    whether to include the cross-validation QoF measure

    See also

    Fit for index of QoF measures.

  33. def summary: String

    Compute and return summary diagnostics for the regression model.

  34. def summary(b: VectoD, stdErr: VectoD, vf: VectoD, show: Boolean = false): String

    Produce a summary report with diagnostics for each predictor 'x_j' and the overall quality of fit.

    b

    the parameters/coefficients for the model

    stdErr

    the standard errors of the parameters/coefficients

    vf

    the Variance Inflation Factors (VIFs)

    show

    flag indicating whether to print the summary

    Definition Classes
    Fit
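
    A common per-predictor diagnostic built from 'b' and 'stdErr' is the t-statistic t_j = b_j / stdErr_j; whether and how this report prints it is an assumption here. A minimal sketch with hypothetical names:

```scala
// Hypothetical sketch: t-statistic for each coefficient.
object SummarySketch {
  def tStats(b: Array[Double], stdErr: Array[Double]): Array[Double] = {
    require(b.length == stdErr.length, "b and stdErr must have the same length")
    b.indices.map(j => b(j) / stdErr(j)).toArray
  }
}
```
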
  35. def test(modelName: String, doPlot: Boolean = true): Unit

    Test the model on the full dataset (i.e., train and evaluate on full dataset).

    modelName

    the name of the model being tested

    doPlot

    whether to plot the actual vs. predicted response

    Definition Classes
    Predictor
  36. def train2(x_: MatriD = x, y_: VectoD = y): PredictorMat

    Train a predictive model 'y_ = f(x_) + e' where 'x_' is the data/input matrix and 'y_' is the response/output vector. These arguments default to the full dataset 'x' and 'y', but may be restricted to a training dataset. Training involves estimating the model parameters 'b'. The 'train2' method should work like the 'train' method, but should also optimize hyper-parameters (e.g., shrinkage or learning rate). Only implementing classes needing this capability should implement this method.

    x_

    the training/full data/input matrix (defaults to full x)

    y_

    the training/full response/output vector (defaults to full y)

  37. def vif(skip: Int = 1): VectoD

    Compute the Variance Inflation Factor 'VIF' for each variable to test for multi-collinearity by regressing 'x_j' against the rest of the variables. A VIF over 10 indicates that over 90% of the variance of 'x_j' can be predicted from the other variables, so 'x_j' may be a candidate for removal from the model. Note: override this method to use a superior regression technique.

    skip

    the number of columns of x at the beginning to skip in computing VIF
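
    Since VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing 'x_j' on the other variables, a VIF above 10 corresponds to R^2_j above 0.9, which is the 90% threshold mentioned above. A sketch of the conversion (names hypothetical):

```scala
// Hypothetical sketch: VIF for predictor x_j from the R^2 of regressing x_j on the rest.
object VifSketch {
  def vifFromRSq(rSqJ: Double): Double = {
    require(rSqJ >= 0.0 && rSqJ < 1.0, "R^2 must lie in [0, 1)")
    1.0 / (1.0 - rSqJ)
  }
}
```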