scalation.analytics

RidgeRegression

class RidgeRegression extends PredictorMat

The RidgeRegression class supports multiple linear ridge regression. In this case, 'x' is multi-dimensional [x_1, ... x_k]. Ridge regression puts a penalty on the L2 norm of the parameters b to reduce the chance of them taking on large values that may lead to less robust models. Both the input matrix 'x' and the response vector 'y' are centered (zero mean). Fit the parameter vector 'b' in the regression equation

y = b dot x + e = b_1 * x_1 + ... + b_k * x_k + e

where 'e' represents the residuals (the part not explained by the model). Use Least-Squares (minimizing the sum of squared residuals plus the L2 penalty) to solve for the parameter vector 'b' using the regularized Normal Equations:

b = fac.solve (.) for the regularized system (x.t * x + λ * I) * b = x.t * y
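As a brief sketch of where the regularized Normal Equations come from, assume the standard ridge objective with the penalty 'lambda * b dot b' named in the constructor documentation. Here X stands for the centered matrix 'x' and J is a symbol introduced only for this derivation:

```latex
\begin{aligned}
\min_{b}\; J(b) &= (y - Xb)^{\top}(y - Xb) + \lambda\, b^{\top} b \\
\nabla_b J(b) &= -2X^{\top}(y - Xb) + 2\lambda b = 0 \\
(X^{\top}X + \lambda I)\, b &= X^{\top} y
\end{aligned}
```

Setting λ = 0 recovers the ordinary least-squares Normal Equations.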

Five factorization techniques are provided:

'QR'       // QR Factorization: slower, more stable
'Cholesky' // Cholesky Factorization: faster, less stable (reasonable choice; the default in the constructor signature)
'SVD'      // Singular Value Decomposition: slowest, most robust
'LU'       // LU Factorization: similar, but better than inverse
'Inverse'  // Inverse/Gaussian Elimination, classical textbook technique
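To make the Cholesky route concrete, the following is a minimal sketch in plain Scala arrays. It does not use ScalaTion's MatriD/VectoD types or its factorization classes; the object RidgeSketch and all of its method names are hypothetical and chosen only for illustration. It forms x.t * x + λ * I, factors it as L * L.t, and solves by forward and backward substitution.

```scala
object RidgeSketch {
  type Mat = Array[Array[Double]]
  type Vec = Array[Double]

  /** Form A = x.t * x + lambda * I for an m-by-n matrix x. */
  def xtxLambdaI(x: Mat, lambda: Double): Mat = {
    val n = x(0).length
    Array.tabulate(n, n) { (i, j) =>
      x.map(row => row(i) * row(j)).sum + (if (i == j) lambda else 0.0)
    }
  }

  /** Form the right-hand side x.t * y. */
  def xty(x: Mat, y: Vec): Vec =
    Array.tabulate(x(0).length)(j => x.indices.map(i => x(i)(j) * y(i)).sum)

  /** Lower-triangular Cholesky factor L with A = L * L.t
   *  (A must be symmetric positive definite).
   */
  def cholesky(a: Mat): Mat = {
    val n = a.length
    val l = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 to i) {
      val s = (0 until j).map(k => l(i)(k) * l(j)(k)).sum
      l(i)(j) = if (i == j) math.sqrt(a(i)(i) - s) else (a(i)(j) - s) / l(j)(j)
    }
    l
  }

  /** Solve L * L.t * b = c: forward substitution for z, then backward for b. */
  def solve(l: Mat, c: Vec): Vec = {
    val n = c.length
    val z = new Array[Double](n)
    for (i <- 0 until n)
      z(i) = (c(i) - (0 until i).map(k => l(i)(k) * z(k)).sum) / l(i)(i)
    val b = new Array[Double](n)
    for (i <- n - 1 to 0 by -1)
      b(i) = (z(i) - (i + 1 until n).map(k => l(k)(i) * b(k)).sum) / l(i)(i)
    b
  }

  /** Ridge estimate: solve (x.t * x + lambda * I) * b = x.t * y. */
  def ridge(x: Mat, y: Vec, lambda: Double): Vec =
    solve(cholesky(xtxLambdaI(x, lambda)), xty(x, y))
}
```

For example, with x the 2-by-2 identity and y = (2, 4), `RidgeSketch.ridge` with λ = 0 returns approximately (2, 4), while λ = 1 shrinks the result to approximately (1, 2).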

See also

statweb.stanford.edu/~tibs/ElemStatLearn/

Linear Supertypes
PredictorMat, Error, Predictor, Fit, AnyRef, Any

Instance Constructors

  1. new RidgeRegression(x: MatriD, y: VectoD, lambda_: Double = 0.1, technique: RegTechnique = Cholesky)

    x

    the centered input/design m-by-n matrix NOT augmented with a first column of ones

    y

    the centered response m-vector

    lambda_

    the shrinkage parameter (0 => OLS) in the penalty term 'lambda * b dot b'

    technique

    the technique used to solve for b in (x.t*x + lambda*I)*b = x.t*y
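Because the constructor expects 'x' and 'y' to be centered already, the caller has to subtract the column means before constructing the model. A minimal sketch of that preprocessing in plain Scala arrays follows. It is not ScalaTion's own API; CenterSketch, centerColumns and centerVector are hypothetical names used only for illustration.

```scala
object CenterSketch {
  /** Subtract each column's mean from that column of an m-by-n matrix. */
  def centerColumns(x: Array[Array[Double]]): Array[Array[Double]] = {
    val m = x.length
    val means = x.transpose.map(_.sum / m)          // one mean per column
    x.map(row => row.zip(means).map { case (v, mu) => v - mu })
  }

  /** Subtract the mean from every element of the response vector. */
  def centerVector(y: Array[Double]): Array[Double] = {
    val mu = y.sum / y.length
    y.map(_ - mu)
  }
}
```

With both 'x' and 'y' centered, the fitted intercept is zero, which is consistent with the design matrix not being augmented with a column of ones.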

Type Members

  1. type Fac_QR = Fac_QR_H[MatriD]

Value Members

  1. def backwardElim(cols: Set[Int]): (Int, VectoD, VectoD)

    Perform backward elimination to remove the least predictive variable from the existing model, returning the variable to eliminate, the new parameter vector and the new quality of fit. May be called repeatedly. FIX - update implementation

    cols

    the columns of matrix x to be included in the existing model

  2. def coefficient: VectoD

    Return the vector of coefficient/parameter values.

    Definition Classes
    Predictor
  3. def crossVal(k: Int = 10): Unit

    Perform 'k'-fold cross-validation.

    k

    the number of folds

    Definition Classes
    RidgeRegression → PredictorMat
  4. def crossValidate(algor: (MatriD, VectoD) ⇒ PredictorMat, k: Int = 10): Array[Statistic]
    Definition Classes
    PredictorMat
  5. val df: (Double, Double)
    Definition Classes
    Fit
  6. def diagnose(e: VectoD, w: VectoD = null, yp: VectoD = null): Unit

    Given the error/residual vector, compute the quality of fit measures.

    e

    the corresponding m-dimensional error vector (y - yp)

    w

    the weights on the instances

    yp

    the predicted response vector (x * b)

    Definition Classes
    Fit
  7. def eval(xx: MatriD, yy: VectoD): Unit

    Compute the error and useful diagnostics for the test dataset.

    xx

    the test data matrix

    yy

    the test response vector

    Definition Classes
    PredictorMat → Predictor
  8. def eval(): Unit

    Compute the error and useful diagnostics for the entire dataset.

    Definition Classes
    PredictorMat → Predictor
  9. def f_(z: Double): String

    Format a double value.

    z

    the double value to format

    Definition Classes
    Fit
  10. def fit: VectoD

    Return the quality of fit including 'sst', 'sse', 'mse0', 'rmse', 'mae', 'rSq', 'df._2', 'rBarSq', 'fStat', 'aic', 'bic'. Note, if 'sse > sst', the model introduces errors and the 'rSq' may be negative; otherwise, R^2 ('rSq') ranges from 0 (weak) to 1 (strong). Note that 'rSq' is the number 5 measure. Override to add more quality of fit measures.

    Definition Classes
    Fit
  11. def fitLabel: Seq[String]

    Return the labels for the quality of fit measures. Override to add more quality of fit measures.

    Definition Classes
    Fit
  12. def fitMap: Map[String, String]

    Build a map of quality of fit measures (use of LinkedHashMap makes it ordered). Override to add more quality of fit measures.

    Definition Classes
    Fit
  13. final def flaw(method: String, message: String): Unit
    Definition Classes
    Error
  14. def forwardSel(cols: Set[Int]): (Int, VectoD, VectoD)

    Perform forward selection to add the most predictive variable to the existing model, returning the variable to add, the new parameter vector and the new quality of fit. May be called repeatedly.

    cols

    the columns of matrix x included in the existing model

  15. def gcv(yy: VectoD): Double

    Find an optimal value for the shrinkage parameter 'λ' using Generalized Cross Validation (GCV).

    yy

    the response vector

  16. val index_rSq: Int
    Definition Classes
    Fit
  17. def mse_: Double

    Return the mean of squares for error (sse / df._2). Must call diagnose first.

    Definition Classes
    Fit
  18. def predict(z: MatriD): VectoD

    Predict the value of 'y = f(z)' by evaluating the formula 'y = b dot z', for each row of matrix 'z'.

    z

    the new matrix to predict

    Definition Classes
    PredictorMat
  19. def predict(z: VectoD): Double

    Predict the value of 'y = f(z)' by evaluating the formula 'y = b dot z', e.g., '(b_0, b_1, b_2) dot (1, z_1, z_2)'.

    z

    the new vector to predict

    Definition Classes
    PredictorMat → Predictor
  20. def predict(z: VectoI): Double

    Given a new discrete data vector z, predict the y-value of f(z).

    z

    the vector to use for prediction

    Definition Classes
    Predictor
  21. def residual: VectoD

    Return the vector of residuals/errors.

    Definition Classes
    Predictor
  22. def sumCoeff(b: VectoD, stdErr: VectoD = null): String

    Produce the summary report portion for the coefficients.

    b

    the parameters/coefficients for the model

    Definition Classes
    Fit
  23. def summary(): Unit

    Compute diagnostics for the regression model.

    Definition Classes
    PredictorMat
  24. def summary(b: VectoD, stdErr: VectoD = null): String

    Produce a summary report with diagnostics for each predictor 'x_j' and the overall quality of fit.

    b

    the parameters/coefficients for the model

    Definition Classes
    Fit
  25. def train(yy: VectoD = y): RidgeRegression

    Train the predictor by fitting the parameter vector (b-vector) in the multiple regression equation

    yy = b dot x + e = [b_1, ... b_k] dot [x_1, ... x_k] + e

    using the least squares method.

    yy

    the response vector

    Definition Classes
    RidgeRegression → PredictorMat → Predictor
  26. def train(): PredictorMat

    Given a set of data vectors 'x's and their corresponding responses 'y's, passed into the implementing class, train the prediction function 'y = f(x)' by fitting its parameters.

    Definition Classes
    PredictorMat
  27. def vif: VectoD

    Compute the Variance Inflation Factor 'VIF' for each variable to test for multi-collinearity by regressing 'xj' against the rest of the variables. A VIF over 10 indicates that over 90% of the variance of 'xj' can be predicted from the other variables, so 'xj' is a candidate for removal from the model.

  28. def xtx_λI(λ: Double): Unit

    Compute x.t * x + λI.

    λ

    the shrinkage parameter
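Two numeric remarks in the member descriptions above can be checked with a few lines of arithmetic: the note under 'fit' that 'rSq' turns negative when 'sse > sst', and the note under 'vif' that a VIF over 10 corresponds to over 90% of the variance being explained. The sketch below assumes the standard textbook definitions R^2 = 1 - sse/sst and VIF = 1 / (1 - R^2); ScalaTion's own computations may differ in detail, and FitSketch and its method names are hypothetical.

```scala
object FitSketch {
  /** Coefficient of determination from the error and total sums of squares. */
  def rSq(sse: Double, sst: Double): Double = 1.0 - sse / sst

  /** Variance Inflation Factor given the R^2 obtained by regressing
   *  one predictor 'xj' on the remaining predictors.
   */
  def vifFromRSq(rSqJ: Double): Double = 1.0 / (1.0 - rSqJ)
}
```

Under these definitions, `FitSketch.rSq(12.0, 10.0)` is negative because sse exceeds sst, and `FitSketch.vifFromRSq(0.9)` is approximately 10, matching the removal threshold quoted for 'vif'.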