class RidgeRegression extends PredictorMat
The RidgeRegression class supports multiple linear ridge regression.
In this case, 'x' is multi-dimensional [x_1, ... x_k]. Ridge regression puts
a penalty on the L2 norm of the parameters b to reduce the chance of them taking
on large values that may lead to less robust models. Both the input matrix 'x'
and the response vector 'y' are centered (zero mean). Fit the parameter vector
'b' in the regression equation
y = b dot x + e = b_1 * x_1 + ... + b_k * x_k + e
where 'e' represents the residuals (the part not explained by the model). Use Least-Squares (minimizing the sum of squared residuals plus the penalty 'λ * b dot b') to solve for the parameter vector 'b' using the regularized Normal Equations:
b = fac.solve (.) with regularization x.t * x + λ * I
Five factorization techniques are provided:
'QR'       // QR Factorization: slower, more stable
'Cholesky' // Cholesky Factorization: faster, less stable (reasonable choice; the default in the constructor signature)
'SVD'      // Singular Value Decomposition: slowest, most robust
'LU'       // LU Factorization: similar, but better than inverse
'Inverse'  // Inverse/Gaussian Elimination, classical textbook technique
- See also
statweb.stanford.edu/~tibs/ElemStatLearn/
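The regularized Normal Equations (x.t * x + λ * I) * b = x.t * y can be illustrated with a minimal, library-free sketch. It uses plain arrays and a hand-written Cholesky factorization; the object and method names (RidgeSketch, gramPlusLambda, cholSolve, ridge) are hypothetical and are not part of this class, which delegates the solve to its own factorization classes.

```scala
object RidgeSketch {

  type Mat = Array[Array[Double]]

  /** Form x.t * x + lambda * I for an m-by-n matrix x. */
  def gramPlusLambda(x: Mat, lambda: Double): Mat = {
    val n = x(0).length
    Array.tabulate(n, n) { (i, j) =>
      val dot = x.map(row => row(i) * row(j)).sum
      if (i == j) dot + lambda else dot
    }
  }

  /** Compute x.t * y. */
  def xty(x: Mat, y: Array[Double]): Array[Double] = {
    val n = x(0).length
    Array.tabulate(n) { j => x.indices.map(i => x(i)(j) * y(i)).sum }
  }

  /** Lower triangular factor l with a = l * l.t (a must be symmetric positive definite). */
  def cholesky(a: Mat): Mat = {
    val n = a.length
    val l = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 to i) {
      val s = (0 until j).map(k => l(i)(k) * l(j)(k)).sum
      l(i)(j) = if (i == j) math.sqrt(a(i)(i) - s) else (a(i)(j) - s) / l(j)(j)
    }
    l
  }

  /** Solve (l * l.t) * b = c by forward, then backward substitution. */
  def cholSolve(l: Mat, c: Array[Double]): Array[Double] = {
    val n = l.length
    val z = new Array[Double](n)
    for (i <- 0 until n)
      z(i) = (c(i) - (0 until i).map(k => l(i)(k) * z(k)).sum) / l(i)(i)
    val b = new Array[Double](n)
    for (i <- n - 1 to 0 by -1)
      b(i) = (z(i) - (i + 1 until n).map(k => l(k)(i) * b(k)).sum) / l(i)(i)
    b
  }

  /** Ridge estimate for centered x and y. */
  def ridge(x: Mat, y: Array[Double], lambda: Double): Array[Double] =
    cholSolve(cholesky(gramPlusLambda(x, lambda)), xty(x, y))
}
```

For the centered data x = [(1, 0), (-1, 0), (0, 1), (0, -1)] and y = (2, -2, 3, -3), x.t * x is 2 * I and x.t * y is (4, 6), so 'λ = 0' gives the OLS solution (2, 3) while 'λ = 2' shrinks it to (1, 1.5).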
Instance Constructors
- new RidgeRegression(x: MatriD, y: VectoD, fname_: Strings = null, hparam: HyperParameter = RidgeRegression.hp, technique: RegTechnique.RegTechnique = Cholesky)
- x
the centered data/input m-by-n matrix NOT augmented with a first column of ones
- y
the centered response/output m-vector
- fname_
the feature/variable names
- hparam
the shrinkage hyper-parameter, lambda (0 => OLS) in the penalty term 'lambda * b dot b'
- technique
the technique used to solve for b in (x.t*x + lambda*I)*b = x.t*y
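Since the constructor expects centered 'x' and 'y', the caller has to subtract the column means before building the model. The following is a minimal sketch of that preprocessing step on plain arrays; the names CenterSketch, centerVec and centerCols are hypothetical helpers for illustration, not methods of this class.

```scala
object CenterSketch {

  /** Subtract the mean from every element of the vector y. */
  def centerVec(y: Array[Double]): Array[Double] = {
    val mu = y.sum / y.length
    y.map(_ - mu)
  }

  /** Subtract each column's mean from that column of the m-by-n matrix x. */
  def centerCols(x: Array[Array[Double]]): Array[Array[Double]] = {
    val mu = x.transpose.map(col => col.sum / col.length)   // one mean per column
    x.map(row => row.zip(mu).map { case (v, m) => v - m })
  }
}
```

After centering, every column of 'x' and the vector 'y' have zero mean, which is why no column of ones (intercept) is needed in 'x'.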
Type Members
- type Fac_QR = Fac_QR_H[MatriD]
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def analyze(x_: MatriD = x, y_: VectoD = y, x_e: MatriD = x, y_e: VectoD = y): PredictorMat
Analyze a dataset using this model using ordinary training with the 'train' method.
- x_
the training/full data/input matrix
- y_
the training/full response/output vector
- x_e
the test/full data/input matrix
- y_e
the test/full response/output vector
- Definition Classes
- PredictorMat → Predictor
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- var b: VectoD
- Attributes
- protected
- Definition Classes
- PredictorMat
- def backwardElim(cols: Set[Int], index_q: Int = index_rSqBar, first: Int = 1): (Int, PredictorMat)
Perform backward elimination to find the least predictive variable to remove from the existing model, returning the variable to eliminate, the new parameter vector and the new Quality of Fit (QoF). May be called repeatedly.
- cols
the columns of matrix x currently included in the existing model
- index_q
index of Quality of Fit (QoF) to use for comparing quality
- first
first variable to consider for elimination (the default, 1, assumes the intercept x_0 will be in any model)
- Definition Classes
- PredictorMat
- See also
Fit for index of QoF measures.
- def backwardElimAll(index_q: Int = index_rSqBar, first: Int = 1, cross: Boolean = true): (Set[Int], MatriD)
Perform backward elimination to find the least predictive variables to remove from the full model, returning the variables left and the new Quality of Fit (QoF) measures for all steps.
- index_q
index of Quality of Fit (QoF) to use for comparing quality
- first
first variable to consider for elimination
- cross
whether to include the cross-validation QoF measure
- Definition Classes
- PredictorMat
- See also
Fit for index of QoF measures.
- def buildModel(x_cols: MatriD): RidgeRegression
Build a sub-model that is restricted to the given columns of the data matrix.
- x_cols
the columns that the new model is restricted to
- Definition Classes
- RidgeRegression → PredictorMat
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native() @HotSpotIntrinsicCandidate()
- def corrMatrix(xx: MatriD = x): MatriD
Return the correlation matrix for the columns in data matrix 'xx'.
- xx
the data matrix whose correlation matrix is sought
- Definition Classes
- PredictorMat → Predictor
- def crossValidate(k: Int = 10, rando: Boolean = true): Array[Statistic]
- Definition Classes
- PredictorMat
- def diagnose(e: VectoD, yy: VectoD, yp: VectoD, w: VectoD = null, ym_: Double = noDouble): Unit
Diagnose the health of the model by computing the Quality of Fit (QoF) measures, from the error/residual vector and the predicted & actual responses. For some models the instances may be weighted.
- e
the m-dimensional error/residual vector (yy - yp)
- yy
the actual response/output vector to use (test/full)
- yp
the predicted response/output vector (test/full)
- w
the weights on the instances (defaults to null)
- ym_
the mean of the actual response/output vector to use (training/full)
- var e: VectoD
- Attributes
- protected
- Definition Classes
- PredictorMat
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def eval(ym: Double, y_e: VectoD, yp: VectoD): PredictorMat
Compute the error (difference between actual and predicted) and useful diagnostics for the test dataset. Requires predicted responses to be passed in.
- ym
the mean of the actual response/output vector (training/full)
- y_e
the test/full actual response/output vector
- yp
the test/full predicted response/output vector
- Definition Classes
- PredictorMat
- def eval(x_e: MatriD = x, y_e: VectoD = y): PredictorMat
Compute the error (difference between actual and predicted) and useful diagnostics for the test dataset.
- x_e
the test/full data/input matrix (defaults to full x)
- y_e
the test/full response/output vector (defaults to full y)
- Definition Classes
- PredictorMat → Model
- def f_(z: Double): String
Format a double value.
- def findLambda: (Double, Double)
Find an optimal value for the shrinkage parameter 'λ' using Cross Validation to minimize 'sse_cv'. The search starts with a low default value for 'λ' and doubles it with each iteration, returning the best 'λ' found and its corresponding cross-validated 'sse'.
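The doubling search can be sketched independently of the regression itself. In the minimal sketch below, 'score' stands in for the cross-validated 'sse' at a given 'λ'; the starting value, the number of steps and the names LambdaSearchSketch and findLambda are assumptions made for illustration, since the actual defaults are not stated here.

```scala
object LambdaSearchSketch {

  /** Try lambda = start, 2 * start, 4 * start, ... for the given number of
   *  steps and return the pair (best lambda, its score), where lower is better.
   */
  def findLambda(score: Double => Double, start: Double = 0.01, steps: Int = 20): (Double, Double) = {
    var best = (start, score(start))
    var l    = start
    for (_ <- 1 until steps) {
      l *= 2.0                              // double lambda each iteration
      val s = score(l)
      if (s < best._2) best = (l, s)        // keep the lowest score seen so far
    }
    best
  }
}
```

For example, with the toy score 'l => (l - 8) * (l - 8)' and 'start = 1.0', the candidates are 1, 2, 4, 8, 16, ... and the search returns (8.0, 0.0). A geometric grid like this covers several orders of magnitude cheaply, at the cost of coarse resolution between grid points.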
- def findLambda2(xx: MatriD = x, yy: VectoD = y): Double
Find an optimal value for the shrinkage parameter 'λ' using Training to minimize 'sse'. FIX - try other QoF measures, e.g., sse_cv
- xx
the data/input matrix (full or test)
- yy
the response/output vector (full or test)
- def fit: VectoD
Return the Quality of Fit (QoF) measures corresponding to the labels given in the 'fitLabel' method. Note: if 'sse > sst', the model introduces errors and 'rSq' may be negative; otherwise, R^2 ('rSq') ranges from 0 (weak) to 1 (strong). Override to add more quality of fit measures.
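The relationship between 'sse', 'sst' and 'rSq' can be shown with a small sketch, assuming the usual definition rSq = 1 - sse / sst with 'sst' taken about the mean of the actual responses. The names RSqSketch and rSq are hypothetical and this is not the code used by the Fit class.

```scala
object RSqSketch {

  /** R^2 from actual responses y and predicted responses yp. */
  def rSq(y: Array[Double], yp: Array[Double]): Double = {
    val ym  = y.sum / y.length                                       // mean of actual responses
    val sse = y.zip(yp).map { case (a, p) => (a - p) * (a - p) }.sum // sum of squared errors
    val sst = y.map(a => (a - ym) * (a - ym)).sum                    // total sum of squares
    1.0 - sse / sst
  }
}
```

With y = (1, 2, 3), a perfect prediction gives 1.0, while the reversed prediction (3, 2, 1) has sse = 8 and sst = 2, giving -3.0, a case where the model does worse than predicting the mean.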
- def fitLabel: Seq[String]
Return the labels for the Quality of Fit (QoF) measures.
- def fitMap: Map[String, String]
Build a map of quality of fit measures (use of LinkedHashMap makes it ordered).
- Definition Classes
- QoF
- final def flaw(method: String, message: String): Unit
- Definition Classes
- Error
- var fname: Strings
- Attributes
- protected
- Definition Classes
- PredictorMat
- def forwardSel(cols: Set[Int], index_q: Int = index_rSqBar): (Int, PredictorMat)
Perform forward selection to find the most predictive variable to add to the existing model, returning the variable to add and the new model. May be called repeatedly.
- cols
the columns of matrix x currently included in the existing model
- index_q
index of Quality of Fit (QoF) to use for comparing quality
- Definition Classes
- PredictorMat → Predictor
- See also
Fit for index of QoF measures.
- def forwardSelAll(index_q: Int = index_rSqBar, cross: Boolean = true): (Set[Int], MatriD)
Perform forward selection to find the most predictive variables to have in the model, returning the variables added and the new Quality of Fit (QoF) measures for all steps.
- index_q
index of Quality of Fit (QoF) to use for comparing quality
- cross
whether to include the cross-validation QoF measure
- Definition Classes
- PredictorMat
- See also
Fit for index of QoF measures.
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- def getX: MatriD
Return the 'used' data matrix 'x'. Mainly for derived classes where 'x' is expanded from the given columns in 'x_', e.g., QuadRegression adds squared columns.
- Definition Classes
- PredictorMat → Predictor
- def getY: VectoD
Return the 'used' response vector 'y'. Mainly for derived classes where 'y' is transformed, e.g., TranRegression, Regression4TS.
- Definition Classes
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- def help: String
Return the help string that describes the Quality of Fit (QoF) measures provided by the Fit class.
- def hparameter: HyperParameter
Return the hyper-parameters.
- Definition Classes
- PredictorMat → Model
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- val k: Int
- Attributes
- protected
- Definition Classes
- PredictorMat
- def lambda_: Double
Return the value of the shrinkage parameter 'lambda'.
- def ll(ms: Double = mse0, s2: Double = sig2e, m2: Int = m): Double
The log-likelihood function times -2. Override as needed.
- ms
raw Mean Squared Error
- s2
MLE estimate of the population variance of the residuals
- Definition Classes
- Fit
- See also
www.stat.cmu.edu/~cshalizi/mreg/15/lectures/06/lecture-06.pdf
www.wiley.com/en-us/Introduction+to+Linear+Regression+Analysis%2C+5th+Edition-p-9780470542811 Section 2.11
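As a rough illustration of the quantity involved, the sketch below evaluates -2 times the Gaussian log-likelihood from the same three inputs ('ms', 's2' and the number of instances). It assumes independent, normally distributed residuals with variance 's2'; the names LogLikSketch and minus2LogLik are hypothetical and the library's exact expression may differ.

```scala
object LogLikSketch {

  /** -2 * log-likelihood for m Gaussian residuals:
   *  m * (log(2 * pi) + log(s2) + ms / s2),
   *  where ms is the raw mean squared error (sse / m) and s2 the variance estimate.
   */
  def minus2LogLik(ms: Double, s2: Double, m: Int): Double =
    m * (math.log(2.0 * math.Pi) + math.log(s2) + ms / s2)
}
```

When 's2' is the MLE of the residual variance it equals 'ms', so the last term reduces to 1 and the value depends only on 'm' and 's2'.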
- val m: Int
- Attributes
- protected
- Definition Classes
- PredictorMat
- val modelConcept: URI
An optional reference to an ontological concept
- Definition Classes
- Model
- def modelName: String
An optional name for the model (or modeling technique)
- Definition Classes
- Model
- def mse_: Double
Return the mean of squares for error (sse / df._2). Must call diagnose first.
- Definition Classes
- Fit
- val n: Int
- Attributes
- protected
- Definition Classes
- PredictorMat
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- def parameter: VectoD
Return the vector of parameter/coefficient values.
- Definition Classes
- PredictorMat → Model
- def predict(z: MatriD = x): VectoD
Predict the value of 'y = f(z)' by evaluating the formula 'y = b dot z', for each row of matrix 'z'.
- z
the new matrix to predict
- Definition Classes
- PredictorMat → Predictor
- def predict(z: VectoD): Double
Predict the value of 'y = f(z)' by evaluating the formula 'y = b dot z', e.g., '(b_0, b_1, b_2) dot (1, z_1, z_2)'.
- z
the new vector to predict
- Definition Classes
- PredictorMat → Predictor
- def predict(z: VectoI): Double
Given a new discrete data/input vector 'z', predict the 'y'-value of 'f(z)'.
- z
the vector to use for prediction
- Definition Classes
- Predictor
- def report: String
Return a basic report on the trained model.
- Definition Classes
- PredictorMat → Model
- See also
'summary' method for more details
- def resetDF(df_update: PairD): Unit
Reset the degrees of freedom to the new updated values. For some models, the degrees of freedom is not known until after the model is built.
- df_update
the updated degrees of freedom (model, error)
- Definition Classes
- Fit
- def residual: VectoD
Return the vector of residuals/errors.
- Definition Classes
- PredictorMat → Predictor
- def reverse(a: MatriD): MatriD
Return a matrix that is in reverse row order of the given matrix 'a'.
- a
the given matrix
- Definition Classes
- PredictorMat
- var sig2e: Double
- Attributes
- protected
- Definition Classes
- Fit
- def stepRegressionAll(index_q: Int = index_rSqBar, cross: Boolean = true): (Set[Int], MatriD)
Perform stepwise regression to find the most predictive variables to have in the model, returning the variables left and the new Quality of Fit (QoF) measures for all steps. At each step it calls 'forwardSel' and 'backwardElim' and takes the best of the two actions. Stops when neither action yields improvement.
- index_q
index of Quality of Fit (QoF) to use for comparing quality
- cross
whether to include the cross-validation QoF measure
- Definition Classes
- PredictorMat
- See also
Fit for index of QoF measures.
- def summary: String
Compute and return summary diagnostics for the regression model.
- Definition Classes
- PredictorMat
- def summary(b: VectoD, stdErr: VectoD, vf: VectoD, show: Boolean = false): String
Produce a summary report with diagnostics for each predictor 'x_j' and the overall quality of fit.
- b
the parameters/coefficients for the model
- vf
the Variance Inflation Factors (VIFs)
- show
flag indicating whether to print the summary
- Definition Classes
- Fit
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def test(modelName: String, doPlot: Boolean = true): Unit
Test the model on the full dataset (i.e., train and evaluate on full dataset).
- modelName
the name of the model being tested
- doPlot
whether to plot the actual vs. predicted response
- Definition Classes
- Predictor
- def toString(): String
- Definition Classes
- AnyRef → Any
- def train(x_: MatriD, y_: VectoD): RidgeRegression
Train the predictor by fitting the parameter vector (b-vector) in the multiple regression equation
yy = b dot x + e = [b_1, ... b_k] dot [x_1, ... x_k] + e
using the least squares method.
- x_
the data/input matrix
- y_
the response/output vector
- Definition Classes
- RidgeRegression → PredictorMat → Model
- def train2(x_: MatriD = x, y_: VectoD = y): PredictorMat
Train a predictive model 'y_ = f(x_) + e' where 'x_' is the data/input matrix and 'y_' is the response/output vector. These arguments default to the full dataset 'x' and 'y', but may be restricted to a training dataset. Training involves estimating the model parameters 'b'. The 'train2' method should work like the 'train' method, but should also optimize hyper-parameters (e.g., shrinkage or learning rate). Only implementing classes needing this capability should implement this method.
- x_
the training/full data/input matrix (defaults to full x)
- y_
the training/full response/output vector (defaults to full y)
- Definition Classes
- PredictorMat
- def vif(skip: Int = 1): VectoD
Compute the Variance Inflation Factor 'VIF' for each variable to test for multi-collinearity by regressing 'x_j' against the rest of the variables. A VIF over 10 indicates that over 90% of the variance of 'x_j' can be predicted from the other variables, so 'x_j' may be a candidate for removal from the model. Note: override this method to use a superior regression technique.
- skip
the number of columns of x at the beginning to skip in computing VIF
- Definition Classes
- PredictorMat
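The link between VIF and explained variance follows from the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 obtained by regressing 'x_j' on the other variables. The sketch below illustrates this for the special case of two predictors, where R_j^2 is the squared correlation between them; the names VifSketch, vifFromRSq, corr and vifTwoPredictors are hypothetical and this is not the method's implementation.

```scala
object VifSketch {

  /** VIF from the R^2 of regressing x_j on the remaining variables. */
  def vifFromRSq(rSqJ: Double): Double = 1.0 / (1.0 - rSqJ)

  /** Pearson correlation between two equally long vectors. */
  def corr(a: Array[Double], b: Array[Double]): Double = {
    val ma = a.sum / a.length
    val mb = b.sum / b.length
    val sab = a.zip(b).map { case (u, v) => (u - ma) * (v - mb) }.sum
    val saa = a.map(u => (u - ma) * (u - ma)).sum
    val sbb = b.map(v => (v - mb) * (v - mb)).sum
    sab / math.sqrt(saa * sbb)
  }

  /** With only two predictors, R_j^2 equals the squared correlation. */
  def vifTwoPredictors(x1: Array[Double], x2: Array[Double]): Double = {
    val r = corr(x1, x2)
    vifFromRSq(r * r)
  }
}
```

An R_j^2 of 0.5 gives a VIF of 2, and an R_j^2 of 0.9 gives a VIF of about 10, which is the threshold quoted above.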
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- val x: MatriD
- Attributes
- protected
- Definition Classes
- PredictorMat
- val y: VectoD
- Attributes
- protected
- Definition Classes
- PredictorMat
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated