The ANCOVA class supports ANalysis of COVAriance (ANCOVA). It allows the
addition of a categorical treatment variable 't' into a multiple linear
regression. This is done by introducing dummy variables 'd_j' to distinguish
the treatment level. The problem is again to fit the parameter vector 'b' in
the augmented regression equation
y = b dot x + e = b_0 + b_1 * x_1 + b_2 * x_2 + ... b_k * x_k + b_k+1 * d_1 + b_k+2 * d_2 + ... b_k+l * d_l + e
where 'e' represents the residuals (the part not explained by the model).
Use Least-Squares (minimizing the residuals) to fit the parameter vector
b = x_pinv * y
where 'x_pinv' is the pseudo-inverse.
see.stanford.edu/materials/lsoeldsee263/05-ls.pdf
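As a rough illustration of the dummy-variable encoding, here is a plain-Scala sketch (not the ScalaTion API; treating level 0 as the baseline is an assumption made here):

    // Dummy-variable sketch: append l = levels-1 dummy columns d_1 .. d_l to each
    // data row, one-hot encoding treatment level t(i) (level 0 is the baseline)
    def augment (x: Array [Array [Double]], t: Array [Int], levels: Int): Array [Array [Double]] =
      x.indices.toArray.map (i =>
        x(i) ++ Array.tabulate (levels - 1) (j => if (t(i) == j + 1) 1.0 else 0.0))

The augmented matrix can then be fit exactly like an ordinary multiple linear regression.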
A General Linear Model (GLM) can be developed using the GLM trait and object
(see below). The implementation currently supports univariate models;
multivariate models (where each response is a vector) are planned for the
future. This version uses parallel processing to speed up execution.
It provides factory methods for the following special types of GLMs:
Regression      - multiple linear regression,
RidgeRegression - ridge (regularized) multiple linear regression,
TranRegression  - transformed (e.g., log) multiple linear regression,
PolyRegression  - polynomial regression,
TrigRegression  - trigonometric regression,
ResponseSurface - response surface regression,
ANCOVA          - GLM form of ANalysis of COVAriance.
The following special types are excluded since they do not utilize large matrices:
SimpleRegression - simple linear regression,
ANOVA            - GLM form of ANalysis Of VAriance.
The NaiveBayes class implements an Integer-Based Naive Bayes Classifier,
a classifier commonly used for discrete input data. The classifier is
trained using a data matrix 'x' and a classification vector 'y'. Each data
vector in the matrix is classified into one of 'k' classes numbered
0, ..., k-1. Prior probabilities are calculated based on the population of
each class in the training-set. Relative posterior probabilities are computed
by multiplying these by values computed using conditional probabilities. The
classifier is naive, because it assumes feature independence and therefore
simply multiplies the conditional probabilities.
This version uses parallel processing to speed up execution.
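A minimal plain-Scala sketch of the idea (not the ScalaTion API; the add-one smoothing and the 'vc' value-count parameter are assumptions made here for illustration):

    // Integer-based Naive Bayes sketch. x(i) is a row of discrete feature values,
    // y(i) its class in 0 until k, vc(j) the number of values feature j can take.
    object NaiveBayesSketch {
      def train (x: Array [Array [Int]], y: Array [Int], k: Int, vc: Array [Int]) = {
        val m = x.length
        val prior = Array.tabulate (k) (c => y.count (_ == c) / m.toDouble)   // class priors
        // cond(c)(j)(v) = P(feature j = v | class c), with add-one smoothing
        val cond = Array.tabulate (k, x(0).length) ((c, j) =>
          Array.tabulate (vc(j)) (v =>
            ((0 until m).count (i => y(i) == c && x(i)(j) == v) + 1.0) /
             (y.count (_ == c) + vc(j))))
        (prior, cond)
      }

      def classify (z: Array [Int], prior: Array [Double],
                    cond: Array [Array [Array [Double]]]): Int = {
        // naive step: multiply the prior by the per-feature conditional probabilities
        val post = prior.indices.map (c =>
          prior(c) * z.indices.map (j => cond(c)(j)(z(j))).product)
        post.indexOf (post.max)                             // most probable class
      }
    }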
The NaiveBayesR class implements a Gaussian Naive Bayes Classifier, the most
commonly used such classifier for continuous input data. The classifier is
trained using a data matrix 'x' and a classification vector 'y'. Each data
vector in the matrix is classified into one of 'k' classes numbered
0, ..., k-1. Prior probabilities are calculated based on the population of
each class in the training-set. Relative posterior probabilities are computed
by multiplying these by values computed using conditional density functions
based on the Normal (Gaussian) distribution. The classifier is naive, because
it assumes feature independence and therefore simply multiplies the
conditional densities.
This version uses parallel processing to speed up execution.
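A plain-Scala sketch of the classification step (not the ScalaTion API; the per-class, per-feature means mu(c)(j) and variances sig2(c)(j) are assumed to have been estimated from the training-set):

    import scala.math.{Pi, exp, sqrt}

    // Normal (Gaussian) density with mean mu and variance sig2
    def normalPdf (z: Double, mu: Double, sig2: Double): Double =
      exp (-(z - mu) * (z - mu) / (2.0 * sig2)) / sqrt (2.0 * Pi * sig2)

    def classify (z: Array [Double], prior: Array [Double],
                  mu: Array [Array [Double]], sig2: Array [Array [Double]]): Int = {
      // relative posterior = prior * product of conditional (Normal) densities
      val post = prior.indices.map (c =>
        prior(c) * z.indices.map (j => normalPdf (z(j), mu(c)(j), sig2(c)(j))).product)
      post.indexOf (post.max)
    }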
The PolyRegression class supports polynomial regression. In this case,
't' is expanded to [1, t, t^2 ... t^k]. Fit the parameter vector 'b' in the
regression equation
y = b dot x + e = b_0 + b_1 * t + b_2 * t^2 ... b_k * t^k + e
where 'e' represents the residuals (the part not explained by the model).
Use Least-Squares (minimizing the residuals) to fit the parameter vector
b = x_pinv * y
where 'x_pinv' is the pseudo-inverse.
www.ams.sunysb.edu/~zhu/ams57213/Team3.pptx
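A one-line plain-Scala sketch (not the ScalaTion API) of how each scalar 't' becomes a row of the data matrix 'x':

    // expansion sketch: t -> [1, t, t^2, ..., t^k]
    def expand (t: Double, k: Int): Array [Double] =
      Array.tabulate (k + 1) (j => math.pow (t, j))

    expand (2.0, 3)    // Array(1.0, 2.0, 4.0, 8.0)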
The Regression class supports multiple linear regression. In this case,
'x' is multi-dimensional [1, x_1, ... x_k]. Fit the parameter vector 'b' in
the regression equation
y = b dot x + e = b_0 + b_1 * x_1 + ... b_k * x_k + e
where 'e' represents the residuals (the part not explained by the model).
Use Least-Squares (minimizing the residuals) to fit the parameter vector
b = x_pinv * y [ alternative: b = solve (y) ]
where 'x_pinv' is the pseudo-inverse. Three techniques are provided:
Fac_QR       // QR Factorization: slower, more stable (default)
Fac_Cholesky // Cholesky Factorization: faster, less stable (reasonable choice)
Inverse      // Inverse/Gaussian Elimination: classical textbook technique (outdated; sketched below)
This version uses parallel processing to speed up execution.
see.stanford.edu/materials/lsoeldsee263/05-ls.pdf
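A minimal plain-Scala sketch of the third ('Inverse') technique, not the ScalaTion API: form the normal equations (x.t * x) b = x.t * y and solve them by Gaussian elimination.

    object RegressionSketch {
      // solve the linear system a * x = b by Gaussian elimination with partial pivoting
      def solve (a: Array [Array [Double]], b: Array [Double]): Array [Double] = {
        val n = b.length
        val m = Array.tabulate (n) (i => a(i) :+ b(i))             // augmented matrix
        for (p <- 0 until n) {                                     // forward elimination
          val piv = (p until n).maxBy (i => math.abs (m(i)(p)))    // partial pivoting
          val t = m(p); m(p) = m(piv); m(piv) = t
          for (i <- p + 1 until n; f = m(i)(p) / m(p)(p); j <- p to n)
            m(i)(j) -= f * m(p)(j)
        }
        val x = new Array [Double] (n)                             // back substitution
        for (i <- n - 1 to 0 by -1)
          x(i) = (m(i)(n) - (i + 1 until n).map (j => m(i)(j) * x(j)).sum) / m(i)(i)
        x
      }

      // fit b via the normal equations (x.t * x) b = x.t * y
      def fit (x: Array [Array [Double]], y: Array [Double]): Array [Double] = {
        val n   = x(0).length
        val xtx = Array.tabulate (n, n) ((j, l) => x.indices.map (i => x(i)(j) * x(i)(l)).sum)
        val xty = Array.tabulate (n) (j => x.indices.map (i => x(i)(j) * y(i)).sum)
        solve (xtx, xty)
      }
    }

The library's default Fac_QR avoids forming x.t * x, which squares the condition number; this sketch trades that stability for brevity.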
The ResponseSurface class uses multiple regression to fit a quadratic/cubic
surface to the data. For example in 2D, the quadratic regression equation is
y = b dot x + e = [b_0, ... b_k] dot [1, x_0, x_0^2, x_1, x_0*x_1, x_1^2] + e
see scalation.metamodel.QuadraticFit
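A plain-Scala sketch (not the ScalaTion API) of the 2D quadratic term expansion shown in the equation above:

    // (x_0, x_1) -> [1, x_0, x_0^2, x_1, x_0*x_1, x_1^2]
    def quadExpand (x0: Double, x1: Double): Array [Double] =
      Array (1.0, x0, x0 * x0, x1, x0 * x1, x1 * x1)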
The RidgeRegression class supports multiple linear regression. In this
case, 'x' is multi-dimensional [x_1, ... x_k]. Both the input matrix 'x' and
the response vector 'y' are centered (zero mean). Fit the parameter vector
'b' in the regression equation
y = b dot x + e = b_1 * x_1 + ... b_k * x_k + e
where 'e' represents the residuals (the part not explained by the model).
Use Least-Squares (minimizing the residuals) to fit the parameter vector
b = x_pinv * y [ alternative: b = solve (y) ]
where 'x_pinv' is the pseudo-inverse. Three techniques are provided:
Fac_QR       // QR Factorization: slower, more stable (default)
Fac_Cholesky // Cholesky Factorization: faster, less stable (reasonable choice)
Inverse      // Inverse/Gaussian Elimination: classical textbook technique (outdated)
This version uses parallel processing to speed up execution.
see http://statweb.stanford.edu/~tibs/ElemStatLearn/
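A plain-Scala sketch of the penalized fit, reusing 'solve' from the Regression sketch above. The shrinkage parameter 'lambda' is an assumption here (the text above describes only the centering); see the ElemStatLearn reference for the penalized formulation.

    // with 'x' and 'y' centered, solve (x.t * x + lambda * I) b = x.t * y
    def ridgeFit (x: Array [Array [Double]], y: Array [Double], lambda: Double): Array [Double] = {
      val n   = x(0).length
      val xtx = Array.tabulate (n, n) ((j, l) =>
        x.indices.map (i => x(i)(j) * x(i)(l)).sum + (if (j == l) lambda else 0.0))
      val xty = Array.tabulate (n) (j => x.indices.map (i => x(i)(j) * y(i)).sum)
      RegressionSketch.solve (xtx, xty)
    }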
The TranRegression class supports transformed multiple linear regression.
In this case, 'x' is multi-dimensional [1, x_1, ... x_k]. Fit the parameter
vector 'b' in the transformed regression equation
transform (y) = b dot x + e = b_0 + b_1 * x_1 + b_2 * x_2 ... b_k * x_k + e
where 'e' represents the residuals (the part not explained by the model) and
'transform' is the function (defaults to log) used to transform the response
vector 'y'. Use Least-Squares (minimizing the residuals) to fit the parameter
vector
b = x_pinv * y
where 'x_pinv' is the pseudo-inverse.
www.ams.sunysb.edu/~zhu/ams57213/Team3.pptx
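A plain-Scala sketch (not the ScalaTion API), reusing 'fit' from the Regression sketch above: apply 'transform' to 'y', do ordinary least squares, and back-transform predictions with the inverse function (exp for the log default).

    import scala.math.{log, exp}

    // fit b against transform(y) rather than y itself
    def tranFit (x: Array [Array [Double]], y: Array [Double],
                 transform: Double => Double = log): Array [Double] =
      RegressionSketch.fit (x, y.map (transform))

    // back-transform the prediction (exp is the inverse of the default log)
    def predict (b: Array [Double], z: Array [Double]): Double =
      exp (b.zip (z).map { case (bi, zi) => bi * zi }.sum)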
The TrigRegression class supports trigonometric regression. In this case,
't' is expanded to [1, sin (wt), cos (wt), sin (2wt), cos (2wt), ...].
Fit the parameter vector 'b' in the regression equation
y = b dot x + e = b_0 + b_1 sin (wt) + b_2 cos (wt) + b_3 sin (2wt) + b_4 cos (2wt) + ... + e
where 'e' represents the residuals (the part not explained by the model).
Use Least-Squares (minimizing the residuals) to fit the parameter vector
b = x_pinv * y
where 'x_pinv' is the pseudo-inverse.
http://link.springer.com/article/10.1023%2FA%3A1022436007242#page-1
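A plain-Scala sketch (not the ScalaTion API) of the trigonometric expansion, with 'k' harmonics of base frequency 'w':

    import scala.math.{sin, cos}

    // t -> [1, sin(wt), cos(wt), sin(2wt), cos(2wt), ..., sin(kwt), cos(kwt)]
    def trigExpand (t: Double, w: Double, k: Int): Array [Double] =
      Array (1.0) ++ (1 to k).flatMap (j => Seq (sin (j * w * t), cos (j * w * t)))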
The ANCOVATest object tests the ANCOVA class using the following regression
equation.
y = b dot x = b_0 + b_1*x_1 + b_2*x_2 + b_3*d_1 + b_4*d_2
The GLM object makes the GLM trait's methods directly available. This
approach (using traits and objects) allows the methods to also be inherited.
The GLMTest object tests the GLM object using the following regression
equation.
y = b dot x = b_0 + b_1*x_1 + b_2*x_2 + b_3*d_1 + b_4*d_2
NaiveBayes is the companion object for the NaiveBayes class.
The NaiveBayesRTest object is used to test the 'NaiveBayesR' class.
Ex: Classify whether a person is male (M) or female (F) based on the
measured features.
http://en.wikipedia.org/wiki/Naive_Bayes_classifier
The NaiveBayesTest object is used to test the 'NaiveBayes' class.
Ex: Classify whether a car is more likely to be stolen (1) or not (0).
http://www.inf.u-szeged.hu/~ormandi/ai2/06-naiveBayes-example.pdf
The NaiveBayesTest2 object is used to test the 'NaiveBayes' class.
Given whether a person is Fast and/or Strong, classify them as making
(C = 1) or not making (C = 0) the football team.
The PolyRegressionTest object tests the PolyRegression class using the
following regression equation.
y = b dot x = b_0 + b_1*t + b_2*t^2.
The RegressionTest object tests the Regression class using the following
regression equation.
y = b dot x = b_0 + b_1*x_1 + b_2*x_2.
Test regression and backward elimination.
http://statmaster.sdu.dk/courses/st111/module03/index.html
The RegressionTest2 object tests the Regression class using the following
regression equation.
y = b dot x = b_0 + b_1*x_1 + b_2*x_2.
Test regression using QR Decomposition and Gaussian Elimination for computing
the pseudo-inverse.
The RegressionTest3 object tests the multi-collinearity method in the
Regression class using the following regression equation.
y = b dot x = b_0 + b_1*x_1 + b_2*x_2 + b_3*x_3 + b_4*x_4
online.stat.psu.edu/online/development/stat501/data/bloodpress.txt
online.stat.psu.edu/online/development/stat501/12multicollinearity/05multico_vif.html
The ResponseSurfaceTest object is used to test the ResponseSurface class.
The RidgeRegression companion object is used to center the input matrix 'x'.
This is done by subtracting the column means from each value.
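A plain-Scala sketch (not the ScalaTion API) of this centering step:

    // subtract each column's mean from the values in that column
    def center (x: Array [Array [Double]]): Array [Array [Double]] = {
      val mu = x(0).indices.map (j => x.map (_(j)).sum / x.length)   // column means
      x.map (row => row.indices.map (j => row(j) - mu(j)).toArray)
    }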
The RidgeRegressionTest object tests the RidgeRegression class using the
following regression equation.
y = b dot x = b_1*x_1 + b_2*x_2.
Test regression and backward elimination.
http://statmaster.sdu.dk/courses/st111/module03/index.html
The RidgeRegressionTest2 object tests the RidgeRegression class using the
following regression equation.
y = b dot x = b_1*x_1 + b_2*x_2.
Test regression using QR Decomposition and Gaussian Elimination for computing
the pseudo-inverse.
The RidgeRegressionTest3 object tests the multi-collinearity method in the
RidgeRegression class using the following regression equation.
y = b dot x = b_1*x_1 + b_2*x_2 + b_3*x_3 + b_4*x_4
online.stat.psu.edu/online/development/stat501/data/bloodpress.txt
online.stat.psu.edu/online/development/stat501/12multicollinearity/05multico_vif.html
The TranRegressionTest object tests the TranRegression class using the
following regression equation.
log (y) = b dot x = b_0 + b_1*x_1 + b_2*x_2.
The TrigRegressionTest object tests the TrigRegression class using the
following regression equation.
y = b dot x = b_0 + b_1*t + b_2*t^2.