class GMM extends Classifier
The GMM class is used for univariate Gaussian Mixture Models. Given a sample assumed to have been generated by 'k' Normal distributions, estimate the mean 'mu' and variance 'sig2' of each Normal distribution. Given a new value, determine which class (0, ..., k-1) it most likely came from.
FIX: need a class for multivariate Gaussian Mixture Models.
FIX: need to adapt for clustering.
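For reference, the standard univariate mixture density and the classification rule it implies are shown below. The mixing weights w_j are an assumption of this sketch (the description above mentions only 'mu' and 'sig2'). Here mu_j and sigma_j^2 correspond to the entries of 'mu' and 'sig2', and the pi inside the Normal density is the constant 3.14159...

```latex
p(x) = \sum_{j=0}^{k-1} w_j \, \mathcal{N}(x \mid \mu_j, \sigma_j^2),
\qquad
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\\[4pt]
\hat{c}(x) = \arg\max_{j \in \{0, \dots, k-1\}} \; w_j \, \mathcal{N}(x \mid \mu_j, \sigma_j^2)
```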
-----------------------------------------------------------------------------
Instance Constructors
- new GMM(x: VectoD, k: Int = 3)
- x
the data vector
- k
the number of components in the mixture
Value Members
- def classify(z: VectoI): (Int, String, Double)
Classify the first point in vector 'z'.
- z
the vector to be classified.
- Definition Classes
- GMM → Classifier
- def classify(z: VectoD): (Int, String, Double)
Classify the first point in vector 'z'.
- z
the vector to be classified.
- Definition Classes
- GMM → Classifier
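A minimal sketch of the classification rule, assuming the model has already been trained: pick the component with the largest weighted Normal density at the value, and report that component's posterior probability. Plain `Array[Double]` values stand in for ScalaTion's vectors, the mixing weights `w` are an assumption (these docs mention only 'mu' and 'sig2'), and the object name `GmmClassifySketch` is hypothetical. Unlike the documented `(Int, String, Double)` result, this sketch omits the class name.

```scala
// Hypothetical sketch: plain Arrays stand in for ScalaTion's vectors, and the
// mixing weights 'w' are an assumption (these docs mention only 'mu' and 'sig2').
object GmmClassifySketch {
  import scala.math.{exp, Pi, sqrt}

  /** Normal density N(x | mu, sig2), where sig2 is the variance. */
  def normalPdf(x: Double, mu: Double, sig2: Double): Double =
    exp(-(x - mu) * (x - mu) / (2.0 * sig2)) / sqrt(2.0 * Pi * sig2)

  /** Return (most likely component, its posterior probability) for the value x. */
  def classify(x: Double, w: Array[Double], mu: Array[Double], sig2: Array[Double]): (Int, Double) = {
    val dens = mu.indices.map(j => w(j) * normalPdf(x, mu(j), sig2(j)))  // weighted densities
    val best = dens.indices.maxBy(j => dens(j))                          // arg max over components
    (best, dens(best) / dens.sum)                                        // normalise to a posterior
  }
}
```

For example, with components centred at 0.0 and 10.0 (unit variance, equal weights), the value 9.5 is assigned to component 1 with a posterior probability very close to 1.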
- def crossValidate(nx: Int = 10, show: Boolean = false): Double
Test the accuracy of the classified results by cross-validation, returning the accuracy. In each iteration, the "test data" starts at 'testStart' and ends at 'testEnd'; the rest of the data is the "training data". FIX - should return a StatVector
- nx
the number of crosses and cross-validations (defaults to 10x).
- show
the show flag (show result from each iteration)
- Definition Classes
- Classifier
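One plausible way to derive the 'testStart'/'testEnd' region for each of the 'nx' iterations is to cut the data into contiguous folds of equal width, train on everything outside the region, and average the per-fold accuracies. This is a minimal sketch under that assumption, not ScalaTion's code: the object name, the `folds` helper, and the `evalFold` callback (which stands in for training and testing the model on one split) are all hypothetical.

```scala
// Hypothetical sketch: assumes the 'm' rows are split into 'nx' contiguous folds
// of equal width m / nx, so any remainder rows (m % nx) are never tested.
object CrossValidateSketch {
  /** Return the (testStart, testEnd) region for each fold; testEnd is exclusive. */
  def folds(m: Int, nx: Int = 10): IndexedSeq[(Int, Int)] = {
    val testSize = m / nx
    (0 until nx).map { it =>
      val testStart = it * testSize
      (testStart, testStart + testSize)
    }
  }

  /** Mean of the per-fold accuracies; evalFold trains outside and tests inside the region. */
  def crossValidate(m: Int, nx: Int = 10, show: Boolean = false)(evalFold: (Int, Int) => Double): Double = {
    val accs = folds(m, nx).map { case (s, e) => evalFold(s, e) }
    if (show) accs.zipWithIndex.foreach { case (a, it) => println(s"fold $it: accuracy = $a") }
    accs.sum / nx
  }
}
```

Contiguous slices only give an unbiased estimate when the rows have already been shuffled, which matches the note under `test(testStart, testEnd)` that slicing is meant for randomized datasets.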
- def crossValidateRand(nx: Int = 10, show: Boolean = false): Double
Test the accuracy of the classified results by cross-validation, returning the accuracy. This version of cross-validation relies on "subtracting" frequencies from the previously stored global data to achieve efficiency. FIX - are the comments correct? FIX - should return a StatVector
- nx
number of crosses and cross-validations (defaults to 10x).
- show
the show flag (show result from each iteration)
- Definition Classes
- Classifier
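The description does not spell out which frequencies are stored. As a generic illustration of the "subtracting" idea (not ScalaTion's implementation), the sketch below tallies class frequencies over the full dataset once; for each fold, it obtains the training frequencies by subtracting the test rows' counts, so the work per fold is proportional to the test fold rather than the whole training set. The object and method names are hypothetical.

```scala
// Hypothetical sketch of count subtraction: class frequencies for the whole dataset
// are tallied once, and each fold's training frequencies are derived by subtracting
// the counts of that fold's test rows instead of re-tallying the training rows.
object FreqSubtractSketch {
  /** Tally how many times each class label 0 until k appears in y. */
  def counts(y: Seq[Int], k: Int): Array[Int] = {
    val c = Array.fill(k)(0)
    y.foreach(label => c(label) += 1)
    c
  }

  /** Training-set counts = global counts minus the counts of the test rows. */
  def trainCounts(global: Array[Int], y: IndexedSeq[Int], testIdx: Seq[Int]): Array[Int] = {
    val c = global.clone()                      // keep the stored global counts intact
    testIdx.foreach(i => c(y(i)) -= 1)
    c
  }
}
```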
- def exp_step(): Unit
Execute the Expectation (E) step of the EM algorithm.
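In the standard EM algorithm for a Gaussian mixture, the E step computes, for each data point x_i and component j, the responsibility gamma_ij: the posterior probability that x_i was generated by component j under the current parameter estimates. Assuming mixing weights w_j alongside 'mu' (mu_j) and 'sig2' (sigma_j^2), the textbook formula is:

```latex
\gamma_{ij} = \frac{w_j \, \mathcal{N}(x_i \mid \mu_j, \sigma_j^2)}
                   {\sum_{l=0}^{k-1} w_l \, \mathcal{N}(x_i \mid \mu_l, \sigma_l^2)},
\qquad i = 1, \dots, n, \quad j = 0, \dots, k-1
```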
- def fit(y: VectoI, yp: VectoI, k: Int = 2): VectoD
Return the quality of fit including 'acc', 'prec', 'recall', 'kappa'. Override to add more quality of fit measures.
- y
the actual class labels
- yp
the predicted class labels
- k
the number of class labels
- Definition Classes
- Classifier
- See also
ConfusionMat
medium.com/greyatom/performance-metrics-for-classification-problems-in-machine-learning-part-i-b085d432082b
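The four measures named above can all be computed from a confusion matrix. The following is a minimal illustration for the binary case (k = 2, with class 1 treated as the positive class); these docs do not say how precision and recall are averaged when k > 2, so the sketch does not attempt that. Plain `Seq[Int]` values stand in for `VectoI`, a tuple is returned instead of a `VectoD`, and the object name is hypothetical.

```scala
// Hypothetical sketch for the binary case (k = 2), treating class 1 as "positive".
object FitSketch {
  /** Return (acc, prec, recall, kappa) for actual labels y and predicted labels yp. */
  def fit(y: Seq[Int], yp: Seq[Int], k: Int = 2): (Double, Double, Double, Double) = {
    require(k >= 2 && y.length == yp.length && y.nonEmpty)
    val n    = y.length.toDouble
    val conf = Array.ofDim[Int](k, k)                   // conf(actual)(predicted)
    for ((a, p) <- y.zip(yp)) conf(a)(p) += 1

    val tp = conf(1)(1).toDouble                        // predicted 1, actually 1
    val fp = conf(0)(1).toDouble                        // predicted 1, actually 0
    val fn = conf(1)(0).toDouble                        // predicted 0, actually 1

    val acc    = (0 until k).map(c => conf(c)(c)).sum / n
    val prec   = if (tp + fp == 0) 0.0 else tp / (tp + fp)
    val recall = if (tp + fn == 0) 0.0 else tp / (tp + fn)

    // Cohen's kappa: observed agreement (acc) corrected for chance agreement pe,
    // where pe sums (actual share of class c) * (predicted share of class c).
    val pe = (0 until k).map { c =>
      (conf(c).sum / n) * ((0 until k).map(r => conf(r)(c)).sum / n)
    }.sum
    val kappa = if (pe == 1.0) 0.0 else (acc - pe) / (1.0 - pe)
    (acc, prec, recall, kappa)
  }
}
```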
- def fitLabel: Seq[String]
Return the labels for the fit. Override when necessary.
- Definition Classes
- Classifier
- def max_step(): Unit
Execute the Maximization (M) step of the EM algorithm.
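In the standard EM algorithm for a Gaussian mixture, the M step re-estimates the parameters from the responsibilities gamma_ij computed in the E step (gamma_ij is the posterior probability that point x_i came from component j). Assuming mixing weights w_j are estimated alongside 'mu' (mu_j) and 'sig2' (sigma_j^2), the textbook updates over n points are:

```latex
N_j = \sum_{i=1}^{n} \gamma_{ij}, \qquad
w_j = \frac{N_j}{n}, \qquad
\mu_j = \frac{1}{N_j} \sum_{i=1}^{n} \gamma_{ij} \, x_i, \qquad
\sigma_j^2 = \frac{1}{N_j} \sum_{i=1}^{n} \gamma_{ij} \, (x_i - \mu_j)^2
```

The variance update uses the freshly updated mean mu_j.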
- def reset(): Unit
Reset ... FIX
- Definition Classes
- GMM → Classifier
- def size: Int
Return the size of the feature set.
- Definition Classes
- GMM → Classifier
- def test(itest: IndexedSeq[Int]): Double
Test ...
- def test(testStart: Int, testEnd: Int): Double
Test the quality of the training with a test dataset and return the fraction of correct classifications. Can be used when the dataset is randomized so that the testing/training part of a dataset corresponds to simple slices of vectors and matrices.
- testStart
the beginning of the test region (inclusive).
- testEnd
the end of the test region (exclusive).
- Definition Classes
- Classifier
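The fraction of correct classifications over a slice can be computed by comparing predicted and actual labels for every index in [testStart, testEnd). In this minimal sketch, the `predict` function stands in for a trained model's classify method, plain Scala sequences stand in for ScalaTion's vectors, and the object name is hypothetical.

```scala
// Hypothetical sketch: 'predict' stands in for the trained model's classify method,
// and plain Scala sequences stand in for ScalaTion's vectors.
object TestSliceSketch {
  /** Fraction of rows in [testStart, testEnd) whose predicted label equals y(i). */
  def test(x: IndexedSeq[Double], y: IndexedSeq[Int], testStart: Int, testEnd: Int)
          (predict: Double => Int): Double = {
    val idx     = testStart until testEnd                 // testEnd is exclusive
    val correct = idx.count(i => predict(x(i)) == y(i))
    correct.toDouble / idx.length
  }
}
```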
- def train(itest: IndexedSeq[Int]): GMM
Train the model to determine values for the parameter vectors 'mu' and 'sig2'.
- itest
the indices of test data
- Definition Classes
- GMM → Classifier
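Training a univariate mixture by EM alternates the E step (responsibilities) and the M step (parameter re-estimation) until the estimates settle. The following is a minimal, self-contained sketch of that loop, not ScalaTion's implementation. Its assumptions are: plain `Array[Double]` instead of `VectoD`; mixing weights `w` estimated alongside `mu` and `sig2`; initial means taken from evenly spaced order statistics of the sample; a fixed iteration count instead of a convergence test; and a small variance floor to keep a component from collapsing onto a single point. `itest` is taken to hold the indices excluded from training. All names are hypothetical.

```scala
// Hypothetical sketch of EM for a univariate Gaussian mixture (see assumptions above).
object GmmEmSketch {
  import scala.math.{exp, Pi, sqrt}

  /** Mixture parameters: weights w, means mu, variances sig2 (one entry per component). */
  final case class Params(w: Array[Double], mu: Array[Double], sig2: Array[Double])

  def normalPdf(x: Double, mu: Double, sig2: Double): Double =
    exp(-(x - mu) * (x - mu) / (2.0 * sig2)) / sqrt(2.0 * Pi * sig2)

  /** E step: gamma(i)(j) = responsibility of component j for point x(i). */
  def eStep(x: Array[Double], p: Params): Array[Array[Double]] =
    x.map { xi =>
      val dens = p.mu.indices.map(j => p.w(j) * normalPdf(xi, p.mu(j), p.sig2(j))).toArray
      val tot  = dens.sum
      dens.map(_ / tot)
    }

  /** M step: re-estimate w, mu and sig2 from the responsibilities. */
  def mStep(x: Array[Double], gamma: Array[Array[Double]], k: Int): Params = {
    val n    = x.length
    val nj   = Array.tabulate(k)(j => gamma.map(g => g(j)).sum)   // effective counts
    val mu   = Array.tabulate(k)(j => x.indices.map(i => gamma(i)(j) * x(i)).sum / nj(j))
    val sig2 = Array.tabulate(k) { j =>
      val v = x.indices.map(i => gamma(i)(j) * (x(i) - mu(j)) * (x(i) - mu(j))).sum / nj(j)
      math.max(v, 1e-6)                                             // variance floor
    }
    Params(nj.map(_ / n), mu, sig2)
  }

  /** Fit k components by EM, using only the points whose index is not in itest. */
  def train(data: Array[Double], itest: Set[Int], k: Int = 3, iters: Int = 100): Params = {
    val x      = data.indices.filterNot(i => itest.contains(i)).map(i => data(i)).toArray
    val sorted = x.sorted
    val mean   = x.sum / x.length
    val varAll = x.map(v => (v - mean) * (v - mean)).sum / x.length
    // Start from equal weights, evenly spaced order statistics, and the overall variance
    var p = Params(
      Array.fill(k)(1.0 / k),
      Array.tabulate(k)(j => sorted((2 * j + 1) * sorted.length / (2 * k))),
      Array.fill(k)(math.max(varAll, 1e-6)))
    for (_ <- 0 until iters) p = mStep(x, eStep(x, p), k)
    p
  }
}
```

On two well-separated clusters of five points each, centred at 0 and 10, the sketch recovers means near 0 and 10, variances near 0.5, and weights near 0.5. The result depends on initialisation; a production version would also stop once the log-likelihood stops improving.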
- def train(): Classifier
Train the classifier by computing the probabilities from a training dataset of data vectors and their classifications. Must be implemented in any extending class. Can be used when the whole dataset is used for training.
- Definition Classes
- Classifier
- def train(testStart: Int, testEnd: Int): Classifier
Train the classifier by computing the probabilities from a training dataset of data vectors and their classifications. Must be implemented in any extending class. Can be used when the dataset is randomized so that the training part of a dataset corresponds to simple slices of vectors and matrices.
- testStart
starting index of test region (inclusive) used in cross-validation
- testEnd
ending index of test region (exclusive) used in cross-validation
- Definition Classes
- Classifier