Compute the counts for each interval in the histogram.
the vector for feature j given class c.
the number of intervals
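The interval counting can be sketched as follows, in Python for illustration (the library itself is Scala; the helper name 'histogram_counts' and the use of equal-width bins over the vector's observed range are assumptions, not the library's own code):

```python
def histogram_counts(x, intervals):
    """Count how many values of x fall into each of 'intervals' equal-width bins."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / intervals or 1.0            # guard against a zero-width range
    counts = [0] * intervals
    for v in x:
        i = min(int((v - lo) / width), intervals - 1)   # clamp the maximum into the last bin
        counts[i] += 1
    return counts
```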
Calculate statistics (sample mean and sample variance) for each class by feature.
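The per-class, per-feature statistics can be sketched as below (a Python illustration under assumed names; 'class_stats' is hypothetical, and sample variance uses the n-1 denominator, which is an assumption about the library's convention):

```python
def class_stats(x, y, k):
    """For each class c and feature j, compute the sample mean and sample variance.

    x: list of data rows (each a list of feature values); y: class labels 0..k-1.
    Returns (mean, var) with mean[c][j] and var[c][j] per class, per feature."""
    n = len(x[0])
    mean = [[0.0] * n for _ in range(k)]
    var  = [[0.0] * n for _ in range(k)]
    for c in range(k):
        rows = [x[i] for i in range(len(x)) if y[i] == c]   # rows belonging to class c
        m = len(rows)
        for j in range(n):
            col = [r[j] for r in rows]
            mu = sum(col) / m
            mean[c][j] = mu
            var[c][j] = sum((v - mu) ** 2 for v in col) / (m - 1)
    return mean, var
```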
Check the correlation of the feature vectors (fea). If the correlations are too high, the independence assumption may be dubious.
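One way to perform such a check, sketched in Python (the function names 'corr' and 'check_correlation' and the 0.9 threshold are illustrative assumptions, not the library's API):

```python
def corr(u, v):
    """Pearson correlation of two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

def check_correlation(features, threshold=0.9):
    """Flag feature-vector pairs whose |correlation| exceeds the threshold,
    casting doubt on the independence assumption."""
    flagged = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if abs(corr(features[i], features[j])) > threshold:
                flagged.append((i, j))
    return flagged
```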
Given a continuous data vector z, classify it returning the class number (0, ..., k-1) with the highest relative posterior probability.
the data vector to classify
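The classification rule, posterior-wise, can be sketched in Python (given precomputed priors, means, and variances; 'normal_pdf' and 'classify' are hypothetical names for illustration):

```python
import math

def normal_pdf(z, mu, sigma2):
    """Gaussian density with mean mu and variance sigma2."""
    return math.exp(-(z - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def classify(z, prior, mean, var):
    """Return the class with the highest relative posterior:
    prior[c] times the product of per-feature Gaussian densities."""
    best, best_post = -1, -1.0
    for c in range(len(prior)):
        post = prior[c]
        for j, zj in enumerate(z):                    # naive: multiply densities per feature
            post *= normal_pdf(zj, mean[c][j], var[c][j])
        if post > best_post:
            best, best_post = c, post
    return best
```

In practice, summing log densities instead of multiplying densities avoids numerical underflow when the number of features is large.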
Given a new discrete (integer-valued) data vector 'z', determine which class it belongs to, by first converting it to a vector of doubles.
the vector to classify
Show the flaw by printing the error message.
the method where the error occurred
the error message
the number of data vectors in training-set (# rows)
the training-set size as a Double
the number of features/variables (# columns)
the feature-set size as a Double
Test the quality of the training with a test-set and return the fraction of correct classifications.
the real-valued test vectors stored as rows of a matrix
the test classification vector, where yy_i = class for row i of xx
Train the classifier, i.e., calculate statistics and create conditional density (cd) functions. Assumes that conditional densities follow the Normal (Gaussian) distribution.
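The training step can be sketched as computing per-class priors, means, and variances, then binding each (mean, variance) pair into a Gaussian density closure; this Python illustration assumes the names 'train' and 'cd' and the n-1 variance denominator:

```python
import math

def train(x, y, k):
    """Compute priors and build conditional-density functions cd[c][j](z),
    one Gaussian density per (class, feature) pair."""
    m, n = len(x), len(x[0])
    prior, cd = [], []
    for c in range(k):
        rows = [x[i] for i in range(m) if y[i] == c]
        prior.append(len(rows) / m)                  # prior = class population fraction
        cds = []
        for j in range(n):
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            s2 = sum((v - mu) ** 2 for v in col) / (len(col) - 1)
            # bind mu and s2 into a Gaussian density closure for feature j, class c
            cds.append(lambda z, mu=mu, s2=s2:
                       math.exp(-(z - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2))
        cd.append(cds)
    return prior, cd
```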
The 'NaiveBayes' class implements a Gaussian Naive Bayes Classifier, which is the most commonly used such classifier for continuous input data. The classifier is trained using a data matrix 'x' and a classification vector 'y'. Each data vector in the matrix is classified into one of 'k' classes numbered 0, ..., k-1. Prior probabilities are calculated based on the population of each class in the training-set. Relative posterior probabilities are computed by multiplying these by values computed using conditional density functions based on the Normal (Gaussian) distribution. The classifier is naive, because it assumes feature independence and therefore simply multiplies the conditional densities.
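The prior computation described above, sketched in Python (the helper name 'priors' is hypothetical):

```python
def priors(y, k):
    """Prior probability of each class = its population fraction in the training-set."""
    counts = [0] * k
    for c in y:
        counts[c] += 1
    m = len(y)
    return [cnt / m for cnt in counts]
```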