the integer-valued data vectors stored as rows of a matrix
the class vector, where y_i = class for row i of the matrix x
the names for all features/variables
the number of classes
the names for all classes
the value count (number of distinct values) for each feature
the m-estimate weight (me == 0 => regular maximum-likelihood estimates)
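To illustrate what the 'me' setting changes, here is a minimal sketch of m-estimate smoothing for one conditional probability. It assumes the common form with a uniform prior of 1/vc over a feature's values; the exact formula used by the class is not stated here, and the function name `m_estimate` is hypothetical.

```python
def m_estimate(count, class_count, vc, me=0.0):
    """Estimate P(X_j = v | C = i).

    count       -- rows of class i in which feature j has value v
    class_count -- rows of class i
    vc          -- number of distinct values of feature j
    me          -- weight of the uniform prior (assumed form); 0 gives plain MLE
    """
    return (count + me / vc) / (class_count + me)


# A value never seen with this class gets probability 0 under plain MLE ...
print(m_estimate(0, 4, 2))          # 0.0
# ... but a small non-zero probability once me > 0.
print(m_estimate(0, 4, 2, me=1.0))  # 0.1
```

With me == 0, a single unseen feature value zeroes out the whole product for that class, which is the usual reason for enabling smoothing.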
Check the correlation of the feature vectors (fea). If the correlations are too high, the independence assumption may be dubious.
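A check of this kind could be sketched as follows, using the Pearson correlation between every pair of feature columns. The threshold of 0.9 and the names `pearson` and `highly_correlated` are assumptions for illustration; the measure and cutoff used by the class are not stated here.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    dev_a = sqrt(sum((u - mean_a) ** 2 for u in a))
    dev_b = sqrt(sum((v - mean_b) ** 2 for v in b))
    return cov / (dev_a * dev_b)


def highly_correlated(x, threshold=0.9):
    """Return (j, l, r) for every column pair j < l with |r| above the threshold."""
    cols = list(zip(*x))  # transpose rows into feature columns
    pairs = []
    for j in range(len(cols)):
        for l in range(j + 1, len(cols)):
            r = pearson(cols[j], cols[l])
            if abs(r) > threshold:
                pairs.append((j, l, r))
    return pairs


x = [[0, 0, 1],
     [1, 1, 0],
     [0, 0, 0],
     [1, 1, 1]]
# Columns 0 and 1 are identical, so they are flagged; column 2 is not.
print(highly_correlated(x))
```

A constant column has zero deviation and would cause a division by zero in this sketch, so such columns would need to be filtered out first.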
Given a discrete data vector 'z', classify it returning the class number (0, ..., k-1) with the highest relative posterior probability.
the data vector to classify
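The classification step can be sketched as below: for each class, multiply the prior by the conditional probability of each observed feature value and keep the class with the largest product. The score is "relative" because it is not divided by the probability of 'z' itself. The table layout (`p_c[i]`, `p_x_c[i][j][v]`) and the probabilities shown are made up for illustration and are not the class's internal representation.

```python
def classify(z, p_c, p_x_c):
    """Return (class, relative posterior) for the discrete vector z.

    p_c[i]          -- prior probability of class i
    p_x_c[i][j][v]  -- probability that feature j equals v, given class i
    """
    best_class, best_score = 0, -1.0
    for i, prior in enumerate(p_c):
        score = prior
        for j, v in enumerate(z):
            score *= p_x_c[i][j][v]  # naive step: features treated as independent
        if score > best_score:
            best_class, best_score = i, score
    return best_class, best_score


# Two classes, two binary features (illustrative numbers only).
p_c = [0.5, 0.5]
p_x_c = [[[0.8, 0.2], [0.6, 0.4]],   # class 0
         [[0.3, 0.7], [0.1, 0.9]]]   # class 1

print(classify([1, 1], p_c, p_x_c)[0])  # class 1
print(classify([0, 0], p_c, p_x_c)[0])  # class 0
```

With many features the product can underflow; summing logarithms of the probabilities is a common alternative that yields the same ranking.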
Given a new continuous data vector 'z', determine which class it belongs to, by first rounding it to an integer-valued vector.
the vector to classify
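A minimal sketch of the rounding step, assuming round-half-up (the rounding mode the class applies is not stated here). Both function names and the stand-in classifier are hypothetical.

```python
import math


def round_half_up(z):
    """Round each component to the nearest integer, with .5 going up."""
    return [math.floor(v + 0.5) for v in z]


def classify_continuous(z, classify_fn):
    """Round z to integers, then hand it to a discrete classifier."""
    return classify_fn(round_half_up(z))


print(round_half_up([0.4, 0.5, 1.6]))  # [0, 1, 2]

# Stand-in classifier for demonstration only: class 1 if the values sum above 1.
print(classify_continuous([0.4, 0.5, 1.6], lambda zi: int(sum(zi) > 1)))  # 1
```

Python's built-in `round` rounds halves to the nearest even integer, which is why the sketch uses `math.floor(v + 0.5)` instead.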
Show the flaw by printing the error message.
the method where the error occurred
the error message
Count the frequencies of 'y' having class 'i' and, within each class, of each feature in 'x' taking values 0, 1, ...
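The counting pass might look like the sketch below. The names `f_c` and `f_cx` and the nested-list layout are assumptions chosen for illustration.

```python
def count_frequencies(x, y, k, vc):
    """Count class frequencies and per-class feature-value frequencies.

    x  -- integer-valued rows
    y  -- class of each row (0 .. k-1)
    vc -- number of distinct values for each feature
    """
    f_c = [0] * k
    # f_cx[i][j][v] = rows of class i in which feature j equals v
    f_cx = [[[0] * vc[j] for j in range(len(vc))] for _ in range(k)]
    for row, c in zip(x, y):
        f_c[c] += 1
        for j, v in enumerate(row):
            f_cx[c][j][v] += 1
    return f_c, f_cx


x = [[0, 1], [1, 1], [0, 0], [1, 0]]
y = [0, 0, 1, 1]
f_c, f_cx = count_frequencies(x, y, k=2, vc=[2, 2])
print(f_c)      # [2, 2]
print(f_cx[0])  # [[1, 1], [0, 2]]
```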
the number of data vectors in training-set (# rows)
the training-set size as a Double
the number of features/variables (# columns)
the feature-set size as a Double
Test the quality of the training with a test-set and return the fraction of correct classifications.
the integer-valued test vectors stored as rows of a matrix
the test classification vector, where yy_i = class for row i of xx
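The fraction of correct classifications can be computed as sketched here. The function name `accuracy` and the stand-in classifier are hypothetical; any function mapping a row to a class number would do.

```python
def accuracy(xx, yy, classify_fn):
    """Fraction of test rows whose predicted class matches yy."""
    correct = sum(1 for row, c in zip(xx, yy) if classify_fn(row) == c)
    return correct / len(yy)


xx = [[0, 1], [1, 1], [0, 0], [1, 0]]
yy = [0, 1, 1, 1]

# Stand-in classifier for demonstration only: predict the first feature's value.
print(accuracy(xx, yy, lambda row: row[0]))  # 0.75
```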
Train the classifier by computing the probabilities for C, and the conditional probabilities for X_j.
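Once the frequencies are counted, training reduces to dividing counts. The sketch below uses plain relative frequencies with no smoothing; the input layout and the function name are assumptions for illustration.

```python
def probabilities_from_counts(f_c, f_cx):
    """Turn frequency counts into probability tables.

    f_c[i]         -- rows of class i (each class assumed non-empty)
    f_cx[i][j][v]  -- rows of class i in which feature j equals v
    Returns (p_c, p_x_c): priors P(C = i) and conditionals P(X_j = v | C = i).
    """
    m = sum(f_c)  # training-set size
    p_c = [count / m for count in f_c]
    p_x_c = [[[cnt / f_c[i] for cnt in feature_counts]
              for feature_counts in f_cx[i]]
             for i in range(len(f_c))]
    return p_c, p_x_c


f_c = [2, 2]
f_cx = [[[1, 1], [0, 2]],   # class 0
        [[1, 1], [2, 0]]]   # class 1
p_c, p_x_c = probabilities_from_counts(f_c, f_cx)
print(p_c)          # [0.5, 0.5]
print(p_x_c[0][1])  # [0.0, 1.0]
```

The 0.0 entry shows why unsmoothed estimates can be fragile: a class is ruled out entirely by a value it was never seen with in training.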
Return default values for binary input data (value count (vc) set to 2).
The NaiveBayes class implements an Integer-Based Naive Bayes Classifier, a commonly used classifier for discrete input data. The classifier is trained using a data matrix 'x' and a classification vector 'y'. Each data vector in the matrix is classified into one of 'k' classes numbered 0, ..., k-1. Prior probabilities are calculated from the population of each class in the training-set. Relative posterior probabilities are computed by multiplying these priors by the conditional probabilities of the observed feature values. The classifier is naive because it assumes the features are independent and therefore simply multiplies the conditional probabilities. This version uses parallel processing to speed up execution.
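The decision rule described above can be written compactly as follows, where z is the vector to classify and n (a symbol introduced here) is the number of features. The product is only a relative posterior because the common divisor P(X = z) is omitted; it does not affect which class attains the maximum.

```latex
\hat{c}(z) \;=\; \arg\max_{i \in \{0, \dots, k-1\}} \; P(C = i) \prod_{j=0}^{n-1} P(X_j = z_j \mid C = i)
```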