the vectors/points of classified data stored as rows of a matrix
the classification of each vector in x
the names for all features/variables
the number of classes
the names for all classes
the number of nearest neighbors to consider
Given a new point/vector 'z', determine which class it belongs to (i.e., the class getting the most votes from its 'knn' nearest neighbors).
the vector to classify
Given a new discrete (integer-valued) data vector 'z', determine which class it belongs to, by first converting it to a vector of doubles.
the vector to classify
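The conversion described above can be sketched as follows. This is a minimal illustration that uses plain Scala arrays as a stand-in for the library's vector types; the name `toDoubles` is hypothetical and not taken from the source.

```scala
object IntVectorSketch {
  // Convert an integer-valued vector to doubles so that it can be handed
  // to a classifier that expects real-valued input.
  // (Hypothetical helper; the library's own vector types are not shown here.)
  def toDoubles(z: Array[Int]): Array[Double] = z.map(_.toDouble)
}
```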
Compute a distance metric between vectors/points u and v.
the first vector/point
the second vector/point
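The description does not say which metric is used. As an illustration only, the sketch below assumes squared Euclidean distance over plain arrays; omitting the square root does not change which neighbors are nearest.

```scala
object DistanceSketch {
  // Squared Euclidean distance between u and v.
  // ASSUMPTION: the source does not name the metric; this is one common choice.
  def distance(u: Array[Double], v: Array[Double]): Double = {
    require(u.length == v.length, "vectors must have the same dimension")
    var sum = 0.0
    for (j <- u.indices) {
      val d = u(j) - v(j)
      sum += d * d                      // accumulate squared differences
    }
    sum
  }
}
```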
Show the flaw by printing the error message.
the method where the error occurred
the error message
Find the knn nearest neighbors (top-knn) to vector 'z'.
the vector to be classified
the number of data vectors in training-set (# rows)
the training-set size as a Double
the number of features/variables (# columns)
the feature-set size as a Double
Remove the most distant neighbor and add new neighbor 'i'. Maintain the 'topK' nearest neighbors in sorted order, from farthest to nearest.
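One way to keep 'topK' ordered from farthest to nearest is sketched below. It assumes 'topK' is an array of (row index, distance) pairs with the farthest neighbor in slot 0; that layout and the name `replaceTop` are assumptions made for illustration, not details confirmed by the source.

```scala
object TopKSketch {
  // topK holds (row index, distance) pairs ordered farthest (slot 0) to nearest.
  // Drop the farthest entry, then slide closer-ranked entries left until the
  // new neighbor (i, dist) sits at its sorted position.
  // ASSUMPTION: the caller invokes this only when dist < topK(0)._2.
  def replaceTop(topK: Array[(Int, Double)], i: Int, dist: Double): Unit = {
    require(topK.nonEmpty, "topK must hold at least one neighbor")
    var j = 0
    while (j < topK.length - 1 && topK(j + 1)._2 > dist) {
      topK(j) = topK(j + 1)             // shift a farther neighbor toward slot 0
      j += 1
    }
    topK(j) = (i, dist)                 // insert the new neighbor in order
  }
}
```

Because only one entry changes per call, a single linear pass is enough and the array never needs a full re-sort.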
Test the quality of the training with a test-set and return the fraction of correct classifications.
the real-valued test vectors stored as rows of a matrix
the test classification vector, where yy_i = class for row i of xx
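The accuracy computation can be sketched as below. The `classify` argument is a hypothetical stand-in for the classifier's own method, and plain arrays replace the library's matrix and vector types.

```scala
object TestSketch {
  // Fraction of test rows whose predicted class matches the given label.
  // xx: test vectors stored as rows; yy(i): the true class for row i of xx.
  // `classify` is a hypothetical function standing in for the classifier.
  def test(xx: Array[Array[Double]], yy: Array[Int],
           classify: Array[Double] => Int): Double = {
    require(xx.nonEmpty && xx.length == yy.length, "xx and yy must align and be non-empty")
    val correct = xx.indices.count(i => classify(xx(i)) == yy(i))
    correct / xx.length.toDouble
  }
}
```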
Training involves resetting the data structures before each classification. KNN uses lazy training, so most of it is done during classification.
The KNN_Classifier class is used to classify a new vector 'z' into one of 'c' classes. It works by finding the 'k' nearest neighbors of 'z'. These neighbors vote according to their classifications, and the class with the most votes is selected as the classification of 'z'. Using a distance metric, the 'k' vectors nearest to 'z' are found in the training data, which is stored row-wise in the data matrix 'x'. The corresponding classifications are given in the vector 'y', such that the classification for vector 'x(i)' is given by 'y(i)'.
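The voting procedure described above can be sketched end to end as follows. This is a minimal illustration, not the class's actual implementation: it assumes plain arrays in place of the library's matrix and vector types, squared Euclidean distance, class labels numbered from 0, and a full sort instead of an incrementally maintained top-k list. The names `KnnSketch` and `nClasses` are hypothetical.

```scala
object KnnSketch {
  // Squared Euclidean distance (ASSUMPTION: the source does not name the metric).
  def distance(u: Array[Double], v: Array[Double]): Double =
    u.zip(v).map { case (a, b) => (a - b) * (a - b) }.sum

  // Classify z by majority vote among its knn nearest rows of x.
  // x: training vectors stored as rows; y(i): class of row x(i), in 0 until nClasses.
  def classify(x: Array[Array[Double]], y: Array[Int], nClasses: Int,
               knn: Int, z: Array[Double]): Int = {
    require(x.length == y.length, "x and y must have the same number of rows")
    require(knn >= 1 && knn <= x.length, "knn must be between 1 and the training-set size")
    // Row indices of the knn training vectors nearest to z (full sort for brevity).
    val nearest = x.indices.sortBy(i => distance(x(i), z)).take(knn)
    // Each neighbor casts one vote for its own class.
    val votes = new Array[Int](nClasses)
    for (i <- nearest) votes(y(i)) += 1
    // Ties go to the lowest class index (an arbitrary choice for this sketch).
    votes.indexOf(votes.max)
  }
}
```

For example, with two well-separated clusters labeled 0 and 1, a query point near the second cluster collects most of its votes from class 1 and is classified accordingly.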