the vectors/points to be clustered stored as rows of a matrix
the number of clusters to make
the random number stream (to vary the clusters made)
true indicates that the primary technique should be used to initialize the clustering
Assign each vector/point to a random cluster. This is the primary technique for initializing the clustering.
Calculate the centroids based on current assignment of points to clusters.
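As a rough illustration of the centroid step, the following Python sketch averages the points currently assigned to each cluster. The name `calc_centroids` and the list-based data layout are hypothetical, and leaving an empty cluster at the origin is an assumption of this sketch, not documented behavior of the class.

```python
def calc_centroids(points, assignment, k):
    """Return k centroids, each the mean of the points assigned to that cluster."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, assignment):
        counts[c] += 1
        for j, x in enumerate(p):
            sums[c][j] += x
    # Assumption of this sketch: an empty cluster keeps an all-zero centroid.
    return [[s / counts[c] if counts[c] else 0.0 for s in sums[c]]
            for c in range(k)]


points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
centroids = calc_centroids(points, [0, 0, 1], 2)
```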
Given a new point/vector y, determine which cluster it belongs to (i.e., the cluster whose centroid it is closest to).
the vector to classify
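Classification of a new point can be sketched in Python as a nearest-centroid lookup. The function name `classify` is hypothetical, and squared Euclidean distance is assumed here because the documentation does not name the metric.

```python
def classify(y, centroids):
    """Return the index of the centroid closest to y."""
    best, best_d = 0, float("inf")
    for c, cent in enumerate(centroids):
        # Squared Euclidean distance (an assumption of this sketch).
        d = sum((a - b) ** 2 for a, b in zip(y, cent))
        if d < best_d:
            best, best_d = c, d
    return best


cluster = classify([9.0, 8.0], [[1.0, 0.0], [10.0, 10.0]])
```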
Iteratively recompute clusters until the assignment of points does not change, returning the final cluster assignment vector.
Compute a distance metric between vectors/points u and v.
the first vector/point
the second vector/point
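The documentation does not say which metric is computed. One common choice for k-means is the squared Euclidean distance, shown below as a Python sketch. The name `distance` mirrors the description above, but the body is an assumption.

```python
def distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


d = distance([0.0, 0.0], [3.0, 4.0])
```

Skipping the square root preserves the ordering of distances, which is all that a nearest-centroid comparison needs.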
Show the flaw by printing the error message.
the method where the error occurred
the error message
Get the name of the i-th cluster.
Set the names for the clusters.
Randomly pick vectors/points to serve as the initial k centroids (cent). This is the secondary technique for initializing the clustering.
Reassign each vector/point to the cluster with the closest centroid. Indicate done if no points changed clusters (this is the stopping rule).
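The reassignment step and its stopping signal can be sketched in Python as follows. The name `reassign`, the in-place update of the assignment list, and the squared Euclidean distance are all assumptions made for illustration.

```python
def reassign(points, centroids, assignment):
    """Move each point to its nearest centroid in place.

    Returns True (done) if no point changed clusters.
    """
    done = True
    for i, p in enumerate(points):
        nearest = min(
            range(len(centroids)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        if nearest != assignment[i]:
            assignment[i] = nearest
            done = False
    return done


points = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
centroids = [[0.5, 0.0], [10.0, 10.0]]
assignment = [1, 0, 1]
first = reassign(points, centroids, assignment)   # the first point moves
second = reassign(points, centroids, assignment)  # nothing left to move
```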
The KMeansClustering class clusters several vectors/points using k-means clustering. Either (1) randomly assign points to 'k' clusters or (2) randomly pick 'k' points as initial centroids (technique (1) tends to work better and is the primary technique). Iteratively, reassign each point to the cluster containing the closest centroid. Stop when there are no changes to the clusters.
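The overall procedure can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the class's own code: the names `k_means`, `sq_dist`, and `centroids_of` are hypothetical, squared Euclidean distance is assumed, and an empty cluster simply keeps its previous centroid.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroids_of(points, assignment, k, old):
    """Mean of each cluster; an empty cluster keeps its old centroid (assumption)."""
    cents = []
    for c in range(k):
        members = [p for p, a in zip(points, assignment) if a == c]
        if members:
            cents.append([sum(col) / len(members) for col in zip(*members)])
        else:
            cents.append(old[c])
    return cents


def k_means(points, k, seed=0, primary=True):
    """Return (assignment, centroids) once no point changes clusters."""
    rng = random.Random(seed)
    n = len(points)
    # Technique (2): pick k points as initial centroids (also the empty-cluster fallback).
    centroids = [list(p) for p in rng.sample(points, k)]
    if primary:
        # Technique (1): assign every point to a random cluster first.
        assignment = [rng.randrange(k) for _ in range(n)]
        centroids = centroids_of(points, assignment, k, centroids)
    else:
        assignment = [-1] * n  # nothing assigned yet
    while True:
        changed = False
        for i, p in enumerate(points):
            best = assignment[i]
            best_d = sq_dist(p, centroids[best]) if best >= 0 else float("inf")
            for c in range(k):
                d = sq_dist(p, centroids[c])
                if d < best_d:  # move only on a strict improvement
                    best, best_d = c, d
            if best != assignment[i]:
                assignment[i] = best
                changed = True
        if not changed:
            return assignment, centroids
        centroids = centroids_of(points, assignment, k, centroids)


points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
assignment, centroids = k_means(points, 2, seed=0, primary=True)
```

Passing `primary=False` switches the sketch to technique (2). In both cases the loop ends at a fixed point where every point already sits with its nearest centroid.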