The Clusterer trait provides a common framework for several clustering algorithms.
Attributes
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
- Known subtypes: class HierClusterer, class KMeansClusterer, class KMeansClusterer2, class KMeansClustererHW, class KMeansClustererPP, class KMeansPPClusterer, class MarkovClusterer
Members list
Value members
Abstract methods
Return the centroids (a centroid is the mean of points in a cluster). Should only be called after train.
Given a new point/vector z, determine which cluster it belongs to.
Value parameters
- z: the vector to classify
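One plausible way to classify a new point, for centroid-based clusterers, is to pick the nearest centroid. The sketch below assumes plain `Array[Array[Double]]` rows rather than the library's own matrix type, and `nearestCentroid` is a hypothetical helper name, not part of the documented API.

```scala
// Hypothetical helper (not the library's API): return the index of the
// centroid closest to z, using squared Euclidean distance.
def nearestCentroid(z: Array[Double], cent: Array[Array[Double]]): Int = {
  def sqDist(u: Array[Double], v: Array[Double]): Double =
    u.indices.map(j => (u(j) - v(j)) * (u(j) - v(j))).sum
  cent.indices.minBy(c => sqDist(z, cent(c)))
}
```

Squared distance is enough here because it preserves the ordering of the true distances.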
Return the cluster assignments. Should only be called after 'train'.
Return the sizes of the clusters (the number of points in each). Should only be called after train.
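Cluster sizes can be derived from the assignment array alone. A minimal sketch, assuming assignments are cluster indices in `0 until k`; the name `clusterSizes` and the parameter `k` are illustrative assumptions.

```scala
// Hypothetical helper: count how many points are assigned to each of k clusters.
def clusterSizes(toC: Array[Int], k: Int): Array[Int] = {
  val sz = Array.fill(k)(0)
  for (c <- toC) sz(c) += 1      // one increment per point
  sz
}
```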
Given a set of points/vectors, put them in clusters, returning the cluster assignments. A basic goal is to minimize the sum of squared errors (sse), i.e., the sum of the squared distances from each point to the centroid of its cluster.
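As an illustration of how a training step can reduce sse, here is a sketch of Lloyd's k-means iteration. It is only one possible strategy and is not claimed to match any of the known subtypes; the function name, the `init` centroids argument, and `maxIt` are all assumptions made for this example.

```scala
// Sketch of Lloyd's k-means (hypothetical names, plain arrays instead of a matrix type).
// Alternates between assigning points to the nearest centroid and recomputing centroids.
def kMeans(x: Array[Array[Double]], init: Array[Array[Double]], maxIt: Int = 100): Array[Int] = {
  def sqDist(u: Array[Double], v: Array[Double]): Double =
    u.indices.map(j => (u(j) - v(j)) * (u(j) - v(j))).sum

  val cent = init.map(_.clone)                     // working copy of the centroids
  var toC = Array.fill(x.length)(-1)               // cluster assignment per point
  var changed = true
  var it = 0
  while (changed && it < maxIt) {
    // assignment step: nearest centroid for each point
    val next = x.map(p => cent.indices.minBy(c => sqDist(p, cent(c))))
    changed = !next.sameElements(toC)
    toC = next
    // update step: each non-empty cluster's centroid becomes the mean of its points
    for (c <- cent.indices) {
      val members = x.indices.filter(i => toC(i) == c)
      if (members.nonEmpty)
        for (j <- cent(c).indices)
          cent(c)(j) = members.map(i => x(i)(j)).sum / members.size
    }
    it += 1
  }
  toC
}
```

The loop stops when no assignment changes or after `maxIt` passes, so the result is a local, not necessarily global, minimum of sse.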
Concrete methods
Calculate the centroids based on current assignment of points to clusters and update the 'cent' matrix that stores the centroids in its rows.
Value parameters
- cent: the matrix holding the centroids in its rows
- sz: the sizes of the clusters (number of points)
- to_c: the cluster assignment array
- x: the data matrix holding the points {x_i = x(i)} in its rows
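The centroid update described above can be sketched as follows. This version assumes plain `Array[Array[Double]]` data in place of the library's matrix type and skips empty clusters; the name `updateCentroids` is hypothetical, while the parameter names mirror the ones documented here.

```scala
// Hypothetical sketch: overwrite the rows of cent with the mean of the points
// currently assigned to each cluster.
def updateCentroids(x: Array[Array[Double]], to_c: Array[Int],
                    sz: Array[Int], cent: Array[Array[Double]]): Unit = {
  for (row <- cent; j <- row.indices) row(j) = 0.0                        // reset sums
  for (i <- x.indices; j <- x(i).indices) cent(to_c(i))(j) += x(i)(j)     // sum per cluster
  for (c <- cent.indices if sz(c) > 0; j <- cent(c).indices)
    cent(c)(j) /= sz(c)                                                   // sum -> mean
}
```

Because `cent` is updated in place, callers would pass the same matrix on every iteration.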
Check whether the sum of squared errors is optimal, given a known optimum.
Value parameters
- opt: the known (from human/oracle) optimum
- to_c: the cluster assignments
- x: the data matrix holding the points
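The text does not say how the comparison against the known optimum is made. One reasonable reading, shown here purely as an assumption, is to accept any sse that does not exceed `opt` by more than a small floating-point tolerance. Both `isOptimal` and `tol` are hypothetical.

```scala
// Hypothetical check: treat the clustering as optimal when its sse is no
// larger than the known optimum, allowing for rounding error.
def isOptimal(sse: Double, opt: Double, tol: Double = 1e-9): Boolean =
  sse <= opt + tol
```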
Compute the distances between vector/point 'u' and the points stored as rows in matrix 'cn'.
Value parameters
- cn: the matrix holding several centroids
- kc_: the number of centroids so far
- u: the given vector/point (u = x_i)
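A minimal sketch of this distance computation, assuming Euclidean distance and plain arrays. Only the first `kc` rows of `cn` are used, matching the "number of centroids so far" parameter (named `kc` here instead of `kc_` for readability); the function name is hypothetical, and the library may well use squared distances instead.

```scala
// Hypothetical sketch: Euclidean distance from u to each of the first kc rows of cn.
def distances(u: Array[Double], cn: Array[Array[Double]], kc: Int): Array[Double] =
  Array.tabulate(kc) { c =>
    math.sqrt(u.indices.map(j => (u(j) - cn(c)(j)) * (u(j) - cn(c)(j))).sum)
  }
```

Rows beyond `kc` are ignored, which is useful when centroids are being chosen one at a time.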
Return whether the centroids have been initialized.
Return the name of the 'c'-th cluster.
Value parameters
- c: the c-th cluster
Set the names for the clusters.
Value parameters
- nm: the array of names
Set the random stream to 's'. This method must be called in implementing classes before creating any random generators.
Value parameters
- s: the new value for the random number stream
Compute the sum of squared errors within all clusters, where the error is, e.g., the distance from a point to its centroid.
Value parameters
- to_c: the cluster assignments
- x: the data matrix holding the points
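Written out, and assuming the error is the Euclidean distance to the centroid, the total decomposes into per-cluster terms. The symbols $k$ (number of clusters) and $\mu_c$ (centroid of cluster $c$) are notation introduced here for illustration.

```latex
\mathrm{sse} \;=\; \sum_{c=0}^{k-1} \mathrm{sse}_c ,
\qquad
\mathrm{sse}_c \;=\; \sum_{i \,:\, to\_c(i) = c} \lVert x_i - \mu_c \rVert^2 ,
\qquad
\mu_c \;=\; \frac{1}{sz_c} \sum_{i \,:\, to\_c(i) = c} x_i
```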
Compute the sum of squared errors from the points in cluster 'c' to the cluster's centroid.
Value parameters
- c: the current cluster
- to_c: the cluster assignments
- x: the data matrix holding the points
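The per-cluster computation can be sketched as below. This version recomputes the centroid from the cluster's members instead of reading a stored centroid matrix, which is an assumption made to keep the example self-contained; `sseCluster` is a hypothetical name.

```scala
// Hypothetical sketch: sum of squared distances from the points in cluster c
// to that cluster's mean. Returns 0.0 for an empty cluster.
def sseCluster(x: Array[Array[Double]], c: Int, to_c: Array[Int]): Double = {
  val members = x.indices.filter(i => to_c(i) == c).map(i => x(i))
  if (members.isEmpty) 0.0
  else {
    val dim = members.head.length
    val mu = Array.tabulate(dim)(j => members.map(p => p(j)).sum / members.size)
    members.map(p => p.indices.map(j => (p(j) - mu(j)) * (p(j) - mu(j))).sum).sum
  }
}
```

Summing this value over all cluster indices gives the total sse for an assignment.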