
class KMeansClusterer extends Clusterer with Error

The KMeansClusterer class clusters several vectors/points using k-means clustering. Either (1) randomly assign points to 'k' clusters or (2) randomly pick 'k' points as initial centroids (technique (1) tends to work better and is the primary technique). Iteratively, reassign each point to the cluster containing the closest centroid. Stop when there are no changes to the clusters.
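
The procedure described above can be sketched in plain Scala. This is a minimal illustration under stated assumptions, not the KMeansClusterer implementation: the object `KMeansSketch`, its helpers `distSq` and `centroidsOf`, the use of `Array[Double]` in place of the library's vector/matrix types, and the rule that an empty cluster keeps its previous centroid are all hypothetical choices made for this example.

```scala
import scala.util.Random

// Hypothetical stand-alone sketch; names and data types are assumptions.
object KMeansSketch {

  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def distSq(u: Point, v: Point): Double =
    u.indices.map(j => (u(j) - v(j)) * (u(j) - v(j))).sum

  // Mean of each cluster's points; an empty cluster keeps its previous centroid.
  def centroidsOf(x: Array[Point], assignment: Array[Int], prev: Array[Point]): Array[Point] =
    prev.indices.map { c =>
      val members = x.indices.filter(i => assignment(i) == c).map(i => x(i))
      if (members.isEmpty) prev(c)
      else Array.tabulate(x(0).length)(j => members.map(p => p(j)).sum / members.size)
    }.toArray

  // Technique (1): random initial assignment, then reassign until nothing changes.
  def cluster(x: Array[Point], k: Int, seed: Long = 0L): Array[Int] = {
    val rng = new Random(seed)
    var assignment = Array.fill(x.length)(rng.nextInt(k))
    // Fallback centroids (random data points) are used only for empty clusters.
    var cent = centroidsOf(x, assignment, Array.fill(k)(x(rng.nextInt(x.length))))
    var changed = true
    while (changed) {
      // Move every point to the cluster with the closest centroid.
      val next = x.map(p => cent.indices.minBy(c => distSq(p, cent(c))))
      changed = !next.sameElements(assignment)
      assignment = next
      cent = centroidsOf(x, assignment, cent)
    }
    assignment
  }
}
```

Calling `KMeansSketch.cluster(points, 2)` returns one cluster index per input point. Which index a given group receives depends on the random initial assignment, so only the grouping itself should be relied on.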

Linear Supertypes
Error, Clusterer, AnyRef, Any

Instance Constructors

  1. new KMeansClusterer(x: MatrixD, k: Int, s: Int = 0, primary: Boolean = true, remote: Boolean = true, post: Boolean = true)

    x

    the vectors/points to be clustered stored as rows of a matrix

    k

    the number of clusters to make

    s

    the random number stream (to vary the clusters made)

    primary

    true indicates that the primary technique is used for initiating the clustering

    remote

    whether to take a maximally remote or a randomly selected point

    post

    whether to perform post processing by randomly swapping points to reduce error

Value Members

  1. def assign(): Unit

    Randomly assign each vector/point 'x(i)' to a random cluster. Primary technique for initiating the clustering.

  2. def calcCentroids(): Unit

    Calculate the centroids based on current assignment of points to clusters.

  3. def centroids(): MatrixD

    Return the centroids. Should only be called after cluster().

    Definition Classes
    KMeansClusterer → Clusterer
  4. def checkOpt(opt: Double): Boolean

    Check to see if the sum of squared errors is optimum.

    opt

    the known (from human/oracle) optimum

  5. def classify(y: VectorD): Int

    Given a new point/vector 'y', determine which cluster it belongs to, i.e., the cluster whose centroid it is closest to.

    y

    the vector to classify

    Definition Classes
    KMeansClusterer → Clusterer
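
The nearest-centroid rule that classify describes can be illustrated as follows. This is a hedged sketch, not the library's code: `ClassifySketch`, its `distSq` helper, the explicit `centroids` parameter, and the use of `Array[Double]` for vectors are assumptions made for the example, and ties are broken in favor of the lowest cluster index.

```scala
// Hypothetical sketch of nearest-centroid classification.
object ClassifySketch {

  // Squared Euclidean distance between two vectors of equal length.
  def distSq(u: Array[Double], v: Array[Double]): Double =
    u.zip(v).map { case (a, b) => (a - b) * (a - b) }.sum

  // Index of the centroid nearest to y (the first one wins on ties).
  def classify(y: Array[Double], centroids: Array[Array[Double]]): Int =
    centroids.indices.minBy(c => distSq(y, centroids(c)))
}
```

With centroids at (1, 1) and (11, 11), the point (2, 0) falls in cluster 0 and the point (9, 12) in cluster 1.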
  6. def cluster(): Array[Int]

    Iteratively recompute clusters until the assignment of points does not change, returning the final cluster assignment vector.

    Definition Classes
    KMeansClusterer → Clusterer
  7. def csize(): VectorI

    Return the sizes of the clusters. Should only be called after cluster().

    Definition Classes
    KMeansClusterer → Clusterer
  8. def distance(u: VectorD, v: VectorD): Double

    Compute a distance metric (e.g., distance squared) between vectors/points 'u' and 'v'. Override this method to use a different metric, e.g., 'norm' (the Euclidean distance, 2-norm) or 'norm1' (the Manhattan distance, 1-norm).

    u

    the first vector/point

    v

    the second vector/point

    Definition Classes
    Clusterer
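
The override mechanism mentioned for distance might look like the sketch below. It does not use the library's Clusterer or VectorD types: the class `Metric` is a hypothetical stand-in for a base class with an overridable metric, and `Euclidean` and `Manhattan` are illustrative subclasses written against plain `Array[Double]`.

```scala
// Hypothetical sketch of swapping the distance metric by overriding a method.
object DistanceSketch {

  type Vec = Array[Double]

  // Stand-in base class; the default metric is squared Euclidean distance.
  class Metric {
    def distance(u: Vec, v: Vec): Double =
      u.zip(v).map { case (a, b) => (a - b) * (a - b) }.sum
  }

  // Euclidean distance: the 2-norm of u - v.
  class Euclidean extends Metric {
    override def distance(u: Vec, v: Vec): Double = math.sqrt(super.distance(u, v))
  }

  // Manhattan distance: the 1-norm of u - v.
  class Manhattan extends Metric {
    override def distance(u: Vec, v: Vec): Double =
      u.zip(v).map { case (a, b) => math.abs(a - b) }.sum
  }
}
```

For u = (0, 0) and v = (3, 4), the three metrics give 25, 5 and 7 respectively. Squared distance and Euclidean distance always rank centroids identically, so they produce the same assignments; Manhattan distance can rank them differently.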
  9. final def flaw(method: String, message: String): Unit
    Definition Classes
    Error
  10. def getName(i: Int): String

    Get the name of the i-th cluster.

    Definition Classes
    Clusterer
  11. def name_(n: Array[String]): Unit

    Set the names for the clusters.

    n

    the array of names

    Definition Classes
    Clusterer
  12. def pickCentroids(): Unit

    Randomly pick vectors/points to serve as the initial 'k' centroids (cent). Secondary technique for initiating the clustering.
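
One way to realize this secondary technique is to shuffle the row indices and keep the first 'k'. The sketch below is an assumption-laden illustration, not the library's method: `PickSketch`, the `seed` parameter, and the choice to pick distinct rows (rather than, say, maximally remote ones) are all hypothetical.

```scala
import scala.util.Random

// Hypothetical sketch: choose k distinct data points as initial centroids.
object PickSketch {

  def pickCentroids(x: Array[Array[Double]], k: Int, seed: Long = 0L): Array[Array[Double]] = {
    require(k <= x.length, "cannot pick more centroids than there are points")
    // Shuffle the row indices, keep the first k, and copy the chosen rows.
    new Random(seed).shuffle(x.indices.toList).take(k).map(i => x(i).clone()).toArray
  }
}
```

The rows are copied so that later centroid updates do not modify the original data.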

  13. def reassign(): Boolean

    Reassign each vector/point to the cluster with the closest centroid. Indicate done if no points changed clusters (the stopping rule).

  14. def sse(c: Int): Double

    Compute the sum of squared errors (distance squared) from all points in cluster 'c' to the cluster's centroid.

    c

    the current cluster

  15. def sse(x: MatrixD): Double

    Compute the sum of squared errors within the clusters, where error is indicated by e.g., the distance from a point to its centroid.

    Definition Classes
    Clusterer
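
The two sse computations can be illustrated with a small stand-alone sketch. It is not the library's code: `SseSketch`, the explicit `assignment` and `centroids` parameters, the name `sseTotal`, and the use of squared Euclidean distance as the error are assumptions made for the example.

```scala
// Hypothetical sketch of per-cluster and total sum of squared errors.
object SseSketch {

  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def distSq(u: Point, v: Point): Double =
    u.zip(v).map { case (a, b) => (a - b) * (a - b) }.sum

  // SSE for cluster c: squared distances from its points to its centroid, summed.
  def sse(x: Array[Point], assignment: Array[Int], centroids: Array[Point], c: Int): Double =
    x.indices.filter(i => assignment(i) == c).map(i => distSq(x(i), centroids(c))).sum

  // Total SSE: the per-cluster values summed over all clusters.
  def sseTotal(x: Array[Point], assignment: Array[Int], centroids: Array[Point]): Double =
    centroids.indices.map(c => sse(x, assignment, centroids, c)).sum
}
```

For points (0, 0), (2, 0), (10, 0), (14, 0) assigned to clusters 0, 0, 1, 1 with centroids (1, 0) and (12, 0), the per-cluster values are 2 and 8, giving a total of 10.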
  16. var tc1: Double
  17. var tc2: Double