trait Clusterer extends AnyRef

The Clusterer trait provides a common framework for several clustering algorithms.

See also

package.scala for 'distance' function

Linear Supertypes
AnyRef, Any

Abstract Value Members

  1. abstract def centroids: MatriD

    Return the centroids (a centroid is the mean of points in a cluster). Should only be called after 'train'.

  2. abstract def classify(z: VectoD): Int

    Given a new point/vector z, determine which cluster it belongs to.

    z

    the vector to classify

  3. abstract def cluster: Array[Int]

    Return the cluster assignments. Should only be called after 'train'.

  4. abstract def csize: VectoI

    Return the sizes (number of points within) of the clusters. Should only be called after 'train'.

  5. abstract def train(): Clusterer

    Given a set of points/vectors, put them in clusters. Per the signature, the method returns a Clusterer; the cluster assignments are available from 'cluster' afterwards. A basic goal is to minimize the sum of squared errors (sse), i.e., the sum of squared distances from each point in a cluster to that cluster's centroid.
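The abstract members above imply a call order: 'train' first, then 'cluster', 'centroids', 'csize' and 'classify'. The following is a minimal sketch of that life cycle, not the library's implementation. It assumes plain Scala arrays in place of MatriD, VectoD and VectoI, and the names MiniClusterer, ToyKMeans and distSq are hypothetical. The toy algorithm is a basic Lloyd-style k-means seeded with the first k rows.

```scala
object ClustererSketch {
  // Stand-in type: the real trait uses ScalaTion's MatriD/VectoD/VectoI.
  type Point = Array[Double]

  // Hypothetical mirror of the abstract members, using plain arrays.
  trait MiniClusterer {
    def train(): MiniClusterer
    def cluster: Array[Int]
    def centroids: Array[Point]
    def csize: Array[Int]
    def classify(z: Point): Int
  }

  // Squared Euclidean distance between two points of equal dimension.
  def distSq(u: Point, v: Point): Double =
    u.indices.map(j => (u(j) - v(j)) * (u(j) - v(j))).sum

  // Toy Lloyd-style k-means; seeding with the first k rows is an assumption.
  class ToyKMeans(x: Array[Point], k: Int, maxIt: Int = 100) extends MiniClusterer {
    private val cent = Array.tabulate(k)(c => x(c).clone())
    private val toC  = Array.fill(x.length)(0)
    private val sz   = Array.fill(k)(0)

    def train(): MiniClusterer = {
      var changed = true
      var it = 0
      while (changed && it < maxIt) {
        changed = false
        // Assignment step: move each point to its nearest centroid.
        for (i <- x.indices) {
          val c = classify(x(i))
          if (c != toC(i)) { toC(i) = c; changed = true }
        }
        // Update step: recompute sizes and centroids as cluster means.
        for (c <- 0 until k) {
          val members = x.indices.filter(i => toC(i) == c)
          sz(c) = members.length
          if (members.nonEmpty)
            cent(c) = x(0).indices.map(j => members.map(i => x(i)(j)).sum / members.length).toArray
        }
        it += 1
      }
      this
    }

    def cluster: Array[Int]     = toC
    def centroids: Array[Point] = cent
    def csize: Array[Int]       = sz

    // Index of the centroid closest to z.
    def classify(z: Point): Int = cent.indices.minBy(c => distSq(z, cent(c)))
  }

  def main(args: Array[String]): Unit = {
    val x  = Array(Array(0.0, 0.0), Array(10.0, 10.0), Array(0.0, 1.0), Array(10.0, 11.0))
    val km = new ToyKMeans(x, 2)
    km.train()                                   // must come before the accessors
    println(km.cluster.mkString(", "))           // 0, 1, 0, 1
    println(km.csize.mkString(", "))             // 2, 2
    println(km.classify(Array(9.0, 9.0)))        // 1
  }
}
```

Calling the accessors before 'train' in this sketch would return the untrained initial state, which is why the documentation above says they should only be called after training.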

Concrete Value Members

  1. def calcCentroids(x: MatriD, to_c: Array[Int], sz: VectoI, cent: MatriD): Unit

    Calculate the centroids based on current assignment of points to clusters and update the 'cent' matrix that stores the centroids in its rows.

    x

    the data matrix holding the points {x_i = x(i)} in its rows

    to_c

    the cluster assignment array

    sz

    the sizes of the clusters (number of points)

    cent

    the matrix holding the centroids in its rows
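
    The in-place update described above can be sketched as follows. This is an illustration under assumptions, not the library's code: plain arrays stand in for MatriD and VectoI, 'sz' is treated as already counted, and leaving empty clusters at zero is a choice made for this sketch only.

    ```scala
    object CalcCentroidsSketch {
      // Plain arrays stand in for MatriD (rows = points) and VectoI.
      def calcCentroids(x: Array[Array[Double]], toC: Array[Int],
                        sz: Array[Int], cent: Array[Array[Double]]): Unit = {
        // Reset every centroid row to zero before accumulating.
        for (c <- cent.indices; j <- cent(c).indices) cent(c)(j) = 0.0
        // Add each point into the row of the cluster it is assigned to.
        for (i <- x.indices; j <- x(i).indices) cent(toC(i))(j) += x(i)(j)
        // Divide by the cluster size to get the mean; empty clusters stay at zero.
        for (c <- cent.indices if sz(c) > 0; j <- cent(c).indices) cent(c)(j) /= sz(c)
      }

      def main(args: Array[String]): Unit = {
        val x    = Array(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0))
        val cent = Array.ofDim[Double](2, 2)
        calcCentroids(x, Array(0, 0, 1), Array(2, 1), cent)
        cent.foreach(row => println(row.mkString(", ")))   // 2.0, 3.0 then 5.0, 6.0
      }
    }
    ```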

  2. def checkOpt(x: MatriD, to_c: Array[Int], opt: Double): Boolean

    Check to see if the sum of squared errors is optimum.

    x

    the data matrix holding the points

    to_c

    the cluster assignments

    opt

    the known (from human/oracle) optimum

  3. def distance(u: VectoD, cn: MatriD, kc_: Int = -1): VectoD

    Compute the distances between vector/point 'u' and the points stored as rows in matrix 'cn'.

    u

    the given vector/point (u = x_i)

    cn

    the matrix holding several centroids

    kc_

    the number of centroids so far
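
    A rough sketch of such a distance computation is given below. Two points are assumptions rather than documented behavior: the metric (squared Euclidean here, whereas the real one comes from the 'distance' function in package.scala) and the reading of the default 'kc_ = -1' as "use all rows of cn". Plain arrays stand in for VectoD and MatriD.

    ```scala
    object DistanceSketch {
      // Squared Euclidean distance; the metric actually used by the library is an assumption.
      def distSq(u: Array[Double], v: Array[Double]): Double =
        u.indices.map(j => (u(j) - v(j)) * (u(j) - v(j))).sum

      // Distances from u to the first kc rows of cn.
      // A negative kc_ is taken to mean "all rows" (assumed semantics of the default).
      // Note the space in 'kc_ :', needed so the colon is not parsed as part of the name.
      def distance(u: Array[Double], cn: Array[Array[Double]], kc_ : Int = -1): Array[Double] = {
        val kc = if (kc_ < 0) cn.length else kc_
        Array.tabulate(kc)(c => distSq(u, cn(c)))
      }

      def main(args: Array[String]): Unit = {
        val u  = Array(0.0, 0.0)
        val cn = Array(Array(3.0, 4.0), Array(1.0, 0.0), Array(0.0, 2.0))
        println(distance(u, cn).mkString(", "))      // 25.0, 1.0, 4.0
        println(distance(u, cn, 2).mkString(", "))   // 25.0, 1.0
      }
    }
    ```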

  4. def initCentroids(): Boolean
  5. def name(c: Int): String

    Return the name of the 'c'-th cluster.

    c

    the c-th cluster

  6. def name_(nm: Strings): Unit

    Set the names for the clusters.

    nm

    the array of names

  7. def setStream(s: Int): Unit

    Set the random stream to 's'. This method must be called in implementing classes before creating any random generators.

    s

    the new value for the random number stream

  8. def sse(x: MatriD, c: Int, to_c: Array[Int]): Double

    Compute the sum of squared errors from the points in cluster 'c' to the cluster's centroid.

    x

    the data matrix holding the points

    c

    the current cluster

    to_c

    the cluster assignments

  9. def sse(x: MatriD, to_c: Array[Int]): Double

    Compute the sum of squared errors within all clusters, where the error is, e.g., the distance from a point to its cluster's centroid.

    x

    the data matrix holding the points

    to_c

    the cluster assignments

  10. def sst(x: MatriD): Double

    Compute the sum of squares total for all the points from the mean.

    x

    the data matrix holding the points
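
The two 'sse' overloads and 'sst' can be illustrated with the sketch below. It is not the library's code: plain arrays stand in for MatriD, squared Euclidean distance is assumed as the error measure, and each centroid is recomputed as the cluster mean (the real methods presumably use the stored centroids). The helper names distSq and mean are hypothetical.

```scala
object SseSketch {
  type Point = Array[Double]

  // Squared Euclidean distance (assumed error measure).
  def distSq(u: Point, v: Point): Double =
    u.indices.map(j => (u(j) - v(j)) * (u(j) - v(j))).sum

  // Column-wise mean of a non-empty collection of points.
  def mean(rows: Seq[Point]): Point =
    rows.head.indices.map(j => rows.map(r => r(j)).sum / rows.length).toArray

  // Sum of squared errors from the points in cluster c to that cluster's mean.
  def sse(x: Array[Point], c: Int, toC: Array[Int]): Double = {
    val members = x.indices.filter(i => toC(i) == c).map(i => x(i))
    if (members.isEmpty) 0.0
    else { val m = mean(members); members.map(p => distSq(p, m)).sum }
  }

  // Sum of squared errors over all clusters that appear in the assignment array.
  def sse(x: Array[Point], toC: Array[Int]): Double =
    toC.distinct.map(c => sse(x, c, toC)).sum

  // Sum of squares total: squared distances of all points from the overall mean.
  def sst(x: Array[Point]): Double = {
    val m = mean(x.toSeq)
    x.map(p => distSq(p, m)).sum
  }

  def main(args: Array[String]): Unit = {
    val x   = Array(Array(0.0, 0.0), Array(0.0, 2.0), Array(10.0, 0.0), Array(10.0, 4.0))
    val toC = Array(0, 0, 1, 1)
    println(sse(x, 0, toC))   // 2.0
    println(sse(x, toC))      // 10.0
    println(sst(x))           // 111.0
  }
}
```

In this example the within-cluster sse (10.0) is far below sst (111.0), which is what a good clustering should show: most of the total variation lies between the clusters rather than inside them.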