scalation.analytics.clusterer

KMeansClustererPP

class KMeansClustererPP extends KMeansClustererHW

The KMeansClustererPP class clusters several vectors/points using the Hartigan-Wong algorithm, with the initial centroids chosen by the k-means++ technique.

Linear Supertypes
KMeansClustererHW, KMeansClusterer, Error, Clusterer, AnyRef, Any

Instance Constructors

  1. new KMeansClustererPP(x: MatriD, k: Int, flags: Array[Boolean] = Array (false, false))

    x

    the vectors/points to be clustered stored as rows of a matrix

    k

    the number of clusters to make

    flags

    the flags used to adjust the algorithm

Value Members

  1. def calcCentroids(x: MatriD, to_c: Array[Int], sz: VectoI, cent: MatriD): Unit

    Calculate the centroids based on current assignment of points to clusters and update the 'cent' matrix that stores the centroids in its rows.

    x

    the data matrix holding the points {x_i = x(i)} in its rows

    to_c

    the cluster assignment array

    sz

    the sizes of the clusters (number of points)

    cent

    the matrix holding the centroids in its rows

    Definition Classes
    Clusterer
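    The centroid update described above can be sketched in plain Scala. This is an illustration only, not the library's code: it uses `Array[Array[Double]]` in place of `MatriD`, returns the centroids and sizes instead of updating `cent` in place, and the names `CentroidSketch` and `toC` are hypothetical.

    ```scala
    object CentroidSketch {
      /** Return (centroids, sizes) for points `x` under assignment `toC` with `k` clusters. */
      def calcCentroids(x: Array[Array[Double]], toC: Array[Int], k: Int): (Array[Array[Double]], Array[Int]) = {
        val dim  = x(0).length
        val cent = Array.fill(k, dim)(0.0)   // coordinate sums first, means after the division below
        val sz   = Array.fill(k)(0)          // number of points in each cluster
        for (i <- x.indices) {
          val c = toC(i)
          sz(c) += 1
          for (j <- 0 until dim) cent(c)(j) += x(i)(j)
        }
        // divide each sum by the cluster size; empty clusters are left at the origin
        for (c <- 0 until k if sz(c) > 0; j <- 0 until dim) cent(c)(j) /= sz(c)
        (cent, sz)
      }
    }
    ```

    An empty cluster keeps an all-zero centroid in this sketch; how the library treats that case is not stated here.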
  2. def centroids: MatriD

    Return the centroids. Should only be called after train.

    Definition Classes
    KMeansClusterer → Clusterer
  3. def checkOpt(x: MatriD, to_c: Array[Int], opt: Double): Boolean

    Check to see if the sum of squared errors is optimum.

    x

    the data matrix holding the points

    to_c

    the cluster assignments

    opt

    the known (from human/oracle) optimum

    Definition Classes
    Clusterer
  4. def classify(z: VectoD): Int

    Given a new point/vector 'z', determine which cluster it belongs to, i.e., the cluster whose centroid it is closest to.

    z

    the vector to classify

    Definition Classes
    KMeansClusterer → Clusterer
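    Nearest-centroid classification, as described for 'classify', can be sketched as follows. The sketch assumes squared Euclidean distance and passes the centroids explicitly; `ClassifySketch` and `distSq` are hypothetical names, and plain arrays stand in for `VectoD` and `MatriD`.

    ```scala
    object ClassifySketch {
      /** Squared Euclidean distance between two equal-length vectors. */
      def distSq(u: Array[Double], v: Array[Double]): Double =
        u.indices.map(j => (u(j) - v(j)) * (u(j) - v(j))).sum

      /** Index of the centroid (row of `cent`) closest to `z`; ties go to the lowest index. */
      def classify(z: Array[Double], cent: Array[Array[Double]]): Int =
        cent.indices.minBy(c => distSq(z, cent(c)))
    }
    ```

    Comparing squared distances gives the same winner as comparing Euclidean distances, so the square root is omitted.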
  5. def cluster: Array[Int]

    Return the cluster assignment vector. Should only be called after train.

    Definition Classes
    KMeansClusterer → Clusterer
  6. def csize: VectoI

    Return the sizes of the clusters (number of points in each). Should only be called after train.

    Definition Classes
    KMeansClusterer → Clusterer
  7. def distance(u: VectoD, cn: MatriD, kc_: Int = -1): VectoD

    Compute the distances between vector/point 'u' and the points stored as rows in matrix 'cn'.

    u

    the given vector/point (u = x_i)

    cn

    the matrix holding several centroids

    kc_

    the number of centroids so far

    Definition Classes
    Clusterer
  8. def distance2(u: VectoD, cent: MatriD, cc: Int): VectoD

    Compute the adjusted distance to point 'u' according to the R2 value described in the Hartigan-Wong algorithm.

    u

    the point in question

    cent

    the matrix holding the centroids

    cc

    the current cluster for point u

    Definition Classes
    KMeansClustererHW
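    In the usual statement of Hartigan-Wong, moving a point into another cluster c of size n_c costs n_c d²/(n_c + 1), while removing it from its own cluster cc saves n_cc d²/(n_cc - 1), where d² is the squared distance to the respective centroid. The sketch below computes these values for every cluster. It assumes 'distance2' follows that formulation, which this page does not confirm; the explicit `sz` argument, the handling of singleton clusters, and the names `AdjustedDistanceSketch` and `distSq` are choices made for the illustration.

    ```scala
    object AdjustedDistanceSketch {
      /** Squared Euclidean distance between two equal-length vectors. */
      def distSq(u: Array[Double], v: Array[Double]): Double =
        u.indices.map(j => (u(j) - v(j)) * (u(j) - v(j))).sum

      /** Size-adjusted squared distances from `u` to each centroid; `cc` is u's current cluster. */
      def distance2(u: Array[Double], cent: Array[Array[Double]], sz: Array[Int], cc: Int): Array[Double] =
        cent.indices.map { c =>
          val d = distSq(u, cent(c))
          if (c != cc) sz(c) * d / (sz(c) + 1)          // cost of adding u to cluster c
          else if (sz(c) > 1) sz(c) * d / (sz(c) - 1)   // saving from removing u from its own cluster
          else 0.0                                      // sketch choice: a singleton cluster keeps its point
        }.toArray
    }
    ```

    A point is a candidate for reassignment when the smallest value for another cluster is below the value for its own cluster.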
  9. val flags: Array[Boolean]
    Definition Classes
    KMeansClusterer
  10. final def flaw(method: String, message: String): Unit
    Definition Classes
    Error
  11. def initCentroids(): Boolean

    Initialize the centroids according to the k-means++ technique.

    Definition Classes
    KMeansClustererPP → Clusterer
  12. def name(c: Int): String

    Return the name of the 'c'-th cluster.

    c

    the c-th cluster

    Definition Classes
    Clusterer
  13. def name_(nm: Strings): Unit

    Set the names for the clusters.

    nm

    the array of names

    Definition Classes
    Clusterer
  14. def setStream(s: Int): Unit

    Set the random stream to 's'. This method must be called in implementing classes before creating any random generators.

    s

    the new value for the random number stream

    Definition Classes
    Clusterer
  15. def show(l: Int): Unit

    Show the state of the algorithm at iteration 'l'.

    l

    the current iteration

    Definition Classes
    KMeansClusterer
  16. def sse(x: MatriD, c: Int, to_c: Array[Int]): Double

    Compute the sum of squared errors from the points in cluster 'c' to the cluster's centroid.

    x

    the data matrix holding the points

    c

    the current cluster

    to_c

    the cluster assignments

    Definition Classes
    Clusterer
  17. def sse(x: MatriD, to_c: Array[Int]): Double

    Compute the sum of squared errors within all clusters, where the error is, e.g., the distance from a point to its centroid.

    x

    the data matrix holding the points

    to_c

    the cluster assignments

    Definition Classes
    Clusterer
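    Both 'sse' variants can be illustrated with a short plain-Scala sketch. It assumes the error is the Euclidean distance from a point to its centroid and takes the centroids as an explicit argument, whereas the documented methods take only the data and the assignments; `SseSketch` and `distSq` are hypothetical names.

    ```scala
    object SseSketch {
      /** Squared Euclidean distance between two equal-length vectors. */
      def distSq(u: Array[Double], v: Array[Double]): Double =
        u.indices.map(j => (u(j) - v(j)) * (u(j) - v(j))).sum

      /** Sum of squared errors for the points assigned to cluster `c`. */
      def sse(x: Array[Array[Double]], cent: Array[Array[Double]], c: Int, toC: Array[Int]): Double =
        x.indices.filter(i => toC(i) == c).map(i => distSq(x(i), cent(c))).sum

      /** Sum of squared errors over all clusters. */
      def sse(x: Array[Array[Double]], cent: Array[Array[Double]], toC: Array[Int]): Double =
        x.indices.map(i => distSq(x(i), cent(toC(i)))).sum
    }
    ```

    The total is the sum of the per-cluster values, so either form can be derived from the other.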
  18. def sst(x: MatriD): Double

    Compute the sum of squares total for all the points from the mean.

    x

    the data matrix holding the points

    Definition Classes
    Clusterer
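    The sum of squares total can be sketched as the sum of squared Euclidean distances from each point to the overall mean of the data. This is a plain-Scala illustration with a hypothetical name `SstSketch`, not the library's code.

    ```scala
    object SstSketch {
      /** Sum of squared distances from every row of `x` to the mean of all rows. */
      def sst(x: Array[Array[Double]]): Double = {
        val n    = x.length
        val dim  = x(0).length
        val mean = Array.tabulate(dim)(j => x.map(row => row(j)).sum / n)   // column-wise mean
        x.map(row => row.indices.map(j => (row(j) - mean(j)) * (row(j) - mean(j))).sum).sum
      }
    }
    ```

    With a single cluster whose centroid is the mean, the sum of squared errors equals this value, which makes it a natural baseline for judging a clustering.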
  19. def train(): KMeansClusterer

    Iteratively recompute clusters until the assignment of points does not change. Initialize by randomly assigning points to 'k' clusters.

    Definition Classes
    KMeansClusterer → Clusterer
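    The loop described for 'train' (recompute centroids, reassign points, stop when nothing changes) can be sketched as below. This is a generic illustration under stated assumptions, not the library's implementation: the starting assignment is passed in as `init` so the run is repeatable, whereas the documented method assigns points at random; `maxIter`, `TrainSketch`, and `distSq` are hypothetical.

    ```scala
    object TrainSketch {
      /** Squared Euclidean distance between two equal-length vectors. */
      def distSq(u: Array[Double], v: Array[Double]): Double =
        u.indices.map(j => (u(j) - v(j)) * (u(j) - v(j))).sum

      /** Iterate until the assignment is stable (or `maxIter` is reached); return the assignment. */
      def train(x: Array[Array[Double]], k: Int, init: Array[Int], maxIter: Int = 100): Array[Int] = {
        val dim     = x(0).length
        val cent    = Array.fill(k, dim)(0.0)
        var toC     = init.clone()
        var changed = true
        var iter    = 0
        while (changed && iter < maxIter) {
          // recompute each centroid as the mean of its members; an empty cluster keeps its old centroid
          for (c <- 0 until k) {
            val members = x.indices.filter(i => toC(i) == c)
            if (members.nonEmpty)
              cent(c) = Array.tabulate(dim)(j => members.map(i => x(i)(j)).sum / members.size)
          }
          // reassign every point to its nearest centroid
          val next = x.map(p => (0 until k).minBy(c => distSq(p, cent(c))))
          changed = !next.sameElements(toC)
          toC = next
          iter += 1
        }
        toC
      }
    }
    ```

    The result depends on the starting assignment: a poor start can stall in a local optimum, which is the motivation for k-means++ seeding.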
  20. def update_pmf(c: Int): Discrete

    Update the probability mass function (pmf) used for picking the next centroid. The farther 'x_i' is from any existing centroid, the higher its probability. Return the corresponding distance-derived random variate generator.

    c

    the current centroid index
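    The pmf used for picking the next centroid can be sketched as follows. The sketch assumes the standard k-means++ weighting, in which each point's probability is proportional to its squared distance to the nearest centroid chosen so far; the page only states that farther points get higher probability. It returns a plain array and draws with `scala.util.Random` rather than a `Discrete` variate generator, and the names `PmfSketch`, `pmf`, and `draw` are hypothetical.

    ```scala
    import scala.util.Random

    object PmfSketch {
      /** Squared Euclidean distance between two equal-length vectors. */
      def distSq(u: Array[Double], v: Array[Double]): Double =
        u.indices.map(j => (u(j) - v(j)) * (u(j) - v(j))).sum

      /** p(i) proportional to the squared distance from x(i) to its nearest chosen centroid. */
      def pmf(x: Array[Array[Double]], chosen: Array[Array[Double]]): Array[Double] = {
        val d2    = x.map(p => chosen.map(c => distSq(p, c)).min)
        val total = d2.sum
        if (total == 0.0) Array.fill(x.length)(1.0 / x.length)   // every point sits on a centroid: uniform
        else d2.map(d => d / total)
      }

      /** Draw an index from pmf `p` by inverting its cumulative distribution. */
      def draw(p: Array[Double], rng: Random): Int = {
        val r   = rng.nextDouble()
        val cdf = p.scanLeft(0.0)((a, b) => a + b).tail
        val i   = cdf.indexWhere(v => r < v)
        if (i >= 0) i else p.length - 1   // guard against round-off in the last cdf entry
      }
    }
    ```

    A point that already coincides with a chosen centroid gets probability zero, so it cannot be selected again as the next centroid.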