GapStatistic

scalation.modeling.clustering.GapStatistic
object GapStatistic

Attributes

See also

web.stanford.edu/~hastie/Papers/gap.pdf

Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def cumDistance(x: MatrixD, cl: Clusterer, clustr: Array[Int], k: Int): VectorD

Compute, for each cluster, the sum of pairwise distances between its points, counting each pair once (in one direction).

Value parameters

cl

the Clusterer used to compute the distance metric

clustr

the cluster assignments

k

the number of clusters

x

the vectors/points to be clustered stored as rows of a matrix
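
The per-cluster pairwise sum described above can be illustrated with a minimal, library-free sketch. It is not the ScalaTion implementation: plain arrays stand in for MatrixD and VectorD, and the hypothetical helper distSq (squared Euclidean distance) stands in for the Clusterer's distance metric, which is an assumption.

```scala
object CumDistanceSketch {
    // Squared Euclidean distance; an assumed metric standing in for the Clusterer's own.
    def distSq(u: Array[Double], v: Array[Double]): Double =
        u.indices.map(d => (u(d) - v(d)) * (u(d) - v(d))).sum

    // Per-cluster sum of distances over unordered pairs (i < j) that share a cluster,
    // so each pair is counted once ("in one direction").
    def cumDistance(x: Array[Array[Double]], clustr: Array[Int], k: Int): Array[Double] = {
        val sums = Array.fill(k)(0.0)
        for (i <- x.indices; j <- i + 1 until x.length if clustr(i) == clustr(j))
            sums(clustr(i)) += distSq(x(i), x(j))
        sums
    }
}
```

For the four points (0, 0), (3, 4), (10, 10), (10, 12) with assignments 0, 0, 1, 1, the sketch yields 25 for cluster 0 and 4 for cluster 1.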

def kMeansPP(x: MatrixD, kMax: Int, algo: Algorithm, b: Int, useSVD: Boolean, plot: Boolean): (KMeansPPClusterer, Array[Int], Int)

Return a KMeansPPClusterer clustering on the given points with an optimal number of clusters k chosen using the Gap statistic.

Value parameters

algo

the reassignment algorithm used by KMeansPPClusterer

b

the number of reference distributions to create (default = 1)

kMax

the upper bound on the number of clusters

plot

whether or not to plot the logs of the within-SSEs (default = false)

useSVD

use SVD to account for the shape of the points (default = true)

x

the vectors/points to be clustered stored as rows of a matrix
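
As a rough illustration of how a Gap statistic can pick k, the sketch below applies the selection rule from the paper linked under See also: Gap(k) is the mean of log(W*_kb) over the b reference sets minus log(W_k), and the chosen k is the smallest one with Gap(k) >= Gap(k+1) - s(k+1). Whether kMeansPP applies exactly this rule is an assumption. The names GapSelectSketch, gaps and optimalK are hypothetical, and the log within-SSE values are taken as precomputed inputs rather than produced by KMeansPPClusterer.

```scala
object GapSelectSketch {
    // logW(i):       log within-SSE of the data clustered into i + 1 clusters
    // refLogW(i)(j): the same quantity for the j-th reference data set
    // Returns (gap, s), where s is the standard-error term from the paper.
    def gaps(logW: Array[Double], refLogW: Array[Array[Double]]): (Array[Double], Array[Double]) = {
        val gap = logW.indices.map { i =>
            refLogW(i).sum / refLogW(i).length - logW(i)
        }.toArray
        val s = logW.indices.map { i =>
            val b    = refLogW(i).length
            val mean = refLogW(i).sum / b
            val sd   = math.sqrt(refLogW(i).map(v => (v - mean) * (v - mean)).sum / b)
            sd * math.sqrt(1.0 + 1.0 / b)
        }.toArray
        (gap, s)
    }

    // Smallest k with Gap(k) >= Gap(k+1) - s(k+1); falls back to the largest k tried
    // when no k qualifies (a choice made for this sketch).
    def optimalK(gap: Array[Double], s: Array[Double]): Int =
        (0 until gap.length - 1)
            .find(i => gap(i) >= gap(i + 1) - s(i + 1))
            .map(_ + 1)
            .getOrElse(gap.length)
}
```

With a single reference set per k (b = 1), the standard deviation is zero, so the rule reduces to picking the first k at which the gap stops increasing.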

def reference(x: MatrixD, useSVD: Boolean, stream: Int): MatrixD

Compute a reference distribution based on a set of points.

Value parameters

stream

the random number stream (to vary the clusters made)

useSVD

use SVD to account for the shape of the points (default = true)

x

the vectors/points to be clustered stored as rows of a matrix
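
A minimal sketch of the simpler, non-SVD case: each reference point is drawn uniformly from the axis-aligned bounding box of the data, which is the first of the two reference schemes in the paper linked under See also. This is an illustration, not the ScalaTion code: it uses plain arrays and scala.util.Random seeded by a hypothetical seed parameter in place of ScalaTion's random number streams, and it omits the SVD-based alignment entirely.

```scala
import scala.util.Random

object ReferenceSketch {
    // Draw as many points as x has rows, each coordinate uniform over the observed
    // [min, max] range of the corresponding column of x.
    def reference(x: Array[Array[Double]], seed: Int): Array[Array[Double]] = {
        val rng  = new Random(seed)
        val dims = x.head.indices
        val lo   = dims.map(d => x.map(_(d)).min)
        val hi   = dims.map(d => x.map(_(d)).max)
        Array.fill(x.length) {
            dims.map(d => lo(d) + rng.nextDouble() * (hi(d) - lo(d))).toArray
        }
    }
}
```

Changing the seed changes the reference points, which in turn varies the clusters found on the reference data.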

def withinSSE(x: MatrixD, cl: Clusterer, clustr: Array[Int], k: Int): Double

Compute the within sum of squared errors in terms of distances between points within a cluster (in one direction).

Value parameters

cl

the Clusterer used to compute the distance metric

clustr

the cluster assignments

k

the number of clusters

x

the vectors/points to be clustered stored as rows of a matrix
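
One way to read this definition, sketched below under stated assumptions: take each cluster's one-direction sum of squared Euclidean distances and divide it by the cluster size, then add the results. This matches W_k = sum over r of D_r / (2 n_r) from the paper linked under See also, where D_r counts every pair in both directions. The division by cluster size, the squared Euclidean metric, and the name WithinSSESketch are assumptions, not confirmed details of the ScalaTion method.

```scala
object WithinSSESketch {
    // Squared Euclidean distance; an assumed metric standing in for the Clusterer's own.
    def distSq(u: Array[Double], v: Array[Double]): Double =
        u.indices.map(d => (u(d) - v(d)) * (u(d) - v(d))).sum

    // Sum over clusters of (one-direction pairwise distance sum) / (cluster size).
    def withinSSE(x: Array[Array[Double]], clustr: Array[Int], k: Int): Double = {
        val sums  = Array.fill(k)(0.0)
        val sizes = Array.fill(k)(0)
        for (c <- clustr) sizes(c) += 1
        for (i <- x.indices; j <- i + 1 until x.length if clustr(i) == clustr(j))
            sums(clustr(i)) += distSq(x(i), x(j))
        // Skip empty clusters to avoid dividing by zero.
        (0 until k).filter(c => sizes(c) > 0).map(c => sums(c) / sizes(c)).sum
    }
}
```

For squared Euclidean distance this quantity equals the sum of squared distances from each point to its cluster centroid, which is why it can serve as a within-cluster SSE without computing centroids.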
