object GapStatistic
The GapStatistic object helps determine the optimal number of clusters
for a clusterer by comparing its clustering results to those obtained
on a reference distribution.
-----------------------------------------------------------------------------
- See also
web.stanford.edu/~hastie/Papers/gap.pdf
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )
-
def
cumDistance(x: MatrixD, cl: Clusterer, clustr: Array[Int], k: Int): VectorD
Compute a sum of pairwise distances between points in each cluster (in one direction).
- x
the vectors/points to be clustered stored as rows of a matrix
- cl
the Clusterer used to compute the distance metric
- clustr
the cluster assignments
- k
the number of clusters
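The following is a minimal, self-contained sketch of the idea behind cumDistance, not the library's implementation. It assumes that "in one direction" means each pair (i, j) is counted once with i < j, and it substitutes squared Euclidean distance for the metric supplied by the Clusterer. The names CumDistanceSketch, Point, and sqDist are hypothetical, and plain arrays stand in for MatrixD and VectorD.

```scala
object CumDistanceSketch {
  // Hypothetical stand-in for a row of MatrixD.
  type Point = Array[Double]

  /** Squared Euclidean distance (assumed metric; the real one comes from the Clusterer). */
  def sqDist(u: Point, v: Point): Double =
    u.indices.map(j => (u(j) - v(j)) * (u(j) - v(j))).sum

  /** For each cluster c in 0 until k, sum the distances over all pairs
    * (i, j) with i < j whose points are both assigned to c.
    */
  def cumDistance(x: Array[Point], clustr: Array[Int], k: Int): Array[Double] = {
    val sums = Array.fill(k)(0.0)
    for (i <- x.indices; j <- i + 1 until x.length if clustr(i) == clustr(j))
      sums(clustr(i)) += sqDist(x(i), x(j))
    sums
  }
}
```

The result has one entry per cluster; a cluster with fewer than two points contributes 0.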
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
kMeansPP(x: MatrixD, kMax: Int, algo: Algorithm = HARTIGAN, b: Int = 1, useSVD: Boolean = true, plot: Boolean = false): (KMeansPPClusterer, Array[Int], Int)
Return a KMeansPPClusterer clustering of the given points, with the number of clusters k chosen using the Gap statistic.
- x
the vectors/points to be clustered stored as rows of a matrix
- kMax
the upper bound on the number of clusters
- algo
the reassignment algorithm used by KMeansPlusPlusClusterer
- b
the number of reference distributions to create (default = 1)
- useSVD
use SVD to account for the shape of the points (default = true)
- plot
whether or not to plot the logs of the within-SSEs (default = false)
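As an illustration of how a Gap statistic can select k, here is a minimal sketch of the selection rule from the paper linked under "See also": gap(k) is the mean of log W*_k over the b reference data sets minus log W_k for the data, and the chosen k is the smallest one with gap(k) >= gap(k+1) - s(k+1). Whether kMeansPP applies exactly this rule is an assumption. The object GapSelectionSketch, the function chooseK, and its array-based inputs are hypothetical.

```scala
object GapSelectionSketch {
  /** Choose k from precomputed log within-SSE values.
    *
    * @param logW     logW(i) is log W_k of the data for k = i + 1
    * @param refLogW  refLogW(i) holds log W*_k for each of the b reference sets, k = i + 1
    * @return the smallest k with gap(k) >= gap(k+1) - s(k+1), or kMax if none qualifies
    */
  def chooseK(logW: Array[Double], refLogW: Array[Array[Double]]): Int = {
    val b    = refLogW.head.length
    val mean = refLogW.map(_.sum / b)
    val gap  = logW.indices.map(i => mean(i) - logW(i))
    // s(k) = sd(k) * sqrt(1 + 1/b), where sd is the standard deviation over the references
    val s = refLogW.indices.map { i =>
      val sd = math.sqrt(refLogW(i).map(v => (v - mean(i)) * (v - mean(i))).sum / b)
      sd * math.sqrt(1.0 + 1.0 / b)
    }
    (0 until logW.length - 1)
      .find(i => gap(i) >= gap(i + 1) - s(i + 1))
      .map(_ + 1)                 // convert the 0-based index to k
      .getOrElse(logW.length)     // fall back to kMax
  }
}
```

With a single reference set (b = 1, the documented default), the standard deviation term is zero, so the rule reduces to picking the first k at which the gap stops increasing.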
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
reference(x: MatrixD, useSVD: Boolean = true, stream: Int = 0): MatrixD
Compute a reference distribution based on a set of points.
- x
the vectors/points to be clustered stored as rows of a matrix
- useSVD
use SVD to account for the shape of the points (default = true)
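For intuition, the sketch below generates one simple kind of reference data set: points drawn uniformly from the axis-aligned bounding box of the input, which corresponds to the simpler of the two options described in the linked paper. It omits the SVD step entirely, and it is an assumption that reference behaves this way when useSVD = false. The names ReferenceSketch and seed are hypothetical, plain arrays stand in for MatrixD, and the input is assumed to be non-empty.

```scala
import scala.util.Random

object ReferenceSketch {
  // Hypothetical stand-in for a row of MatrixD.
  type Point = Array[Double]

  /** Draw x.length points uniformly from the bounding box of x (no SVD alignment). */
  def reference(x: Array[Point], seed: Long = 0L): Array[Point] = {
    val rng  = new Random(seed)
    val dims = x.head.indices
    val lo   = dims.map(j => x.map(_(j)).min)   // per-column minimum
    val hi   = dims.map(j => x.map(_(j)).max)   // per-column maximum
    Array.fill(x.length)(dims.map(j => lo(j) + rng.nextDouble() * (hi(j) - lo(j))).toArray)
  }
}
```

The output has the same number of rows and columns as the input, so it can be clustered with the same k for comparison.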
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )
-
def
withinSSE(x: MatrixD, cl: Clusterer, clustr: Array[Int], k: Int): Double
Compute the within-cluster sum of squared errors in terms of distances between points within a cluster (in one direction).
- x
the vectors/points to be clustered stored as rows of a matrix
- cl
the Clusterer used to compute the distance metric
- clustr
the cluster assignments
- k
the number of clusters
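A minimal sketch of a pairwise-distance form of the within-SSE follows, under stated assumptions rather than as the library's code. It assumes squared Euclidean distance and that each pair is counted once (i < j), in which case dividing each cluster's pairwise sum by its size gives that cluster's sum of squared errors about its mean. The names WithinSSESketch, Point, and sqDist are hypothetical, and plain arrays stand in for MatrixD.

```scala
object WithinSSESketch {
  // Hypothetical stand-in for a row of MatrixD.
  type Point = Array[Double]

  /** Squared Euclidean distance (assumed metric; the real one comes from the Clusterer). */
  def sqDist(u: Point, v: Point): Double =
    u.indices.map(j => (u(j) - v(j)) * (u(j) - v(j))).sum

  /** Sum over clusters of (one-directional pairwise squared distances) / (cluster size). */
  def withinSSE(x: Array[Point], clustr: Array[Int], k: Int): Double = {
    val sums  = Array.fill(k)(0.0)
    val sizes = Array.fill(k)(0)
    for (i <- x.indices) sizes(clustr(i)) += 1
    for (i <- x.indices; j <- i + 1 until x.length if clustr(i) == clustr(j))
      sums(clustr(i)) += sqDist(x(i), x(j))
    // Skip empty clusters to avoid dividing by zero.
    (0 until k).filter(sizes(_) > 0).map(c => sums(c) / sizes(c)).sum
  }
}
```

For two points (0, 0) and (2, 0) in one cluster, the pairwise sum is 4 and the cluster size is 2, giving 2, which matches the squared deviations 1 + 1 about the mean (1, 0).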