object GapStatistic

The GapStatistic object is used to help determine the optimal number of clusters for a clusterer by comparing results to a reference distribution.

See also

web.stanford.edu/~hastie/Papers/gap.pdf

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @native() @throws( ... )
  6. def cumDistance(x: MatrixD, cl: Clusterer, clustr: Array[Int], k: Int): VectorD

    Compute, for each cluster, the sum of pairwise distances between its points, counting each pair once (in one direction).

    x

    the vectors/points to be clustered stored as rows of a matrix

    cl

    the Clusterer used to compute the distance metric

    clustr

    the cluster assignments

    k

    the number of clusters
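
    The computation described above can be sketched in plain Scala. This is a minimal illustration, not the ScalaTion implementation: it uses nested arrays in place of MatrixD and VectorD, and the squared Euclidean distance in sqDist is an assumed stand-in for whatever metric the Clusterer supplies. The names CumDistanceSketch and sqDist are hypothetical.

    ```scala
    object CumDistanceSketch {
      // Squared Euclidean distance (assumption: stands in for the Clusterer's metric).
      def sqDist(u: Array[Double], v: Array[Double]): Double =
        u.indices.map(d => (u(d) - v(d)) * (u(d) - v(d))).sum

      // For each cluster, add up the distance over every unordered pair (i < j)
      // of points assigned to it, so each pair is counted once.
      def cumDistance(x: Array[Array[Double]], clustr: Array[Int], k: Int): Array[Double] = {
        val sums = Array.fill(k)(0.0)
        for (i <- x.indices; j <- i + 1 until x.length if clustr(i) == clustr(j))
          sums(clustr(i)) += sqDist(x(i), x(j))
        sums
      }
    }
    ```

    Restricting the inner loop to j > i is what "in one direction" is taken to mean here: the pair (i, j) contributes once rather than twice.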

  7. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. def finalize(): Unit
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  10. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  11. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  12. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  13. def kMeansPP(x: MatrixD, kMax: Int, algo: Algorithm = HARTIGAN, b: Int = 1, useSVD: Boolean = true, plot: Boolean = false): (KMeansPPClusterer, Array[Int], Int)

    Return a KMeansPPClusterer clustering of the given points, with the number of clusters k (at most kMax) chosen using the Gap statistic. The result is a tuple of type (KMeansPPClusterer, Array[Int], Int), presumably the clusterer, the cluster assignments, and the chosen k.

    x

    the vectors/points to be clustered stored as rows of a matrix

    kMax

    the upper bound on the number of clusters

    algo

    the reassignment algorithm used by KMeansPlusPlusClusterer

    b

    the number of reference distributions to create (default = 1)

    useSVD

    use SVD to account for the shape of the points (default = true)

    plot

    whether or not to plot the logs of the within-SSEs (default = false)
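
    How the Gap statistic picks k can be sketched as follows, using the selection rule from the paper listed under "See also": Gap(k) is the mean reference log within-SSE minus the observed log within-SSE, and the chosen k is the smallest one with Gap(k) >= Gap(k+1) - s(k+1). Whether kMeansPP applies exactly this rule is an assumption; the object GapRuleSketch, the method chooseK, and its inputs are hypothetical and take precomputed log within-SSE values rather than running a clusterer.

    ```scala
    object GapRuleSketch {
      // logW(k-1)    : log within-SSE of the data clustered into k clusters
      // refLogW(k-1) : log within-SSEs of the b reference data sets for k clusters
      def chooseK(logW: Array[Double], refLogW: Array[Array[Double]]): Int = {
        val kMax = logW.length
        val gap  = new Array[Double](kMax)
        val s    = new Array[Double](kMax)
        for (k <- 0 until kMax) {
          val ref  = refLogW(k)
          val mean = ref.sum / ref.length
          val sd   = math.sqrt(ref.map(v => (v - mean) * (v - mean)).sum / ref.length)
          gap(k) = mean - logW(k)                       // Gap(k)
          s(k)   = sd * math.sqrt(1.0 + 1.0 / ref.length) // simulation error term
        }
        // Smallest k with Gap(k) >= Gap(k+1) - s(k+1); fall back to kMax if none qualifies.
        (0 until kMax - 1).find(k => gap(k) >= gap(k + 1) - s(k + 1)).map(_ + 1).getOrElse(kMax)
      }
    }
    ```

    With a single reference data set (b = 1) the standard deviation is zero, so the rule reduces to picking the first k at which the gap stops increasing.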

  14. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  15. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  16. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  17. def reference(x: MatrixD, useSVD: Boolean = true, stream: Int = 0): MatrixD

    Compute a reference distribution based on a set of points.

    x

    the vectors/points to be clustered stored as rows of a matrix

    useSVD

    use SVD to account for the shape of the points (default = true)

    stream

    the random number stream to use (default = 0)
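
    The paper listed under "See also" describes two ways to build a reference data set: sample each feature uniformly over its observed range, or do the same in a coordinate system aligned with the principal components of the data. The sketch below shows only the first, simpler variant, which would correspond to skipping the SVD step. It is an illustration rather than the ScalaTion code: ReferenceSketch and its seed parameter are hypothetical, and nested arrays stand in for MatrixD.

    ```scala
    import scala.util.Random

    object ReferenceSketch {
      // Draw as many points as x has rows, each coordinate uniform over the
      // observed [min, max] range of that column (axis-aligned bounding box).
      def reference(x: Array[Array[Double]], seed: Long = 0L): Array[Array[Double]] = {
        val rng  = new Random(seed)                 // seed is an assumed stand-in for a random stream
        val dims = x.head.indices
        val lo   = dims.map(d => x.map(_(d)).min)   // per-column minimum
        val hi   = dims.map(d => x.map(_(d)).max)   // per-column maximum
        Array.fill(x.length)(dims.map(d => lo(d) + rng.nextDouble() * (hi(d) - lo(d))).toArray)
      }
    }
    ```

    The result has the same dimensions as x, so it can be clustered with the same k and its within-SSE compared against that of the real data.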

  18. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  19. def toString(): String
    Definition Classes
    AnyRef → Any
  20. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  21. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @throws( ... )
  23. def withinSSE(x: MatrixD, cl: Clusterer, clustr: Array[Int], k: Int): Double

    Compute the within-cluster sum of squared errors in terms of the distances between points within each cluster (in one direction).

    x

    the vectors/points to be clustered stored as rows of a matrix

    cl

    the Clusterer used to compute the distance metric

    clustr

    the cluster assignments

    k

    the number of clusters
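
    A within-SSE of this kind can be sketched from pairwise distances alone. The version below follows the definition in the paper listed under "See also": with each pair counted once and squared Euclidean distance, dividing a cluster's pairwise sum by its size gives the sum of squared distances to the cluster mean. That ScalaTion normalizes in exactly this way is an assumption, and WithinSSESketch and sqDist are hypothetical names.

    ```scala
    object WithinSSESketch {
      // Squared Euclidean distance (assumption: stands in for the Clusterer's metric).
      def sqDist(u: Array[Double], v: Array[Double]): Double =
        u.indices.map(d => (u(d) - v(d)) * (u(d) - v(d))).sum

      def withinSSE(x: Array[Array[Double]], clustr: Array[Int], k: Int): Double = {
        val sums  = Array.fill(k)(0.0)   // pairwise distance sum per cluster, pairs counted once
        val sizes = Array.fill(k)(0)     // number of points per cluster
        for (i <- x.indices) sizes(clustr(i)) += 1
        for (i <- x.indices; j <- i + 1 until x.length if clustr(i) == clustr(j))
          sums(clustr(i)) += sqDist(x(i), x(j))
        // Divide each cluster's sum by its size and add up; empty clusters are skipped.
        (0 until k).filter(c => sizes(c) > 0).map(c => sums(c) / sizes(c)).sum
      }
    }
    ```

    For the points (0,0), (0,2) in one cluster and (5,5), (6,5), (5,6) in another, the pairwise sums are 4 and 4, giving 4/2 + 4/3, which matches the squared distances to the two cluster means (2 and 4/3).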
