TightClusterer

scalation.modeling.clustering.TightClusterer
class TightClusterer(x: MatrixD, k0: Int, kmin: Int, s: Int)

The TightClusterer class uses tight clustering to eliminate points that do not fit well in any cluster.

Value parameters

k0

the number of clusters to make

kmin

the minimum number of clusters to make

s

the random number stream (to vary the clusters made)

x

the vectors/points to be clustered stored as rows of a matrix

Attributes

Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def cluster(): ArrayBuffer[Set[Int]]

Given a set of points/vectors, put them in clusters, returning the cluster assignment vector. A basic goal is to minimize the sum of the distances between points within each cluster.

Compute the mean comembership matrix by averaging results from several subsamples.
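
The averaging step can be illustrated with a minimal, library-free sketch in Scala 3. It is not the class's own code: the name meanComembership and the plain-array types are hypothetical, and it assumes each subsample run yields a cluster label for every one of the n points.

```scala
// Hypothetical sketch: average 0/1 comembership indicators over several clusterings.
// assignments(b)(i) is the cluster label given to point i by run b (assumed to cover all points).
def meanComembership(assignments: Seq[Array[Int]]): Array[Array[Double]] =
  require(assignments.nonEmpty, "need at least one clustering")
  val n  = assignments.head.length
  val md = Array.ofDim[Double](n, n)
  // count how often points i and j land in the same cluster
  for labels <- assignments; i <- 0 until n; j <- 0 until n do
    if labels(i) == labels(j) then md(i)(j) += 1.0
  // turn counts into averages
  for i <- 0 until n; j <- 0 until n do md(i)(j) /= assignments.size
  md
```

An entry near 1 means the two points were almost always clustered together; an entry near 0 means they almost never were.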

def createSubsample(): (MatrixD, Array[Int])

Create a new random subsample.
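
One way to picture subsampling is drawing row indices without replacement and keeping both the selected rows and their original indices, which matches the shape of the returned pair. The sketch below is an assumption-laden illustration: the name subsample, the ratio parameter, and the use of scala.util.Random and plain arrays in place of MatrixD are all hypothetical.

```scala
import scala.util.Random

// Hypothetical sketch: draw a random subset of rows without replacement.
// Returns the selected rows together with their indices in the original data.
def subsample(x: Array[Array[Double]], ratio: Double, rng: Random): (Array[Array[Double]], Array[Int]) =
  val m   = math.max(1, (x.length * ratio).toInt)               // assumed subsample size rule
  val idx = rng.shuffle(x.indices.toVector).take(m).sorted.toArray
  (idx.map(x(_)), idx)
```

Passing a seeded Random makes the draw reproducible, which is roughly the role the stream parameter s plays for the class.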

def findStable(topClubs: Array[ArrayBuffer[Set[Int]]]): (Int, Set[Int])

Find the first tight and stable cluster from the top candidate clubs. To be stable, a club must have a similar club at the next level (next k value).

Value parameters

topClubs

the top clubs for each level to be searched for stable clusters
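
The stability test described above can be sketched as a search over consecutive levels. This is only an illustration under stated assumptions: the name firstStable, the similarity threshold beta, and the (-1, empty set) result for "nothing found" are hypothetical, and similarity is taken to be intersection size over union size.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: return (level, club) for the first club that has a
// sufficiently similar club at the next level, or (-1, empty set) if none does.
def firstStable(topClubs: Array[ArrayBuffer[Set[Int]]], beta: Double): (Int, Set[Int]) =
  // intersection-over-union similarity of two clubs
  def jaccard(a: Set[Int], b: Set[Int]): Double =
    val u = (a | b).size
    if u == 0 then 0.0 else (a & b).size.toDouble / u

  // lazily enumerate stable (level, club) pairs in level order
  val hits =
    for
      lev  <- (0 until topClubs.length - 1).iterator
      club <- topClubs(lev).iterator
      if topClubs(lev + 1).exists(jaccard(club, _) >= beta)
    yield (lev, club)
  if hits.hasNext then hits.next() else (-1, Set.empty[Int])
```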

def formCandidateClusters(md: MatrixD): ArrayBuffer[Set[Int]]

Form candidate clusters by collecting points with high average comembership scores together in clusters (clubs).

Value parameters

md

the mean comembership matrix
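
As a rough illustration of grouping points by comembership score, the sketch below greedily collects, for each unassigned point, all other unassigned points whose mean comembership with it reaches a threshold. The greedy rule, the name candidateClubs, and the thres parameter are assumptions made for this example, not the documented procedure.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: greedy grouping of points whose mean comembership
// with a seed point is at least `thres`. md is assumed square and symmetric.
def candidateClubs(md: Array[Array[Double]], thres: Double): ArrayBuffer[Set[Int]] =
  val n     = md.length
  val taken = Array.fill(n)(false)
  val clubs = ArrayBuffer.empty[Set[Int]]
  for i <- 0 until n do
    if !taken(i) then
      // seed a club at point i and pull in every free point that scores high with it
      val club = (i until n).filter(j => !taken(j) && md(i)(j) >= thres).toSet
      club.foreach(j => taken(j) = true)
      if club.nonEmpty then clubs += club
  clubs
```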

def orderBySize(clubs: ArrayBuffer[Set[Int]]): Array[Int]

Order the clubs (candidate clusters) by size, returning the rank order (largest first).

Value parameters

clubs

the candidate clusters
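
A rank order of this kind can be sketched with a stable sort on club indices. The function name rankBySize is hypothetical; ties keep their original relative order here, which is an assumption about tie handling.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: indices of the clubs, largest club first.
def rankBySize(clubs: ArrayBuffer[Set[Int]]): Array[Int] =
  clubs.indices.sortBy(i => -clubs(i).size).toArray
```

For example, clubs of sizes 1, 3 and 2 would give the order 1, 2, 0.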

def pickTopQ(clubs: ArrayBuffer[Set[Int]], order: Array[Int]): ArrayBuffer[Set[Int]]

Pick the top q clubs based on club size.

Value parameters

clubs

all the clubs (candidate clusters)

order

the rank order (by club size) of all the clubs
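
Given a rank order, selecting the top clubs amounts to taking the first q indices and looking up the corresponding clubs. In the sketch below q is passed explicitly, which is an assumption (the documented signature does not take it as a parameter), and the name topQ is hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: keep the q highest-ranked clubs, in rank order.
def topQ(clubs: ArrayBuffer[Set[Int]], order: Array[Int], q: Int): ArrayBuffer[Set[Int]] =
  ArrayBuffer.from(order.iterator.take(q).map(i => clubs(i)))
```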

def selectCandidateClusters(k: Int): (ArrayBuffer[Set[Int]], Array[Int])

Select candidates for tight clusters in the K-means algorithm for a given number of clusters 'k'. This corresponds to Algorithm A in the paper/URL.

Value parameters

k

the number of clusters

def sim(c1: Set[Int], c2: Set[Int]): Double

Compute the similarity of two clubs as the ratio of the size of their intersection to their union.

Value parameters

c1

the first club

c2

the second club
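
The ratio of intersection size to union size is the Jaccard index. A minimal standalone sketch follows; the name jaccardSim is hypothetical, and returning 0 for two empty clubs is an assumption made to avoid dividing by zero.

```scala
// Hypothetical standalone version of the similarity measure:
// |c1 intersect c2| / |c1 union c2|, with 0.0 assumed for two empty clubs.
def jaccardSim(c1: Set[Int], c2: Set[Int]): Double =
  val unionSize = (c1 | c2).size
  if unionSize == 0 then 0.0
  else (c1 & c2).size.toDouble / unionSize
```

Identical clubs score 1, disjoint clubs score 0, and the clubs {1, 2, 3} and {2, 3, 4} share 2 of 4 distinct points, giving 0.5.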
