com.thoughtworks.datacommons.prepbuddy

clusterers

package clusterers

Type Members

  1. class Cluster extends Serializable

    Cluster holds a group of values that share the same specified key.

  2. trait ClusteringAlgorithm extends Serializable

    ClusteringAlgorithm is the trait to implement for any algorithm that clusters the values of a column.

  3. class Clusters extends AnyRef

    Clusters is the collection of clusters found in a column.

  4. abstract class FingerprintAlgorithm extends ClusteringAlgorithm

  5. class LevenshteinDistance extends ClusteringAlgorithm

    This algorithm treats each cardinal value as a key and groups together the values whose Levenshtein distance is less than 4.

  6. class NGramFingerprintAlgorithm extends FingerprintAlgorithm

    This algorithm generates a key with the N-Gram Fingerprint Algorithm for every cardinal value (facet) in the column and adds the value to the Cluster for that key.

  7. class SimpleFingerprintAlgorithm extends FingerprintAlgorithm with Serializable

    This algorithm generates a key with the Simple Fingerprint Algorithm for every cardinal value (facet) in the column and adds the value to the Cluster for that key.

  8. class TextFacets extends AnyRef

    TextFacets is a collection of the unique strings in a column and the number of times each string appears.
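
The ideas behind the clustering algorithms listed above can be sketched in plain Scala. This is a minimal illustration, not the prepbuddy implementation: the object `ClusteringSketch` and the helpers `simpleFingerprint`, `nGramFingerprint`, `levenshtein` and `clusterByKey` are hypothetical names, and the fingerprint steps follow the commonly described key-collision approach (lowercase, strip punctuation, sort and de-duplicate tokens or n-grams), which is assumed here rather than taken from the library source. Facets are modelled as `(value, count)` pairs, mirroring the TextFacets description.

```scala
object ClusteringSketch {

  // Assumed simple fingerprint: trim, lowercase, drop punctuation and control
  // characters, then sort and de-duplicate the whitespace-separated tokens.
  def simpleFingerprint(value: String): String =
    value.trim.toLowerCase
      .replaceAll("[\\p{Punct}\\p{Cntrl}]", "")
      .split("\\s+")
      .filter(_.nonEmpty)
      .distinct
      .sorted
      .mkString(" ")

  // Assumed n-gram fingerprint: lowercase, drop punctuation, control characters
  // and whitespace, then sort and de-duplicate the character n-grams.
  def nGramFingerprint(value: String, n: Int): String = {
    val cleaned = value.toLowerCase.replaceAll("[\\p{Punct}\\p{Cntrl}\\s]", "")
    cleaned.sliding(n).toSeq.distinct.sorted.mkString
  }

  // Classic dynamic-programming Levenshtein (edit) distance, keeping only
  // the previous row of the distance table.
  def levenshtein(a: String, b: String): Int = {
    var previous = (0 to b.length).toArray
    for (i <- 1 to a.length) {
      val current = new Array[Int](b.length + 1)
      current(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        current(j) = math.min(
          math.min(current(j - 1) + 1, previous(j) + 1),
          previous(j - 1) + cost
        )
      }
      previous = current
    }
    previous(b.length)
  }

  // Groups (value, count) facets by the key that the given function produces.
  def clusterByKey(
      facets: Seq[(String, Int)],
      key: String => String
  ): Map[String, Seq[(String, Int)]] =
    facets.groupBy { case (value, _) => key(value) }
}
```

Under these assumptions, `ClusteringSketch.clusterByKey(facets, ClusteringSketch.simpleFingerprint)` would place "Tom Cruise" and "Cruise, Tom" in the same group, because both reduce to the same key. A distance-based approach would instead compare pairs of values with `levenshtein` and group those below a chosen threshold, such as the value of 4 mentioned for LevenshteinDistance.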
