com.thoughtworks.datacommons.prepbuddy
Cluster holds the values of a column that were grouped together under the same key.
ClusteringAlgorithm is implemented by the algorithms that can be used to cluster the values of a column.
Clusters is a collection of the clusters found in a column.
This algorithm treats each cardinal value as a key, but groups together those values whose Levenshtein distance is less than 4.
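The grouping rule above can be sketched as follows. This is a minimal illustration, not prepbuddy's implementation: the class and method names are hypothetical, and the grouping strategy (each value joins the first existing cluster whose key is within the threshold, otherwise it becomes the key of a new cluster) is an assumption about how "distance less than 4" is applied.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not prepbuddy's actual class or API.
public final class LevenshteinClusteringSketch {
    // Values are grouped when their distance is strictly less than 4, per the description.
    private static final int THRESHOLD = 4;

    private LevenshteinClusteringSketch() {}

    // Classic dynamic-programming edit distance (insert, delete, substitute each cost 1),
    // keeping only two rows of the table in memory.
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed greedy grouping: compare each value against existing cluster keys only.
    public static Map<String, List<String>> cluster(List<String> values) {
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        for (String value : values) {
            String home = null;
            for (String key : clusters.keySet()) {
                if (distance(key, value) < THRESHOLD) {
                    home = key;
                    break;
                }
            }
            if (home == null) {
                home = value;
                clusters.put(home, new ArrayList<>());
            }
            clusters.get(home).add(value);
        }
        return clusters;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```

Because each value is compared only with cluster keys, the result depends on input order; a pairwise or transitive grouping would be a different design choice.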
This algorithm generates a key for every cardinal value (facet) in a column using the N-Gram Fingerprint algorithm and adds the value to the Cluster for that key.
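A minimal sketch of an n-gram fingerprint key, assuming the common formulation described in OpenRefine's clustering documentation (lower-case, drop punctuation and whitespace, take the distinct character n-grams in sorted order, concatenate them). The class name and the `n` parameter are illustrative assumptions, and the ASCII folding step OpenRefine also performs is omitted here.

```java
import java.util.TreeSet;

// Hypothetical sketch; prepbuddy's exact normalization may differ.
public final class NGramFingerprintSketch {
    private NGramFingerprintSketch() {}

    public static String key(String value, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Lower-case and remove ASCII punctuation and whitespace.
        String cleaned = value.toLowerCase().replaceAll("[\\p{Punct}\\s]", "");
        // A TreeSet both de-duplicates and sorts the n-grams.
        TreeSet<String> grams = new TreeSet<>();
        for (int i = 0; i + n <= cleaned.length(); i++) {
            grams.add(cleaned.substring(i, i + n));
        }
        // Values shorter than n produce an empty key.
        return String.join("", grams);
    }

    public static void main(String[] args) {
        System.out.println(key("Paris", 2));    // arispari
        System.out.println(key("P a-r,is", 2)); // arispari
    }
}
```

Since spaces and punctuation are removed before the n-grams are taken, spelling variants such as "Paris" and "P a-r,is" collapse to the same key and would land in the same Cluster.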
This algorithm generates a key for every cardinal value (facet) in a column using the Simple Fingerprint algorithm and adds the value to the Cluster for that key.
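The key-then-group idea can be sketched as below, assuming the fingerprint keying described in OpenRefine's clustering documentation (trim, lower-case, strip punctuation, split on whitespace, sort and de-duplicate tokens, re-join). The class name, the `cluster` helper and its `Map` return type are hypothetical stand-ins for prepbuddy's Cluster and Clusters types, and ASCII folding is omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch; not prepbuddy's actual class or API.
public final class SimpleFingerprintSketch {
    private SimpleFingerprintSketch() {}

    // Trim, lower-case, strip ASCII punctuation, then sort and de-duplicate
    // the whitespace-separated tokens and re-join them with single spaces.
    public static String key(String value) {
        String cleaned = value.trim().toLowerCase().replaceAll("\\p{Punct}", "");
        TreeSet<String> tokens = new TreeSet<>();
        for (String token : cleaned.split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return String.join(" ", tokens);
    }

    // Groups values by their fingerprint key, preserving first-seen key order.
    public static Map<String, List<String>> cluster(List<String> values) {
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        for (String value : values) {
            clusters.computeIfAbsent(key(value), k -> new ArrayList<>()).add(value);
        }
        return clusters;
    }

    public static void main(String[] args) {
        System.out.println(key("Tom Cruise"));  // cruise tom
        System.out.println(key("Cruise, Tom")); // cruise tom
    }
}
```

Sorting the tokens makes the key insensitive to word order, which is why "Tom Cruise" and "Cruise, Tom" fall into one cluster.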
TextFacets is a collection of the unique strings in a column together with the number of times each string appears.
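At its core, a text facet is a value-to-count map. A minimal sketch using the standard `Collectors.groupingBy` and `Collectors.counting` collectors follows; the class and method names are hypothetical and it says nothing about what other operations prepbuddy's TextFacets offers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch; not prepbuddy's actual class or API.
public final class TextFacetsSketch {
    private TextFacetsSketch() {}

    // Maps each distinct value in the column to the number of times it occurs.
    // A TreeMap keeps the facets sorted; groupingBy rejects null values with a NullPointerException.
    public static Map<String, Long> facets(List<String> column) {
        return column.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("CA", "NY", "CA", "TX", "CA");
        System.out.println(facets(column)); // {CA=3, NY=1, TX=1}
    }
}
```

The keys of this map are exactly the cardinal values (facets) that the clustering algorithms above take as input.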