com.thoughtworks.datacommons.prepbuddy.api.java
Zips the other JavaTransformableRDD with this JavaTransformableRDD and returns a new JavaTransformableRDD in the current file format. Both JavaTransformableRDDs must have the same number of records.
Other JavaTransformableRDD whose columns will be added to this JavaTransformableRDD
JavaTransformableRDD
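The zip behaviour described above can be pictured with plain Java collections. This is only a minimal sketch of the semantics, not the library's implementation: records are modelled as lists of column values rather than RDD rows, and the class and method names (`ZipColumnsSketch`, `zipColumns`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZipColumnsSketch {
    // Appends the columns of each record in `other` to the record
    // at the same position in `current` (hypothetical helper).
    static List<List<String>> zipColumns(List<List<String>> current, List<List<String>> other) {
        if (current.size() != other.size()) {
            throw new IllegalArgumentException("Both datasets must have the same number of records");
        }
        List<List<String>> zipped = new ArrayList<>();
        for (int i = 0; i < current.size(); i++) {
            List<String> row = new ArrayList<>(current.get(i));
            row.addAll(other.get(i));
            zipped.add(row);
        }
        return zipped;
    }

    public static void main(String[] args) {
        List<List<String>> names = Arrays.asList(Arrays.asList("1", "Ann"), Arrays.asList("2", "Bob"));
        List<List<String>> cities = Arrays.asList(Arrays.asList("Pune"), Arrays.asList("Oslo"));
        System.out.println(zipColumns(names, cities)); // [[1, Ann, Pune], [2, Bob, Oslo]]
    }
}
```

The size check mirrors the stated requirement that both inputs hold the same number of records.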
Returns Clusters containing every cluster of text values found at @columnIndex, formed according to @algorithm
Column Index
Algorithm to be used to form clusters
Clusters
Returns a new JavaTransformableRDD containing the unique duplicate records of this JavaTransformableRDD, treating all columns as the primary key.
JavaTransformableRDD A new JavaTransformableRDD consisting of the unique duplicate records.
Returns a new JavaTransformableRDD containing the unique duplicate records of this JavaTransformableRDD, treating the given columns as the primary key.
A list of integers specifying the columns that will be combined to create the primary key
JavaTransformableRDD A new JavaTransformableRDD consisting of the unique duplicate records.
Returns a new JavaTransformableRDD by dropping the column at @columnIndex
The column that will be dropped.
JavaTransformableRDD
Returns a new JavaTransformableRDD containing the unique duplicate records of this JavaTransformableRDD, treating all columns as the primary key.
JavaTransformableRDD A new JavaTransformableRDD consisting of the unique duplicate records.
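One reading of "unique duplicate records" is: every record that occurs more than once, listed a single time. The sketch below illustrates that reading with plain strings, using the whole record as the key. It is an assumption about the semantics, not the library's code, and `DuplicateRecordsSketch` and `duplicates` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DuplicateRecordsSketch {
    // Returns each record that appears more than once, exactly one time.
    // The whole record acts as the primary key (hypothetical helper).
    static List<String> duplicates(List<String> records) {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new LinkedHashSet<>();
        for (String record : records) {
            if (!seen.add(record)) {
                repeated.add(record); // second or later occurrence
            }
        }
        return new ArrayList<>(repeated);
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a,1", "b,2", "a,1", "c,3", "b,2", "a,1");
        System.out.println(duplicates(records)); // [a,1, b,2]
    }
}
```

The output order here is an artifact of the in-memory sketch; a distributed dataset would not necessarily preserve it.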
Returns a new JavaTransformableRDD containing the unique duplicate records of this JavaTransformableRDD, treating the given columns as the primary key.
A list of integers specifying the columns that will be combined to create the primary key
JavaTransformableRDD A new JavaTransformableRDD consisting of the unique duplicate records.
Returns a new JavaRDD[String] containing the duplicate values in the specified column
Column in which to look for duplicates
JavaRDD[String]
Returns a new JavaTransformableRDD that contains records flagged with @symbol based on the evaluation of @markerPredicate
Symbol that will be used to flag
A predicate that determines whether to flag a row or not
JavaTransformableRDD
Returns a new JavaTransformableRDD by imputing the missing values at @columnIndex using the @strategy
Column index
Imputation strategy
JavaTransformableRDD
Returns a new JavaTransformableRDD by imputing the missing values, and the values listed in @missingHints, at @columnIndex using the @strategy
Column Index
Imputation Strategy
List of strings that may represent an empty value
JavaTransformableRDD
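To make the role of the missing hints concrete, here is a small sketch in plain Java. It assumes a mean-based strategy purely for illustration (the available strategies are not listed here), and `ImputeSketch`, `isMissing` and `imputeWithMean` are hypothetical names, not part of the API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ImputeSketch {
    // A value counts as missing when it is blank or matches one of the hints.
    static boolean isMissing(String value, List<String> missingHints) {
        String trimmed = value.trim();
        return trimmed.isEmpty() || missingHints.contains(trimmed);
    }

    // Replaces missing values in one column with the mean of the present values
    // (an assumed, illustrative strategy).
    static List<List<String>> imputeWithMean(List<List<String>> rows, int columnIndex, List<String> missingHints) {
        double sum = 0;
        int count = 0;
        for (List<String> row : rows) {
            String value = row.get(columnIndex);
            if (!isMissing(value, missingHints)) {
                sum += Double.parseDouble(value.trim());
                count++;
            }
        }
        String mean = String.valueOf(count == 0 ? 0.0 : sum / count);
        List<List<String>> imputed = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> copy = new ArrayList<>(row);
            if (isMissing(copy.get(columnIndex), missingHints)) {
                copy.set(columnIndex, mean);
            }
            imputed.add(copy);
        }
        return imputed;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("a", "10"), Arrays.asList("b", "N/A"),
                Arrays.asList("c", ""), Arrays.asList("d", "20"));
        System.out.println(imputeWithMean(rows, 1, Arrays.asList("N/A")));
        // [[a, 10], [b, 15.0], [c, 15.0], [d, 20]]
    }
}
```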
Returns a new TextFacets containing the facets of @columnIndexes
List of column indexes
TextFacets
Returns a new TextFacets containing the cardinal values of @columnIndex
Index of the column
TextFacets
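A text facet is commonly understood as each distinct value of a column together with its number of occurrences. Assuming that meaning applies here, the idea can be sketched with a plain map; `FacetSketch` and `facets` are hypothetical names and the map stands in for the returned facet type.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetSketch {
    // Counts how often each distinct value occurs in the given column.
    static Map<String, Integer> facets(List<List<String>> rows, int columnIndex) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> row : rows) {
            counts.merge(row.get(columnIndex), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("a", "x"), Arrays.asList("b", "y"), Arrays.asList("c", "x"));
        System.out.println(facets(rows, 1)); // {x=2, y=1}
    }
}
```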
Returns a new JavaTransformableRDD by applying the function to all rows marked with @flag
Symbol that has been used for flagging.
Index of the column that holds the symbol
Map function to apply to the flagged rows
JavaTransformableRDD
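The three parameters above (flag symbol, symbol column index, map function) suggest the following behaviour, sketched here with plain lists. It assumes unflagged rows pass through unchanged, which is not stated in this reference; `MapByFlagSketch` and `mapFlagged` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class MapByFlagSketch {
    // Applies mapFunction only to rows whose flag column equals the symbol;
    // other rows are copied as-is (assumed behaviour).
    static List<List<String>> mapFlagged(List<List<String>> rows, String symbol,
                                         int symbolColumnIndex, UnaryOperator<List<String>> mapFunction) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> row : rows) {
            boolean flagged = symbol.equals(row.get(symbolColumnIndex));
            result.add(flagged ? mapFunction.apply(row) : row);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(Arrays.asList("ann", "*"), Arrays.asList("bob", ""));
        System.out.println(mapFlagged(rows, "*", 1,
                row -> Arrays.asList(row.get(0).toUpperCase(), row.get(1))));
        // [[ANN, *], [bob, ]]
    }
}
```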
Returns a new JavaTransformableRDD by merging @columnIndexes
List of columns to be merged
Separator placed between the merged values
false to remove the original values of the merged columns from the resulting JavaTransformableRDD
JavaTransformableRDD
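The interaction between the separator and the retain flag is easiest to see on a single row. The sketch below assumes the merged value is appended as the last column, which this reference does not specify; `MergeColumnsSketch` and `mergeColumns` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeColumnsSketch {
    // Joins the selected columns with the separator and appends the result
    // as a new last column (assumed placement). When retainColumns is false,
    // the source columns are left out of the result.
    static List<String> mergeColumns(List<String> row, List<Integer> columnIndexes,
                                     String separator, boolean retainColumns) {
        List<String> parts = new ArrayList<>();
        for (int index : columnIndexes) {
            parts.add(row.get(index));
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) {
            if (retainColumns || !columnIndexes.contains(i)) {
                result.add(row.get(i));
            }
        }
        result.add(String.join(separator, parts));
        return result;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("John", "Smith", "42");
        System.out.println(mergeColumns(row, Arrays.asList(0, 1), " ", false)); // [42, John Smith]
        System.out.println(mergeColumns(row, Arrays.asList(0, 1), " ", true));  // [John, Smith, 42, John Smith]
    }
}
```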
Returns a new JavaTransformableRDD by merging @columnIndexes with the default separator
Column indexes to be merged
JavaTransformableRDD
Returns a JavaDoubleRDD containing the product of the values in @firstColumn and @secondColumn
First Column Index
Second Column Index
JavaDoubleRDD
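Read row by row, the product of two columns amounts to the following. This is a plain-Java sketch under the assumption that both columns hold parseable numbers; `MultiplyColumnsSketch` and `multiplyColumns` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiplyColumnsSketch {
    // Multiplies the two chosen columns of every row and collects the products.
    static List<Double> multiplyColumns(List<List<String>> rows, int firstColumn, int secondColumn) {
        List<Double> products = new ArrayList<>();
        for (List<String> row : rows) {
            double first = Double.parseDouble(row.get(firstColumn));
            double second = Double.parseDouble(row.get(secondColumn));
            products.add(first * second);
        }
        return products;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(Arrays.asList("2", "3.5"), Arrays.asList("4", "0.5"));
        System.out.println(multiplyColumns(rows, 0, 1)); // [7.0, 2.0]
    }
}
```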
Returns a new JavaTransformableRDD by normalizing the values of the given column using the given normalization strategy
Column Index
Normalization Strategy
JavaTransformableRDD
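As an illustration of what a normalization strategy does to a column, the sketch below applies min-max scaling, which maps the smallest value to 0 and the largest to 1. Min-max scaling is chosen here only as a well-known example; the strategies actually offered are not listed in this reference, and `NormalizeSketch` and `minMaxNormalize` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NormalizeSketch {
    // Rescales one numeric column to the range [0, 1] using (x - min) / (max - min).
    static List<List<String>> minMaxNormalize(List<List<String>> rows, int columnIndex) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (List<String> row : rows) {
            double value = Double.parseDouble(row.get(columnIndex));
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
        double range = max - min;
        List<List<String>> normalized = new ArrayList<>();
        for (List<String> row : rows) {
            double value = Double.parseDouble(row.get(columnIndex));
            // A constant column has no range; map it to 0.0 to avoid dividing by zero.
            double scaled = range == 0 ? 0.0 : (value - min) / range;
            List<String> copy = new ArrayList<>(row);
            copy.set(columnIndex, String.valueOf(scaled));
            normalized.add(copy);
        }
        return normalized;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("a", "10"), Arrays.asList("b", "20"), Arrays.asList("c", "30"));
        System.out.println(minMaxNormalize(rows, 1)); // [[a, 0.0], [b, 0.5], [c, 1.0]]
    }
}
```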
Returns the number of columns in this RDD
Int
Generates a PivotTable by pivoting data in the pivotalColumn
Pivotal Column
Independent Column Indexes
PivotTable
Returns a new JavaTransformableRDD containing only the elements that satisfy the predicate.
A predicate function that returns a boolean value for every row.
JavaTransformableRDD
Returns a new JavaTransformableRDD by replacing the text of @cluster with the specified @newValue
Cluster of similar values to be replaced
Value that will replace every value in the cluster
Column index
JavaTransformableRDD
Returns a JavaRDD of the given column
Column index
JavaRDD[String]
Returns a new JavaTransformableRDD containing the values of @columnIndexes
One or more integers specifying the columns that will be used to create the new table
JavaTransformableRDD
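Selecting columns amounts to projecting each row onto the requested indexes. The sketch below assumes the output keeps the order in which the indexes are given, and models the "number of integer values" as varargs; `SelectColumnsSketch` and `select` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SelectColumnsSketch {
    // Builds a new table that holds only the requested columns,
    // in the order the indexes are passed (assumed ordering).
    static List<List<String>> select(List<List<String>> rows, int... columnIndexes) {
        List<List<String>> selected = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> projected = new ArrayList<>();
            for (int index : columnIndexes) {
                projected.add(row.get(index));
            }
            selected.add(projected);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "Ann", "Pune"), Arrays.asList("2", "Bob", "Oslo"));
        System.out.println(select(rows, 2, 0)); // [[Pune, 1], [Oslo, 2]]
    }
}
```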
Returns a new JavaDoubleRDD containing the smoothed values of @columnIndex using @smoothingMethod
Column Index
Method that will be used to smooth the data
JavaDoubleRDD
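To give a feel for what a smoothing method produces, the sketch below applies a simple moving average over a fixed window. The choice of a simple moving average and the `window` parameter are assumptions made for illustration; the smoothing methods actually supported are not described here, and `SmoothingSketch` and `simpleMovingAverage` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SmoothingSketch {
    // Averages each run of `window` consecutive values in the column.
    // The result has (rows - window + 1) entries, or none if there are too few rows.
    static List<Double> simpleMovingAverage(List<List<String>> rows, int columnIndex, int window) {
        List<Double> values = new ArrayList<>();
        for (List<String> row : rows) {
            values.add(Double.parseDouble(row.get(columnIndex)));
        }
        List<Double> smoothed = new ArrayList<>();
        for (int start = 0; start + window <= values.size(); start++) {
            double sum = 0;
            for (int i = start; i < start + window; i++) {
                sum += values.get(i);
            }
            smoothed.add(sum / window);
        }
        return smoothed;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1"), Arrays.asList("2"), Arrays.asList("3"),
                Arrays.asList("4"), Arrays.asList("5"));
        System.out.println(simpleMovingAverage(rows, 0, 3)); // [2.0, 3.0, 4.0]
    }
}
```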
Returns a new JavaTransformableRDD by splitting the value at @column using the provided delimiter
Column index of the value to be split
Delimiter or regex that will be used to split the value at @column
false to remove the original value at @column from the resulting JavaTransformableRDD
Maximum number of split values to be added to the resulting JavaTransformableRDD
JavaTransformableRDD
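The effect of the delimiter, the retain flag and the maximum number of splits can be sketched on one row using Java's own `String.split(regex, limit)`. The sketch assumes the split values are appended after the remaining columns and that the last part keeps any unsplit remainder, neither of which is stated in this reference; `SplitByDelimiterSketch` and `splitByDelimiter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitByDelimiterSketch {
    // Splits the value at `column` into at most maxSplit parts and appends
    // the parts to the row (assumed placement). When retainColumn is false,
    // the original value is dropped from the result.
    static List<String> splitByDelimiter(List<String> row, int column, String delimiter,
                                         boolean retainColumn, int maxSplit) {
        // String.split treats the delimiter as a regular expression.
        String[] parts = row.get(column).split(delimiter, maxSplit);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) {
            if (retainColumn || i != column) {
                result.add(row.get(i));
            }
        }
        result.addAll(Arrays.asList(parts));
        return result;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("1", "a-b-c");
        System.out.println(splitByDelimiter(row, 1, "-", false, 2)); // [1, a, b-c]
        System.out.println(splitByDelimiter(row, 1, "-", true, 3));  // [1, a-b-c, a, b, c]
    }
}
```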
Returns a new JavaTransformableRDD by splitting the value at @column using the provided delimiter
Column index of the value to be split
Delimiter or regex that will be used to split the value at @column
false to remove the original value at @column from the resulting JavaTransformableRDD
JavaTransformableRDD
Returns a JavaTransformableRDD by splitting the value at @column according to the specified lengths
Column index of the value to be split
List of integers specifying the number of characters each split value will contain
false to remove the original value at @column from the resulting JavaTransformableRDD
JavaTransformableRDD
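Splitting by field lengths cuts a value into consecutive fixed-width pieces. The sketch below shows this on one row, assuming the pieces are appended after the remaining columns and that a value shorter than the requested lengths simply yields shorter or empty pieces; `SplitByLengthSketch` and `splitByFieldLength` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitByLengthSketch {
    // Cuts the value at `column` into consecutive pieces of the given lengths
    // and appends them to the row (assumed placement).
    static List<String> splitByFieldLength(List<String> row, int column,
                                           List<Integer> fieldLengths, boolean retainColumn) {
        String value = row.get(column);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) {
            if (retainColumn || i != column) {
                result.add(row.get(i));
            }
        }
        int start = 0;
        for (int length : fieldLengths) {
            // Clamp both ends so short values do not cause an exception.
            int from = Math.min(start, value.length());
            int to = Math.min(start + length, value.length());
            result.add(value.substring(from, to));
            start += length;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("1", "20160315");
        System.out.println(splitByFieldLength(row, 1, Arrays.asList(4, 2, 2), false));
        // [1, 2016, 03, 15]
    }
}
```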
Returns a JavaDoubleRDD of the values at the given column index
Column index
JavaDoubleRDD
Returns a new JavaRDD containing the unique elements in the specified column
Column Index
JavaRDD[String]