com.thoughtworks.datacommons.prepbuddy.rdds
Zips the other TransformableRDD with this TransformableRDD and returns a new TransformableRDD in the current file format. Both TransformableRDDs must have the same number of records.
The other TransformableRDD whose columns will be added to this TransformableRDD
TransformableRDD
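A minimal in-memory sketch of the zip semantics, assuming records are modelled as Vector[String] rows in a plain Seq rather than a Spark RDD. AddColumnsSketch and addColumnsFrom are hypothetical names used only for illustration, not PrepBuddy's API.

```scala
// Hypothetical in-memory model: one record = one Vector[String] of column values.
object AddColumnsSketch {
  type Row = Vector[String]

  // Appends the columns of `other` to `current`, pairing records by position.
  // Enforces the documented precondition that both sides have the same record count.
  def addColumnsFrom(current: Seq[Row], other: Seq[Row]): Seq[Row] = {
    require(current.size == other.size, "both datasets must have the same number of records")
    current.zip(other).map { case (left, right) => left ++ right }
  }
}
```

On a real Spark RDD, positional pairing of this kind is what RDD.zip does, and that method additionally expects both RDDs to have the same number of partitions and the same number of elements in each partition.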
Returns Clusters containing all the clusters of text in @columnIndex, formed according to @algorithm
Column Index
Algorithm to be used to form clusters
Clusters
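The clustering algorithm is passed in as a parameter and is not specified in this text. As one illustration, the sketch below uses fingerprint key collision (the default key-collision method in OpenRefine): distinct values whose normalized, sorted token sets match land in the same cluster. ClusterSketch and its methods are hypothetical and work on a plain Seq of column values.

```scala
// Hypothetical sketch of one clustering algorithm (fingerprint key collision).
object ClusterSketch {
  // Fingerprint key: trim, lowercase, strip punctuation, split on whitespace,
  // de-duplicate and sort the tokens, then re-join them with single spaces.
  def fingerprint(value: String): String =
    value.trim.toLowerCase
      .replaceAll("\\p{Punct}", "")
      .split("\\s+")
      .filter(_.nonEmpty)
      .distinct
      .sorted
      .mkString(" ")

  // Groups the distinct values of a column by fingerprint and keeps only
  // groups that contain more than one distinct spelling.
  def clusters(column: Seq[String]): Seq[Seq[String]] =
    column.distinct
      .groupBy(v => fingerprint(v))
      .values
      .filter(_.size > 1)
      .map(_.sorted)
      .toSeq
}
```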
Returns a new TransformableRDD containing unique duplicate records of this TransformableRDD, considering the given columns as the primary key.
A list of integers specifying the columns that will be combined to create the primary key
TransformableRDD A new TransformableRDD consisting of unique duplicate records.
Returns a new TransformableRDD containing unique duplicate records of this TransformableRDD, considering all the columns as the primary key.
TransformableRDD A new TransformableRDD consisting of unique duplicate records.
Returns a new TransformableRDD with the column at @columnIndex dropped
The column that will be dropped.
TransformableRDD
Returns a new TransformableRDD containing unique duplicate records of this TransformableRDD, considering the given columns as the primary key.
A list of integers specifying the columns that will be combined to create the primary key
TransformableRDD A new TransformableRDD consisting of unique duplicate records.
Returns a new TransformableRDD containing duplicate records of this TransformableRDD, considering all the columns as the primary key.
TransformableRDD A new TransformableRDD consisting of unique duplicate records.
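A minimal sketch of what "unique duplicate records by primary key" can mean, assuming one representative (the first occurrence) is kept per duplicated key. DuplicatesSketch is a hypothetical name, and records are modelled as Vector[String] rows in a plain Seq instead of an RDD.

```scala
// Hypothetical in-memory sketch; not PrepBuddy's implementation.
object DuplicatesSketch {
  type Row = Vector[String]

  // Returns one copy of every record whose primary key (the values at
  // `keyColumns`) occurs more than once. An empty `keyColumns` means
  // "use every column", i.e. the whole record is the key.
  def duplicates(rows: Seq[Row], keyColumns: Seq[Int] = Nil): Seq[Row] = {
    def key(row: Row): Seq[String] =
      if (keyColumns.isEmpty) row else keyColumns.map(i => row(i))
    rows.groupBy(r => key(r)).values.filter(_.size > 1).map(_.head).toSeq
  }
}
```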
Returns a new RDD containing the duplicate values in the specified column
Column in which to look for duplicates
RDD
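A small sketch of the per-column variant, assuming "duplicate values" means each value that appears in more than one record, reported once. DuplicateValuesSketch is a hypothetical name operating on an in-memory Seq of Vector[String] rows.

```scala
// Hypothetical in-memory sketch of finding repeated values in one column.
object DuplicateValuesSketch {
  // Returns each value of column `columnIndex` that occurs in more than one record.
  def duplicatesAt(rows: Seq[Vector[String]], columnIndex: Int): Seq[String] =
    rows.map(row => row(columnIndex))
      .groupBy(identity)
      .collect { case (value, occurrences) if occurrences.size > 1 => value }
      .toSeq
}
```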
Returns a new TransformableRDD in which records are flagged with @symbol based on the evaluation of @markerPredicate
Symbol that will be used to flag
A predicate that determines whether a row should be flagged
TransformableRDD
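A minimal sketch of flagging, assuming the flag is written into a new last column and unflagged rows receive an empty string; both choices are assumptions, as is the name FlagSketch. Records are modelled as Vector[String] rows in a plain Seq.

```scala
// Hypothetical in-memory sketch; the flag-column layout is an assumption.
object FlagSketch {
  type Row = Vector[String]

  // Appends a flag column: `symbol` when the predicate holds, otherwise "".
  def flag(rows: Seq[Row], symbol: String, markerPredicate: Row => Boolean): Seq[Row] =
    rows.map(row => row :+ (if (markerPredicate(row)) symbol else ""))
}
```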
Returns a new TransformableRDD by imputing the missing values in @columnIndex, including any values listed in @missingHints, using @strategy
Column Index
Imputation Strategy
List of strings that should be treated as missing values
TransformableRDD
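The available imputation strategies are not listed in this text. The sketch assumes a mean strategy to show how missing hints fit in: blank cells and any hinted value are treated as missing and replaced with the mean of the remaining values. ImputeSketch and imputeWithMean are hypothetical names.

```scala
// Hypothetical in-memory sketch of mean imputation with missing-value hints.
object ImputeSketch {
  type Row = Vector[String]

  // Replaces blank cells, and cells whose trimmed value is in `missingHints`,
  // with the mean of the other (numeric) values in the column.
  def imputeWithMean(rows: Seq[Row], columnIndex: Int, missingHints: Set[String] = Set.empty): Seq[Row] = {
    def isMissing(value: String): Boolean =
      value.trim.isEmpty || missingHints.contains(value.trim)
    val present = rows.map(row => row(columnIndex)).filterNot(v => isMissing(v)).map(_.trim.toDouble)
    require(present.nonEmpty, "no non-missing values to compute a mean from")
    val mean = present.sum / present.size
    rows.map { row =>
      if (isMissing(row(columnIndex))) row.updated(columnIndex, mean.toString) else row
    }
  }
}
```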
Returns a new TransformableRDD by imputing the missing values in @columnIndex using @strategy
Column index
Imputation strategy
TransformableRDD
Returns the inferred DataType of @columnIndex
Column index on which the type will be inferred
DataType
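How the DataType is inferred is not described here. The sketch below is a deliberately simple, hypothetical heuristic: it returns the most specific label that every non-blank value satisfies. The string labels stand in for the DataType values and are assumptions, as is the name InferTypeSketch.

```scala
// Hypothetical, simplified type inference over the values of one column.
object InferTypeSketch {
  def inferType(column: Seq[String]): String = {
    val values = column.map(_.trim).filter(_.nonEmpty)
    if (values.isEmpty) "EMPTY"
    else if (values.forall(_.matches("[+-]?\\d+"))) "INTEGER"
    else if (values.forall(_.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)"))) "DECIMAL"
    else "STRING"
  }
}
```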
Returns a new TextFacets containing the facets of @columnIndexes
List of column indexes
TextFacets
Returns a new TextFacets containing the cardinal values of @columnIndex
Index of the column
TextFacets
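Text facets are essentially value counts. A minimal sketch of both variants above, assuming a facet maps each distinct value (or tuple of values, for several columns) to the number of records containing it; FacetSketch is hypothetical and returns a plain Map instead of a TextFacets object.

```scala
// Hypothetical in-memory sketch of single- and multi-column facets.
object FacetSketch {
  // Counts how often each distinct value occurs in column `columnIndex`.
  def listFacets(rows: Seq[Vector[String]], columnIndex: Int): Map[String, Int] =
    rows.groupBy(row => row(columnIndex)).map { case (value, group) => value -> group.size }

  // Multi-column variant: the facet key is the sequence of values at `columnIndexes`.
  def listFacets(rows: Seq[Vector[String]], columnIndexes: Seq[Int]): Map[Seq[String], Int] =
    rows.groupBy(row => columnIndexes.map(i => row(i))).map { case (key, group) => key -> group.size }
}
```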
Returns a new TransformableRDD by applying the map function to all rows flagged with @flag
Symbol that was used for flagging
Symbol column index
Map function
TransformableRDD
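A minimal sketch of applying a function only to flagged rows, assuming rows not carrying the symbol pass through unchanged. MapByFlagSketch is a hypothetical name, and records are modelled as Vector[String] rows in a plain Seq.

```scala
// Hypothetical in-memory sketch; unflagged rows are left as they are.
object MapByFlagSketch {
  type Row = Vector[String]

  // Applies `mapFunction` only to rows whose value at `symbolColumn` equals `symbol`.
  def mapByFlag(rows: Seq[Row], symbol: String, symbolColumn: Int, mapFunction: Row => Row): Seq[Row] =
    rows.map(row => if (row(symbolColumn) == symbol) mapFunction(row) else row)
}
```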
Returns a new TransformableRDD by merging two or more columns together
List of columns to be merged
Separator placed between the merged values
false to remove the merged source columns from the resulting TransformableRDD
TransformableRDD
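A minimal sketch of column merging, assuming the merged value is appended as a new last column; that placement and the name MergeColumnsSketch are assumptions. The retainColumns flag mirrors the documented "false to remove" parameter.

```scala
// Hypothetical in-memory sketch of merging columns with a separator.
object MergeColumnsSketch {
  type Row = Vector[String]

  // Joins the values at `columns` with `separator` and appends the result.
  // With retainColumns = false the source columns are dropped from each row.
  def mergeColumns(rows: Seq[Row], columns: Seq[Int], separator: String, retainColumns: Boolean): Seq[Row] =
    rows.map { row =>
      val merged = columns.map(i => row(i)).mkString(separator)
      val kept =
        if (retainColumns) row
        else row.zipWithIndex.collect { case (value, i) if !columns.contains(i) => value }
      kept :+ merged
    }
}
```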
Returns an RDD of Double in which each value is the product of the values in @firstColumn and @secondColumn
First Column Index
Second Column Index
RDD[Double]
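A one-line sketch of the per-record product, assuming both columns hold parseable numbers; how non-numeric cells are handled is not stated here, and this version simply throws on them. MultiplyColumnsSketch is a hypothetical name.

```scala
// Hypothetical in-memory sketch: record-by-record product of two numeric columns.
object MultiplyColumnsSketch {
  def multiplyColumns(rows: Seq[Vector[String]], firstColumn: Int, secondColumn: Int): Seq[Double] =
    rows.map(row => row(firstColumn).trim.toDouble * row(secondColumn).trim.toDouble)
}
```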
Returns a new TransformableRDD by normalizing the values of the given column using the specified normalization strategy
Column Index
Normalization Strategy
TransformableRDD
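Which Normalizers are available is not listed in this text. The sketch shows two common strategies, min-max scaling and z-score standardization, as assumptions; NormalizeSketch is hypothetical and works on an already-parsed Seq[Double] of column values.

```scala
// Hypothetical sketch of two common normalization strategies.
object NormalizeSketch {
  // Min-max scaling: maps the column linearly onto [0, 1].
  def minMax(values: Seq[Double]): Seq[Double] = {
    val (lo, hi) = (values.min, values.max)
    require(hi > lo, "min-max scaling needs at least two distinct values")
    values.map(v => (v - lo) / (hi - lo))
  }

  // Z-score: (v - mean) / standard deviation, using the population form (divide by n).
  def zScore(values: Seq[Double]): Seq[Double] = {
    val mean = values.sum / values.size
    val sd = math.sqrt(values.map(v => math.pow(v - mean, 2)).sum / values.size)
    require(sd > 0, "z-score needs a non-zero standard deviation")
    values.map(v => (v - mean) / sd)
  }
}
```

A sample standard deviation (dividing by n - 1) is an equally valid choice for zScore; it shifts every score by a constant factor.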
Returns the number of columns in this RDD
int
Generates a PivotTable by pivoting the data in @pivotalColumn
Pivotal Column
Independent Column Indexes
PivotTable
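The aggregation used to fill the PivotTable is not stated in this text. Assuming a count aggregation, the sketch builds a nested map: outer keys are values of the pivotal column, inner keys are values from the independent columns, and each cell counts matching records. PivotSketch and pivotByCount are hypothetical names.

```scala
// Hypothetical in-memory sketch of a count-based pivot table.
object PivotSketch {
  // In this simplified version, values from different independent columns
  // share one inner key space.
  def pivotByCount(rows: Seq[Vector[String]], pivotalColumn: Int, independentColumns: Seq[Int]): Map[String, Map[String, Int]] =
    rows
      .flatMap(row => independentColumns.map(i => (row(pivotalColumn), row(i))))
      .groupBy(_._1)
      .map { case (rowKey, pairs) =>
        rowKey -> pairs.groupBy(_._2).map { case (colKey, hits) => colKey -> hits.size }
      }
}
```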
Returns a new TransformableRDD containing only the elements that satisfy the predicate.
A predicate function that returns a Boolean value for every row.
TransformableRDD
Returns a new TransformableRDD by replacing the text in @cluster with the specified @newValue
Cluster of similar values to be replaced
Value that will replace all the values in the cluster
Column index
TransformableRDD
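A minimal sketch of cluster replacement, assuming a cluster can be modelled as a plain Set[String] of the similar values it groups. ReplaceValuesSketch is a hypothetical name, and records are Vector[String] rows in a plain Seq.

```scala
// Hypothetical in-memory sketch; a cluster is modelled as a Set of values.
object ReplaceValuesSketch {
  type Row = Vector[String]

  // Replaces every value in `columnIndex` that belongs to `cluster` with `newValue`.
  def replaceValues(rows: Seq[Row], cluster: Set[String], newValue: String, columnIndex: Int): Seq[Row] =
    rows.map { row =>
      if (cluster.contains(row(columnIndex))) row.updated(columnIndex, newValue) else row
    }
}
```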
Returns a List of sample elements from @columnIndex
Column index to sample from
List[String]
Returns a new TransformableRDD containing the values of @columnIndexes
Integer values specifying the columns that will be used to create the new table
TransformableRDD
Returns an RDD of the given column
Column index
RDD[String]
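Both selection variants above are projections. A minimal sketch, assuming records are Vector[String] rows in a plain Seq; SelectSketch is a hypothetical name, not PrepBuddy's API.

```scala
// Hypothetical in-memory sketch of multi-column and single-column selection.
object SelectSketch {
  type Row = Vector[String]

  // Projects each record onto `columnIndexes`, in the order given.
  def select(rows: Seq[Row], columnIndexes: Seq[Int]): Seq[Row] =
    rows.map(row => columnIndexes.map(i => row(i)).toVector)

  // Single-column variant: returns the plain column values.
  def select(rows: Seq[Row], columnIndex: Int): Seq[String] =
    rows.map(row => row(columnIndex))
}
```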
Returns a new RDD containing the smoothed values of @columnIndex using @smoothingMethod
Column Index
Method that will be used to smooth the data
RDD[Double]
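The smoothing methods are not enumerated in this text. As an assumed example, the sketch implements a simple moving average over a fixed window; SmoothSketch is hypothetical. A window is emitted only once it is full, so the output has values.size - windowSize + 1 entries.

```scala
// Hypothetical sketch of one smoothing method: simple moving average.
object SmoothSketch {
  def simpleMovingAverage(values: Seq[Double], windowSize: Int): Seq[Double] = {
    require(windowSize > 0, "window size must be positive")
    values
      .sliding(windowSize)
      .filter(_.size == windowSize) // drop the short window produced for tiny inputs
      .map(window => window.sum / windowSize)
      .toSeq
  }
}
```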
Returns a new TransformableRDD by splitting @column with the given delimiter
Column index of the value to be split
Delimiter or regex that will be used to split the value of @column
false to remove the original value of @column from the resulting TransformableRDD
Maximum number of split values to add to the resulting TransformableRDD
TransformableRDD
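A minimal sketch of delimiter splitting, built on Java's String.split(regex, limit): a positive limit caps the number of parts, and the last part keeps the unsplit remainder. Appending the parts at the end of the record is an assumption, as is the name SplitByDelimiterSketch.

```scala
// Hypothetical in-memory sketch; split parts are appended at the end of each record.
object SplitByDelimiterSketch {
  type Row = Vector[String]

  def splitByDelimiter(rows: Seq[Row], column: Int, delimiter: String, retainColumn: Boolean, maxSplit: Int): Seq[Row] =
    rows.map { row =>
      // `delimiter` is a regex; at most `maxSplit` parts are produced when maxSplit > 0.
      val parts = row(column).split(delimiter, maxSplit).toVector
      val base = if (retainColumn) row else row.patch(column, Nil, 1)
      base ++ parts
    }
}
```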
Returns a new TransformableRDD by splitting @column according to the specified lengths
Column index of the value to be split
List of integers specifying the number of characters each split value will contain
false to remove the original value of @column from the resulting TransformableRDD
TransformableRDD
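A minimal sketch of fixed-width splitting, assuming consecutive substrings of the given lengths are appended to the record and that a value shorter than the total width yields shorter or empty pieces instead of an error. Both behaviours and the name SplitByFieldLengthSketch are assumptions.

```scala
// Hypothetical in-memory sketch of fixed-width field splitting.
object SplitByFieldLengthSketch {
  type Row = Vector[String]

  def splitByFieldLength(rows: Seq[Row], column: Int, fieldLengths: Seq[Int], retainColumn: Boolean): Seq[Row] = {
    require(fieldLengths.forall(_ >= 0), "field lengths must be non-negative")
    // Start offset of each field: 0, l0, l0 + l1, ...
    val starts = fieldLengths.scanLeft(0)(_ + _).toVector
    rows.map { row =>
      val value = row(column)
      val parts = fieldLengths.indices.map { i =>
        val from = math.min(starts(i), value.length)
        val until = math.min(starts(i) + fieldLengths(i), value.length)
        value.substring(from, until)
      }
      val base = if (retainColumn) row else row.patch(column, Nil, 1)
      base ++ parts
    }
  }
}
```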
Returns an RDD of Double for the given column index
Column index
RDD[Double]
Returns RDD
Returns a new RDD containing the unique elements in the specified column
Column Index
RDD[String]
(Since version 1.0.0) use mapPartitionsWithIndex and filter
(Since version 1.0.0) use mapPartitionsWithIndex and flatMap
(Since version 1.0.0) use mapPartitionsWithIndex and foreach
(Since version 1.2.0) use TaskContext.get
(Since version 0.7.0) use mapPartitionsWithIndex
(Since version 1.0.0) use mapPartitionsWithIndex
(Since version 1.0.0) use collect