Prep-Buddy
A library to clean, transform and prepare data at scale using Apache Spark
A Scala / Java / Python library for cleaning, transforming and otherwise preparing large datasets on Apache Spark.
It is currently maintained by a team of developers from ThoughtWorks.
Our aim is to provide a set of algorithms for cleaning and transforming very large datasets, inspired by predecessors such as OpenRefine, pandas and scikit-learn.