
Data Commons

Enabling working with data at scale

The Data Commons Project

The Data Commons community is a group of passionate data engineers and data scientists at ThoughtWorks. Our goal is to provide a collection of rich, high-performance libraries that automate data processing tasks at scale. We are currently building these tools to work with the Apache Spark platform.

Active Projects

We are building a set of scalable, high-performance libraries that address a range of data processing concerns, such as data quality assurance, data preparation for machine learning, data anonymization, and data security. Here is a list of currently active projects:

Support or Contact

Catch up with us at our Google Group.