Google Cloud wants to empower data scientists with the release of an alpha version of Cloud Dataproc for Kubernetes. 

Google Cloud Dataproc “provides open source data and analytic processing for data engineers and data scientists who need to process data and train models faster at scale.” The company hopes that, by combining cloud and open source, the solution will let data scientists focus more on workloads than on infrastructure.

Cloud Dataproc for Kubernetes will bring enterprise-grade support, management, and security to Apache Spark jobs running on GKE clusters, the company explained. In addition to Apache Spark, the company plans to bring other open-source processing engines to Cloud Dataproc for Kubernetes.

The solution will provide data professionals with a single central view that spans both Kubernetes and YARN cluster management systems. 

Data scientists can also move models and ETL pipelines from development into production without having to consider compatibility. 

Finally, data scientists using this solution won’t have to worry about sizing and building clusters, manipulating Docker files, or playing with Kubernetes networking configuration. 

“Open source has always been a core pillar of Google Cloud’s data and analytics strategy. As we continue to work with the community to set industry standards, we continue to integrate those standards into our services so organizations around the world can unlock the value of data faster,” wrote Christopher Crosbie and James Malone, product managers at Google Cloud.