Google Cloud Platform
Cloud Dataproc is now even faster and easier to use for running Apache Spark and Apache Hadoop
Since its initial release, Cloud Dataproc has given users a faster, easier and more cost-effective way to run Apache Spark, Apache Hadoop and other components of the open source data processing ecosystem, lowering the traditional barriers to success with those platforms. With 90-second cluster spin-up time, per-minute billing and fully managed infrastructure, Cloud Dataproc helps you rethink how you run operations.
Over the past few weeks, we’ve shipped several releases, including component updates, new features, fixes and API changes. Collectively, they add more performance, ease of use and efficiency to the user experience, and provide access to the latest innovations from the open source community.
Software component updates
- Apache Spark has been updated to 2.2.0 (upstream current).
- Apache Hadoop has been updated to 2.8.0 (upstream current).
- The default security (SSL) provider used by the Cloud Storage connector has been changed to one based on Conscrypt. This change should use the CPU more efficiently for SSL operations and, in many cases, improve read and write performance between Cloud Dataproc and Cloud Storage.
- The reported block size for Cloud Storage is now 128 MB.
- The memory configurations for Hadoop and Spark have both been adjusted to improve performance and stability.
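To see why the reported block size matters, here is a rough sketch of how a 128 MB block size can translate into the number of input splits (and therefore tasks) for a single file. It assumes the job uses Hadoop's FileInputFormat-style splitting with default minimum and maximum split sizes. Other readers, such as Spark's own file sources, may partition differently. The function names are illustrative and are not part of any Cloud Dataproc API.

```python
MB = 1024 * 1024

# Hadoop's FileInputFormat lets the final split run up to 10% over the split size
# instead of creating a tiny trailing split.
SPLIT_SLOP = 1.1


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split size as FileInputFormat derives it: max(min, min(max, block))."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, block_size=128 * MB):
    """Estimate the number of input splits for one splittable file.

    Assumption: default min/max split sizes, so the split size equals the
    block size reported by the file system (128 MB here).
    """
    split_size = compute_split_size(block_size)
    remaining = file_size
    splits = 0
    # Carve off full-size splits while the remainder exceeds the slop threshold.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left (if anything) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    for size_mb in (130, 300, 1024):
        print(f"{size_mb} MB file: {count_splits(size_mb * MB)} split(s)")
```

Under these assumptions, a 1 GB object is read as 8 splits, while a 130 MB object still fits in a single split because it stays within the 10% slop.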