Running business-critical workloads on Dataproc requires multiple parties to carry different responsibilities. This page lists the responsibilities of Google and the customer; the list is not exhaustive.
Dataproc: Google responsibilities
Protecting the underlying infrastructure, including hardware, firmware, kernel, OS, storage, network, and more. This includes:
- encrypting data at rest by default
- providing additional customer-managed disk encryption
- encrypting data in transit
- using custom-designed hardware
- laying private network cables
- protecting data centers from physical access
- protecting the bootloader and kernel against modification using Shielded Nodes
- providing network protection with VPC Service Controls
- following secure software development practices
Releasing security patches for Dataproc images. This includes:
- patches for the base operating systems included in Dataproc images (Ubuntu, Debian, and Rocky Linux)
- patches and fixes available for the open source components included in Dataproc images
Providing Google Cloud integrations for Connect, Identity and Access Management, Cloud Audit Logs, Cloud Key Management Service, Security Command Center, and others.
Restricting and logging Google administrative access to customer clusters for contractual support purposes with Access Transparency and Access Approval
Recommending best practices for configuring Dataproc and the open source components included in Dataproc images
Dataproc: Customer responsibilities
Maintaining your workloads, including your application code, custom images, data, IAM policy, and clusters that you run
Running clusters on up-to-date Dataproc images by leveraging the latest subminor image version, promptly refreshing your custom images, and migrating to the most recent minor image version as soon as it is feasible. Image metadata includes a previous-subminor label, which is set to true if the cluster is not using the latest subminor image version. For information on how to view image metadata, see Important notes about versioning.
Providing Google with environmental details when requested for troubleshooting purposes
Following best practices for the configuration of Dataproc and other Google Cloud services, and for the configuration of open source components included in Dataproc images
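As an illustration of the previous-subminor label mentioned above, the following minimal Python sketch checks a set of image labels that has already been retrieved. The function name and the sample label dictionaries are hypothetical, and the sketch assumes the label value is the string true or false; how you obtain the labels depends on your tooling and is not shown here.

```python
from typing import Mapping


def uses_previous_subminor(image_labels: Mapping[str, str]) -> bool:
    """Return True if the labels mark the image as an older subminor version."""
    # Assumption: label values are strings; a missing label is treated as
    # "not a previous subminor version".
    value = image_labels.get("previous-subminor", "")
    return value.strip().lower() == "true"


# Hypothetical label sets, for illustration only.
outdated_labels = {"previous-subminor": "true"}
current_labels = {"previous-subminor": "false"}

for name, labels in (("outdated", outdated_labels), ("current", current_labels)):
    if uses_previous_subminor(labels):
        print(f"{name}: a newer subminor image version is available")
    else:
        print(f"{name}: using the latest subminor image version")
```

A check like this could run as part of a periodic cluster review, so that clusters flagged as using an older subminor image version are recreated on a current image.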