Dataproc shared responsibility model

Running business-critical workloads on Dataproc requires multiple parties to share responsibility. This page lists the main responsibilities of Google and of the customer; the list is not exhaustive.

Dataproc: Google responsibilities

  • Protecting the underlying infrastructure, including hardware, firmware, kernel, OS, storage, network, and more.

  • Releasing security patches for Dataproc images. This includes:

    • patches for the base operating systems included in Dataproc images (Ubuntu, Debian, and Rocky Linux)
    • patches and fixes available for the open source components included in Dataproc images
  • Providing Google Cloud integrations for Connect, Identity and Access Management, Cloud Audit Logs, Cloud Key Management Service, Security Command Center, and others.

  • Restricting and logging Google administrative access to customer clusters for contractual support purposes with Access Transparency and Access Approval

  • Recommending best practices for configuring Dataproc and the open source components included in Dataproc images

Dataproc: Customer responsibilities

  • Maintaining your workloads, including your application code, custom images, data, IAM policy, and clusters that you run

  • Running clusters on up-to-date Dataproc images by using the latest subminor image version, promptly refreshing your custom images, and migrating to the most recent minor image version as soon as feasible. Image metadata includes a previous-subminor label, which is set to true if the cluster is not using the latest subminor image version. For information on how to view image metadata, see Important notes about versioning.

  • Providing Google with environmental details when requested for troubleshooting purposes

  • Following best practices for the configuration of Dataproc and other Google Cloud services, and for the configuration of open source components included in Dataproc images
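
The image-version check described in the customer responsibilities above can be sketched in a few lines. This is a minimal illustration, assuming you have already retrieved the image metadata labels as a plain dictionary (how you fetch them is covered in Important notes about versioning and is not shown here). The function name and the sample label sets are hypothetical; only the previous-subminor label and its true value come from this page. Treating a missing label as "not behind" is also an assumption.

```python
def is_behind_latest_subminor(image_labels):
    """Return True if the labels mark the image as an older subminor version.

    image_labels: dict of image metadata labels (hypothetical input shape).
    A missing label is treated as "not behind", which is an assumption.
    """
    value = image_labels.get("previous-subminor", "")
    # Compare case-insensitively so "true", "True", and "TRUE" all match.
    return str(value).strip().lower() == "true"


# Hypothetical label sets, for illustration only.
outdated_labels = {"previous-subminor": "true"}
current_labels = {"previous-subminor": "false"}

for name, labels in (("outdated", outdated_labels), ("current", current_labels)):
    if is_behind_latest_subminor(labels):
        print(f"{name}: plan an upgrade to the latest subminor image version")
    else:
        print(f"{name}: no newer subminor image version flagged")
```

A check like this could run as part of a periodic review of your clusters, so that clusters flagged by the label are scheduled for re-creation on a current image.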