Cloud Dataproc Versioning

Google Cloud Dataproc uses images to bundle Google Cloud Platform connectors and Spark and Hadoop components into a single package that can be deployed on a Cloud Dataproc cluster. Each image contains the base operating system (Debian) for the cluster, along with the components needed to run jobs, such as Spark, Hadoop, and Hive. Images are upgraded periodically to include improvements and new features. Cloud Dataproc versioning lets you select a set of software versions when you create a cluster.

How versioning works

When an image is created, it is given an Image Version number. Image version numbers use the following format:

version_major.version_minor.version_patch

See Cloud Dataproc version list for a list of all versions.
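The version format above can be handled programmatically. The following is a minimal Python sketch, not part of Cloud Dataproc itself: the names ImageVersion and parse_image_version are hypothetical, and the sketch assumes every component is a plain non-negative integer and that the patch component may be omitted.

```python
from typing import NamedTuple, Optional


class ImageVersion(NamedTuple):
    """Hypothetical container for a version_major.version_minor.version_patch string."""
    major: int
    minor: int
    patch: Optional[int] = None  # None when only major.minor was given


def parse_image_version(text: str) -> ImageVersion:
    """Split a version string on dots and convert each component to an integer."""
    parts = text.strip().split(".")
    # Assumption: exactly two or three purely numeric components.
    if len(parts) not in (2, 3) or not all(p.isdecimal() for p in parts):
        raise ValueError(f"not a major.minor[.patch] version: {text!r}")
    return ImageVersion(*(int(p) for p in parts))


if __name__ == "__main__":
    print(parse_image_version("1.0.3"))
    print(parse_image_version("1.0"))
```

Because ImageVersion is a tuple, two fully specified versions compare in major, minor, patch order, which is convenient when sorting a list of versions.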

Selecting versions

By default, a new Cloud Dataproc cluster uses the latest available image version. You can select a different image version when you create the cluster.

Google Cloud Platform Console

When creating a new cluster in the Google Cloud Platform Console, use the Image version field to select a version. This field appears in the expander at the bottom of the cluster creation form.

This field lists all available image versions that can be used with your cluster.

When you submit the form, your cluster will be created using the specified version. If you do not select a version, the latest available version will be used.

Google Cloud SDK

When using the gcloud dataproc clusters create command, you can use the --image-version argument to specify an image version. For example, the following command creates a new cluster named my-test-cluster that uses the current patch release of image version 1.0:

gcloud dataproc clusters create my-test-cluster --image-version 1.0

Best practice is to specify the major and minor version only, so the latest patch version is always used. However, if necessary, the patch version can be specified as well.
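As an illustration of this practice, the following Python sketch builds the argument list for the create command and drops the patch component unless the caller asks to pin it. The function name create_cluster_command and the pin_patch parameter are hypothetical; only the gcloud command and the --image-version argument come from this page. The sketch only builds and prints the command; it does not run gcloud.

```python
import shlex
from typing import List


def create_cluster_command(cluster_name: str, image_version: str,
                           pin_patch: bool = False) -> List[str]:
    """Build the gcloud argument list for creating a cluster with an image version."""
    parts = image_version.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected major.minor[.patch], got {image_version!r}")
    if not pin_patch:
        # Keep only major.minor so the latest patch version is used.
        parts = parts[:2]
    return ["gcloud", "dataproc", "clusters", "create", cluster_name,
            "--image-version", ".".join(parts)]


if __name__ == "__main__":
    command = create_cluster_command("my-test-cluster", "1.0.3")
    # prints: gcloud dataproc clusters create my-test-cluster --image-version 1.0
    print(shlex.join(command))
```

Returning a list rather than a single string keeps the command safe to pass to a process launcher without shell quoting issues.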

You can check the image version of an existing cluster with the gcloud command-line tool or the API. For example, for the cluster my-test-cluster:

gcloud dataproc clusters describe my-test-cluster
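If you want to read the image version in a script, one option is to request JSON output from the describe command and extract the field. The sketch below rests on two assumptions that you should verify against your own output: that the command was run with gcloud's --format=json flag, and that the image version appears under config.softwareConfig.imageVersion in the cluster resource. The function name and the sample document are hypothetical.

```python
import json


def image_version_from_describe(describe_json: str) -> str:
    """Extract the image version from JSON-formatted cluster describe output.

    Assumption: the version is stored at config.softwareConfig.imageVersion.
    """
    cluster = json.loads(describe_json)
    return cluster["config"]["softwareConfig"]["imageVersion"]


if __name__ == "__main__":
    # Hypothetical, heavily trimmed sample of describe output for illustration only.
    sample = """
    {
      "clusterName": "my-test-cluster",
      "config": {"softwareConfig": {"imageVersion": "1.0.3"}}
    }
    """
    print(image_version_from_describe(sample))
```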

When new versions are created

New major versions will be created periodically to incorporate one or more of the following:

  • Major releases for:
    • Operating system
    • Spark, Hadoop, and other Big Data components
    • Google Cloud connectors
  • Major changes or updates to Cloud Dataproc functionality

New minor versions will be created periodically to incorporate one or more of the following:

  • Minor releases and updates for:
    • Operating system
    • Spark, Hadoop, and other Big Data components
    • Google Cloud connectors
  • Minor changes or updates to Cloud Dataproc functionality

When a new minor version is created, it becomes the default for the major version, and represents the latest release of the major version.

New patch versions will be created periodically to incorporate one or more of the following:

  • Patches or fixes for a component in the image
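The three release types described above correspond to the first component that changes between two version numbers. The following Python sketch makes that mapping explicit. The function name release_kind is hypothetical, and the sketch assumes both versions are fully specified as three numeric components.

```python
from typing import Tuple


def _parse(version: str) -> Tuple[int, ...]:
    """Convert 'major.minor.patch' into a tuple of three integers."""
    parts = tuple(int(p) for p in version.split("."))
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.patch, got {version!r}")
    return parts


def release_kind(old: str, new: str) -> str:
    """Return 'major', 'minor', or 'patch' for the step from old to new."""
    old_parts, new_parts = _parse(old), _parse(new)
    if new_parts <= old_parts:
        raise ValueError(f"{new!r} is not newer than {old!r}")
    # The first component that differs determines the kind of release.
    for name, o, n in zip(("major", "minor", "patch"), old_parts, new_parts):
        if o != n:
            return name
    raise AssertionError("unreachable: versions differ but no component does")


if __name__ == "__main__":
    for old, new in [("1.0.3", "2.0.0"), ("1.0.3", "1.1.0"), ("1.0.3", "1.0.4")]:
        print(old, "->", new, "is a", release_kind(old, new), "release")
```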

Image Version and Cloud Dataproc support

Major and minor image versions are supported for a specified period of time after they are released. During this period, clusters using these Image Versions are eligible for support. After the support window has closed, clusters using them are no longer eligible for support.

Months after Image Version release | Can create clusters with this Image Version? | Clusters using this Image Version eligible for support?
0-12                               | Yes                                          | Yes
12-18                              | Yes                                          | No
18+                                | No                                           | No

Patch versions do not have guaranteed lifetimes or support.
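The support windows for major and minor versions can be expressed as a small lookup. The following Python sketch is illustrative only: the names SupportStatus and support_status are hypothetical, and because the table does not say which row applies at exactly 12 or 18 months, the sketch assumes each boundary belongs to the later row.

```python
from typing import NamedTuple


class SupportStatus(NamedTuple):
    can_create_clusters: bool
    eligible_for_support: bool


def support_status(months_since_release: float) -> SupportStatus:
    """Map the age of a major or minor Image Version to its status."""
    if months_since_release < 0:
        raise ValueError("months_since_release must not be negative")
    # Assumption: ranges are half-open, so exactly 12 months falls in 12-18.
    if months_since_release < 12:
        return SupportStatus(can_create_clusters=True, eligible_for_support=True)
    if months_since_release < 18:
        return SupportStatus(can_create_clusters=True, eligible_for_support=False)
    return SupportStatus(can_create_clusters=False, eligible_for_support=False)


if __name__ == "__main__":
    for months in (3, 14, 24):
        print(months, support_status(months))
```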

Important notes about versioning
