Cloud Dataproc Versioning

Cloud Dataproc uses images to tie together useful Google Cloud Platform connectors and Apache Spark & Apache Hadoop components into one package that can be deployed on a Cloud Dataproc cluster. These images contain the base operating system (Debian or Ubuntu) for the cluster, along with core and optional components needed to run jobs, such as Spark, Hadoop, and Hive. These images will be upgraded periodically to include new improvements and features. Cloud Dataproc versioning allows you to select sets of software versions when you create clusters.

How versioning works

When an image is created, it is given an Image Version number in the following format:

version_major.version_minor.version_sub_minor-os_distribution

The following OS distributions are currently maintained:

OS Distribution Code | OS Distribution
debian9              | Debian 9
ubuntu18             | Ubuntu 18
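The image version format above can be split into its parts mechanically. The POSIX shell function below is a hypothetical helper (it is not part of gcloud or the Dataproc API). It is a minimal sketch that assumes image versions follow exactly the major.minor[.sub_minor][-os_distribution] pattern described on this page, and it prints "-" for optional parts that were omitted.

```shell
# Hypothetical helper, not part of gcloud: split a Dataproc image version
# string of the form major.minor[.sub_minor][-os_distribution] into its parts.
# Prints "major minor sub_minor os", using "-" for parts that were omitted.
parse_image_version() {
  version="$1"
  # Separate the OS distribution code (everything after the first "-").
  case "$version" in
    *-*) base="${version%%-*}"; os="${version#*-}" ;;
    *)   base="$version"; os="-" ;;
  esac
  major="${base%%.*}"
  rest="${base#*.}"
  # major.minor is the minimum; a bare "1" has no dot and is rejected.
  if [ "$rest" = "$base" ]; then
    echo "invalid image version: $version" >&2
    return 1
  fi
  minor="${rest%%.*}"
  case "$rest" in
    *.*) sub_minor="${rest#*.}" ;;
    *)   sub_minor="-" ;;
  esac
  # Each numeric component must consist of digits only ("-" means omitted).
  for part in "$major" "$minor" "$sub_minor"; do
    case "$part" in
      -) ;;
      ''|*[!0-9]*)
        echo "invalid image version: $version" >&2
        return 1 ;;
    esac
  done
  echo "$major $minor $sub_minor $os"
}
```

For example, parse_image_version 1.4.12-ubuntu18 prints "1 4 12 ubuntu18", and parse_image_version 1.3 prints "1 3 - -".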

See old image versions for previously supported OS distributions.

The recommended practice is to specify the major.minor image version for production environments or when compatibility with specific component versions is important. The sub-minor version and OS distribution image are then selected automatically from the latest weekly release.

Selecting versions

When you create a new Cloud Dataproc cluster, the latest available Debian image version is used by default. You can select a Debian or Ubuntu image version when creating a cluster (see the Cloud Dataproc Image Version List). When specifying a Debian-based image, you can omit the OS Distribution Code suffix, for example, by specifying "1.3" to select the 1.3-debian9 image. To select an Ubuntu-based image, you must include the OS suffix, for example, by specifying "1.4-ubuntu18".
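The suffix rule above can be expressed as a small check. The function below is a hypothetical helper, not part of gcloud; it is a sketch that assumes, following the OS distribution table on this page, that a value without a suffix selects the debian9 image. That default is an assumption tied to the images listed here and may differ for other image versions.

```shell
# Hypothetical helper, not part of gcloud: report which OS distribution an
# --image-version value selects. Assumes (per this page's table) that a
# value without an OS suffix selects the Debian 9 ("debian9") image, and
# that Ubuntu images are only selected through an explicit suffix.
selected_os_distribution() {
  case "$1" in
    *-*) echo "${1#*-}" ;;   # explicit suffix, e.g. "1.4-ubuntu18"
    *)   echo "debian9" ;;   # no suffix: Debian default
  esac
}
```

With this sketch, selected_os_distribution 1.3 prints "debian9", while selected_os_distribution 1.4-ubuntu18 prints "ubuntu18".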

gcloud Command

When using the gcloud dataproc clusters create command, you can use the --image-version argument to specify an image version for the new cluster.

Debian image example:

gcloud dataproc clusters create new-cluster-name --image-version 1.4

Ubuntu image example:

gcloud dataproc clusters create new-cluster-name --image-version 1.3-ubuntu18

Best practice is to omit the sub-minor version so that the latest sub-minor version is used. However, if necessary, the sub-minor version can be specified, for example, "1.2.67".
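The sub-minor guidance above can be checked before a cluster is created. The wrapper below is hypothetical and not part of gcloud: it only prints the gcloud dataproc clusters create command shown in the examples above, using no flags other than --image-version, and writes a warning to stderr when the version pins a sub-minor release. It is a sketch that assumes versions follow the format described on this page.

```shell
# Hypothetical wrapper, not part of gcloud: print (rather than run) the
# cluster-creation command, and warn on stderr when the image version pins
# a sub-minor release such as "1.2.67" instead of using major.minor.
create_cluster_command() {
  cluster_name="$1"
  image_version="$2"
  # Drop any OS suffix (e.g. "-ubuntu18"), then look for a second dot,
  # which only appears when a sub-minor version is given.
  case "${image_version%%-*}" in
    *.*.*) echo "warning: $image_version pins a sub-minor version" >&2 ;;
  esac
  echo "gcloud dataproc clusters create $cluster_name --image-version $image_version"
}
```

Printing the command instead of running it lets you review it first; for example, create_cluster_command new-cluster-name 1.2.67 prints the command and also emits the warning.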

You can check the image version of an existing cluster with the gcloud command-line tool:

gcloud dataproc clusters describe cluster-name
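The describe command above prints the whole cluster resource, by default as YAML, and the image version appears in the imageVersion field under config.softwareConfig. The function below is a hypothetical helper, not part of gcloud, that pulls out only that value from the YAML on stdin. It is a sketch that assumes the field is printed on a single line of the form "imageVersion: VALUE", possibly quoted.

```shell
# Hypothetical helper, not part of gcloud: read the YAML printed by
# "gcloud dataproc clusters describe" on stdin and print the value of the
# imageVersion field, with any surrounding YAML quotes removed.
image_version_from_describe() {
  awk '$1 == "imageVersion:" { print $2 }' | tr -d "'\""
}
```

Usage: gcloud dataproc clusters describe cluster-name | image_version_from_describe. Depending on your gcloud version and configuration, you may also need to pass --region. Alternatively, gcloud's own output formatting, --format="value(config.softwareConfig.imageVersion)", should print the field directly without a helper.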

REST API

You can specify the SoftwareConfig imageVersion field as part of a clusters.create API request.

Example

POST /v1/projects/project-id/regions/us-central1/clusters/
{
  "projectId": "project-id",
  "clusterName": "example-cluster",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "subnetworkUri": "default",
      "zoneUri": "us-central1-b"
    },
    "masterConfig": {
      ...
    },
    "workerConfig": {
      ...
    },
    "softwareConfig": {
      "imageVersion": "1.3"
    }
  }
}
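To script the request above, the request body can be generated from a few variables. The function below is a hypothetical helper, not part of the Dataproc API or gcloud; it is a minimal sketch that sets only projectId, clusterName, and config.softwareConfig.imageVersion. Other fields, such as masterConfig and workerConfig, are left out, as in the elided parts of the example above, and values are inserted verbatim, so they must not contain characters that need JSON escaping.

```shell
# Hypothetical helper, not part of the Dataproc API: print a minimal
# clusters.create request body that sets the project, cluster name, and
# softwareConfig.imageVersion. Arguments are inserted verbatim, so they
# must not contain quotes or other characters that need JSON escaping.
# Usage: cluster_create_body PROJECT_ID CLUSTER_NAME IMAGE_VERSION
cluster_create_body() {
  cat <<EOF
{
  "projectId": "$1",
  "clusterName": "$2",
  "config": {
    "softwareConfig": {
      "imageVersion": "$3"
    }
  }
}
EOF
}
```

One way to send it, assuming curl is installed and you are authenticated with gcloud: cluster_create_body project-id example-cluster 1.3 | curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" --data-binary @- https://dataproc.googleapis.com/v1/projects/project-id/regions/us-central1/clusters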
  

Console

When creating a new cluster, click Advanced options at the bottom of the Cloud Dataproc Create a cluster form.

The Image field shows the image that will be used when creating the cluster. Initially, it shows the default (latest available Debian version).

Click Change to display a list of available images for your cluster. You can select a standard or custom image.

When new versions are created

New major versions will be created periodically to incorporate one or more of the following:

  • Major releases for:
    • Spark, Hadoop, and other Big Data components
    • Google Cloud connectors
  • Major changes or updates to Cloud Dataproc functionality

New minor versions will be created periodically to incorporate one or more of the following:

  • Minor releases and updates for:
    • Spark, Hadoop, and other Big Data components
    • Google Cloud connectors
  • Minor changes or updates to Cloud Dataproc functionality

When a new minor version is created, its Debian image becomes the default for the major version and represents the latest release of that major version.

New sub-minor versions will be created periodically to incorporate one or more of the following:

  • Patches or fixes for a component in the image
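The three kinds of release described above map onto the three numeric components of an image version. The function below is a hypothetical helper, not part of gcloud, that reports which component differs between two full image versions. It is a sketch that assumes both arguments include a sub-minor version (major.minor.sub_minor, optionally followed by an OS suffix) and deliberately ignores the OS suffix, since that names a distribution rather than a release.

```shell
# Hypothetical helper, not part of gcloud: compare two full image versions
# (major.minor.sub_minor with an optional -os suffix) and print the most
# significant component that changed: major, minor, sub-minor, or none.
# The OS suffix is ignored because it names a distribution, not a release.
classify_version_change() {
  old="${1%%-*}"
  new="${2%%-*}"
  old_major="${old%%.*}"; old_rest="${old#*.}"
  new_major="${new%%.*}"; new_rest="${new#*.}"
  old_minor="${old_rest%%.*}"; old_sub="${old_rest#*.}"
  new_minor="${new_rest%%.*}"; new_sub="${new_rest#*.}"
  if [ "$old_major" != "$new_major" ]; then
    echo major
  elif [ "$old_minor" != "$new_minor" ]; then
    echo minor
  elif [ "$old_sub" != "$new_sub" ]; then
    echo sub-minor
  else
    echo none
  fi
}
```

For example, moving from 1.3.10-debian9 to 1.3.11-debian9 is reported as "sub-minor" (a patch release), while moving to 1.4.0-debian9 is reported as "minor".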

Image Version and Cloud Dataproc support

Major and minor image versions are supported for a specified period of time after they are released. During this period, clusters using the image versions are eligible for support. After the support window has closed, clusters using the image versions are not eligible for support.

Months after Image Version release | Can create clusters with this Image Version? | Clusters using this Image Version eligible for support?
0-12                               | Yes                                           | Yes
12-24                              | Yes                                           | No
24+                                | No                                            | No

Sub-minor versions do not have guaranteed lifetimes or support.
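The support table above can be encoded as a lookup on the number of months since a major or minor image version was released. The function below is a hypothetical helper, not part of gcloud. Because the table's boundaries at 12 and 24 months are ambiguous, this sketch assumes each range includes its lower bound and excludes its upper bound. It does not apply to sub-minor versions, which have no guaranteed lifetime, or to the pre-1.0 alpha and beta images.

```shell
# Hypothetical helper, not part of gcloud: encode the image version support
# table. Takes whole months since a major/minor image version's release.
# Assumption: each range includes its lower bound and excludes its upper
# bound (0-11 supported, 12-23 create-only, 24+ neither).
image_support_status() {
  months="$1"
  case "$months" in
    ''|*[!0-9]*)
      echo "months must be a non-negative integer: $months" >&2
      return 1 ;;
  esac
  if [ "$months" -lt 12 ]; then
    echo "create=yes support=yes"
  elif [ "$months" -lt 24 ]; then
    echo "create=yes support=no"
  else
    echo "create=no support=no"
  fi
}
```

For example, image_support_status 18 prints "create=yes support=no": clusters can still be created with the image, but they are no longer eligible for support.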

Old Image Versions

Previously supported OS distributions

The following OS distributions were previously supported:

OS Distribution Code | OS Distribution | Last Released
deb8                 | Debian 8        | October 26, 2018

Image Versions without explicit OS distribution

Prior to August 16, 2018, image versions were built with Debian 8 and omitted the OS Distribution Code. They are specified in the following format:

version_major.version_minor.version_sub_minor

0.1 and 0.2

Image versions released as alpha or beta releases prior to Cloud Dataproc version 1.0 general availability are not subject to the Cloud Dataproc support policy.

Important notes about versioning
