Cloud Dataproc uses images to tie together useful Google Cloud Platform connectors and Apache Spark & Apache Hadoop components into one package that can be deployed on a Cloud Dataproc cluster. These images contain the base operating system (Debian) for the cluster, along with components needed to run jobs, such as Spark, Hadoop, Hive, and so on. These images will be upgraded periodically to include new improvements and features. Cloud Dataproc versioning allows you to select sets of software versions when you create clusters.
How versioning works
When an image is created, it is given an Image Version number in the following format:
The following OS distributions are currently maintained:
|OS Distribution Code||OS Distribution|
See old image versions for previously supported OS distributions.
The recommended practice is to specify the major.minor image version for production environments or when compatibility with specific component versions is important. The sub-minor version and OS distribution are then automatically set to the latest weekly release.
When you create a new Cloud Dataproc cluster, the latest available image version will be used by default. You can select an image version when creating a new cluster.
When using the gcloud dataproc clusters create command, you can use the --image-version argument to specify an image version. For example, you can run the following command to create a new cluster named my-test-cluster that uses the current sub-minor release of image version 1.0:
gcloud dataproc clusters create my-test-cluster --image-version 1.0
Best practice is to specify the major and minor version only, so that the latest sub-minor version is always used. However, if necessary, the sub-minor version can be specified as well.
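As a rough illustration of the practice above, the following shell sketch checks whether a version string names only the major and minor version before building the create command. The helper name image_version_kind is hypothetical and not part of gcloud. It only understands dotted numeric strings, so a version carrying an OS distribution suffix is reported as invalid. The command is printed with echo rather than executed.

```shell
# Hypothetical helper (not part of gcloud): report how specific a
# dotted numeric image version string is.
image_version_kind() {
  case "$1" in
    ""|*[!0-9.]*|.*|*.|*..*) echo "invalid" ;;  # empty, non-numeric, or misplaced dots
    *.*.*.*) echo "invalid" ;;                  # more than three parts
    *.*.*)   echo "sub-minor" ;;                # major.minor.sub-minor
    *.*)     echo "major.minor" ;;              # major.minor only
    *)       echo "invalid" ;;                  # a single number
  esac
}

version="1.0"
if [ "$(image_version_kind "$version")" = "major.minor" ]; then
  # Print the command instead of running it; drop "echo" to create the cluster.
  echo gcloud dataproc clusters create my-test-cluster --image-version "$version"
fi
```

A check like this could sit in a provisioning script to flag cases where a sub-minor version has been pinned, which would stop the cluster from picking up the latest sub-minor release.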
You can check your current version with the gcloud command-line tool:
gcloud dataproc clusters describe cluster-name
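The describe command prints the full cluster configuration. To show only the image version, one option is gcloud's --format flag, as in the sketch below. It assumes the version is reported under config.softwareConfig.imageVersion in the describe output. The wrapper name cluster_image_version is hypothetical, and depending on your gcloud version and configuration a --region flag may also be required.

```shell
# Hypothetical wrapper (the function name is illustrative, not a gcloud feature).
cluster_image_version() {
  # $1: cluster name. value(...) prints only the selected field.
  gcloud dataproc clusters describe "$1" \
    --format='value(config.softwareConfig.imageVersion)'
}

# Example call (requires an existing cluster and configured gcloud credentials):
# cluster_image_version cluster-name
```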
When creating a new cluster, click Advanced options at the bottom of the Cloud Dataproc Create a cluster form.
The Image field shows the image that will be used when creating the cluster. Initially, it shows the default (latest available version).
Click Change to display a list of available images for your cluster. You can select a standard or custom image.