This document describes how to create a Dataproc zero-scale cluster.
Dataproc zero-scale clusters provide a cost-effective way to use
Dataproc clusters. Unlike
[standard Dataproc clusters](/dataproc/docs/guides/create-cluster)
that require at least two primary workers, Dataproc zero-scale clusters
use only [secondary workers](/dataproc/docs/concepts/compute/secondary-vms)
that can be scaled down to zero.
Dataproc zero-scale clusters are ideal for use as long-running clusters
that experience idle periods, such as a cluster that hosts a Jupyter notebook.
They provide improved resource utilization through the use of zero-scale
autoscaling policies.
Characteristics and limitations
A Dataproc zero-scale cluster shares similarities with a standard
cluster, but has the following unique characteristics and limitations:
- Requires image version `2.2.53` or later.
- Supports only secondary workers, not primary workers.
- Includes services such as YARN, but doesn't support the HDFS file system.
  - To use Cloud Storage as the default file system, set the
    `core:fs.defaultFS` cluster property to a Cloud Storage bucket location
    (`gs://BUCKET_NAME`).
  - If you disable a component during cluster creation, also
    disable HDFS.
- Can't be converted to or from a standard cluster.
- Requires an autoscaling policy for `ZERO_SCALE` cluster types.
- Requires selecting
  [flexible VMs](/dataproc/docs/concepts/configuring-clusters/flexible-vms#how_to_request_flexible_vms)
  as the machine type.
- Doesn't support the Oozie component.
- Can't be created from the Google Cloud console.

Optional: Configure an autoscaling policy

You can configure an autoscaling policy to define secondary worker scaling for
a zero-scale cluster. When doing so, note the following:

- Set the cluster type to `ZERO_SCALE`.
- Apply the autoscaling policy to the secondary worker config only.

For more information, see
[Create an autoscaling policy](/dataproc/docs/concepts/configuring-clusters/autoscaling#create_an_autoscaling_policy).

Create a Dataproc zero-scale cluster

Create a zero-scale cluster using the gcloud CLI or
the Dataproc API.

**Note:** When selecting a machine type for zero-scale clusters, use
[flexible VMs](/dataproc/docs/concepts/configuring-clusters/flexible-vms#how_to_request_flexible_vms).

gcloud

Run the
[`gcloud dataproc clusters create`](/sdk/gcloud/reference/dataproc/clusters/create)
command locally in a terminal window or in
[Cloud Shell](https://console.cloud.google.com/?cloudshell=true).

    gcloud dataproc clusters create CLUSTER_NAME \
        --region=REGION \
        --cluster-type=zero-scale \
        --autoscaling-policy=AUTOSCALING_POLICY \
        --properties=core:fs.defaultFS=gs://BUCKET_NAME \
        --secondary-worker-machine-types="type=MACHINE_TYPE1[,type=MACHINE_TYPE2...][,rank=RANK]"
        ...other args

Replace the following:

- `CLUSTER_NAME`: name of the Dataproc zero-scale cluster.
- `REGION`: an [available Compute Engine region](/compute/docs/regions-zones#available).
- `AUTOSCALING_POLICY`: the ID or resource URI of the autoscaling policy.
- `BUCKET_NAME`: name of your Cloud Storage bucket.
- `MACHINE_TYPE1`, `MACHINE_TYPE2`: specific Compute Engine machine types, such as `n1-standard-4` or `e2-standard-8`.
- `RANK`: defines the priority of a list of machine types.

REST

Create a zero-scale cluster using a Dataproc REST API
[clusters.create](/dataproc/docs/reference/rest/v1/projects.regions.clusters/create)
request:

- Set [`ClusterConfig.ClusterType`](/dataproc/docs/reference/rest/v1/ClusterConfig#ClusterType.ENUM_VALUES.ZERO_SCALE) for the `secondaryWorkerConfig` to `ZERO_SCALE`.
- Set [`AutoscalingConfig.policyUri`](/dataproc/docs/reference/rest/v1/ClusterConfig#AutoscalingConfig.FIELDS.policy_uri) to the `ZERO_SCALE` autoscaling policy ID.
- Add the `core:fs.defaultFS:gs://BUCKET_NAME`
  [SoftwareConfig.property](/static/dataproc/docs/reference/rest/v1/ClusterConfig#SoftwareConfig.FIELDS.properties).
  Replace `BUCKET_NAME` with the name of your Cloud Storage bucket.

What's next

- Learn more about [Dataproc autoscaling](/dataproc/docs/concepts/configuring-clusters/autoscaling).

**Preview:** This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section
of the [Service Specific Terms](/terms/service-terms#1).
Pre-GA products and features are available "as is" and might have limited support.
For more information, see the
[launch stage descriptions](/products#product-launch-stages).

Last updated 2025-09-04 UTC.
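
Example: REST request body (illustrative sketch)

The three REST settings described for the `clusters.create` request can be collected into one request body. The following Python snippet is a minimal sketch, not an official sample. It assumes that the `ClusterConfig.ClusterType` enum is set through a `clusterType` field on the cluster config, and the function name `build_zero_scale_cluster_body` and all example values are hypothetical. The flexible VM selection for secondary workers is left out, so the body is incomplete as a real request.

```python
import json


def build_zero_scale_cluster_body(project_id, cluster_name, policy_uri, bucket_name):
    """Return a request body carrying the three zero-scale REST settings."""
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            # Assumption: the ClusterType enum is exposed as `clusterType`.
            "clusterType": "ZERO_SCALE",
            # ID or resource URI of the zero-scale autoscaling policy.
            "autoscalingConfig": {"policyUri": policy_uri},
            # Cloud Storage replaces HDFS as the default file system.
            "softwareConfig": {
                "properties": {"core:fs.defaultFS": f"gs://{bucket_name}"}
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical placeholder values; substitute your own.
    body = build_zero_scale_cluster_body(
        project_id="example-project",
        cluster_name="example-zero-scale",
        policy_uri="example-zero-scale-policy",
        bucket_name="example-bucket",
    )
    print(json.dumps(body, indent=2))
```

Building the body as a plain dictionary keeps the sketch independent of any client library. Check the field names against the `ClusterConfig` reference before sending the request.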