Custom machine types

Google Cloud Dataproc clusters are built on Google Compute Engine instances. Machine types define the virtualized hardware resources available to an instance. Compute Engine offers both predefined machine types and custom machine types. Cloud Dataproc clusters can use predefined or custom machine types for master nodes, worker nodes, or both.

Use cases for custom machine types

As noted in the custom machine type documentation, custom machine types are ideal for the following workloads:

  • Workloads that are not a good fit for the predefined machine types.
  • Workloads that require more processing power or more memory, but don't need all of the upgrades that are provided by the next machine type level.

Example

As an example, let's assume that you have a workload that needs more processing power than that provided by an n1-standard-4 instance, but the next step up, an n1-standard-8 instance, provides too much capacity. With custom machine types, you can create Cloud Dataproc clusters with master and/or worker nodes in the middle range, with 6 virtual CPUs and 25 GB of memory.

Pricing

Custom machine type pricing varies based on the resources used in a custom machine. Dataproc pricing is added to the cost of compute resources you use, and is based on the total number of virtual CPUs used in a cluster.
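
The vCPU-based part of the charge can be sketched as simple arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and the per-vCPU hourly rate is a placeholder that the caller must supply from the current pricing page. It does not include the Compute Engine cost of the instances themselves.

```python
def total_vcpus(num_masters, vcpus_per_master, num_workers, vcpus_per_worker):
    """Count every virtual CPU in the cluster (masters plus workers)."""
    return num_masters * vcpus_per_master + num_workers * vcpus_per_worker


def dataproc_fee(cluster_vcpus, hours, rate_per_vcpu_hour):
    """Dataproc charge only; the rate is an input, not a published price."""
    return cluster_vcpus * hours * rate_per_vcpu_hour


# One master and two workers, each a 6-vCPU custom machine.
cluster_vcpus = total_vcpus(1, 6, 2, 6)
print(cluster_vcpus)

# PLACEHOLDER_RATE is a made-up value used only to exercise the function.
PLACEHOLDER_RATE = 0.01
fee = dataproc_fee(cluster_vcpus, hours=1, rate_per_vcpu_hour=PLACEHOLDER_RATE)
```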

Using custom machine types with Cloud Dataproc

You can create clusters with custom machine types using the Google Cloud SDK gcloud dataproc command, the Cloud Dataproc REST API, or the GCP Console.

Understand custom machine types first

Before you create a cluster with custom machine types, review Creating a VM Instance with a Custom Machine Type to understand important considerations, including custom type specifications and pricing.

Custom machine types use a special machine type specification. As an example, the custom machine type specification for a custom VM with 6 virtual CPUs and 22.5 GB of memory is:

custom-6-23040

The numbers in the machine type correspond to the number of virtual CPUs in the machine (in this case 6) and the amount of memory in megabytes (in this case 23040). The amount of memory is calculated by multiplying the amount of memory in gigabytes by 1024. In this example, 22.5 (GB) is multiplied by 1024:

22.5 * 1024 = 23040
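
The same calculation can be expressed as a small helper. This is a minimal Python sketch; the function name custom_machine_type is hypothetical and not part of any Google Cloud library. It only formats the specification string and does not check whether the combination is one that Compute Engine accepts.

```python
def custom_machine_type(vcpus: int, memory_gb: float) -> str:
    """Build a spec such as 'custom-6-23040' from a vCPU count and memory in GB."""
    memory_mb = memory_gb * 1024
    if memory_mb != int(memory_mb):
        # The spec carries a whole number of megabytes.
        raise ValueError("memory must convert to a whole number of MB")
    return f"custom-{vcpus}-{int(memory_mb)}"


spec = custom_machine_type(6, 22.5)
print(spec)  # custom-6-23040
```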

Following the limits on CPU and memory combinations, you can use the above syntax to specify the custom machine type you wish to use with your clusters. You can set the machine type for master nodes, worker nodes, or both when you create a cluster. If you set both, the master node can use a custom machine type that is different from the custom machine type used by workers. The machine type settings for preemptible (secondary) workers follow the settings for primary workers and cannot be set separately (see How preemptibles work with Cloud Dataproc).
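
A pre-flight check of the CPU and memory limits might look like the following Python sketch. The 6.5 GB per vCPU ceiling is the limit this page cites for machines without extended memory. The other rules (a vCPU count of 1 or an even number, at least 0.9 GB per vCPU, and total memory in multiples of 256 MB) are assumptions recalled from the Compute Engine custom machine type documentation and should be verified there. The function name is hypothetical.

```python
from typing import List

MAX_GB_PER_VCPU = 6.5   # limit cited on this page (without extended memory)
MIN_GB_PER_VCPU = 0.9   # assumption: verify against Compute Engine docs
MEMORY_STEP_MB = 256    # assumption: verify against Compute Engine docs


def check_custom_shape(vcpus: int, memory_mb: int) -> List[str]:
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    if vcpus < 1:
        problems.append("vCPU count must be at least 1")
        return problems
    if vcpus != 1 and vcpus % 2 != 0:
        problems.append("vCPU count must be 1 or an even number")
    if memory_mb % MEMORY_STEP_MB != 0:
        problems.append("memory must be a multiple of 256 MB")
    gb_per_vcpu = memory_mb / 1024 / vcpus
    if gb_per_vcpu < MIN_GB_PER_VCPU:
        problems.append("memory per vCPU is below 0.9 GB")
    if gb_per_vcpu > MAX_GB_PER_VCPU:
        problems.append("memory per vCPU exceeds 6.5 GB")
    return problems


print(check_custom_shape(6, 23040))  # []
```

A shape that exceeds only the 6.5 GB per vCPU ceiling is a candidate for extended memory, described later on this page.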

Create a Cloud Dataproc cluster with custom machine types

gcloud command

Run the gcloud dataproc clusters create command with the following flags to create a Cloud Dataproc cluster with master and/or worker custom machine types:
  • The --master-machine-type custom-machine-type flag allows you to set the custom machine type used by the master VM instance in your cluster (or master instances if you create an HA cluster).
  • The --worker-machine-type custom-machine-type flag allows you to set the custom machine type used by the worker VM instances in your cluster.

Example:

gcloud dataproc clusters create test-cluster \
    --master-machine-type custom-6-23040 \
    --worker-machine-type custom-6-23040 \
    other args

Once the Cloud Dataproc cluster starts, cluster details are displayed in the terminal window. The following is a partial sample listing of cluster properties displayed in the terminal window:

...
properties:
  distcp:mapreduce.map.java.opts: -Xmx1638m
  distcp:mapreduce.map.memory.mb: '2048'
  distcp:mapreduce.reduce.java.opts: -Xmx4915m
  distcp:mapreduce.reduce.memory.mb: '6144'
  mapred:mapreduce.map.cpu.vcores: '1'
  mapred:mapreduce.map.java.opts: -Xmx1638m
...
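
If you script cluster creation, the gcloud invocation shown earlier in this section can be assembled as an argument list. The Python sketch below only builds and prints the command; the helper name is hypothetical, and any additional arguments your cluster needs are not included. Running it for real would require an installed and authenticated Cloud SDK, for example by passing the list to subprocess.run.

```python
import shlex


def build_create_command(cluster_name, master_type, worker_type):
    """Return the argv list for creating a cluster with the given machine types."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        "--master-machine-type", master_type,
        "--worker-machine-type", worker_type,
    ]


argv = build_create_command("test-cluster", "custom-6-23040", "custom-6-23040")
# shlex.join (Python 3.8+) renders the list as a single shell-safe string.
print(shlex.join(argv))
```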

REST API

To create a cluster with custom machine types, set the machineTypeUri in the masterConfig and/or workerConfig InstanceGroupConfig in the clusters.create API request.

Example:
POST /v1/projects/my-project-id/regions/global/clusters/
{
  "projectId": "my-project-id",
  "clusterName": "test-cluster",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "subnetworkUri": "default",
      "zoneUri": "us-central1-a"
    },
    "masterConfig": {
      "numInstances": 1,
      "machineTypeUri": "custom-6-23040",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    },
    "workerConfig": {
      "numInstances": 2,
      "machineTypeUri": "custom-6-23040",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    }
  }
}
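
A request body of this shape can also be generated in code. The following Python sketch uses only the standard library and only field names that appear in the example above; the helper name cluster_create_body is hypothetical, and the sketch does not send the request or handle authentication.

```python
import json


def cluster_create_body(project_id, cluster_name, zone,
                        master_type, worker_type, num_workers=2):
    """Build a clusters.create body that sets machineTypeUri for both groups."""
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "gceClusterConfig": {"zoneUri": zone},
            "masterConfig": {
                "numInstances": 1,
                "machineTypeUri": master_type,
            },
            "workerConfig": {
                "numInstances": num_workers,
                "machineTypeUri": worker_type,
            },
        },
    }


body = cluster_create_body(
    "my-project-id", "test-cluster", "us-central1-a",
    master_type="custom-6-23040", worker_type="custom-6-23040",
)
print(json.dumps(body, indent=2))
```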

Console

When you create a Cloud Dataproc cluster from the Create a cluster page, click Customize in the Machine type section of the Master node and/or Worker nodes panel, then fill in the number of cores and amount of memory. The console labels and help text assist you in selecting valid custom machine type values.

The following screenshot shows master and worker node values to create a cluster with 6 virtual CPUs and 22.5 GB memory in each node.
Click Extend memory to provide extended memory values for the master and/or worker nodes.

CPU Extended Memory

Cloud Dataproc supports custom machine types with extended memory beyond the 6.5 GB per vCPU limit (see Extended Memory Pricing).

Using Extended Memory

gcloud Command

To create a cluster from the gcloud command line with custom CPUs with extended memory, add a -ext suffix to the custom machine type value passed to the --master-machine-type and/or --worker-machine-type flags.

Example

The following gcloud command-line sample creates a Cloud Dataproc cluster with 1 CPU and 50 GB memory (50 * 1024 = 51200) in each node:

gcloud dataproc clusters create test-cluster \
    --master-machine-type custom-1-51200-ext \
    --worker-machine-type custom-1-51200-ext \
    other args
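
Deciding when the -ext suffix is needed can be sketched in Python as follows. The sketch assumes that the suffix is required exactly when memory per vCPU exceeds the 6.5 GB limit cited on this page; the function name is hypothetical.

```python
STANDARD_MAX_GB_PER_VCPU = 6.5  # limit cited on this page


def machine_type_with_ext(vcpus: int, memory_gb: float) -> str:
    """Format a custom machine type, appending -ext when memory is extended."""
    memory_mb = int(memory_gb * 1024)
    spec = f"custom-{vcpus}-{memory_mb}"
    if memory_gb / vcpus > STANDARD_MAX_GB_PER_VCPU:
        # Assumption: anything above the per-vCPU limit needs the suffix.
        spec += "-ext"
    return spec


print(machine_type_with_ext(1, 50))    # custom-1-51200-ext
print(machine_type_with_ext(6, 22.5))  # custom-6-23040
```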

REST API

The following sample JSON snippet from a Cloud Dataproc REST API clusters.create request specifies 1 CPU and 50 GB memory (50 * 1024 = 51200) in each node:

...
    "masterConfig": {
      "numInstances": 1,
      "machineTypeUri": "custom-1-51200-ext",
    ...
    },
    "workerConfig": {
      "numInstances": 2,
      "machineTypeUri": "custom-1-51200-ext",
     ...
...

Console

Click Extend memory when customizing Machine type memory in the Master node and/or Worker nodes section on the Cloud Dataproc Create a cluster page on GCP Console.

For more information

See Creating a VM Instance with a Custom Machine Type.
