Supported machine types

Dataproc clusters are built on Compute Engine instances. Machine types define the virtualized hardware resources available to an instance. Compute Engine offers both predefined machine types and custom machine types. Dataproc clusters can use predefined or custom machine types for master nodes, worker nodes, or both.

Dataproc supports the following Compute Engine predefined machine types in clusters:

  • General purpose machine types, which include N1, N2, N2D, and E2 machine types.
      • Dataproc also supports N1, N2, N2D, and E2 custom machine types.

  • Compute-optimized machine types, which include C2 machine types.

  • Memory-optimized machine types, which include M1 and M2 machine types.

Custom machine types

Custom machine types are ideal for the following workloads:

  • Workloads that are not a good fit for the predefined machine types.
  • Workloads that require more processing power or more memory, but don't need all of the upgrades that are provided by the next machine type level.

For example, suppose you have a workload that needs more processing power than an n1-standard-4 instance provides, but the next step up, an n1-standard-8 instance, provides too much capacity. With custom machine types, you can create Dataproc clusters with master and/or worker nodes in the middle range, with 6 virtual CPUs and 25 GB of memory.

Specifying a custom machine type

Custom machine types use a special machine type specification and are subject to limitations. For example, the custom machine type specification for a custom VM with 6 virtual CPUs and 22.5 GB of memory is custom-6-23040.

The numbers in the machine type specification correspond to the number of virtual CPUs (vCPUs) in the machine (6) and the amount of memory in megabytes (23040). The amount of memory is calculated by multiplying the amount of memory in gigabytes by 1024 (see Expressing memory in GB or MB). In this example, 22.5 (GB) is multiplied by 1024: 22.5 * 1024 = 23040.
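The calculation above can be sketched as a small shell helper. This is a minimal illustration, not part of gcloud: the function name custom_machine_type is hypothetical, and it assumes awk is available to handle fractional gigabyte values.

```shell
# custom_machine_type VCPUS MEMORY_GB
# Hypothetical helper: prints a specification of the form custom-VCPUS-MEMORY_MB.
custom_machine_type() {
  # Memory in the specification is expressed in MB: GB * 1024.
  # awk is used because shell arithmetic cannot multiply fractional values such as 22.5.
  awk -v vcpus="$1" -v gb="$2" 'BEGIN { printf "custom-%d-%d\n", vcpus, gb * 1024 }'
}

custom_machine_type 6 22.5   # prints custom-6-23040
```

The helper only builds the name. It does not check the limitations that apply to custom machine types.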

Use the above syntax to specify the custom machine type for your clusters. You can set the machine type for master nodes, worker nodes, or both when you create a cluster. If you set both, the master node can use a custom machine type that is different from the custom machine type used by workers. The machine type used by any secondary workers follows the settings for primary workers and cannot be set separately (see Secondary workers - preemptible and non-preemptible VMs).

Pricing

Custom machine type pricing is based on the resources used in a custom machine. Dataproc pricing is added to the cost of compute resources, and is based on the total number of virtual CPUs (vCPUs) used in a cluster.
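Because the Dataproc charge depends on the total vCPU count, it can help to add up vCPUs across a cluster. The following is a rough sketch under stated assumptions: the helper names vcpus_of and cluster_vcpus are hypothetical, and the parsing works only for custom machine type specifications (custom-VCPUS-MEMORY), not for predefined names.

```shell
# vcpus_of SPEC
# Hypothetical helper: prints the vCPU count from a custom machine type specification.
# In custom-6-23040 and custom-1-51200-ext, the vCPU count is the second dash-separated field.
vcpus_of() {
  echo "$1" | cut -d- -f2
}

# cluster_vcpus MASTER_SPEC MASTER_COUNT WORKER_SPEC WORKER_COUNT
# Prints the total number of vCPUs across master and worker nodes.
cluster_vcpus() {
  echo $(( $(vcpus_of "$1") * $2 + $(vcpus_of "$3") * $4 ))
}

cluster_vcpus custom-6-23040 1 custom-6-23040 2   # prints 18
```

Multiplying this total by the published per-vCPU rate gives the Dataproc portion of the cost. The rate itself is not stated here, so it is left out of the sketch.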

Create a Dataproc cluster with a specified machine type

gcloud command

Run the gcloud dataproc clusters create command with the following flags to create a Dataproc cluster with specified master and/or worker machine types:
  • The --master-machine-type machine-type flag sets the predefined or custom machine type used by the master VM instance in your cluster (or by the master instances if you create an HA cluster).
  • The --worker-machine-type machine-type flag sets the predefined or custom machine type used by the worker VM instances in your cluster.

Example:

gcloud dataproc clusters create test-cluster \
    --master-machine-type custom-6-23040 \
    --worker-machine-type custom-6-23040 \
    other args
Once the Dataproc cluster starts, cluster details are displayed in the terminal window. The following is a partial sample listing of cluster properties displayed in the terminal window:
...
properties:
  distcp:mapreduce.map.java.opts: -Xmx1638m
  distcp:mapreduce.map.memory.mb: '2048'
  distcp:mapreduce.reduce.java.opts: -Xmx4915m
  distcp:mapreduce.reduce.memory.mb: '6144'
  mapred:mapreduce.map.cpu.vcores: '1'
  mapred:mapreduce.map.java.opts: -Xmx1638m
...

REST API

To create a cluster with a specified predefined or custom machine type, set the machineTypeUri in the masterConfig and/or workerConfig InstanceGroupConfig in the cluster.create API request.

Example:
POST /v1/projects/my-project-id/regions/us-central1/clusters/
{
  "projectId": "my-project-id",
  "clusterName": "test-cluster",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "subnetworkUri": "default",
      "zoneUri": "us-central1-a"
    },
    "masterConfig": {
      "numInstances": 1,
      "machineTypeUri": "n1-highmem-4",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    },
    "workerConfig": {
      "numInstances": 2,
      "machineTypeUri": "n1-highmem-4",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    }
  }
}
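One way to send a request like the one above from the command line is sketched below. It assumes curl is installed and the gcloud CLI is authenticated; the function names and the request.json file are hypothetical, chosen only for illustration.

```shell
# clusters_create_url PROJECT_ID REGION
# Hypothetical helper: prints the clusters.create endpoint for a project and region.
clusters_create_url() {
  echo "https://dataproc.googleapis.com/v1/projects/$1/regions/$2/clusters"
}

# create_cluster PROJECT_ID REGION REQUEST_FILE
# Posts the JSON body stored in REQUEST_FILE to the endpoint.
# Assumes curl and an authenticated gcloud CLI are available.
create_cluster() {
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d @"$3" \
    "$(clusters_create_url "$1" "$2")"
}

# Usage (request.json is a hypothetical file holding the JSON request body):
# create_cluster my-project-id us-central1 request.json
```

Keeping the URL construction separate from the network call makes it possible to check the endpoint without creating a cluster.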

Console

From the Configure nodes panel of the Dataproc Create a cluster page in the Cloud Console, select a machine family, series, and type for the cluster's master and worker nodes.

CPU Extended Memory

Dataproc supports custom machine types with extended memory beyond the 6.5 GB per vCPU limit (see Extended Memory Pricing).

Using Extended Memory

gcloud Command

To create a cluster from the gcloud command line with custom CPUs with extended memory, add a -ext suffix to the machine type value passed to the --master-machine-type and/or --worker-machine-type flags.

Example

The following gcloud command-line sample creates a Dataproc cluster with 1 vCPU and 50 GB of memory (50 * 1024 = 51200) in each node:

gcloud dataproc clusters create test-cluster \
    --master-machine-type custom-1-51200-ext \
    --worker-machine-type custom-1-51200-ext \
    other args
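
The suffix rule can be sketched as a shell helper that appends -ext only when the requested memory exceeds 6.5 GB (6.5 * 1024 = 6656 MB) per vCPU. The function name with_ext_suffix is hypothetical. The sketch applies only the per-vCPU limit stated in this section and does not check any other custom machine type limitations.

```shell
# with_ext_suffix VCPUS MEMORY_MB
# Hypothetical helper: prints the custom machine type specification,
# adding -ext when memory exceeds 6656 MB (6.5 GB) per vCPU.
with_ext_suffix() {
  if [ "$2" -gt $(( $1 * 6656 )) ]; then
    echo "custom-$1-$2-ext"
  else
    echo "custom-$1-$2"
  fi
}

with_ext_suffix 1 51200   # prints custom-1-51200-ext
with_ext_suffix 6 23040   # prints custom-6-23040
```

The second call shows that the 6 vCPU, 22.5 GB example from earlier on this page stays under the limit (6 * 6656 = 39936 MB) and needs no suffix.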

REST API

The following sample JSON snippet from a Dataproc REST API clusters.create request specifies 1 vCPU and 50 GB of memory (50 * 1024 = 51200) in each node:

...
    "masterConfig": {
      "numInstances": 1,
      "machineTypeUri": "custom-1-51200-ext",
    ...
    },
    "workerConfig": {
      "numInstances": 2,
      "machineTypeUri": "custom-1-51200-ext",
     ...
...

Console

Click Extend memory when customizing Machine type memory in the Master node and/or Worker nodes section on the Dataproc Create a cluster page in the Cloud Console.

For more information

See Creating a VM Instance with a Custom Machine Type.