Custom machine types

Google Cloud Dataproc clusters are built on Google Compute Engine instances. Machine types define the virtualized hardware resources available to an instance. Compute Engine offers both predefined machine types and custom machine types. Cloud Dataproc clusters can use either predefined or custom machine types for master nodes, worker nodes, or both.

Use cases for custom machine types

As noted in the custom machine type documentation, custom machine types are ideal for the following workloads:

  • Workloads that are not a good fit for the predefined machine types.
  • Workloads that require more processing power or more memory, but don't need all of the upgrades that are provided by the next machine type level.

Example

As an example, suppose you have a workload that needs more processing power than an n1-standard-4 instance provides, but the next step up, an n1-standard-8 instance, provides too much capacity. With custom machine types, you can create Cloud Dataproc clusters whose master nodes, worker nodes, or both sit in the middle range, with 6 virtual CPUs and 25 GB of memory.

Pricing

Custom machine type pricing varies based on the resources used in a custom machine. Dataproc pricing is added to the cost of compute resources you use, and is based on the total number of virtual CPUs used in a cluster.
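
Because the Dataproc charge depends on the total number of virtual CPUs in the cluster, it can help to add up that total before creating a cluster. The following is a minimal sketch: the function name total_vcpus and the node counts (1 master, 2 workers) are assumptions chosen for illustration, and no actual rates are included.

```shell
# total_vcpus MASTERS MASTER_VCPUS WORKERS WORKER_VCPUS
# Hypothetical helper: prints the total number of virtual CPUs in a cluster.
total_vcpus() {
  echo $(( $1 * $2 + $3 * $4 ))
}

# Assumed layout: 1 master and 2 workers, each with 6 virtual CPUs.
total_vcpus 1 6 2 6   # prints 18
```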

Using custom machine types with Cloud Dataproc

At present, creating clusters with custom machine types is only supported through the Google Cloud SDK gcloud dataproc command.

Understand custom machine types first

Before you create a cluster with custom machine types, we recommend you review the custom machine type documentation to understand important considerations, including custom type specifications and pricing.

Custom machine types use a customized machine type name. As an example, the custom machine type name for a custom VM with 6 virtual CPUs and 22.5 GB of memory is:

custom-6-23040

The numbers in the machine type name correspond to the number of virtual CPUs in the machine (in this case 6) and the amount of memory in megabytes (in this case 23040). The memory value is calculated by multiplying the amount of memory in gigabytes by 1024. In this example we multiply 22.5 (GB) by 1024:

22.5 * 1024 = 23040
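
The same calculation can be scripted. The following is a minimal sketch, assuming awk is available for the fractional arithmetic; the function name custom_machine_type is hypothetical and not part of any Google Cloud tool.

```shell
# custom_machine_type VCPUS MEMORY_GB
# Hypothetical helper: prints the custom machine type name for the given
# number of virtual CPUs and amount of memory in gigabytes.
custom_machine_type() {
  awk -v cpus="$1" -v gb="$2" 'BEGIN { printf "custom-%d-%d\n", cpus, gb * 1024 }'
}

custom_machine_type 6 22.5   # prints custom-6-23040
```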

As long as you stay within the limits on CPU and memory combinations, you can use this method to find the name of the custom machine type you wish to use with your clusters.
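
A quick check before creating a cluster can catch an invalid combination early. The sketch below assumes the limits commonly documented for N1 custom machine types: a virtual CPU count of 1 or an even number, total memory that is a multiple of 256 MB, and between 0.9 GB and 6.5 GB of memory per virtual CPU. Treat these values as assumptions and confirm them in the custom machine type documentation. The function name check_custom_type is hypothetical.

```shell
# check_custom_type VCPUS MEMORY_MB
# Hypothetical helper: prints "ok" and returns 0 if the combination passes
# the assumed limits, otherwise prints the reason and returns 1.
check_custom_type() {
  awk -v c="$1" -v m="$2" 'BEGIN {
    # Assumed limit: 1 virtual CPU or an even number of virtual CPUs.
    if (c != 1 && c % 2 != 0) { print "vCPU count must be 1 or an even number"; exit 1 }
    # Assumed limit: total memory is a multiple of 256 MB.
    if (m % 256 != 0) { print "memory must be a multiple of 256 MB"; exit 1 }
    # Assumed limit: 0.9 GB to 6.5 GB of memory per virtual CPU.
    per = m / 1024 / c
    if (per < 0.9 || per > 6.5) { print "memory per vCPU must be between 0.9 and 6.5 GB"; exit 1 }
    print "ok"
  }'
}

check_custom_type 6 23040   # prints ok
```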

Create a Cloud Dataproc cluster with custom machine types

Once you know the machine type name you wish to use, you can use the gcloud dataproc command to create a cluster with that custom machine type.

The gcloud dataproc clusters create command has two options that set the master and worker machine types. The --master-machine-type option sets the machine type used by the master node. The --worker-machine-type option sets the machine type used by the worker nodes.

For example, to create a cluster named test-cluster that uses the custom machine type described above for both the master and worker nodes, you can use the following command:

gcloud dataproc clusters create test-cluster \
    --worker-machine-type custom-6-23040 \
    --master-machine-type custom-6-23040

You can set the machine type for both master and worker nodes together or independently. If you set both, the master node can use a custom machine type that is different from the worker nodes' custom machine type.
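
To illustrate setting the two types independently, the following sketch builds a create command with a smaller master and larger workers. It only prints the command so that you can review it before running it. The function name create_cmd and the master type custom-2-7680 (2 virtual CPUs, 7.5 GB) are assumptions chosen for illustration.

```shell
# create_cmd CLUSTER_NAME MASTER_TYPE WORKER_TYPE
# Hypothetical helper: prints (does not run) a cluster creation command.
create_cmd() {
  printf 'gcloud dataproc clusters create %s --master-machine-type %s --worker-machine-type %s\n' \
    "$1" "$2" "$3"
}

# Assumed example: a 2-vCPU master and 6-vCPU workers.
create_cmd test-cluster custom-2-7680 custom-6-23040
```

Once the printed command looks right, you can copy it into your terminal to create the cluster.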

Once the Cloud Dataproc cluster starts, cluster details are displayed in the terminal window. This example shows a partial listing of cluster properties in the terminal window:

...
properties:
    distcp:mapreduce.map.java.opts: -Xmx1638m
    distcp:mapreduce.map.memory.mb: '2048'
    distcp:mapreduce.reduce.java.opts: -Xmx4915m
    distcp:mapreduce.reduce.memory.mb: '6144'
    mapred:mapreduce.map.cpu.vcores: '1'
    mapred:mapreduce.map.java.opts: -Xmx1638m
...

Use instance templates to explore custom machine type settings

You can use the instance template feature to experiment with memory and CPU combinations and then find the machine type name for that combination. In this process you walk through the steps of the instance template form, but you cancel before the template is actually created. This is entirely optional, as you can manually determine machine type names, as explained above.

Start by opening the Google Cloud Platform Console. From the Compute Engine > Instance groups page, click the Create instance template button.

In the "Create an instance template" form, click on the Customize link to customize the resources of the virtual machine.

In the advanced form, you can now choose the number of virtual CPUs and the amount of memory dedicated to the instance.

Once you have adjusted the machine settings to your preferences, you can find the machine type name by clicking on the REST link at the bottom of the page.

A window will open showing you the REST code for programmatically creating this instance template. You can see the machine type name next to machineType.

You can use this machine type name for your Cloud Dataproc clusters. Click on the Close button to close the window and then click on the Cancel button to leave the "Create an instance template" form.
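
If you want to double-check a name copied from the REST view, you can decode it back into its parts by reversing the naming rule described earlier. The following is a minimal sketch, assuming awk is available; the function name describe_machine_type is hypothetical.

```shell
# describe_machine_type NAME
# Hypothetical helper: splits "custom-<vcpus>-<memory_mb>" on hyphens and
# converts the memory value from megabytes back to gigabytes.
# Returns 1 if the name does not have that three-part form.
describe_machine_type() {
  printf '%s\n' "$1" | awk -F- '
    $1 == "custom" && NF == 3 { printf "%d vCPUs, %g GB\n", $2, $3 / 1024; ok = 1 }
    END { exit !ok }'
}

describe_machine_type custom-6-23040   # prints 6 vCPUs, 22.5 GB
```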

For more information

For more information about custom machine types, take a look at the custom machine type documentation.
