InstanceGroupConfig

The config settings for Compute Engine resources in an instance group, such as a master or worker group.

JSON representation
{
  "numInstances": integer,
  "instanceNames": [
    string
  ],
  "imageUri": string,
  "machineTypeUri": string,
  "diskConfig": {
    object (DiskConfig)
  },
  "isPreemptible": boolean,
  "preemptibility": enum (Preemptibility),
  "managedGroupConfig": {
    object (ManagedGroupConfig)
  },
  "accelerators": [
    {
      object (AcceleratorConfig)
    }
  ],
  "minCpuPlatform": string,
  "minNumInstances": integer,
  "instanceFlexibilityPolicy": {
    object (InstanceFlexibilityPolicy)
  },
  "startupConfig": {
    object (StartupConfig)
  }
}
Fields
numInstances

integer

Optional. The number of VM instances in the instance group. For HA cluster masterConfig groups, must be set to 3. For standard cluster masterConfig groups, must be set to 1.

instanceNames[]

string

Output only. The list of instance names. Dataproc derives the names from clusterName, numInstances, and the instance group.

imageUri

string

Optional. The Compute Engine image resource used for cluster instances.

The URI can represent an image or image family.

Image examples:

  • https://www.googleapis.com/compute/v1/projects/[projectId]/global/images/[image-id]
  • projects/[projectId]/global/images/[image-id]
  • image-id

Image family examples. Dataproc will use the most recent image from the family:

  • https://www.googleapis.com/compute/v1/projects/[projectId]/global/images/family/[custom-image-family-name]
  • projects/[projectId]/global/images/family/[custom-image-family-name]

If the URI is unspecified, it will be inferred from SoftwareConfig.image_version or the system default.

machineTypeUri

string

Optional. The Compute Engine machine type used for cluster instances.

A full URL, partial URI, or short name is valid. Examples:

  • https://www.googleapis.com/compute/v1/projects/[projectId]/zones/[zone]/machineTypes/n1-standard-2
  • projects/[projectId]/zones/[zone]/machineTypes/n1-standard-2
  • n1-standard-2

Auto Zone Exception: If you are using the Dataproc Auto Zone Placement feature, you must use the short name of the machine type resource, for example, n1-standard-2.
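Because Auto Zone Placement requires the short name, a client that holds any of the three forms above may want to reduce it to its last path segment. The following is a minimal sketch, not part of the Dataproc API: the helper name machine_type_short_name and the sample project and zone values are illustrative assumptions.

```python
def machine_type_short_name(machine_type_uri: str) -> str:
    """Return the last path segment of a machine type reference.

    Hypothetical helper: accepts a full URL, a partial URI, or a short name.
    """
    return machine_type_uri.rstrip("/").rsplit("/", 1)[-1]


# Placeholder project and zone values, used only for illustration.
full = ("https://www.googleapis.com/compute/v1/projects/my-project"
        "/zones/us-central1-a/machineTypes/n1-standard-2")
partial = "projects/my-project/zones/us-central1-a/machineTypes/n1-standard-2"

for uri in (full, partial, "n1-standard-2"):
    print(machine_type_short_name(uri))
```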

diskConfig

object (DiskConfig)

Optional. Disk option config settings.

isPreemptible

boolean

Output only. Specifies that this instance group contains preemptible instances.

preemptibility

enum (Preemptibility)

Optional. Specifies the preemptibility of the instance group.

The default value for master and worker groups is NON_PREEMPTIBLE. This default cannot be changed.

The default value for secondary instances is PREEMPTIBLE.

managedGroupConfig

object (ManagedGroupConfig)

Output only. The config for Compute Engine Instance Group Manager that manages this group. This is only used for preemptible instance groups.

accelerators[]

object (AcceleratorConfig)

Optional. The Compute Engine accelerator configuration for these instances.

minCpuPlatform

string

Optional. Specifies the minimum CPU platform for the instance group. See Dataproc -> Minimum CPU Platform.

minNumInstances

integer

Optional. The minimum number of primary worker instances to create. If minNumInstances is set, cluster creation will succeed if the number of primary workers created is at least equal to the minNumInstances number.

Example: Cluster creation request with numInstances = 5 and minNumInstances = 3:

  • If 4 VMs are created and 1 instance fails, the failed VM is deleted. The cluster is resized to 4 instances and placed in a RUNNING state.
  • If 2 instances are created and 3 instances fail, the cluster is placed in an ERROR state. The failed VMs are not deleted.
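
The two outcomes in the example can be sketched as a small decision function. This is only an illustration of the rule as described here, not Dataproc's implementation: the function name creation_outcome and the keys of the returned dictionary are assumptions.

```python
def creation_outcome(min_num_instances: int, created: int) -> dict:
    """Model the outcome of cluster creation when minNumInstances is set.

    Hypothetical helper: 'created' is the number of primary workers that
    were created successfully.
    """
    if created >= min_num_instances:
        # Enough primary workers: failed VMs are deleted and the cluster
        # is resized to the number of workers that were created.
        return {"state": "RUNNING", "resized_to": created,
                "failed_vms_deleted": True}
    # Too few primary workers: the cluster errors out and failed VMs stay.
    return {"state": "ERROR", "resized_to": None,
            "failed_vms_deleted": False}


# numInstances = 5, minNumInstances = 3
print(creation_outcome(min_num_instances=3, created=4))
print(creation_outcome(min_num_instances=3, created=2))
```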
instanceFlexibilityPolicy

object (InstanceFlexibilityPolicy)

Optional. Instance flexibility policy that allows a mixture of VM shapes and provisioning models.

startupConfig

object (StartupConfig)

Optional. Configuration to handle the startup of instances during the cluster create and update process.
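
Several fields on this page are marked "Output only". When a config returned by the API is reused as the basis for a new request, it can be convenient to drop those fields first. The sketch below assumes the config is held as a plain Python dict. The helper name strip_output_only and the sample values are hypothetical, and this page does not say whether the API rejects or ignores output-only fields in a request.

```python
import copy

# Top-level InstanceGroupConfig fields documented as "Output only".
OUTPUT_ONLY_FIELDS = ("instanceNames", "isPreemptible", "managedGroupConfig")


def strip_output_only(group_config: dict) -> dict:
    """Return a copy of the config without output-only fields (hypothetical helper)."""
    cleaned = copy.deepcopy(group_config)
    for field in OUTPUT_ONLY_FIELDS:
        cleaned.pop(field, None)
    # instanceSelectionResults is the output-only part of the flexibility policy.
    policy = cleaned.get("instanceFlexibilityPolicy")
    if isinstance(policy, dict):
        policy.pop("instanceSelectionResults", None)
    return cleaned


# Placeholder values, used only for illustration.
returned = {
    "numInstances": 2,
    "instanceNames": ["instance-a", "instance-b"],
    "machineTypeUri": "n1-standard-2",
    "isPreemptible": False,
    "instanceFlexibilityPolicy": {
        "instanceSelectionResults": [
            {"machineType": "n1-standard-2", "vmCount": 2}
        ]
    },
}
print(strip_output_only(returned))
```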

DiskConfig

Specifies the config of disk options for a group of VM instances.

JSON representation
{
  "bootDiskType": string,
  "bootDiskSizeGb": integer,
  "numLocalSsds": integer,
  "localSsdInterface": string
}
Fields
bootDiskType

string

Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-balanced" (Persistent Disk Balanced Solid State Drive), "pd-ssd" (Persistent Disk Solid State Drive), or "pd-standard" (Persistent Disk Hard Disk Drive). See Disk types.

bootDiskSizeGb

integer

Optional. Size in GB of the boot disk (default is 500GB).

numLocalSsds

integer

Optional. Number of attached SSDs, from 0 to 8 (default is 0). If SSDs are not attached, the boot disk is used to store runtime logs and HDFS data. If one or more SSDs are attached, this runtime bulk data is spread across them, and the boot disk contains only basic config and installed binaries.

Note: Local SSD options may vary by machine type and number of vCPUs selected.

localSsdInterface

string

Optional. Interface type of local SSDs (default is "scsi"). Valid values: "scsi" (Small Computer System Interface), "nvme" (Non-Volatile Memory Express). See local SSD performance.
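
The defaults and valid values listed for DiskConfig can be collected into a small client-side check. This is a sketch under stated assumptions: the helper name resolve_disk_config is hypothetical, and it does not model the machine-type-dependent local SSD limits mentioned in the note above.

```python
# Defaults and valid values as documented for DiskConfig.
DISK_DEFAULTS = {
    "bootDiskType": "pd-standard",
    "bootDiskSizeGb": 500,
    "numLocalSsds": 0,
    "localSsdInterface": "scsi",
}
VALID_BOOT_DISK_TYPES = {"pd-balanced", "pd-ssd", "pd-standard"}
VALID_SSD_INTERFACES = {"scsi", "nvme"}


def resolve_disk_config(disk_config: dict) -> dict:
    """Fill in documented defaults and check documented ranges (hypothetical helper)."""
    resolved = {**DISK_DEFAULTS, **disk_config}
    if resolved["bootDiskType"] not in VALID_BOOT_DISK_TYPES:
        raise ValueError(f"unsupported bootDiskType: {resolved['bootDiskType']}")
    if not 0 <= resolved["numLocalSsds"] <= 8:
        raise ValueError("numLocalSsds must be between 0 and 8")
    if resolved["localSsdInterface"] not in VALID_SSD_INTERFACES:
        raise ValueError(
            f"unsupported localSsdInterface: {resolved['localSsdInterface']}")
    return resolved


print(resolve_disk_config({"numLocalSsds": 2, "localSsdInterface": "nvme"}))
```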

Preemptibility

Controls the use of preemptible instances within the group.

Enums
PREEMPTIBILITY_UNSPECIFIED

Preemptibility is unspecified; the system will choose the appropriate setting for each instance group.

NON_PREEMPTIBLE

Instances are non-preemptible.

This option is allowed for all instance groups and is the only valid value for Master and Worker instance groups.

PREEMPTIBLE

Instances are preemptible.

This option is allowed only for secondary worker groups.

SPOT

Instances are Spot VMs.

This option is allowed only for secondary worker groups. Spot VMs are the latest version of preemptible VMs, and provide additional features.
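
The rules above can be expressed as a short validation sketch. The role labels "master", "worker", and "secondary_worker" and the function name resolve_preemptibility are illustrative assumptions. Mapping PREEMPTIBILITY_UNSPECIFIED to the documented per-group default is also an assumption about how the system chooses.

```python
GROUP_ROLES = {"master", "worker", "secondary_worker"}  # hypothetical labels
SECONDARY_ONLY = {"PREEMPTIBLE", "SPOT"}


def resolve_preemptibility(group_role: str,
                           value: str = "PREEMPTIBILITY_UNSPECIFIED") -> str:
    """Validate a Preemptibility value for a group and resolve the default."""
    if group_role not in GROUP_ROLES:
        raise ValueError(f"unknown group role: {group_role}")
    is_secondary = group_role == "secondary_worker"
    if value == "PREEMPTIBILITY_UNSPECIFIED":
        # Assumed to resolve to the documented default for the group.
        return "PREEMPTIBLE" if is_secondary else "NON_PREEMPTIBLE"
    if value == "NON_PREEMPTIBLE":
        return value  # allowed for all instance groups
    if value in SECONDARY_ONLY:
        if not is_secondary:
            raise ValueError(
                f"{value} is allowed only for secondary worker groups")
        return value
    raise ValueError(f"unknown preemptibility value: {value}")


print(resolve_preemptibility("secondary_worker", "SPOT"))
print(resolve_preemptibility("worker"))
```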

ManagedGroupConfig

Specifies the resources used to actively manage an instance group.

JSON representation
{
  "instanceTemplateName": string,
  "instanceGroupManagerName": string,
  "instanceGroupManagerUri": string
}
Fields
instanceTemplateName

string

Output only. The name of the Instance Template used for the Managed Instance Group.

instanceGroupManagerName

string

Output only. The name of the Instance Group Manager for this group.

instanceGroupManagerUri

string

Output only. The partial URI to the instance group manager for this group. E.g. projects/my-project/regions/us-central1/instanceGroupManagers/my-igm.

AcceleratorConfig

Specifies the type and number of accelerator cards attached to the instances of an instance group. See GPUs on Compute Engine.

JSON representation
{
  "acceleratorTypeUri": string,
  "acceleratorCount": integer
}
Fields
acceleratorTypeUri

string

Full URL, partial URI, or short name of the accelerator type resource to expose to this instance. See Compute Engine AcceleratorTypes.

Examples:

  • https://www.googleapis.com/compute/v1/projects/[projectId]/zones/[zone]/acceleratorTypes/nvidia-tesla-t4
  • projects/[projectId]/zones/[zone]/acceleratorTypes/nvidia-tesla-t4
  • nvidia-tesla-t4

Auto Zone Exception: If you are using the Dataproc Auto Zone Placement feature, you must use the short name of the accelerator type resource, for example, nvidia-tesla-t4.

acceleratorCount

integer

The number of the accelerator cards of this type exposed to this instance.

InstanceFlexibilityPolicy

Instance flexibility policy that allows a mixture of VM shapes and provisioning models.

JSON representation
{
  "provisioningModelMix": {
    object (ProvisioningModelMix)
  },
  "instanceSelectionList": [
    {
      object (InstanceSelection)
    }
  ],
  "instanceSelectionResults": [
    {
      object (InstanceSelectionResult)
    }
  ]
}
Fields
provisioningModelMix

object (ProvisioningModelMix)

Optional. Defines how the Group selects the provisioning model to ensure required reliability.

instanceSelectionList[]

object (InstanceSelection)

Optional. List of instance selection options that the group will use when creating new VMs.

instanceSelectionResults[]

object (InstanceSelectionResult)

Output only. A list of instance selection results in the group.

ProvisioningModelMix

Defines how Dataproc should create VMs with a mixture of provisioning models.

JSON representation
{
  "standardCapacityBase": integer,
  "standardCapacityPercentAboveBase": integer
}
Fields
standardCapacityBase

integer

Optional. The base capacity that will always use Standard VMs to avoid the risk of more preemption than the minimum capacity you need. Dataproc will create only Standard VMs until it reaches standardCapacityBase, then it will start using standardCapacityPercentAboveBase to mix Spot with Standard VMs. For example, if 15 instances are requested and standardCapacityBase is 5, Dataproc will create 5 Standard VMs and then start mixing Spot and Standard VMs for the remaining 10 instances.

standardCapacityPercentAboveBase

integer

Optional. The percentage of target capacity that should use Standard VMs. The remaining percentage will use Spot VMs. The percentage applies only to the capacity above standardCapacityBase. For example, if 15 instances are requested, standardCapacityBase is 5, and standardCapacityPercentAboveBase is 30, Dataproc will create 5 Standard VMs and then start mixing Spot and Standard VMs for the remaining 10 instances. The mix will be 30% Standard and 70% Spot.
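
The arithmetic in the example can be worked through in a few lines. This sketch only reproduces the documented split; the function name split_provisioning is hypothetical. The round-half-up rule for percentages that do not divide evenly is an assumption, since this page does not specify rounding.

```python
def split_provisioning(requested: int,
                       standard_capacity_base: int,
                       standard_capacity_percent_above_base: int) -> tuple:
    """Return (standard_vms, spot_vms) for a requested group size."""
    # Capacity up to the base always uses Standard VMs.
    base_standard = min(requested, standard_capacity_base)
    above_base = requested - base_standard
    # Assumption: round half up when the percentage is not a whole VM count.
    standard_above = (above_base * standard_capacity_percent_above_base + 50) // 100
    standard = base_standard + standard_above
    return standard, requested - standard


# 15 requested, base 5, 30% above base: 5 + 3 Standard VMs and 7 Spot VMs.
print(split_provisioning(15, 5, 30))
```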

InstanceSelection

Defines machine types and a rank to which the machine types belong.

JSON representation
{
  "machineTypes": [
    string
  ],
  "rank": integer
}
Fields
machineTypes[]

string

Optional. Full machine-type names, e.g. "n1-standard-16".

rank

integer

Optional. Preference of this instance selection. A lower number means higher preference. Dataproc will first try to create a VM based on the machine type with the highest-priority rank and fall back to the next rank based on availability. Machine types and instance selections with the same priority have the same preference.
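
To see how rank orders an instanceSelectionList, the sketch below groups machine types into preference tiers, lowest rank first. It illustrates the ordering rule only and says nothing about how Dataproc picks within a tier. The helper name preference_tiers is hypothetical, and treating a missing rank as 0 is an assumption.

```python
from itertools import groupby


def preference_tiers(instance_selection_list: list) -> list:
    """Group machine types by rank; a lower rank means higher preference."""
    def rank_of(selection: dict) -> int:
        return selection.get("rank", 0)  # assumption: missing rank counts as 0

    ordered = sorted(instance_selection_list, key=rank_of)
    tiers = []
    for _, group in groupby(ordered, key=rank_of):
        # Selections that share a rank have the same preference, so merge them.
        tiers.append([machine_type
                      for selection in group
                      for machine_type in selection.get("machineTypes", [])])
    return tiers


selections = [
    {"machineTypes": ["n2-standard-8"], "rank": 1},
    {"machineTypes": ["n1-standard-16", "e2-standard-8"], "rank": 0},
    {"machineTypes": ["n2d-standard-8"], "rank": 1},
]
print(preference_tiers(selections))
```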

InstanceSelectionResult

Defines a mapping from machine types to the number of VMs that are created with each machine type.

JSON representation
{
  "machineType": string,
  "vmCount": integer
}
Fields
machineType

string

Output only. Full machine-type name, e.g. "n1-standard-16".

vmCount

integer

Output only. Number of VMs provisioned with the machineType.

StartupConfig

Configuration to handle the startup of instances during the cluster create and update process.

JSON representation
{
  "requiredRegistrationFraction": number
}
Fields
requiredRegistrationFraction

number

Optional. The config setting that allows cluster creation or update to succeed only after requiredRegistrationFraction of instances are up and running. This configuration currently applies only to secondary workers. The cluster will fail if requiredRegistrationFraction of instances are not available. This includes instance creation, agent registration, and service registration (if enabled).
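
The threshold can be read as a simple ratio check. The sketch below is illustrative only. The function name registration_requirement_met is hypothetical, and it assumes the fraction is a value between 0.0 and 1.0 compared against registered instances divided by requested instances.

```python
def registration_requirement_met(registered: int,
                                 total: int,
                                 required_registration_fraction: float) -> bool:
    """Return True if enough instances are up and registered (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0.0 <= required_registration_fraction <= 1.0:
        # Assumption: the fraction is expressed as a value in [0.0, 1.0].
        raise ValueError("required_registration_fraction must be in [0.0, 1.0]")
    return registered / total >= required_registration_fraction


# 10 secondary workers requested with requiredRegistrationFraction = 0.9
print(registration_requirement_met(9, 10, 0.9))
print(registration_requirement_met(8, 10, 0.9))
```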