You can use instance templates to create managed instance groups with GPUs added to each instance. Managed instance groups use the template to create multiple identical instances. You can scale the number of instances in the group to match your workload.
Because every instance in the group must have the CUDA toolkit and NVIDIA driver installed, create an instance template for GPU instances as follows:
- Create an instance that has attached GPUs.
- Install a GPU driver on the instance.
- Create an image from the boot disk of the instance that has the attached GPUs and the installed driver.
- Use the image to create an instance template.
- Use the template to create an instance group.
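The first three steps can be sketched with gcloud as shown here. This is a minimal sketch, not an official script: the instance name gpu-instance-1, the image name my-gpu-image, the zone us-east1-d, and the Ubuntu base image are hypothetical placeholders. Driver installation depends on the operating system and is not shown; the sketch only checks afterwards that nvidia-smi works. It also assumes the boot disk has the same name as the instance, which is the gcloud default. The hypothetical run helper prints each command (without shell quoting) unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders; override them through environment variables.
ZONE="${ZONE:-us-east1-d}"
INSTANCE="${INSTANCE:-gpu-instance-1}"
IMAGE="${IMAGE:-my-gpu-image}"
FAMILY="${FAMILY:-my-image}"

# Print the command instead of running it, unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: create an instance with one attached GPU.
# GPU instances must use the TERMINATE maintenance policy.
create_gpu_instance() {
  run gcloud compute instances create "$INSTANCE" \
    --zone "$ZONE" \
    --machine-type n1-standard-2 \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --maintenance-policy TERMINATE \
    --image-family ubuntu-2204-lts \
    --image-project ubuntu-os-cloud \
    --boot-disk-size 250GB
}

# Step 2 runs on the VM itself; afterwards, confirm the driver is loaded.
verify_driver() {
  run gcloud compute ssh "$INSTANCE" --zone "$ZONE" --command nvidia-smi
}

# Step 3: stop the instance so the disk is not being written to,
# then create an image (in the family used by the template) from its boot disk.
create_gpu_image() {
  run gcloud compute instances stop "$INSTANCE" --zone "$ZONE" &&
    run gcloud compute images create "$IMAGE" \
      --source-disk "$INSTANCE" \
      --source-disk-zone "$ZONE" \
      --family "$FAMILY"
}
```

Call create_gpu_instance, install the driver on the VM, then call verify_driver and create_gpu_image. Printing by default lets you review every command before running it against a real project with DRY_RUN=0.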
Before you begin
- If you want to use the command-line examples in this guide, do the following:
  - Install or update to the latest version of the Google Cloud CLI.
  - Set a default region and zone.
- If you want to use the API examples in this guide, set up API access.
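Setting the default region and zone can be sketched with gcloud config set. The values us-east1 and us-east1-d are hypothetical; choose a region and zone that offer the GPU model you need. The hypothetical run helper prints each command unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical defaults; override them through environment variables.
REGION="${REGION:-us-east1}"
ZONE="${ZONE:-us-east1-d}"

# Print the command instead of running it, unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Store the defaults in the active gcloud configuration so later
# commands can omit --region and --zone.
set_defaults() {
  run gcloud config set compute/region "$REGION"
  run gcloud config set compute/zone "$ZONE"
}
```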
Creating an instance template
For steps to create an instance template, see Creating instance templates.
Console
To create the instance template using the Google Cloud console, make the following customizations:
- Specify the machine type.
- Specify the image name and family of the custom image that you created from the instance with attached GPUs and installed drivers.
For more information about using custom images, see Using custom or public images in your instance templates.
gcloud
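To create the instance template using gcloud compute instance-templates create, include the --accelerator and --maintenance-policy TERMINATE flags. The TERMINATE policy is required because instances with attached GPUs can't live migrate during host maintenance events.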
The following example creates an instance template with 2 vCPUs, a 250 GB boot disk based on your image (with drivers installed), and an NVIDIA K80 GPU. Replace my-image with the image family of your custom image, and replace my-project with the ID of the project that contains that image.
gcloud compute instance-templates create gpu-template \
    --machine-type n1-standard-2 \
    --boot-disk-size 250GB \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --image-family my-image \
    --image-project my-project \
    --maintenance-policy TERMINATE \
    --restart-on-failure
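Before building a group on the template, you might confirm that the GPU and maintenance settings were stored. This is a sketch that assumes a global instance template named gpu-template and uses a yaml(...) format projection over the template's properties.guestAccelerators and properties.scheduling fields. The hypothetical run helper prints the command unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical template name; override through the environment.
TEMPLATE="${TEMPLATE:-gpu-template}"

# Print the command instead of running it, unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Show only the accelerator and scheduling (maintenance policy) settings.
describe_gpu_settings() {
  run gcloud compute instance-templates describe "$TEMPLATE" \
    --format "yaml(properties.guestAccelerators,properties.scheduling)"
}
```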
Creating an instance group
After you create the template, use it to create an instance group. Every time an instance is added to the group, the group creates that instance from the settings in the instance template.
If you are creating a regional managed instance group, select only zones that support the GPU model that you want. For a list of GPU models and available zones, see GPUs on Compute Engine.
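One way to check which zones in a region offer a GPU model is to list accelerator types with a filter. This sketch assumes the model nvidia-tesla-k80 and the region us-east1 (both hypothetical defaults), matches the zone with a regular-expression filter, and prints only the zone names via the basename() projection. The hypothetical run helper prints the command (without shell quoting) unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical GPU model and region; override through the environment.
GPU_TYPE="${GPU_TYPE:-nvidia-tesla-k80}"
REGION="${REGION:-us-east1}"

# Print the command instead of running it, unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# List the zones in $REGION that offer $GPU_TYPE, one zone name per line.
list_gpu_zones() {
  run gcloud compute accelerator-types list \
    --filter "name=$GPU_TYPE AND zone~$REGION" \
    --format "value(zone.basename())"
}
```

The zones this prints are candidates for the --zones flag of a regional managed instance group.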
The following example creates a regional managed instance group across two zones that support the nvidia-tesla-k80 model.
gcloud compute instance-groups managed create example-rmig \
    --template gpu-template \
    --base-instance-name example-instances \
    --size 30 \
    --zones us-east1-c,us-east1-d
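After the group exists, you can list its instances and change its target size; each added instance is created from gpu-template. This sketch assumes the group example-rmig lives in region us-east1, which is where its zones us-east1-c and us-east1-d are. The hypothetical run helper prints each command unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical group name and region; override through the environment.
GROUP="${GROUP:-example-rmig}"
REGION="${REGION:-us-east1}"

# Print the command instead of running it, unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Show the instances in the regional group and their status.
list_group_instances() {
  run gcloud compute instance-groups managed list-instances "$GROUP" \
    --region "$REGION"
}

# Set the group's target size; requires exactly one argument (the new size).
resize_group() {
  if [ "$#" -ne 1 ]; then
    echo "usage: resize_group SIZE" >&2
    return 1
  fi
  run gcloud compute instance-groups managed resize "$GROUP" \
    --size "$1" --region "$REGION"
}
```

For example, resize_group 40 grows the group from 30 to 40 instances, subject to GPU quota in the selected zones.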
What's next?
- Learn more about GPU platforms.
- To learn more about managing and scaling groups of instances, see Set the group's target size.
- To monitor GPU performance, see Monitoring GPU performance.
- To handle GPU host maintenance, see Handling GPU host maintenance events.
- To optimize GPU performance, see Optimizing GPU performance.