Creating a PyTorch Deep Learning VM Instance

This topic provides instructions for creating a new Deep Learning VM instance with PyTorch and other tools pre-installed. You have the option of including one or more GPUs in your instance on setup.

Before you begin

If you plan to use GPUs with your Deep Learning VM, check the Quotas page to ensure that your project has enough GPUs available.

If GPUs are not listed on the quotas page or you require additional GPU quota, you can request a quota increase. See "Requesting additional quota" on the Compute Engine Resource Quotas page.
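You can also inspect a region's quotas from the command line with gcloud; GPU quotas appear as metrics such as NVIDIA_V100_GPUS. The region below is only an example:

```shell
# List all quotas for a region, one field per line.
# Replace us-west1 with the region you plan to use.
gcloud compute regions describe us-west1 --format="flattened(quotas[])"
```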

Creating a PyTorch Deep Learning VM instance from the Cloud Marketplace

Cloud Marketplace lets you quickly deploy functional software packages that run on Compute Engine. A Deep Learning VM with PyTorch can be created quickly from the Cloud Marketplace within the GCP Console without having to use the command line.

Without GPUs

To provision a Deep Learning VM instance without a GPU:

  1. Visit the Deep Learning Virtual Machine Image Cloud Marketplace page.
  2. Click Launch on Compute Engine.
  3. Enter a Deployment name, which will serve as the root of your VM name. Compute Engine appends -vm to this name when naming your instance.
  4. Set Framework to PyTorch and choose Zone.
  5. In the GPU section, set the number of GPUs to Zero and enter n/a in the Quota confirmation field.
  6. In the CPU section, select your Machine type. To learn more about machine types, see Machine Types.
  7. Select your boot disk type and size.
  8. Click Deploy.

Once the VM has been deployed, the page will update with instructions for accessing the instance.

With one or more GPUs

Compute Engine offers the option of adding GPUs to your virtual machine instances. GPUs offer faster processing for many complex data and machine learning tasks. To learn more about GPUs, see GPUs on Compute Engine.

To provision a Deep Learning VM instance with one or more GPUs:

  1. Visit the Deep Learning Virtual Machine Image Cloud Marketplace page.
  2. Click Launch on Compute Engine.
  3. Enter a Deployment name, which will serve as the root of your VM name. Compute Engine appends -vm to this name when naming your instance.
  4. Set Framework to PyTorch and choose Zone.
  5. Choose your GPU type. Not all GPU types are available in all zones; see the GPUs on Compute Engine page to confirm that your combination is supported.
  6. Choose the number of GPUs to deploy. Each GPU type supports different counts; see the GPUs on Compute Engine page to confirm that your combination is supported.
  7. An NVIDIA driver is required when using GPUs. You can install the driver yourself, or select the checkbox to have the latest stable driver installed automatically.
  8. Follow the instructions on the page to check your GPU quota, and enter the required phrase to confirm.
  9. In the CPU section, adjust your machine type as needed. For certain workflows, you may want to increase the number of cores (e.g. for CPU-heavy preprocessing) or the amount of memory (e.g. when using the CPU as a parameter store for distributed training).
  10. Click Deploy.

If you've elected to install NVIDIA drivers, allow 3-5 minutes for installation to complete.

Once the VM has been deployed, the page will update with instructions for accessing the instance.

Creating a PyTorch Deep Learning VM instance from the command line

To use the gcloud command-line tool to create a new Deep Learning VM instance, you must first install and initialize the Cloud SDK:

  1. Download and install the Cloud SDK using the instructions given on Installing Google Cloud SDK.
  2. Initialize the SDK using the instructions given on Initializing Cloud SDK.

To use gcloud in Cloud Shell, first activate Cloud Shell using the instructions given on Starting Cloud Shell.
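Once the SDK is set up, you can confirm that gcloud is available and pointed at the right project; for example:

```shell
# Print the installed SDK version and the active configuration
# (account, project, default region/zone).
gcloud --version
gcloud config list
```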

Without GPUs

To create a Deep Learning VM instance with the latest PyTorch image and CPUs only, enter the following at the command line:

export IMAGE_FAMILY="pytorch-latest-cpu"
export ZONE="us-west1-b"
export INSTANCE_NAME="my-instance"

gcloud compute instances create $INSTANCE_NAME \
  --zone=$ZONE \
  --image-family=$IMAGE_FAMILY \
  --image-project=deeplearning-platform-release

Options:

  • --image-family must be either pytorch-latest-cpu or pytorch-VERSION-cpu (for example, pytorch-0-4-cpu).

  • --image-project must be deeplearning-platform-release.
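After the command completes, you can confirm that the instance exists and check its state; for example, assuming the variables from the example above are still set:

```shell
# Show the new instance's status (e.g. RUNNING) and machine type.
gcloud compute instances describe $INSTANCE_NAME \
  --zone=$ZONE \
  --format="value(status, machineType)"
```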

With one or more GPUs

Compute Engine offers the option of adding one or more GPUs to your virtual machine instances. GPUs offer faster processing for many complex data and machine learning tasks. To learn more about GPUs, see GPUs on Compute Engine.

To create a Deep Learning VM instance with the latest PyTorch image and one or more attached GPUs, enter the following at the command line:

export IMAGE_FAMILY="pytorch-latest-cu92"
export ZONE="us-west1-b"
export INSTANCE_NAME="my-instance"

gcloud compute instances create $INSTANCE_NAME \
  --zone=$ZONE \
  --image-family=$IMAGE_FAMILY \
  --image-project=deeplearning-platform-release \
  --maintenance-policy=TERMINATE \
  --accelerator="type=nvidia-tesla-v100,count=1" \
  --metadata="install-nvidia-driver=True"

Options:

  • --image-family must be either pytorch-latest-cu92 or pytorch-VERSION-cu92 (for example, pytorch-0-4-cu92).

  • --image-project must be deeplearning-platform-release.

  • --maintenance-policy must be TERMINATE. To learn more, see GPU Restrictions.

  • --accelerator specifies the GPU type to use. Must be specified in the format --accelerator="type=TYPE,count=COUNT". Supported values of TYPE are:

    • nvidia-tesla-v100 (count=1 or 8)
    • nvidia-tesla-p100 (count=1, 2, or 4)
    • nvidia-tesla-p4 (count=1, 2, or 4)
    • nvidia-tesla-k80 (count=1, 2, 4, or 8)

    Not all GPU types are supported in all regions. For details, see GPUs on Compute Engine.

  • --metadata is used to specify that the NVIDIA driver should be installed on your behalf. The value is install-nvidia-driver=True. If specified, Compute Engine loads the latest stable driver on the first boot and performs the necessary steps (including a final reboot to activate the driver).
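For example, a variation of the command above that attaches two P100 GPUs (one of the supported combinations listed above) might look like this; the instance name is illustrative:

```shell
export ZONE="us-west1-b"

# Create a VM with two NVIDIA Tesla P100 GPUs and automatic
# driver installation on first boot.
gcloud compute instances create my-p100-instance \
  --zone=$ZONE \
  --image-family=pytorch-latest-cu92 \
  --image-project=deeplearning-platform-release \
  --maintenance-policy=TERMINATE \
  --accelerator="type=nvidia-tesla-p100,count=2" \
  --metadata="install-nvidia-driver=True"
```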

If you've elected to install NVIDIA drivers, allow 3-5 minutes for installation to complete.

It may take up to 5 minutes before your VM is fully provisioned. During this time, you will be unable to SSH into your machine. When the installation is complete, you can verify that the driver installation was successful by connecting over SSH and running nvidia-smi.
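One way to run nvidia-smi without opening an interactive session is to pass it through gcloud compute ssh, using the instance name and zone from the earlier example:

```shell
# Run nvidia-smi remotely; a table listing the attached GPUs
# indicates that the driver installed successfully.
gcloud compute ssh my-instance --zone=us-west1-b --command="nvidia-smi"
```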

When you've configured your instance, you can save an image of its boot disk so that you can start derivative instances without having to wait for the driver installation.
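As a sketch of one way to do this, you can create a custom image from the instance's boot disk. The image name is illustrative, and this assumes the boot disk kept its default name (which matches the instance name); the instance should be stopped first:

```shell
# Stop the instance so the disk is in a consistent state.
gcloud compute instances stop my-instance --zone=us-west1-b

# Save the boot disk as a reusable custom image.
gcloud compute images create my-pytorch-image \
  --source-disk=my-instance \
  --source-disk-zone=us-west1-b
```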

Creating a preemptible instance

You can create a preemptible Deep Learning VM instance. A preemptible instance is an instance you can create and run at a much lower price than normal instances. However, Compute Engine might terminate (preempt) these instances if it requires access to those resources for other tasks. Preemptible instances will always terminate after 24 hours. To learn more about preemptible instances, see Preemptible VM Instances.

To create a preemptible Deep Learning VM instance:

  • Follow the instructions above to create a new instance using the command line, and append the following to the gcloud compute instances create command:

      --preemptible
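For example, the CPU-only command from earlier becomes:

```shell
export IMAGE_FAMILY="pytorch-latest-cpu"
export ZONE="us-west1-b"
export INSTANCE_NAME="my-preemptible-instance"

# Identical to the non-preemptible example, plus --preemptible.
gcloud compute instances create $INSTANCE_NAME \
  --zone=$ZONE \
  --image-family=$IMAGE_FAMILY \
  --image-project=deeplearning-platform-release \
  --preemptible
```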

What's next

For instructions on connecting to your new Deep Learning VM instance through the GCP Console or command line, see Connecting to Instances. Your instance name is the Deployment name you specified with -vm appended.
