Optimizing GPU performance

You can use the following options to improve the performance of GPUs on virtual machine (VM) instances:

Disabling autoboost

When you use the autoboost feature with NVIDIA® Tesla® K80 GPUs, the system automatically adjusts clock speeds to find the optimal rate for a given application. However, constantly adjusting clock speeds can also reduce the performance of your GPUs. For more information about autoboost, see Increase Performance with GPU Boost and K80 Autoboost.

We recommend that you disable autoboost when running NVIDIA® Tesla® K80 GPUs on Compute Engine.

To disable autoboost on instances with NVIDIA® Tesla® K80 GPUs attached, run the following command:

sudo nvidia-smi --auto-boost-default=DISABLED

The output might resemble the following:

All done.
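
To confirm the change, one option is to look for the auto boost fields in the full nvidia-smi query output. The following is a minimal sketch, not an official procedure: the helper name show_autoboost is hypothetical, and it assumes that your driver version reports lines containing "Auto Boost" in the output of nvidia-smi -q.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the lines that mention auto boost.
# Reads nvidia-smi -q style text on stdin.
show_autoboost() {
  grep -i 'auto boost'
}

# Run the read-only check only when nvidia-smi is present on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi -q | show_autoboost || echo "No auto boost fields reported."
fi
```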

Setting GPU clock speed to the maximum frequency

To set GPU clock speed to the maximum frequency on instances with NVIDIA® Tesla® K80 GPUs attached, run the following command, which sets the memory clock to 2505 MHz and the graphics clock to 875 MHz:

sudo nvidia-smi --applications-clocks=2505,875
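
If you prefer not to hard-code the clock pair, the following sketch reads the maximum clocks that each GPU reports and applies them per GPU. The helper names to_app_clocks and set_max_clocks are hypothetical. The sketch assumes that your driver supports the clocks.max.memory and clocks.max.graphics query fields and that the reported maximums are valid application clocks; you can check the accepted pairs with nvidia-smi -q -d SUPPORTED_CLOCKS. Nothing is changed unless you pass apply as the first argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn one CSV fragment such as "2505, 875" into the
# "MEMORY,GRAPHICS" form that --applications-clocks expects.
to_app_clocks() {
  printf '%s\n' "$1" | tr -d ' '
}

# Apply the reported maximum clocks to every GPU, one GPU index at a time.
set_max_clocks() {
  nvidia-smi --query-gpu=index,clocks.max.memory,clocks.max.graphics \
             --format=csv,noheader,nounits |
  while IFS=',' read -r index mem gfx; do
    clocks="$(to_app_clocks "${mem},${gfx}")"
    # </dev/null keeps sudo from consuming the rows piped into the loop.
    sudo nvidia-smi -i "${index}" --applications-clocks="${clocks}" </dev/null
  done
}

# Opt-in: clocks are only modified when the script is called with "apply".
if [ "${1:-}" = "apply" ]; then
  set_max_clocks
fi
```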

Using network bandwidths of up to 100 Gbps

Creating VM instances that use higher bandwidths

You can use higher network bandwidths to improve the performance of distributed workloads on VM instances running on Compute Engine that use NVIDIA® Tesla® T4 or V100 GPUs.

For more information about the network bandwidths that are supported for your GPU instances, see Network bandwidths and GPUs.

To create a VM instance with attached GPUs and a network bandwidth of up to 100 Gbps:

  1. Review the minimum CPU, GPU, and memory configuration required to get the maximum bandwidth available.
  2. Create your VM instance with attached T4 or V100 GPUs. For instructions, see Adding or removing GPUs.

    The image that you use to create the VM instance must have the Compute Engine virtual network interface (gVNIC) installed. For more information about creating VM instances that support the Compute Engine virtual network interface, see Creating VM instances that use the Compute Engine virtual network interface.

    Alternatively, you can use the tf-latest-gpu-gvnic image from the Google Deep Learning VM image catalog, which already has the GPU driver, ML software, and the Compute Engine network driver preinstalled.

    For example, to create a VM instance named test-instance that has a maximum bandwidth of 100 Gbps, has eight V100 GPUs attached, and uses the deep learning VM image, run the following command:

    gcloud compute instances create test-instance \
       --custom-cpu 96 \
       --custom-memory 624 \
       --image-project=deeplearning-platform-release \
       --image-family=tf-latest-gpu-gvnic \
       --accelerator type=nvidia-tesla-v100,count=8 \
       --maintenance-policy TERMINATE \
       --metadata="install-nvidia-driver=True" \
       --boot-disk-size 200GB \
       --zone=us-central1-f
    
  3. After you create the VM instance, you can verify the network bandwidth.

Checking network bandwidth

When working with high-bandwidth GPU instances, you can use a network traffic tool, such as iperf2, to measure the network bandwidth.

To check the bandwidth, you need at least two VM instances that have attached GPUs and that both support the bandwidth that you are testing.

To measure the network bandwidth, complete the following steps:

  1. On one VM instance, run the following command:

    iperf -s

  2. On another VM instance, run the following command. Replace server_dns_or_internal_ip with the DNS name or internal IP address of the VM instance where you ran the command in the previous step.

    iperf -c server_dns_or_internal_ip -P 16 -t 30
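
With -P 16, iperf2 prints one line per stream followed by a [SUM] line with the aggregate. To capture only the aggregate figure, a small filter like the following may help. The helper name sum_bandwidth is hypothetical, and the sketch assumes the default iperf2 report format, in which the last two fields of the [SUM] line are the rate and its unit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the rate and unit from the last [SUM] line
# of an iperf2 client report read on stdin. Prints nothing if no [SUM]
# line is present (for example, when only one stream was used).
sum_bandwidth() {
  awk '/^\[SUM\]/ { rate = $(NF-1); unit = $NF }
       END { if (unit != "") print rate, unit }'
}

# Usage (run on the client VM):
#   iperf -c server_dns_or_internal_ip -P 16 -t 30 | sum_bandwidth
```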

When you use the maximum available bandwidth of 100 Gbps, keep the following considerations in mind:

  • Due to header overheads for protocols such as Ethernet, IP, and TCP on the virtualization stack, the throughput, as measured by netperf, saturates at around 90 Gbps.

    TCP is able to achieve the 100-Gbps network speed. Other protocols, such as UDP, are currently slower.

  • Due to factors such as protocol overhead and network congestion, end-to-end performance of data streams might be slightly lower than 100 Gbps.

  • You need to use multiple TCP streams to achieve maximum bandwidth between VM instances. Google recommends 4–16 streams. With 16 streams, you frequently reach the maximum throughput. Depending on your application and software stack, you might need to adjust settings or your code to set up multiple streams.

  • The 100-Gbps network bandwidth can be achieved only in one direction. When traffic flows in both directions, expect the sum of transmitted (TX) and received (RX) traffic to be roughly 100 Gbps.
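
To see how throughput changes with the number of TCP streams, you could repeat the client test across the recommended range. The following sketch only prints the commands unless you pass run as the second argument. The function names and the choice of 4, 8, and 16 streams are illustrative assumptions, not part of any official procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the iperf2 client command for a given
# server and number of parallel TCP streams.
iperf_client_cmd() {
  printf 'iperf -c %s -P %s -t 30\n' "$1" "$2"
}

# Print (or, with "run", execute) one 30-second test per stream count.
sweep_streams() {
  local server="$1" mode="${2:-print}" streams
  for streams in 4 8 16; do
    if [ "${mode}" = "run" ]; then
      iperf -c "${server}" -P "${streams}" -t 30
    else
      iperf_client_cmd "${server}" "${streams}"
    fi
  done
}

# Examples:
#   sweep_streams server_dns_or_internal_ip        # print the commands only
#   sweep_streams server_dns_or_internal_ip run    # run the tests
```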
