Monitoring GPU performance on Windows VMs


To make better use of your resources, you can track the GPU utilization of your virtual machine (VM) instances.

When you know your GPU utilization, you can perform tasks such as setting up managed instance groups that autoscale resources.

To review GPU metrics using Cloud Monitoring, complete the following steps:

  1. On each VM, set up the GPU metrics reporting script. This script installs the GPU metrics reporting agent. The agent runs at intervals on the VM to collect GPU data and sends that data to Cloud Monitoring.
  2. On each VM, run the script.
  3. On each VM, set the GPU metrics reporting agent to start automatically on boot.
  4. View the metrics in Cloud Monitoring.

Set up the GPU metrics reporting script

Requirements

On each of your VMs, check that you meet the following requirements:

Download the script

Open a PowerShell terminal as an administrator and use the Invoke-WebRequest command to download the script.

Invoke-WebRequest is available in PowerShell 3.0 and later. We recommend that you use Ctrl+V to paste the copied code blocks.

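# Create a folder for the script and switch to it (-Force avoids an error if the folder already exists)
mkdir C:\google-scripts -Force
cd C:\google-scripts
# Download the GPU metrics reporting script from GitHub
Invoke-WebRequest -Uri https://raw.githubusercontent.com/GoogleCloudPlatform/compute-gpu-monitoring/main/windows/gce-gpu-monitoring-cuda.ps1 -OutFile gce-gpu-monitoring-cuda.ps1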
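
Invoke-WebRequest requires PowerShell 3.0 or later. Before you continue, it can help to confirm the PowerShell version and check that the download actually produced a file. The following is a minimal sketch. The helper names Test-MinimumPowerShell and Test-DownloadedScript are hypothetical and are not part of the monitoring script.

```powershell
# Hypothetical helper: returns $true when the PowerShell major version is at least 3,
# the minimum version that provides Invoke-WebRequest.
function Test-MinimumPowerShell {
    param([int] $Major = $PSVersionTable.PSVersion.Major)
    return $Major -ge 3
}

# Hypothetical helper: returns $true when the downloaded script exists and is not empty.
function Test-DownloadedScript {
    param([string] $Path = 'C:\google-scripts\gce-gpu-monitoring-cuda.ps1')
    # Missing file (or a folder at that path) means the download did not succeed
    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) { return $false }
    # An empty file also indicates a failed or interrupted download
    return (Get-Item -LiteralPath $Path).Length -gt 0
}
```

After the download, `Test-MinimumPowerShell` and `Test-DownloadedScript` should both return True.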

Run the script

cd c:\google-scripts
.\gce-gpu-monitoring-cuda.ps1
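
While the script is running, you can cross-check the values it reports against the NVIDIA driver's own reading. This sketch assumes that the NVIDIA driver is installed and that nvidia-smi is on the PATH. The function names ConvertFrom-NvidiaSmiUtilization and Get-LocalGpuUtilization are hypothetical helpers, not part of the monitoring script.

```powershell
# Hypothetical helper: parses output of
#   nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits
# where each line looks like "0, 37" (GPU index, utilization in percent).
function ConvertFrom-NvidiaSmiUtilization {
    param([string[]] $Line)
    foreach ($entry in $Line) {
        # Skip blank lines, such as a trailing newline
        if ([string]::IsNullOrWhiteSpace($entry)) { continue }
        $fields = $entry.Split(',') | ForEach-Object { $_.Trim() }
        $parsed = 0
        # Some GPUs report "[N/A]" instead of a number; emit $null in that case
        $percent = if ([int]::TryParse($fields[1], [ref]$parsed)) { $parsed } else { $null }
        [pscustomobject]@{
            Gpu                = [int]$fields[0]
            UtilizationPercent = $percent
        }
    }
}

# Hypothetical helper: reads the current utilization of each GPU from nvidia-smi.
function Get-LocalGpuUtilization {
    # Assumes nvidia-smi, which ships with the NVIDIA driver, is on the PATH.
    # Arguments are quoted so that PowerShell does not treat the commas as array separators.
    $raw = nvidia-smi '--query-gpu=index,utilization.gpu' '--format=csv,noheader,nounits'
    ConvertFrom-NvidiaSmiUtilization -Line $raw
}
```

For example, run `Get-LocalGpuUtilization` from a second PowerShell window to list the current utilization, in percent, of each GPU. Parsing lives in its own function so that it can be checked against sample output on a machine without a GPU.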

Configure the agent to automatically start on boot

To make sure that the GPU metrics reporting agent runs on system boot, use the following commands to add the agent to Windows Task Scheduler.

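# Run the task when the system starts, with no time limit on the trigger
$Trigger = New-ScheduledTaskTrigger -AtStartup
$Trigger.ExecutionTimeLimit = "PT0S"
# Run the task as the built-in SYSTEM account
$User = "NT AUTHORITY\SYSTEM"
# Start PowerShell and run the GPU metrics reporting script
$Action = New-ScheduledTaskAction -Execute "PowerShell.exe" -Argument "C:\google-scripts\gce-gpu-monitoring-cuda.ps1"
$settingsSet = New-ScheduledTaskSettingsSet
# Set the Execution Time Limit to unlimited on all versions of Windows Server
$settingsSet.ExecutionTimeLimit = 'PT0S'
# Register the task as "MonitoringGPUs", replacing any existing task with that name (-Force)
Register-ScheduledTask -TaskName "MonitoringGPUs" -Trigger $Trigger -User $User -Action $Action -Force -Settings $settingsSet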
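
You can query Task Scheduler to confirm that the task was registered without rebooting. This sketch wraps the ScheduledTasks cmdlets in a hypothetical Get-MonitoringTaskStatus function. The optional -StartNow switch runs the task immediately. If the script you ran in the previous step is still running, this may start a second copy of the agent.

```powershell
# Hypothetical helper: reports the state of the scheduled task registered above.
function Get-MonitoringTaskStatus {
    param(
        [string] $TaskName = 'MonitoringGPUs',
        # Run the task now instead of waiting for the next boot
        [switch] $StartNow
    )
    # Fails with an error if no task with this name is registered
    $task = Get-ScheduledTask -TaskName $TaskName -ErrorAction Stop
    if ($StartNow) {
        Start-ScheduledTask -TaskName $TaskName
    }
    $info = Get-ScheduledTaskInfo -TaskName $TaskName
    [pscustomobject]@{
        TaskName       = $task.TaskName
        # Query again so that the state reflects a task started by -StartNow
        State          = (Get-ScheduledTask -TaskName $TaskName).State
        LastRunTime    = $info.LastRunTime
        LastTaskResult = $info.LastTaskResult
    }
}
```

For example, `Get-MonitoringTaskStatus` should report a task named MonitoringGPUs. After the next reboot, LastRunTime should show when the agent started.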

Review metrics in Cloud Monitoring

  1. In the Google Cloud Console, go to the Metrics Explorer page.

  2. In the Resource type drop-down, select VM instance.

  3. In the Metric drop-down, type custom/instance/gpu/utilization.

    Your GPU utilization should resemble the following output:

    [Figure: GPU utilization chart in Cloud Monitoring]
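
You can also read the same data programmatically through the Cloud Monitoring API v3 timeSeries.list method. This sketch rests on several assumptions:

- The metric type behind custom/instance/gpu/utilization is custom.googleapis.com/instance/gpu/utilization.
- The Google Cloud CLI is installed and authenticated.
- You supply your own project ID.

The function names New-GpuUtilizationQueryUri and Get-GpuUtilizationPoints are hypothetical.

```powershell
# Hypothetical helper: builds a Cloud Monitoring API v3 timeSeries.list URL
# for the GPU utilization metric over a time window.
function New-GpuUtilizationQueryUri {
    param(
        [Parameter(Mandatory)] [string] $ProjectId,
        [Parameter(Mandatory)] [datetime] $StartTime,
        [Parameter(Mandatory)] [datetime] $EndTime,
        # Assumed metric type for custom/instance/gpu/utilization
        [string] $MetricType = 'custom.googleapis.com/instance/gpu/utilization'
    )
    # RFC 3339 timestamps in UTC, formatted independently of the current culture
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    $format = "yyyy-MM-dd'T'HH:mm:ss'Z'"
    $filter = 'metric.type = "{0}"' -f $MetricType
    $query = @(
        'filter=' + [uri]::EscapeDataString($filter)
        'interval.startTime=' + [uri]::EscapeDataString($StartTime.ToUniversalTime().ToString($format, $culture))
        'interval.endTime=' + [uri]::EscapeDataString($EndTime.ToUniversalTime().ToString($format, $culture))
    ) -join '&'
    "https://monitoring.googleapis.com/v3/projects/$ProjectId/timeSeries?$query"
}

# Hypothetical helper: returns the GPU utilization points written in the last $Minutes minutes.
function Get-GpuUtilizationPoints {
    param(
        [Parameter(Mandatory)] [string] $ProjectId,
        [int] $Minutes = 30
    )
    $end = (Get-Date).ToUniversalTime()
    $start = $end.AddMinutes(-$Minutes)
    $uri = New-GpuUtilizationQueryUri -ProjectId $ProjectId -StartTime $start -EndTime $end
    # Obtain an OAuth 2.0 access token for the account that the Google Cloud CLI is using
    $token = (gcloud auth print-access-token | Select-Object -First 1).Trim()
    $response = Invoke-RestMethod -Uri $uri -Headers @{ Authorization = "Bearer $token" }
    # If no data was written in the window, timeSeries is absent and the loop does nothing
    foreach ($series in $response.timeSeries) {
        foreach ($point in $series.points) {
            # The value type of the custom metric is assumed: read a double first, then an int64
            $value = $point.value.doubleValue
            if ($null -eq $value) { $value = $point.value.int64Value }
            [pscustomobject]@{
                InstanceId = $series.resource.labels.instance_id
                EndTime    = $point.interval.endTime
                Value      = $value
            }
        }
    }
}
```

For example, `Get-GpuUtilizationPoints -ProjectId my-project -Minutes 60` lists the points written in the last hour, where my-project is a placeholder for your project ID. URL construction is kept in its own function so that the query can be checked without credentials.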

What's next?