Known issues


This page describes known issues that you might run into while using Compute Engine. For issues that specifically affect Confidential VMs, see Confidential VM Known issues.

Price estimation unavailable when creating Z3 VMs

During Preview for Z3, there is no price estimate available in the Google Cloud console. To see the prices associated with a Z3 VM, see Google Cloud SKUs.

General issues

The following issues provide troubleshooting guidance or general information.

Creating reservations or future reservation requests using an instance template that specifies an A2, C3, or G2 machine type causes issues

If you use an instance template that specifies an A2, C3, or G2 machine type to create a reservation, or to create and submit a future reservation request for review, you encounter issues. Specifically:

  • Creating the reservation might fail. If it succeeds, then one of the following applies:

    • If you created an automatically consumed reservation (default), creating VMs with matching properties won't consume the reservation.

    • If you created a specific reservation, creating VMs to specifically target the reservation fails.

  • Creating the future reservation request succeeds. However, if you submit it for review, Google Cloud declines your request.

You can't replace the instance template used to create a reservation or future reservation request, or override the template's VM properties. If you want to reserve resources for A2, C3, or G2 machine types, do one of the following instead:

Limitations when using c3-standard-*-lssd and c3d-standard-*-lssd machine types with Google Kubernetes Engine

When you use the Google Kubernetes Engine API to provision a node pool with Local SSD attached, the number of Local SSD disks must match the number required by the selected C3 or C3D machine type. For example, a VM that uses c3-standard-8-lssd requires 2 Local SSD disks, whereas a VM that uses c3d-standard-8-lssd requires only 1. If the number of disks doesn't match, the Compute Engine control plane returns a Local SSD misconfiguration error. To select the correct number of Local SSD disks for your C3 or C3D lssd machine type, see the General-purpose machine family document.

Using the Google Kubernetes Engine pages of the Google Cloud console to create a cluster or node pool with c3-standard-*-lssd or c3d-standard-*-lssd VMs results in node creation failure or a failure to detect Local SSDs as ephemeral storage.
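
As a rough sketch of keeping the disk count in sync with the machine type, the following helper prints (but doesn't run) a gcloud command for creating a node pool. The function names, the pool and cluster names, and the choice of the --ephemeral-storage-local-ssd flag are assumptions made for illustration. Only the two machine types named above are mapped. Check the General-purpose machine family document for other sizes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Local SSD counts for the two machine types named in the text.
# Other lssd machine types are intentionally not filled in here.
expected_lssd_count() {
  case "$1" in
    c3-standard-8-lssd)  echo 2 ;;
    c3d-standard-8-lssd) echo 1 ;;
    *) echo "unknown machine type: $1" >&2; return 1 ;;
  esac
}

# Prints a node pool creation command whose Local SSD count matches the machine type.
node_pool_cmd() {
  local pool="$1" cluster="$2" machine_type="$3" count
  count=$(expected_lssd_count "$machine_type") || return 1
  printf 'gcloud container node-pools create %s --cluster=%s --machine-type=%s --ephemeral-storage-local-ssd count=%s\n' \
    "$pool" "$cluster" "$machine_type" "$count"
}

# Example with placeholder pool and cluster names.
node_pool_cmd example-pool example-cluster c3-standard-8-lssd
```

Deriving the count from the machine type in one place avoids the mismatch that triggers the misconfiguration error.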

Single flow TCP throughput variability on C3D VMs

C3D VMs larger than 30 vCPUs might experience single flow TCP throughput variability and occasionally be limited to 20-25 Gbps. To achieve higher rates, use multiple TCP flows.
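
One way to check throughput across several flows is a parallel-stream test with iperf3. The sketch below only builds the client command. It assumes that iperf3 is installed on both VMs and that a server is already listening (iperf3 -s) on the target. The helper name, the IP address, and the default of 8 flows are placeholders, not recommendations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an iperf3 client command that opens several
# parallel TCP streams (-P) to the server for 30 seconds (-t).
multi_flow_cmd() {
  local server="$1" flows="${2:-8}"
  printf 'iperf3 -c %s -P %s -t 30\n' "$server" "$flows"
}

# Example with a placeholder internal IP address.
multi_flow_cmd 10.128.0.2 8
```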

Managed instance group having T2D machine series doesn't autoscale as expected

Managed instance groups (MIGs) that have T2D machine series VMs in projects that were created before June 18, 2023, don't correctly detect CPU load on VMs in the MIG. In such projects, autoscaling based on CPU utilization in a MIG that has T2D machine series VMs might be incorrect.

To apply a fix to your project, contact Cloud Customer Care.

The CPU utilization observability metric is incorrect for VMs that use one thread per core

If your VM's CPU uses one thread per core, the CPU utilization Cloud Monitoring observability metric in the Compute Engine > VM instances > Observability tab only scales to 50%. Two threads per core is the default for all machine types, except Tau T2D. For more information, see Set number of threads per core.

To view your VM's CPU utilization normalized to 100%, view CPU utilization in Metrics Explorer instead. For more information, see Create charts with Metrics Explorer.

Google Cloud console SSH-in-browser connections might fail if you use custom firewall rules

If you use custom firewall rules to control SSH access to your VM instances, you might not be able to use the SSH-in-browser feature.

To work around this issue, do one of the following:

Downsizing or deleting specific reservations stops VMs from consuming other reservations

If you downsize or delete a specific reservation that was consumed by one or more VMs, the orphaned VMs cannot consume any reservations.

Learn more about deleting reservations and resizing reservations.

Moving VMs or disks using the moveInstance API or the gcloud CLI causes unexpected behavior

Moving virtual machine (VM) instances using the gcloud compute instances move command or the project.moveInstance method might cause data loss, VM deletion, or other unexpected behavior.

To move VMs, we recommend that you follow the instructions in Move a VM instance between zones or regions.

Disks attached to VMs with n2d-standard-64 machine types don't consistently reach performance limits

Persistent disks attached to VMs with n2d-standard-64 machine types don't consistently reach the maximum performance limit of 100,000 IOPS. This is the case for both read and write IOPS.

Temporary names for disks

During virtual machine (VM) instance updates initiated using the gcloud compute instances update command or the instances.update API method, Compute Engine might temporarily change the name of your VM's disks by adding one of the following suffixes to the original name:

  • -temp
  • -old
  • -new

Compute Engine removes the suffix and restores the original disk names as the update completes.
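
If a script reads disk names while an update is in progress, it might need to map a temporary name back to the original. The following is a minimal sketch. The function name is hypothetical, and it can't tell a temporary suffix apart from a disk whose real name happens to end in -temp, -old, or -new.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strips one trailing -temp, -old, or -new suffix, if present.
original_disk_name() {
  local name="$1"
  case "$name" in
    *-temp) printf '%s\n' "${name%-temp}" ;;
    *-old)  printf '%s\n' "${name%-old}" ;;
    *-new)  printf '%s\n' "${name%-new}" ;;
    *)      printf '%s\n' "$name" ;;
  esac
}

# Example with a placeholder disk name.
original_disk_name data-disk-1-temp
```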

Increased latency for some persistent disks caused by disk resizing

In some cases, resizing large persistent disks (~3 TB or larger) might be disruptive to the I/O performance of the disk. If you are impacted by this issue, your persistent disks might experience increased latency during the resize operation. This issue can impact persistent disks of any type.

Using MBR images with C3 VMs with Local SSD

C3 VMs created using c3-standard-44-lssd and larger machine types don't boot successfully with MBR images.

Able to attach unsupported PD-Standard and PD-Extreme disks to C3 and M3 VMs

Standard persistent disks (pd-standard) are the default boot disk type when using Google Cloud CLI or Compute Engine API. However, pd-standard disks aren't supported on C3 and M3 VMs. Additionally, C3 VMs don't support pd-extreme disks.

The following problems can occur when using Google Cloud CLI or Compute Engine API:

  • pd-standard is configured as the default boot disk type and the disk is created unless you specify a different, supported boot disk type, such as pd-balanced or pd-ssd.
  • Prior to the general availability (GA) of C3, you could attach pd-extreme disks to C3 VMs and pd-standard disks to C3 and M3 VMs.

If you created a C3 or M3 VM with an unsupported disk type, move your data to a new, supported disk type, as described in Change the type of your persistent disk. If you don't change the disk type, the VMs will continue working, but some operations such as disk detach and reattach will fail.

Workaround

To work around this issue, do one of the following:

  • Use the Google Cloud console to create C3 or M3 VMs and attach disks. The console creates C3 and M3 VMs with pd-balanced boot disks and doesn't allow attaching unsupported disk types to VMs.
  • If using Google Cloud CLI or Compute Engine API, explicitly configure a boot disk of type pd-balanced or pd-ssd when creating a VM.
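
The second workaround can be sketched as a small wrapper that refuses the disk types listed above as unsupported and always passes an explicit --boot-disk-type. The function names, VM name, and zone are placeholders. The check covers only the combinations named in this section and isn't a complete compatibility matrix. The script prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Returns non-zero for the combinations this section lists as unsupported:
# pd-standard on C3 and M3, and pd-extreme on C3.
boot_disk_type_ok() {
  local series="$1" disk_type="$2"
  case "$series:$disk_type" in
    c3:pd-standard|m3:pd-standard|c3:pd-extreme) return 1 ;;
    *) return 0 ;;
  esac
}

# Prints a VM creation command with an explicit boot disk type (default: pd-balanced).
create_vm_cmd() {
  local name="$1" zone="$2" machine_type="$3" disk_type="${4:-pd-balanced}"
  local series="${machine_type%%-*}"   # for example, "c3" from "c3-standard-4"
  if ! boot_disk_type_ok "$series" "$disk_type"; then
    echo "unsupported boot disk type $disk_type for $series" >&2
    return 1
  fi
  printf 'gcloud compute instances create %s --zone=%s --machine-type=%s --boot-disk-type=%s\n' \
    "$name" "$zone" "$machine_type" "$disk_type"
}

# Example with placeholder values.
create_vm_cmd example-vm us-central1-a c3-standard-4
```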

Your automated processes might fail if they use API response data about your resource-based commitment quotas

Your automated processes that consume and use API response data about your Compute Engine resource-based commitment quotas might fail if all of the following conditions apply. Your automated processes can include any snippets of code, business logic, or database fields that use or store the API responses.

  1. The response data is from any of the following Compute Engine API methods:

  2. You use an int instead of a number to define the field for your resource quota limit in your API response bodies. You can find the field in the following ways for each method:

  3. You have unlimited default quota available for any of your Compute Engine committed SKUs.

    For more information about quotas for commitments and committed SKUs, see Quotas for commitments and committed resources.

Root cause

When you have limited quota, if you define the items[].quotas[].limit or quotas[].limit field as an int type, the API response data for your quota limits might still fall within the range for int type and your automated process might not get disrupted. But when the default quota limit is unlimited, Compute Engine API returns a value for the limit field that falls outside of the range defined by int type. Your automated process can't consume the value returned by the API method and fails as a result.
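
To illustrate the range problem, the following sketch tests whether a limit value would fit in a signed 64-bit integer. The function name is hypothetical, and the sample value 9.223372036854776E18 is an assumed stand-in for an "unlimited" limit, not a value taken from the API reference. Because awk compares in double precision, the check is conservative: it also rejects 9,223,372,036,854,775,807 itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when the value is below 2^63
# (and not below -2^63), that is, when it fits in a signed 64-bit integer.
fits_int64() {
  awk -v v="$1" 'BEGIN { n = v + 0; exit !(n < 2^63 && n >= -(2^63)) }'
}

for limit in 1000 9.223372036854776E18; do
  if fits_int64 "$limit"; then
    echo "$limit fits in a 64-bit integer"
  else
    echo "$limit does not fit in a 64-bit integer; parse it as a number instead"
  fi
done
```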

How to work around this issue

You can work around this issue and continue generating your automated reports in the following ways:

  • Recommended: Follow the Compute Engine API reference documentation and use the correct data types for the API method definitions. Specifically, use the number type to define the items[].quotas[].limit and quotas[].limit fields for your API methods.

  • Decrease your quota limit to a value under 9,223,372,036,854,775,807. You must set quota caps for all projects that have resource-based commitments, across all regions. You can do this in one of the following ways:

Known issues for Linux VM instances

These are the known issues for Linux VMs.

RHEL 7 and CentOS VMs lose network access after reboot

CentOS and Red Hat Enterprise Linux (RHEL) 7 OS images that are provided by Google have predictable network interface names disabled by default.

However, if your CentOS or RHEL 7 VMs have multiple network interface cards (NICs) and one of these NICs doesn't use the VirtIO interface, then network access might be lost on reboot. This happens because RHEL doesn't support disabling predictable network interface names if at least one NIC doesn't use the VirtIO interface.

Resolution

Network connectivity can be restored by stopping and starting the VM until the issue resolves. Network connectivity loss can be prevented from reoccurring by doing the following:

  1. Edit the /etc/default/grub file and remove the kernel parameters net.ifnames=0 and biosdevname=0.

  2. Regenerate the grub configuration.

  3. Reboot the VM.
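
The edit to /etc/default/grub can be sketched as follows. The function name is hypothetical, and the demo runs against a temporary file rather than the real configuration. The grub2-mkconfig output path in the comments is an assumption that applies to BIOS-booted RHEL 7 and CentOS 7 systems. Verify the path for your image, and back up the file before changing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: removes net.ifnames=0 and biosdevname=0 (plus any
# whitespace directly before them) from the given grub defaults file.
remove_ifnames_params() {
  local grub_file="${1:-/etc/default/grub}"
  sed -i -E 's/[[:space:]]*(net\.ifnames=0|biosdevname=0)//g' "$grub_file"
}

# Demo on a temporary copy so that nothing on the system is modified.
demo=$(mktemp)
printf '%s\n' 'GRUB_CMDLINE_LINUX="crashkernel=auto net.ifnames=0 biosdevname=0 console=ttyS0"' > "$demo"
remove_ifnames_params "$demo"
cat "$demo"
rm -f "$demo"

# On the VM itself, as root (paths are assumptions, see above):
#   cp /etc/default/grub /etc/default/grub.bak
#   remove_ifnames_params /etc/default/grub
#   grub2-mkconfig -o /boot/grub2/grub.cfg
#   reboot
```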

Public Google Cloud SUSE images don't include the required udev configuration to create symlinks for C3 and C3D Local SSD devices.

Resolution

To add udev rules for SUSE and custom images, see Symlinks not created C3 and C3D with Local SSD.

repomd.xml signature couldn't be verified

On Red Hat Enterprise Linux (RHEL) or CentOS 7 based systems, you might see the following error when trying to install or update software using yum. This error shows that you have an expired or incorrect repository GPG key.

Sample log:

[root@centos7 ~]# yum update


...

google-cloud-sdk/signature                                                                  | 1.4 kB  00:00:01 !!!
https://packages.cloud.google.com/yum/repos/cloud-sdk-el7-x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for google-cloud-sdk
Trying other mirror.

...

failure: repodata/repomd.xml from google-cloud-sdk: [Errno 256] No more mirrors to try.
https://packages.cloud.google.com/yum/repos/cloud-sdk-el7-x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for google-cloud-sdk

Resolution

To fix this, disable repository GPG key checking in the yum repository configuration by setting repo_gpgcheck=0. In supported Compute Engine base images, this setting is typically found in the /etc/yum.repos.d/google-cloud.repo file. However, your VM might have this setting in different repository configuration files or automation tools.

Yum repositories don't usually use GPG keys for repository validation. Instead, the https endpoint is trusted.

To locate and update this setting, complete the following steps:

  1. Look for the setting in your /etc/yum.repos.d/google-cloud.repo file.

    cat /etc/yum.repos.d/google-cloud.repo
    
    
    [google-compute-engine]
    name=Google Compute Engine
    baseurl=https://packages.cloud.google.com/yum/repos/google-compute-engine-el7-x86_64-stable
    enabled=1
    gpgcheck=1
    repo_gpgcheck=1
    gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
       https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
    [google-cloud-sdk]
    name=Google Cloud SDK
    baseurl=https://packages.cloud.google.com/yum/repos/cloud-sdk-el7-x86_64
    enabled=1
    gpgcheck=1
    repo_gpgcheck=1
    gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
       https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
    
    
  2. Change all lines that say repo_gpgcheck=1 to repo_gpgcheck=0.

    sudo sed -i 's/repo_gpgcheck=1/repo_gpgcheck=0/g' /etc/yum.repos.d/google-cloud.repo
  3. Check that the setting is updated.

    cat /etc/yum.repos.d/google-cloud.repo
    
    [google-compute-engine]
    name=Google Compute Engine
    baseurl=https://packages.cloud.google.com/yum/repos/google-compute-engine-el7-x86_64-stable
    enabled=1
    gpgcheck=1
    repo_gpgcheck=0
    gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
       https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
    [google-cloud-sdk]
    name=Google Cloud SDK
    baseurl=https://packages.cloud.google.com/yum/repos/cloud-sdk-el7-x86_64
    enabled=1
    gpgcheck=1
    repo_gpgcheck=0
    gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
       https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
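
Because the setting can also live in other repository configuration files, it might help to list every file that still enables it. The following is a minimal sketch. The function name is hypothetical, and it looks only at *.repo files in one directory, not at settings managed by automation tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the .repo files in a directory that still
# contain a repo_gpgcheck=1 line. Defaults to /etc/yum.repos.d.
list_repo_gpgcheck_enabled() {
  local dir="${1:-/etc/yum.repos.d}"
  grep -lE '^[[:space:]]*repo_gpgcheck[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$dir"/*.repo 2>/dev/null
}

list_repo_gpgcheck_enabled || echo "no repository files with repo_gpgcheck=1 found"
```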
    

Instances using OS Login return a login message after connection

On some instances that use OS Login, you might receive the following error message after the connection is established:

/usr/bin/id: cannot find name for group ID 123456789

Resolution

Ignore the error message.

Known issues for Windows VM instances

  • Instances running Windows 11, Version 22H2 fail to boot. Use Windows 11 version 21H2 until this issue is resolved.
  • Support for NVMe on Windows using the Community NVMe driver is in Beta, and the performance might not match that of Linux instances. The Community NVMe driver has been replaced with the Microsoft StorNVMe driver in Google Cloud public images. We recommend that you replace the NVMe driver on VMs created before May 2022 and use the Microsoft StorNVMe driver instead.
  • After you create an instance, you cannot connect to it instantly. All new Windows instances use the System preparation (sysprep) tool to set up your instance, which can take 5 to 10 minutes to complete.
  • Windows Server images cannot activate without a network connection to kms.windows.googlecloud.com and stop functioning if they don't initially authenticate within 30 days. Software activated by the KMS must reactivate every 180 days, but the KMS attempts to reactivate every 7 days. Make sure to configure your Windows instances so that they remain activated.
  • Kernel software that accesses non-emulated model specific registers will generate general protection faults, which can cause a system crash depending on the guest operating system.

Errors when measuring NTP time drift using w32tm on Windows VMs

For Windows VMs on Compute Engine running VirtIO NICs, there is a known bug where measuring NTP drift produces errors when using the following command:

w32tm /stripchart /computer:metadata.google.internal

The errors appear similar to the following:

Tracking metadata.google.internal [169.254.169.254:123].
The current time is 11/6/2023 6:52:20 PM.
18:52:20, d:+00.0007693s o:+00.0000285s  [                  *                  ]
18:52:22, error: 0x80072733
18:52:24, d:+00.0003550s o:-00.0000754s  [                  *                  ]
18:52:26, error: 0x80072733
18:52:28, d:+00.0003728s o:-00.0000696s  [                  *                  ]
18:52:30, error: 0x80072733
18:52:32, error: 0x80072733

This bug only impacts Compute Engine VMs with VirtIO NICs. VMs that use gVNIC don't encounter this issue.

To avoid this issue, Google recommends using other NTP drift measuring tools, such as the Meinberg Time Server Monitor.

Moving a Windows VM to a third generation machine series causes boot issues

If you move a Windows VM to a third generation machine series (for example, C3 or H3) from a first or second generation machine series (for example, N1 or N2), the VM will fail to boot when you restart it.

To work around this issue, do the following:

  1. Confirm that the boot disk of the VM you want to upgrade is compatible with third generation machine types by running the gcloud compute disks describe command:

    gcloud compute disks describe DISK_NAME --zone=ZONE
    

    Replace the following:

    • DISK_NAME: the name of the boot disk
    • ZONE: the zone of the disk

    The output must contain the following to use the boot disk with a third generation machine series:

    guestOsFeatures:
    ...
    - type: GVNIC
    - type: WINDOWS
    
  2. Stop the VM that you want to upgrade.

  3. Detach the VM's boot disk.

  4. Use the Google Cloud console to create a Windows VM with the following properties:

    • Zone: the same zone as the original VM
    • Boot disk: the original VM's boot disk
    • Machine series: a third generation machine series
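
The compatibility check in step 1 can be scripted by piping the describe output through a small filter. This is only a sketch. The function name is hypothetical, and it looks for the two "- type:" lines anywhere in the output rather than strictly under guestOsFeatures.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `gcloud compute disks describe` output on stdin and
# succeeds only if both the GVNIC and WINDOWS guest OS features are listed.
has_third_gen_features() {
  local features
  features=$(cat)
  printf '%s\n' "$features" | grep -Eq '^[[:space:]]*-[[:space:]]*type:[[:space:]]*GVNIC[[:space:]]*$' &&
    printf '%s\n' "$features" | grep -Eq '^[[:space:]]*-[[:space:]]*type:[[:space:]]*WINDOWS[[:space:]]*$'
}

# Demo with sample output. On a real project, pipe the command instead:
#   gcloud compute disks describe DISK_NAME --zone=ZONE | has_third_gen_features
if printf 'guestOsFeatures:\n- type: GVNIC\n- type: WINDOWS\n' | has_third_gen_features; then
  echo "boot disk lists GVNIC and WINDOWS"
else
  echo "boot disk is missing a required guest OS feature"
fi
```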

Poor networking throughput when using gVNIC

Windows Server 2022 and Windows 11 VMs that use gVNIC driver GooGet package version 1.0.0@44 or earlier might experience poor networking throughput when using Google Virtual NIC (gVNIC).

To resolve this issue, update the gVNIC driver GooGet package to version 1.0.0@45 or later by doing the following:

  1. Check which driver version is installed on your VM by running the following command from an administrator Command Prompt or PowerShell session:

    googet installed
    

    The output looks similar to the following:

    Installed packages:
      ...
      google-compute-engine-driver-gvnic.x86_64 VERSION_NUMBER
      ...
    
  2. If the google-compute-engine-driver-gvnic.x86_64 driver version is 1.0.0@44 or earlier, update the GooGet package repository by running the following command from an administrator Command Prompt or PowerShell session:

    googet update
    

Limited bandwidth with gVNIC on Microsoft Windows with C3 and C3D VMs

On Windows operating systems, the gVNIC driver does not reach the documented bandwidth limits. The gVNIC driver can achieve up to 85 Gbps of network bandwidth on C3 and C3D VMs running Microsoft Windows, for both the default network and per VM Tier_1 networking performance.

Replace the NVMe driver on VMs created before May 2022

If you want to use NVMe on a VM that uses Microsoft Windows, and the VM was created prior to May 1, 2022, you must update the existing NVMe driver in the Guest OS to use the Microsoft StorNVMe driver.

You must update the NVMe driver on your VM before you change the machine type to a third generation machine series, or before creating a boot disk snapshot that will be used to create new VMs that use a third generation machine series.

Use the following commands to install the StorNVMe driver package and remove the community driver, if it's present in the guest OS.

googet update
googet install google-compute-engine-driver-nvme

Lower performance for Local SSD on Microsoft Windows with C3 and C3D VMs

Local SSD performance is limited for C3 and C3D VMs running Microsoft Windows.

Performance improvements are in progress.

Lower IOPS performance for Hyperdisk Extreme on Microsoft Windows with C3 and M3 VMs

Hyperdisk Extreme performance is limited on Microsoft Windows VMs.

Performance improvements are in progress.

C3D 180 and 360 vCPU machine types don't support Windows OS images

C3D 180 vCPU machine types don't support Windows Server 2012 and 2016 OS images. C3D VMs created with 180 vCPUs and Windows Server 2012 and 2016 OS images will fail to boot. To work around this issue, select a smaller machine type or use another OS image.

C3D VMs created with 360 vCPUs and Windows OS images will fail to boot. To work around this issue, select a smaller machine type or use another OS image.

Generic disk error on Windows Server 2016 and 2012 R2 for M3, C3, and C3D VMs

The ability to add or resize a Hyperdisk or Persistent Disk for a running M3, C3, or C3D VM doesn't work as expected on specific Windows guests at this time. Windows Server 2012 R2 and Windows Server 2016, and their corresponding non-server Windows variants, don't respond correctly to the disk attach and disk resize commands.

For example, removing a disk from a running M3 VM disconnects the disk from a Windows Server instance without the Windows operating system recognizing that the disk is gone. Subsequent writes to the disk return a generic error.

Resolution

You must restart the M3, C3, or C3D VM running on Windows after modifying a Hyperdisk or Persistent Disk for the disk modifications to be recognized by these guests.