Troubleshoot faulty host API

This document explains how to resolve issues when reporting a faulty host. It also documents some known issues for this API.

Known issues

This section describes known issues that you might encounter when using the faulty host API. When one of these issues occurs, your VMs might report FAILED_WITH_UNEXPECTED_STATUS, or they might remain in a stopped state.

Duplicate calls for the same VM cause the operation to fail

If you call the API more than once on the same VM instance before the previous calls complete, the duplicate calls fail.

Workaround: Wait until the operation completes before you run the call again. If the operation appears to be stuck, restart the VM before you make another call to the API.
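One way to follow this workaround in client code is to track in-flight reports per VM and poll each operation until it finishes before issuing another call. The following Python sketch assumes that you supply a `get_status` callable that returns the operation's `status` field (Compute Engine operations report `PENDING`, `RUNNING`, or `DONE`). The `wait_for_operation` function and `InFlightGuard` class are hypothetical helpers, not part of any Google Cloud client library, and the poll interval and timeout are illustrative values.

```python
import time
from typing import Callable, Set


def wait_for_operation(
    get_status: Callable[[], str],
    poll_interval_s: float = 10.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll an operation until its status is DONE, or raise TimeoutError."""
    waited = 0.0
    while True:
        status = get_status()
        if status == "DONE":
            return
        if waited >= timeout_s:
            # Treat this as a stuck operation: restart the VM before calling again.
            raise TimeoutError(f"operation still {status} after {waited:.0f}s")
        sleep(poll_interval_s)
        waited += poll_interval_s


class InFlightGuard:
    """Refuses a second report for a VM while an earlier one is still pending."""

    def __init__(self) -> None:
        self._pending: Set[str] = set()

    def start(self, vm_name: str) -> None:
        if vm_name in self._pending:
            raise RuntimeError(f"a report for {vm_name} is already in progress")
        self._pending.add(vm_name)

    def finish(self, vm_name: str) -> None:
        self._pending.discard(vm_name)
```

In practice, you would call `guard.start(vm_name)` before sending the request, wait on the returned operation with `wait_for_operation`, and call `guard.finish(vm_name)` in a `finally` block so that a failed operation doesn't leave the VM marked as pending.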

INTERNAL_ERROR returned while the reportHostAsFaulty operation is in progress

If a VM is deleted while the reportHostAsFaulty operation is in progress, the operation might fail and return an INTERNAL_ERROR.

Workaround: No workaround is available. To avoid this issue, ensure that the operation completes before you delete a VM.

Error messages

The faulty host API might return one of the following error messages.

VM state not supported

The following error occurs when the VM is in an unsupported state.

Error message:

INSTANCE_SHOULD_BE_RUNNING

Resolution: Ensure that the VM is in a RUNNING state.

Machine type incorrect

The following error occurs when the VM has an unsupported machine type.

Error message:

MACHINE_TYPE_NOT_SUPPORTED

Resolution: The faulty host API is supported only for VMs with A4 or A3 Ultra machine types.

VM not a part of a reservation

The following error occurs when the VM isn't part of a reservation.

Error message:

INSTANCES_WITHOUT_RESERVATION_NOT_SUPPORTED

Resolution: The faulty host API is supported only for VMs that use the reservation-bound provisioning model.

If you want to report a faulty host for A4 or A3 Ultra VMs that use other provisioning models, contact your Google Cloud account team.
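Before you call the API, you can check the VM against the three preceding conditions so that these errors surface locally. The following Python sketch works on an instance resource represented as a dictionary, such as the JSON returned by `instances.get`. The machine-type prefixes `a4-highgpu-` and `a3-ultragpu-` and the `scheduling.provisioningModel` value `RESERVATION_BOUND` are assumptions about how A4, A3 Ultra, and reservation-bound VMs are represented; confirm them against the Compute Engine API reference. The API's own response remains authoritative.

```python
from typing import Any, Dict, List

# Assumed machine-type name prefixes for A4 and A3 Ultra VMs.
SUPPORTED_MACHINE_TYPE_PREFIXES = ("a4-highgpu-", "a3-ultragpu-")
# Assumed provisioningModel value for reservation-bound VMs.
RESERVATION_BOUND_MODEL = "RESERVATION_BOUND"


def preflight_errors(instance: Dict[str, Any]) -> List[str]:
    """Return the error codes this VM would likely trigger, in document order."""
    errors: List[str] = []

    # INSTANCE_SHOULD_BE_RUNNING: the VM must be in the RUNNING state.
    if instance.get("status") != "RUNNING":
        errors.append("INSTANCE_SHOULD_BE_RUNNING")

    # machineType is a URL; the machine-type name is its last path segment.
    machine_type = instance.get("machineType", "").rsplit("/", 1)[-1]
    if not machine_type.startswith(SUPPORTED_MACHINE_TYPE_PREFIXES):
        errors.append("MACHINE_TYPE_NOT_SUPPORTED")

    # INSTANCES_WITHOUT_RESERVATION_NOT_SUPPORTED: VM must be reservation-bound.
    provisioning_model = instance.get("scheduling", {}).get("provisioningModel")
    if provisioning_model != RESERVATION_BOUND_MODEL:
        errors.append("INSTANCES_WITHOUT_RESERVATION_NOT_SUPPORTED")

    return errors
```

An empty list means that none of these three errors is expected; it doesn't guarantee that the call succeeds.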

Fault reason is missing

The following error occurs when the request doesn't specify any fault reasons in the faultReasons field, which is required.

Error message:

FAULT_REASONS_EMPTY_SHOULD_BE_SPECIFIED

Resolution: Restart the VM, and then call the API again with a value specified for faultReasons.
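A client-side check can catch a missing or empty faultReasons field before the request is sent. In this Python sketch, `build_report_body` is a hypothetical helper. The `behavior` and `description` keys inside each fault reason, and the `PERFORMANCE` value in the example, are assumptions about the request schema; check the API reference for the exact fields and allowed values.

```python
import json
from typing import Dict, List


def build_report_body(fault_reasons: List[Dict[str, str]]) -> str:
    """Serialize a reportHostAsFaulty request body, rejecting an empty faultReasons list."""
    if not fault_reasons:
        # The API rejects this case with FAULT_REASONS_EMPTY_SHOULD_BE_SPECIFIED.
        raise ValueError("faultReasons is required and must not be empty")
    return json.dumps({"faultReasons": fault_reasons})


if __name__ == "__main__":
    # Assumed field names and behavior value; verify against the API reference.
    print(build_report_body([{"behavior": "PERFORMANCE", "description": "GPU throughput dropped"}]))
```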

Rate limit exceeded

The following error occurs when you exceed the allowed limits for calling this method, or when Google doesn't have enough capacity to fulfill the request.

Error message:

RATE_LIMIT_EXCEEDED

Resolution: Wait, and then retry the request.
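A common way to handle rate-limit errors is to retry with exponential backoff and jitter. The following Python sketch assumes that your client code raises a hypothetical `RateLimitExceeded` exception when the API returns RATE_LIMIT_EXCEEDED; the attempt count and delays are illustrative defaults, not documented limits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitExceeded(Exception):
    """Hypothetical exception your client raises when the API returns RATE_LIMIT_EXCEEDED."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Invoke `call`, retrying with exponential backoff and jitter on rate-limit errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitExceeded:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # The delay doubles on each attempt (1s, 2s, 4s, ...) up to max_delay_s.
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Jitter in [0.5, 1.0] * delay spreads out retries from many clients.
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

Retrying is intended for calls that the API rejected outright. Don't use it to re-send a call whose operation is still in progress, because duplicate calls for the same VM fail.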