Troubleshoot faulty host API

This document explains how to resolve issues that occur when you call the API for reporting a faulty node. It also documents some known issues for this API.

Known issues

This section describes known issues that you might encounter while using the faulty node API. When a problem occurs, your VMs might return FAILED_WITH_UNEXPECTED_STATUS or stay in a stopped state.

Passing empty faultReasons when invoking the API results in unsuccessful fault reporting

When you call the API, the faultReasons field is required. However, the API doesn't validate this field, so a call proceeds even if the field is left empty in the request. The request then gets stuck in the backend. Retrying the API call on the same VM instance fails with a duplicate request response message.

Workaround: If you omit the faultReasons parameter, cancel the API request by restarting the VM instance. After you restart the VM, ensure that you specify a value for faultReasons.
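
Because an empty faultReasons field can leave the request stuck, a client-side check before the request is sent might help. The following Python code is a minimal sketch of such a check. The helper name validate_fault_reasons is hypothetical, and only the field name faultReasons comes from this document. The sketch assumes that the request body is a dictionary and that faultReasons is a list, and it doesn't call the API.

```python
from typing import Any, Mapping


def validate_fault_reasons(body: Mapping[str, Any]) -> None:
    """Raise ValueError if the request body has no usable faultReasons.

    Hypothetical helper: only the field name faultReasons comes from the
    documentation. The check is client-side and does not call the API.
    """
    reasons = body.get("faultReasons")
    # Assumption: faultReasons is sent as a non-empty list of entries.
    if not isinstance(reasons, (list, tuple)) or len(reasons) == 0:
        raise ValueError("faultReasons is required and must not be empty")
    # Reject placeholder entries such as None, "" or {}.
    if any(not reason for reason in reasons):
        raise ValueError("faultReasons must not contain empty entries")
```

Calling validate_fault_reasons(body) immediately before you send the request lets the caller fail fast instead of restarting the VM afterward.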

Duplicate calls for the same VM cause the operation to fail

If you call the API more than once on the same VM instance before the previous call completes, the duplicate calls fail.

Workaround: Wait until the operation completes before you call the API again. If the operation appears stuck, restart the VM before you make another call to the API.
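
One way to avoid sending a duplicate call from your own tooling is to track which VMs already have a call in progress. The following Python code is a minimal sketch of that idea. The class InFlightGuard and its methods are hypothetical names, the tracking is in-memory and local to one process, and it doesn't query the API for the actual operation state.

```python
import threading


class InFlightGuard:
    """Tracks VMs that have a fault report in progress (client-side only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight = set()

    def try_begin(self, vm_name: str) -> bool:
        """Return True and mark the VM if no call is pending for it."""
        with self._lock:
            if vm_name in self._in_flight:
                return False
            self._in_flight.add(vm_name)
            return True

    def finish(self, vm_name: str) -> None:
        """Clear the mark once the operation completes or the VM restarts."""
        with self._lock:
            self._in_flight.discard(vm_name)
```

A caller would send the request only when try_begin returns True, and call finish after the operation completes or after the VM is restarted.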

Unable to cancel a call to the API

If an operation is stuck because of an issue with the request made to the API, there is no way to cancel the operation.

Workaround: Release the API request by restarting the VM instance. After you restart the VM, you can make a new call to the API.

reportHostAsFaulty operation fails but other operations succeed

The reportHostAsFaulty operation might fail after about 10 minutes with a RESOURCE_NOT_READY error even though all other operations succeed and the VM restarts on a new host.

Reason for this behavior: This can happen if you change the VM's maintenance policy (either the automaticRestart value or the onHostMaintenance value) without restarting the VM. Although the value is saved and the operation succeeds, some calls might return a RESOURCE_NOT_READY error message.

Error messages

The faulty node API might return one of the following error messages.

VM state not supported

Error message:

INSTANCE_SHOULD_BE_RUNNING

Resolution: Ensure that the VM is in a running state. This message is returned if the VM is in any other state.
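
A check of the VM state before the call can help avoid this error. The following Python code is a minimal sketch. It assumes that you have already fetched the instance resource as a dictionary and that its status field reports RUNNING for a running VM. The function name is_running is hypothetical.

```python
from typing import Any, Mapping


def is_running(instance: Mapping[str, Any]) -> bool:
    """Return True only if the instance resource reports status RUNNING.

    Assumption: the instance is a dict with a top-level "status" field.
    """
    return instance.get("status") == "RUNNING"
```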

Machine type incorrect

Error message:

MACHINE_TYPE_NOT_SUPPORTED

Resolution: This method is supported only on A3 High, A3 Mega, and A3 Ultra VMs that are part of a reserved block of capacity. Reserved blocks must be created for you by a technical account manager (TAM). Calling this API on any other VM instance fails with this error.

Fault reason is missing

Error message:

FAULT_REASONS_EMPTY_SHOULD_BE_SPECIFIED

Resolution: faultReasons is a required field. The faulty node API returns this error when you omit the faultReasons field. To fix this, restart the VM and specify a value for faultReasons.

VM not a part of a reservation block

Error message:

INSTANCES_WITHOUT_RESERVATION_NOT_SUPPORTED

Resolution: This method is only supported on A3 High, A3 Mega, and A3 Ultra VMs that are part of a reserved block of capacity. Reserved blocks must be created for you by a technical account manager (TAM). Running this method on a VM that is not part of a block reservation leads to this error.

Rate limit exceeded

Error message:

RATE_LIMIT_EXCEEDED

Resolution: You have exceeded the allowed limits for calling this method. Wait before you call the method again.
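
A common, generic way to handle rate limiting is to retry with exponential backoff and jitter. The following Python code is a minimal sketch of that technique, not guidance specific to this API: the actual limits and recommended wait times aren't stated in this document. The names RateLimitExceeded, backoff_delays, and call_with_backoff are hypothetical, and the sketch assumes that your own wrapper raises RateLimitExceeded when the API returns RATE_LIMIT_EXCEEDED.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitExceeded(Exception):
    """Hypothetical exception a caller raises on RATE_LIMIT_EXCEEDED."""


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Return the un-jittered delay before each retry: base * 2**n, capped."""
    return [min(cap, base * 2 ** n) for n in range(retries)]


def call_with_backoff(call: Callable[[], T], retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call, retrying after RateLimitExceeded with jittered delays."""
    for delay in backoff_delays(retries):
        try:
            return call()
        except RateLimitExceeded:
            # Full jitter: sleep a random time up to the computed delay.
            sleep(random.uniform(0, delay))
    # Final attempt after the last wait; any error propagates to the caller.
    return call()
```

The sleep parameter is injectable so that the retry logic can be exercised in tests without real waiting. The default values for base and cap are arbitrary placeholders.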