VM shutdowns and reboots can be caused by system events or admin activities. System event shutdowns and reboots are generated by Google systems or your VM's operating system. Admin activity shutdowns and reboots are generated by a user- or service account-generated API call. All shutdowns and reboots are logged, except for reboots that are initiated from within the VM.
Before you begin
- If you want to use the command-line examples in this guide, do the following:
Diagnosing VM shutdowns and reboots
To diagnose the cause of a VM's spontaneous shutdown or reboot, you must query
your VM's logs. After you query the logs, review the method and
principalEmail fields to determine what event and which user or service
initiated the shutdown or reboot.
Querying Cloud Audit Logs
Query Cloud Audit Logs to display a list of system events and admin activities that might have caused the shutdown or reboot.
Permissions required for this task
To perform this task, you must have the following permissions:
- The Logging/Logs Viewer role or the Project/Viewer role.
In the Google Cloud Console, go to the Logs explorer page.
In the Query builder field, enter the following query:
resource.type="gce_instance" "VM_NAME" logName:("logs/cloudaudit.googleapis.com%2Fsystem_event" OR "logs/cloudaudit.googleapis.com%2Factivity")
Replace VM_NAME with the name of the VM that shut down or rebooted.
If the shutdown or reboot happened more than an hour ago, set a custom time frame by clicking the clock symbol and entering a custom range.
Click Run query. The results are displayed in the Query results section.
Click the expander arrow next to each result to show detailed information.
Each result displays a method field and a principalEmail field, which show the methods and users responsible for shutdowns and reboots. Continue to Reviewing Cloud Audit Logs to learn more about the methods that cause shutdowns and reboots and what you can do to prevent them.
View system event and admin activity Cloud Audit Logs using the gcloud logging read command:
gcloud logging read --freshness=TIME 'resource.type="gce_instance" "VM_NAME" logName:("logs/cloudaudit.googleapis.com%2Fsystem_event" OR "logs/cloudaudit.googleapis.com%2Factivity")'
Replace the following:
TIME: the amount of time you want to query. For example, 1h queries log entries in the past hour. For information about date and time formats, see gcloud topic datetimes.
VM_NAME: the name of the VM that shut down or rebooted.
The output includes the methodName fields in the system event logs. Continue to Reviewing Cloud Audit Logs to learn more about the methods that cause shutdowns and reboots and what you can do to prevent them.
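The gcloud query above can be wrapped in small shell helpers so that the filter is built once and the output is trimmed to the fields of interest. The following is a minimal sketch, not an official tool: the function names build_shutdown_filter and read_shutdown_logs are hypothetical, and the field paths protoPayload.methodName and protoPayload.authenticationInfo.principalEmail assume the standard Cloud Audit Logs entry layout.

```shell
# Build the Cloud Audit Logs filter from this guide for one VM.
# $1: VM name (supplied by the caller)
build_shutdown_filter() {
  # %% prints a literal percent sign, so the output contains %2F as in the guide.
  printf 'resource.type="gce_instance" "%s" logName:("logs/cloudaudit.googleapis.com%%2Fsystem_event" OR "logs/cloudaudit.googleapis.com%%2Factivity")' "$1"
}

# Query the logs and print only the timestamp, method, and principal.
# $1: VM name, $2: freshness such as 1h
# Assumption: audit log entries expose the method and principal under protoPayload.
read_shutdown_logs() {
  gcloud logging read "$(build_shutdown_filter "$1")" \
    --freshness="$2" \
    --format='table(timestamp, protoPayload.methodName, protoPayload.authenticationInfo.principalEmail)'
}
```

For example, after defining the functions in your shell, `read_shutdown_logs VM_NAME 1h` queries the past hour for the named VM. Nothing runs until you call a function.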
Reviewing Cloud Audit Logs
Review the method and principalEmail fields of the Cloud Audit Logs to
determine why your VM was shut down or rebooted.
Review the method fields of the Cloud Audit Logs and compare them with the methods listed in the following table.
Method Shutdown type Description
If your VM belongs to a managed instance group (MIG), the MIG recreates the VM if the VM's state changes from RUNNING and the MIG did not initiate the change in state.
Changes of instance state that are not initiated by the MIG include:
A host error means that there was a hardware or software issue on the physical machine hosting your VM that caused your VM to crash. If your VM is set to automatically restart, which is the default setting, Google restarts your VM on a different physical machine, typically within three minutes from the time the error was detected. In cases with certain hardware issues, the attempt to restart your VM might get delayed by 5.5 minutes to 16.5 minutes.
Certain resources behave differently, such as local SSDs. If there is a host error, Compute Engine makes a best effort to reconnect to the VM and preserve the local SSD data, but if the underlying drive does not recover within 60 minutes, the VM restarts without the local SSD data. While Compute Engine is recovering your VM and local SSD, which can take up to 60 minutes, the host system and the underlying drive are unresponsive. For more information about how local SSD disks behave in the event of host errors, see Local SSD data persistence.
Physical hardware and software failures can happen occasionally but are rare occurrences. To protect your applications and services from these potentially disruptive system events, review the following resources:
System event Your VM's operating system initiated the shutdown.
If you set your VM's onHostMaintenance maintenance policy to TERMINATE, Compute Engine stops your VM when there is a maintenance event where Google must move your VM to another host.
If you want to change your VM's onHostMaintenance policy, see Updating options for an instance.
Compute Engine stopped your preemptible instance. Compute Engine always stops preemptible instances after they run for 24 hours.
If you require a VM that runs for longer periods of time, see Creating and starting a VM instance.
A user or service account stopped your VM.
Continue to the next step to identify the user or service account that stopped your VM. For information about restarting your VM, see Restarting a stopped instance.
A user or service account deleted your VM.
Continue to the next step to identify the user or service account that deleted your VM. For information about creating a new VM, see Creating and starting a VM.
A user or service account reset your VM.
Continue to the next step to identify the user or service account that reset your VM.
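The table above notes that a TERMINATE value for onHostMaintenance causes Compute Engine to stop the VM during host maintenance. As a rough sketch of how you might inspect and change that setting from the command line, the helpers below wrap gcloud compute instances describe and set-scheduling. The function names and the zone argument are illustrative assumptions. See Updating options for an instance for the authoritative procedure.

```shell
# Print the current onHostMaintenance value for a VM.
# $1: VM name, $2: zone (both supplied by the caller)
show_maintenance_policy() {
  gcloud compute instances describe "$1" \
    --zone="$2" \
    --format='value(scheduling.onHostMaintenance)'
}

# Set the maintenance policy for a VM.
# $1: VM name, $2: zone, $3: MIGRATE or TERMINATE
set_maintenance_policy() {
  gcloud compute instances set-scheduling "$1" \
    --zone="$2" \
    --maintenance-policy="$3"
}
```

For example, `set_maintenance_policy VM_NAME ZONE MIGRATE` asks Compute Engine to live migrate the VM during maintenance events instead of stopping it, where the VM's configuration supports migration.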
Review the principalEmail fields of the Cloud Audit Logs to identify the user or service that initiated the shutdown or reboot. The following table includes common Google-managed services that initiate shutdowns or reboots.
A system event caused the shutdown or reboot.
A Google-managed service account initiated the shutdown.
To determine which project the service initiated the shutdown from, review the service account's
To determine which Google service made the request, review the
If a user triggered the shutdown or reboot, their email address appears in the principalEmail field. For example,
Administrators can prevent users from changing the state of project VMs by changing Identity and Access Management permissions on user accounts. For more information, see Granting, changing, and revoking access to resources.
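One way to apply the guidance above is to remove a project-level role binding that lets a user stop, reset, or delete VMs. The sketch below wraps gcloud projects remove-iam-policy-binding. The function name is hypothetical, and which role to remove is an assumption that depends on how access was granted in your project.

```shell
# Remove one project-level IAM role binding from a user account.
# $1: project ID, $2: user email, $3: role to remove
# Assumption: the user holds the role directly at the project level,
# not through a group or a resource-level binding.
revoke_vm_role() {
  gcloud projects remove-iam-policy-binding "$1" \
    --member="user:$2" \
    --role="$3"
}
```

For example, `revoke_vm_role PROJECT_ID USER_EMAIL roles/compute.instanceAdmin.v1` removes the Compute Instance Admin (v1) role from that user. Check the project's IAM policy first to confirm which roles the user actually holds.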