Use core dumps to analyze the causes of an unresponsive virtual machine (VM) instance.
To collect core dumps on Compute Engine, you must configure your
VMs to receive a Non-Maskable Interrupt (NMI) signal, and then run a
SendDiagnosticInterrupt
command to prompt a kernel panic or blue screen in
your operating system. A kernel panic or blue screen starts core dump
collection by the guest operating system. You can then use these core dumps for
debugging, especially in scenarios that are hard to reproduce, such as
a kernel freeze.
Before you begin
- If you want to use the command-line examples in this guide, do the following:
- Install or update to the latest version of the Google Cloud CLI.
- Set a default region and zone.
- If you want to use the API examples in this guide, set up API access.
- Sending NMI signals counts toward the default Queries API quota. For more information, see API rate limits.
Overview
To use core dumps to help debug an unresponsive VM or a security issue, you need to complete the following steps:
- Set the required IAM permissions
- Configure your VM to generate core dumps
- Send an NMI signal to generate core dumps
- Review the core dumps
Limitations
For VMs that have Secure Boot enabled, you must disable Secure Boot before you send an NMI signal. For instructions, see Modifying Shielded VM options on a VM instance.
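As a rough sketch of the Secure Boot check, the following helper functions wrap the gcloud CLI. The function names and the `GCLOUD` override variable are illustrative and not part of any Google tooling. The sketch assumes that the VM must be stopped before its Shielded VM options can be changed, so verify the flags against the Shielded VM instructions before you rely on it.

```shell
# GCLOUD can be overridden (for example, GCLOUD=echo) to preview the commands.
GCLOUD="${GCLOUD:-gcloud}"

# Print the value of the VM's enableSecureBoot field.
# Arguments: VM_NAME ZONE
secure_boot_status() {
  "$GCLOUD" compute instances describe "$1" --zone="$2" \
    --format="value(shieldedInstanceConfig.enableSecureBoot)"
}

# Stop the VM, turn off Secure Boot, and start the VM again.
# Arguments: VM_NAME ZONE
disable_secure_boot() {
  "$GCLOUD" compute instances stop "$1" --zone="$2" &&
    "$GCLOUD" compute instances update "$1" --zone="$2" --no-shielded-secure-boot &&
    "$GCLOUD" compute instances start "$1" --zone="$2"
}
```

Setting `GCLOUD=echo` before calling either function prints the commands instead of running them, which is a low-risk way to review what would be sent.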
Permissions
To send NMI signals to a VM, you need the compute.instances.sendDiagnosticInterrupt
permission on your user or service account.
You can also use a predefined role. To find predefined roles that contain this permission, see Compute Engine IAM Roles.
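If no predefined role fits, one option is a custom role that contains only this permission. The following is a minimal sketch, not a recommended policy: the function names, the `GCLOUD` override variable, and the role title are hypothetical, and the role ID and member are values that you supply.

```shell
# GCLOUD can be overridden (for example, GCLOUD=echo) to preview the commands.
GCLOUD="${GCLOUD:-gcloud}"

# Create a project-level custom role that holds only the NMI permission.
# Arguments: PROJECT_ID ROLE_ID
create_nmi_role() {
  "$GCLOUD" iam roles create "$2" --project="$1" \
    --title="Send diagnostic interrupt" \
    --permissions=compute.instances.sendDiagnosticInterrupt
}

# Bind the custom role to a principal, such as user:alice@example.com.
# Arguments: PROJECT_ID ROLE_ID MEMBER
grant_nmi_role() {
  "$GCLOUD" projects add-iam-policy-binding "$1" \
    --member="$3" --role="projects/$1/roles/$2"
}
```

For example, `create_nmi_role PROJECT_ID nmiSender` followed by `grant_nmi_role PROJECT_ID nmiSender user:alice@example.com` would create the role and grant it to a single user.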
Configure VM
A VM's response to an NMI signal depends on the VM's operating system configuration.
Each operating system writes its core dump logs in a different location. For
example, on Ubuntu operating systems, the crash dump file is saved to
/var/crash/
by default.
To configure your guest OS to generate a crash dump when an NMI signal is received, review the documentation for the supported operating system.
Operating system | Links to instructions | Additional notes |
---|---|---|
Ubuntu | Ubuntu: Kernel crash dump | For Linux VMs, you must configure the kernel to crash when it receives the NMI signal. To configure the kernel to crash, add the following to your sysctl configuration file (for example, /etc/sysctl.conf): kernel.unknown_nmi_panic=1 |
SUSE Linux Enterprise Server (SLES) | Configure crashkernel memory for kernel core dump analysis | |
Red Hat Enterprise Linux (RHEL) | Use both of the following documents: | |
Container-Optimized OS (COS) | Enabling Kernel Crash Dump on GCE COS Instances | Only COS 93 and later support kdump generation using NMI signal. |
Windows | Generate a kernel or complete crash dump | Windows client VMs don't keep memory dump files unless they are members of an AD domain or the following is true: For more information, see Kernel dump storage and clean up behavior in Windows 7 |
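For Linux guests, the `kernel.unknown_nmi_panic=1` setting from the table above could be applied with a sysctl drop-in file, as in the following sketch. The function name and the default file path `/etc/sysctl.d/99-nmi-panic.conf` are assumptions chosen for illustration. The sketch covers only the NMI panic setting, so crash dump collection itself (for example, kdump) still needs to be set up by following the linked instructions for your distribution.

```shell
# Write the NMI panic setting to a sysctl configuration file.
# Argument: path of the file to write (the default path is an assumption).
enable_nmi_panic() {
  conf_file="${1:-/etc/sysctl.d/99-nmi-panic.conf}"
  printf 'kernel.unknown_nmi_panic=1\n' > "$conf_file"
}

# On the VM, as root, write the file and reload all sysctl settings:
#   enable_nmi_panic && sysctl --system
# Then confirm the running value:
#   sysctl kernel.unknown_nmi_panic
```

Writing a separate drop-in file rather than editing `/etc/sysctl.conf` keeps the change easy to find and remove later.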
Send NMI to generate core dumps
After you configure the VM, you can send the NMI signal to the VM by using either the Google Cloud CLI or the Compute Engine API.
gcloud
To send the NMI signal, use the
instances send-diagnostic-interrupt
command.
gcloud compute instances send-diagnostic-interrupt VM_NAME --zone=ZONE
Replace the following:
- VM_NAME: the instance ID or name of the VM that you want to collect core dumps from
- ZONE: the zone where your VM is located
The output is similar to the following:
<Empty Response>
For a complete list of outputs, see the "NMI command responses" section in this document.
API
Optional: If you don't already have an API key, create one. For more information, see Creating an API key.
To send the NMI signal, make a POST request to the sendDiagnosticInterrupt method.
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/sendDiagnosticInterrupt?key=API_KEY
For example, you can use the curl command to make the request as follows:
curl --request POST 'https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/sendDiagnosticInterrupt?key=API_KEY' \
  --header "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  --header 'Accept: application/json' \
  --compressed
Replace the following:
- PROJECT_ID: the ID of the project that contains the VM
- ZONE: the zone where your VM is located
- VM_NAME: the instance ID or name of the VM that you want to collect core dumps from
- API_KEY: your API key
The output is similar to the following:
<Empty Response>
For a complete list of outputs, see the next section in this document about "NMI command responses".
NMI command responses
One of the following responses is returned when you attempt to send an NMI signal.
State | Body | Notes |
---|---|---|
SUCCESS | <Empty Response> | SUCCESS shows that the NMI signal is delivered to the operating system. It does not guarantee that the core dump is collected, or that the VM shuts down or reboots. These behaviors are determined by the operating system configuration. |
FAIL | UNSUPPORTED_OPERATION | This occurs when the operating system fails to receive the NMI signal. There are multiple reasons for this. Common scenarios are that the VM is being live migrated or the VM is not properly configured to receive NMI signals. To resolve this, you can try the following: |
FAIL | Required 'compute.instances.sendDiagnosticInterrupt' permission for [..] | The command failed because the user making the request does not have sufficient permissions. To resolve this, you can assign a role to the user that contains the compute.instances.sendDiagnosticInterrupt permission. |
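The responses in the table above can be told apart in a script. The following is a minimal sketch with a hypothetical function name. It assumes that the gcloud command exits with a nonzero status on failure and that its error text contains the strings shown in the Body column, neither of which is confirmed by this document.

```shell
# Classify the result of an NMI request from its exit status and error text.
# Arguments: EXIT_STATUS ERROR_TEXT
classify_nmi_response() {
  if [ "$1" -eq 0 ]; then
    echo "SUCCESS: NMI delivered; check the guest for a core dump"
    return 0
  fi
  case "$2" in
    *UNSUPPORTED_OPERATION*)
      echo "FAIL: the guest did not receive the NMI" ;;
    *compute.instances.sendDiagnosticInterrupt*)
      echo "FAIL: missing compute.instances.sendDiagnosticInterrupt permission" ;;
    *)
      echo "FAIL: unrecognized error" ;;
  esac
  return 1
}

# Possible usage, capturing only the error stream of the command:
#   err=$(gcloud compute instances send-diagnostic-interrupt VM_NAME \
#     --zone=ZONE 2>&1 >/dev/null)
#   classify_nmi_response "$?" "$err"
```

Even for the SUCCESS case, the script can only report that the signal was delivered. Whether a core dump was written still has to be checked inside the guest.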
Review core dumps
Review the crash dump file in the configured or default location for your operating system.
For example, on Ubuntu operating systems, the crash dump file is saved to
/var/crash/ by default.
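To locate the newest dump on a Linux guest, a small helper such as the following might be enough. The function name is hypothetical, and `/var/crash` is used as the default only because it is the Ubuntu location mentioned above. Reading that directory can require root privileges.

```shell
# Print the path of the most recently modified entry in a crash directory.
# Argument: directory to inspect (defaults to /var/crash).
latest_crash_dump() {
  crash_dir="${1:-/var/crash}"
  # ls -t sorts by modification time, newest first; head keeps the first entry.
  newest=$(ls -t "$crash_dir" 2>/dev/null | head -n 1)
  if [ -n "$newest" ]; then
    echo "$crash_dir/$newest"
  else
    echo "no crash dumps found in $crash_dir" >&2
    return 1
  fi
}
```

Run `latest_crash_dump` on the VM after the guest has rebooted from the kernel panic, or pass a different directory if your operating system is configured to store dumps elsewhere.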