Collecting core dumps


Use core dumps to analyze the causes of an unresponsive virtual machine (VM) instance.

To collect core dumps on Compute Engine, you configure the guest operating system of your VM to respond to a Non-Maskable Interrupt (NMI) signal. You then send the VM a SendDiagnosticInterrupt request, which triggers a kernel panic or, on Windows, a blue screen. The kernel panic or blue screen causes the guest operating system to collect a core dump. You can then use these core dumps for debugging, especially in scenarios that are hard to reproduce, such as a kernel freeze.

Before you begin

Overview

To use core dumps to debug an unresponsive VM or a security issue, complete the following steps:

  1. Set the required IAM permissions
  2. Configure your VM to generate core dumps
  3. Send an NMI signal to generate core dumps
  4. Review the core dumps

Limitations

If your VM has Secure Boot enabled, you must disable Secure Boot before you send an NMI signal. For instructions, see Modifying Shielded VM options on a VM instance.
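As a minimal sketch, the gcloud steps for turning off Secure Boot might look like the following. It assumes that Shielded VM options can be changed only while the VM is stopped, so the hypothetical helper disable_secure_boot stops the VM, clears the option with --no-shielded-secure-boot, and starts the VM again. VM_NAME and ZONE are placeholders.

```shell
# Hypothetical helper: disable Secure Boot on a Shielded VM.
# Shielded VM options can only be modified while the VM is stopped,
# so each step runs only if the previous one succeeded.
disable_secure_boot() {
  local vm="$1" zone="$2"
  gcloud compute instances stop "$vm" --zone="$zone" &&
    gcloud compute instances update "$vm" --zone="$zone" --no-shielded-secure-boot &&
    gcloud compute instances start "$vm" --zone="$zone"
}

# Example usage:
#   disable_secure_boot VM_NAME ZONE
```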

Permissions

To send NMI signals to a VM, your user account or service account must be granted the compute.instances.sendDiagnosticInterrupt permission.

You can also use a predefined role. To find predefined roles that contain this permission, see Compute Engine IAM Roles.
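If no predefined role fits, one option is a custom role that holds only this permission. The following is a sketch that assumes you are allowed to create roles and change the project's IAM policy; the helper name grant_nmi_permission and the default role ID nmiSender are hypothetical.

```shell
# Hypothetical helper: create a project-level custom role that contains only
# compute.instances.sendDiagnosticInterrupt, then grant it to a member
# such as user:alice@example.com or serviceAccount:SA_EMAIL.
grant_nmi_permission() {
  local project="$1" member="$2" role_id="${3:-nmiSender}"
  gcloud iam roles create "$role_id" \
      --project="$project" \
      --title="Send diagnostic interrupt" \
      --permissions=compute.instances.sendDiagnosticInterrupt &&
    gcloud projects add-iam-policy-binding "$project" \
      --member="$member" \
      --role="projects/$project/roles/$role_id"
}

# Example usage:
#   grant_nmi_permission PROJECT_ID user:alice@example.com
```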

Configure VM

How a VM responds to an NMI signal depends on the configuration of its guest operating system.

Each operating system writes its core dumps to a different location. For example, on Ubuntu, the crash dump file is saved to /var/crash/ by default.

To configure your guest OS to generate a crash dump when it receives an NMI signal, see the documentation for your operating system:

  • Ubuntu: Ubuntu: Kernel crash dump
    For Linux VMs, you must also configure the kernel to panic when it receives the NMI signal. To do this, add the following line to your sysctl configuration (for example, /etc/sysctl.conf or a file in /etc/sysctl.d/):
    kernel.unknown_nmi_panic=1
  • SUSE Linux Enterprise Server (SLES): Configure crashkernel memory for kernel core dump analysis
  • Red Hat Enterprise Linux (RHEL): Use both of the following documents:
  • Container-Optimized OS (COS): Enabling Kernel Crash Dump on GCE COS Instances
  • Windows: Generate a kernel or complete crash dump
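For an Ubuntu guest, the configuration might look like the following sketch. It assumes you run it as root on the VM itself. The package linux-crashdump pulls in kdump-tools, and a reboot is needed afterwards so that the crash-kernel memory reservation takes effect. The file name 99-nmi-panic.conf and the helper enable_nmi_panic are hypothetical choices.

```shell
# Step 1 (run once; the installer may ask debconf questions; reboot afterwards):
#   apt-get install -y linux-crashdump
#
# Step 2: persist kernel.unknown_nmi_panic=1 so that an unknown NMI makes the
# kernel panic, which lets kdump capture a dump. Any *.conf file in
# /etc/sysctl.d/ is read at boot.
enable_nmi_panic() {
  local conf_file="${1:-/etc/sysctl.d/99-nmi-panic.conf}"
  # Write the setting to a file so that it survives reboots, then load it
  # into the running kernel with sysctl -p.
  printf 'kernel.unknown_nmi_panic=1\n' > "$conf_file" && sysctl -p "$conf_file"
}
#
# Step 3 (after the reboot): confirm that kdump is ready.
#   kdump-config show
```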

Send NMI to generate core dumps

After you configure the VM, send the NMI signal to the VM by using either the gcloud command-line tool or the Compute Engine API.

gcloud

To send the NMI signal, use the instances send-diagnostic-interrupt command.

gcloud compute instances send-diagnostic-interrupt VM_NAME \
    --zone=ZONE

Replace the following:

  • VM_NAME: instance ID or name of the VM that you want to collect core dumps from
  • ZONE: the zone where your VM is located

The output is similar to the following:

<Empty Response>

For a complete list of possible responses, see NMI command responses.
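A successful response means only that the signal was delivered, so you might want to check whether a Linux guest actually panicked. One way, assuming the kernel logs to the serial console (which Compute Engine public Linux images typically do), is to read the serial port output and look for the usual "Kernel panic - not syncing" message. The helper name nmi_panic_seen is hypothetical.

```shell
# Hypothetical helper: returns success if the VM's serial console output
# contains a Linux kernel panic message.
nmi_panic_seen() {
  local vm="$1" zone="$2"
  gcloud compute instances get-serial-port-output "$vm" --zone="$zone" |
    grep -q 'Kernel panic'
}

# Example usage:
#   nmi_panic_seen VM_NAME ZONE && echo "Kernel panic found on serial console"
```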

API

  1. Optional: If you don't already have an API key, create one. For more information, see Creating an API key.

  2. To send the NMI signal, make a POST request to the sendDiagnosticInterrupt method.

    POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/sendDiagnosticInterrupt?key=API_KEY
    

    For example, you can use the curl command to make the request as follows:

    curl --request POST "https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/sendDiagnosticInterrupt?key=API_KEY" \
      --header "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
      --header 'Accept: application/json' \
      --compressed
    

    Replace the following:

    • PROJECT_ID: ID of the project that contains the VM
    • ZONE: the zone where your VM is located
    • VM_NAME: instance ID or name of the VM that you want to collect core dumps from
    • API_KEY: your API key

    The output is similar to the following:

    <Empty Response>

    For a complete list of possible responses, see NMI command responses.
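To check the result of the API call in a script, you can have curl print only the HTTP status code. The following is a sketch: the helper name send_nmi_api is hypothetical, and it assumes Application Default Credentials are set up, as in the curl example above, rather than relying on an API key.

```shell
# Hypothetical helper: POST to sendDiagnosticInterrupt and check the HTTP status.
send_nmi_api() {
  local project="$1" zone="$2" vm="$3" token status
  token="$(gcloud auth application-default print-access-token)" || return 1
  # Discard the body and print only the HTTP status code.
  status="$(curl --silent --output /dev/null --write-out '%{http_code}' \
      --request POST \
      --header "Authorization: Bearer ${token}" \
      "https://compute.googleapis.com/compute/v1/projects/${project}/zones/${zone}/instances/${vm}/sendDiagnosticInterrupt")" || return 1
  echo "sendDiagnosticInterrupt returned HTTP ${status}" >&2
  # Treat any 2xx status as success.
  case "$status" in
    2??) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage:
#   send_nmi_api PROJECT_ID ZONE VM_NAME
```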

NMI command responses

When you attempt to send an NMI signal, one of the following responses is returned.

  • SUCCESS (body: <Empty Response>): The NMI signal was delivered to the operating system. This does not guarantee that the core dump is collected, or that the VM shuts down or reboots. These behaviors are determined by the operating system configuration.
  • FAIL (body: UNSUPPORTED_OPERATION): The operating system did not receive the NMI signal. This can happen for several reasons; common ones are that the VM is being live migrated or that the VM is not properly configured to receive NMI signals.
    To resolve this, you can try the following:
      • Verify that the VM is properly configured. See Configure VM.
      • Wait and retry the SendDiagnosticInterrupt request.
  • FAIL (body: Required 'compute.instances.sendDiagnosticInterrupt' permission for [..]): The command failed because the user making the request does not have sufficient permissions.
    To resolve this, assign the user a role that contains the compute.instances.sendDiagnosticInterrupt permission.
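For the UNSUPPORTED_OPERATION case, the advice to wait and retry can be scripted. In this sketch the helper send_nmi_with_retry is hypothetical, and the defaults of 5 attempts and a 30-second delay are arbitrary choices, not documented values.

```shell
# Hypothetical helper: retry send-diagnostic-interrupt up to ATTEMPTS times,
# sleeping DELAY seconds between attempts.
send_nmi_with_retry() {
  local vm="$1" zone="$2" attempts="${3:-5}" delay="${4:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    if gcloud compute instances send-diagnostic-interrupt "$vm" --zone="$zone"; then
      return 0
    fi
    echo "Attempt $i of $attempts failed" >&2
    i=$((i + 1))
    # Only wait if another attempt follows.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example usage:
#   send_nmi_with_retry VM_NAME ZONE 5 30
```

A permission error does not go away on retry, so grant the missing role before you use a loop like this.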

Review core dumps

Review the crash dump file in the configured or default location for your operating system.

For example, on Ubuntu, the crash dump file is saved to /var/crash/ by default.
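On Ubuntu with kdump-tools, each crash typically gets its own timestamped subdirectory under /var/crash/ that holds the dump and a dmesg log. Assuming that layout, the following sketch prints the newest dump directory; the helper name latest_crash_dir is hypothetical. The */ glob matches only directories, so loose files in /var/crash/ (such as apport .crash reports) are skipped.

```shell
# Hypothetical helper: print the most recently modified subdirectory of the
# crash directory (default /var/crash), or nothing if there is none.
latest_crash_dir() {
  local crash_root="${1:-/var/crash}"
  # -d lists the directories themselves, -t sorts newest first.
  ls -1dt "$crash_root"/*/ 2>/dev/null | head -n 1
}

# Example usage:
#   ls -l "$(latest_crash_dir)"
```

To inspect a Linux kernel dump, the crash utility is commonly used together with a vmlinux image that has debug symbols for the kernel that crashed.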