Collecting core dumps


Use core dumps to analyze the causes of an unresponsive virtual machine (VM) instance.

To collect core dumps on Compute Engine, you must configure your VMs to respond to a Non-Maskable Interrupt (NMI) signal, and then run a SendDiagnosticInterrupt command to trigger a kernel panic or blue screen in the guest operating system. The kernel panic or blue screen causes the guest operating system to collect a core dump. You can use these core dumps for debugging, especially in scenarios that are hard to reproduce, such as a kernel freeze.

Before you begin

  • Requests to send NMI signals count against the default Queries API quota. For more information, see API rate limits.
  • If you haven't already, set up authentication. Authentication is the process by which your identity is verified for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine as follows.

    Select the tab for how you plan to use the samples on this page:

    gcloud

    1. Install the Google Cloud CLI, then initialize it by running the following command:

      gcloud init
    2. Set a default region and zone.

    REST

    To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

      Install the Google Cloud CLI, then initialize it by running the following command:

      gcloud init
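For the gcloud setup above, the default region and zone are gcloud CLI properties. The following is a minimal sketch that wraps the two gcloud config set calls in a shell function; the function name set_gcloud_defaults and the example values us-central1 and us-central1-a are illustrative only.

```shell
# Hypothetical helper: store a default Compute Engine region and zone in the
# active gcloud configuration so later commands can omit --zone.
set_gcloud_defaults() {
  local region="$1" zone="$2"
  gcloud config set compute/region "$region"
  gcloud config set compute/zone "$zone"
}

# Example usage (example values, not recommendations):
#   set_gcloud_defaults us-central1 us-central1-a
#   gcloud config get-value compute/zone
```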

Required roles

To ensure that your user or service account has the necessary permission to send NMI signals to a VM, ask your administrator to grant your user or service account the Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1) IAM role on your project. For more information about granting roles, see Manage access.

This predefined role contains the compute.instances.sendDiagnosticInterrupt permission, which is required to send NMI signals to a VM.

Your administrator might also be able to give your user or service account this permission with custom roles or other predefined roles.
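If you administer the project, the role can be granted with the gcloud CLI. This is a sketch only: the function name grant_nmi_role is hypothetical, and the project ID and member string (for example, user:alice@example.com or serviceAccount:NAME@PROJECT_ID.iam.gserviceaccount.com) are values you supply. Run it with an account that is allowed to change the project's IAM policy.

```shell
# Hypothetical helper: bind roles/compute.instanceAdmin.v1, which includes
# compute.instances.sendDiagnosticInterrupt, to a member on a project.
grant_nmi_role() {
  local project_id="$1" member="$2"   # member, e.g. user:alice@example.com
  gcloud projects add-iam-policy-binding "$project_id" \
    --member="$member" \
    --role="roles/compute.instanceAdmin.v1"
}

# Example usage:
#   grant_nmi_role my-project user:alice@example.com
```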

Overview

To use core dumps to help debug an unresponsive VM or a security issue, you need to complete the following steps:

  1. Configure your VM to generate core dumps
  2. Send an NMI signal to generate core dumps
  3. Review the core dumps

Limitations

For VMs that have Secure Boot enabled, you must disable Secure Boot before you send an NMI signal. For instructions, see Modifying Shielded VM options on a VM instance.
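A sketch of disabling Secure Boot with the gcloud CLI follows, assuming (as the Shielded VM documentation describes) that Shielded VM options are changed while the VM is stopped. Because stopping a VM discards its in-memory state, do this while you prepare the VM, not after it has already become unresponsive. The function name disable_secure_boot is hypothetical.

```shell
# Hypothetical helper: stop the VM, turn off Secure Boot, and start it again.
# Each step stops the sequence if the previous gcloud call fails.
disable_secure_boot() {
  local vm_name="$1" zone="$2"
  gcloud compute instances stop "$vm_name" --zone="$zone" || return 1
  gcloud compute instances update "$vm_name" --zone="$zone" \
    --no-shielded-secure-boot || return 1
  gcloud compute instances start "$vm_name" --zone="$zone"
}

# Example usage:
#   disable_secure_boot my-vm us-central1-a
```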

Configure VM

A VM's response to an NMI signal depends on the VM's operating system configuration.

Each operating system writes its core dumps to a different location. For example, on Ubuntu the crash dump file is saved to /var/crash/ by default.

To configure your guest OS to generate a crash dump when it receives an NMI signal, follow the documentation for your operating system:

Operating system, instructions, and additional notes:

  • Ubuntu
    Instructions: Ubuntu: Kernel crash dump
    Notes: For Linux VMs, you must configure the kernel to crash when it receives the NMI signal. To do so, add the following line to your sysctl configuration file (for example, /etc/sysctl.conf):

      kernel.unknown_nmi_panic=1

  • SUSE Linux Enterprise Server (SLES)
    Instructions: Configure crashkernel memory for kernel core dump analysis
  • Red Hat Enterprise Linux (RHEL)
    Instructions: Use both of the following documents:
  • Container-Optimized OS (COS)
    Instructions: Enabling Kernel Crash Dump on GCE COS Instances
    Notes: Only COS 93 and later support kdump generation using NMI signals.
  • Windows
    Instructions: Generate a kernel or complete crash dump
    Notes: Windows client VMs don't keep memory dump files unless they are members of an Active Directory (AD) domain or the following is true:
      • The AlwaysKeepMemoryDump registry value is set to 1
      • The disk has more than 25 GB of free space
    For more information, see Kernel dump storage and clean up behavior in Windows 7.
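For the Linux guests listed above, one way to persist kernel.unknown_nmi_panic=1 is a sysctl drop-in file. This sketch assumes a distribution that reads /etc/sysctl.d/ at boot; the function name enable_nmi_panic and the file name 99-nmi-panic.conf are arbitrary choices. It only sets the panic-on-NMI behavior: a dump is written only if kdump (or your distribution's equivalent) is also configured as described in the linked instructions.

```shell
# Hypothetical helper: write kernel.unknown_nmi_panic=1 to a sysctl drop-in
# file and print the file's path. Run as root on the VM; the directory
# argument exists mainly so the function can be tried out elsewhere.
enable_nmi_panic() {
  local conf_dir="${1:-/etc/sysctl.d}"
  local conf_file="$conf_dir/99-nmi-panic.conf"
  mkdir -p "$conf_dir" || return 1
  printf 'kernel.unknown_nmi_panic=1\n' > "$conf_file" || return 1
  echo "$conf_file"
}

# Apply the setting now without rebooting, then confirm it (as root):
#   sysctl -p "$(enable_nmi_panic)"
#   sysctl kernel.unknown_nmi_panic    # expected: kernel.unknown_nmi_panic = 1
```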

Send NMI to generate core dumps

After you configure the VM, you can send the NMI signal to the VM by using either the Google Cloud CLI or REST.

gcloud

To send the NMI signal, use the instances send-diagnostic-interrupt command.

gcloud compute instances send-diagnostic-interrupt VM_NAME \
    --zone=ZONE

Replace the following:

  • VM_NAME: instance ID or name of the VM that you want to collect core dumps from
  • ZONE: the zone where your VM is located

The output is similar to the following:

<Empty Response>

For a complete list of possible responses, see NMI command responses later in this document.
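Before sending the signal, it can help to confirm that the VM is running, because a stopped VM has no kernel to receive the NMI. The following minimal sketch assumes that RUNNING is the only state worth signaling; the function name send_nmi is hypothetical.

```shell
# Hypothetical helper: read the VM's status, and send the NMI only if the
# VM is RUNNING.
send_nmi() {
  local vm_name="$1" zone="$2" status
  status="$(gcloud compute instances describe "$vm_name" --zone="$zone" \
    --format='value(status)')" || return 1
  if [ "$status" != "RUNNING" ]; then
    echo "VM $vm_name is $status, not RUNNING; not sending NMI" >&2
    return 1
  fi
  gcloud compute instances send-diagnostic-interrupt "$vm_name" --zone="$zone"
}

# Example usage:
#   send_nmi my-vm us-central1-a
```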

REST

  1. Optional. If not already available, create an API key. For more information about creating API keys, see Creating an API key.

  2. To send the NMI signal, make a POST request to the sendDiagnosticInterrupt method.

    POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/sendDiagnosticInterrupt?key=API_KEY
    

    For example, you can use the curl command to make the request as follows. Put the Authorization header in double quotes so that the shell expands $(gcloud auth print-access-token):

    curl --request POST "https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/sendDiagnosticInterrupt?key=API_KEY" \
      --header "Authorization: Bearer $(gcloud auth print-access-token)" \
      --header "Accept: application/json" \
      --compressed
    

    Replace the following:

    • PROJECT_ID: ID of the project that contains the VM
    • ZONE: the zone where your VM is located
    • VM_NAME: instance ID or name of the VM that you want to collect core dumps from
    • API_KEY: your API key

    The output is similar to the following:

    <Empty Response>

    For a complete list of possible responses, see NMI command responses later in this document.
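Because the request above already carries an OAuth access token, the API key is optional. The following hypothetical sketch (the function names nmi_url and send_nmi_rest are illustrative) builds the sendDiagnosticInterrupt URL and calls it with only the access token. It also sets Content-Length: 0 explicitly, on the assumption that the endpoint may reject a body-less POST that omits that header.

```shell
# Hypothetical helper: build the sendDiagnosticInterrupt URL for a VM.
nmi_url() {
  printf 'https://compute.googleapis.com/compute/v1/projects/%s/zones/%s/instances/%s/sendDiagnosticInterrupt\n' \
    "$1" "$2" "$3"
}

# Hypothetical helper: POST to the URL using a gcloud access token.
# Arguments: PROJECT_ID ZONE VM_NAME. Prints the response body; an error
# such as UNSUPPORTED_OPERATION appears in that body.
send_nmi_rest() {
  local url token
  url="$(nmi_url "$1" "$2" "$3")"
  token="$(gcloud auth print-access-token)" || return 1
  curl --silent --show-error --request POST "$url" \
    --header "Authorization: Bearer $token" \
    --header "Accept: application/json" \
    --header "Content-Length: 0"
}

# Example usage:
#   send_nmi_rest my-project us-central1-a my-vm
```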

NMI command responses

One of the following responses is returned when you attempt to send an NMI signal.

  • State: SUCCESS. Body: <Empty Response>
    SUCCESS indicates that the NMI signal was delivered to the operating system. It does not guarantee that a core dump is collected, or that the VM shuts down or reboots; the operating system configuration determines these behaviors.
  • State: FAIL. Body: UNSUPPORTED_OPERATION
    The operating system failed to receive the NMI signal. This can happen for several reasons; common ones are that the VM is being live migrated or that the VM isn't properly configured to receive NMI signals. To resolve this, try the following:
      • Verify that the VM is properly configured. See Configure VM.
      • Wait and retry the SendDiagnosticInterrupt request.
  • State: FAIL. Body: Required 'compute.instances.sendDiagnosticInterrupt' permission for [..]
    The request failed because the user making it doesn't have sufficient permissions. To resolve this, grant the user a role that contains the compute.instances.sendDiagnosticInterrupt permission.
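The "wait and retry" advice for UNSUPPORTED_OPERATION can be scripted. In this hypothetical sketch (the function name send_nmi_with_retry is illustrative), only failures whose output mentions UNSUPPORTED_OPERATION are retried, because a permission error won't resolve itself. The default of 5 attempts, 30 seconds apart, is an arbitrary example.

```shell
# Hypothetical helper: retry send-diagnostic-interrupt on UNSUPPORTED_OPERATION.
# Arguments: VM_NAME ZONE [ATTEMPTS] [DELAY_SECONDS]
send_nmi_with_retry() {
  local vm_name="$1" zone="$2"
  local attempts="${3:-5}" delay="${4:-30}"
  local i=1 output
  while [ "$i" -le "$attempts" ]; do
    # Capture stdout and stderr so the error text can be inspected.
    if output="$(gcloud compute instances send-diagnostic-interrupt \
        "$vm_name" --zone="$zone" 2>&1)"; then
      echo "NMI delivered on attempt $i"
      return 0
    fi
    case "$output" in
      *UNSUPPORTED_OPERATION*)
        echo "Attempt $i: UNSUPPORTED_OPERATION, retrying in ${delay}s" >&2
        sleep "$delay"
        ;;
      *)
        # Permission errors and other failures are reported, not retried.
        echo "$output" >&2
        return 1
        ;;
    esac
    i=$((i + 1))
  done
  echo "Giving up after $attempts attempts" >&2
  return 1
}
```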

Review core dumps

Review the crash dump file in the configured or default location for your operating system.

For example, on Ubuntu the crash dump file is saved to /var/crash/ by default.
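To find the newest dump after a crash, you can sort the dump directory by modification time. This is a minimal sketch; the function name latest_crash_dump is hypothetical, and /var/crash is used only because it is Ubuntu's default. Pass a different directory for other distributions. To analyze a Linux dump, a tool such as crash needs debug symbols that match the kernel that produced it.

```shell
# Hypothetical helper: print the path of the most recently modified entry
# in a crash dump directory, or fail if the directory is empty or missing.
latest_crash_dump() {
  local dir="${1:-/var/crash}" newest
  newest="$(ls -1t "$dir" 2>/dev/null | head -n 1)"
  if [ -z "$newest" ]; then
    echo "No crash dumps found in $dir" >&2
    return 1
  fi
  echo "$dir/$newest"
}

# Example usage on the VM after it reboots:
#   ls -l "$(latest_crash_dump)"
```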