Monitor health status

User-managed notebooks instances provide several methods for monitoring the health of your notebooks. This page describes how to use each method.

Methods for monitoring health status

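You can monitor the health of your user-managed notebooks instances in a few different ways. This page describes how to use the following methods:

  • Using guest attributes to report system health
  • Reporting custom metrics to Monitoring
  • Installing Monitoring on an instance
  • Using the diagnostic tool to monitor system health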

Setting up the gcloud tool

The gcloud command-line tool is required for some steps in this guide. Install and initialize the Cloud SDK.

Using guest attributes to report system health

You can use guest attributes to report the system health of these core services:

  • Docker service
  • Docker reverse proxy agent
  • Jupyter service
  • Jupyter API

Guest attributes are a specific type of custom metadata that applications can write to while running on your user-managed notebooks instance. To learn more about guest attributes, see Storing and retrieving metadata.

How instances use guest attributes to report system health

The notebooks-collection-agent service runs a Python process in the background that verifies the status of the user-managed notebooks instance's core services. It sets each service's guest attribute to 1 if no problems are detected or to -1 if a failure is detected.

To use the notebooks-collection-agent service to report on your user-managed notebooks instance's health, you must set the following metadata entries when you create the instance:

  • enable-guest-attributes=TRUE: This enables guest attributes on your user-managed notebooks instance. New instances have this setting enabled by default.
  • report-system-health=TRUE: This records system health check results to your guest attributes.

The notebooks-collection-agent service does not need any special permissions to write to the instance's guest attributes.
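
The value encoding described above can be sketched in a few lines of Python. This is an illustration only, not the source of the notebooks-collection-agent service: the function name and the boolean check results are hypothetical, while the key names and the rule that any failed service sets system-health to -1 follow the example output shown later on this page.

```python
# Hypothetical sketch of the 1 / -1 encoding; not the agent's actual code.
HEALTHY = 1
FAILED = -1


def to_guest_attribute_values(check_results):
    """Map per-service check results (True = healthy) to 1 / -1 values.

    The overall "system-health" value is -1 if any single check failed.
    """
    values = {
        key: (HEALTHY if ok else FAILED) for key, ok in check_results.items()
    }
    values["system-health"] = HEALTHY if all(check_results.values()) else FAILED
    return values


example = to_guest_attribute_values({
    "docker_status": True,
    "docker_proxy_agent_status": True,
    "jupyterlab_status": False,
    "jupyterlab_api_status": True,
})
print(example["system-health"])
```

In this example, the single failed Jupyter check is enough to set the overall value to -1.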

Create a new instance with system health guest attributes enabled

To use system health guest attributes to report on your user-managed notebooks instance's health, you must enable the guest attributes while creating a new user-managed notebooks instance.

Use the Google Cloud Console

Use the following steps to create a new user-managed notebooks instance and enable it to use guest attributes to report system health.

  1. Follow the steps in Before you begin to create a Google Cloud project and enable the Notebooks API.
  2. Go to the User-managed notebooks page in the Google Cloud Console.

  3. Click New notebook, and then select Customize instance.

    Note: You can type notebook.new into a browser to go directly to the user-managed notebooks creation page.

  4. On the Create a notebook instance page, expand the Environment upgrade and system health section.
  5. Select Enable system health report. This setting enables the system health guest attributes that let you monitor your instance's health status.
  6. Complete the rest of the dialog to specify the properties of the type of instance that you want.
  7. At the bottom of the dialog, click Create.

Use the gcloud command-line tool

Use the following gcloud tool command to create a new user-managed notebooks instance and enable it to use guest attributes to report system health.

        gcloud notebooks instances create INSTANCE_NAME \
            --vm-image-project=deeplearning-platform-release \
            --vm-image-family=IMAGE_FAMILY \
            --machine-type=MACHINE_TYPE \
            --location=ZONE \
            --metadata=enable-guest-attributes=TRUE,report-system-health=TRUE
        

In this command, replace the following placeholders:

  • INSTANCE_NAME: The name of your new user-managed notebooks instance
  • IMAGE_FAMILY: The image family that you want to use to create your user-managed notebooks instance
  • MACHINE_TYPE: The machine type for your new instance; for example, n1-standard-4
  • ZONE: The zone where you want your new instance to be located; for example, us-west1-a
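
If you create instances from a script, it can help to assemble the command as an argument list so that the metadata entries are joined correctly. The following Python sketch only builds and prints the command shown above. The function name and the instance name my-instance are hypothetical, and IMAGE_FAMILY is left as a placeholder.

```python
import shlex


def build_create_command(instance_name, image_family, machine_type, zone, metadata):
    """Return the gcloud argument list for creating an instance with metadata."""
    # --metadata takes comma-separated key=value pairs.
    metadata_flag = ",".join(f"{key}={value}" for key, value in metadata.items())
    return [
        "gcloud", "notebooks", "instances", "create", instance_name,
        "--vm-image-project=deeplearning-platform-release",
        f"--vm-image-family={image_family}",
        f"--machine-type={machine_type}",
        f"--location={zone}",
        f"--metadata={metadata_flag}",
    ]


command = build_create_command(
    "my-instance",      # hypothetical instance name
    "IMAGE_FAMILY",     # placeholder; replace with a real image family
    "n1-standard-4",
    "us-west1-a",
    {"enable-guest-attributes": "TRUE", "report-system-health": "TRUE"},
)
print(shlex.join(command))
```

The printed line can be reviewed before you run it, or the list can be passed to subprocess.run once the placeholders are replaced.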

Monitor system health through guest attributes

For user-managed notebooks instances that have the related guest attributes enabled, you can retrieve the values of your system health guest attributes in the following ways.

Use the Google Cloud Console

To monitor your instance's health in the Google Cloud Console, complete these steps.

  1. Go to the User-managed notebooks page in the Google Cloud Console.

  2. Under Instance name, click the instance that you want to monitor.
  3. On the Notebook instance details page, click the Instance health tab.
  4. Review the status of your instance and its core services.

Use the gcloud command-line tool

To monitor your system health, you can use the gcloud tool to retrieve the values of your guest attributes.

      gcloud compute instances get-guest-attributes INSTANCE_NAME \
          --zone ZONE
      

In this command, replace the following placeholders:

  • INSTANCE_NAME: the name of your user-managed notebooks instance
  • ZONE: the zone where your instance is located

If your core services are healthy, the results look like the following. A value of 1 means no failure was detected.

      NAMESPACE   KEY                         VALUE
      notebooks   docker_proxy_agent_status   1
      notebooks   docker_status               1
      notebooks   jupyterlab_api_status       1
      notebooks   jupyterlab_status           1
      notebooks   system-health               1
      notebooks   updated                     2020-10-01 17:00:00.12345
      

If any of the four core services fails, system-health reports a value of -1 to indicate system failure. In most cases, a system failure means that JupyterLab is not accessible.

An example of a failure result might look like the following.

      NAMESPACE   KEY                         VALUE
      notebooks   docker_proxy_agent_status   -1
      notebooks   docker_status               -1
      notebooks   jupyterlab_api_status       1
      notebooks   jupyterlab_status           1
      notebooks   system-health               -1
      notebooks   updated                     2020-10-01 17:00:00.12345
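
To act on these values in a script, you can parse the table into a dictionary and list the services that report -1. The following Python sketch assumes the three-column text layout shown above, and its function names are hypothetical. Requesting structured output with the gcloud --format=json flag is likely a more robust option, but the shape of that output is not shown on this page.

```python
def parse_guest_attributes(table_text):
    """Parse NAMESPACE / KEY / VALUE rows into a {key: value} dictionary."""
    attributes = {}
    for line in table_text.splitlines():
        parts = line.split(None, 2)  # the value may itself contain a space
        if len(parts) < 3 or parts[0] == "NAMESPACE":
            continue  # skip blank lines and the header row
        _namespace, key, value = parts
        attributes[key] = value.strip()
    return attributes


def failed_services(attributes):
    """Return the *_status keys whose value is -1, sorted by name."""
    return sorted(
        key for key, value in attributes.items()
        if key.endswith("_status") and value == "-1"
    )


SAMPLE_OUTPUT = """
NAMESPACE   KEY                         VALUE
notebooks   docker_proxy_agent_status   -1
notebooks   docker_status               -1
notebooks   jupyterlab_api_status       1
notebooks   jupyterlab_status           1
notebooks   system-health               -1
notebooks   updated                     2020-10-01 17:00:00.12345
"""

attributes = parse_guest_attributes(SAMPLE_OUTPUT)
print(failed_services(attributes))
```

For the failure example, this prints the two Docker-related keys, docker_proxy_agent_status and docker_status.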
      

Use the Notebooks API

To monitor your system health, you can use the getInstanceHealth method to retrieve the values of your guest attributes.

The following example shows how to do this using the gcloud tool.

      gcloud notebooks instances is-healthy INSTANCE_NAME \
          --location=ZONE
      

In this command, replace the following placeholders:

  • INSTANCE_NAME: The name of your user-managed notebooks instance
  • ZONE: The zone where your instance is located; for example, us-west1-a

If your core services are healthy, the results look like the following. A value of 1 means no failure was detected.

      {
              "health_state": HEALTHY,
              "docker-proxy-agent": 1,
              "docker-service": 1,
              "jupyter-service": 1,
              "jupyter-api": 1,
              "last-updated": "2020-10-01 17:00:30.12345"
      }
      

An example of a failure result might look like the following.

      {
              "health_state": UNHEALTHY,
              "docker-proxy-agent": 1,
              "docker-service": 1,
              "jupyter-service": -1,
              "jupyter-api": -1,
              "last-updated": "2020-10-01 17:00:30.12345"
      }
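
A result like this can be summarized in code by reading the overall state and collecting the components that report -1. The following Python sketch assumes that the result has already been loaded into a dictionary that uses the key names from the examples above. The function name is hypothetical, and the field names in a real API response might differ.

```python
def summarize_health(health):
    """Return (overall_state, sorted names of components that report -1)."""
    state = health.get("health_state")
    failing = sorted(name for name, value in health.items() if value == -1)
    return state, failing


# Values copied from the failure example above.
failure_result = {
    "health_state": "UNHEALTHY",
    "docker-proxy-agent": 1,
    "docker-service": 1,
    "jupyter-service": -1,
    "jupyter-api": -1,
    "last-updated": "2020-10-01 17:00:30.12345",
}

state, failing = summarize_health(failure_result)
print(state, failing)
```

Here the summary identifies jupyter-api and jupyter-service as the failing components.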
      

Reporting custom metrics to Monitoring

User-managed notebooks instances let you collect system status and JupyterLab metrics and report them to Cloud Monitoring. These custom metrics are different from the standard metrics that are reported when you install Monitoring on your user-managed notebooks instance.

To use this feature, you must enable it while creating a new user-managed notebooks instance. You must also ensure that the user-managed notebooks instance's service account has Monitoring Metric Writer (roles/monitoring.metricWriter) permissions.

The custom metrics reported to Monitoring include:

  • The system health of these user-managed notebooks core services:

    • Docker service
    • Docker reverse proxy agent
    • Jupyter service
    • Jupyter API
  • The following JupyterLab metrics:

    • Number of kernels
    • Number of terminals
    • Number of connections
    • Number of sessions
    • Maximum memory
    • High memory
    • Current memory

Create a new notebook that reports custom metrics to Monitoring

To report custom metrics to Monitoring, you must enable the report-notebook-metrics metadata setting while creating a new user-managed notebooks instance.

Use the Google Cloud Console

Use the following steps to create a new user-managed notebooks instance that reports custom metrics to Monitoring.

  1. Follow the steps in Before you begin to create a Google Cloud project and enable the Notebooks API.
  2. Go to the User-managed notebooks page in the Google Cloud Console.

  3. Click New notebook, and then select Customize instance.

    Note: You can type notebook.new into a browser to go directly to the user-managed notebooks creation page.

  4. On the Create a notebook instance page, expand the Environment upgrade and system health section.
  5. Select Report custom metrics to Cloud Monitoring. This option enables the report-notebook-metrics metadata setting.
  6. Complete the rest of the dialog to specify the properties of the type of instance that you want.
  7. At the bottom of the dialog, click Create.
  8. Grant Monitoring Metric Writer permissions (roles/monitoring.metricWriter) to the service account for the user-managed notebooks instance that you just created. See Granting, changing, and revoking access to resources if you need help.

Use the gcloud command-line tool

Use the following gcloud tool command to create a new user-managed notebooks instance and enable the report-notebook-metrics metadata setting to report metrics to Monitoring.

        gcloud notebooks instances create INSTANCE_NAME \
            --vm-image-project=deeplearning-platform-release \
            --vm-image-family=IMAGE_FAMILY \
            --machine-type=MACHINE_TYPE \
            --location=ZONE \
            --metadata=report-notebook-metrics=TRUE
        

In this command, replace the following placeholders:

  • INSTANCE_NAME: The name of your new user-managed notebooks instance
  • IMAGE_FAMILY: The image family that you want to use to create your user-managed notebooks instance
  • MACHINE_TYPE: The machine type for your new instance; for example, n1-standard-4
  • ZONE: The zone where you want your new instance to be located; for example, us-west1-a

After you've created your new user-managed notebooks instance, grant Monitoring Metric Writer permissions (roles/monitoring.metricWriter) to the service account for the user-managed notebooks instance. See Granting, changing, and revoking access to resources if you need help.
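
One way to grant the role is a project-level IAM policy binding through the gcloud projects add-iam-policy-binding command. The following Python sketch only assembles that command. The function name, the project ID, and the service account email are hypothetical, and a project-level binding is an assumption about how you want to scope access.

```python
import shlex


def build_metric_writer_binding(project_id, service_account_email):
    """Return the gcloud argument list that grants Monitoring Metric Writer."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{service_account_email}",
        "--role=roles/monitoring.metricWriter",
    ]


binding_command = build_metric_writer_binding(
    "my-project",                                       # hypothetical project ID
    "notebook-sa@my-project.iam.gserviceaccount.com",   # hypothetical account
)
print(shlex.join(binding_command))
```

Replace both example values with your own project ID and the service account that your instance runs as.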

Installing Monitoring on an instance

You can report system and application metrics by installing the Cloud Monitoring agent when you create a new user-managed notebooks instance. These metrics are different from the custom metrics that are reported when you enable the report-notebook-metrics metadata setting.

This option automatically installs Monitoring. The installation requires 256 MB of disk space. An internet connection is required for the metrics to be reported to Monitoring.

Use the Google Cloud Console

Use the following steps to create a new user-managed notebooks instance with Monitoring installed.

  1. Follow the steps in Before you begin to create a Google Cloud project and enable the Notebooks API.
  2. Go to the User-managed notebooks page in the Google Cloud Console.

  3. Click New notebook, and then select Customize instance.

    Note: You can type notebook.new into a browser to go directly to the user-managed notebooks creation page.

  4. On the Create a notebook instance page, expand the Environment upgrade and system health section.
  5. Select Install Cloud Monitoring agent.
  6. Complete the rest of the dialog to specify the properties of the type of instance that you want.
  7. At the bottom of the dialog, click Create.

Use the gcloud command-line tool

Use the following gcloud tool command to create a new user-managed notebooks instance with Monitoring installed. Note that this command sets the install-monitoring-agent metadata entry to TRUE.

        gcloud notebooks instances create INSTANCE_NAME \
            --vm-image-project=deeplearning-platform-release \
            --vm-image-family=IMAGE_FAMILY \
            --machine-type=MACHINE_TYPE \
            --location=ZONE \
            --metadata=install-monitoring-agent=TRUE
        

In this command, replace the following placeholders:

  • INSTANCE_NAME: The name of your new user-managed notebooks instance
  • IMAGE_FAMILY: The image family that you want to use to create your user-managed notebooks instance
  • MACHINE_TYPE: The machine type for your new instance; for example, n1-standard-4
  • ZONE: The zone where you want your new instance to be located; for example, us-west1-a

Using the diagnostic tool to monitor system health

User-managed notebooks instances include a built-in diagnostic tool that can help you monitor the system health of your instances.

Tasks performed by the diagnostic tool

The diagnostic tool performs the following tasks:

  • Verifies the status of these user-managed notebooks core services:

    • Docker service
    • Docker reverse proxy agent
    • Jupyter service
    • Jupyter API
  • Checks whether disk space usage on the boot and data disks exceeds an 85% threshold.

  • Installs lsof (internet connection required).

  • Collects the following instance logs:

    • Network information (ifconfig, netstat)
    • Logs in the /var/log/ folder
    • Docker status information
    • lsof (open files) data
    • Docker service status
    • Docker reverse proxy agent status
    • Jupyter service status
    • Jupyter API status
    • Proxy agent configuration file
    • Python processes
  • Runs the following commands and collects the results:

    • pip freeze
    • conda list
    • gcloud compute instances describe INSTANCE_NAME
    • gcloud config list
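
The disk space check in the list above can be approximated with the Python standard library. This is a minimal sketch, not the diagnostic tool's implementation: the function names are hypothetical, and the tool might compute usage differently (for example, from df output, which accounts for reserved blocks).

```python
import shutil

THRESHOLD = 0.85  # the 85% threshold named in the task list


def exceeds_threshold(used_bytes, total_bytes, threshold=THRESHOLD):
    """Return True when the used fraction of a disk is above the threshold."""
    return total_bytes > 0 and used_bytes / total_bytes > threshold


def disk_over_threshold(path="/"):
    """Check the file system that contains path against the threshold."""
    usage = shutil.disk_usage(path)
    return exceeds_threshold(usage.used, usage.total)


if __name__ == "__main__":
    print("Boot disk over threshold:", disk_over_threshold("/"))
```

To check a data disk as well, call disk_over_threshold with the path where that disk is mounted.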

Run the diagnostic tool

To run the diagnostic tool, complete the following steps:

  1. Use ssh to connect to your user-managed notebooks instance.

  2. In the SSH terminal, run the following commands.

    sudo -i
    cd /opt/deeplearning/bin/
    ./diagnostic_tool.sh
    
  3. The diagnostic tool collects the logs, compresses them in a .tar.gz file, and places the file in the /tmp/ folder.

  4. Extract the file and then evaluate the contents. The contents include:

    • log folder: The logs from the /var/log/ folder
    • report.log: Output for all commands collected
    • proxy-agent-config.json: Proxy configuration information
    • Docker log: A -json.log file that includes Docker container logs
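
If you prefer to unpack the archive from Python rather than with tar, the standard tarfile module can extract it and list what it contains. The following sketch uses a hypothetical function name, and the archive path is a parameter because the exact file name that the tool writes to /tmp/ is not specified here. Only extract archives that you trust.

```python
import tarfile
from pathlib import Path


def extract_diagnostics(archive_path, destination):
    """Extract a .tar.gz archive and return the names of its members."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(destination)
    return names
```

Calling extract_diagnostics with the archive path and an output folder returns the member names, which should include entries such as report.log and proxy-agent-config.json.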

You can use the following options with the diagnostic tool.

Option   Description
-r       A repair option that tries to restore failed user-managed notebooks core services
-s       Runs without a confirmation
-b       Uploads the .tar.gz file to a Cloud Storage bucket
-v       A debug option for troubleshooting the tool in case of failures
-c       Captures 30 seconds of packet traffic into your user-managed notebooks instance, filtering SSH
-d       The destination folder in which to save the logs
-h       Help
