Monitor health status

User-managed notebooks instances provide several methods for monitoring the health of your notebooks. This page describes how to use each method.

Methods for monitoring health status

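You can monitor the health of your user-managed notebooks instances by using any of the following methods:

  • Use guest attributes to report system health
  • Report custom metrics to Monitoring
  • Install Monitoring on an instance
  • Use the diagnostic tool to monitor system health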

Set up the gcloud CLI

To complete some of the steps on this page, you need to use the Google Cloud CLI.

Install the Google Cloud CLI, then initialize it by running the following command:

gcloud init

Use guest attributes to report system health

You can use guest attributes to report the system health of the following core services:

  • Docker service
  • Docker reverse proxy agent
  • Jupyter service
  • Jupyter API

Guest attributes are a specific type of custom metadata that applications can write to while running on your user-managed notebooks instance. To learn more about guest attributes, see About VM metadata.

How instances use guest attributes to report system health

The notebooks-collection-agent service runs a Python process in the background that verifies the status of the user-managed notebooks instance's core services. It sets each service's guest attribute to 1 if no problems are detected or to -1 if a failure is detected.
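The 1 and -1 convention can be illustrated with a small shell sketch. This is not the notebooks-collection-agent's implementation, which this page describes only as a background Python process. The helper name status_value and the example check command are assumptions made for illustration.

```shell
# Hypothetical helper: run a check command and print 1 if it succeeds
# (no problem detected) or -1 if it fails (failure detected).
status_value() {
  if "$@" >/dev/null 2>&1; then
    printf '%s\n' 1
  else
    printf '%s\n' -1
  fi
}

# Example (assumes a systemd-managed Docker service on the instance):
#   status_value systemctl is-active --quiet docker
```

Any command that signals success through its exit status can be passed to the helper, so the same mapping applies to each core service.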

To use the notebooks-collection-agent service to report on your user-managed notebooks instance's health, you must set the following metadata entries when you create the instance:

  • enable-guest-attributes=TRUE: This enables guest attributes on your user-managed notebooks instance. All new instances enable this attribute by default.
  • report-system-health=TRUE: This records system health check results to your guest attributes.

The notebooks-collection-agent service doesn't need any special permissions to write to the instance's guest attributes.

Create a user-managed notebooks instance with system health guest attributes enabled

To use system health guest attributes to report on your user-managed notebooks instance's health, you must enable the system health report when you create the instance.

You can enable the system health report by using either the Google Cloud console (select the Enable system health report checkbox) or the Google Cloud CLI (set the report-system-health metadata setting).

Before you begin

Before you can create a user-managed notebooks instance, you must have a Google Cloud project and enable the Notebooks API for that project.

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Notebooks API.

    Enable the API

  5. If you plan to use GPUs with your user-managed notebooks instance, check the quotas page in the Google Cloud console to ensure that you have enough GPUs available in your project. If GPUs are not listed on the quotas page, or you require additional GPU quota, you can request a quota increase. See Requesting an increase in quota on the Compute Engine Resource quotas page.

Required roles

If you created the project, you have the Owner (roles/owner) IAM role on the project, which includes all required permissions. Skip this section and start creating your user-managed notebooks instance. If you didn't create the project yourself, continue in this section.

To ensure that your user account has the necessary permissions to create a Vertex AI Workbench user-managed notebooks instance, ask your administrator to grant your user account the following IAM roles on the project:

For more information about granting roles, see Manage access to projects, folders, and organizations.

Your administrator might also be able to give your user account the required permissions through custom roles or other predefined roles.

Create the instance

Console

  1. In the Google Cloud console, go to the User-managed notebooks page. Or go to notebook.new (https://notebook.new) and skip the next step.

    Go to User-managed notebooks

  2. Click New notebook, and then select Customize.

  3. On the Create a user-managed notebook page, in the Details section, provide the following information for your new instance:

    • Name: a name for your new instance
    • Region and Zone: Select a region and zone for the new instance. For best network performance, select the region that is geographically closest to you. See the available user-managed notebooks locations.
  4. Select the System health section.

  5. In the System health and reporting section, select the Enable system health report checkbox.

  6. Complete the rest of the instance creation dialog, and then click Create.

gcloud

  1. From Cloud Shell or any environment where the Google Cloud CLI is installed, enter the following Google Cloud CLI command:

    gcloud notebooks instances create INSTANCE_NAME \
        --vm-image-project=deeplearning-platform-release \
        --vm-image-family=IMAGE_FAMILY \
        --machine-type=MACHINE_TYPE \
        --location=ZONE \
        --metadata=enable-guest-attributes=TRUE,report-system-health=TRUE
    

    Replace the following:

    • INSTANCE_NAME: the name of your new instance
    • IMAGE_FAMILY: the image family name that you want to use to create your instance
    • MACHINE_TYPE: the machine type of your instance's VM; for example, n1-standard-4
    • ZONE: the zone where you want your new instance to be located, for example, us-west1-a
  2. Access your instance from the Google Cloud console.

Monitor system health through guest attributes

For user-managed notebooks instances that have the related guest attributes enabled, you can retrieve the values of your system health guest attributes by using any of the following: the Google Cloud console, the Google Cloud CLI with Compute Engine commands, or the Google Cloud CLI with Vertex AI Workbench commands.

Console

  1. In the Google Cloud console, go to the User-managed notebooks page.

    Go to User-managed notebooks

  2. Click the name of the instance whose system health status you want to view.

  3. On the Notebook details page, click the Health tab. Review the status of your instance and its core services.

gcloud with Compute Engine

gcloud compute instances get-guest-attributes INSTANCE_NAME \
    --zone ZONE

Replace the following:

  • INSTANCE_NAME: the name of your instance
  • ZONE: the zone where your instance is located

If your core services are healthy, the results look like the following. A value of 1 means no failure was detected.

 NAMESPACE   KEY                         VALUE
 notebooks   docker_proxy_agent_status   1
 notebooks   docker_status               1
 notebooks   jupyterlab_api_status       1
 notebooks   jupyterlab_status           1
 notebooks   system-health               1
 notebooks   updated                     2020-10-01 17:00:00.12345

If any of the four core services fails, system-health reports a value of -1 to indicate system failure. In most cases, a system failure means that JupyterLab is not accessible.

An example of a failure result might look like the following.

 NAMESPACE   KEY                         VALUE
 notebooks   docker_proxy_agent_status   -1
 notebooks   docker_status               -1
 notebooks   jupyterlab_api_status       1
 notebooks   jupyterlab_status           1
 notebooks   system-health               -1
 notebooks   updated                     2020-10-01 17:00:00.12345
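When the table contains many rows, it can help to print only the services that report -1. The following sketch assumes the three-column output shown above (NAMESPACE, KEY, VALUE). The function name failed_services is hypothetical, and the function skips the aggregate system-health row so that only individual services are listed.

```shell
# Read the get-guest-attributes table on stdin and print the key of every
# service in the notebooks namespace whose value is -1.
failed_services() {
  awk '$1 == "notebooks" && $3 == "-1" && $2 != "system-health" { print $2 }'
}

# Example (INSTANCE_NAME and ZONE are placeholders):
#   gcloud compute instances get-guest-attributes INSTANCE_NAME --zone ZONE | failed_services
```

For the failure example above, the function prints docker_proxy_agent_status and docker_status, one per line. It prints nothing when every service reports 1.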

gcloud with Vertex AI Workbench

To monitor your system health, you can use the getInstanceHealth method to retrieve the values of your guest attributes.

The following example shows how to do this using the gcloud CLI.

gcloud notebooks instances is-healthy INSTANCE_NAME \
    --location=ZONE

Replace the following:

  • INSTANCE_NAME: the name of your instance
  • ZONE: the zone where your instance is located, for example, us-west1-a

If your core services are healthy, the results look like the following. A value of 1 means no failure was detected.

  {
          "health_state": HEALTHY,
          "docker-proxy-agent": 1,
          "docker-service": 1,
          "jupyter-service": 1,
          "jupyter-api": 1,
          "last-updated": "2020-10-01 17:00:30.12345"
  }

An example of a failure result might look like the following.

  {
          "health_state": UNHEALTHY,
          "docker-proxy-agent": 1,
          "docker-service": 1,
          "jupyter-service": -1,
          "jupyter-api": -1,
          "last-updated": "2020-10-01 17:00:30.12345"
  }
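For scripted checks, the health state can be turned into an exit status. The following is a minimal sketch that assumes the command prints output in the shape shown above. The function name reports_healthy is hypothetical, and any state other than HEALTHY (or a missing health_state field) is treated as not healthy.

```shell
# Succeed (exit status 0) only when the text on stdin contains
# "health_state": HEALTHY. UNHEALTHY does not match because the pattern
# requires HEALTHY to follow the colon directly (optionally quoted).
reports_healthy() {
  grep -Eq '"health_state": *"?HEALTHY'
}

# Example (INSTANCE_NAME and ZONE are placeholders):
#   gcloud notebooks instances is-healthy INSTANCE_NAME --location=ZONE | reports_healthy \
#     || echo "instance needs attention"
```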

Report custom metrics to Monitoring

User-managed notebooks instances let you collect system status and JupyterLab metrics and report them to Cloud Monitoring. These custom metrics are different from the standard metrics that are reported when you install Monitoring on your user-managed notebooks instance.

The custom metrics reported to Monitoring include the following:

  • The system health of these user-managed notebooks core services:

    • Docker service
    • Docker reverse proxy agent
    • Jupyter service
    • Jupyter API
  • The following JupyterLab metrics:

    • Number of kernels
    • Number of terminals
    • Number of connections
    • Number of sessions
    • Maximum memory
    • High memory
    • Current memory

How instances report custom metrics to Monitoring

To report custom metrics to Monitoring, you must enable the report-notebook-metrics metadata setting while creating a user-managed notebooks instance.

You must also make sure that the user-managed notebooks instance's service account has the Monitoring Metric Writer (roles/monitoring.metricWriter) role. For more information, see Manage access to projects, folders, and organizations.

Create a user-managed notebooks instance that reports custom metrics to Monitoring

To report custom metrics to Monitoring, you must enable custom metric reporting when you create a user-managed notebooks instance.

You can enable reporting custom metrics to Cloud Monitoring by using either the Google Cloud console (select the Report custom metrics to Cloud Monitoring checkbox) or the Google Cloud CLI (set the report-notebook-metrics metadata setting).

Create the instance

Console

  1. In the Google Cloud console, go to the User-managed notebooks page. Or go to notebook.new (https://notebook.new) and skip the next step.

    Go to User-managed notebooks

  2. Click New notebook, and then select Customize.

  3. On the Create a user-managed notebook page, in the Details section, provide the following information for your new instance:

    • Name: a name for your new instance
    • Region and Zone: Select a region and zone for the new instance. For best network performance, select the region that is geographically closest to you. See the available user-managed notebooks locations.
  4. Select the System health section.

  5. In the System health and reporting section, select the Report custom metrics to Cloud Monitoring checkbox.

  6. Complete the rest of the instance creation dialog, and then click Create.

gcloud

  1. From Cloud Shell or any environment where the Google Cloud CLI is installed, enter the following Google Cloud CLI command:

    gcloud notebooks instances create INSTANCE_NAME \
        --vm-image-project=deeplearning-platform-release \
        --vm-image-family=IMAGE_FAMILY \
        --machine-type=MACHINE_TYPE \
        --location=ZONE \
        --metadata=report-notebook-metrics=TRUE
    

    Replace the following:

    • INSTANCE_NAME: the name of your new instance
    • IMAGE_FAMILY: the image family name that you want to use to create your instance
    • MACHINE_TYPE: the machine type of your instance's VM, for example, n1-standard-4
    • ZONE: the zone where you want your new instance to be located, for example, us-west1-a
  2. Access your instance from the Google Cloud console.

Grant Monitoring Metric Writer permissions to the service account

After you've created your new user-managed notebooks instance, grant the Monitoring Metric Writer role (roles/monitoring.metricWriter) to the service account for the user-managed notebooks instance. For more information, see Manage access to projects, folders, and organizations.
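One way to grant the role from the command line is a project-level IAM policy binding. The sketch below wraps gcloud projects add-iam-policy-binding in a function. The function name grant_metric_writer is hypothetical, and the project ID and service account email are values that you supply. Whether a project-level grant fits your access policy is an assumption to confirm with your administrator.

```shell
# Hypothetical wrapper: grant roles/monitoring.metricWriter on a project
# to the service account that the instance runs as.
#   $1: project ID
#   $2: email address of the instance's service account
grant_metric_writer() {
  gcloud projects add-iam-policy-binding "$1" \
    --member="serviceAccount:$2" \
    --role="roles/monitoring.metricWriter"
}

# Example (PROJECT_ID and SERVICE_ACCOUNT_EMAIL are placeholders):
#   grant_metric_writer PROJECT_ID SERVICE_ACCOUNT_EMAIL
```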

Monitor custom metrics through Monitoring

For user-managed notebooks instances that have custom metric reporting enabled, you can monitor your custom metrics by using the Google Cloud console.

  1. In the Google Cloud console, go to the User-managed notebooks page.

    Go to User-managed notebooks

  2. Click the name of the instance whose custom metrics you want to view.

  3. On the Notebook details page, click the Monitoring tab. Review the custom metrics for your instance.

Install Monitoring on an instance

You can have Monitoring installed automatically when you create a user-managed notebooks instance. The installation requires 256 MB of disk space, and the instance needs an internet connection to report metrics to Monitoring.

How instances report system and application metrics

To report system and application metrics by installing Cloud Monitoring on your user-managed notebooks instance, you must select the Install Cloud Monitoring agent checkbox, or set the install-monitoring-agent metadata setting, when you create a user-managed notebooks instance. These metrics are different from the custom metrics that are reported when you enable the report-notebook-metrics metadata setting.

Create a user-managed notebooks instance that reports system and application metrics to Monitoring

To install Monitoring on your user-managed notebooks instance, you can use either the Google Cloud console or the Google Cloud CLI.

Create the instance

Console

  1. In the Google Cloud console, go to the User-managed notebooks page. Or go to notebook.new (https://notebook.new) and skip the next step.

    Go to User-managed notebooks

  2. Click New notebook, and then select Customize.

  3. On the Create a user-managed notebook page, in the Details section, provide the following information for your new instance:

    • Name: a name for your new instance
    • Region and Zone: Select a region and zone for the new instance. For best network performance, select the region that is geographically closest to you. See the available user-managed notebooks locations.
  4. Select the System health section.

  5. In the System health and reporting section, select the Install Cloud Monitoring agent checkbox.

  6. Complete the rest of the instance creation dialog, and then click Create.

gcloud

  1. From Cloud Shell or any environment where the Google Cloud CLI is installed, enter the following Google Cloud CLI command:

    gcloud notebooks instances create INSTANCE_NAME \
        --vm-image-project=deeplearning-platform-release \
        --vm-image-family=IMAGE_FAMILY \
        --machine-type=MACHINE_TYPE \
        --location=ZONE \
        --metadata=install-monitoring-agent=TRUE
    

    Replace the following:

    • INSTANCE_NAME: the name of your new instance
    • IMAGE_FAMILY: the image family name that you want to use to create your instance
    • MACHINE_TYPE: the machine type of your instance's VM; for example, n1-standard-4
    • ZONE: the zone where you want your new instance to be located, for example, us-west1-a
  2. Access your instance from the Google Cloud console.
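The create commands on this page differ only in the value passed to --metadata, which takes a comma-separated list of KEY=VALUE pairs. The following sketch builds that list from separate settings. The function name join_metadata is hypothetical, and the sketch assumes that the monitoring settings can be enabled together on one instance, which this page does not state explicitly.

```shell
# Join KEY=VALUE settings into the comma-separated form used by --metadata.
# The subshell keeps the IFS change from leaking into the caller.
join_metadata() {
  (IFS=,; printf '%s\n' "$*")
}

# Example (INSTANCE_NAME, IMAGE_FAMILY, MACHINE_TYPE, and ZONE are placeholders):
#   gcloud notebooks instances create INSTANCE_NAME \
#       --vm-image-project=deeplearning-platform-release \
#       --vm-image-family=IMAGE_FAMILY \
#       --machine-type=MACHINE_TYPE \
#       --location=ZONE \
#       --metadata="$(join_metadata report-system-health=TRUE install-monitoring-agent=TRUE)"
```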

Monitor system and application metrics through Monitoring

For user-managed notebooks instances that have Monitoring installed, you can monitor your system and application metrics by using the Google Cloud console:

  1. In the Google Cloud console, go to the User-managed notebooks page.

    Go to User-managed notebooks

  2. Click the name of the instance whose system and application metrics you want to view.

  3. On the Notebook details page, click the Monitoring tab. Review the system and application metrics for your instance. To learn how to interpret these metrics, see Review resource metrics.

Use the diagnostic tool to monitor system health

User-managed notebooks instances include a built-in diagnostic tool that can help you monitor the system health of your instances.

Tasks performed by the diagnostic tool

The diagnostic tool performs the following tasks:

  • Verifies the status of the following user-managed notebooks core services:

    • Docker service
    • Docker reverse proxy agent
    • Jupyter service
    • Jupyter API
  • Checks whether the disk space for boot and data disks is used beyond an 85% threshold.

  • Installs lsof (internet connection required).

  • Collects the following instance logs:

    • Network information (ifconfig, netstat)
    • Logs in the /var/log/ folder
    • Docker status information
    • lsof (open files) data
    • Docker service status
    • Docker reverse proxy agent status
    • Jupyter service status
    • Jupyter API status
    • Proxy agent configuration file
    • Python processes
  • Runs the following commands and collects the results:

    • pip freeze
    • conda list
    • gcloud compute instances describe INSTANCE_NAME
    • gcloud config list
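The disk-space check in the list above can be approximated by hand. This sketch is not the diagnostic tool's own code. It assumes the POSIX df -P column layout (capacity in the fifth column, mount point in the sixth), and the function name disks_over_threshold is hypothetical.

```shell
# Read `df -P` output on stdin and print each mount point whose usage is
# strictly above the threshold (default 85), followed by its usage.
disks_over_threshold() {
  awk -v limit="${1:-85}" 'NR > 1 {
    use = $5
    sub(/%/, "", use)              # strip the percent sign from the Capacity column
    if (use + 0 > limit + 0) print $6, $5
  }'
}

# Example:
#   df -P | disks_over_threshold 85
```

The sketch takes the sixth column as the mount point, so mount points that contain spaces are truncated.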

Run the diagnostic tool

To run the diagnostic tool, complete the following steps:

  1. Use ssh to connect to your user-managed notebooks instance.

  2. In the SSH terminal, run the following commands:

    sudo -i
    cd /opt/deeplearning/bin/
    ./diagnostic_tool.sh
    

    The diagnostic tool collects the logs, compresses them in a .tar.gz file, and places the file in the /tmp/ folder.

  3. Extract the file and then evaluate the contents. The contents include:

    • log folder: Logs from the /var/log/ folder
    • report.log: Output for all commands collected
    • proxy-agent-config.json: Proxy configuration information
    • Docker log: A -json.log file that includes Docker container logs
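Because the archive's exact file name is not given here, one approach is to extract the most recent .tar.gz file from the output folder. The following is a sketch under that assumption. The function name extract_latest_archive and the destination folder in the example are hypothetical.

```shell
# Extract the newest .tar.gz archive found in a folder.
#   $1: folder to search (for example, /tmp)
#   $2: folder to extract into (created if missing)
# Note: parsing `ls` output assumes archive names without newlines.
extract_latest_archive() {
  src_dir="$1"
  dest_dir="$2"
  archive=$(ls -t "$src_dir"/*.tar.gz 2>/dev/null | head -n 1)
  if [ -z "$archive" ]; then
    echo "no .tar.gz archive found in $src_dir" >&2
    return 1
  fi
  mkdir -p "$dest_dir" && tar -xzf "$archive" -C "$dest_dir"
}

# Example:
#   extract_latest_archive /tmp /tmp/diagnostic-output
```

If other tools also write .tar.gz files to the same folder, check the archive's name before you extract it.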

You can use the following options with the diagnostic tool.

 Option   Description
 -r       A repair option that tries to restore failed user-managed notebooks core services status
 -s       Runs without a confirmation
 -b       Uploads the .tar.gz file to a Cloud Storage bucket
 -v       A debug option for troubleshooting the tool in case of failures
 -c       Captures 30 seconds of packet traffic into your user-managed notebooks instance, filtering SSH
 -d       A destination folder in which to save the logs
 -h       Help

What's next