Troubleshooting notebooks

Connecting to and opening JupyterLab notebooks

Nothing happens after clicking Open JupyterLab

Verify that your browser does not block pop-up tabs. JupyterLab opens in a new browser tab.

No Inverting Proxy server access to JupyterLab

Notebooks uses a Google internal Inverting Proxy server to provide access to JupyterLab. Notebooks instance settings, network configuration, and other factors can prevent access to JupyterLab. Use SSH to connect to JupyterLab and learn more about why you might not have access through the Inverting Proxy.
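
For example, you can tunnel to the instance and open JupyterLab locally while you investigate (a sketch that assumes Jupyter is listening on its default port, 8080; the full procedure is in Use SSH to connect to your Notebooks instance later in this guide):

gcloud compute ssh --project PROJECT_ID \
  --zone ZONE \
  INSTANCE_NAME -- -L 8080:localhost:8080

# With the tunnel open, browse to http://localhost:8080 to reach JupyterLab
# without going through the Inverting Proxy.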

Unable to SSH into Notebooks instance

Notebooks uses OS Login to enable SSH access. This is done automatically at Notebooks instance creation time by setting the metadata entry enable-oslogin value to TRUE. To enable SSH access for Notebooks for users, complete the steps for configuring OS Login roles on user accounts.
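
To check that the metadata entry is present on an existing instance, and to set it if needed, you can use gcloud (a sketch; INSTANCE_NAME and ZONE are placeholders):

# Check whether OS Login is enabled on the instance
gcloud compute instances describe INSTANCE_NAME --zone=ZONE | grep -A 1 enable-oslogin

# Enable it if the entry is missing or not TRUE
gcloud compute instances add-metadata INSTANCE_NAME --zone=ZONE \
  --metadata enable-oslogin=TRUE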

Opening a notebook results in a 403 (Forbidden) error

There are three ways to access JupyterLab notebooks:

  • Single User
  • Service Account
  • Project Editors

The access mode is configured during Notebooks instance creation and is defined in the instance's metadata:

  • Single User: proxy-mode=mail, proxy-user-mail=user@domain.com
  • Service Account: proxy-mode=service_account
  • Project Editors: proxy-mode=project_editors
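
To check which mode an existing instance uses, inspect the underlying VM's metadata (a sketch; INSTANCE_NAME, ZONE, and the email address are placeholders, and the second command applies only if you intend to switch the instance to single-user mode):

# Inspect the proxy-mode metadata entry
gcloud compute instances describe INSTANCE_NAME --zone=ZONE | grep -A 1 proxy-mode

# Example: switch to single-user mode
gcloud compute instances add-metadata INSTANCE_NAME --zone=ZONE \
  --metadata proxy-mode=mail,proxy-user-mail=user@domain.com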

If you can't access a notebook when you click Open JupyterLab, try the following:

  • Verify that the proxy-mode metadata entry is correct.

  • Verify that the user accessing the instance has the iam.serviceAccounts.actAs permission for the defined service account. The service account on the instance provides access to other Google Cloud services. You can use any service account within the same project, but you must have the Service Account User permission (iam.serviceAccounts.actAs) to access the instance. If no service account is specified, the Compute Engine default service account is used, and this permission is required for it as well.

The following example shows how to specify a service account when you create an instance:

gcloud notebooks instances create nb-1 \
  --vm-image-family=tf2-latest-cpu \
  --metadata=proxy-mode=mail,proxy-user-mail=user@domain.com \
  --service-account=your_service_account@project_id.iam.gserviceaccount.com \
  --location=us-west1-a
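
To check whether your account holds the Service Account User role on the instance's service account, you can inspect the service account's IAM policy and look for a roles/iam.serviceAccountUser binding that includes your user (a sketch; the email address is a placeholder):

gcloud iam service-accounts get-iam-policy \
  your_service_account@project_id.iam.gserviceaccount.com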

When you click Open JupyterLab to open a notebook, the notebook opens in a new browser tab. If you are signed in to more than one Google account, the new tab opens with your default Google account. If you did not create your Notebooks instance with your default Google account, the new browser tab will show a 403 (Forbidden) error.

Opening a notebook results in a 504 (Gateway Timeout) error

This error indicates an internal proxy timeout or a backend server (Jupyter) timeout. It can occur when:

  • The request never reached the internal Inverting Proxy server.
  • The backend server (Jupyter) returned a 504 error.

If you can't access a notebook:

  • Open a Google support case.

Opening a notebook results in a 524 (A Timeout Occurred) error

The internal Inverting Proxy server hasn't received a response from the Inverting Proxy agent for the request within the timeout period. The Inverting Proxy agent runs inside your Notebooks instance as a Docker container. A 524 error usually indicates that the Inverting Proxy agent isn't connecting to the Inverting Proxy server, or that requests are taking too long on the backend server (Jupyter) side. A typical cause is on the user side (for example, a networking issue, or the Inverting Proxy agent or Jupyter service isn't running).

If you can't access a notebook, try the following:

  • Verify that the Inverting Proxy agent is running. If it isn't, restart it or re-register with the Inverting Proxy server.
  • Verify the Jupyter service status and collect logs, and restart the Jupyter service if needed.
  • If the previous steps don't solve your problem, get support.

Opening a notebook results in a 598 (Network read timeout) error

The Inverting Proxy server hasn't heard from the Inverting Proxy agent at all for more than 10 minutes. This is a strong indication of an Inverting Proxy agent or Jupyter issue.

If you can't access a notebook, try the following:

  • Verify that the Inverting Proxy agent is running, and restart it if needed.
  • Re-register with the Inverting Proxy server.
  • Verify the Jupyter service status, and restart the Jupyter service if needed.
  • If the previous steps don't solve your problem, get support.

Notebook is unresponsive

If your Notebooks instance isn't executing cells or appears to be frozen, first try restarting the kernel by clicking Kernel from the top menu and then Restart Kernel. If that doesn't work, you can try the following:

  • From a terminal session in the notebook, run the top command to see whether any processes are consuming the CPU.
  • From the terminal, check the amount of free disk space using the df command, or check the available RAM using the free command. A combined example follows this list.
  • Shut down your instance by selecting it on the Notebook instances page and clicking Stop. After it has stopped completely, select it and click Start.
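
A minimal sequence of checks from the terminal might look like the following (a sketch):

top -b -n 1 | head -n 20   # snapshot of the busiest processes
df -h                      # free disk space
free -h                    # available RAM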

Working with files

Downloading files from JupyterLab results in 403 (Forbidden) error

The notebook package in the M23 release of Deep Learning VM includes a bug that prevents you from downloading a file using the JupyterLab UI. You can read more about the bug at Cannot download files after JL update and Download file functionality is broken in notebook packages version 5.7.6+ (5.7.7, 5.7.8).

If you are using the M23 release of Deep Learning VM you can resolve the issue in one of two ways:

  • Use a Safari browser. The download functionality works for Safari.

  • Downgrade your notebook package to version 5.7.5.

    To downgrade your notebook package:

    1. Connect to your Deep Learning VM using SSH. For information on connecting to a VM using SSH, see Connecting to instances.

    2. Run the following commands:

      sudo pip3 install notebook==5.7.5
      sudo service jupyter restart
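
      # Optional check (not part of the original steps): confirm the notebook version
      pip3 show notebook | grep -i version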
      

After restarting a VM, local files cannot be referenced from the notebook terminal

Sometimes after restarting a Notebooks instance, local files cannot be referenced from within a notebook terminal.

This is a known issue. To reference your local files from within a notebook terminal, first re-establish your current working directory using the following command:

cd PWD

In this command, replace PWD with your current working directory. For example, if your current working directory was /home/jupyter/, use the command cd /home/jupyter/.

After re-establishing your current working directory, your local files can be referenced from within the notebook terminal.

GPU quota has been exceeded

Determine the number of GPUs available in your project by checking the quotas page. If GPUs are not listed on the quotas page, or you require additional GPU quota, you can request a quota increase. See Requesting additional quota on the Compute Engine Resource Quotas page.
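
To check GPU quota from the command line instead, you can describe a region and look for GPU metrics (a sketch; us-west1 is an example region, and metric names vary by GPU model):

gcloud compute regions describe us-west1 | grep -B 1 -A 1 GPUS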

Creating Notebooks instances

New Notebooks instance is not created (insufficient permissions)

It usually takes about a minute to create a Notebooks instance. If your new Notebooks instance remains in the pending state indefinitely, it might be because the service account used to start the Notebooks instance does not have the required Editor permission in your Google Cloud Platform (GCP) project.

You can start a Notebooks instance with a custom service account that you create, or in single-user mode with a user ID. If you start a Notebooks instance in single-user mode, your Notebooks instance begins the boot process using the Compute Engine default service account before turning control over to your user ID.

To verify that a service account has the appropriate permissions, follow these steps:

Console

  1. Open the IAM page in the Cloud Console.

    Open the IAM page

  2. Determine the service account used with your Notebooks instance, which is one of the following:

    • A custom service account that you specified when you created your Notebooks instance.

    • The Compute Engine default service account for your GCP project, which is used when you start your Notebooks instance in single-user mode. The Compute Engine default service account for your GCP project is named PROJECT_NUMBER-compute@developer.gserviceaccount.com. For example: 113377992299-compute@developer.gserviceaccount.com.

  3. Verify that your service account is in the Editor role.

  4. If not, edit the service account and add it to the Editor role.

For more information, see Granting, changing, and revoking access to resources in the IAM documentation.

gcloud

  1. If you have not already, install the gcloud command-line tool.

  2. Get the name and project number for your GCP project with the following command. Replace PROJECT_ID with the project ID for your GCP project.

    gcloud projects describe PROJECT_ID
    

    You should see output similar to the following, which displays the name (name:) and project number (projectNumber:) for your project.

    createTime: '2018-10-18T21:03:31.408Z'
    lifecycleState: ACTIVE
    name: my-project-name
    parent:
     id: '396521612403'
     type: folder
    projectId: my-project-id-1234
    projectNumber: '113377992299'
    
  3. Determine the service account used with your Notebooks instance, which is one of the following:

    • A custom service account that you specified when you created your Notebooks instance.

    • The Compute Engine default service account for your GCP project, which is used when you start your Notebooks instance in single-user mode. The Compute Engine default service account for your GCP project is named PROJECT_NUMBER-compute@developer.gserviceaccount.com. For example: 113377992299-compute@developer.gserviceaccount.com.

  4. Add the roles/editor role to the service account with the following command. Replace PROJECT_ID with your project ID, and replace SERVICE_ACCOUNT_EMAIL with the email address of the service account for your Notebooks instance.

    gcloud projects add-iam-policy-binding PROJECT_ID \
     --member serviceAccount:SERVICE_ACCOUNT_EMAIL \
     --role roles/editor
    

Creating an instance results in a "Permission denied" error

When creating a new instance, verify that the user creating the instance has the iam.serviceAccounts.actAs permission for the defined service account.

The service account on the instance provides access to other Google Cloud services. You can use any service account within the same project, but you must have the Service Account User permission (iam.serviceAccounts.actAs) to create the instance. If not specified, the Compute Engine default service account is used.

The following example shows how to specify a service account when you create an instance:

gcloud notebooks instances create nb-1 \
  --vm-image-family=tf2-latest-cpu \
  --service-account=your_service_account@project_id.iam.gserviceaccount.com \
  --location=us-west1-a

To grant the Service Account User permission, see Allowing a member to impersonate a single service account.
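
For example, a grant might look like the following (a sketch; the service account and user email addresses are placeholders):

gcloud iam service-accounts add-iam-policy-binding \
  your_service_account@project_id.iam.gserviceaccount.com \
  --member="user:user@domain.com" \
  --role="roles/iam.serviceAccountUser"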

Creating a new instance results in an "already exists" error

When creating a new instance, verify that a Notebooks instance with the same name wasn't previously deleted at the Compute Engine level while its record still exists in the Notebooks API database.

The following example shows how to list instances using the Notebooks API and verify their state.

gcloud notebooks instances list --location=LOCATION
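
To narrow the output to instances that the Notebooks API still tracks as deleted, you can apply the standard gcloud --filter flag (a sketch):

gcloud notebooks instances list --location=LOCATION --filter="state=DELETED"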

If an instance's state is DELETED, run the following command to delete it permanently.

gcloud notebooks instances delete INSTANCE_NAME --location=LOCATION

Upgrading Notebooks instances

Unable to upgrade because unable to get instance disk information

Upgrade is not supported for single-disk Notebooks instances. You might want to migrate your user data to a new Notebooks instance.

Unable to upgrade because instance is not UEFI compatible

Notebooks depends on UEFI compatibility to complete an upgrade.

Notebooks instances created from some older images are not UEFI compatible, and therefore cannot be upgraded.

To verify that your instance is UEFI compatible, type the following command in either Cloud Shell or any environment where the Cloud SDK is installed.

gcloud compute instances describe INSTANCE_NAME \
  --zone=ZONE | grep type

Replace the following:

  • INSTANCE_NAME: the name of your instance
  • ZONE: the zone where your instance is located

To verify that the image that you used to create your instance is UEFI compatible, use the following command:

gcloud compute images describe VM_IMAGE_FAMILY \
  --project deeplearning-platform-release | grep type

Replace VM_IMAGE_FAMILY with the image family name that you used to create your instance.

If you determine that either your instance or image is not UEFI compatible, you can attempt to migrate your user data to a new Notebooks instance. To do so, complete the following steps:

  1. Verify that the image that you want to use to create your new instance is UEFI compatible. To do so, type the following command in either Cloud Shell or any environment where the Cloud SDK is installed.

    gcloud compute images describe VM_IMAGE_FAMILY \
      --project deeplearning-platform-release --format=json | grep type
    

    Replace VM_IMAGE_FAMILY with the image family name that you want to use to create your instance.

  2. Migrate your user data to a new Notebooks instance.

Notebooks instance is not accessible after upgrade

Notebooks instances that can be upgraded are dual-disk, with one boot disk and one data disk. The upgrade process upgrades the boot disk to a new image while preserving your data on the data disk.

If the Notebooks instance is not accessible after an upgrade, there may have been a failure during the replacement of the boot disk's image. Complete the following steps to attach a new valid image to the boot disk.

  1. To store values you'll use to complete this procedure, type the following command in either Cloud Shell or any environment where the Cloud SDK is installed.

    export INSTANCE_NAME=MY_INSTANCE_NAME
    export PROJECT_ID=MY_PROJECT_ID
    export ZONE=MY_ZONE
    

    Replace the following:

    • MY_INSTANCE_NAME: the name of your instance
    • MY_PROJECT_ID: your project ID
    • MY_ZONE: the zone where your instance is located
  2. Use the following command to stop the instance:

    gcloud compute instances stop $INSTANCE_NAME \
      --project=$PROJECT_ID --zone=$ZONE
    
  3. Detach the data disk from the instance.

    gcloud compute instances detach-disk $INSTANCE_NAME --device-name=data \
      --project=$PROJECT_ID --zone=$ZONE
    
  4. Delete the instance's VM.

    gcloud compute instances delete $INSTANCE_NAME --keep-disks=all --quiet \
      --project=$PROJECT_ID --zone=$ZONE
    
  5. Use the Notebooks API to delete the Notebooks instance.

    gcloud notebooks instances delete $INSTANCE_NAME \
      --project=$PROJECT_ID --location=$ZONE
    
  6. Create a new Notebooks instance using the same name as your previous instance.

    gcloud notebooks instances create $INSTANCE_NAME \
      --vm-image-project="deeplearning-platform-release" \
      --vm-image-family=MY_VM_IMAGE_FAMILY \
      --instance-owners=MY_INSTANCE_OWNER \
      --machine-type=MY_MACHINE_TYPE \
      --service-account=MY_SERVICE_ACCOUNT \
      --accelerator-type=MY_ACCELERATOR_TYPE \
      --accelerator-core-count=MY_ACCELERATOR_CORE_COUNT \
      --install-gpu-driver \
      --project=$PROJECT_ID \
      --location=$ZONE
    

    Replace the following:

    • MY_VM_IMAGE_FAMILY: the image family name
    • MY_INSTANCE_OWNER: your instance owner
    • MY_MACHINE_TYPE: the machine type of your instance's VM
    • MY_SERVICE_ACCOUNT: the service account to use with this instance, or use "default"
    • MY_ACCELERATOR_TYPE: the accelerator type; for example, "NVIDIA_TESLA_K80"
    • MY_ACCELERATOR_CORE_COUNT: the core count; for example, 1

Monitoring health status of Notebooks instances

docker-proxy-agent status failure

Follow these steps after a docker-proxy-agent status failure:

  1. Verify that the Inverting Proxy agent is running. If not, go to step 3.

  2. Restart the Inverting Proxy agent.

  3. Re-register with the Inverting Proxy server.

docker-service status failure

Follow these steps after a docker-service status failure:

  1. Verify that the Docker service is running.

  2. Restart the Docker service.

jupyter-service status failure

Follow these steps after a jupyter-service status failure:

  1. Verify that the Jupyter service is running.

  2. Restart the Jupyter service.

jupyter-api status failure

Follow these steps after a jupyter-api status failure:

  1. Verify that the Jupyter internal API is active.

  2. Restart the Jupyter service.

Boot disk space status

The boot disk space status is unhealthy if the boot disk is more than 85% full.

If your boot disk space status is unhealthy, try the following:

  1. From a terminal session in the Notebooks instance, or from an SSH session, check the amount of free disk space using the command df -H.

  2. Use the command find . -type f -size +100M to help you find large files that you may be able to delete, but do not delete them unless you are sure you can safely do so. If you aren't sure, you can get help from support. A disk-usage example follows this list.

  3. If the previous steps do not solve your problem, get support.
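
To find where the space is being used, a disk-usage summary can help (a sketch; this may take a while and can be run from the same terminal or SSH session):

sudo du -h --max-depth=1 / 2>/dev/null | sort -h | tail -n 20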

Data disk space status

The data disk space status is unhealthy if the data disk is more than 85% full.

If your data disk space status is unhealthy, try the following:

  1. From a terminal session in the Notebooks instance, or from an SSH session, check the amount of free disk space using the command df -h -T /home/jupyter.

  2. Delete large files to increase the available disk space. Use the command find . -type f -size +100M to help you find large files.

  3. If the previous steps do not solve your problem, get support.

Helpful procedures

Use SSH to connect to your Notebooks instance

Use ssh to connect to your instance by typing the following command in either Cloud Shell or any environment where the Cloud SDK is installed.

gcloud compute ssh --project PROJECT_ID \
  --zone ZONE \
  INSTANCE_NAME -- -L 8080:localhost:8080

Replace the following:

  • PROJECT_ID: Your project ID
  • ZONE: The Google Cloud zone where your instance is located
  • INSTANCE_NAME: The name of your instance

Re-register with the Inverting Proxy server

To re-register the Notebooks instance with the internal Inverting Proxy server, you can stop and start the VM from the Notebook instances page or you can use ssh to connect to your Notebooks instance and enter:

cd /opt/deeplearning/bin
sudo ./attempt-register-vm-on-proxy.sh

Verify the Docker service status

To verify the Docker service status, you can use ssh to connect to your Notebooks instance and enter:

sudo service docker status

Verify that the Inverting Proxy agent is running

To verify if the notebook Inverting Proxy agent is running, use ssh to connect to your Notebooks instance and enter:

# Confirm Inverting Proxy agent Docker container is running (proxy-agent)
sudo docker ps

# Verify State.Status is running and State.Running is true.
sudo docker inspect proxy-agent

# Grab logs
sudo docker logs proxy-agent

Verify the Jupyter service status and collect logs

To verify the Jupyter service status you can use ssh to connect to your Notebooks instance and enter:

sudo service jupyter status

To collect Jupyter service logs:

sudo journalctl -u jupyter.service --no-pager

Verify that the Jupyter internal API is active

To verify that the Jupyter internal API is active you can use ssh to connect to your Notebooks instance and enter:

curl http://127.0.0.1:8080/api/kernelspecs

Restart the Docker service

To restart the Docker service, you can stop and start the VM from the Notebook instances page or you can use ssh to connect to your Notebooks instance and enter:

sudo service docker restart

Restart the Inverting Proxy agent

To restart the Inverting Proxy agent, you can stop and start the VM from the Notebook instances page or you can use ssh to connect to your Notebooks instance and enter:

sudo docker restart proxy-agent

Restart the Jupyter service

To restart the Jupyter service, you can stop and start the VM from the Notebook instances page or you can use ssh to connect to your Notebooks instance and enter:

sudo service jupyter restart

Migrate your data to a new Notebooks instance

  1. Copy your user data to a Cloud Storage bucket using gsutil. The following example command copies all of the files from the default directory /home/jupyter/ to a Cloud Storage directory.

    gsutil cp -R /home/jupyter/* gs://MY_DIRECTORY/
    

    Replace the following:

    • MY_DIRECTORY: the Cloud Storage directory where you want to store your instance's data
  2. Create a new Notebooks instance using the Create a new instance instructions to make sure that your new instance is registered with the Notebooks API.

  3. Restore your user data on the new instance. The following example command copies all of the files from a Cloud Storage directory to the default directory /home/jupyter/.

    gsutil cp -R gs://MY_DIRECTORY/* /home/jupyter/
    

Make a copy of the user data on your Notebooks instance

To store a copy of your instance's user data in Cloud Storage, complete the following steps:

  1. Use ssh to connect to your Notebooks instance.

  2. Copy the contents of the instance to a Cloud Storage bucket using gsutil. The following example command copies all of the notebook (.ipynb) files from the default directory /home/jupyter/ to a Cloud Storage directory named my-bucket/legacy-notebooks.

    gsutil cp -R /home/jupyter/*.ipynb gs://my-bucket/legacy-notebooks/