Last reviewed 2021-11-18 UTC

Separate operations and development when using user-managed notebooks: Deploy

This document describes how to deploy the notebooks manager for managing Vertex AI Workbench user-managed notebooks. The code for this deployment is available on GitHub.

The document is part of a series that includes the following documents:

  • Overview, which describes a solution that you can use for deploying the notebooks manager and extended notebooks UIs.
  • Deploy (this document), which guides IT administrators on how to deploy the notebooks manager and extended notebooks UIs.
  • Use, which guides data practitioners on how to use the notebooks manager and extended notebooks UIs.
  • Troubleshooting, which describes potential issues and suggested resolutions.

Objectives

  • Set up an environment that limits Google Cloud console access for data practitioners but lets users interact with Google Cloud services through an extended notebooks UI.
  • Set up the OAuth 2.0 flow for the notebooks manager web application.
  • Deploy the notebooks manager.
  • Create extended notebooks UIs for data practitioners.
  • Provide data practitioners with a link to the notebooks manager.

Costs

In this document, you use the following billable components of Google Cloud:

  • Cloud Storage
  • Vertex AI Workbench

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.

  4. Enable the Notebooks and Cloud Storage APIs.

Prepare your environment

  1. Open Cloud Shell.

  2. Clone the repository that contains the code for this tutorial:

    git clone "https://github.com/GoogleCloudPlatform/notebooks-extended-uis.git"
    

    For more information, see the GitHub repository.

  3. If you do not have Terraform set up for Google Cloud, follow the instructions in Getting Started with the Google Provider to install Terraform with the Google Cloud provider.

Set up OAuth 2.0 flow for the notebooks manager web application

This procedure is performed manually in the Google Cloud console to mitigate abuse risks.

  1. Go to the Google Cloud console.

  2. To set up a consent screen, follow the instructions in Setting up your OAuth consent screen.

    The consent screen defines what users see when they authenticate to the notebooks manager and authorize access to the requested scopes. If possible, set the user type to Internal.

  3. Create an OAuth 2.0 web client ID. This lets you grant the notebooks manager access to the Vertex AI API on behalf of the user while keeping credentials private. To create an OAuth 2.0 client, follow the steps in Setting up OAuth 2.0 for Web Application.

  4. Copy the client ID from the web application page in the Google Cloud console. The client ID looks similar to the following:

    123456789-a1b2c3d4e5.apps.googleusercontent.com

    When Terraform deploys the application, it substitutes this client ID into the config.tpl template to generate the config.js file.

For more information, including a list of Terraform variables, see the GitHub repository's README file.
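Before you paste the client ID into your Terraform variables, you can run a quick local check to catch common copy errors, such as copying the client secret instead of the client ID or picking up a trailing space. The following POSIX shell function is a minimal sketch: the name looks_like_client_id is hypothetical, and the pattern is a heuristic based only on the example format shown earlier (starts with a digit, contains a hyphen, ends with .apps.googleusercontent.com), not an official validation rule.

```shell
#!/bin/sh
# looks_like_client_id VALUE (hypothetical helper)
# Heuristic check that VALUE resembles an OAuth 2.0 web client ID such as
# 123456789-a1b2c3d4e5.apps.googleusercontent.com. Returns 0 on a match.
looks_like_client_id() {
  case $1 in
    # Reject any whitespace, for example a trailing space from copy and paste.
    *[[:space:]]*) return 1 ;;
    # Starts with a digit, has a hyphen followed by at least one character,
    # and ends with the .apps.googleusercontent.com suffix.
    [0-9]*-?*.apps.googleusercontent.com) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: replace the example value with the client ID that you copied.
if looks_like_client_id "123456789-a1b2c3d4e5.apps.googleusercontent.com"; then
  echo "client ID format looks plausible"
else
  echo "unexpected client ID format" >&2
fi
```

A match only means the value has the expected shape; the OAuth 2.0 flow itself is the real test of whether the client ID is correct.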

Deploy the notebooks manager

The notebooks manager is an HTML page that you can deploy on any Google Cloud service that can serve web pages and that's supported by VPC Service Controls, such as Cloud Run. In this solution, the default hosting option is Cloud Storage.

  1. In Cloud Shell, go to the repository directory:

    cd notebooks-extended-uis
    
  2. Open the terraform.tfvars file in a text editor.

    For information about the required variables in this file, see Inputs in the GitHub repository. The repository provides an example terraform.tfvars file:

    client_id             = "123456789-1a2b3c4def5ghi6jkl.apps.googleusercontent.com"
    project               = "example-project"
    console_url           = "example-bucket"
    deploy_consoles       = "true"

  3. In the terraform.tfvars file, set the client_id variable to the value that you copied in the previous section.

  4. Set the value of the console_url variable to a unique name.

    This value is used as the name of a Cloud Storage bucket, so it must be globally unique.

  5. Save and close the terraform.tfvars file.

  6. Initialize Terraform and deploy the infrastructure:

    terraform init
    terraform apply
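Because the console_url value becomes a Cloud Storage bucket name, you can check it locally before you run terraform apply. The following sketch implements only a subset of the Cloud Storage bucket naming rules: 3 to 63 characters; only lowercase letters, digits, hyphens, and underscores; a letter or digit at the start and end; no goog prefix; and no google substring. It deliberately rejects dotted names, which follow additional rules, and it can't check global uniqueness, which Cloud Storage confirms only when Terraform creates the bucket. The function name check_bucket_name is hypothetical.

```shell
#!/bin/sh
# check_bucket_name NAME (hypothetical helper)
# Exits 0 if NAME passes a subset of the Cloud Storage bucket naming rules.
# The body runs in a subshell ( ... ) so that LC_ALL=C, which makes the
# a-z ranges below match ASCII letters only, doesn't leak into your shell.
check_bucket_name() (
  LC_ALL=C
  name=$1
  # Length must be 3-63 characters (dotted names are not handled here).
  len=${#name}
  if [ "$len" -lt 3 ] || [ "$len" -gt 63 ]; then
    exit 1
  fi
  case $name in
    *[!a-z0-9_-]*) exit 1 ;;   # disallowed character, including dots
    [!a-z0-9]*) exit 1 ;;      # must start with a letter or digit
    *[!a-z0-9]) exit 1 ;;      # must end with a letter or digit
    goog*|*google*) exit 1 ;;  # reserved prefix or substring
  esac
  exit 0
)

# Usage: check the value that you plan to set for console_url.
if check_bucket_name "example-bucket"; then
  echo "console_url passes the local checks"
else
  echo "console_url is not a valid bucket name" >&2
fi
```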
    

The Terraform script deploys the notebooks manager by doing the following:

  1. Creating a bucket in Cloud Storage. The name of the bucket is derived from the value of the console_url Terraform variable.
  2. Using *.tpl template files to create static files such as index.html, 404.html, and config.js. The .tpl files contain variables, including the following:

    • client_id: Found in the config.tpl file. This value is required for the OAuth 2.0 flow. Terraform substitutes the value that you set in the terraform.tfvars file into the config.tpl file to create the config.js file.
    • relative_path: Found in the index.tpl and 404.tpl files. This value defines where to find the static files and is based on variables that are defined in the main.tf file. It's required in order to load static files locally.

  3. Copying all static files to the Cloud Storage bucket.

After Terraform completes these steps, the notebooks manager is available at the following URL:

https://storage.googleapis.com/BUCKET_NAME/index.html

BUCKET_NAME is the name of the Cloud Storage bucket where you deployed the notebooks manager. It must match the console_url value that's in your terraform.tfvars file.

The Terraform script also lets you use a static bucket with your own domain name, but you must grant access to an additional IP address. For more information, see the README file in the GitHub repository.

Deploy extended notebooks UIs

The solution described in this document assumes that you deploy the extended notebooks UIs on behalf of end users. This section is optional and serves as an example of how to do that. You can integrate this example into your own processes for creating user-managed notebooks instances, either when you deploy the notebooks manager or separately. The example shows how to create a user-managed notebooks instance with the extended notebooks UI by using a Deep Learning VM image. The instance includes features such as the following:

  • Cloud Storage and BigQuery extensions provide interactive features that are similar to features in the Google Cloud console.
  • Git support lets users store and manage their notebook files and other local files.
  • The notebook executor lets you run notebooks end to end in the background.

The extended notebooks UI uses Google Cloud and BigQuery extension add-ons for JupyterLab. The add-ons are enabled when the enable-extended-ui key is set to True in the user-managed notebooks instance metadata. For the architecture described in this document, the key is set in the Terraform script that deploys the example instance, as shown in the following listing:

resource "google_notebooks_instance" "instance" {
  count         = contains(["true", "yes", "1"], lower(var.deploy_consoles)) ? 1 : 0
  project       = var.project
  name          = "example-notebook-console"
  machine_type  = "n2-standard-2"
  location      = "us-west1-b"
  metadata = {
    enable-extended-ui = "True"
  }
  vm_image {
    project      = "deeplearning-platform-release"
    image_family = "common-cpu"
  }
}
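The count expression in this listing creates the example instance only when deploy_consoles is "true", "yes", or "1", compared case-insensitively through lower(); any other value sets count to 0 and skips the instance. If you call Terraform from a wrapper script and want to apply the same rule there, the following shell function mirrors that logic. It's a hypothetical helper for illustration, not part of the repository, and tr lowercases only ASCII letters, whereas Terraform's lower() is Unicode-aware.

```shell
#!/bin/sh
# deploy_consoles_enabled VALUE (hypothetical helper)
# Mirrors: contains(["true", "yes", "1"], lower(var.deploy_consoles)) ? 1 : 0
deploy_consoles_enabled() {
  # Lowercase the value, like Terraform's lower() function.
  value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $value in
    true|yes|1) return 0 ;;  # count = 1: the instance is created
    *) return 1 ;;           # count = 0: the instance is skipped
  esac
}

# Usage: "True" lowercases to "true", so this prints "count = 1".
if deploy_consoles_enabled "True"; then echo "count = 1"; else echo "count = 0"; fi
```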

The main.tf script deploys a user-managed notebooks instance as an example. You can adapt this part of the script to create your pool of extended notebooks UIs based on your hardware, software, and accelerator requirements.

This solution does not enforce any IAM permissions. If your company policies prevent users from accessing each other's user-managed notebooks, you should use additional security features such as OS Login, single-user access, or IAM permissions. Setting up access to user-managed notebooks is out of scope for this solution.

Provide access for data practitioners

Before users can use the notebooks manager, you need to do the following:

  1. Publish your application.

    This step lets users access your application's endpoint and is required only if the user type of your OAuth consent screen is External. You perform this step from the OAuth consent screen. To learn more, see Setting up your OAuth consent screen.

  2. Provide users with the link to the notebooks manager.

    This lets users access the notebooks manager from a client that's authorized by the perimeters of the VPC Service Controls. When you use the default deployment_context parameter, the link looks similar to the following:

    https://storage.googleapis.com/BUCKET_NAME/index.html?projectId=PROJECT_ID
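To avoid typos in the link that you share, you can build it from the same values that you set in terraform.tfvars. The following sketch assumes the default Cloud Storage hosting and the URL formats shown earlier in this document; notebooks_manager_url is a hypothetical helper name. It doesn't URL-encode its inputs, which is acceptable here because valid bucket names and project IDs contain only URL-safe characters.

```shell
#!/bin/sh
# notebooks_manager_url BUCKET_NAME [PROJECT_ID] (hypothetical helper)
# Prints the notebooks manager URL for the default Cloud Storage deployment.
# BUCKET_NAME must match the console_url value in terraform.tfvars.
notebooks_manager_url() {
  url="https://storage.googleapis.com/$1/index.html"
  # Append the projectId query parameter only when a project ID is given.
  if [ -n "${2:-}" ]; then
    url="$url?projectId=$2"
  fi
  printf '%s\n' "$url"
}

# Usage: prints the link to share with data practitioners.
notebooks_manager_url "example-bucket" "example-project"
# → https://storage.googleapis.com/example-bucket/index.html?projectId=example-project
```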

For more information about running the Terraform commands, see the GitHub README file.

Select hosting options for the notebooks manager

The solution described in this document hosts the notebooks manager on Cloud Storage because the application is a static web page. Cloud Storage makes the solution easy to deploy, is supported by VPC Service Controls, and provides regional options.

As you can see in the GitHub repository, the static page is part of a Docker folder with an Nginx setup. That folder hierarchy is independent of the Cloud Storage deployment, but it provides flexibility if you want to expand the capabilities of the notebooks manager and need to build a container image.

For example, you might want to add a custom backend server and deploy it on another Google Cloud offering that supports containers. Options include Cloud Run, an internal deployment on Google Kubernetes Engine (GKE), or a managed instance group with container images.

If you do not use the default deployment option to Cloud Storage, you must create the index.html, 404.html, and config.js files from the .tpl files. You can create those files either manually by replacing the templated variables or through a Terraform script that's similar to the one that's provided in the GitHub repository.
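If you replace the templated variables yourself, a sed substitution is enough for simple placeholders. The following sketch assumes that the .tpl files reference client_id and relative_path with Terraform's ${name} interpolation syntax; check the actual templates in the repository before relying on it. It handles only these two plain placeholders, not other template syntax such as %{ } directives or $${ escapes. The render_tpl helper and the CLIENT_ID and RELATIVE_PATH shell variables in the usage comments are hypothetical.

```shell
#!/bin/sh
# render_tpl TEMPLATE CLIENT_ID RELATIVE_PATH (hypothetical helper)
# Writes TEMPLATE to stdout with ${client_id} and ${relative_path} replaced.
# Assumes the values contain no "|", "&", or "\" characters, which sed would
# treat specially in the replacement text.
render_tpl() {
  # Inside double quotes, \$ passes a literal $ to sed, which matches
  # "${client_id}" as plain text rather than as a shell variable.
  sed -e "s|\${client_id}|$2|g" \
      -e "s|\${relative_path}|$3|g" \
      "$1"
}

# Usage (run from the directory that contains the templates):
#   render_tpl config.tpl "$CLIENT_ID" "$RELATIVE_PATH" > config.js
#   render_tpl index.tpl  "$CLIENT_ID" "$RELATIVE_PATH" > index.html
#   render_tpl 404.tpl    "$CLIENT_ID" "$RELATIVE_PATH" > 404.html
```

Rendering both placeholders in every file is harmless: a template that doesn't contain a placeholder passes through unchanged.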

Set up a bastion host

Access to the notebooks manager and extended notebooks UIs requires a client that's within the same perimeter as those applications. For example, some companies use a bastion host approach with remote desktops.

Grant access to URLs

To make sure that users can go through the OAuth 2.0 authorization flow, interact with the Vertex AI API, and access BigQuery from Vertex AI Workbench, you need to grant access to the following external URLs in the organization's firewall rules.

You might need to grant access to additional URLs that pertain to your identity provider to support the browser sign-in flow.

A user's ability to perform tasks with the Vertex AI API depends on the IAM permissions that you set up in your project and organization. The notebooks manager does not enforce any security because it is only a client-side tool.

External URL                                      Description
*.accounts.google.com                             Used for OAuth 2.0 flow.
*.accounts.youtube.com                            Used for OAuth 2.0 flow.
*.gstatic.com                                     Used for OAuth 2.0 flow and a favicon.
*.googleusercontent.com                           Implicitly allows notebooks.googleusercontent.com.
*.datalab.cloud.google.com                        Used to get a notebook's proxy URL.
content-cloudresourcemanager.googleapis.com       Used to list projects.
content-notebooks.googleapis.com                  Used when calling actions on notebooks.
https://apis.google.com/js/googleapis.proxy.js.*  Used for OAuth 2.0 flow.
https://apis.google.com/_/scs/apps-static/.*      Used for OAuth 2.0 flow.
*.notebooks.googleapis.com                        Used when calling actions on notebooks.
*.notebooks.cloud.google.com                      Used by the Vertex AI Workbench viewer service.
cdn.jsdelivr.net/npm                              Enables the BigQuery add-on to work.
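When you review firewall or proxy logs for a blocked request, it can help to check the request against the external URLs listed in this section. The following sketch encodes those entries as shell glob patterns: host entries are matched against the hostname, and the three path-based entries (translated from the .* notation to a trailing *) are matched against the host and path. How your firewall product interprets wildcards may differ; for example, this sketch doesn't treat *.accounts.google.com as matching accounts.google.com itself. The is_allowed helper is hypothetical.

```shell
#!/bin/sh
# is_allowed URL (hypothetical helper)
# Returns 0 if URL matches one of the external URL entries in this section.
# URL may include an https:// scheme and a path, for example
# https://content-notebooks.googleapis.com/v1/projects.
is_allowed() {
  target=${1#https://}   # drop an optional https:// scheme
  host=${target%%/*}     # keep only the hostname
  case $host in
    *.accounts.google.com | *.accounts.youtube.com | *.gstatic.com) return 0 ;;
    *.googleusercontent.com | *.datalab.cloud.google.com) return 0 ;;
    content-cloudresourcemanager.googleapis.com) return 0 ;;
    content-notebooks.googleapis.com) return 0 ;;
    *.notebooks.googleapis.com | *.notebooks.cloud.google.com) return 0 ;;
  esac
  # Entries that restrict the path as well as the host.
  case $target in
    apis.google.com/js/googleapis.proxy.js*) return 0 ;;
    apis.google.com/_/scs/apps-static/*) return 0 ;;
    cdn.jsdelivr.net/npm/*) return 0 ;;
  esac
  return 1
}

# Usage: prints "allowed" because *.googleusercontent.com covers this host.
if is_allowed "https://notebooks.googleusercontent.com/lab"; then
  echo "allowed"
else
  echo "blocked"
fi
```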

What's next