Create a Vertex AI JupyterLab Notebook

You use the Vertex AI SDK in a JupyterLab Notebook to get predictions. This section shows you how to use Vertex AI Workbench to create a user-managed JupyterLab Notebook instance. A user-managed notebook instance is a secure, integrated JupyterLab environment used to experiment, develop, and deploy machine learning models into production. User-managed notebooks come preinstalled with the most recent data science and machine learning frameworks. After you create your notebook, you run sequential portions of Python code to do most of the work to generate your predictions. For more information, see Introduction to user-managed notebooks.

Create a user-managed notebooks JupyterLab Notebook instance

To create your user-managed notebooks JupyterLab Notebook instance, do the following:

  1. In the Google Cloud console, open your Google Cloud project if it's not already open.

  2. If you are not in the Vertex AI portion of the Google Cloud console, then do the following:

    1. In Search, enter Vertex AI, then press Return.

    2. In the search results, click Vertex AI.

  3. In the left-navigation pane, under Tools, choose Workbench.

  4. If the option to enable the Notebooks API appears, click Enable. It might take a few moments for the enabling process to complete.

  5. Click User-managed notebooks.

  6. Click New notebook, then click Python 3.

  7. In New notebook, do the following:

    1. In Notebook name, enter a name for your notebook.

    2. In Region, select us-central1 (Iowa).

    3. In Notebook properties, don't change anything. This tutorial works with the default properties.

    4. Click Create. To learn more about your notebook, wait until it appears in your list of user-managed notebooks, then click its name to see its properties.

Prepare your notebook

Your JupyterLab user-managed notebooks instance already has packages for deep learning preinstalled, including TensorFlow and PyTorch frameworks. It's also already authenticated to use your Google Cloud project. However, you must install and initialize the Vertex AI SDK for Python. This section walks you through these steps.

After you create your notebook, you use it to enter and run the sequential snippets of code in this tutorial. Each snippet of code must be run individually and in order.

Open your notebook

If your notebook is already open, you can skip to the next step, Install the Vertex AI SDK for Python.

Your notebook is where you run the code in this tutorial. It's a file with the extension .ipynb. When you open it for the first time, it's untitled. You can rename it after it's open. To open your notebook, do the following:

  1. In the Google Cloud console, open your Google Cloud project. If you don't have a project, see Create a Google Cloud project to learn how to create one.

  2. If you are not in the Vertex AI portion of the Google Cloud console, then do the following:

    1. In Search, enter Vertex AI, then press Return.

    2. In the search results, click Vertex AI.

  3. In the left-navigation pane, under Tools, choose Workbench.

  4. Click User-managed notebooks.

  5. Locate your notebook and, next to its name, click Open JupyterLab. This opens the JupyterLab environment.

  6. In the JupyterLab environment, under Notebook, click Python 3.

  7. In the left-navigation pane of the JupyterLab environment, you see your new notebook. Its title is Untitled.ipynb. To rename it, right-click your new notebook, click Rename, and enter a new name.

Install the Vertex AI SDK for Python

After you open your JupyterLab user-managed notebooks instance, you must install the Vertex AI SDK for Python. You use the Vertex AI SDK for Python to make Vertex AI API calls that create your dataset, create your model, train and deploy your model, and make predictions with your model. For more information, see Use the Vertex AI SDK for Python.

When you install the Vertex AI SDK for Python, the other Google Cloud SDKs that it depends on are also installed. Two of those SDKs are used in this tutorial:

  • Cloud Storage - When you use the Vertex AI SDK for Python to make Vertex AI API calls, Vertex AI stores artifacts in a Cloud Storage bucket. The bucket is referred to as a staging bucket. You specify the staging bucket when you initialize the Vertex AI SDK for Python. For more information, see Python client for Google Cloud storage API.

  • BigQuery - Vertex AI trains your model using a BigQuery public dataset. The BigQuery SDK must be installed to access and download the dataset used in this tutorial. For more information, see BigQuery API client libraries.

To install the Vertex AI SDK for Python and its dependent SDKs, run the following code.

# Install the Vertex AI SDK
! pip3 install --upgrade --quiet google-cloud-aiplatform

The --quiet flag suppresses output so that only errors display, if there are any. The exclamation mark (!) indicates that this is a shell command.

Because this is the first code you're running in your new notebook, you enter it into the blank code cell at the top of your notebook. After you enter code in a code cell, click Run the selected cells and advance, or use the keyboard shortcut Shift+Enter, to run the code.

As you proceed through this tutorial, run code in the empty code cell that automatically appears below the most recently run code. If you want to manually add a new code cell, click the notebook file's Insert a cell below button.
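Because the --quiet flag hides the normal install output, it can be useful to confirm what was installed. The following is a minimal sketch, not part of the original tutorial, that reads package metadata with the standard library. The helper name installed_version is illustrative.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# The names below are the pip distribution names of the Vertex AI SDK and
# the two dependent SDKs that this tutorial uses.
for name in (
    "google-cloud-aiplatform",
    "google-cloud-storage",
    "google-cloud-bigquery",
):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If any line reports that a package is not installed, rerunning the install cell without --quiet shows the full pip output.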

Set your project ID and region

In this step, you set your project ID and your region. You first assign them to variables so they can be easily referenced later in this tutorial. Next, you use the gcloud config command to set your project for your Google Cloud session. Later, you use both variables and your Cloud Storage bucket URI to initialize the Vertex AI SDK for Python.

Set your project ID

To set your project ID, do the following:

  1. Locate your Google Cloud project ID. For more information, see Find your project ID.

  2. Run the following in a code cell in your notebook. In the code, replace MY_PROJECT_ID with the project ID you just located. The output this command generates is Updated property [core/project].

    project_id = "MY_PROJECT_ID"  # @param {type:"string"}
    # Set the project id
    ! gcloud config set project {project_id}
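A common mistake at this step is to leave the MY_PROJECT_ID placeholder in place. The following sketch, which is an illustration rather than part of the tutorial, checks a value against the usual project ID format (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; doesn't end with a hyphen). The names PROJECT_ID_PATTERN, looks_like_project_id, and sample_id are hypothetical, and some older projects might use IDs that don't follow this format.

```python
import re

# Usual project ID format: 6-30 characters made of lowercase letters,
# digits, and hyphens; starts with a letter; does not end with a hyphen.
PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def looks_like_project_id(value):
    """Return True if value matches the usual project ID format."""
    return PROJECT_ID_PATTERN.fullmatch(value) is not None


# Hypothetical sample value; in the notebook, pass your own project_id.
sample_id = "my-sample-project-123"
print(looks_like_project_id(sample_id))
print(looks_like_project_id("MY_PROJECT_ID"))
```

The unreplaced placeholder fails the check because it contains uppercase letters and underscores.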
    

Set your region

This tutorial uses the us-central1 region. To set your region, do the following:

  1. Run the following code to set the region variable that's used by Vertex AI to us-central1. This code doesn't generate output. For more information, see Choose your location.

    region = "us-central1"  # @param {type: "string"}
    

Create a Cloud Storage bucket

This tutorial requires a Cloud Storage bucket that Vertex AI uses to stage artifacts. Vertex AI stores the data associated with the dataset and model resources that you create in the staging bucket. This data is retained and available across sessions. In this tutorial, Vertex AI also stores your dataset in the staging bucket. You specify your staging bucket when you initialize the Vertex AI SDK for Python.

Every Cloud Storage bucket name must be globally unique. If you choose a name that's already in use, the request to create your bucket fails. The following code uses a datetime stamp and your project ID to create a unique bucket name. You append the bucket name to gs:// to create the URI for your Cloud Storage bucket. The echo shell command shows you the URI so you can verify that it was created correctly.
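The naming scheme described above can be sketched as a small helper that also checks the result against a simplified subset of the Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dots, hyphens, and underscores; starts and ends with a letter or digit). The helper names and the hyphen separator are assumptions made for illustration, and the check doesn't cover every rule, such as restricted words.

```python
import re
from datetime import datetime

# Simplified subset of the bucket naming rules: 3-63 characters, lowercase
# letters, digits, dots, hyphens, and underscores; starts and ends with a
# letter or digit. Other documented restrictions are not checked here.
BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def make_bucket_name(project_id, now=None):
    """Build a bucket name from a project ID and a second-resolution timestamp."""
    now = now or datetime.now()
    timestamp = now.strftime("%Y%m%d%H%M%S")
    return f"{project_id}-aip-{timestamp}"


def is_valid_bucket_name(name):
    """Return True if name passes the simplified naming check."""
    return BUCKET_NAME_PATTERN.fullmatch(name) is not None


# Hypothetical project ID and a fixed time, used only to show the format.
name = make_bucket_name("my-sample-project", datetime(2024, 1, 2, 3, 4, 5))
print(name, is_valid_bucket_name(name))
```

Because the timestamp has one-second resolution, two names generated for the same project within the same second would collide.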

  1. To set your bucket's name and URI, run the following code. The last line displays the URI of your Cloud Storage bucket.

    from datetime import datetime

    bucket_name = "bucket-name-placeholder"  # @param {type:"string"}

    # If no name was supplied, build a unique one from the project ID and a timestamp.
    if not bucket_name or bucket_name == "bucket-name-placeholder":
        timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
        bucket_name = f"{project_id}-aip-{timestamp}"

    # Build the URI once, after the final bucket name is known.
    bucket_uri = f"gs://{bucket_name}"
    ! echo $bucket_uri
    
  2. To create a bucket using the Cloud Storage client library and the bucket name, run the following code. This code doesn't generate output.

    from google.cloud import storage
    client = storage.Client(project=project_id)
    
    # Create a bucket
    bucket = client.create_bucket(bucket_name, location=region)
    
  3. To verify that your bucket was created successfully, run the following:

    print("Bucket {} created.".format(bucket.name))
    

Initialize the Vertex AI SDK for Python

To initialize the Vertex AI SDK for Python, you first import its library, aiplatform. Next, you call aiplatform.init and pass in values for the following parameters:

  • project - The project specifies which Google Cloud project to use when you use the Vertex AI SDK for Python to make calls to the Vertex AI API. In this tutorial, you specify your project with its project ID. You can also specify your project with its project number.

  • location - The location specifies which Google Cloud region to use when you make API calls. If you don't specify a location, the Vertex AI SDK for Python uses us-central1.

  • staging_bucket - The staging_bucket specifies which Cloud Storage bucket is used to stage artifacts when you use the Vertex AI SDK for Python. You specify the bucket with a URI that starts with gs://. In this tutorial, you use the URI created earlier in Create a Cloud Storage bucket.
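Before passing these three values to aiplatform.init, a quick sanity check can catch an empty project or a bucket name passed where a URI is expected. The following is a minimal sketch under the assumptions stated in the list above. The helper find_init_problems and the sample values are hypothetical and aren't part of the Vertex AI SDK.

```python
def find_init_problems(project, location, staging_bucket):
    """Return a list of human-readable problems with the init arguments."""
    problems = []
    if not project:
        problems.append("project is empty")
    if not location:
        problems.append("location is empty")
    if not staging_bucket.startswith("gs://"):
        problems.append("staging_bucket does not start with gs://")
    return problems


# Hypothetical values; in the notebook, pass project_id, region, and bucket_uri.
print(find_init_problems("my-sample-project", "us-central1", "gs://my-bucket"))
print(find_init_problems("my-sample-project", "us-central1", "my-bucket"))
```

An empty list means that none of the checked problems were found. It doesn't confirm that the project or bucket exists.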

To set your project, region, and staging bucket, run the following command. This command doesn't generate output.

from google.cloud import aiplatform

# Initialize the Vertex AI SDK
aiplatform.init(project=project_id, location=region, staging_bucket=bucket_uri)

Initialize BigQuery

This tutorial uses a BigQuery public dataset of penguins to train a model. After Vertex AI trains the model, you specify parameters that represent characteristics of penguins, and the model uses those characteristics to predict the species of penguin they represent. For more information about public datasets, see BigQuery public datasets.

Before you use the BigQuery dataset, you must initialize BigQuery with your project ID. To do this, run the following command. This command doesn't generate output.

from google.cloud import bigquery

# Set up BigQuery client
bq_client = bigquery.Client(project=project_id)
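After the client is initialized, previewing a few rows is one way to confirm that it can read the public dataset. The following sketch is an illustration only. The table ID is an assumption that isn't stated in this section, so confirm it in the BigQuery console before you rely on it. The helper names build_preview_query and preview_rows are hypothetical.

```python
# Assumption: the penguins data lives in this public table. Verify the ID
# before use; it is not given in this section of the tutorial.
PENGUINS_TABLE = "bigquery-public-data.ml_datasets.penguins"


def build_preview_query(table_id, limit=5):
    """Return a SQL string that selects up to `limit` rows from a table."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return f"SELECT * FROM `{table_id}` LIMIT {int(limit)}"


def preview_rows(client, table_id=PENGUINS_TABLE, limit=5):
    """Run the preview query with a bigquery.Client and return rows as dicts."""
    query_job = client.query(build_preview_query(table_id, limit))
    return [dict(row.items()) for row in query_job.result()]


print(build_preview_query(PENGUINS_TABLE, 3))
```

In the notebook, calling preview_rows(bq_client) would return a short list of dictionaries, one per row, provided that the assumed table ID is correct and your project is allowed to run BigQuery queries.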