Tutorial: Spawning notebook servers on Google Kubernetes Engine

This tutorial shows how to use Google Kubernetes Engine (GKE) to install a Jupyter notebook platform that uses JupyterHub as an orchestrator for notebook servers. The document is intended for administrators who build Jupyter notebook environments for data science research. It assumes that you are familiar with Jupyter notebooks and with GKE.

This article is the third part of a series that discusses how to choose a notebook platform on Google Cloud, how to customize the JupyterHub open source code, and how to run the modified version on one of the Google Cloud infrastructure options.

The series includes the following documents:

This tutorial is relevant if the following apply:

  • Notebooks (Vertex AI and AI Platform) does not meet your needs.
  • You are an administrator in your company.
  • You can create a new GKE cluster or use an existing one.
  • You want to consolidate costs for notebook servers.
  • You need to centrally manage configurations for your end users (data scientists).

In this tutorial, you deploy a simple version of the example architecture that's described in the introduction to this series:

Architecture of the solution created in this tutorial.

In this architecture:

  1. An administrator creates Docker images for GKE Hub and for notebook servers and stores the images in Container Registry.
  2. An administrator creates or reuses a GKE cluster to deploy GKE Hub using the following:
    • A Docker image for JupyterHub. The JupyterHub image contains a custom authenticator and a modified version of KubeSpawner.
    • A Docker image for the Inverting Proxy agent. This agent communicates with a Google-managed Inverting Proxy server to manage authentication.
  3. An administrator requests an Inverting Proxy URL to share with data scientists.
  4. A data scientist uses the Inverting Proxy URL to sign in to the JupyterHub UI and then creates one or more sandboxed notebook environments. To create an environment, the data scientist chooses one of the notebook server Docker images that an administrator provides.

Objectives

  • Launch a GKE cluster.
  • Deploy JupyterHub on GKE.
  • Let users authenticate transparently to JupyterHub by using their Google Cloud credentials.
  • Create a notebook server image.
  • Spawn multiple notebooks for the same user.

Costs

This tutorial uses the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.

  3. Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.

  4. In the Cloud Console, activate Cloud Shell.

    At the bottom of the Cloud Console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Cloud SDK already installed, including the gcloud command-line tool, and with values already set for your current project. It can take a few seconds for the session to initialize.

Setting up your environment

  1. In Cloud Shell, export your Google Cloud project ID to an environment variable, replacing YOUR_PROJECT_ID with the ID of your project:

    export PROJECT_ID=YOUR_PROJECT_ID
    
  2. Set your default Google Cloud project ID in Cloud Shell:

    gcloud config set project ${PROJECT_ID}
    
  3. Enable the APIs that you need for this tutorial:

    gcloud services enable \
        compute.googleapis.com \
        container.googleapis.com \
        cloudbuild.googleapis.com \
        containerregistry.googleapis.com
    
  4. Clone the Notebooks Extended repository:

    git clone https://github.com/GoogleCloudPlatform/ai-notebooks-extended.git
    
  5. Go to the folder that contains the manual deployment scripts for the gke-hub example:

    cd ai-notebooks-extended/gke-hub-example/deploy/manually
    
  6. Install the kubectl, Minikube, and Kustomize tools:

    bash 00-install-tools.sh
    

    You use these utilities in the following ways:

    • kubectl helps you interact with your Kubernetes environment (locally or in the cloud) using commands that facilitate automation through scripts.
    • minikube lets you run a local Kubernetes environment in order to test and debug your Kubernetes environment before you deploy it in the cloud.
    • kustomize lets you customize Kubernetes declarative files in order to facilitate the deployment of the same Kubernetes environment in different contexts.
  7. Optionally, edit the 10-set-variables.sh file and change the default values to customize the deployment.

    You can change any of the following values:

    • CLUSTER_NAME: the name of your GKE cluster. The default is kstd.
    • IMAGES_JUPYTER: a list of Docker images for notebook servers. For each notebook image that you create, append the image name to this list. Each item in the list acts as the image name locally or as a Container Registry image name for GKE. The default is a list that contains one notebook server for test purposes.
    • IMAGES_THIRD_PARTY: a list of existing image URIs from the registry of your choice. The default is an empty list.
    • WID: a flag that indicates whether to create and use a cluster that has Workload Identity enabled. The default is false, which creates a cluster without Workload Identity.
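
Before you continue, it can help to confirm that the tools from step 6 are available in your shell. The following is a minimal sketch and is not part of the repository: require_tools is a hypothetical helper, and the sketch assumes that the install script exposes the binaries on your PATH under the names kubectl, minikube, and kustomize.

```shell
# Hypothetical helper (not part of the repository): report which of the
# given command names can be found on PATH.
# Returns 0 when every tool is found, and 1 if at least one is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Assumption: the install script provides binaries with these names.
require_tools kubectl minikube kustomize \
  || echo "Run 00-install-tools.sh again or check your PATH."
```

If a tool is reported as missing, rerun the install script before you create the cluster, because the later scripts rely on these tools.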

Preparing the infrastructure

  1. Create a GKE cluster and update the kubectl configuration file with the credentials and endpoint information that the kubectl command needs:

    bash 20-create-infrastructure.sh
    
  2. Create a Jupyter notebook Docker image that JupyterHub can spawn on a GKE cluster:

    bash 15-create-jupyter-image.sh \
        gke \
        jupyter-mine-basic \
        gcr.io/${PROJECT_ID}/jupyter-mine-basic
    

    As an example, this command builds the existing jupyter-mine-basic Docker configuration and pushes the resulting image to Container Registry.

  3. Run the script from the preceding step for each notebook server image that you want to use, replacing jupyter-mine-basic with the name of the image you're using.
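
If you maintain several notebook server images, repeating the command by hand can be error prone. The following sketch loops over a list of image names and reuses the same three arguments shown in the preceding step. The build_images helper, the DRY_RUN switch, and the jupyter-team-gpu image name are hypothetical and only illustrate the pattern. By default the sketch prints the commands instead of running them.

```shell
# Assumption: PROJECT_ID was exported earlier; fall back to the placeholder.
PROJECT_ID="${PROJECT_ID:-YOUR_PROJECT_ID}"
# DRY_RUN=1 (the default here) only prints the commands.
# Set DRY_RUN=0 to run them from the folder that contains the script.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: call the image script once per image name.
build_images() {
  for name in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "bash 15-create-jupyter-image.sh gke ${name} gcr.io/${PROJECT_ID}/${name}"
    else
      bash 15-create-jupyter-image.sh gke "${name}" "gcr.io/${PROJECT_ID}/${name}"
    fi
  done
}

# jupyter-team-gpu is a made-up image name used only for illustration.
build_images jupyter-mine-basic jupyter-team-gpu
```

Each image name that you build this way also needs to appear in the IMAGES_JUPYTER list in 10-set-variables.sh.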

For more information about notebook configurations and profiles, see the profile_list parameter in the KubeSpawner documentation.

Deploying GKE Hub

  1. Build the JupyterHub and Inverting Proxy agent images, push them to Container Registry, and then deploy them on the GKE cluster:

    bash 30-deploy-gke-workloads.sh gke true
    

    The true flag forces the script to build and push the Docker images to Container Registry. This is required the first time that you run the script and whenever you want to overwrite your existing images.

    Because the script builds the images when the flag is set to true, this step can take several minutes.

  2. Get the URL of the Inverting Proxy:

    bash 40-get-hub-url.sh
    

    The command returns a URL that's similar to the following:

    1a2b3cde4fgh5i6j-dot-us-west1.notebooks.googleusercontent.com
    

    This URL lets an authenticated user access the JupyterHub interface.

  3. Copy the URL that you got from the previous command.

  4. In a web browser, enter the URL that you copied.

  5. In the Jupyter Options list, click the first Jupyter notebook server option.

    List of Jupyter notebook server options.

  6. Click Start.

  7. Wait for the browser to redirect to Jupyter. When the redirect is done, you see a listing like the following:

    JupyterHub displayed in browser.
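
The script in step 2 returns a hostname rather than a full URL. If you plan to share the address in a script or a message, the following sketch checks that a value looks like an Inverting Proxy hostname and adds the scheme. The to_hub_url helper is hypothetical, the pattern is inferred only from the sample output above, and the sketch assumes that the proxy endpoint is served over HTTPS.

```shell
# Hypothetical helper: turn the hostname printed by 40-get-hub-url.sh into a
# full URL, after checking that it looks like an Inverting Proxy hostname.
# Assumption: the proxy endpoint is served over HTTPS.
to_hub_url() {
  host="${1#https://}"   # tolerate input that already carries the scheme
  case "$host" in
    *-dot-*.notebooks.googleusercontent.com)
      echo "https://${host}"
      ;;
    *)
      echo "unexpected hostname: ${host}" >&2
      return 1
      ;;
  esac
}

# Sample hostname copied from the example output in this tutorial.
to_hub_url "1a2b3cde4fgh5i6j-dot-us-west1.notebooks.googleusercontent.com"
```

If the script prints only the hostname, you could pass its output directly, for example to_hub_url "$(bash 40-get-hub-url.sh)". Check the script output first, because this sketch does not handle extra log lines.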

Provide GKE Hub access to your users

  1. Ensure that all users of the notebook deployment have the Service Account User role (roles/iam.serviceAccountUser) on the service account of the agent workload.
  2. Share the URL with the group of data scientists that you created the hub instance for.

    Users can then create personal notebook servers based on the configurations that you provided.
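
One way to grant the role in step 1 is with the gcloud iam service-accounts add-iam-policy-binding command. The following sketch prints one such command per user so that you can review the commands before you run them. The grant_sa_user helper, the DRY_RUN switch, the service account email, and the user addresses are all hypothetical placeholders. Replace SA_EMAIL with the service account that your agent workload actually uses.

```shell
# Assumption: SA_EMAIL is the service account of the agent workload.
# The default below is a placeholder, not a value from the repository.
SA_EMAIL="${SA_EMAIL:-agent-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com}"
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: grant the Service Account User role to each user.
grant_sa_user() {
  for user in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "gcloud iam service-accounts add-iam-policy-binding ${SA_EMAIL} --member=user:${user} --role=roles/iam.serviceAccountUser"
    else
      gcloud iam service-accounts add-iam-policy-binding "${SA_EMAIL}" \
        --member="user:${user}" \
        --role="roles/iam.serviceAccountUser"
    fi
  done
}

# The addresses below are examples only.
grant_sa_user alice@example.com bob@example.com
```

If your data scientists belong to a Google group, a single binding with a group: member instead of one user: member per person is usually easier to maintain.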

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, you can delete the resources you've created. You can either delete the resources individually, or you can delete the Google Cloud project, which deletes the resources in that project.

Delete individual resources

  1. Delete the Kubernetes workloads:

    bash 93-delete-gke-workloads.sh
    
  2. Delete the Kubernetes cluster:

    source 10-set-variables.sh
    gcloud container clusters delete ${CLUSTER_NAME} \
      --project ${PROJECT_ID} \
      --zone ${ZONE}
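
The delete command depends on three variables. If you run it in a new Cloud Shell session, some of them might be empty. The following sketch checks the variables before it prints the command. The require_vars helper is hypothetical, and the sketch assumes that 10-set-variables.sh defines CLUSTER_NAME and ZONE, as the preceding step implies.

```shell
# Hypothetical helper: verify that each named variable is set and non-empty.
# Returns 0 when all are set, and 1 otherwise.
require_vars() {
  failed=0
  for var in "$@"; do
    # Indirect lookup: copy the value of the variable named by $var.
    eval "value=\${${var}:-}"
    if [ -z "$value" ]; then
      echo "not set: $var" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Print the delete command only when every variable that it uses is available.
if require_vars CLUSTER_NAME PROJECT_ID ZONE; then
  echo "gcloud container clusters delete ${CLUSTER_NAME} --project ${PROJECT_ID} --zone ${ZONE}"
else
  echo "Run 'source 10-set-variables.sh' and export PROJECT_ID first."
fi
```

Review the printed command and confirm that it names the cluster that you intend to remove before you run it.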
    

Delete the project

  1. In the Cloud Console, go to the Manage resources page.

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next

  • Explore reference architectures, diagrams, tutorials, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.