Using TensorFlow and JupyterHub in Classrooms

This tutorial describes how to enable JupyterHub to manage multiple TensorFlow instances on Google Container Engine.

In this example, a group of university art students are experimenting with DeepDream algorithms to render digital artwork using machine intelligence. Using Container Engine, Google Cloud Platform authentication, and Google Cloud Shell, you will configure a TensorFlow environment for each student using JupyterHub.

TensorFlow is a machine learning framework that Google open-sourced in 2015. One of the easiest ways to get started with TensorFlow is to use the instructions in the TensorFlow documentation to run the Docker container. The Docker image comes bundled with a Jupyter notebook to enable you to get machine learning experiments up and running quickly.

Jupyter is a popular tool for data scientists to create and run experiments using Python and other runtimes. Jupyter exposes a directory of notebooks, which are individual files containing a mashup of executable code, embedded charts, images, and text, all running within a wiki-style editor. The Google Datalab Python Package is a Jupyter extension that enables rich integration with Google Cloud Platform's big data tools, including Google BigQuery.

Jupyter’s rich online notebook experience, coupled with these easily accessible Docker images, makes it a natural fit for team projects, corporate training sessions, and university classrooms. However, it can be difficult to provision and manage multiple instances. In most scenarios, provisioning a Jupyter instance for each student is necessary because the scripts running from a single notebook can require a significant amount of CPU and memory. The complex nature of the TensorFlow runtime can further increase the need for dedicated compute resources.

The makers of Jupyter have built an additional tool, JupyterHub, that enables management of multiple Jupyter environments. This system manages the lifecycle of new instances of Jupyter for individual users and provides a common access gateway with secure authentication. Enabling JupyterHub will allow you to more easily manage users, automatically create instances, and more quickly provision the compute resources needed to power it all. This tutorial will walk you through the process of deploying everything on Container Engine by using an easy-to-use automation script found on GitHub.

Container Engine is a managed service built on Kubernetes, Google’s own open source container orchestration system. Container Engine enables you to quickly provision Docker containers running the necessary JupyterHub components as well as the secure SSL proxy. When new users log in to the JupyterHub system, Kubernetes will be directed to automatically create new Jupyter containers specifically for them.

Architectural Diagram

The following diagram shows how the Jupyter containers on Container Engine interact with Datalab and TensorFlow.

[Figure: architectural diagram]

Objectives

  • Configure TensorFlow instances with Jupyter on Container Engine.
  • Manage users in JupyterHub.
  • Generate original artwork through DeepDream algorithms.

Costs

This tutorial uses billable components of Cloud Platform, including:

  • Google Container Engine

Use the Pricing Calculator to generate a cost estimate based on your projected usage. New Cloud Platform users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Manage resources page

  3. Enable billing for your project.

    Enable billing

  4. Enable the Google Compute Engine API for the project you selected or created in the previous step.

    Enable the Compute Engine API

Cloning the sample code through Cloud Shell

The gke-jupyter-classroom GitHub repository contains all the code and configuration. To run all the initialization commands, you will use Cloud Shell, which is a fully functioning Linux shell accessible directly from within the Google Cloud Platform Console.

  1. Launch the Cloud Shell.

  2. Set your default compute zone. In the Cloud Shell, enter the following command:

    gcloud config set compute/zone us-east1-d

  3. Clone the lab repository:

    git clone https://github.com/GoogleCloudPlatform/gke-jupyter-classroom

  4. Enter the new project directory:

    cd gke-jupyter-classroom
    

Launching the file server

Because Docker containers are ephemeral, their local storage disappears if they crash or restart. To avoid the possibility of losing your Jupyter notebooks and JupyterHub configuration files, use Google Cloud Launcher to provision a Single Node File Server, which is a remote network file system (NFS) installed on a single Compute Engine virtual machine instance. After you deploy your file server, you can use the server as a storage target for your JupyterHub deployment.

  1. Use the Cloud Platform Console to provision your file server.

    Provision your file server

  2. Choose the Cloud Platform project in which you’d like to launch the file server.

  3. Choose the following parameters for the dimensions of your file server:

    Dimension             Parameter/Instruction
    Deployment Name       jupyterhub-filer
    Zone                  us-east1-d
    Machine type          n1-standard-1
    Storage Name          data
    Enable SMB Sharing    Uncheck the checkbox
    Storage disk size     10 GB

    For this tutorial, you can choose a 10 GB disk, because you don't need high throughput and IOPS unless you are using this configuration for a big data project.

  4. Click Deploy and wait for the Cloud Platform Console to indicate that it's running.

  5. Take note of the Internal IP address of your new machine. You need this value in an upcoming step. You can find it by running this command in Cloud Shell:

    gcloud compute instances list
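
If you prefer to extract the internal IP programmatically rather than reading it from the table, one option is to request JSON output with `gcloud compute instances list --format=json` and read the `networkInterfaces[0].networkIP` field of each instance. The following is a minimal sketch under that assumption. The instance name `jupyterhub-filer-vm` and the address in the sample are hypothetical; your deployment may name the machine differently.

```python
import json


def internal_ips(instances_json: str) -> dict:
    """Map instance name to internal IP, given the JSON text printed by
    `gcloud compute instances list --format=json`."""
    result = {}
    for instance in json.loads(instances_json):
        interfaces = instance.get("networkInterfaces", [])
        if interfaces:
            # The first network interface carries the primary internal address.
            result[instance["name"]] = interfaces[0].get("networkIP")
    return result


# Hypothetical sample; real output contains many more fields per instance.
sample = json.dumps([
    {"name": "jupyterhub-filer-vm",
     "networkInterfaces": [{"networkIP": "10.142.0.2"}]},
])

filer_ips = internal_ips(sample)
print(filer_ips)
```

In Cloud Shell you could feed the function real data by capturing the command output, for example with `subprocess.run([...], capture_output=True, text=True, check=True).stdout`.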
    

Reviewing the script and configurations

Navigate to the directory containing the cloned GitHub repository and list the files. Note the following files and directories:

  • gke-jupyter-classroom.sh deploys the configuration and offers several options to customize your deployment.
  • The proxy directory holds both a generic Nginx container definition and a Kubernetes manifest. The Kubernetes manifest loads the nginx.conf file from a Kubernetes ConfigMap volume. The docker-entrypoint.sh file substitutes any environment variables it finds in nginx.conf.
  • The jupyterhub/custom_manifests directory contains several example JSON files that the JupyterHub container can load and present to the user, allowing them to choose a specific Jupyter instance that you can define. The example files use Python-style parameter substitution to insert JupyterHub variables into the manifest before it gets sent to Kubernetes.
  • The jupyter directory contains two container definitions. One is the basic TensorFlow Jupyter image extended to include the DeepDream example notebook. The other is the TensorFlow Jupyter image extended to include Cloud Datalab Python Notebook Extension, the Python DataFlow SDK, and several other data science libraries.
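
To illustrate the parameter substitution mentioned for the custom manifests, here is a minimal sketch that assumes `%(name)s`-style placeholders. The template, the placeholder names `username` and `image`, and the image name are all hypothetical; the files in jupyterhub/custom_manifests may use different fields and a different placeholder syntax.

```python
import json

# Hypothetical manifest template. The placeholder names are illustrative,
# not the ones used by the files in jupyterhub/custom_manifests.
MANIFEST_TEMPLATE = """
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {"name": "jupyter-%(username)s"},
  "spec": {
    "containers": [
      {"name": "notebook", "image": "%(image)s"}
    ]
  }
}
"""


def render_manifest(template: str, values: dict) -> dict:
    """Fill the named placeholders, then parse the result as JSON."""
    rendered = template % values
    return json.loads(rendered)


pod = render_manifest(
    MANIFEST_TEMPLATE,
    {"username": "student1", "image": "example/tensorflow-jupyter:latest"},
)
print(pod["metadata"]["name"])
```

Parsing the rendered text with `json.loads` is a cheap way to catch a substitution that produced invalid JSON before the manifest is sent anywhere.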

Deploying JupyterHub

In this section, you use the files you cloned from GitHub to create and deploy all of the necessary configurations. The gke-jupyter-classroom.sh script simplifies the process of deploying everything, but you should take some time to review it. The script breaks down each step into a function to create the JupyterHub pod, the Nginx proxy pod, as well as the configmaps, secrets, services, SSL certificates, firewall rules, and ingress load balancer. After successfully executing, the script prints out your new domain name to open in your browser. The DNS service the script uses is called xip.io, which is a free service that cleverly uses your IP address as part of the domain name to automatically direct the DNS request back to that IP.
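
The xip.io naming scheme can be sketched in a few lines. This example assumes the `jhub.<ip>.xip.io` host pattern and the `/hub/oauth_callback` path that the script output in this tutorial shows; the function name `xip_urls` is hypothetical and is not part of the deployment script.

```python
import ipaddress


def xip_urls(static_ip: str, prefix: str = "jhub") -> dict:
    """Build the site origin and OAuth callback URL for an xip.io host name."""
    # Raises ValueError if static_ip is not a valid IPv4 address.
    ipaddress.IPv4Address(static_ip)
    origin = f"https://{prefix}.{static_ip}.xip.io"
    return {"origin": origin, "callback": f"{origin}/hub/oauth_callback"}


urls = xip_urls("10.12.34.56")
print(urls["origin"])
print(urls["callback"])
```

Because the IP address is embedded in the host name, no DNS records need to be created for the demonstration deployment.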

To deploy JupyterHub:

  1. Starting from Cloud Shell, navigate to the cloned Git repository directory gke-jupyter-classroom and execute the following command to show all the options for the script:

    ./gke-jupyter-classroom.sh -h
    

    The console prints out all of the actions available in the script, including deploying the solution, executing a dry run, tearing down the entire solution, and building the Docker images that accompany the solution.

  2. Next, execute the deploy action to create the entire configuration:

    ./gke-jupyter-classroom.sh -v --cluster-name jupytercluster1 --admin-user <your-gmail-address> --filer-ip <your filer ip> --zone <defaults to us-east1-d> --autoscale-nodes 6 deploy
    
  3. Wait for the script to prompt you to create OAuth credentials:

    #User Action Required: Please follow the instructions to create a Web Application OAuth configuration
    #: use the Cloud Console to access the API Manager credentials section
    #: https://console.cloud.google.com/apis/credentials
    #: use these values origins field:
    https://jhub.10.12.34.56.xip.io callback url:
    https://jhub.10.12.34.56.xip.io/hub/oauth_callback
    

  4. Go to the Create Client ID page in the Cloud Platform Console.

    Go to the Create Client ID page

  5. Set up your OAuth client ID by registering your new site as an app that accepts Google sign-in.

    1. Under Application type, choose Web application.
    2. Under Name, enter a client ID name.
    3. Under Restrictions, define your restrictions as follows:
      1. Set Authorized JavaScript origins to your domain name.
      2. Set Authorized redirect URIs to the callback URL from the console output.
    4. Click Create.
    5. Copy the client ID and enter it into the Cloud Shell terminal.
    6. Copy the client secret and enter it into the Cloud Shell terminal.
  6. Wait for the script to finish generating the SSL certificate files, which could take quite a while depending on available entropy in the Cloud Shell instance. The certificate will not be signed by a certificate authority, and therefore will cause your browser to issue a warning when first visiting the site. It is safe to ignore the warning for this demonstration. To learn about using trusted certificates, review the Production deployment considerations section below.

    When the script finishes, you’ll see a message with the IP and the URL of your new JupyterHub site. Visit the URL in your browser.

    -----------COMPLETED------------
    Static IP created: 10.12.34.56 using xip.io : https://jhub.10.12.34.56.xip.io
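
If you deploy more than one cluster, for example one per class, it can help to assemble the deploy command from step 2 programmatically. The sketch below uses only the flags shown in that step. The function name `deploy_command`, the admin address, and the file server IP are hypothetical values for illustration.

```python
import shlex


def deploy_command(admin_user, filer_ip, cluster_name="jupytercluster1",
                   zone="us-east1-d", autoscale_nodes=6):
    """Return the deploy command line as a single shell-safe string."""
    args = [
        "./gke-jupyter-classroom.sh", "-v",
        "--cluster-name", cluster_name,
        "--admin-user", admin_user,
        "--filer-ip", filer_ip,
        "--zone", zone,
        "--autoscale-nodes", str(autoscale_nodes),
        "deploy",
    ]
    # shlex.join quotes any argument that contains shell-special characters.
    return shlex.join(args)


command = deploy_command("teacher@example.com", "10.142.0.2")
print(command)
```

The script still prompts interactively for the OAuth client ID and secret, so the generated command is meant to be pasted into Cloud Shell rather than run unattended.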
    

Signing in to JupyterHub

Congratulations, you’ve made it past the hard part. With the new JupyterHub site open in your browser, sign in and start a server:

  1. Click Sign in with Google to sign in to the JupyterHub site.
  2. Click Start My Server.
  3. Choose a Jupyter image:
    1. The default image contains TensorFlow and the DeepDream example notebook.
    2. The GCP-Tools image allows you to run the Datalab and Dataflow SDK alongside TensorFlow.

Managing users

In JupyterHub, on the upper right, click Control Panel. The Control Panel allows you to create users, stop and start their instances, or delete them and their instances.

Verifying the system

  1. Test that the system allows you to upload a new file by clicking Upload in the upper-right part of the screen. Select a fun or interesting color JPG photo from your computer. After you open the photo from the file dialog, the file appears in the file list.
  2. In the file list, click the file name and rename it to input.jpg, and then click Upload to the right of the new file. To test the Jupyter and TensorFlow setup, select the deepdream.ipynb notebook to launch it.

    Jupyter file list

    Scroll down to the bottom of the notebook and change

    img0 = PIL.Image.open('pilatus800.jpg')
    

    to

    img0 = PIL.Image.open('input.jpg')
    
  3. To run the entire notebook, open the Cell menu at the top, and click Run All.

  4. After the notebook finishes running, you will see a couple of different variations of your input image at the bottom of the notebook.
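
The steps above call for a JPG photo named input.jpg. If you want to confirm that an uploaded file really contains JPEG data regardless of its name, a standard-library check of the leading bytes is enough. This is a minimal sketch; the function names are hypothetical, and the check only looks at the file signature, not at whether the image is in color.

```python
# JPEG data begins with the start-of-image marker FF D8, followed by
# the first byte (FF) of the next marker.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the bytes start with the JPEG signature."""
    return data.startswith(JPEG_MAGIC)


def is_jpeg_file(path: str) -> bool:
    """Read only the first few bytes of a file and test the signature."""
    with open(path, "rb") as handle:
        return looks_like_jpeg(handle.read(len(JPEG_MAGIC)))


print(looks_like_jpeg(b"\xff\xd8\xff\xe0" + b"\x00" * 16))
print(looks_like_jpeg(b"\x89PNG\r\n\x1a\n"))
```

In a notebook cell you could call `is_jpeg_file('input.jpg')` before running the rest of the notebook.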

Congratulations, you have successfully used machine intelligence to create a unique piece of artwork on your Container Engine cluster.

Production deployment considerations

If you want to allocate more memory to a Jupyter instance or use your own domain or certificates, you need to adjust certain values before deploying the script.

Adding resources

For some projects, the default amount of memory and CPU allocated to a single user’s Jupyter instance will be too low. To adjust these values, you can open jupyterhub.yaml.tmp and edit the values for KUBESPAWN_CPU_LIMIT and KUBESPAWN_MEMORY_LIMIT prior to running the script. If you have already created your environment, you can modify the same limit parameters in the jupyterhub.yaml file that gets generated in the base directory in your Cloud Shell instance, and then run kubectl delete -f jupyterhub.yaml and kubectl apply -f jupyterhub.yaml. In either case, changing these values will recreate the JupyterHub instance with new CPU and memory limit settings. If a user’s instance has already been created, you can delete the user in the JupyterHub control panel, and then create the user again and have them re-authenticate.
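
When choosing new limits, it helps to estimate how many Jupyter instances a cluster could hold if every instance used its full allocation. The arithmetic below is a rough sketch: the node size (4 vCPUs, 15 GB) and the limits (1 CPU, 2 GB) are hypothetical, and the estimate ignores the capacity that Kubernetes system components and the JupyterHub and proxy pods consume.

```python
def users_per_node(node_cpus, node_memory_gb, cpu_limit, memory_limit_gb):
    """Instances that fit on one node if each uses its full CPU and memory limit."""
    if cpu_limit <= 0 or memory_limit_gb <= 0:
        raise ValueError("limits must be positive")
    by_cpu = int(node_cpus // cpu_limit)
    by_memory = int(node_memory_gb // memory_limit_gb)
    # The scarcer resource determines how many instances fit.
    return min(by_cpu, by_memory)


def max_users(nodes, node_cpus, node_memory_gb, cpu_limit, memory_limit_gb):
    """Upper bound on simultaneous instances across the whole cluster."""
    return nodes * users_per_node(node_cpus, node_memory_gb,
                                  cpu_limit, memory_limit_gb)


# Hypothetical sizing: 6 nodes, matching the --autoscale-nodes value used earlier.
per_node = users_per_node(node_cpus=4, node_memory_gb=15,
                          cpu_limit=1, memory_limit_gb=2)
total = max_users(6, 4, 15, 1, 2)
print(per_node, total)
```

In this example CPU is the constraint (4 instances per node by CPU versus 7 by memory), so raising only the memory limit to 3 GB would not reduce the number of students the cluster can serve.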

Using your own domain

If you have your own public DNS server, you can use your own domain instead of xip.io. Follow the OAuth credential creation steps described in the Deploying JupyterHub section, specifying your domain name. After you've created your credentials, execute the deployment script with the --domain flag to specify your top level domain. The script will use jhub.<yourdomain>.com to configure the site and create a public, ephemeral IP address that you can manually change to a static IP in the Cloud Platform Console. Register a new A record in your DNS server for jhub.<yourdomain>.com that points to that IP address.
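
After you register the DNS record, you may want to confirm that the host name resolves to the expected address before sharing the URL with students. The following sketch wraps `socket.gethostbyname` so the lookup can be replaced with a stub. The host name `jhub.example.com`, the address `203.0.113.10`, and the helper names are hypothetical.

```python
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return True if hostname resolves to expected_ip, False otherwise."""
    try:
        return resolver(hostname) == expected_ip
    except socket.gaierror:
        # The name does not resolve (yet), for example while DNS propagates.
        return False


def fake_resolver(hostname):
    """Stand-in for a real DNS lookup so the example runs offline."""
    records = {"jhub.example.com": "203.0.113.10"}
    if hostname not in records:
        raise socket.gaierror("unknown host")
    return records[hostname]


print(resolves_to("jhub.example.com", "203.0.113.10", resolver=fake_resolver))
print(resolves_to("missing.example.com", "203.0.113.10", resolver=fake_resolver))
```

To query real DNS, omit the `resolver` argument and pass your own host name and IP address.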

Using Google Cloud SSL proxy

Cloud Platform provides a native Cloud SSL proxy solution that you can use in place of the third-party Nginx proxy. This feature is currently in beta. To use it, set the --use-ssl-proxy flag when you run the deployment script.

Using your own certificates

If you want to use trusted certificates, you can obtain them from a certificate authority such as Let’s Encrypt. If you already have a certificate, such as a wildcard (*) certificate, follow these steps with the deployment script:

  1. Before running the script, uncomment this line in the proxy/nginx.conf file:

    #ssl_trusted_certificate /mnt/secure/trusted.crt
    
  2. Copy your certificate files to the following paths on the Cloud Shell instance:

    /tmp/tls.crt
    /tmp/tls.key
    /tmp/dhparam.pem
    /tmp/trusted.crt
    

  3. Execute the deployment script with these two additional flags:

    --skip-certs --signed-cert
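
Before running the script with these flags, you could check that all four certificate files from step 2 are in place. This is a minimal sketch that assumes the file names listed above. The function name `missing_cert_files` is hypothetical, and the demonstration uses a scratch directory with placeholder files so that it does not depend on the contents of /tmp.

```python
import tempfile
from pathlib import Path

# File names taken from step 2 of this section.
REQUIRED_CERT_FILES = ("tls.crt", "tls.key", "dhparam.pem", "trusted.crt")


def missing_cert_files(directory="/tmp"):
    """Return the required certificate files that are absent from directory."""
    base = Path(directory)
    return [name for name in REQUIRED_CERT_FILES
            if not (base / name).is_file()]


# Demonstration: create only two of the four files in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    for name in ("tls.crt", "tls.key"):
        (Path(scratch) / name).write_text("placeholder")
    still_missing = missing_cert_files(scratch)

print(still_missing)
```

In Cloud Shell, calling `missing_cert_files()` with no argument checks /tmp directly; an empty list means all four files are present.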

Cleaning up

To avoid incurring charges to your Cloud Platform account for the resources used in this tutorial, run the same script with the following options:

    ./gke-jupyter-classroom.sh --cluster-name jupytercluster1 teardown

The script prints out commands that you can run to manually delete the Container Engine cluster. You should also delete your file server.
