A Deep Learning (DL) container is a Docker container with preinstalled data science frameworks, libraries, and tools. A user, such as a data scientist, chooses a single DL container and deploys it. The deployed container has a performance-optimized, consistent environment that helps with quickly prototyping and implementing workflows.
Deploy a DL container
Before using a DL container, you must choose and deploy a container image that runs your Machine Learning (ML) task. The container you deploy comes provisioned with preinstalled frameworks, libraries, and tools.
Deploy a DL container using kubeconfig files
Google Distributed Cloud (GDC) air-gapped appliance provides the following kubeconfig files for deploying your DL container:
- `ADMIN_CLUSTER_KUBECONFIG`: This file connects to an admin cluster. Each organization has one admin cluster, which hosts main components such as the GDC console and the Harbor registry. Each organization also has multiple user clusters, all of which the admin cluster controls. The `ADMIN_CLUSTER_KUBECONFIG_PATH` is the file path to the admin cluster kubeconfig file. For example, the path might be `~/home/org-admin-kubeconfig`.
- `USER_CLUSTER_KUBECONFIG`: This file connects to a user cluster, which is the cluster to which you deploy your chosen DL container. The `USER_CLUSTER_KUBECONFIG_PATH` is the file path to the user cluster kubeconfig file. For example, the path might be `~/home/user-vm-1-kubeconfig`.
For more information about signing in to the UI and using the `kubectl` tool, see Sign in. To retrieve the `ADMIN_CLUSTER_KUBECONFIG` and `USER_CLUSTER_KUBECONFIG` files, see Get a kubeconfig file.
Download the sample Machine Learning (ML) script and dataset
Download the sample ML script, `beginner.ipynb`, and dataset, `mnist.npz`, to run the ML quickstart tutorial. The tutorial demonstrates how to deploy and use a DL container to run ML experiments.
mkdir -p /tmp/datasets
cd /tmp/datasets
wget --no-check-certificate https://GDCH_Appliance_URL/.well-known/static/dl-container-tutorial/beginner.ipynb
wget --no-check-certificate https://GDCH_Appliance_URL/.well-known/static/dl-container-tutorial/mnist.npz
Replace GDCH_Appliance_URL with the domain name used to access GDC. When opening any URL for the first time, GDC redirects you to your identity provider's login page.
Look up the IP address of the Harbor registry
Before using the sample script and dataset, you must find the DL container image location in the Harbor registry. The Harbor registry is a service that stores private container images.
The first line of sample code sets the `KUBECONFIG` environment variable to the admin cluster kubeconfig path. In the second line, the `kubectl` tool uses the `KUBECONFIG` environment variable to retrieve the Harbor registry address, which provides access to the list of available container images. In the third line, the shell parameter expansion `${REGISTRY_URL#https://}` removes the `https://` prefix from the URL and stores the Harbor registry domain in the `REGISTRY_IP` environment variable. In the last line, the `kubectl` tool fetches the password for the `admin` user.
export KUBECONFIG=ADMIN_CLUSTER_KUBECONFIG_PATH
REGISTRY_URL=$(kubectl get harborcluster harbor -n harbor-system -o=jsonpath='{.spec.externalURL}')
REGISTRY_IP=${REGISTRY_URL#https://}
ADMIN_PASS=$(kubectl -n harbor-system get secret harbor-admin -o jsonpath="{.data.secret}" | base64 -d)
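The prefix removal and the secret decoding rely on two standard shell idioms. The following is a minimal, self-contained illustration with example values only; the real values come from the `kubectl` commands above:

```shell
# Illustration only: the URL and secret below are example values, not
# real GDC output.
REGISTRY_URL="https://registry.example.com"

# ${VAR#pattern} strips the shortest match of pattern from the front of VAR.
REGISTRY_IP=${REGISTRY_URL#https://}
echo "${REGISTRY_IP}"    # registry.example.com

# Kubernetes secrets are stored base64-encoded; base64 -d recovers the
# plain text, which is why the last line above pipes through it.
ENCODED=$(printf 's3cret' | base64)
ADMIN_PASS=$(printf '%s' "${ENCODED}" | base64 -d)
echo "${ADMIN_PASS}"     # s3cret
```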
Choose a container image
You must choose a container image to deploy before you can run an ML task. To view the list of available container images, combine the Harbor registry domain with the image names in the following table:
| Framework | Processor | Container image name |
|---|---|---|
| Base | GPU | base-cu113 |
| Base | CPU | base-cpu |
| TensorFlow Enterprise 2.x | GPU | tf2-gpu |
| PyTorch | GPU | pytorch-gpu |
This table is organized by framework and processor. To choose a DL container image that can process your ML experiment, follow these steps:
- Identify the framework, which contains the ML tools.
- Choose the processor. You choose the processor based on the kind of ML task to run and the compute intensity of that task. For example, choose one of the GPU processors when you have a compute-intensive ML task, and allocate a GPU resource to the DL container.
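When you choose a GPU image, the pod that runs it must also request a GPU. The following is a minimal sketch of the `resources` stanza for the container entry in the pod spec, assuming the cluster advertises GPUs under the common `nvidia.com/gpu` resource name; verify the resource name that your cluster's device plugin actually exposes:

```yaml
# Hypothetical fragment for the container entry in the pod spec.
# nvidia.com/gpu is an assumption; confirm the resource name for your cluster.
resources:
  limits:
    nvidia.com/gpu: 1
```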
Create and deploy a DL container to the user cluster
To deploy the DL container, set the `KUBECONFIG` environment variable to the path of the user cluster kubeconfig file. This variable specifies the cluster to which the `kubectl` tool deploys the DL container. The `kubectl apply` command then deploys the DL container instance.
Replace CONTAINER_IMAGE_NAME with the image selected from the list of images in Choose a container image, and be sure to supply the tag in CONTAINER_IMAGE_TAG. Also replace NAMESPACE with the namespace in which to create the pod.
export KUBECONFIG=USER_CLUSTER_KUBECONFIG_PATH
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: dl-container-pod
  namespace: NAMESPACE
spec:
  containers:
  - image: gcr.io/private-cloud-staging/notebooks/deeplearning-platform-release/CONTAINER_IMAGE_NAME:CONTAINER_IMAGE_TAG
    command: ["tail", "-f", "/dev/null"]
    name: training
EOF
Use a deployed DL container
The following topics provide an example of how to use a DL container image to train and use a model to generate predictions.
Copy the tutorial files to the DL container pod
Copy the quickstart tutorial files into your DL container pod. The `beginner.ipynb` notebook contains the steps to train a model and use it to make predictions. The tutorial uses the `mnist.npz` dataset file to train the model. If you deployed the pod to a namespace other than the default, add `-n NAMESPACE` to the following `kubectl` commands.
cd /tmp/datasets
kubectl cp beginner.ipynb dl-container-pod:/tmp
kubectl cp mnist.npz dl-container-pod:/tmp
Run the ML quickstart tutorial
Run the tutorial with the following commands. The first command opens an interactive shell in the container pod. Inside the pod, change directory to `/tmp` and run the `papermill` tool that is packaged in the DL container. The `papermill` tool executes the tutorial notebook and writes an output notebook that contains the generated predictions.
Enter an interactive terminal into the DL pod:
kubectl exec -it dl-container-pod -- /bin/bash
In the DL pod context, run the following commands:
cd /tmp
papermill beginner.ipynb result.ipynb
The run generates a result.ipynb file in the /tmp directory. View its content, including the prediction outputs from the generated ML model:
cat result.ipynb
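Because a notebook is a JSON file, you can also print only the cell outputs instead of the raw JSON. The following is a hypothetical helper, not part of the tutorial, assuming `python3` is available in the container (it ships alongside `papermill` in the DL images); the function name `extract_outputs` is illustrative:

```shell
# Hypothetical helper: print only the text outputs of each notebook cell,
# instead of dumping the raw JSON with cat. Prints nothing if the file
# does not exist.
extract_outputs() {
  python3 -c '
import json, sys
try:
    nb = json.load(open(sys.argv[1]))
except FileNotFoundError:
    sys.exit(0)
for cell in nb.get("cells", []):
    for out in cell.get("outputs", []):
        # stream (stdout) outputs store their lines under "text"
        for line in out.get("text", []):
            print(line, end="")
' "$1"
}

extract_outputs result.ipynb
```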
Optional: Delete the DL container pod
After you finish running your experiment in the DL container pod, delete the pod as a best practice:
kubectl delete pod dl-container-pod