Use secondary boot disks to preload data or container images


This page shows you how to reduce workload startup latency by using secondary boot disks in Google Kubernetes Engine (GKE). With secondary boot disks, you can preload data or container images on new nodes, which lets workloads achieve fast cold starts and improves the overall utilization of provisioned resources.

Overview

Starting in version 1.28.3-gke.1067000, you can configure node pools with secondary boot disks and tell GKE to provision the nodes preloaded with data, such as a machine learning model, or with a container image. Using preloaded data or container images on a secondary boot disk has the following benefits for your workloads:

  • Faster autoscaling
  • Reduced latency when pulling large images
  • Quicker recovery from disruptions like maintenance events and system errors

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Requirements

The following requirements apply to using secondary boot disks:

  1. The feature is available in GKE version 1.28.3-gke.1067000 and later.
  2. When you modify the disk image, you must create a new node pool. You can't update the disk image on existing nodes.

  3. You must configure Image streaming to use the secondary boot disk feature.

Configure the secondary boot disk

The following sections describe how to configure the secondary boot disk:

Preload data

Before you create the GKE cluster and node pool with a secondary boot disk, we recommend that you prepare the disk image at build time, when the data is ready, ideally as an automated step in a CI/CD pipeline.

Prepare the disk image that contains the data

Create a custom disk image as the data source by completing the following steps:

  1. Create a VM with a blank disk.
  2. SSH into the VM.
    1. Mount the blank disk.
    2. Download the data onto the blank disk.
  3. Create a custom image from the disk.

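The following commands are a minimal gcloud CLI sketch of these steps; the VM name data-image-builder, the disk name data-disk, the ZONE placeholder, and the gs://YOUR_DATA_BUCKET data source are examples only, so substitute your own values. DISK_IMAGE_NAME matches the placeholder used in the next section.

    # 1. Create a temporary VM with an additional blank disk (example names and size).
    gcloud compute instances create data-image-builder \
        --zone=ZONE \
        --create-disk=name=data-disk,size=100GB,auto-delete=no

    # 2. SSH into the VM, format and mount the blank disk, and copy the data onto it.
    gcloud compute ssh data-image-builder --zone=ZONE
    sudo mkfs.ext4 -F /dev/sdb        # the blank disk is typically /dev/sdb; confirm with lsblk
    sudo mount /dev/sdb /mnt
    sudo chmod a+w /mnt               # allow copying without root
    gcloud storage cp -r gs://YOUR_DATA_BUCKET/* /mnt/
    sudo umount /mnt
    exit

    # 3. Delete the temporary VM (the data disk is kept because auto-delete=no), then
    #    create a custom image from the detached disk.
    gcloud compute instances delete data-image-builder --zone=ZONE --quiet
    gcloud compute images create DISK_IMAGE_NAME \
        --source-disk=data-disk \
        --source-disk-zone=ZONE
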
Create the GKE cluster and node pool with a secondary boot disk

You can configure a secondary boot disk by using the gcloud CLI:

  1. Create a GKE Standard cluster with image streaming enabled by using the --enable-image-streaming flag:

    gcloud container clusters create CLUSTER_NAME \
        --location=LOCATION \
        --cluster-version=CLUSTER_VERSION \
        --enable-image-streaming
    

    Replace the following:

    • CLUSTER_NAME: The name of your cluster.
    • LOCATION: The cluster location.
    • CLUSTER_VERSION: The GKE version to use, which must be 1.28.3-gke.1067000 or later.
  2. Create a node pool with a secondary boot disk by using the --secondary-boot-disk flag:

    gcloud beta container node-pools create NODE_POOL_NAME \
        --cluster=CLUSTER_NAME \
        --location=LOCATION \
        --enable-image-streaming \
        --secondary-boot-disk=disk-image=global/images/DISK_IMAGE_NAME
    

    Replace DISK_IMAGE_NAME with the name of your disk image.

    GKE creates a node pool in which each node has a secondary disk with the preloaded data. GKE attaches and mounts the secondary boot disk on the node automatically; to confirm the mount, see the verification sketch after these steps.

  3. Optionally, you can mount the secondary disk image in the Pod's containers by using a hostPath volume mount. Use the following manifest to define a Pod resource and use a hostPath volume mount to access the preloaded data disk from its containers:

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-name
    spec:
      containers:
      - ...
        volumeMounts:
        - mountPath: /usr/local/data_path_sbd
          name: data-path-sbd
      volumes:
      - name: data-path-sbd
        hostPath:
          path: /mnt/disks/gke-secondary-disks/gke-DISK_IMAGE_NAME-disk
    

    Replace DISK_IMAGE_NAME with the name of your disk image.
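
To confirm that the preloaded data is available, you can inspect the mount point on one of the new nodes. The following is a quick sketch; NODE_NAME and NODE_ZONE are placeholders for a node in the new node pool:

    # List the nodes that belong to the node pool with the secondary boot disk.
    kubectl get nodes -l cloud.google.com/gke-nodepool=NODE_POOL_NAME

    # SSH to one of those nodes and list the contents of the mounted secondary disk.
    gcloud compute ssh NODE_NAME --zone=NODE_ZONE \
        --command="ls /mnt/disks/gke-secondary-disks/gke-DISK_IMAGE_NAME-disk"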

Preload the container image

In this guide, you use gke-disk-image-builder to create a VM instance, pull the container images onto a disk, and then create a disk image from that disk. We recommend that you prepare the disk image right after the container image build step, ideally as an automated step in a CI/CD pipeline.
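
gke-disk-image-builder is a Go tool, so fetch its source before running the steps below. The following sketch assumes the tool lives in the GoogleCloudPlatform/ai-on-gke repository under tools/gke-disk-image-builder:

    # Assumed repository location of gke-disk-image-builder; adjust if the tool has moved.
    git clone https://github.com/GoogleCloudPlatform/ai-on-gke.git
    cd ai-on-gke/tools/gke-disk-image-builder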

  1. Create a Cloud Storage bucket to store the execution logs of gke-disk-image-builder.
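
    For example, a minimal sketch with the gcloud CLI; LOG_BUCKET_NAME matches the placeholder used in the next step, and BUCKET_LOCATION is a Cloud Storage location of your choice:

    gcloud storage buckets create gs://LOG_BUCKET_NAME \
        --location=BUCKET_LOCATION
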
  2. Create a disk image with preloaded container images.

    go run ./cli \
        --project-name=PROJECT_ID \
        --image-name=DISK_IMAGE_NAME \
        --zone=LOCATION \
        --gcs-path=gs://LOG_BUCKET_NAME \
        --disk-size-gb=10 \
        --container-image=docker.io/library/python:latest \
        --container-image=docker.io/library/nginx:latest
    

    Replace the following:

    • PROJECT_ID: The ID of your Google Cloud project.
    • DISK_IMAGE_NAME: The name of the disk image to create. For example, nginx-python-image.
    • LOCATION: The zone where gke-disk-image-builder creates the temporary VM and disk.
    • LOG_BUCKET_NAME: The name of the Cloud Storage bucket that stores the execution logs. For example, gke-secondary-disk-image-logs/.

  3. Create a GKE Standard cluster with image streaming enabled:

    gcloud container clusters create CLUSTER_NAME \
        --location=LOCATION \
        --cluster-version=CLUSTER_VERSION \
        --enable-image-streaming
    
  4. Create a node pool with a secondary boot disk:

    gcloud beta container node-pools create NODE_POOL_NAME \
        --cluster=CLUSTER_NAME \
        --location=LOCATION \
        --enable-image-streaming \
        --secondary-boot-disk=disk-image=global/images/DISK_IMAGE_NAME,mode=CONTAINER_IMAGE_CACHE
    
  5. Add a nodeSelector to your Pod template so that your Pods are scheduled on the node pool with the secondary boot disk (a complete example Pod manifest follows these steps):

    nodeSelector:
        cloud.google.com/gke-nodepool: NODE_POOL_NAME
    
  6. Confirm that the secondary boot disk cache is in use:

    kubectl get events --all-namespaces
    

    The output is similar to the following:

    75s   Normal   SecondaryDiskCaching   node/gke-pd-cache-demo-default-pool-75e78709-zjfm   Image gcr.io/k8s-staging-jobset/pytorch-mnist:latest is backed by secondary disk cache
    

    The expected image pull latency for the cached container image should be no more than a few seconds, regardless of image size. You can check the image pull latency by running the following command:

    kubectl describe pod POD_NAME
    

    Replace POD_NAME with the name of the Pod.

    The output is similar to the following:

    …
      Normal  Pulled     15m   kubelet            Successfully pulled image "docker.io/library/nginx:latest" in 0.879149587s
    …
    
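
To put these pieces together, the following Pod manifest is a minimal sketch that combines the nodeSelector from the earlier step with one of the images preloaded by gke-disk-image-builder; the Pod and container names are arbitrary examples:

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx-cached            # arbitrary example name
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: NODE_POOL_NAME
      containers:
      - name: nginx
        image: docker.io/library/nginx:latest   # one of the images preloaded onto the secondary boot disk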

What's next