Automatically bootstrap GKE nodes with DaemonSets


This tutorial shows how to customize the nodes of a Google Kubernetes Engine (GKE) cluster by using DaemonSets. A DaemonSet ensures that all (or selected) nodes run a copy of a Pod. This approach lets you use the same tools to orchestrate your workloads that you use to modify your GKE nodes.

If the tools and systems you use to initialize your clusters are different from the tools and systems you use to run your workloads, you increase the effort it takes to manage your environment. For example, if you use a configuration management tool to initialize the cluster nodes, you're relying on a procedure that's outside the runtime environment where the rest of your workloads run.

The goal of this tutorial is to help system administrators, system engineers, or infrastructure operators streamline the initialization of Kubernetes clusters.

For this tutorial, you need to be familiar with the following tools:

In this tutorial, you learn to use Kubernetes labels and selectors to choose which initialization procedure to run based on the labels that are applied to a node. In these steps, you deploy a DaemonSet to run only on nodes that have the default-init label applied. However, to demonstrate the flexibility of this mechanism, you could create another node pool and apply the alternative-init label to the nodes in this new pool. In the cluster, you could then deploy another DaemonSet that is configured to run only on nodes that have the alternative-init label.

You can also run multiple initialization procedures on each node, not just one. This mechanism lets you structure your initialization procedures more cleanly by separating the concerns of each one.
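
For example, to set up the alternative-init scenario, you could add a second node pool and label its nodes accordingly. The following command is a sketch rather than a step of this tutorial; it assumes the cluster name and region that are used later in this tutorial, and the pool name alternative-pool is hypothetical:

    gcloud container node-pools create alternative-pool \
        --cluster=ds-init-tutorial \
        --region=us-central1 \
        --node-labels=app=alternative-init \
        --num-nodes=1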

In this tutorial, as an example, the initialization procedure performs the following actions on each node that is labeled with the default-init label:

  1. Attaches an additional disk to the node.
  2. Installs a set of packages and libraries by using the node's operating system package manager.
  3. Loads a set of Linux kernel modules.

Objectives

In this tutorial, you do the following:

  • Provision and configure a GKE cluster.
  • Prepare a DaemonSet descriptor to initialize the nodes in the cluster.
  • Deploy the DaemonSet in the cluster.
  • Verify that the cluster nodes have been initialized.

Costs

In this document, you use the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.


Bootstrap the environment

In this section, you do the following:

  1. Enable the necessary Cloud APIs.
  2. Provision a service account with limited privileges for the nodes in the GKE cluster.
  3. Prepare the GKE cluster.
  4. Grant the user cluster administration privileges.

Enable Cloud APIs

  1. Open Cloud Shell.

    Open Cloud Shell

  2. Select the Google Cloud project:

    gcloud config set project project-id
    

    Replace project-id with the ID of the Google Cloud project that you created or selected for this tutorial.

  3. Enable the Google Kubernetes Engine API:

    gcloud services enable container.googleapis.com
    
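Optionally, verify that the API is enabled. This quick check isn't part of the tutorial's steps; it lists the service only if it's enabled in the project:

    gcloud services list --enabled --filter="name:container.googleapis.com"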

Provision a service account to manage GKE clusters

In this section, you create a service account that is associated with the nodes in the cluster. In this tutorial, GKE nodes use this service account instead of the default service account. As a best practice, grant the service account just the roles and access permissions that are required to run the application.

The roles required for the service account are as follows:

  • Monitoring Viewer role (roles/monitoring.viewer). This role gives read-only access to the Cloud Monitoring console and API.
  • Monitoring Metric Writer role (roles/monitoring.metricWriter). This role permits writing monitoring data.
  • Logs Writer role (roles/logging.logWriter). This role gives just enough permissions to write logs.
  • Service Account User role (roles/iam.serviceAccountUser). This role gives access to service accounts in a project. In this tutorial, the initialization procedure impersonates the service account to run privileged operations.
  • Compute Admin role (roles/compute.admin). This role provides full control of all Compute Engine resources. In this tutorial, the service account needs this role to attach additional disks to cluster nodes.

To provision a service account, follow these steps:

  1. In Cloud Shell, initialize an environment variable that stores the service account name:

    GKE_SERVICE_ACCOUNT_NAME=ds-init-tutorial-gke
    
  2. Create a service account:

    gcloud iam service-accounts create "$GKE_SERVICE_ACCOUNT_NAME" \
      --display-name="$GKE_SERVICE_ACCOUNT_NAME"
    
  3. Initialize an environment variable that stores the service account's email address:

    GKE_SERVICE_ACCOUNT_EMAIL="$(gcloud iam service-accounts list \
        --format='value(email)' \
        --filter=displayName:"$GKE_SERVICE_ACCOUNT_NAME")"
    
  4. Bind the Identity and Access Management (IAM) roles to the service account:

    gcloud projects add-iam-policy-binding \
        "$(gcloud config get-value project 2> /dev/null)" \
        --member serviceAccount:"$GKE_SERVICE_ACCOUNT_EMAIL" \
        --role roles/compute.admin
    gcloud projects add-iam-policy-binding \
        "$(gcloud config get-value project 2> /dev/null)" \
        --member serviceAccount:"$GKE_SERVICE_ACCOUNT_EMAIL" \
        --role roles/monitoring.viewer
    gcloud projects add-iam-policy-binding \
        "$(gcloud config get-value project 2> /dev/null)" \
        --member serviceAccount:"$GKE_SERVICE_ACCOUNT_EMAIL" \
        --role roles/monitoring.metricWriter
    gcloud projects add-iam-policy-binding \
        "$(gcloud config get-value project 2> /dev/null)" \
        --member serviceAccount:"$GKE_SERVICE_ACCOUNT_EMAIL" \
        --role roles/logging.logWriter
    gcloud projects add-iam-policy-binding \
        "$(gcloud config get-value project 2> /dev/null)" \
        --member serviceAccount:"$GKE_SERVICE_ACCOUNT_EMAIL" \
        --role roles/iam.serviceAccountUser
    
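Optionally, confirm that the role bindings exist. This check isn't part of the tutorial's steps; it lists the project-level roles that are granted to the service account:

    gcloud projects get-iam-policy \
        "$(gcloud config get-value project 2> /dev/null)" \
        --flatten="bindings[].members" \
        --filter="bindings.members:serviceAccount:$GKE_SERVICE_ACCOUNT_EMAIL" \
        --format="table(bindings.role)"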

Prepare the GKE cluster

In this section, you launch the GKE cluster, grant permissions, and finish the cluster configuration.

For this tutorial, a cluster with a small number of general-purpose nodes is enough to demonstrate the concept. You create a cluster with one node pool (the default one), and then you label all the nodes in the default node pool with the default-init label.

  • In Cloud Shell, create and launch a regional GKE cluster:

    gcloud container clusters create ds-init-tutorial \
        --enable-ip-alias \
        --image-type=ubuntu_containerd \
        --machine-type=n1-standard-2 \
        --metadata disable-legacy-endpoints=true \
        --node-labels=app=default-init \
        --node-locations us-central1-a,us-central1-b,us-central1-c \
        --no-enable-basic-auth \
        --no-issue-client-certificate \
        --num-nodes=1 \
        --region us-central1 \
        --service-account="$GKE_SERVICE_ACCOUNT_EMAIL"
    
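In Cloud Shell, the create command typically configures kubectl credentials for the new cluster automatically. The following optional checks are a sketch rather than a required step: fetch credentials if you work from a different shell, grant your user the cluster administration privileges mentioned in the section overview, and confirm that every node carries the app=default-init label:

    # Fetch credentials if kubectl isn't already configured for the cluster.
    gcloud container clusters get-credentials ds-init-tutorial --region us-central1

    # Grant your user cluster administration privileges.
    kubectl create clusterrolebinding cluster-admin-binding \
        --clusterrole=cluster-admin \
        --user="$(gcloud config get-value account 2> /dev/null)"

    # Confirm that every node carries the app=default-init label (APP column).
    kubectl get nodes --label-columns=app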

Deploy the DaemonSet

In this section, you do the following:

  1. Create the ConfigMap that stores the initialization procedure.
  2. Deploy the DaemonSet that schedules and executes the initialization procedure.

The DaemonSet does the following:

  1. Configures a volume that makes the contents of the ConfigMap available to the containers that the DaemonSet handles.
  2. Configures the volumes for privileged file system areas of the underlying cluster node. These areas let the containers that the DaemonSet schedules directly interact with the node that runs them.
  3. Schedules and runs an init container that executes the initialization procedure and then is terminated upon completion.
  4. Schedules and runs a container that stays idle and consumes no resources.

The idle container ensures that a node is initialized only once. DaemonSets are designed so that all eligible nodes run a copy of a Pod. If you use a regular container, that container runs the initialization procedure and is then terminated upon completion. By design, the DaemonSet reschedules the Pod. To avoid "continuous rescheduling," the DaemonSet first executes the initialization procedure in an init container, and then leaves a container running.
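
The daemon-set.yaml manifest that you deploy later in this tutorial implements this pattern. The following manifest is a minimal sketch of such a DaemonSet, not the exact file from the repository: the container names, the images, and the nodeSelector are assumptions based on the description above.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-initializer
  labels:
    app: default-init
spec:
  selector:
    matchLabels:
      app: default-init
  template:
    metadata:
      labels:
        app: default-init
    spec:
      nodeSelector:
        app: default-init           # Run only on nodes that carry this label.
      volumes:
      - name: root-mount            # Privileged access to the node's root file system.
        hostPath:
          path: /
      - name: entrypoint            # The initialization script from the ConfigMap.
        configMap:
          name: entrypoint
          defaultMode: 0744
      initContainers:
      - name: node-initializer      # Runs the initialization procedure, then terminates.
        image: ubuntu:18.04
        command: ["/scripts/entrypoint.sh"]
        env:
        - name: ROOT_MOUNT_DIR
          value: /root
        securityContext:
          privileged: true
        volumeMounts:
        - name: root-mount
          mountPath: /root
        - name: entrypoint
          mountPath: /scripts
      containers:
      - name: pause                 # Idle container that keeps the Pod running after initialization.
        image: gcr.io/google-containers/pause:3.2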

The following initialization procedure contains privileged and unprivileged operations. By using chroot, you can run commands as if you were executing them directly on the node, not just inside a container.

# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: entrypoint
  labels:
    app: default-init
data:
  entrypoint.sh: |
    #!/usr/bin/env bash

    set -euo pipefail

    # Avoid interactive prompts from apt during package installation.
    export DEBIAN_FRONTEND=noninteractive
    ROOT_MOUNT_DIR="${ROOT_MOUNT_DIR:-/root}"

    echo "Installing dependencies"
    apt-get update
    apt-get install -y apt-transport-https curl gnupg lsb-release

    echo "Installing gcloud SDK"
    export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)"
    echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
    apt-get update
    apt-get install -y google-cloud-sdk

    echo "Getting node metadata"
    NODE_NAME="$(curl -sS http://metadata.google.internal/computeMetadata/v1/instance/name -H 'Metadata-Flavor: Google')"
    ZONE="$(curl -sS http://metadata.google.internal/computeMetadata/v1/instance/zone -H 'Metadata-Flavor: Google' | awk -F  "/" '{print $4}')"

    echo "Setting up disks"
    DISK_NAME="$NODE_NAME-additional"

    if ! gcloud compute disks list --filter="name:$DISK_NAME" | grep "$DISK_NAME" > /dev/null; then
        echo "Creating $DISK_NAME"
        gcloud compute disks create "$DISK_NAME" --size=1024 --zone="$ZONE"
    else
        echo "$DISK_NAME already exists"
    fi

    if ! gcloud compute instances describe "$NODE_NAME" --zone "$ZONE" --format '(disks[].source)' | grep "$DISK_NAME" > /dev/null; then
        echo "Attaching $DISK_NAME to $NODE_NAME"
        gcloud compute instances attach-disk "$NODE_NAME" --device-name=sdb --disk "$DISK_NAME" --zone "$ZONE"
    else
        echo "$DISK_NAME is already attached to $NODE_NAME"
    fi

    # We use chroot to run the following commands in the host root (mounted as the /root volume in the container)
    echo "Installing nano"
    chroot "${ROOT_MOUNT_DIR}" apt-get update
    chroot "${ROOT_MOUNT_DIR}" apt-get install -y nano

    echo "Loading Kernel modules"
    # Load the bridge kernel module as an example
    chroot "${ROOT_MOUNT_DIR}" modprobe bridge
...

We recommend that you carefully review each initialization procedure, because the procedure could alter the state of the nodes of your cluster. Only a small group of individuals should have the right to modify those procedures, because those procedures can greatly affect the availability and the security of your clusters.
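
One way to restrict who can change the initialization procedure is with Kubernetes RBAC. The following manifest is a sketch and is not part of the tutorial's repository; the group name is hypothetical, while the resource names match the entrypoint ConfigMap and the node-initializer DaemonSet used in this tutorial.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: init-procedure-editor
  namespace: default
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["entrypoint"]
  verbs: ["get", "update", "patch"]
- apiGroups: ["apps"]
  resources: ["daemonsets"]
  resourceNames: ["node-initializer"]
  verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: init-procedure-editors
  namespace: default
subjects:
- kind: Group
  name: init-admins@example.com    # Hypothetical group of trusted operators.
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: init-procedure-editor
  apiGroup: rbac.authorization.k8s.io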

To deploy the ConfigMap and the DaemonSet, do the following:

  1. In Cloud Shell, change the working directory to the $HOME directory:

    cd "$HOME"
    
  2. Clone the Git repository that contains the scripts and the manifest files to deploy and configure the initialization procedure:

    git clone https://github.com/GoogleCloudPlatform/solutions-gke-init-daemonsets-tutorial
    
  3. Change the working directory to the newly cloned repository directory:

    cd "$HOME"/solutions-gke-init-daemonsets-tutorial
    
  4. Create a ConfigMap to hold the node initialization script:

    kubectl apply -f cm-entrypoint.yaml
    
  5. Deploy the DaemonSet:

    kubectl apply -f daemon-set.yaml
    
  6. Verify that the node initialization is completed:

    kubectl get ds --watch
    

    Wait for the DaemonSet to be reported as ready and up to date, as indicated by output similar to the following:

    NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    node-initializer   3         3         3       3            3           <none>          2h
    
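If a DaemonSet Pod doesn't become ready, its logs usually show which step of the initialization procedure failed. The following commands are a sketch; the init container name node-initializer matches the sketch earlier in this tutorial and might differ from the name used in the repository's manifest:

    kubectl get pods -o wide
    kubectl logs pod-name -c node-initializer

Replace pod-name with the name of one of the DaemonSet Pods listed by the first command.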

Validate and verify the initialization procedure

After the initialization procedure runs on each node that is marked with the default-init label, you can verify the results.

For each node, the verification procedure checks for the following:

  1. An additional disk is attached and ready to be used.
  2. The expected packages and libraries are installed by the node's operating system package manager.
  3. Kernel modules are loaded.

Execute the verification procedure:

  • In Cloud Shell, run the verification script:

    kubectl get nodes -o=jsonpath='{range .items[?(@.metadata.labels.app=="default-init")]}{.metadata.name}{" "}{.metadata.labels.failure-domain\.beta\.kubernetes\.io/zone}{"\n"}{end}' | while IFS= read -r line ; do ./verify-init.sh $line < /dev/null; done
    

    Wait for the script to run and check that each node has been correctly initialized, as indicated by output like the following:

    Verifying gke-ds-init-tutorial-default-pool-5464b7e3-nzjm (us-central1-c) configuration
    Disk configured successfully on gke-ds-init-tutorial-default-pool-5464b7e3-nzjm (us-central1-c)
    Packages installed successfully in gke-ds-init-tutorial-default-pool-5464b7e3-nzjm (us-central1-c)
    Kernel modules loaded successfully on gke-ds-init-tutorial-default-pool-5464b7e3-nzjm (us-central1-c)
    Verifying gke-ds-init-tutorial-default-pool-65baf745-0gwt (us-central1-a) configuration
    Disk configured successfully on gke-ds-init-tutorial-default-pool-65baf745-0gwt (us-central1-a)
    Packages installed successfully in gke-ds-init-tutorial-default-pool-65baf745-0gwt (us-central1-a)
    Kernel modules loaded successfully on gke-ds-init-tutorial-default-pool-65baf745-0gwt (us-central1-a)
    Verifying gke-ds-init-tutorial-default-pool-6b125c50-3xvl (us-central1-b) configuration
    Disk configured successfully on gke-ds-init-tutorial-default-pool-6b125c50-3xvl (us-central1-b)
    Packages installed successfully in gke-ds-init-tutorial-default-pool-6b125c50-3xvl (us-central1-b)
    Kernel modules loaded successfully on gke-ds-init-tutorial-default-pool-6b125c50-3xvl (us-central1-b)
    
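If you want to spot-check a single node manually, you can run similar checks over SSH. The following commands are a sketch and aren't part of the tutorial's repository; replace node-name and zone with a node name and zone from the script output above:

    # The additional disk appears as a block device (for example, sdb).
    gcloud compute ssh node-name --zone zone --command "lsblk"
    # The package installed by the initialization procedure is present.
    gcloud compute ssh node-name --zone zone --command "dpkg -s nano"
    # The kernel module loaded by the initialization procedure is listed.
    gcloud compute ssh node-name --zone zone --command "lsmod | grep bridge"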

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, delete the resources that you created. If you created a project dedicated to this tutorial, you can delete the project entirely. If you used an existing project that you don't want to delete, use the following steps to clean up the project.

Clean up the project

To clean up a project without deleting it, you need to remove the resources that you created in this tutorial.

  1. In Cloud Shell, delete the GKE cluster:

    gcloud container clusters delete ds-init-tutorial --quiet --region us-central1
    
  2. Delete the additional disks that you created as part of this example initialization procedure:

    gcloud compute disks list --filter="name:additional" --format="csv[no-heading](name,zone)" | while IFS= read -r line ; do DISK_NAME="$(echo $line | cut -d',' -f1)"; ZONE="$(echo $line | cut -d',' -f2)"; gcloud compute disks delete "$DISK_NAME" --quiet --zone "$ZONE" < /dev/null; done
    
  3. Delete the service account:

    gcloud iam service-accounts delete "$GKE_SERVICE_ACCOUNT_EMAIL" --quiet
    
  4. Delete the cloned repository directory:

    rm -rf "$HOME"/solutions-gke-init-daemonsets-tutorial
    

Delete the project

The easiest way to eliminate billing is to delete the project you created for the tutorial.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next