Try GKE on Bare Metal on Compute Engine VMs

This page shows you how to try GKE on Bare Metal in High Availability (HA) mode using Virtual Machines (VMs) running on Compute Engine.

You can try out GKE on Bare Metal quickly and without having to prepare any hardware. Completing the steps on this page provides you with working test environment running on Compute Engine for your GKE on Bare Metal environment.

To try GKE on Bare Metal on Compute Engine VMs, complete the following steps:

  1. Create six VMs in Compute Engine
  2. Create a vxlan network between all VMs with L2 connectivity
  3. Install prerequisites for GKE on Bare Metal
  4. Deploy an GKE on Bare Metal cluster
  5. Verify your cluster

Before you begin

The deployment requires the following resources:

  • Six VMs to deploy GKE on Bare Metal
  • One workstation that is logged into gcloud with Owner permissions for your project.

The steps in this guide are taken from the installation script in the anthos-samples repository. The FAQ section has more information on how to customize this script to work with some popular variations.

Create six VMs in Compute Engine

Complete these steps to create the following VMs:

  • One admin VM used to deploy GKE on Bare Metal to the other machines.
  • Three VMs for the three control plane nodes needed to run the GKE on Bare Metal control plane.
  • Two VMs for the two worker nodes needed to run workloads on the GKE on Bare Metal cluster.
  1. Setup environment variables:

    export PROJECT_ID=PROJECT_ID
    export ZONE=ZONE
    export CLUSTER_NAME=CLUSTER_NAME
    export BMCTL_VERSION=1.12.9
    
  2. Run the following commands to log in with your Google account and set your project as the default:

    gcloud auth login
    gcloud config set project $PROJECT_ID
    gcloud config set compute/zone $ZONE
    
  3. Create the baremetal-gcr service account:

    gcloud iam service-accounts create baremetal-gcr
    
    gcloud iam service-accounts keys create bm-gcr.json \
        --iam-account=baremetal-gcr@"${PROJECT_ID}".iam.gserviceaccount.com
  4. Enable Google Cloud APIs and services:

    gcloud services enable \
        anthos.googleapis.com \
        anthosaudit.googleapis.com \
        anthosgke.googleapis.com \
        cloudresourcemanager.googleapis.com \
        connectgateway.googleapis.com \
        container.googleapis.com \
        gkeconnect.googleapis.com \
        gkehub.googleapis.com \
        serviceusage.googleapis.com \
        stackdriver.googleapis.com \
        monitoring.googleapis.com \
        logging.googleapis.com \
        opsconfigmonitoring.googleapis.com
  5. Give the baremetal-gcr service account additional permissions to avoid needing multiple service accounts for different APIs and services:

    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:baremetal-gcr@$PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/gkehub.connect" \
      --no-user-output-enabled
    
    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:baremetal-gcr@$PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/gkehub.admin" \
      --no-user-output-enabled
    
    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:baremetal-gcr@$PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/logging.logWriter" \
      --no-user-output-enabled
    
    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:baremetal-gcr@$PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/monitoring.metricWriter" \
      --no-user-output-enabled
    
    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:baremetal-gcr@$PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/monitoring.dashboardEditor" \
      --no-user-output-enabled
    
    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:baremetal-gcr@$PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/stackdriver.resourceMetadata.writer" \
      --no-user-output-enabled
    
    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:baremetal-gcr@$PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/opsconfigmonitoring.resourceMetadata.writer" \
      --no-user-output-enabled
  6. Create the variables and arrays needed for all the commands on this page:

    MACHINE_TYPE=n1-standard-8
    VM_PREFIX=abm
    VM_WS=$VM_PREFIX-ws
    VM_CP1=$VM_PREFIX-cp1
    VM_CP2=$VM_PREFIX-cp2
    VM_CP3=$VM_PREFIX-cp3
    VM_W1=$VM_PREFIX-w1
    VM_W2=$VM_PREFIX-w2
    declare -a VMs=("$VM_WS" "$VM_CP1" "$VM_CP2" "$VM_CP3" "$VM_W1" "$VM_W2")
    declare -a IPs=()
  7. Use the following loop to create six VMs:

    for vm in "${VMs[@]}"
    do
        gcloud compute instances create "$vm" \
          --image-family=ubuntu-2004-lts --image-project=ubuntu-os-cloud \
          --zone="${ZONE}" \
          --boot-disk-size 200G \
          --boot-disk-type pd-ssd \
          --can-ip-forward \
          --network default \
          --tags http-server,https-server \
          --min-cpu-platform "Intel Haswell" \
          --enable-nested-virtualization \
          --scopes cloud-platform \
          --machine-type "$MACHINE_TYPE" \
          --metadata "cluster_id=${CLUSTER_NAME},bmctl_version=${BMCTL_VERSION}"
        IP=$(gcloud compute instances describe "$vm" --zone "${ZONE}" \
             --format='get(networkInterfaces[0].networkIP)')
        IPs+=("$IP")
    done
  8. Use the following loop to verify that SSH is ready on all VMs:

    for vm in "${VMs[@]}"
    do
        while ! gcloud compute ssh root@"$vm" --zone "${ZONE}" --command "printf 'SSH to $vm succeeded\n'"
        do
            printf "Trying to SSH into %s failed. Sleeping for 5 seconds. zzzZZzzZZ" "$vm"
            sleep  5
        done
    done

Create a vxlan network with L2 connectivity between VMs

Use the standard vxlan functionality of Linux to create a network that connects all the VMs with L2 connectivity.

The following command contains two loops that perform the following actions:

  1. SSH into each VM
  2. Update and install needed packages
  3. Execute the required commands to configure the network with vxlan

    i=2 # We start from 10.200.0.2/24
    for vm in "${VMs[@]}"
    do
        gcloud compute ssh root@"$vm" --zone "${ZONE}" << EOF
            apt-get -qq update > /dev/null
            apt-get -qq install -y jq > /dev/null
            set -x
            ip link add vxlan0 type vxlan id 42 dev ens4 dstport 0
            current_ip=\$(ip --json a show dev ens4 | jq '.[0].addr_info[0].local' -r)
            printf "VM IP address is: \$current_ip"
            for ip in ${IPs[@]}; do
                if [ "\$ip" != "\$current_ip" ]; then
                    bridge fdb append to 00:00:00:00:00:00 dst \$ip dev vxlan0
                fi
            done
            ip addr add 10.200.0.$i/24 dev vxlan0
            ip link set up dev vxlan0
    
    EOF
        i=$((i+1))
    done

You now have L2 connectivity within the 10.200.0.0/24 network. The VMs have the following IP addresses:

  • Admin VM: 10.200.0.2
  • VMs running the control plane nodes:
    • 10.200.0.3
    • 10.200.0.4
    • 10.200.0.5
  • VMs running the worker nodes:
    • 10.200.0.6
    • 10.200.0.7

Install prerequisites for GKE on Bare Metal

The following tools are needed on the admin machine before installing GKE on Bare Metal:

  • bmctl
  • kubectl
  • Docker
  1. Run the following command to install the needed tools:

    gcloud compute ssh root@$VM_WS --zone "${ZONE}" << EOF
    set -x
    
    export PROJECT_ID=\$(gcloud config get-value project)
    BMCTL_VERSION=\$(curl http://metadata.google.internal/computeMetadata/v1/instance/attributes/bmctl_version -H "Metadata-Flavor: Google")
    export BMCTL_VERSION
    
    gcloud iam service-accounts keys create bm-gcr.json \
      --iam-account=baremetal-gcr@\${PROJECT_ID}.iam.gserviceaccount.com
    
    curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl"
    
    chmod +x kubectl
    mv kubectl /usr/local/sbin/
    mkdir baremetal && cd baremetal
    gsutil cp gs://anthos-baremetal-release/bmctl/$BMCTL_VERSION/linux-amd64/bmctl .
    chmod a+x bmctl
    mv bmctl /usr/local/sbin/
    
    cd ~
    printf "Installing docker"
    curl -fsSL https://get.docker.com -o get-docker.sh
    sh get-docker.sh
    EOF
  2. Run the following commands to ensure that root@10.200.0.x works. The commands perform these tasks:

    1. Generate a new SSH key on the admin machine.
    2. Add the public key to all the other VMs in the deployment.
    gcloud compute ssh root@$VM_WS --zone "${ZONE}" << EOF
    set -x
    ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa
    sed 's/ssh-rsa/root:ssh-rsa/' ~/.ssh/id_rsa.pub > ssh-metadata
    for vm in ${VMs[@]}
    do
        gcloud compute instances add-metadata \$vm --zone ${ZONE} --metadata-from-file ssh-keys=ssh-metadata
    done
    EOF

Deploy an GKE on Bare Metal cluster

The following code block contains all commands and configurations needed to complete the following tasks:

  1. Create the configuration file for the needed hybrid cluster.
  2. Run the preflight checks.
  3. Deploy the cluster.
gcloud compute ssh root@$VM_WS --zone "${ZONE}" <<EOF
set -x
export PROJECT_ID=$(gcloud config get-value project)
CLUSTER_NAME=\$(curl http://metadata.google.internal/computeMetadata/v1/instance/attributes/cluster_id -H "Metadata-Flavor: Google")
BMCTL_VERSION=\$(curl http://metadata.google.internal/computeMetadata/v1/instance/attributes/bmctl_version -H "Metadata-Flavor: Google")
export CLUSTER_NAME
export BMCTL_VERSION
bmctl create config -c \$CLUSTER_NAME
cat > bmctl-workspace/\$CLUSTER_NAME/\$CLUSTER_NAME.yaml << EOB
---
gcrKeyPath: /root/bm-gcr.json
sshPrivateKeyPath: /root/.ssh/id_rsa
gkeConnectAgentServiceAccountKeyPath: /root/bm-gcr.json
gkeConnectRegisterServiceAccountKeyPath: /root/bm-gcr.json
cloudOperationsServiceAccountKeyPath: /root/bm-gcr.json
---
apiVersion: v1
kind: Namespace
metadata:
  name: cluster-\$CLUSTER_NAME
---
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: \$CLUSTER_NAME
  namespace: cluster-\$CLUSTER_NAME
spec:
  type: hybrid
  anthosBareMetalVersion: \$BMCTL_VERSION
  gkeConnect:
    projectID: \$PROJECT_ID
  controlPlane:
    nodePoolSpec:
      clusterName: \$CLUSTER_NAME
      nodes:
      - address: 10.200.0.3
      - address: 10.200.0.4
      - address: 10.200.0.5
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    services:
      cidrBlocks:
      - 172.26.232.0/24
  loadBalancer:
    mode: bundled
    ports:
      controlPlaneLBPort: 443
    vips:
      controlPlaneVIP: 10.200.0.49
      ingressVIP: 10.200.0.50
    addressPools:
    - name: pool1
      addresses:
      - 10.200.0.50-10.200.0.70
  clusterOperations:
    # might need to be this location
    location: us-central1
    projectID: \$PROJECT_ID
  storage:
    lvpNodeMounts:
      path: /mnt/localpv-disk
      storageClassName: node-disk
    lvpShare:
      numPVUnderSharedPath: 5
      path: /mnt/localpv-share
      storageClassName: local-shared
  nodeConfig:
    podDensity:
      maxPodsPerNode: 250
---
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: node-pool-1
  namespace: cluster-\$CLUSTER_NAME
spec:
  clusterName: \$CLUSTER_NAME
  nodes:
  - address: 10.200.0.6
  - address: 10.200.0.7
EOB

bmctl create cluster -c \$CLUSTER_NAME
EOF

Verify your cluster

You can find your cluster's kubeconfig file on the admin machine in the bmctl-workspace directory. To verify your deployment, complete the following steps.

  1. SSH into the admin workstation:

    gcloud compute ssh root@$VM_WS --zone ${ZONE}
    
  2. Set the KUBECONFIG environment variable with the path to the cluster's configuration file to run kubectl commands on the cluster.

    export clusterid=CLUSTER_NAME
    export KUBECONFIG=$HOME/bmctl-workspace/$clusterid/$clusterid-kubeconfig
    kubectl get nodes
    

Log in to your cluster from Google Cloud console

To observe your workloads on GKE on Bare Metal in the Google Cloud console, you must log in to your admin machine where the cluster's kubeconfig file is stored.

Go to Logging in to a cluster from Google Cloud console to learn more.

Clean up

  1. Connect to the admin machine to reset the cluster VMs to their state prior to installation and unregister the cluster from your Google Cloud Project:

    gcloud compute ssh root@$VM_WS --zone ${ZONE} << EOF
    set -x
    export clusterid=CLUSTER_NAME
    bmctl reset -c \$clusterid
    EOF
    
  2. List all VMs that have abm in their name:

    gcloud compute instances list | grep 'abm'
    
  3. Verify that you're fine with deleting all VMs that contain abm in the name.

    After you've verified, you can delete abm VMs by running the following command:

    gcloud compute instances list --format="value(name)" | grep 'abm'  | xargs gcloud \
        --quiet compute instances delete