Version 1.0. This version is no longer supported as outlined in the Anthos version support policy. For the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware (GKE on-prem), upgrade to a supported version.

Installing GKE On-Prem

This page explains how to install GKE On-Prem in vSphere. The instructions on this page show you how to create an admin cluster and one user cluster. After you create the admin cluster and initial user cluster, you can create additional user clusters.

Before you begin

  1. Set up your on-prem environment as described in System requirements.

  2. Complete the procedures in Getting started.

  3. Create an admin workstation in vSphere.

  4. Create a private Docker registry, if you want to use one.

  5. Learn how to enable manual load balancing, if you want to use it.

  6. Configure static IPs, if you want to use them.

  7. SSH into your admin workstation:

    ssh -i ~/.ssh/vsphere_workstation ubuntu@[IP_ADDRESS]
    
  8. Authorize gcloud to access Google Cloud:

    gcloud auth login

  9. Register gcloud as a Docker credential helper:

    gcloud auth configure-docker

  10. Set a default project. Setting a default Google Cloud project causes all Cloud SDK commands to run against that project, so you don't need to specify the project for each command:

    gcloud config set project [PROJECT_ID]
    

    Replace [PROJECT_ID] with your project ID. (You can find your project ID in Cloud Console, or by running gcloud config get-value project.)

Create service accounts' private keys in your admin workstation

In Getting started, you created four service accounts. Now, you need to create a JSON private key file for each of those service accounts. You'll provide these keys during installation.

List service accounts' email addresses

First, list the service accounts in your Google Cloud project:

gcloud iam service-accounts list

For a Google Cloud project named my-gcp-project, this command's output looks like this:

gcloud iam service-accounts list
NAME                                    EMAIL
                                        access-service-account@my-gcp-project.iam.gserviceaccount.com
                                        register-service-account@my-gcp-project.iam.gserviceaccount.com
                                        connect-service-account@my-gcp-project.iam.gserviceaccount.com
                                        stackdriver-service-account@my-gcp-project.iam.gserviceaccount.com

Take note of each account's email address. In each of the following sections, you provide the relevant account's email address.

Access service account

gcloud iam service-accounts keys create access-key.json \
--iam-account [ACCESS_SERVICE_ACCOUNT_EMAIL]

where [ACCESS_SERVICE_ACCOUNT_EMAIL] is the access service account's email address.

Register service account

gcloud iam service-accounts keys create register-key.json \
--iam-account [REGISTER_SERVICE_ACCOUNT_EMAIL]

where [REGISTER_SERVICE_ACCOUNT_EMAIL] is the register service account's email address.

Connect service account

gcloud iam service-accounts keys create connect-key.json \
--iam-account [CONNECT_SERVICE_ACCOUNT_EMAIL]

where [CONNECT_SERVICE_ACCOUNT_EMAIL] is the connect service account's email address.

Cloud Monitoring service account

gcloud iam service-accounts keys create stackdriver-key.json \
--iam-account [STACKDRIVER_SERVICE_ACCOUNT_EMAIL]

where [STACKDRIVER_SERVICE_ACCOUNT_EMAIL] is the Cloud Monitoring service account's email address.
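
The four key-creation commands above differ only in the output file name and the account email, so they can be generated from a small table. The following Python sketch only prints the commands for review; it does not run them. The key file names and the my-gcp-project email addresses are illustrative assumptions based on the examples on this page, not values your project necessarily uses.

```python
import shlex

# Illustrative mapping of key file name -> service account email.
# These are assumptions; replace them with the emails listed for your project.
KEY_FILES = {
    "access-key.json": "access-service-account@my-gcp-project.iam.gserviceaccount.com",
    "register-key.json": "register-service-account@my-gcp-project.iam.gserviceaccount.com",
    "connect-key.json": "connect-service-account@my-gcp-project.iam.gserviceaccount.com",
    "stackdriver-key.json": "stackdriver-service-account@my-gcp-project.iam.gserviceaccount.com",
}


def key_create_command(key_file, email):
    """Build the argument list for creating one JSON key file."""
    return [
        "gcloud", "iam", "service-accounts", "keys", "create",
        key_file, "--iam-account", email,
    ]


# Print each command so it can be reviewed before you run it by hand.
for key_file, email in KEY_FILES.items():
    print(shlex.join(key_create_command(key_file, email)))
```

Keeping the file names in one place makes it easier to reuse the same paths later in the configuration file.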

Activating your access service account for Cloud SDK

Activating your access service account for Cloud SDK causes all gcloud and gsutil commands to run as that service account. Since your access service account is allowlisted to access the GKE On-Prem binaries, activating the account for Cloud SDK gives you permission to download GKE On-Prem's binaries from Cloud Storage.

To activate your access service account, run the following command. Be sure to provide the path to the account's key file, if it isn't in the current working directory:

gcloud auth activate-service-account --key-file=access-key.json

Generating a configuration file

To start an installation, you run gkectl create-config to generate a configuration file. You modify the file with your environment's specifications and with the cluster specifications you want.

To generate the file, run the following command, where --config [PATH] is optional and accepts a path and name for the configuration file. Omitting --config creates config.yaml in the current working directory:

gkectl create-config [--config [PATH]]

Modifying the configuration file

Now that you've generated the configuration file, you need to modify it to be suitable for your environment and to meet your expectations for your clusters. The following sections explain each field, the values it expects, and where you might find the information. Some fields are commented out by default. If any of those fields are relevant to your installation, uncomment them and provide values.

bundlepath

A GKE On-Prem bundle is a set of YAML files. Collectively, the YAML files describe all of the components in a particular release of GKE On-Prem.

When you create an admin workstation, it comes with a bundle at /var/lib/gke/bundles/gke-onprem-vsphere-[VERSION]-full.tgz.

Set the value of bundlepath to the path of your bundle file. That is, set bundlepath to:

/var/lib/gke/bundles/gke-onprem-vsphere-[VERSION]-full.tgz

where [VERSION] is the version of GKE On-Prem that you are installing.

Note that you are free to keep your bundle file in a different location or give it a different name. Just make sure that in your configuration file, the value of bundlepath is the path to your bundle file, whatever that might be.

gkeplatformversion

The gkeplatformversion field holds the Kubernetes version of the GKE On-Prem release that you are installing. It has this format:

[KUBERNETES_VERSION]-[GKE_PATCH]

An example of the Kubernetes version is 1.12.7-gke.19.

When you run gkectl create-config, this field is populated for you.

The versioning schemes for bundlepath and gkeplatformversion are different. However, a given bundle version has a corresponding GKE platform version. For example, if the bundle version is 1.0.10, the GKE platform version must be 1.12.7-gke.19.

To learn about the correspondence between a bundle version and the GKE platform version, extract the bundle file and look at the YAML files. In particular, open gke-onprem-vsphere-[VERSION]-images.yaml, and look at the osImages field. You can see the GKE platform version in the name of the OS image file. For example, in the following OS image, you can see that the GKE platform version is 1.12.7-gke.19.

osImages:
  admin: "gs://gke-on-prem-os-ubuntu-release/gke-on-prem-osimage-1.12.7-gke.19-20190516-905ef43658.ova"
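
If you want to read the GKE platform version out of an OS image name programmatically, a regular expression is enough. This is a minimal Python sketch; the file-name pattern is an assumption inferred from the single example above, and the helper name platform_version_from_image is hypothetical.

```python
import re

# Assumed naming pattern, inferred from the one example on this page:
# gke-on-prem-osimage-[KUBERNETES_VERSION]-[GKE_PATCH]-[DATE]-[HASH].ova
OS_IMAGE_PATTERN = re.compile(r"gke-on-prem-osimage-(\d+\.\d+\.\d+-gke\.\d+)-")


def platform_version_from_image(image_uri):
    """Return the GKE platform version embedded in an OS image name, or None."""
    match = OS_IMAGE_PATTERN.search(image_uri)
    return match.group(1) if match else None


image = (
    "gs://gke-on-prem-os-ubuntu-release/"
    "gke-on-prem-osimage-1.12.7-gke.19-20190516-905ef43658.ova"
)
print(platform_version_from_image(image))  # prints 1.12.7-gke.19
```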

vcenter

You use this field to declare global settings for your vCenter Server. GKE On-Prem needs to know the IP address, username, and password of your vCenter Server instance. Set the values under vcenter to provide this information. For example:

vcenter:
  credentials:
    address: "203.0.113.1"
    username: "my-name"
    password: "my-password"

GKE On-Prem needs some information about the structure of your vSphere environment. Set the values under vcenter to provide this information. For example:

vcenter:
  ...
  datacenter: "MY-DATACENTER"
  datastore: "MY-DATASTORE"
  cluster: "MY-VSPHERE-CLUSTER"
  network: "MY-VIRTUAL-NETWORK"
  resourcepool: "my-pool"

GKE On-Prem creates a virtual machine disk (VMDK) to hold the Kubernetes object data for the admin cluster. The installer creates the VMDK for you, but you must provide a name for the VMDK in the vcenter.datadisk field. For example:

vcenter:
  ...
  datadisk: "my-disk.vmdk"

If you want GKE On-Prem to put the VMDK in a directory, you must manually create the directory ahead of time. For example, you could use govc to create a directory:

govc datastore.mkdir my-gke-on-prem-directory

Then you could include the directory in the vcenter.datadisk field:

vcenter:
  ...
  datadisk: "my-gke-on-prem-directory/my-disk.vmdk"

When a client, like GKE On-Prem, sends a request to vCenter Server, the server must prove its identity to the client by presenting a certificate. The certificate is signed by a certificate authority (CA). The client verifies the server's certificate by using the CA's certificate.

Set vcenter.cacertpath to the path of the CA's certificate. For example:

vcenter:
  ...
  cacertpath: "/my-cert-directory/altostrat.crt"

For information about downloading the CA's certificate, see How to download and install vCenter Server root certificates.

If your vCenter server is using a self-signed certificate, you can extract the certificate by connecting to vCenter with openssl from the admin workstation:

true | openssl s_client -connect [VCENTER_IP]:443 -showcerts 2>/dev/null | sed -ne '/-BEGIN/,/-END/p' > vcenter.pem

gcrkeypath

Set the value of gcrkeypath to the path of the JSON key file for your access service account. For example:

gcrkeypath: "/my-key-directory/access-key.json"

lbmode

You can use integrated load balancing or manual load balancing. Your choice of load balancing mode applies to your admin cluster and your initial user cluster. It also applies to any additional user clusters that you create in the future.

Specify your load balancing choice by setting the value of lbmode to Integrated or Manual. For example:

lbmode: Integrated

gkeconnect

The gkeconnect field holds information that GKE On-Prem needs to set up management of your on-prem clusters from Google Cloud Console.

Set gkeconnect.projectid to the project ID of the Google Cloud project where you want to manage your on-prem clusters.

Set the value of gkeconnect.registerserviceaccountkeypath to the path of the JSON key file for your register service account. Set the value of gkeconnect.agentserviceaccountkeypath to the path of the JSON key file for your connect service account.

If you want the Connect agent to use a proxy to communicate with Google Cloud, set the value of gkeconnect.proxy to the URL of the proxy. Use the format http(s)://[PROXY_ADDRESS].

Example:

gkeconnect:
  projectid: "my-project"
  registerserviceaccountkeypath: "/my-key-directory/register-key.json"
  agentserviceaccountkeypath: "/my-key-directory/connect-key.json"
  proxy: https://203.0.113.20
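
Because gkeconnect.proxy must follow the http(s)://[PROXY_ADDRESS] format, a quick format check before installation can catch a missing scheme. The following Python sketch only checks the shape of the URL; it does not contact the proxy, and the helper name is_valid_proxy_url is hypothetical.

```python
from urllib.parse import urlsplit


def is_valid_proxy_url(value):
    """Return True if value has an http or https scheme and a non-empty host part."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_valid_proxy_url("https://203.0.113.20"))  # True
print(is_valid_proxy_url("203.0.113.20"))          # False: the scheme is missing
```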

stackdriver

The stackdriver field holds information that GKE On-Prem needs to store log entries generated by your on-prem clusters.

Set stackdriver.projectid to the project ID of the Google Cloud project where you want to view Stackdriver logs that pertain to your on-prem clusters.

Set stackdriver.clusterlocation to a Google Cloud region where you want to store Stackdriver logs. It is a good idea to choose a region that is near your on-prem data center.

Set stackdriver.serviceaccountkeypath to the path of the JSON key file for your Stackdriver Logging service account.

Example:

stackdriver:
  projectid: "my-project"
  clusterlocation: "us-west1"
  proxyconfigsecretname: ""
  enablevpc: false
  serviceaccountkeypath: "/my-key-directory/logging-key.json"

privateregistryconfig

If you have a private Docker registry, the privateregistryconfig field holds information that GKE On-Prem uses to push images to your private registry. If you don't specify a private registry, gkectl pulls GKE On-Prem's container images from its Container Registry repository, gcr.io/gke-on-prem-release, during installation.

Under privateregistryconfig.credentials, set address to the IP address of the machine that runs your private Docker registry. Set username and password to the username and password of your private Docker registry.

When Docker pulls an image from your private registry, the registry must prove its identity by presenting a certificate. The registry's certificate is signed by a certificate authority (CA). Docker uses the CA's certificate to validate the registry's certificate.

Set privateregistryconfig.cacertpath to the path of the CA's certificate.

Example:

privateregistryconfig:
  ...
  cacertpath: /my-cert-directory/registry-ca.crt

admincluster

The admincluster field holds information that GKE On-Prem needs to create the admin cluster.

vCenter network - admin cluster

In admincluster.vcenter.network, you can choose a different vCenter network for your admin cluster. Note that this overrides the global setting you provided in vcenter. For example:

admincluster:
  vcenter:
    network: MY-ADMIN-CLUSTER-NETWORK

DHCP or static IP addresses - admin cluster

Decide whether you want to use Dynamic Host Configuration Protocol (DHCP) to assign IP addresses to your admin cluster nodes. The alternative is to use static IP addresses for your cluster nodes. Note that if you have chosen to use manual load balancing mode, you must use static IP addresses for your cluster nodes.

If you choose to use DHCP, leave the admincluster.ipblockfilepath field commented out.

If you choose to use static IP addresses, you must have a host configuration file as described in Configuring static IPs. Provide the path to your host configuration file in the admincluster.ipblockfilepath field. For example:

admincluster:
  ipblockfilepath: "/my-config-directory/my-admin-hostconfig.yaml"

Integrated load balancing - admin cluster

If you are using integrated load balancing mode, GKE On-Prem needs to know the IP address, username, and password of your BIG-IP load balancer. Set the values under admincluster.bigip to provide this information. For example:

admincluster:
  ...
  bigip:
    credentials:
      address: "203.0.113.2"
      username: "my-admin-f5-name"
      password: "rJDlm^%7aOzw"

If you are using integrated load balancing mode, you must create a BIG-IP partition for your admin cluster. Set admincluster.bigip.partition to the name of your partition. For example:

admincluster:
  ...
  bigip:
    partition: "my-admin-f5-partition"

Manual load balancing - admin cluster

If you are using manual load balancing mode, you must use static IP addresses for your cluster nodes. Verify that you have set a value for admincluster.ipblockfilepath. For example:

admincluster:
  ipblockfilepath: "/my-config-directory/my-admin-hostconfig.yaml"

The ingress controller in the admin cluster is implemented as a Service of type NodePort. The Service has one ServicePort for HTTP and another ServicePort for HTTPS. If you are using manual load balancing mode, you must choose nodePort values for these ServicePorts. Specify the nodePort values in ingresshttpnodeport and ingresshttpsnodeport. For example:

admincluster:
  ...
  manuallbspec:
    ingresshttpnodeport: 32527
    ingresshttpsnodeport: 30139

The Kubernetes API server in the admin cluster is implemented as a Service of type NodePort. If you are using manual load balancing, you must choose a nodePort value for the Service. Specify the nodePort value in controlplanenodeport. For example:

admincluster:
  ...
  manuallbspec:
    ...
    controlplanenodeport: 30968

The addons server in the admin cluster is implemented as a Service of type NodePort. If you are using manual load balancing, you must choose a nodePort value for the Service. Specify the nodePort value in addonsnodeport. For example:

admincluster:
  manuallbspec:
    ...
    addonsnodeport: 30562
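
Before you run the installer, it can help to confirm that the nodePort values you chose for a cluster are distinct and fall inside the NodePort range. The following Python sketch assumes the default Kubernetes NodePort range of 30000-32767, which might not match a cluster configured with a different range. The function name check_node_ports is hypothetical, and the values are the example values from this section.

```python
# Default Kubernetes NodePort range; an assumption about your cluster,
# because the range can be changed on the API server.
NODEPORT_MIN, NODEPORT_MAX = 30000, 32767


def check_node_ports(ports):
    """Return a list of problems found in a {field name: nodePort} mapping."""
    problems = []
    seen = {}
    for field, port in ports.items():
        if not NODEPORT_MIN <= port <= NODEPORT_MAX:
            problems.append(f"{field}: {port} is outside {NODEPORT_MIN}-{NODEPORT_MAX}")
        if port in seen:
            problems.append(f"{field}: {port} is already used by {seen[port]}")
        else:
            seen[port] = field
    return problems


admin_ports = {
    "ingresshttpnodeport": 32527,
    "ingresshttpsnodeport": 30139,
    "controlplanenodeport": 30968,
    "addonsnodeport": 30562,
}
print(check_node_ports(admin_ports))  # prints []
```

An empty list means that no problems were found within this one cluster; the sketch does not compare values across clusters.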

vips - admin cluster

Regardless of whether you are using integrated or manual load balancing for the admin cluster, you need to fill in the admincluster.vips field.

Set the value of admincluster.vips.controlplanevip to the IP address that you have chosen to configure on the load balancer for the Kubernetes API server of the admin cluster. Set the value of ingressvip to the IP address you have chosen to configure on the load balancer for the admin cluster's ingress controller. For example:

admincluster:
  ...
  vips:
    controlplanevip: 203.0.113.3
    ingressvip: 203.0.113.4

serviceiprange and podiprange - admin cluster

The admin cluster must have a range of IP addresses to use for Services and a range of IP addresses to use for Pods. These ranges are specified by the admincluster.serviceiprange and admincluster.podiprange fields. These fields are populated when you run gkectl create-config. If you like, you can change the populated values to values of your choice. For information about choosing Service and Pod IP ranges, see Optimizing IP address allocation.

The Service and Pod ranges must not overlap. Also, the Service and Pod ranges you choose for the admin cluster must not overlap with the Service and Pod ranges you choose for the user cluster.

Example:

admincluster:
  ...
  serviceiprange: 10.96.232.0/24
  podiprange: 192.168.0.0/16

usercluster

The usercluster field holds information that GKE On-Prem needs to create the initial user cluster.

vCenter network - user cluster

In usercluster.vcenter.network, you can choose a different vCenter network for your user clusters. Note that this overrides the global setting you provided in vcenter. For example:

usercluster:
  vcenter:
    network: MY-USER-CLUSTER-NETWORK

DHCP or static IP addresses - user cluster

Decide whether you want to use DHCP to assign IP addresses to your user cluster nodes. The alternative is to use static IP addresses for your cluster nodes. Note that if you have chosen the manual load balancing mode, you must use static IP addresses for your cluster nodes.

If you choose to use DHCP, leave the usercluster.ipblockfilepath field commented out.

If you choose to use static IP addresses, you must have a host configuration file as described in Configuring static IPs. Provide the path to your host configuration file in the usercluster.ipblockfilepath field. For example:

usercluster:
  ipblockfilepath: "/my-config-directory/my-user-hostconfig.yaml"

Integrated load balancing - user cluster

If you are using integrated load balancing mode, GKE On-Prem needs to know the IP address, username, and password of the BIG-IP load balancer that you intend to use for the user cluster. Set the values under usercluster.bigip to provide this information. For example:

usercluster:
  ...
  bigip:
    credentials:
      address: "203.0.113.5"
      username: "my-user-f5-name"
      password: "8%jfQATKO$#z"
  ...

If you are using integrated load balancing mode, you must create a BIG-IP partition for your user cluster. Set usercluster.bigip.partition to the name of your partition. For example:

usercluster:
  ...
  bigip:
    partition: "my-user-f5-partition"
  ...

Manual load balancing - user cluster

If you are using manual load balancing mode, you must use static IP addresses for your cluster nodes. Verify that you have set a value for usercluster.ipblockfilepath. For example:

usercluster:
  ipblockfilepath: "/my-config-directory/my-user-hostconfig.yaml"
  ...

The ingress controller in the user cluster is implemented as a Service of type NodePort. The Service has one ServicePort for HTTP and another ServicePort for HTTPS. If you are using manual load balancing mode, you must choose nodePort values for these ServicePorts. Specify the nodePort values in ingresshttpnodeport and ingresshttpsnodeport. For example:

usercluster:
  manuallbspec:
    ingresshttpnodeport: 30243
    ingresshttpsnodeport: 30879

The Kubernetes API server in the user cluster is implemented as a Service of type NodePort. If you are using manual load balancing, you must choose a nodePort value for the Service. Specify the nodePort value in controlplanenodeport. For example:

usercluster:
  ...
  manuallbspec:
    ...
    controlplanenodeport: 30562

vips - user cluster

Regardless of whether you are using integrated or manual load balancing for the user cluster, you need to fill in the usercluster.vips field.

Set the value of usercluster.vips.controlplanevip to the IP address that you have chosen to configure on the load balancer for the Kubernetes API server of the user cluster. Set the value of ingressvip to the IP address you have chosen to configure on the load balancer for the user cluster's ingress controller. For example:

usercluster:
  ...
  vips:
    controlplanevip: 203.0.113.6
    ingressvip: 203.0.113.7

serviceiprange and podiprange - user cluster

The user cluster must have a range of IP addresses to use for Services and a range of IP addresses to use for Pods. These ranges are specified by the usercluster.serviceiprange and usercluster.podiprange fields. These fields are populated when you run gkectl create-config. If you like, you can change the populated values to values of your choice. For information about choosing Service and Pod IP ranges, see Optimizing IP address allocation.

The Service and Pod ranges must not overlap. Also, the Service and Pod ranges you choose for the user cluster must not overlap with the Service and Pod ranges you choose for the admin cluster.

Example:

usercluster:
  ...
  serviceiprange: 10.96.233.0/24
  podiprange: 172.16.0.0/12
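
The non-overlap rule for the Service and Pod ranges of both clusters can be checked with Python's standard ipaddress module. This is a minimal sketch rather than part of the installer; the function name overlapping_ranges is hypothetical, and the CIDR values are the example values from this page.

```python
import ipaddress
from itertools import combinations


def overlapping_ranges(ranges):
    """Return pairs of field names whose CIDR ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


ranges = {
    "admincluster.serviceiprange": "10.96.232.0/24",
    "admincluster.podiprange": "192.168.0.0/16",
    "usercluster.serviceiprange": "10.96.233.0/24",
    "usercluster.podiprange": "172.16.0.0/12",
}
print(overlapping_ranges(ranges))  # prints []
```

Every pair is compared, so the check covers overlaps both within a cluster and between the admin and user clusters.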

clustername

Set the value of usercluster.clustername to a name of your choice. For example:

usercluster:
  ...
  clustername: "my-user-cluster-1"

masternode

The usercluster.masternode.replicas field specifies how many control plane nodes you want the user cluster to have. The control plane nodes for the user cluster run the control plane components for the user cluster. This value must be 1 or 3.

  • Set this field to 1 to run one user control plane.
  • Set this field to 3 if you want to have a highly available user control plane. Three user control plane nodes will be created.

The usercluster.masternode.cpus and usercluster.masternode.memorymb fields specify how many CPUs and how much memory, in megabytes, is allocated to each control plane node of the user cluster. For example:

usercluster:
  ...
  masternode:
    cpus: 4
    memorymb: 8192
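
Because cpus and memorymb apply to each control plane node, the total that vSphere must provide scales with replicas. The following Python sketch works through that arithmetic for the example values above and rejects replica counts other than 1 or 3; the function name control_plane_totals is hypothetical.

```python
ALLOWED_REPLICAS = (1, 3)  # the only values this page allows for masternode.replicas


def control_plane_totals(replicas, cpus, memorymb):
    """Return (total CPUs, total memory in MB) across all user control plane nodes."""
    if replicas not in ALLOWED_REPLICAS:
        raise ValueError(f"replicas must be 1 or 3, got {replicas}")
    return replicas * cpus, replicas * memorymb


# Three replicas with 4 CPUs and 8192 MB each.
print(control_plane_totals(3, 4, 8192))  # prints (12, 24576)
```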

oidc

If you intend for clients of the user cluster to use OIDC authentication, set values for the fields under usercluster.oidc. Configuring OIDC is optional.

In version 1.0.2-gke.3, the following required fields were added. These fields enable logging in to a cluster from Cloud Console:

  • usercluster.oidc.kubectlredirecturl
  • usercluster.oidc.clientsecret
  • usercluster.oidc.usehttpproxy

If you don't want to log in to a cluster from Cloud Console, but you want to use OIDC, you can pass in placeholder values for these fields:

oidc:
  kubectlredirecturl: "redirect.invalid"
  clientsecret: "secret"
  usehttpproxy: "false"

For more information, see Authenticating with OIDC.

sni

If you want to provide an additional serving certificate for the Kubernetes API server of the user cluster, provide values for usercluster.sni.certpath and usercluster.sni.keypath. For example:

usercluster:
  ...
  sni:
    certpath: "/my-cert-directory/my-second-cert.crt"
    keypath: "/my-cert-directory/my-second-cert.key"

workernode

The usercluster.workernode.replicas field specifies how many worker nodes you want the user cluster to have. The worker nodes run the cluster workloads.

The usercluster.workernode.cpus and usercluster.workernode.memorymb fields specify how many CPUs and how much memory, in megabytes, are allocated to each worker node of the user cluster. For example:

usercluster:
  ...
  workernode:
    cpus: 4
    memorymb: 8192
    replicas: 3

Validating the configuration file

After you've modified the configuration file, run gkectl check-config to verify that the file is valid and can be used for installation:

gkectl check-config --config [PATH_TO_CONFIG]

If the command returns any FAILURE messages, fix the issues and validate the file again.

Skipping validations

The following gkectl commands automatically run validations against your config file:

  • gkectl prepare
  • gkectl create cluster
  • gkectl upgrade

To skip a command's validations, pass in --skip-validation-all. For example, to skip all validations for gkectl prepare:

gkectl prepare --config [PATH_TO_CONFIG] --skip-validation-all

To see all available flags for skipping specific validations:

gkectl check-config --help

Running gkectl prepare

Before you install, you need to run gkectl prepare on your admin workstation to initialize your vSphere environment. The gkectl prepare command performs the following tasks:

  • Import the node OS image to vSphere and mark it as a template.

  • If you are using a private Docker registry, push GKE On-Prem images to your registry.

  • Optionally, validate the container images' build attestations, thereby verifying the images were built and signed by Google and are ready for deployment.

Run gkectl prepare with the GKE On-Prem configuration file, where --validate-attestations is optional:

gkectl prepare --config [CONFIG_FILE] --validate-attestations

Positive output from --validate-attestations is Image [IMAGE_NAME] validated.

Installing GKE On-Prem

You've created a configuration file that specifies how your environment looks and how you'd like your clusters to look, and you've validated the file. You ran gkectl prepare to initialize your environment with the GKE On-Prem software. Now you're ready to initiate a fresh installation of GKE On-Prem.

To install GKE On-Prem, run the following command:

gkectl create cluster --config [CONFIG_FILE]

where [CONFIG_FILE] is the configuration file you generated and modified.

You can reuse the configuration file to create additional user clusters.

Connecting clusters to Google

  • When you create a user cluster, it is automatically registered with Google Cloud. You can view a registered GKE On-Prem cluster in Cloud Console's Kubernetes clusters menu. From there, you can sign into the cluster to view its workloads.

  • If you don't see your cluster in Cloud Console within one hour of creating it, refer to Connect troubleshooting.

Enabling ingress

After your user cluster is running, you must enable ingress by creating a Gateway object. The first part of the Gateway manifest is always this:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: istio-autogenerated-k8s-ingress
  namespace: gke-system
spec:
  selector:
    istio: ingress-gke-system

You can tailor the rest of the manifest according to your needs. For example, this manifest says that clients can send requests on port 80 using the HTTP/2 protocol and any hostname:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: istio-autogenerated-k8s-ingress
  namespace: gke-system
spec:
  selector:
    istio: ingress-gke-system
  servers:
  - port:
      number: 80
      protocol: HTTP2
      name: http
    hosts:
    - "*"

If you want to accept HTTPS requests, then you must provide one or more certificates that your ingress controller can present to clients.

To provide a certificate:

  1. Create a Secret that holds your certificate and key.
  2. Create a Gateway object, or modify an existing Gateway object, that refers to your Secret. The name of the Gateway object must be istio-autogenerated-k8s-ingress.

For example, suppose you have already created a certificate file, ingress-wildcard.crt, and a key file ingress-wildcard.key.

Create a Secret named ingressgateway-wildcard-certs:

kubectl create secret tls \
    --namespace gke-system \
    ingressgateway-wildcard-certs \
    --cert ./ingress-wildcard.crt \
    --key ./ingress-wildcard.key

Here's a manifest for a Gateway that refers to your Secret. Clients can call on port 443 using the HTTPS protocol and any hostname that matches *.example.com. Note that the hostname in the certificate must match the hostname in the manifest, *.example.com in this example:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: istio-autogenerated-k8s-ingress
  namespace: gke-system
spec:
  selector:
    istio: ingress-gke-system
  servers:
  - port:
      number: 80
      protocol: HTTP2
      name: http
    hosts:
    - "*"
  - hosts:
    - "*.example.com"
    port:
      name: https-demo-wildcard
      number: 443
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: ingressgateway-wildcard-certs

You can create multiple TLS certs for different hosts by modifying your Gateway manifest.

Save your manifest to a file named my-gateway.yaml, and create the Gateway:

kubectl apply -f my-gateway.yaml

Now you can use Kubernetes Ingress objects in the standard way.

Troubleshooting

For more information, refer to Troubleshooting.

Diagnosing cluster issues using gkectl

Use gkectl diagnose commands to identify cluster issues and share cluster information with Google. See Diagnosing cluster issues.

Default logging behavior

For gkectl and gkeadm it is sufficient to use the default logging settings:

  • By default, log entries are saved as follows:

    • For gkectl, the default log file is /home/ubuntu/.config/gke-on-prem/logs/gkectl-$(date).log, and the file is symlinked with the logs/gkectl-$(date).log file in the local directory where you run gkectl.
    • For gkeadm, the default log file is logs/gkeadm-$(date).log in the local directory where you run gkeadm.
  • All log entries are saved in the log file, even if they are not printed in the terminal (when --alsologtostderr is false).
  • The -v5 verbosity level (default) covers all the log entries needed by the support team.
  • The log file also contains the command executed and the failure message.

We recommend that you send the log file to the support team when you need help.
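
Log file names include the date, so the most recent file is usually the one to send. The following Python sketch picks the most recently modified log in a logs directory. It assumes the default logs/ location relative to where the tool was run and a [tool]-*.log naming pattern based on the defaults described above; the function name newest_log is hypothetical.

```python
from pathlib import Path


def newest_log(log_dir="logs", tool="gkectl"):
    """Return the most recently modified [tool]-*.log file in log_dir, or None."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return None
    candidates = list(directory.glob(f"{tool}-*.log"))
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)


# Look in ./logs relative to the directory where gkectl was run.
print(newest_log())
```

Pass tool="gkeadm" to look for gkeadm logs instead.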

Specifying a non-default location for the log file

To specify a non-default location for the gkectl log file, use the --log_file flag. The log file that you specify will not be symlinked with the local directory.

To specify a non-default location for the gkeadm log file, use the --log_file flag.

Locating Cluster API logs in the admin cluster

If a VM fails to start after the admin control plane has started, you can try debugging this by inspecting the Cluster API controllers' logs in the admin cluster:

  1. Find the name of the Cluster API controllers Pod in the kube-system namespace, where [ADMIN_CLUSTER_KUBECONFIG] is the path to the admin cluster's kubeconfig file:

    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system get pods | grep clusterapi-controllers
  2. Open the Pod's logs, where [POD_NAME] is the name of the Pod. Optionally, use grep or a similar tool to search for errors:

    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system logs [POD_NAME] vsphere-controller-manager

Debugging F5 BIG-IP issues using the admin cluster control plane node's kubeconfig

After an installation, GKE On-Prem generates a kubeconfig file in the home directory of your admin workstation named internal-cluster-kubeconfig-debug. This kubeconfig file is identical to your admin cluster's kubeconfig, except that it points directly at the admin cluster's control plane node, where the admin control plane runs. You can use the internal-cluster-kubeconfig-debug file to debug F5 BIG-IP issues.

gkectl check-config validation fails: can't find F5 BIG-IP partitions

Symptoms

Validation fails because F5 BIG-IP partitions can't be found, even though they exist.

Potential causes

An issue with the F5 BIG-IP API can cause validation to fail.

Resolution

Try running gkectl check-config again.

gkectl prepare --validate-attestations fails: could not validate build attestation

Symptoms

Running gkectl prepare with the optional --validate-attestations flag returns the following error:

could not validate build attestation for gcr.io/gke-on-prem-release/.../...: VIOLATES_POLICY

Potential causes

An attestation might not exist for the affected image(s).

Resolution

Try downloading and deploying the admin workstation OVA again, as instructed in Creating an admin workstation. If the issue persists, reach out to Google for assistance.

Debugging using the bootstrap cluster's logs

During installation, GKE On-Prem creates a temporary bootstrap cluster. After a successful installation, GKE On-Prem deletes the bootstrap cluster, leaving you with your admin cluster and user cluster. Generally, you should have no reason to interact with this cluster.

If something goes wrong during an installation, and you passed --cleanup-external-cluster=false to gkectl create cluster, you might find it useful to debug using the bootstrap cluster's logs. You can find the Pod, and then get its logs:

kubectl --kubeconfig /home/ubuntu/.kube/kind-config-gkectl get pods -n kube-system
kubectl --kubeconfig /home/ubuntu/.kube/kind-config-gkectl -n kube-system logs [POD_NAME]
