This document describes known issues for version 1.6 of Google Distributed Cloud.
ClientConfig custom resource
gkectl update reverts any manual changes that you have made to the ClientConfig custom resource. We strongly recommend that you back up the ClientConfig resource after every manual change.
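One way to make that backup routine is a small helper that exports the resource to a timestamped file. The following is a minimal sketch, assuming the ClientConfig is the resource named default in the kube-public namespace (the name and namespace used by the patch command in the OIDC workaround in this document). The function name backup_clientconfig and the backup directory are hypothetical.

```shell
# Hypothetical helper: save the ClientConfig to a timestamped YAML file
# and print the path of the file that was written.
# Usage: backup_clientconfig USER_CLUSTER_KUBECONFIG [BACKUP_DIR]
backup_clientconfig() {
  bc_kubeconfig=$1
  bc_dir=${2:-.}
  bc_file="$bc_dir/clientconfig-$(date +%Y%m%d-%H%M%S).yaml"
  mkdir -p "$bc_dir" || return 1
  # Assumption: the resource is "default" in the "kube-public" namespace.
  if kubectl --kubeconfig "$bc_kubeconfig" get clientconfig default \
      -n kube-public -o yaml > "$bc_file"; then
    echo "$bc_file"
  else
    # Do not leave an empty or partial backup behind.
    rm -f "$bc_file"
    return 1
  fi
}
```

Running the helper after each manual edit, and again before any gkectl update, leaves a record of the last known-good resource.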
kubectl describe CSINode and gkectl diagnose snapshot
kubectl describe CSINode and gkectl diagnose snapshot sometimes fail due to the OSS Kubernetes issue on dereferencing nil pointer fields.
OIDC and the CA certificate
The OIDC provider doesn't use the common CA by default. You must explicitly supply the CA certificate.
Upgrading the admin cluster from 1.5 to 1.6.0 breaks 1.5 user clusters that use an OIDC provider and have no value for authentication.oidc.capath in the user cluster configuration file.
To work around this issue, run the following script:

    USER_CLUSTER_KUBECONFIG=YOUR_USER_CLUSTER_KUBECONFIG
    IDENTITY_PROVIDER=YOUR_OIDC_PROVIDER_ADDRESS

    # Save each certificate presented by the provider as tmpcert1.pem, tmpcert2.pem, ...
    openssl s_client -showcerts -verify 5 -connect "$IDENTITY_PROVIDER:443" < /dev/null | awk '/BEGIN CERTIFICATE/,/END CERTIFICATE/{ if(/BEGIN CERTIFICATE/){i++}; out="tmpcert"i".pem"; print >out}'

    # Locate the root CA that issued the last certificate in the chain
    ROOT_CA_ISSUED_CERT=$(ls tmpcert*.pem | tail -1)
    ROOT_CA_CERT="/etc/ssl/certs/$(openssl x509 -in "$ROOT_CA_ISSUED_CERT" -noout -issuer_hash).0"

    # Build the full chain and base64-encode it without whitespace
    cat tmpcert*.pem "$ROOT_CA_CERT" > certchain.pem
    CERT=$(echo $(base64 certchain.pem) | sed 's/ //g')
    rm tmpcert*.pem

    kubectl --kubeconfig "$USER_CLUSTER_KUBECONFIG" patch clientconfig default -n kube-public --type json -p "[{ \"op\": \"replace\", \"path\": \"/spec/authentication/0/oidc/certificateAuthorityData\", \"value\":\"${CERT}\"}]"
Replace the following:
YOUR_OIDC_PROVIDER_ADDRESS: The address of your OIDC provider.
YOUR_USER_CLUSTER_KUBECONFIG: The path of your user cluster kubeconfig file.
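Before patching, it can help to confirm that certchain.pem actually contains certificates and that the encoded value has no whitespace. The following is a sketch of such a check, not part of the official workaround. The helper name encode_cert_chain is hypothetical; its output is intended to match the CERT value that the script computes.

```shell
# Hypothetical helper: print the base64 encoding of a PEM chain on one line,
# failing if the file contains no certificate.
# Usage: encode_cert_chain CHAIN_FILE
encode_cert_chain() {
  ecc_chain=$1
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  ecc_count=$(grep -c 'BEGIN CERTIFICATE' "$ecc_chain" 2>/dev/null || true)
  if [ "${ecc_count:-0}" -lt 1 ]; then
    echo "no certificates found in $ecc_chain" >&2
    return 1
  fi
  # Remove the line wrapping that base64 adds, so the value fits in a JSON patch.
  base64 "$ecc_chain" | tr -d ' \n'
}
```

For example, CERT=$(encode_cert_chain certchain.pem) could stand in for the CERT assignment in the script, with the advantage that an empty chain stops the run before the patch is sent.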
gkectl check-config validation fails: can't find F5 BIG-IP partitions
- Symptoms
Validation fails because F5 BIG-IP partitions can't be found, even though they exist.
- Potential causes
An issue with the F5 BIG-IP API can cause validation to fail.
- Resolution
Try running gkectl check-config again.
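Because the failure is intermittent, the rerun can be automated with a bounded retry. The following is a minimal sketch; the retry helper, the attempt count, and the delay are all hypothetical choices, and the gkectl flags are left to whatever you normally pass.

```shell
# Hypothetical helper: run a command up to MAX times, pausing DELAY seconds
# between failed attempts. Returns 0 on the first success, 1 if every
# attempt fails.
# Usage: retry MAX DELAY COMMAND [ARGS...]
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$retry_n" -ge "$retry_max" ]; then
      echo "command failed after $retry_n attempts: $*" >&2
      return 1
    fi
    echo "attempt $retry_n failed; retrying in ${retry_delay}s" >&2
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Example (append your usual flags after check-config):
#   retry 3 30 gkectl check-config
```

Capping the number of attempts keeps a genuine configuration error from looping forever: if validation still fails after the last attempt, the cause is probably not the transient F5 BIG-IP API issue.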
Disruption for workloads with PodDisruptionBudgets
Upgrading clusters can cause disruption or downtime for workloads that use PodDisruptionBudgets (PDBs).
Nodes fail to complete their upgrade process
If you have PodDisruptionBudget objects configured that are unable to allow any additional disruptions, node upgrades might fail to upgrade to the control plane version after repeated attempts. To prevent this failure, we recommend that you scale up the Deployment or HorizontalPodAutoscaler to allow the node to drain while still respecting the PodDisruptionBudget configuration.
To see all PodDisruptionBudget objects that do not allow any disruptions:

    kubectl get poddisruptionbudget --all-namespaces -o jsonpath='{range .items[?(@.status.disruptionsAllowed==0)]}{.metadata.name}/{.metadata.namespace}{"\n"}{end}'
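Once a blocking PodDisruptionBudget is identified, the recommended scale-up can be scripted. The following is a minimal sketch for a Deployment; the helper name scale_up_deployment is hypothetical, and it assumes that kubectl already targets the right cluster (for example through the KUBECONFIG environment variable).

```shell
# Hypothetical helper: add EXTRA replicas (default 1) to a Deployment so that
# a node can drain without violating the PodDisruptionBudget.
# Usage: scale_up_deployment NAMESPACE NAME [EXTRA]
scale_up_deployment() {
  su_ns=$1
  su_name=$2
  su_extra=${3:-1}
  # Read the currently requested replica count.
  su_current=$(kubectl get deployment "$su_name" -n "$su_ns" \
    -o jsonpath='{.spec.replicas}') || return 1
  su_target=$((su_current + su_extra))
  echo "scaling $su_ns/$su_name from $su_current to $su_target replicas"
  kubectl scale deployment "$su_name" -n "$su_ns" --replicas="$su_target"
}
```

If a HorizontalPodAutoscaler manages the Deployment, raising its minimum replica count is the analogous change, because the autoscaler would otherwise override a manual scale.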
Renewal of certificates might be required before an admin cluster upgrade
Before you begin the admin cluster upgrade process, you should make sure that your admin cluster certificates are currently valid, and renew these certificates if they are not.
Admin cluster certificate renewal process
Make sure that OpenSSL is installed on the admin workstation before you begin.
Set the KUBECONFIG variable:

    KUBECONFIG=ABSOLUTE_PATH_ADMIN_CLUSTER_KUBECONFIG

Replace ABSOLUTE_PATH_ADMIN_CLUSTER_KUBECONFIG with the absolute path to the admin cluster kubeconfig file.
Get the IP address and SSH keys for the admin master node:

    kubectl --kubeconfig "${KUBECONFIG}" get secrets -n kube-system sshkeys \
    -o jsonpath='{.data.vsphere_tmp}' | base64 -d > \
    ~/.ssh/admin-cluster.key && chmod 600 ~/.ssh/admin-cluster.key

    export MASTER_NODE_IP=$(kubectl --kubeconfig "${KUBECONFIG}" get nodes -o \
    jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' \
    --selector='node-role.kubernetes.io/master')

Check if the certificates are expired:

    ssh -i ~/.ssh/admin-cluster.key ubuntu@"${MASTER_NODE_IP}" \
    "sudo kubeadm alpha certs check-expiration"

If the certificates are expired, you must renew them before upgrading the admin cluster.
Because the admin cluster kubeconfig file also expires if the admin certificates expire, you should back up this file before expiration.
Back up the admin cluster kubeconfig file:

    ssh -i ~/.ssh/admin-cluster.key ubuntu@"${MASTER_NODE_IP}" \
    "sudo cat /etc/kubernetes/admin.conf" > new_admin.conf
    vi "${KUBECONFIG}"

Replace the client-certificate-data and client-key-data values in the kubeconfig file with the client-certificate-data and client-key-data values from the new_admin.conf file that you created.
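The manual edit in vi can also be scripted. The following is a sketch under stated assumptions: each file contains exactly one client-certificate-data entry and one client-key-data entry, each on a single line. The helper name copy_client_credentials is hypothetical.

```shell
# Hypothetical helper: copy client-certificate-data and client-key-data
# from SRC into DST. The previous DST is kept as DST.bak.
# Usage: copy_client_credentials SRC DST
copy_client_credentials() {
  ccc_src=$1
  ccc_dst=$2
  ccc_cert=$(awk '$1 == "client-certificate-data:" { print $2; exit }' "$ccc_src")
  ccc_key=$(awk '$1 == "client-key-data:" { print $2; exit }' "$ccc_src")
  if [ -z "$ccc_cert" ] || [ -z "$ccc_key" ]; then
    echo "client credentials not found in $ccc_src" >&2
    return 1
  fi
  cp "$ccc_dst" "$ccc_dst.bak" || return 1
  # Rewrite only the two credential lines; indentation and all other lines
  # are preserved. The values are base64, so they contain no characters
  # that are special in an awk replacement string.
  awk -v cert="$ccc_cert" -v key="$ccc_key" '
    $1 == "client-certificate-data:" { sub(/client-certificate-data:.*/, "client-certificate-data: " cert) }
    $1 == "client-key-data:"         { sub(/client-key-data:.*/, "client-key-data: " key) }
    { print }
  ' "$ccc_dst.bak" > "$ccc_dst"
}

# Example:
#   copy_client_credentials new_admin.conf "${KUBECONFIG}"
```

Keeping the .bak copy means that the original kubeconfig can be restored if the new credentials turn out not to work.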
Back up old certificates:
This is an optional, but recommended, step.

    # ssh into admin master if you didn't in the previous step
    ssh -i ~/.ssh/admin-cluster.key ubuntu@"${MASTER_NODE_IP}"

    # on admin master
    sudo tar -czvf backup.tar.gz /etc/kubernetes
    logout

    # on worker node
    sudo scp -i ~/.ssh/admin-cluster.key \
    ubuntu@"${MASTER_NODE_IP}":/home/ubuntu/backup.tar.gz .

Renew the certificates with kubeadm:

    # ssh into admin master
    ssh -i ~/.ssh/admin-cluster.key ubuntu@"${MASTER_NODE_IP}"

    # on admin master
    sudo kubeadm alpha certs renew all

Restart static Pods running on the admin master node:

    # on admin master
    cd /etc/kubernetes
    sudo mkdir tempdir
    sudo mv manifests/*.yaml tempdir/
    sleep 5
    echo "remove pods"
    # ensure kubelet detects the change and removes those pods:
    # wait until the result of this command is empty
    sudo docker ps | grep kube-apiserver

    # move the manifests back so that kubelet starts those pods again
    echo "start pods again"
    sudo mv tempdir/*.yaml manifests/
    sleep 30
    # ensure kubelet has started those pods again:
    # this command should show some results
    sudo docker ps | grep -e kube-apiserver -e kube-controller-manager -e kube-scheduler -e etcd

    # clean up
    sudo rm -rf tempdir

    logout

Renew the certificates of admin cluster worker nodes
Check the expiration date of the node certificates:

    kubectl get nodes -o wide
    # find the oldest node, fill NODE_IP with the internal ip of that node
    ssh -i ~/.ssh/admin-cluster.key ubuntu@"${NODE_IP}"
    openssl x509 -enddate -noout -in /var/lib/kubelet/pki/kubelet-client-current.pem
    logout

If the certificate is about to expire, renew node certificates by manual node repair.
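To make "about to expire" concrete, the notAfter line printed by openssl x509 -enddate can be converted into a number of remaining days. The following is a minimal sketch, assuming GNU date (as found on Ubuntu), which can parse the date format that openssl prints. The helper name days_until_expiry is hypothetical, and the cutoff you compare the result against is your own choice.

```shell
# Hypothetical helper: print the number of whole days between NOW (epoch
# seconds, default: the current time) and the date in a line printed by
# "openssl x509 -enddate".
# Assumption: GNU date, which parses dates like "Jan  1 00:00:00 2030 GMT".
# Usage: days_until_expiry "notAfter=Jan  1 00:00:00 2030 GMT" [NOW_EPOCH]
days_until_expiry() {
  due_date=${1#notAfter=}
  due_now=${2:-$(date +%s)}
  due_end=$(date -d "$due_date" +%s) || return 1
  echo $(( (due_end - due_now) / 86400 ))
}

# Example, run on the node:
#   days_until_expiry "$(openssl x509 -enddate -noout \
#     -in /var/lib/kubelet/pki/kubelet-client-current.pem)"
```

A result of zero or less means that the certificate has expired or expires within a day.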
You must validate the renewed certificates, and validate the certificate of kube-apiserver.
Check certificates expiration:

    ssh -i ~/.ssh/admin-cluster.key ubuntu@"${MASTER_NODE_IP}" \
    "sudo kubeadm alpha certs check-expiration"

Check certificate of kube-apiserver:

    # Get the IP address of kube-apiserver
    cat $KUBECONFIG | grep server
    # Get the current kube-apiserver certificate
    openssl s_client -showcerts -connect : \
    | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' \
    > current-kube-apiserver.crt
    # check expiration date of this cert
    openssl x509 -in current-kube-apiserver.crt -noout -enddate
    # check nodes are ready
    kubectl --kubeconfig $KUBECONFIG get nodes

Using Google Distributed Cloud with Anthos Service Mesh version 1.7 or later
If you use Google Distributed Cloud with Anthos Service Mesh version 1.7 or later, and you want to upgrade to Google Distributed Cloud version 1.6.0-1.6.3 or Google Distributed Cloud version 1.7.0-1.7.2, you must remove the bundle.gke.io/component-name and bundle.gke.io/component-version labels from the following Custom Resource Definitions (CRDs):
destinationrules.networking.istio.io
envoyfilters.networking.istio.io
serviceentries.networking.istio.io
virtualservices.networking.istio.io
Run this command to update the CRD destinationrules.networking.istio.io in your user cluster:

    kubectl edit crd destinationrules.networking.istio.io --kubeconfig USER_CLUSTER_KUBECONFIG

Remove the bundle.gke.io/component-version and bundle.gke.io/component-name labels from the CRD.
Alternatively, you can upgrade to 1.6.4 or 1.7.3 directly.
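If you remove the labels rather than upgrading directly, the edit can be done non-interactively for all four CRDs. The following is a sketch that uses kubectl label in place of kubectl edit; it is an alternative to the documented interactive step, not the documented step itself, and the helper name remove_bundle_labels is hypothetical.

```shell
# Hypothetical helper: remove the two bundle.gke.io labels from each of the
# four Istio CRDs. A trailing "-" after a label key tells "kubectl label"
# to remove that label.
# Usage: remove_bundle_labels USER_CLUSTER_KUBECONFIG
remove_bundle_labels() {
  rbl_kubeconfig=$1
  for rbl_crd in \
      destinationrules.networking.istio.io \
      envoyfilters.networking.istio.io \
      serviceentries.networking.istio.io \
      virtualservices.networking.istio.io; do
    kubectl --kubeconfig "$rbl_kubeconfig" label crd "$rbl_crd" \
      bundle.gke.io/component-name- \
      bundle.gke.io/component-version- || return 1
  done
}
```

Stopping at the first failure makes it clear which CRD still carries the labels before the upgrade is attempted.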
Upgrading the admin workstation might fail if the data disk is nearly full
If you upgrade the admin workstation with the gkectl upgrade admin-workstation command, the upgrade might fail if the data disk is nearly full, because the system attempts to back up the current admin workstation locally while upgrading to a new admin workstation. If you cannot clear sufficient space on the data disk, use the gkectl upgrade admin-workstation command with the additional flag --backup-to-local=false to prevent making a local backup of the current admin workstation.
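A quick pre-check of disk usage can show in advance whether the local backup is likely to fit. The following is a minimal sketch; the helper names are hypothetical, and both the path of the data disk and the usage threshold are assumptions that you would set for your own admin workstation.

```shell
# Hypothetical helper: print the used percentage (0-100) of the filesystem
# that holds PATH, using the POSIX output format of df.
# Usage: disk_used_percent PATH
disk_used_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical pre-check: succeed only if usage is below THRESHOLD percent.
# Usage: has_backup_headroom PATH THRESHOLD
has_backup_headroom() {
  hbh_used=$(disk_used_percent "$1") || return 1
  [ -n "$hbh_used" ] && [ "$hbh_used" -lt "$2" ]
}
```

If the check fails and space cannot be freed, the --backup-to-local=false flag described above is the fallback.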
Restarting or upgrading vCenter for versions lower than 7.0U2
If vCenter, for versions lower than 7.0U2, is restarted, after an upgrade or otherwise, the network name in the VM information from vCenter is incorrect, which results in the machine being in an Unavailable state. This eventually leads to the nodes being auto-repaired to create new ones.
Related govmomi bug: https://github.com/vmware/govmomi/issues/2552
This workaround is provided by VMware support:
1. The issue is fixed in vCenter versions 7.0U2 and above.
2. For lower versions: Right-click the host, and then select Connection > Disconnect. Next, reconnect, which forces an update of the VM's portgroup.