Learn how to migrate Knative serving on VMware to use fleets so that you can upgrade to Anthos Version 1.8.
Knative serving is now a separate experience from the managed Cloud Run product and is provided as a fleet component in your clusters. Installing the Knative serving on VMware feature as a component of your fleet lets you manage and upgrade your installation independently of other fleet components.
At a high level, to migrate your Knative serving on VMware installation to use a fleet, you must:
- Configure your Knative serving on VMware installation to meet the fleet requirements.
- Enable the Knative serving feature component in your fleet.
Note that the Kubernetes API server is not impacted during this migration.
For details about how to perform a new installation of Knative serving on VMware, see Installing Knative serving on VMware.
Before you begin
You must meet the following requirements:
- These steps require that your Knative serving on VMware cluster is registered to a fleet and is visible in the Google Cloud console.
- Your installation of Knative serving on VMware is on a cluster running Anthos Version 1.7 or earlier.
- Istio is no longer supported in Anthos 1.8. Cloud Service Mesh version 1.18 must be installed in your fleet, and your installation of Knative serving must be configured before you upgrade that cluster to Version 1.8. See the Cloud Service Mesh instructions for details about installing on Google Distributed Cloud. Note that Cloud Service Mesh requires that your cluster uses a machine type that has at least four vCPUs, such as e2-standard-4. If you need to change your cluster's machine type, see Migrating workloads to different machine types.
- There are two options for migrating Knative serving to Cloud Service Mesh; you can either:
  - Obtain a new external IP address to which you configure the load balancer.
  - Reuse your existing load balancer IP address.
- Ensure that your command-line environment is configured and up-to-date; a quick way to check these prerequisites from the command line is shown after this list.
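If you want to confirm the fleet registration and your command-line tooling before you start, the following sketch shows one way to do so. It uses only standard gcloud commands; PROJECT_ID is a placeholder for your Google Cloud project ID, and the membership name in the output is whatever was chosen when the cluster was registered.
# List the fleet memberships in the project; your user cluster should appear here.
gcloud container fleet memberships list --project=PROJECT_ID
# Bring the gcloud components up to date.
gcloud components update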
Migrate to fleets
To upgrade Anthos to Version 1.8, you must first perform the following steps to ensure that your existing Knative serving on VMware installation is migrated to the fleet component.
Access your admin cluster
Obtain the path and file name of your admin cluster's kubeconfig file, and then create the ADMIN_KUBECONFIG environment variable:
export ADMIN_KUBECONFIG=[ADMIN_CLUSTER_KUBECONFIG]
Replace [ADMIN_CLUSTER_KUBECONFIG] with the path and file name to the kubeconfig file of your admin cluster.
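Optionally, you can confirm that the variable points to a working kubeconfig before you continue. This is a plain kubectl check, not part of the migration itself:
# Listing the admin cluster's nodes confirms that the kubeconfig path and credentials are valid.
kubectl get nodes --kubeconfig="${ADMIN_KUBECONFIG}"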
Configure each user cluster
Create the following local environment variables for the user cluster:
Create the USER_KUBECONFIG environment variable with the path of your user cluster's kubeconfig file:
export USER_KUBECONFIG=[USER_CLUSTER_KUBECONFIG]
Replace [USER_CLUSTER_KUBECONFIG] with the path and file name to the kubeconfig file of your user cluster.
Create environment variables for the following configurations:
- ID of your Google Cloud project.
- Location of your Google Cloud resources.
- Name of the user cluster.
export PROJECT_ID=$(kubectl get configmaps --namespace knative-serving config-observability --output jsonpath="{.data['metrics\.stackdriver-project-id']}")
export CLUSTER_LOCATION=$(kubectl get configmaps --namespace knative-serving config-observability --output jsonpath="{.data['metrics\.stackdriver-gcp-location']}")
export CLUSTER_NAME=$(kubectl get configmaps --namespace knative-serving config-observability --output jsonpath="{.data['metrics\.stackdriver-cluster-name']}")
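Before continuing, it's a good idea to confirm that all three variables were populated; if any of the jsonpath lookups above returned an empty string, the later steps that depend on these values will fail. A simple check:
# Each line should print a non-empty value.
echo "PROJECT_ID=${PROJECT_ID}"
echo "CLUSTER_LOCATION=${CLUSTER_LOCATION}"
echo "CLUSTER_NAME=${CLUSTER_NAME}"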
Remove the cloudRun configuration from the OnPremUserCluster custom resource of your user cluster:
Verify that cloudRun is set in OnPremUserCluster:
kubectl get onpremusercluster \
    "${CLUSTER_NAME}" \
    --namespace "${CLUSTER_NAME}-gke-onprem-mgmt" \
    --kubeconfig="${ADMIN_KUBECONFIG}" \
    --output=jsonpath="{.spec.cloudRun}"
Result:
{"enabled":true}
Remove cloudRun from OnPremUserCluster:
kubectl patch onpremusercluster \
    "${CLUSTER_NAME}" \
    --namespace "${CLUSTER_NAME}-gke-onprem-mgmt" \
    --kubeconfig="${ADMIN_KUBECONFIG}" \
    --type="merge" \
    --patch '{"spec": {"cloudRun": null}}'
Validate that cloudRun was successfully removed from OnPremUserCluster by running the same get command and verifying that no configuration is returned:
kubectl get onpremusercluster \
    "${CLUSTER_NAME}" \
    --namespace "${CLUSTER_NAME}-gke-onprem-mgmt" \
    --kubeconfig="${ADMIN_KUBECONFIG}" \
    --output=jsonpath="{.spec.cloudRun}"
There should be no output to your terminal.
Update the create-config secret of your user cluster:
Create a local YAML copy of the create-config file:
kubectl get secret create-config \
    --kubeconfig="${ADMIN_KUBECONFIG}" \
    --namespace "${CLUSTER_NAME}" \
    --output=jsonpath={.data.cfg} \
    | base64 -d > "${CLUSTER_NAME}_create_secret.yaml"
Open the ${CLUSTER_NAME}_create_secret.yaml file that you just created in an editor and then remove the cloudrun field from under spec.
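If you prefer not to edit the file by hand, the following sketch removes the field with yq instead. It assumes yq v4 is installed and that the field appears at spec.cloudrun in your copy of the file; confirm the exact path in your editor before running it.
# Delete the cloudrun block under spec, editing the local YAML copy in place.
yq -i 'del(.spec.cloudrun)' "${CLUSTER_NAME}_create_secret.yaml"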
Base64 encode the ${CLUSTER_NAME}_create_secret.yaml file into a .b64 file:
cat "${CLUSTER_NAME}_create_secret.yaml" | base64 -w0 > "${CLUSTER_NAME}_create_secret.b64"
In your editor, open the local .b64 file that you just created and then copy the string from under the data.cfg attribute for use in the next step.
You must ensure that you copy only the contents of the cfg attribute. For example, do not include any newlines (\n).
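One way to avoid copying stray newline characters is to print the encoded string with newlines stripped and copy it from your terminal. This uses only standard shell tools:
# Print the base64 string as a single line; copy the printed value (the final echo only adds a newline for readability).
tr -d '\n' < "${CLUSTER_NAME}_create_secret.b64"; echo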
Run the following command to edit the secret on your user cluster:
kubectl edit secret create-config --kubeconfig="${ADMIN_KUBECONFIG}" \
    --namespace "${CLUSTER_NAME}"
In the editor that opens, replace the data[cfg] field with the string that you copied from the local .b64 file and then save your changes.
Verify that your changes are deployed to your user cluster and that the cloudrun attribute was successfully removed from the create-config secret:
kubectl get secret create-config \
    --kubeconfig="${ADMIN_KUBECONFIG}" \
    --namespace ${CLUSTER_NAME} \
    --output=jsonpath={.data.cfg} \
    | base64 -d
Configure the knative-serving namespace in your user cluster:
Delete the cloudrun-operator operator from the knative-serving namespace:
kubectl delete deployments.apps --kubeconfig=${USER_KUBECONFIG} --namespace knative-serving cloudrun-operator
Patch the config-network configmap in the knative-serving namespace:
kubectl patch configmap --kubeconfig=${USER_KUBECONFIG} --namespace knative-serving config-network --patch '{"metadata": {"annotations":{"knative.dev/example-checksum": null}}}'
Remove the cloudrun.enabled configuration from the user cluster's configuration file user-config.yaml of your Google Distributed Cloud installation.
The following attributes must be deleted from within your user-config.yaml file:
cloudRun:
  enabled: true
When you perform the cluster upgrade to Anthos Version 1.8, this configuration change is deployed.
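To double-check the file before you upgrade, a plain grep is enough. This assumes that user-config.yaml is in your current directory:
# No match means the cloudRun block has been removed; grep's non-zero exit triggers the confirmation message.
grep -n "cloudRun" user-config.yaml || echo "cloudRun not found"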
If you have multiple user clusters, you must repeat all the steps in this "Configure each user cluster" section for each user cluster.
Configure your fleet component
Enable the Knative serving component in your fleet:
gcloud container fleet cloudrun enable --project=$PROJECT_ID
For details and additional options, see the gcloud container fleet cloudrun enable reference.
Optional: Verify that the Knative serving feature component is enabled:
Console
Check whether the Knative serving component is shown as Enabled in the Google Cloud console.
Command line
Check whether the appdevexperience state is ENABLED:
gcloud container fleet features list --project=$PROJECT_ID
For details and additional options, see the gcloud container fleet features list reference.
Result:
NAME               STATE
appdevexperience   ENABLED
Deploy the CloudRun custom resource to install Knative serving on VMware on each of your user clusters. By default, the latest version of Knative serving is deployed.
Run the following kubectl apply command to deploy the default configuration of the CloudRun custom resource:
cat <<EOF | kubectl apply -f -
apiVersion: operator.run.cloud.google.com/v1alpha1
kind: CloudRun
metadata:
  name: cloud-run
spec:
  metricscollector:
    stackdriver:
      projectid: $PROJECT_ID
      gcpzone: $CLUSTER_LOCATION
      clustername: $CLUSTER_NAME
      secretname: "stackdriver-service-account-key"
      secretkey: "key.json"
EOF
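After the apply, you can watch the Knative serving components in the knative-serving namespace while the operator reconciles them. This is a standard kubectl check and may take a few minutes to settle:
# All pods in the knative-serving namespace should eventually report Running or Completed.
kubectl get pods --namespace knative-serving --kubeconfig=${USER_KUBECONFIG}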
Configure Cloud Service Mesh
Configure the Cloud Service Mesh load balancer for each of your user clusters.
You can configure the ingress gateway of Cloud Service Mesh by either configuring a new external IP address or reusing your existing IP address:
Using the new external IP address that you obtained, configure the load balancer by following the steps in the Cloud Service Mesh documentation.
Note that this option ensures that your Knative serving services are restarted without interruption.
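Once the load balancer is configured with the new address, you can confirm that it has been assigned to the ingress gateway. This check assumes the gateway service is named istio-ingressgateway in the istio-system namespace, matching the commands later in this section:
# Prints the external IP address currently assigned to the Cloud Service Mesh ingress gateway.
kubectl get service istio-ingressgateway --namespace istio-system --output jsonpath='{.status.loadBalancer.ingress[0].ip}'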
Alternative: Use the following steps to configure the Cloud Service Mesh load balancer to your existing IP address.
Configure the gateway of your services to Cloud Service Mesh by running the following commands:
export CURRENT_INGRESS_IP=$(kubectl get service --namespace gke-system istio-ingress --output jsonpath='{.spec.loadBalancerIP}')
kubectl patch service --namespace istio-system istio-ingressgateway --patch "{\"spec\":{\"loadBalancerIP\": \"$CURRENT_INGRESS_IP\"}}"
kubectl patch service --namespace gke-system istio-ingress --patch "{\"spec\":{\"loadBalancerIP\": null}}"
Remove the current Istio configuration settings:
kubectl patch configmap --namespace knative-serving config-istio --patch '{"data":{"local-gateway.cluster-local-gateway": null}}'
kubectl patch configmap --namespace knative-serving config-istio --patch '{"data":{"gateway.gke-system-gateway": null}}'
Verify migration
To verify that your Knative serving on VMware installation has been successfully migrated to your fleet, check that the appdevexperience-operator is up and running.
For each user cluster, run the following command:
kubectl get deployment -n appdevexperience appdevexperience-operator
The appdevexperience-operator deployment should show 1/1 as ready, for example:
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
appdevexperience-operator   1/1     1            1           1h
If the operator fails to achieve the ready state, you can view your cluster's workloads page in the Google Cloud console to identify resource issues:
Go to Google Kubernetes Engine workloads
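As an alternative to the console, you can inspect the operator from the command line with standard kubectl; the deployment's conditions and recent events usually show why the pods are not ready:
# Describe the deployment to see its conditions and replica status.
kubectl describe deployment appdevexperience-operator -n appdevexperience
# List recent events in the namespace, oldest first.
kubectl get events -n appdevexperience --sort-by=.metadata.creationTimestamp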
Upgrade your cluster
Now that you have migrated your Knative serving on VMware installation to use the fleet component, you can upgrade your cluster to Anthos Version 1.8. Follow the detailed instructions in Upgrading GKE On-Prem.
Troubleshooting
- Upgrade process of your user cluster fails to complete
The cluster-local-gateway pod in the gke-system namespace might prevent your user cluster from completing the upgrade to Anthos Version 1.8. The cluster-local-gateway pod is no longer needed and can be safely removed.
To assist the upgrade process, you can manually remove the cluster-local-gateway pod by scaling the deployment's replicas down to 0. For example:
Scale down the cluster-local-gateway deployment:
kubectl scale deployment cluster-local-gateway --replicas 0 --namespace gke-system
The cluster-local-gateway pod in the gke-system namespace and all workloads in the knative-serving namespace are removed.
Wait for the upgrade process to complete.
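If you want to confirm the cleanup while you wait, a standard kubectl check run with your user cluster's kubeconfig context active is enough:
# Should print no matching pods once the cluster-local-gateway pods have been removed.
kubectl get pods --namespace gke-system | grep cluster-local-gateway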