Provision managed Cloud Service Mesh with asmcli
Overview
Managed Cloud Service Mesh with asmcli is a managed control plane and a managed data plane that you simply configure. Google handles their reliability, upgrades, scaling, and security for you in a backward-compatible manner. This guide explains how to set up or migrate applications to managed Cloud Service Mesh in a single or multi-cluster configuration with asmcli.
To learn about the supported features and limitations of managed Cloud Service Mesh, see Managed Cloud Service Mesh supported features.
Prerequisites
As a starting point, this guide assumes that you have:
- A Cloud project
- A Cloud Billing account
- Obtained the required permissions to provision Cloud Service Mesh
- The asmcli installation tool, kpt, and other tools specified in Install the required tools
For faster provisioning, enable Workload Identity on your clusters beforehand. If Workload Identity isn't enabled, provisioning enables it automatically.
Requirements
- One or more clusters with a supported version of GKE, in one of the supported regions.
Note that managed Cloud Service Mesh uses GKE release channels to balance stability against upgrade speed. New changes to Cloud Service Mesh in-cluster components (including CNI, MDPC, proxies, and Istio CRDs) roll out first to clusters subscribed to the GKE rapid channel. Once they demonstrate enough stability, they are promoted to the GKE regular channel, and finally to the GKE stable channel.
- Managed Cloud Service Mesh doesn't support changing GKE release channels safely.
- If you do change the GKE release channel, Cloud Service Mesh automatically upgrades/downgrades the in-cluster components (CNI, MDPC, default injected proxy version and Istio CRDs) to align with the current GKE release channel.
Ensure that your cluster has enough capacity for the required components that managed Cloud Service Mesh installs in the cluster.
- The mdp-controller deployment in the kube-system namespace requests cpu: 50m, memory: 128Mi.
- The istio-cni-node daemonset in the kube-system namespace requests cpu: 100m, memory: 100Mi on each node.
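As a rough capacity check, the requests quoted above can be added up for a given node count. The following is a minimal sketch; the function name mesh_overhead is hypothetical, and it only counts the two components listed in this section, not the injected sidecar proxies.

```shell
#!/bin/sh
# Estimate the total resource requests of the managed mesh components
# for a cluster with the given number of nodes.
# Figures come from this guide: mdp-controller (50m CPU, 128Mi) runs once,
# istio-cni-node (100m CPU, 100Mi) runs on every node.
mesh_overhead() {
  nodes="$1"
  cpu_m=$((50 + 100 * nodes))
  mem_mi=$((128 + 100 * nodes))
  echo "cpu=${cpu_m}m memory=${mem_mi}Mi"
}
```

For example, `mesh_overhead 3` reports the combined requests for a three-node cluster.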
Ensure that the organizational policy constraints/compute.disableInternetNetworkEndpointGroup is disabled. If the policy is enabled, ServiceEntry may not work.
Ensure that the client machine that you provision managed Cloud Service Mesh from has network connectivity to the API server.
Your clusters must be registered to a fleet. You can do this step separately before provisioning, or as part of provisioning by passing the --enable-registration and --fleet-id flags.
Your project must have the Service Mesh fleet feature enabled. You can enable it as part of provisioning by passing --enable-gcp-components, or by running the following command:
gcloud container fleet mesh enable --project=FLEET_PROJECT_ID
where FLEET_PROJECT_ID is the project-id of the fleet host project.
GKE Autopilot is only supported with GKE version 1.21.3+.
Cloud Service Mesh can use multiple GKE clusters in a single-project single-network environment or a multi-project single-network environment.
- If you join clusters that are not in the same project, they must be registered to the same fleet host project, and the clusters must be in a shared VPC configuration together on the same network.
- For a single-project multi-cluster environment, the fleet project can be the same as the cluster project. For more information about fleets, see Fleets Overview.
- For a multi-project environment, we recommend that you host the fleet in a separate project from the cluster projects. If your organizational policies and existing configuration allow it, we recommend that you use the shared VPC project as the fleet host project. For more information, see Setting up clusters with Shared VPC.
- If your organization uses VPC Service Controls and you are provisioning Cloud Service Mesh on GKE clusters with a release greater than or equal to 1.22.1-gke.10, then you may need to take additional configuration steps:
  - If you are provisioning Cloud Service Mesh on the regular or stable release channel, then you must use the additional --use-vpcsc flag when applying the managed control plane and follow the VPC Service Controls (preview) guide. Otherwise, the provision will fail security controls.
  - If you are provisioning Cloud Service Mesh on the rapid release channel, then you do not need to use the additional --use-vpcsc flag when applying the managed control plane, but you do need to follow the VPC Service Controls (GA) guide.
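The channel rule for VPC Service Controls can be captured in a small helper. This is only a sketch of the rule as stated in this section: the function name vpcsc_flag is hypothetical, and it assumes the cluster already meets the other conditions (VPC Service Controls in use and GKE 1.22.1-gke.10 or later).

```shell
#!/bin/sh
# Print the extra asmcli flag needed for a given GKE release channel
# when the organization uses VPC Service Controls.
# regular/stable: --use-vpcsc is required; rapid: no extra flag.
vpcsc_flag() {
  case "$1" in
    regular|stable) echo "--use-vpcsc" ;;
    rapid)          echo "" ;;
    *) echo "unknown release channel: $1" >&2; return 1 ;;
  esac
}
```

The output could be appended to the asmcli install arguments; either way, the matching VPC Service Controls guide still has to be followed.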
Roles required to install Cloud Service Mesh
The following table describes the roles that are required to install managed Cloud Service Mesh.
Role name | Role ID | Grant location | Description |
---|---|---|---|
GKE Hub Admin | roles/gkehub.admin | Fleet project | Full access to GKE Hubs and related resources. |
Service Usage Admin | roles/serviceusage.serviceUsageAdmin | Fleet project | Ability to enable, disable, and inspect service states, inspect operations, and consume quota and billing for a consumer project. (Note 1) |
CA Service Admin Beta | roles/privateca.admin | Fleet project | Full access to all CA Service resources. (Note 2) |
Limitations
We recommend that you review the list of Cloud Service Mesh supported features and limitations. In particular, note the following:
The IstioOperator API isn't supported since its main purpose is to control in-cluster components.
For GKE Autopilot clusters, cross-project setup is only supported with GKE 1.23 or later.
For GKE Autopilot clusters, in order to adapt to the GKE Autopilot resource limit, the default proxy resource requests and limits are set to 500m CPU and 512 Mb memory. You can override the default values using custom injection.
During the provisioning process for a managed control plane, Istio CRDs are provisioned in the specified cluster. If there are existing Istio CRDs in the cluster, they will be overwritten.
Istio CNI and Cloud Service Mesh are not compatible with GKE Sandbox. Therefore, managed Cloud Service Mesh with the TRAFFIC_DIRECTOR implementation does not support clusters with GKE Sandbox enabled.
The asmcli tool must have access to the Google Kubernetes Engine (GKE) endpoint. You can configure access through a "jump" server, such as a Compute Engine VM within the Virtual Private Cloud (VPC) giving specific access.
Before you begin
Configure gcloud
Complete the following steps even if you are using Cloud Shell.
Authenticate with the Google Cloud CLI:
gcloud auth login --project PROJECT_ID
where PROJECT_ID is the unique identifier of your cluster project. To look up the project number that corresponds to a project ID, run the following command:
gcloud projects list --filter="<PROJECT ID>" --format="value(PROJECT_NUMBER)"
Update the components:
gcloud components update
Configure kubectl to point to the cluster.
gcloud container clusters get-credentials CLUSTER_NAME \
--zone CLUSTER_LOCATION \
--project PROJECT_ID
Download the installation tool
Download the latest version of the tool to the current working directory:
curl https://storage.googleapis.com/csm-artifacts/asm/asmcli > asmcli
Make the tool executable:
chmod +x asmcli
Configure each cluster
Use the following steps to configure managed Cloud Service Mesh for each cluster in your mesh.
Apply the managed control plane
Before you apply the managed control plane, you must select a release channel. Your Cloud Service Mesh channel is determined by your GKE cluster channel at the time you provision managed Cloud Service Mesh. Note that running multiple channels in the same cluster at the same time is not supported.
Run the installation tool for each cluster that will use managed Cloud Service Mesh. We recommend that you include both of the following options:
- --enable-registration --fleet_id FLEET_PROJECT_ID: These two flags register the cluster to a fleet, where FLEET_PROJECT_ID is the project-id of the fleet host project. In a single-project configuration, FLEET_PROJECT_ID is the same as PROJECT_ID because the fleet host project and the cluster project are the same. In more complex configurations, such as multi-project, we recommend using a separate fleet host project.
- --enable-all: This flag enables both required components and registration.
The asmcli tool configures the managed control plane directly using tools and logic inside of the CLI tool. Use the set of instructions below depending on your preferred CA.
Certificate Authorities
Select a Certificate Authority to use for your mesh.
Mesh CA
Run the following command to install the control plane with default features and Mesh CA. Enter your values in the provided placeholders.
./asmcli install \
-p PROJECT_ID \
-l LOCATION \
-n CLUSTER_NAME \
--fleet_id FLEET_PROJECT_ID \
--managed \
--verbose \
--output_dir DIR_PATH \
--enable-all
CA Service
- Follow the steps in Configure Certificate Authority Service.
- Run the following command to install the control plane with default features and Certificate Authority Service. Enter your values in the provided placeholders.
./asmcli install \
-p PROJECT_ID \
-l LOCATION \
-n CLUSTER_NAME \
--fleet_id FLEET_PROJECT_ID \
--managed \
--verbose \
--output_dir DIR_PATH \
--enable-all \
--ca gcp_cas \
--ca_pool pool_name
The tool downloads all the files for configuring the managed control plane to the specified --output_dir, installing the istioctl tool and sample applications. The steps in this guide assume that you run istioctl from the --output_dir location you specified when running asmcli install, with istioctl present in its <Istio release dir>/bin subdirectory.
If you rerun asmcli on the same cluster, it overwrites the existing control plane configuration. Be sure to specify the same options and flags if you want the same configuration.
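Because a rerun overwrites the existing configuration, it can help to keep the flags for each cluster in one place. The following sketch builds the argument list used in the commands above from environment variables. The function name asmcli_args and the CA_POOL variable are hypothetical; the flags themselves are the ones shown in this guide.

```shell
#!/bin/sh
# Print the asmcli arguments for one cluster, one argument per line.
# Expects PROJECT_ID, LOCATION, CLUSTER_NAME, FLEET_PROJECT_ID and DIR_PATH
# to be set. CA_POOL is optional: when it is set, the CA Service flags are
# appended; otherwise the default (Mesh CA) arguments are printed.
asmcli_args() {
  printf '%s\n' install \
    -p "$PROJECT_ID" \
    -l "$LOCATION" \
    -n "$CLUSTER_NAME" \
    --fleet_id "$FLEET_PROJECT_ID" \
    --managed \
    --verbose \
    --output_dir "$DIR_PATH" \
    --enable-all
  if [ -n "${CA_POOL:-}" ]; then
    printf '%s\n' --ca gcp_cas --ca_pool "$CA_POOL"
  fi
}
```

After reviewing the printed arguments, you could pass them to the tool with `./asmcli $(asmcli_args)`, provided that none of the values contain whitespace.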
Verify the control plane has been provisioned
After a few minutes, verify that the control plane status is ACTIVE:
gcloud container fleet mesh describe --project FLEET_PROJECT_ID
The output is similar to:
membershipStates:
  projects/746296320118/locations/us-central1/memberships/demo-cluster-1:
    servicemesh:
      controlPlaneManagement:
        details:
        - code: REVISION_READY
          details: 'Ready: asm-managed'
        state: ACTIVE
      ...
    state:
      code: OK
      description: 'Revision(s) ready for use: asm-managed.'
If the status does not reach ACTIVE in a few minutes, then refer to Check the managed control plane status for more information about possible errors.
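The check can also be scripted. The sketch below polls the describe command and reads the state line under controlPlaneManagement from the YAML output. The function names and the POLL_SECONDS variable are hypothetical, and the parsing assumes the output layout shown in this guide; with several memberships it only reports the first one.

```shell
#!/bin/sh
# Read `gcloud container fleet mesh describe` output on stdin and print the
# first state found under a controlPlaneManagement block.
control_plane_state() {
  awk '
    /controlPlaneManagement:/ { in_cp = 1; next }
    in_cp && $1 == "state:"   { print $2; exit }
  '
}

# Poll until the control plane reports ACTIVE or the attempts run out.
# Usage: wait_for_control_plane FLEET_PROJECT_ID [ATTEMPTS]
wait_for_control_plane() {
  project="$1"
  attempts="${2:-20}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    state="$(gcloud container fleet mesh describe --project "$project" | control_plane_state)"
    if [ "$state" = "ACTIVE" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "${POLL_SECONDS:-30}"
  done
  echo "control plane not ACTIVE after $attempts attempts (last state: ${state:-unknown})" >&2
  return 1
}
```

A non-zero exit status is the cue to consult the troubleshooting guidance referenced above.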
Zero-touch upgrades
Once the managed control plane is installed, Google will automatically upgrade it when new releases or patches become available.
Managed data plane
If you use managed Cloud Service Mesh, Google fully manages upgrades of your proxies.
With the managed data plane feature enabled, the sidecar proxies and injected gateways are actively and automatically updated in conjunction with the managed control plane by restarting workloads to re-inject new versions of the proxy. This starts after the control plane has been upgraded and normally completes within 2 weeks after starting.
Note that the managed data plane relies on the GKE release channel. If you change the GKE release channel while the managed data plane is enabled, managed Cloud Service Mesh updates the proxies of all existing workloads, as it would in a managed data plane rollout.
If the managed data plane is disabled, proxy management is passive: proxies are updated only through the natural lifecycle of the pods in the cluster, and you must trigger restarts manually to control the update rate.
The managed data plane upgrades proxies by evicting pods that are running earlier versions of the proxy. The evictions are done gradually, honoring the pod disruption budget and controlling the rate of change.
The managed data plane doesn't manage the following:
- Uninjected pods
- Manually injected pods
- Jobs
- StatefulSets
- DaemonSets
If you have provisioned managed Cloud Service Mesh on an older cluster, you can enable data plane management for the entire cluster:
kubectl annotate --overwrite controlplanerevision -n istio-system \
REVISION_LABEL \
mesh.cloud.google.com/proxy='{"managed":"true"}'
Alternatively, you can enable the managed data plane selectively for a specific control plane revision, namespace, or pod by annotating it with the same annotation. If you control individual components selectively, then the order of precedence is control plane revision, then namespace, then pod.
It could take up to ten minutes for the service to be ready to manage the proxies in the cluster. Run the following command to check the status:
gcloud container fleet mesh describe --project FLEET_PROJECT_ID
Expected output
membershipStates:
  projects/PROJECT_NUMBER/locations/global/memberships/CLUSTER_NAME:
    servicemesh:
      dataPlaneManagement:
        details:
        - code: OK
          details: Service is running.
        state: ACTIVE
    state:
      code: OK
      description: 'Revision(s) ready for use: asm-managed-rapid.'
If the service does not become ready within ten minutes, see Managed data plane status for next steps.
Disable the managed data plane (optional)
If you are provisioning managed Cloud Service Mesh on a new cluster, then you can disable the managed data plane completely, or for individual namespaces or pods. The managed data plane will continue to be disabled for existing clusters where it was disabled by default or manually.
To disable the managed data plane at the cluster level and revert to managing the sidecar proxies yourself, change the annotation on the control plane revision:
kubectl annotate --overwrite controlplanerevision -n istio-system \
REVISION_LABEL \
mesh.cloud.google.com/proxy='{"managed":"false"}'
To disable the managed data plane for a namespace:
kubectl annotate --overwrite namespace NAMESPACE \
mesh.cloud.google.com/proxy='{"managed":"false"}'
To disable the managed data plane for a pod:
kubectl annotate --overwrite pod POD_NAME \
mesh.cloud.google.com/proxy='{"managed":"false"}'
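The enable and disable commands differ only in the target and the annotation value, so they can be wrapped in one helper. This is a sketch: the function name set_proxy_management and its scope keywords are hypothetical, while the kubectl invocations mirror the commands in this section (the pod form, as above, acts on the current namespace).

```shell
#!/bin/sh
# Set the managed data plane annotation on a control plane revision,
# a namespace, or a pod.
# Usage: set_proxy_management revision|namespace|pod NAME true|false
set_proxy_management() {
  scope="$1"
  name="$2"
  managed="$3"
  case "$managed" in
    true|false) ;;
    *) echo "managed must be true or false" >&2; return 2 ;;
  esac
  annotation="mesh.cloud.google.com/proxy={\"managed\":\"${managed}\"}"
  case "$scope" in
    revision)
      kubectl annotate --overwrite controlplanerevision -n istio-system "$name" "$annotation" ;;
    namespace)
      kubectl annotate --overwrite namespace "$name" "$annotation" ;;
    pod)
      kubectl annotate --overwrite pod "$name" "$annotation" ;;
    *)
      echo "unknown scope: $scope" >&2; return 2 ;;
  esac
}
```

For example, `set_proxy_management namespace NAMESPACE false` corresponds to the namespace-level command above.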
Enable maintenance notifications
You can request to be notified about upcoming managed data plane maintenance up to a week before maintenance is scheduled. Maintenance notifications are not sent by default. You must also configure a GKE maintenance window before you can receive notifications. When enabled, notifications are sent at least two days before the upgrade operation.
To opt in to managed data plane maintenance notifications:
Go to the Communication page.
In the Cloud Service Mesh Upgrade row, under the Email column, select the radio button to turn maintenance notifications ON.
Each user who wants to receive notifications must opt in separately. If you want to set an email filter for these notifications, the subject line is:
Upcoming upgrade for your Cloud Service Mesh cluster "CLUSTER_LOCATION/CLUSTER_NAME".
The following example shows a typical managed data plane maintenance notification:
Subject Line: Upcoming upgrade for your Cloud Service Mesh cluster "<location/cluster-name>"
Dear Cloud Service Mesh user,
The Cloud Service Mesh components in your cluster ${instance_id} (https://console.cloud.google.com/kubernetes/clusters/details/${instance_id}/details?project=${project_id}) are scheduled to upgrade on ${scheduled_date_human_readable} at ${scheduled_time_human_readable}.
You can check the release notes (https://cloud.google.com/service-mesh/docs/release-notes) to learn about the new update.
In the event that this maintenance gets canceled, you'll receive another email.
Sincerely,
The Cloud Service Mesh Team
(c) 2023 Google LLC 1600 Amphitheater Parkway, Mountain View, CA 94043 You have received this announcement to update you about important changes to Google Cloud Platform or your account. You can opt out of maintenance window notifications by editing your user preferences: https://console.cloud.google.com/user-preferences/communication?project=${project_id}
Configure endpoint discovery (only for multi-cluster installations)
If your mesh has only one cluster, skip these multi-cluster steps and proceed to Deploy applications or Migrate applications.
Before you continue, ensure that Cloud Service Mesh is configured on each cluster.
Public clusters
Configure endpoint discovery between public clusters
If you are operating on public clusters (non-private clusters), you can either Configure endpoint discovery between public clusters or more simply Enable endpoint discovery between public clusters.
Private clusters
Configure endpoint discovery between private clusters
When using GKE private clusters, you must configure the cluster control plane endpoint to be the public endpoint instead of the private endpoint. Please refer to Configure endpoint discovery between private clusters.
For an example application with two clusters, see HelloWorld service example.
Deploy applications
Enable the namespace for injection. The steps depend on your control plane implementation.
Managed (TD)
- Apply the default injection label to the namespace:
kubectl label namespace NAMESPACE \
istio.io/rev- istio-injection=enabled --overwrite
Managed (Istiod)
Recommended: Run the following command to apply the default injection label to the namespace:
kubectl label namespace NAMESPACE \
istio.io/rev- istio-injection=enabled --overwrite
If you are an existing user with the Managed Istiod control plane: We recommend that you use default injection, but revision-based injection is supported. Use the following instructions:
Run the following command to locate the available release channels:
kubectl -n istio-system get controlplanerevision
The output is similar to the following:
NAME                AGE
asm-managed-rapid   6d7h
NOTE: If two control plane revisions appear in the list above, remove one. Having multiple control plane channels in the cluster is not supported.
In the output, the value under the NAME column is the revision label that corresponds to the available release channel for the Cloud Service Mesh version.
Apply the revision label to the namespace:
kubectl label namespace NAMESPACE \
istio-injection- istio.io/rev=REVISION_LABEL --overwrite
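Since only one control plane channel per cluster is supported, a quick guard before labeling namespaces can catch a stray revision. The sketch below counts the rows returned by the listing command used above; the function names are hypothetical, and --no-headers is the standard kubectl option for omitting the header row.

```shell
#!/bin/sh
# Count the ControlPlaneRevision resources in istio-system.
count_control_plane_revisions() {
  kubectl -n istio-system get controlplanerevision --no-headers 2>/dev/null |
    awk 'NF { n++ } END { print n + 0 }'
}

# Fail when more than one revision exists, because multiple control plane
# channels in the same cluster are not supported.
check_single_revision() {
  count="$(count_control_plane_revisions)"
  if [ "$count" -gt 1 ]; then
    echo "found $count control plane revisions; only one channel per cluster is supported" >&2
    return 1
  fi
}
```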
At this point, you have successfully configured managed Cloud Service Mesh. If you have any existing workloads in labeled namespaces, then restart them so they get proxies injected.
If you deploy an application in a multi-cluster setup, replicate the Kubernetes and control plane configuration in all clusters, unless you plan to limit that particular config to a subset of clusters. The configuration applied to a particular cluster is the source of truth for that cluster.
Customize injection (optional)
You can override default values and customize injection settings but this can lead to unforeseen configuration errors and resulting issues with sidecar containers. Before you customize injection, read the information after the sample for notes on particular settings and recommendations.
Per-pod configuration is available to override these options on individual pods. This is done by adding an istio-proxy container to your pod. The sidecar injection will treat any configuration defined here as an override to the default injection template.
For example, the following configuration customizes a variety of settings, including lowering the CPU requests, adding a volume mount, and adding a preStop hook:
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: hello
    image: alpine
  - name: istio-proxy
    image: auto
    resources:
      requests:
        cpu: "200m"
        memory: "256Mi"
      limits:
        cpu: "200m"
        memory: "256Mi"
    volumeMounts:
    - mountPath: /etc/certs
      name: certs
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "10"]
  volumes:
  - name: certs
    secret:
      secretName: istio-certs
In general, any field in a pod can be set. However, care must be taken for certain fields:
- Kubernetes requires the image field to be set before the injection has run. While you can set a specific image to override the default one, we recommend that you set the image to auto, which causes the sidecar injector to automatically select the image to use.
- Some fields in containers are dependent on related settings. For example, the CPU request must be less than or equal to the CPU limit. If both fields are not properly configured, the pod may fail to start.
- Kubernetes lets you set both requests and limits for resources in your Pod spec. GKE Autopilot only considers requests. For more information, see Setting resource limits in Autopilot.
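The request-versus-limit rule can be checked before a manifest is applied. The following sketch compares two Kubernetes CPU quantities; the function names are hypothetical, and it only understands plain core counts (such as 1 or 0.5) and millicore values (such as 200m), not every quantity format Kubernetes accepts.

```shell
#!/bin/sh
# Convert a CPU quantity such as "200m" or "0.5" to whole millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%.0f\n", v * 1000 }' ;;
  esac
}

# Succeed when the CPU request does not exceed the CPU limit.
# Usage: request_within_limit REQUEST LIMIT
request_within_limit() {
  [ "$(cpu_to_millicores "$1")" -le "$(cpu_to_millicores "$2")" ]
}
```

With the values from the example Pod, `request_within_limit 200m 200m` succeeds because the request equals the limit.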
Additionally, certain fields are configurable by annotations on the Pod, although it is recommended to use the above approach to customizing settings. Take additional care for the following annotations:
- For GKE Standard, if sidecar.istio.io/proxyCPU is set, make sure to explicitly set sidecar.istio.io/proxyCPULimit. Otherwise the sidecar's CPU limit will be set as unlimited.
- For GKE Standard, if sidecar.istio.io/proxyMemory is set, make sure to explicitly set sidecar.istio.io/proxyMemoryLimit. Otherwise the sidecar's memory limit will be set as unlimited.
- For GKE Autopilot, configuring resource requests and limits using annotations might overprovision resources. Use the image template approach to avoid this. See Resource modification examples in Autopilot.
For example, see the below resources annotation:
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/proxyCPU: "200m"
        sidecar.istio.io/proxyCPULimit: "200m"
        sidecar.istio.io/proxyMemory: "256Mi"
        sidecar.istio.io/proxyMemoryLimit: "256Mi"
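For GKE Standard, a simple text check can flag manifests that set a proxy request annotation without the matching limit annotation. This is a sketch with a hypothetical function name; it scans the manifest as plain text rather than parsing YAML, so it assumes the annotations appear in the usual "key: value" form.

```shell
#!/bin/sh
# Read a manifest on stdin and report any sidecar resource annotation that
# is set without its matching limit annotation.
# Returns 1 when at least one limit annotation is missing.
check_sidecar_limits() {
  manifest="$(cat)"
  status=0
  for resource in CPU Memory; do
    if printf '%s\n' "$manifest" | grep -q "sidecar.istio.io/proxy${resource}:"; then
      if ! printf '%s\n' "$manifest" | grep -q "sidecar.istio.io/proxy${resource}Limit:"; then
        echo "proxy${resource} is set without proxy${resource}Limit"
        status=1
      fi
    fi
  done
  return "$status"
}
```

You could run it as `check_sidecar_limits < deployment.yaml`, where deployment.yaml stands in for your own manifest file.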
Verify control plane metrics
You can view the version of the control plane and data plane in Metrics Explorer.
To verify that your configuration works as expected:
In the Google Cloud console, view the control plane metrics:
Choose your workspace and add a custom query using the following parameters:
- Resource type: Kubernetes Container
- Metric: Proxy Clients
- Filter: container_name="cr-REVISION_LABEL"
- Group By: revision label and proxy_version label
- Aggregator: sum
- Period: 1 minute
When you run Cloud Service Mesh with both a Google-managed and an in-cluster control plane, you can tell the metrics apart by their container name. For example, managed metrics have container_name="cr-asm-managed", while unmanaged metrics have container_name="discovery". To display metrics from both, remove the Filter on container_name="cr-asm-managed".
Verify the control plane version and proxy version by inspecting the following fields in Metrics Explorer:
- The revision field indicates the control plane version.
- The proxy_version field indicates the proxy version.
- The value field indicates the number of connected proxies.
For the current channel to Cloud Service Mesh version mapping, see Cloud Service Mesh versions per channel.
Migrate applications to managed Cloud Service Mesh
Prepare for migration
To prepare to migrate applications from in-cluster Cloud Service Mesh to managed Cloud Service Mesh, perform the following steps:
Run the tool as indicated in the Apply the managed control plane section.
(Optional) If you want to use the Google-managed data plane, enable data plane management:
kubectl annotate --overwrite controlplanerevision -n istio-system \
REVISION_TAG \
mesh.cloud.google.com/proxy='{"managed":"true"}'
Migrate applications
To migrate applications from in-cluster Cloud Service Mesh to managed Cloud Service Mesh, perform the following steps:
- Replace the current namespace label. The steps depend on your control plane implementation.
Managed (TD)
- Apply the default injection label to the namespace:
kubectl label namespace NAMESPACE \
istio.io/rev- istio-injection=enabled --overwrite
Managed (Istiod)
Recommended: Run the following command to apply the default injection label to the namespace:
kubectl label namespace NAMESPACE \
istio.io/rev- istio-injection=enabled --overwrite
If you are an existing user with the Managed Istiod control plane: We recommend that you use default injection, but revision-based injection is supported. Use the following instructions:
Run the following command to locate the available release channels:
kubectl -n istio-system get controlplanerevision
The output is similar to the following:
NAME                AGE
asm-managed-rapid   6d7h
NOTE: If two control plane revisions appear in the list above, remove one. Having multiple control plane channels in the cluster is not supported.
In the output, the value under the NAME column is the revision label that corresponds to the available release channel for the Cloud Service Mesh version.
Apply the revision label to the namespace:
kubectl label namespace NAMESPACE \
istio-injection- istio.io/rev=REVISION_LABEL --overwrite
Perform a rolling upgrade of deployments in the namespace:
kubectl rollout restart deployment -n NAMESPACE
Test your application to verify that the workloads function correctly.
If you have workloads in other namespaces, repeat the previous steps for each namespace.
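Repeating the relabel and restart steps across namespaces can be scripted. The sketch below applies the default injection label and restarts the deployments for each namespace given, stopping at the first failure. The function names are hypothetical, and in practice you would test each application before moving on to the next namespace rather than migrating them all in one pass.

```shell
#!/bin/sh
# Switch one namespace to default injection and restart its deployments.
migrate_namespace() {
  target_ns="$1"
  kubectl label namespace "$target_ns" istio.io/rev- istio-injection=enabled --overwrite &&
    kubectl rollout restart deployment -n "$target_ns"
}

# Migrate several namespaces in order, stopping at the first failure.
# Usage: migrate_namespaces NAMESPACE [NAMESPACE...]
migrate_namespaces() {
  for namespace in "$@"; do
    migrate_namespace "$namespace" || {
      echo "stopped at namespace: $namespace" >&2
      return 1
    }
  done
}
```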
If you deployed the application in a multi-cluster setup, replicate the Kubernetes and Istio configuration in all clusters, unless you plan to limit that configuration to a subset of clusters. The configuration applied to a particular cluster is the source of truth for that cluster.
If you are satisfied that your application works as expected, you can remove the in-cluster istiod after you switch all namespaces to the managed control plane, or keep it as a backup; istiod automatically scales down to use fewer resources. To remove it, skip to Delete old control plane.
If you encounter problems, you can identify and resolve them by using the information in Resolving managed control plane issues and if necessary, roll back to the previous version.
Delete old control plane
After you install and confirm that all namespaces use the Google-managed control plane, you can delete the old control plane.
kubectl delete Service,Deployment,HorizontalPodAutoscaler,PodDisruptionBudget istiod -n istio-system --ignore-not-found=true
If you used istioctl kube-inject instead of automatic injection, or if you installed additional gateways, check the metrics for the control plane, and verify that the number of connected endpoints is zero.
Roll back
Perform the following steps if you need to roll back to the previous control plane version:
Update workloads to be injected with the previous version of the control plane. In the following command, the revision value asm-191-1 is used only as an example. Replace the example value with the revision label of your previous control plane.
kubectl label namespace NAMESPACE istio-injection- istio.io/rev=asm-191-1 --overwrite
Restart the Pods to trigger re-injection so the proxies have the previous version:
kubectl rollout restart deployment -n NAMESPACE
The managed control plane automatically scales to zero and uses no resources when it is not in use. The mutating webhooks and provisioning remain in place and do not affect cluster behavior.
The gateway is now set to the asm-managed revision. To roll back, re-run the Cloud Service Mesh install command, which redeploys the gateway pointing back to your in-cluster control plane:
kubectl -n istio-system rollout undo deploy istio-ingressgateway
Expect this output on success:
deployment.apps/istio-ingressgateway rolled back
Uninstall
The managed control plane automatically scales to zero when no namespaces are using it. For detailed steps, see Uninstall Cloud Service Mesh.
Troubleshooting
To identify and resolve problems when using managed control plane, see Resolving managed control plane issues.
What's next?
- Learn about release channels.
- Migrate from IstioOperator.
- Migrate a gateway to managed control plane.
- Learn how to Enable optional managed Cloud Service Mesh features, such as: