This guide describes how to set up managed Cloud Service Mesh on a Google Kubernetes Engine (GKE) Autopilot cluster. Cloud Service Mesh is a fully managed service mesh based on Istio.
This tutorial shows you how to configure a production-ready service mesh running on a single GKE Autopilot cluster with default settings. We recommend that you also consult the full Cloud Service Mesh provisioning guide when you design your environment.
Advantages of running managed Cloud Service Mesh with GKE Autopilot
When you use GKE in Autopilot mode, Google automatically sets up and manages your cluster. Autopilot mode streamlines the experience of operating a cluster and lets you focus on your applications. Similarly, managed Cloud Service Mesh is a fully managed service mesh that you can provision in a few steps:
- You provision managed Cloud Service Mesh using the Fleet API, without the need for client-side tools like istioctl.
- Cloud Service Mesh automatically injects sidecar proxies into workloads without needing to grant elevated privileges to your containers.
- You can view rich dashboards for your mesh and services without any extra configuration, and then use these metrics to configure service level objectives (SLOs) and alerts to monitor the health of your applications.
- The managed Cloud Service Mesh control plane is upgraded automatically to ensure that you get the latest security patches and features.
- The Cloud Service Mesh managed data plane automatically upgrades the sidecar proxies in your workloads, so you don't need to restart services yourself when proxy upgrades and security patches are available.
- Cloud Service Mesh is a supported product and can be configured using standard open source Istio APIs. See supported features.
Set up your environment
You can set up your environment using the gcloud CLI or Terraform.
gcloud
Set environment variables:
PROJECT_ID=PROJECT_ID
gcloud config set project ${PROJECT_ID}
Enable the Mesh API:
gcloud services enable mesh.googleapis.com
Enabling mesh.googleapis.com enables the following APIs:
| API | Purpose | Can be disabled |
| --- | --- | --- |
| meshconfig.googleapis.com | Cloud Service Mesh uses the Mesh Configuration API to relay configuration data from your mesh to Google Cloud. Additionally, enabling the Mesh Configuration API allows you to access the Cloud Service Mesh pages in the Google Cloud console and to use the Cloud Service Mesh certificate authority. | No |
| meshca.googleapis.com | Related to the Cloud Service Mesh certificate authority used by managed Cloud Service Mesh. | No |
| container.googleapis.com | Required to create Google Kubernetes Engine (GKE) clusters. | No |
| gkehub.googleapis.com | Required to manage the mesh as a fleet. | No |
| monitoring.googleapis.com | Required to capture telemetry for mesh workloads. | No |
| stackdriver.googleapis.com | Required to use the Services UI. | No |
| opsconfigmonitoring.googleapis.com | Required to use the Services UI for off-Google Cloud clusters. | No |
| connectgateway.googleapis.com | Required so that the managed Cloud Service Mesh control plane can access mesh workloads. | Yes* |
| trafficdirector.googleapis.com | Enables a highly available and scalable managed control plane. | Yes* |
| networkservices.googleapis.com | Enables a highly available and scalable managed control plane. | Yes* |
| networksecurity.googleapis.com | Enables a highly available and scalable managed control plane. | Yes* |
Terraform
gcloud config set project PROJECT_ID
GOOGLE_CLOUD_PROJECT=$(gcloud config get-value project)
export GOOGLE_CLOUD_PROJECT
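The exported GOOGLE_CLOUD_PROJECT variable is one of the environment variables that the Terraform google provider can read to determine the target project. A minimal provider block for the examples that follow might look like this sketch; the region value is an assumption based on the location used later in this guide:
provider "google" {
  # The project is picked up from the GOOGLE_CLOUD_PROJECT environment
  # variable exported above; you can also set it explicitly here.
  region = "us-central1"
}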
Create a GKE cluster
Create a GKE cluster in Autopilot mode.
gcloud
Create a cluster, registered as a member of a Fleet:
gcloud container clusters create-auto asm-cluster \
    --location="us-central1" \
    --enable-fleet
Verify the cluster is registered with the Fleet:
gcloud container fleet memberships list
The output is similar to the following:
NAME: asm-cluster
EXTERNAL_ID:
LOCATION: us-central1
Make note of the membership name, as you need it to configure Cloud Service Mesh.
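If you want to reuse the membership name in later commands, you can optionally capture it in a shell variable; this assumes the cluster you just created is the only membership in the project:
MEMBERSHIP_NAME=$(gcloud container fleet memberships list \
    --format="value(name)")
echo "${MEMBERSHIP_NAME}"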
Terraform
To create a GKE cluster, you can use the google_container_cluster resource. You set the fleet block so that the cluster is added to a fleet when it is created.
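A minimal sketch of what this configuration might look like follows; the cluster name and location are illustrative values matching the gcloud steps above, not required settings:
resource "google_container_cluster" "asm_cluster" {
  # Illustrative name and location, matching the gcloud example above.
  name             = "asm-cluster"
  location         = "us-central1"
  enable_autopilot = true

  # Registers the cluster with the project's fleet when it is created.
  fleet {
    project = "PROJECT_ID"  # Replace with your project ID.
  }
}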
To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.
Provision managed Cloud Service Mesh
You provision managed Cloud Service Mesh using the servicemesh feature on the fleet membership for your cluster.
gcloud
Enable the Cloud Service Mesh fleet feature on the project:
gcloud container fleet mesh enable
Enable automatic management of the mesh:
gcloud container fleet mesh update \
    --management=automatic \
    --memberships=MEMBERSHIP_NAME \
    --location=us-central1
Replace MEMBERSHIP_NAME with the membership name listed when you verified that your cluster is registered to the fleet.
Terraform
To enable the Mesh API, you can use the google_project_service resource. You then use the google_gke_hub_feature and google_gke_hub_feature_membership resources to configure managed Cloud Service Mesh on your cluster.
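A hedged sketch of how these resources might be wired together is shown below; the membership name and location mirror the values used in the gcloud steps and are assumptions, not fixed requirements:
# Enables the Mesh API in the project.
resource "google_project_service" "mesh_api" {
  service = "mesh.googleapis.com"
}

# Enables the Cloud Service Mesh feature on the fleet.
resource "google_gke_hub_feature" "mesh" {
  name     = "servicemesh"
  location = "global"

  depends_on = [google_project_service.mesh_api]
}

# Turns on automatic management for the cluster's fleet membership.
resource "google_gke_hub_feature_membership" "mesh_member" {
  feature             = google_gke_hub_feature.mesh.name
  location            = "global"
  membership          = "asm-cluster"  # Assumed membership name from the cluster above.
  membership_location = "us-central1"  # Assumed membership location.

  mesh {
    management = "MANAGEMENT_AUTOMATIC"
  }
}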
To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.
Verify the control plane is active
Wait until the controlPlaneManagement.state is ACTIVE. This might take up to 15 minutes.
watch -n 30 gcloud container fleet mesh describe
The output is similar to:
membershipSpecs:
  projects/746296320118/locations/us-central1/memberships/asm-cluster:
    mesh:
      management: MANAGEMENT_AUTOMATIC
membershipStates:
  projects/746296320118/locations/us-central1/memberships/asm-cluster:
    servicemesh:
      controlPlaneManagement:
        details:
        - code: REVISION_READY
          details: 'Ready: asm-managed'
        state: ACTIVE
      dataPlaneManagement:
        details:
        - code: PROVISIONING
          details: Service is provisioning.
        state: PROVISIONING
    state:
      code: OK
      description: 'Revision(s) ready for use: asm-managed.'
The dataPlaneManagement
section remains in the PROVISIONING
state until
you deploy the ingress gateway, because Autopilot clusters don't
provision any nodes until you deploy a workload.
Deploy a mesh ingress gateway
In this section, you deploy a mesh ingress gateway to handle incoming traffic for the sample application. An ingress gateway is a load balancer that operates at the edge of the mesh and receives incoming HTTP/TCP connections.
You deploy the gateway to a dedicated namespace and label the deployment to ensure that your gateway can be securely managed and automatically upgraded by the Cloud Service Mesh control plane.
Download credentials so that you can access the cluster:
gcloud container clusters get-credentials asm-cluster --location=us-central1
Create a namespace for the gateway deployment:
kubectl create namespace bank-gateways
Add a label to the namespace so that the Cloud Service Mesh control plane automatically injects the gateway configuration into the deployment:
kubectl label namespace bank-gateways istio-injection=enabled
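To confirm that the label is in place before deploying the gateway, you can list the namespace with its labels:
kubectl get namespace bank-gateways --show-labels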
Deploy the ingress gateway to the namespace:
Helm
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update
helm install --wait --namespace bank-gateways \
    --set resources.requests.cpu=250m \
    --set resources.requests.memory=512Mi \
    --set resources.requests.ephemeral-storage=1Gi \
    --set resources.limits.cpu=250m \
    --set resources.limits.memory=512Mi \
    --set resources.limits.ephemeral-storage=1Gi \
    istio-ingressgateway istio/gateway
kubectl
git clone https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages
kubectl apply -n bank-gateways \
    -f ./anthos-service-mesh-packages/samples/gateways/istio-ingressgateway
kubectl -n bank-gateways wait "deployment/istio-ingressgateway" \
    --for=condition=available --timeout=240s
Ensure that you set adequate resource requests when you deploy to a production environment. GKE Autopilot only considers the resource values set in requests, not those set in limits. The Istio project publishes information on performance and scalability.
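Once the gateway Deployment is available, the dataPlaneManagement state shown earlier should eventually move out of PROVISIONING. You can re-check it with the same command used to verify the control plane:
gcloud container fleet mesh describe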
Deploy the sample application
Create a Kubernetes namespace for the deployment:
kubectl create namespace bank-sample
Add a label to the namespace so that Cloud Service Mesh automatically injects sidecar proxies into the sample Pods:
kubectl label namespace bank-sample istio-injection=enabled
Deploy the sample application:
git clone https://github.com/GoogleCloudPlatform/bank-of-anthos.git
kubectl apply -n bank-sample -f bank-of-anthos/extras/jwt/jwt-secret.yaml
kubectl apply -n bank-sample -f bank-of-anthos/kubernetes-manifests/
Wait for the application to be ready. It will take several minutes.
watch kubectl -n bank-sample get pods
When the application is ready, the output is similar to the following:
NAME                                 READY   STATUS    RESTARTS   AGE
accounts-db-0                        2/2     Running   0          2m16s
balancereader-5c695f78f5-x4wlz       2/2     Running   0          3m8s
contacts-557fc79c5-5d7fg             2/2     Running   0          3m7s
frontend-7dd589c5d7-b4cgq            2/2     Running   0          3m7s
ledger-db-0                          2/2     Running   0          3m6s
ledgerwriter-6497f5cf9b-25c6x        2/2     Running   0          3m5s
loadgenerator-57f6896fd6-lx5df       2/2     Running   0          3m5s
transactionhistory-6c498965f-tl2sk   2/2     Running   0          3m4s
userservice-95f44b65b-mlk2p          2/2     Running   0          3m4s
Create Istio Gateway and VirtualService resources to expose the application behind the ingress gateway:
kubectl apply -n bank-sample -f bank-of-anthos/extras/istio/frontend-ingress.yaml
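The frontend-ingress.yaml manifest defines these two resources. As a rough sketch of the pattern (the host, port, and route values below are illustrative, not necessarily the exact contents of that file), a Gateway and VirtualService pair typically looks like the following:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: frontend-gateway
spec:
  # Selects the ingress gateway Pods deployed in the bank-gateways namespace.
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: frontend
spec:
  hosts:
  - "*"
  gateways:
  - frontend-gateway
  http:
  - route:
    - destination:
        # Routes incoming traffic to the frontend Service.
        host: frontend
        port:
          number: 80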
Get a link to the sample application:
INGRESS_HOST=$(kubectl -n bank-gateways get service istio-ingressgateway \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "http://$INGRESS_HOST"
In a browser, follow the link to open the sample application. Log in with the default username and password to view the application.
Enforce mutual TLS
Make sure that STRICT mutual TLS (mTLS) mode is enabled. Apply a default PeerAuthentication policy for the mesh in the istio-system namespace.
Save the following manifest as mesh-peer-authn.yaml:
apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "default"
  namespace: "istio-system"
spec:
  mtls:
    mode: STRICT
Apply the manifest to the cluster:
kubectl apply -f mesh-peer-authn.yaml
You can override this configuration by creating PeerAuthentication resources in specific namespaces.
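For example, a namespace-scoped policy like the following sketch would relax the mode to PERMISSIVE for a single namespace; the namespace shown is only illustrative:
apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "default"
  # Illustrative namespace; the policy applies only to workloads in it.
  namespace: "bank-sample"
spec:
  mtls:
    mode: PERMISSIVE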
Explore the Cloud Service Mesh dashboards
In Google Cloud console, go to Cloud Service Mesh to view the dashboards for your mesh:
Select the project from the drop-down list on the menu bar.
You see an overview table with all of the microservices in your mesh and a graphical visualization of the connections between the microservices. For each microservice, the table shows three of the SRE "golden signals":
- Traffic - requests per second
- Error rate - a percentage
- Latency - milliseconds
These metrics are based on the actual traffic being handled by the microservices. Constant test traffic is automatically sent to the frontend service by a loadgenerator client deployed as part of the sample application. Cloud Service Mesh automatically sends metrics, logs, and (optionally) traces to Google Cloud Observability.
Click the frontend service in the table to see an overview dashboard for the service. You see additional metrics for the service and a visualization of inbound and outbound connections. You can also create a Service Level Objective (SLO) for monitoring and alerting on the service.
Verify that mTLS is enabled
Click the security link in the panel to see a security overview for the frontend service.
The table and the visualization show a green lock icon for each of the inbound
and outbound connections between microservices. This icon indicates that the
connection is using mTLS for authentication and encryption.