Installation and migration on GKE

This guide explains how to install or migrate to Anthos Service Mesh version 1.6.14 for a mesh containing one or more GKE clusters that are in the same project. You use a Google-provided script, which configures your project and cluster, and then installs Anthos Service Mesh.

You can use this guide for the following use cases:

  • New installations of Anthos Service Mesh. If you have a previous version of Anthos Service Mesh installed, refer to Upgrading Anthos Service Mesh on GKE. The 1.6 script doesn't handle upgrades.

  • Migrating from open source Istio 1.6 to Anthos Service Mesh. Migrating from an earlier version of Istio isn't supported. The 1.7 version of the script supports migrating from Istio 1.6 or 1.7 to Anthos Service Mesh 1.7. Since you're migrating, you might prefer to migrate to Anthos Service Mesh 1.7.

  • Migrating from the 1.6 version of the Istio on GKE add-on to Anthos Service Mesh. Before you can migrate to Anthos Service Mesh, you need to Upgrade to Istio 1.6 with Operator. For complete migration steps from the add-on, see Migrating to Anthos Service Mesh in the Istio on GKE documentation.

You need to use the Advanced installation and migration on GKE guide for the following use cases:

  • When you need to customize the installation to override settings in the asm-gcp profile, and you have more than one overlay IstioOperator YAML file. The script lets you specify only one YAML file.

  • For a multi-cluster mesh where the clusters are in different projects.

Before you begin

This guide assumes that you already have:

If you are migrating from Istio, be sure to review Preparing to migrate from Istio.

Anthos and Anthos Service Mesh differences

  • GKE Enterprise subscribers, be sure to enable the GKE Enterprise API.

  • If you aren't a GKE Enterprise subscriber, you can still install Anthos Service Mesh, but certain UI elements and features in Google Cloud console are only available to GKE Enterprise subscribers. For information about what is available to subscribers and non-subscribers, see GKE Enterprise and Anthos Service Mesh UI differences. For information about Anthos Service Mesh pricing for non-subscribers, see Pricing.

Requirements

  • Your GKE cluster must meet the following requirements:

    • A machine type that has at least four vCPUs, such as e2-standard-4. If the machine type for your cluster doesn't have at least four vCPUs, change the machine type as described in Migrating workloads to different machine types.

    • The minimum number of nodes depends on your machine type. Anthos Service Mesh requires at least eight vCPUs. If the machine type has four vCPUs, your cluster must have at least two nodes. If the machine type has eight vCPUs, the cluster only needs one node. If you need to add nodes, see Resizing a cluster.

    • The script enables Workload Identity on your cluster. Workload Identity is the recommended method of calling Google APIs. Enabling Workload Identity changes the way calls from your workloads to Google APIs are secured, as described in Workload Identity limitations.

    • Optional but recommended: enroll the cluster in a release channel. We recommend that you enroll in the Regular release channel because other channels might be based on a GKE version that isn't supported with Anthos Service Mesh 1.6.14. For more information, see Supported environments. If you have a static GKE version, follow the instructions in Enrolling an existing cluster in a release channel.

  • To be included in the service mesh, service ports must be named, and the name must include the port's protocol in the following syntax: name: protocol[-suffix] where the square brackets indicate an optional suffix that must start with a dash. For more information, see Naming service ports.

  • If you are installing Anthos Service Mesh on a private cluster, you must open port 15017 in the firewall to get the webhook used with automatic sidecar injection to work properly. For more information, see Opening a port on a private cluster.

  • If you have created a service perimeter in your organization, you might need to add the Mesh CA service to the perimeter. See Adding Mesh CA to a service perimeter for more information.

  • For migrations, istiod must be installed in the istio-system namespace, which is typically the case.
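
The sizing rule and the port naming rule above can be checked mechanically before you run the script. The following is a minimal sketch, not part of install_asm: the function names are hypothetical, and the protocol list in is_valid_port_name is an assumption (see Naming service ports for the authoritative list).

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical and are not part of install_asm.

# Succeeds when a node pool satisfies the sizing rule described above:
# at least 4 vCPUs per node and at least 8 vCPUs across the cluster.
meets_vcpu_requirements() {
  local vcpus_per_node="$1" node_count="$2"
  (( vcpus_per_node >= 4 )) || return 1
  (( vcpus_per_node * node_count >= 8 ))
}

# Succeeds when a Service port name follows the "protocol[-suffix]" syntax.
# ASSUMPTION: the protocol list below is illustrative and may be incomplete.
is_valid_port_name() {
  local name="$1"
  local re='^(http|http2|https|grpc|tcp|tls|mongo|mysql|redis|udp)(-.+)?$'
  [[ "${name}" =~ ${re} ]]
}
```

For example, meets_vcpu_requirements 4 2 succeeds (two e2-standard-4 nodes), while meets_vcpu_requirements 4 1 fails because the cluster has only four vCPUs in total. Similarly, is_valid_port_name http-web succeeds and is_valid_port_name web-http fails.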

Restrictions

A Google Cloud project can only have one mesh associated with it.

Choosing a certificate authority

For both new installations and migrations, you can use Anthos Service Mesh certificate authority (Mesh CA) or Citadel (now incorporated in istiod) as the certificate authority (CA) for issuing mutual TLS (mTLS) certificates.

We generally recommend that you use Mesh CA for the following reasons:

  • Mesh CA is a highly reliable and scalable service that is optimized for dynamically scaled workloads on Google Cloud.
  • With Mesh CA, Google manages the security and availability of the CA backend.
  • Mesh CA lets you rely on a single root of trust across clusters.

For new installations of Anthos Service Mesh, by default, the script enables Mesh CA.

However, there are cases where you might want to consider using Citadel, such as the following:

  • If you have a custom CA.
  • If you're migrating from Istio or the Istio on GKE add-on.

    If you choose Citadel, there's no downtime because mTLS traffic isn't interrupted during the migration. If you choose Mesh CA, you need to schedule downtime for the migration because mTLS traffic fails until you restart all Pods in all namespaces.

Certificates from Mesh CA include the following data about your application's services:

  • The Google Cloud project ID
  • The GKE namespace
  • The GKE service account name

Installing required tools

You can run the script on Cloud Shell or on your local machine running Linux or macOS. Cloud Shell pre-installs all the required tools.

To run the script locally:

  1. Make sure you have the following tools installed:

  2. Authenticate with the gcloud CLI:

    gcloud auth login
    
  3. Update the components:

    gcloud components update
    
  4. Make sure that git is in your path so that kpt can find it.

Running the script

This section describes how to download the script, set the required and optional parameters, and run the script. For a detailed explanation of what the script does, see Understanding the script.

  1. Download the script to the current working directory:

    curl https://storage.googleapis.com/csm-artifacts/asm/install_asm_1.6 > install_asm
    
  2. Download the SHA-256 of the file to the current working directory:

    curl https://storage.googleapis.com/csm-artifacts/asm/install_asm_1.6.sha256 > install_asm.sha256
    
  3. With both files in the same directory, verify the download:

    sha256sum -c --ignore-missing install_asm.sha256
    

    If the verification is successful, the command outputs: install_asm: OK

    For compatibility, the install_asm.sha256 file includes the checksum twice to allow any version of the script to be renamed to install_asm. If you get an error that --ignore-missing does not exist, rerun the previous command without the --ignore-missing flag.

  4. Make the script executable:

    chmod +x install_asm
    
  5. Set the options and specify the flags to run the script. You always include the following options: project_id, cluster_name, cluster_location, and mode. Depending on the mode, you might need to include the ca option.

    • The project_id, cluster_name, and cluster_location options identify the cluster on which to install Anthos Service Mesh.
    • The mode is either install or migrate.
    • The ca option sets the certificate authority to either mesh_ca or citadel.

    The following section provides typical examples for running the script. For a complete description of the script's arguments, see Options and flags.

  6. To complete setting up Anthos Service Mesh, you need to enable automatic sidecar injection and deploy or redeploy workloads.
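
The verification in step 3, including the fallback for versions of sha256sum that lack --ignore-missing, can be wrapped in a small helper. This is a sketch under stated assumptions: verify_download is a hypothetical name, and the check for flag support relies on the flag appearing in the sha256sum --help text.

```shell
#!/usr/bin/env bash
# Sketch only: verify_download is a hypothetical helper, not part of install_asm.

# Verifies the files listed in a checksum file against the current directory.
# Uses --ignore-missing when sha256sum advertises it in its help text and
# falls back to a plain check otherwise, as step 3 suggests.
verify_download() {
  local checksum_file="$1"
  local help_text
  help_text="$(sha256sum --help 2>&1)" || true
  if [[ "${help_text}" == *--ignore-missing* ]]; then
    sha256sum -c --ignore-missing "${checksum_file}"
  else
    sha256sum -c "${checksum_file}"
  fi
}
```

With the downloaded files in the current directory, verify_download install_asm.sha256 && chmod +x install_asm makes the script executable only if the checksum matches.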

Examples

This section shows examples of running the script in each mode and some additional arguments that you might find useful.

Only validate

The following example shows running the script with the --only_validate option. With this option, the script doesn't make any changes to your cluster, and it doesn't install Anthos Service Mesh. The script validates that:

  • Your environment has the required tools.
  • You have the required permission on the specified project.
  • The cluster meets the minimum requirements.
  • The project has all the required Google APIs enabled.

By default, the script downloads and extracts the installation file and downloads the asm configuration package from GitHub to a temp directory. Before exiting, the script outputs a message that provides the name of the temp directory. You can specify an existing directory for the downloads with the --output_dir DIR_PATH option. The --output_dir option makes it convenient for you to use the istioctl command-line tool if you need it.

  1. Create a directory called asm-packages:

    mkdir asm-packages
    
  2. Run the following command to validate your configuration and download the installation file and asm package to the asm-packages directory:

    ./install_asm \
    --project_id PROJECT_ID \
    --cluster_name CLUSTER_NAME \
    --cluster_location CLUSTER_LOCATION \
    --mode install \
    --output_dir ./asm-packages \
    --only_validate

On success, the script outputs the following:

install_asm: Setting up necessary files...
install_asm: Creating temp directory...
install_asm: Generating a new kubeconfig...
install_asm: Checking installation tool dependencies...
install_asm: Downloading ASM..
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 57.0M  100 57.0M    0     0  30.6M      0  0:00:01  0:00:01 --:--:-- 30.6M
install_asm: Downloading ASM kpt package...
fetching package /asm from https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages to asm
install_asm: Checking for project PROJECT_ID...
install_asm: Confirming cluster information...
install_asm: Confirming node pool requirements...
install_asm: Fetching/writing GCP credentials to kubeconfig file...
Fetching cluster endpoint and auth data.
kubeconfig entry generated for cluster-1.
install_asm: Checking Istio installations...
install_asm: Checking required APIs...
install_asm: Successfully validated all requirements to install ASM from this computer.

If one of the validation checks fails, the script outputs an error message. For example, if your project doesn't have all of the required Google APIs enabled, you see the following error:

ERROR: One or more APIs are not enabled. Please enable them and retry, or
re-run the script with the '--enable_apis' flag to allow the script to enable
them on your behalf.

New installation

The following command runs the script for a new installation, enables Mesh CA (the default CA for new installs, so you don't need the ca option in this case), and allows the script to enable the required Google APIs.

./install_asm \
  --project_id PROJECT_ID \
  --cluster_name CLUSTER_NAME \
  --cluster_location CLUSTER_LOCATION \
  --mode install \
  --enable_apis

New installation with an overlay file

The following example does a new installation and includes a YAML file that enables an optional feature.

./install_asm \
  --project_id PROJECT_ID \
  --cluster_name CLUSTER_NAME \
  --cluster_location CLUSTER_LOCATION \
  --mode install \
  --enable_apis \
  --operator_overlay egressgateways.yaml
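
The contents of egressgateways.yaml aren't shown in this guide. As an illustration only, the following sketch writes a minimal IstioOperator overlay that enables the egress gateway component. The helper name write_egress_overlay is hypothetical, and the file contents are an assumption about what such an overlay could look like; the file in the asm package may differ.

```shell
#!/usr/bin/env bash
# Sketch only: write_egress_overlay is a hypothetical helper, and the YAML
# below is an assumed example of an overlay, not the file from the asm package.

# Writes a minimal IstioOperator overlay that turns on the egress gateway.
write_egress_overlay() {
  local path="${1:-egressgateways.yaml}"
  cat > "${path}" <<'EOF'
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    egressGateways:
    - name: istio-egressgateway
      enabled: true
EOF
}
```

Calling write_egress_overlay with no arguments creates egressgateways.yaml in the current directory, which is where the script looks for the file in the example above.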

Migration from Istio

If you are migrating from open source Istio, you are using Citadel as the CA. The following command runs the script for a migration to Anthos Service Mesh and enables Citadel as the CA. This migration only deploys the control plane. It doesn't change the root CA and doesn't interrupt your existing workloads.

./install_asm \
  -p PROJECT_ID \
  -n CLUSTER_NAME \
  -l CLUSTER_LOCATION \
  -m migrate \
  -c citadel \
  --enable_apis

Options and flags

Options

-p|--project_id CLUSTER_PROJECT_ID
The project ID that the cluster was created in.
-n|--cluster_name CLUSTER_NAME
The name of the cluster.
-l|--cluster_location CLUSTER_LOCATION
Either the zone (for single-zone clusters) or region (for regional clusters) that the cluster was created in.
-m|--mode {install|migrate}
Enter install if you are doing a new installation of Anthos Service Mesh. Enter migrate if you are migrating from Istio or the Istio on GKE add-on to Anthos Service Mesh.
-c|--ca {mesh_ca|citadel}
If you are doing a new installation, this parameter defaults to Mesh CA, and you don't have to include it. If you are migrating from Istio, you must specify citadel or mesh_ca. If you can schedule downtime for the migration, we recommend that you use mesh_ca. If you can't schedule downtime for the migration, use citadel.
-o|--operator_overlay YAML_FILE
The name of the YAML file to enable a feature that isn't enabled in the asm-gcp profile. The script must be able to locate the YAML file, so the file either needs to be in the same directory as the script, or you need to specify a relative path such as: ../manifests/asm-features.yaml
-s|--service_account ACCOUNT
The name of a service account used to install Anthos Service Mesh. If not specified, the active user account in the current gcloud configuration is used. If you need to change the active user account, run gcloud auth login.
-k|--key_file FILE_PATH
The key file for a service account. Omit this option if you aren't using a service account.
-D|--output_dir DIR_PATH
If not specified, the script creates a temporary directory where it downloads files and configurations necessary for installing Anthos Service Mesh. Specify the --output_dir option to designate an existing directory to use instead. Upon completion, the specified directory contains the asm and the istio-1.6.14-asm.2 subdirectories. The asm directory contains the configuration for the installation. The istio-1.6.14-asm.2 directory contains the extracted contents of the installation file, which contains istioctl, samples, and manifests.
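
The rules for the mode and ca options can be expressed as a small wrapper that assembles the arguments before calling the script. This is a sketch, not the script's own validation: build_install_args is a hypothetical name, and it covers only the five options listed in its signature.

```shell
#!/usr/bin/env bash
# Sketch only: build_install_args is a hypothetical wrapper, not part of install_asm.

# Prints one install_asm argument per line, or fails when the combination of
# mode and ca doesn't match the option descriptions above.
build_install_args() {
  local project_id="$1" cluster_name="$2" cluster_location="$3"
  local mode="$4" ca="${5:-}"

  case "${mode}" in
    install)
      ca="${ca:-mesh_ca}"   # new installations default to Mesh CA
      ;;
    migrate)
      if [[ -z "${ca}" ]]; then
        echo "migrate mode requires ca to be mesh_ca or citadel" >&2
        return 1
      fi
      ;;
    *)
      echo "mode must be install or migrate" >&2
      return 1
      ;;
  esac

  case "${ca}" in
    mesh_ca|citadel) ;;
    *)
      echo "ca must be mesh_ca or citadel" >&2
      return 1
      ;;
  esac

  printf '%s\n' \
    --project_id "${project_id}" \
    --cluster_name "${cluster_name}" \
    --cluster_location "${cluster_location}" \
    --mode "${mode}" \
    --ca "${ca}"
}
```

The printed lines can be read into an array and passed to ./install_asm. Failing early on a missing ca value in migrate mode avoids starting a migration with an unintended certificate authority.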

Flags

-e|--enable_apis
Allow the script to enable the Google APIs that Anthos Service Mesh requires. Without this flag, the script exits if the required APIs aren't already enabled. For a list of the APIs that the script enables, see set_up_project.
-v|--verbose
Print commands before and after execution.
--dry_run
Print commands, but don't execute them.
--only_validate
Run validation but don't install Anthos Service Mesh.
--disable_canonical_service
By default, the script deploys the Canonical Service controller to your cluster. If you don't want the script to deploy the controller, specify --disable_canonical_service. For more information, refer to Enabling and disabling the Canonical Service controller.
-h|--help
Show a help message describing the options and flags and exit.

Deploying and redeploying workloads

Your installation isn't complete until you enable automatic sidecar proxy injection (auto-injection).

  • For new installations, you need to enable auto-injection and restart the Pods for any workloads that were running on your cluster before you installed Anthos Service Mesh.

  • Migrations from Istio follow the dual control plane upgrade process (referred to as "canary upgrades" in the Istio documentation). With a dual control plane upgrade, the script installs a new version of istiod alongside the existing istiod. You then move some of your workloads to the new version. This process allows you to monitor the effect of the new version with a small percentage of the workloads before migrating all of your traffic to the new version.

  • Before you deploy new workloads, make sure to enable auto-injection so that Anthos Service Mesh can monitor and secure traffic.

To enable auto-injection, you get the revision label that the script applied to istiod and label your namespaces with the revision label. The following sections provide details.

Get the revision label

The script adds a revision label in the format istio.io/rev=asm-1614-2 to istiod. To enable auto-injection, you add a matching revision label to your namespace(s). The revision label is used by the sidecar injector webhook to associate injected sidecars with a particular istiod revision. After adding the label, any existing Pods in the namespace must be restarted for sidecars to be injected.

  1. Set the current context for kubectl:

    gcloud container clusters get-credentials CLUSTER_NAME \
        --project=PROJECT_ID
    
  2. Display the labels on istiod to get the revision label set by the script:

    kubectl -n istio-system get pods -l app=istiod --show-labels
    

    The output from the command is similar to the following.

    NAME                                READY   STATUS    RESTARTS   AGE   LABELS
    istiod-7744bc8dd7-qhlss             1/1     Running   0          49m   app=istiod,istio.io/rev=default,istio=pilot,pod-template-hash=7744bc8dd7
    istiod-asm-1614-2-85d86774f7-flrt2   1/1     Running   0          26m   app=istiod,istio.io/rev=asm-1614-2,istio=istiod,pod-template-hash=85d86774f7
    istiod-asm-1614-2-85d86774f7-tcwtn   1/1     Running   0          26m   app=istiod,istio.io/rev=asm-1614-2,istio=istiod,pod-template-hash=85d86774f7

    In the output, under the LABELS column, note the value of the istiod revision label, which follows the prefix istio.io/rev=. In this example, the value is asm-1614-2, but you might have a different value.

    For migrations, also note the value in the revision label for the old istiod version. You need this to delete the old version of istiod when you finish the migration. In the example output, the value in the revision label for the old version of istiod is default, but you might have a different value.
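
Instead of reading the LABELS column by eye, you can extract the revision values from the same output. The following sketch assumes the output format shown above; extract_revisions is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: extract_revisions is a hypothetical helper.

# Reads `kubectl get pods --show-labels` output on stdin and prints each
# distinct value of the istio.io/rev label, one per line, sorted.
extract_revisions() {
  grep -o 'istio\.io/rev=[^,[:space:]]*' | cut -d= -f2 | sort -u
}
```

For example, piping the command from step 2 into the helper (kubectl -n istio-system get pods -l app=istiod --show-labels | extract_revisions) prints asm-1614-2 and default for the sample output above.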

Enabling auto-injection

Follow these steps for new installations and migrations to enable auto-injection.

  1. Get the value in the revision label for istiod.

  2. Add the revision label to a namespace and remove the istio-injection label. In the following command, change REVISION to the value that matches the revision on istiod.

    kubectl label namespace NAMESPACE istio.io/rev=REVISION istio-injection- --overwrite

  3. Restart the Pods to trigger re-injection.

    kubectl rollout restart deployment -n NAMESPACE

  4. Verify that your Pods are configured to point to the new version of istiod.

    kubectl get pods -n NAMESPACE -l istio.io/rev=REVISION

  5. Test your application to verify that the workloads are working correctly.

  6. If you have workloads in other namespaces, repeat the steps to label the namespace and restart Pods.
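
When several namespaces need the label, steps 2 and 3 can be repeated in a loop. The following is a minimal sketch: enable_auto_injection and the KUBECTL override variable are hypothetical names introduced for illustration. Verifying the Pods and testing the application (steps 4 and 5) are left as manual steps.

```shell
#!/usr/bin/env bash
# Sketch only: enable_auto_injection and KUBECTL are hypothetical names.
# Set KUBECTL=echo to preview the commands without changing the cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Labels each namespace with the istiod revision, removes the
# istio-injection label, and restarts the Deployments in that namespace.
enable_auto_injection() {
  local revision="$1"
  shift
  local ns
  for ns in "$@"; do
    "${KUBECTL}" label namespace "${ns}" "istio.io/rev=${revision}" \
      istio-injection- --overwrite || return 1
    "${KUBECTL}" rollout restart deployment -n "${ns}" || return 1
  done
}
```

For example, KUBECTL=echo enable_auto_injection asm-1614-2 default payments prints the four commands that would run for the default and payments namespaces. Stopping at the first failure keeps a partially labeled namespace easy to identify.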

For migration:

Complete the transition

For migrations, you need to remove the old version of istiod. If you are satisfied that your application is working as expected, remove the old control plane to complete the transition to the new version.

  1. Get the value in the revision label for the old version of istiod.

  2. Delete the old version of istiod. In the following command, replace OLD_REVISION with the revision from the previous step.

    kubectl delete Service,Deployment,HorizontalPodAutoscaler,PodDisruptionBudget istiod-OLD_REVISION -n istio-system --ignore-not-found=true
    

Rollback to the previous version

For migrations, if you encountered an issue when testing your application with the new version of istiod, follow these steps to roll back to the previous version:

  1. Update workloads to be injected with the previous version of istiod:

    kubectl label namespace NAMESPACE istio.io/rev- istio-injection=enabled --overwrite

  2. Restart the Pods to trigger re-injection so the proxies have the previous version:

    kubectl rollout restart deployment -n NAMESPACE

  3. Redeploy the previous version of the istio-ingressgateway:

    kubectl -n istio-system rollout undo deploy istio-ingressgateway
    
  4. Remove the new istiod:

    kubectl delete Service,Deployment,HorizontalPodAutoscaler,PodDisruptionBudget istiod-REVISION -n istio-system --ignore-not-found=true
    
  5. If you didn't include the --disable_canonical_service flag, the script enabled the Canonical Service controller. Follow the steps in Enabling and disabling the Canonical Service controller to disable it.

Viewing the Anthos Service Mesh dashboards

This section is applicable only if you installed Anthos Service Mesh with the asm-gcp configuration profile. If you used the asm-gcp-multiproject profile to install Anthos Service Mesh, telemetry data won't be available on the Anthos Service Mesh dashboards in the Google Cloud console.

After you have workloads deployed on your cluster with the sidecar proxies injected, you can explore the Anthos Service Mesh pages in the Google Cloud console to see all of the observability features that Anthos Service Mesh offers. Note that it takes about one or two minutes for telemetry data to be displayed in the Google Cloud console after you deploy workloads.

Access to Anthos Service Mesh in the Google Cloud console is controlled by Identity and Access Management (IAM). To access the Anthos Service Mesh pages, a Project Owner must grant users the Project Editor or Viewer role, or the more restrictive roles described in Controlling access to Anthos Service Mesh in the Google Cloud console.

  1. In the Google Cloud console, go to Anthos Service Mesh.

  2. Select the Google Cloud project from the drop-down list on the menu bar.

  3. If you have more than one service mesh, select the mesh from the Service Mesh drop-down list.

To learn more, see Exploring Anthos Service Mesh in the Google Cloud console.

In addition to the Anthos Service Mesh pages, metrics related to your services (such as the number of requests received by a particular service) are sent to Cloud Monitoring, where they appear in the Metrics Explorer.

To view metrics:

  1. In the Google Cloud console, go to the Monitoring page:

  2. Select Resources > Metrics Explorer.

For a full list of metrics, see Istio metrics in the Cloud Monitoring documentation.

Registering your cluster

You must register your cluster with the project's fleet to gain access to the unified user interface in the Google Cloud console. A fleet provides a unified way to view and manage the clusters and their workloads, including clusters outside Google Cloud.

See Registering clusters to the fleet for information on registering your cluster.

Understanding the script

Although you download the script from a secure Cloud Source Repositories location, the script is also available on GitHub so that you can see what it does before you download it. The script validates that your cluster meets the requirements, and it automates all the steps that you would do manually in Installing Anthos Service Mesh on GKE.

validate_args and validate_dependencies

validate_args() {
  if [[ "${MODE}" == "install" && -z "${CA}" ]]; then
    CA="mesh_ca"
  fi

  local MISSING_ARGS=0
  while read -r REQUIRED_ARG; do
    if [[ -z "${!REQUIRED_ARG}" ]]; then
      MISSING_ARGS=1
      warn "Missing value for ${REQUIRED_ARG}"
    fi
    readonly "${REQUIRED_ARG}"
  done <<EOF
...

validate_dependencies() {
  validate_cli_dependencies

  validate_gcp_resources
  # configure kubectl does have side effects but we've generated a temporary
  # kubeconfig so we're not breaking the promise that --only_validate gives
  configure_kubectl
  validate_expected_control_plane
  if [[ "${MODE}" = "migrate" ]]; then
    validate_istio_version
  fi
  if [[ "${ENABLE_APIS}" -eq 0 || "${ONLY_VALIDATE}" -eq 1 ]]; then
    exit_if_apis_not_enabled
  fi
}

The validate_args and validate_dependencies functions:

  • Check that all the required tools are installed.
  • Verify that the project ID, cluster name, and cluster location that you entered as parameter values are valid.
  • Ensure that the cluster meets the minimum required machine type and number of nodes.

set_up_project

If you included the --enable_apis flag, the set_up_project function enables the required APIs:

required_apis() {
    cat << EOF
container.googleapis.com
compute.googleapis.com
monitoring.googleapis.com
logging.googleapis.com
cloudtrace.googleapis.com
meshca.googleapis.com
meshtelemetry.googleapis.com
meshconfig.googleapis.com
iamcredentials.googleapis.com
gkeconnect.googleapis.com
gkehub.googleapis.com
cloudresourcemanager.googleapis.com
EOF
}
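
To see ahead of time which of these APIs are still disabled, you can compare the list against the APIs enabled in your project. The following is a sketch, not the script's own check: missing_apis is a hypothetical helper, and the gcloud invocation in the comment is an assumption about how to obtain one enabled API name per line.

```shell
#!/usr/bin/env bash
# Sketch only: missing_apis is a hypothetical helper, not part of install_asm.

# Reads the enabled API names from stdin (one per line) and prints every
# API passed as an argument that isn't in that list.
#
# ASSUMPTION: the enabled list can be produced with a command such as
#   gcloud services list --enabled --project PROJECT_ID --format="value(config.name)"
missing_apis() {
  local enabled api
  enabled="$(cat)"
  for api in "$@"; do
    grep -qxF -- "${api}" <<<"${enabled}" || printf '%s\n' "${api}"
  done
}
```

Pass the API names from the required_apis list as arguments. An empty result means that nothing needs to be enabled, so the script can run without the --enable_apis flag.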

set_up_cluster

set_up_cluster(){
  add_cluster_labels
  enable_workload_identity

  # this is project scope but requires workload identity
  if [[ "${CA}" = "mesh_ca" ]]; then
    init_meshca
  fi

  enable_stackdriver_kubernetes
  bind_user_to_cluster_admin
  ensure_istio_namespace_exists
}

The set_up_cluster function makes the following updates to your cluster:

  • Enables Workload Identity, which is the recommended way to safely access Google Cloud services from GKE applications.

  • Enables Cloud Monitoring and Cloud Logging on GKE.

  • Sets the mesh_id label on the cluster, which is required for metrics to get displayed on the Anthos Service Mesh pages in the Google Cloud console.

  • Sets a label like asmv=asm-1614-2 so that you can tell that the cluster was modified by the script.

  • Binds the GCP user or service account running the script to the cluster-admin role on your cluster.
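
The label value asm-1614-2 appears to encode the same release as the istio-1.6.14-asm.2 directory name. The following sketch shows one way to derive the label from the release string. It assumes that mapping (dots removed from the version, followed by the asm build number); the script's own derivation isn't shown in this guide, and release_to_revision is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: release_to_revision is a hypothetical helper, and the mapping
# is an assumption inferred from the example values in this guide.

# Converts a release string such as 1.6.14-asm.2 into a label value
# such as asm-1614-2.
release_to_revision() {
  local release="$1"
  local version="${release%%-asm.*}"   # 1.6.14
  local build="${release##*-asm.}"     # 2
  printf 'asm-%s-%s\n' "${version//./}" "${build}"
}
```

Under that assumption, release_to_revision 1.6.14-asm.2 prints asm-1614-2, which matches both the asmv cluster label and the istio.io/rev label shown elsewhere in this guide.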

install_asm

install_asm(){

  local CA_OPT
  CA_OPT=""
  if [[ "${CA}" = "citadel" ]]; then
    CA_OPT="-citadel"
  fi

  info "Configuring kpt package..."
  run kpt cfg set asm gcloud.container.cluster "${CLUSTER_NAME}"
  run kpt cfg set asm gcloud.core.project "${PROJECT_ID}"
  run kpt cfg set asm gcloud.project.environProjectNumber "${PROJECT_NUMBER}"
  run kpt cfg set asm gcloud.compute.location "${CLUSTER_LOCATION}"
  run kpt cfg set asm anthos.servicemesh.rev "${REVISION_LABEL}"
  if [[ -n "${_CI_ASM_IMAGE_LOCATION}" ]]; then
    run kpt cfg set asm anthos.servicemesh.hub "${_CI_ASM_IMAGE_LOCATION}"
    run kpt cfg set asm anthos.servicemesh.tag "${RELEASE}"
  fi

  local PARAMS
  PARAMS="-f ${OPERATOR_MANIFEST}"
  if [[  -f "$CUSTOM_OVERLAY" ]]; then
    PARAMS="${PARAMS} -f ${CUSTOM_OVERLAY}"
  fi
  PARAMS="${PARAMS} --set revision=${REVISION_LABEL}"
  PARAMS="${PARAMS} -c ${KUBECONFIG}"

  info "Installing ASM control plane..."
  # shellcheck disable=SC2086
  retry 5 run ./"${ISTIOCTL_REL_PATH}" install $PARAMS

  # Prevent the stderr buffer from ^ messing up the terminal output below
  sleep 1
  info "...done!"

  if ! does_istiod_exist; then
    info "Installing validation webhook fix..."
    retry 3 run kubectl apply -f "${VALIDATION_FIX_SERVICE}"
  fi

  if [[ "$DISABLE_CANONICAL_SERVICE" -ne 1 ]]; then
    info "Installing ASM CanonicalService controller in asm-system namespace..."
    retry 3 run kubectl apply -f asm/canonical-service/controller.yaml
    info "Waiting for deployment..."
    retry 3 run kubectl wait --for=condition=available --timeout=600s \
        deployment/canonical-service-controller-manager -n asm-system
    info "...done!"
  fi

  outro
}

The install_asm function:

  • Downloads the kpt package to a temp directory.
  • Runs the kpt setters to configure the istio-operator.yaml file.
  • Installs Anthos Service Mesh.
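
The install_asm function calls a retry helper (for example, retry 5 run ...) whose definition isn't shown here. The following is a generic sketch of such a helper with the same calling convention, where the first argument is the maximum number of attempts. It is not the script's implementation, and RETRY_DELAY_SECONDS is a hypothetical variable.

```shell
#!/usr/bin/env bash
# Sketch only: a generic retry helper, not the implementation used by install_asm.
# RETRY_DELAY_SECONDS is a hypothetical variable that sets the pause between attempts.

# Runs a command until it succeeds or the maximum number of attempts is reached.
# Usage: retry MAX_ATTEMPTS COMMAND [ARGS...]
retry() {
  local max_attempts="$1"
  shift
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "Command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "${RETRY_DELAY_SECONDS:-2}"
  done
}
```

Retrying is useful for calls such as kubectl apply that can fail transiently while the control plane is still starting. A fixed delay keeps the sketch short; a backoff that grows with each attempt is a common alternative.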

Differences with the 1.7 script

  • 1.7 script: Supports upgrades.
    1.6 script: Doesn't do upgrades.

  • 1.7 script: Supports migrations from Istio 1.6 and Istio 1.7.
    1.6 script: Supports migration from Istio 1.6.

  • 1.7 script: --print_config provides the configuration that you used when you installed using the install_asm script. This flag makes it easier for you to reinstall the same Anthos Service Mesh version (which the script doesn't allow) with the same configuration that you used when you installed previously.
    1.6 script: Not available.

  • 1.7 script: --custom_overlay allows multiple overlay files.
    1.6 script: --custom_overlay allows only one overlay file.

  • 1.7 script: --option pulls overlay files from the asm package on GitHub.
    1.6 script: Not available.

  • 1.7 script: Supports custom CA with the following options: --ca_cert, --ca_key, --root_cert, --cert_chain.
    1.6 script: Not available.