Setting up Traffic Director service security with Envoy

Use the instructions in this guide to configure authentication and authorization for services deployed with Traffic Director and Envoy proxies. For complete information about Traffic Director service security, see Traffic Director service security.

Requirements

Before you configure service security for Traffic Director with Envoy, make sure that your setup meets the prerequisites for deploying Traffic Director with Envoy proxies.

Preparing for setup

The following sections describe the tasks you need to complete before you set up Traffic Director service security. These tasks are:

  • Updating the gcloud command-line tool
  • Setting up variables
  • Enabling the APIs required for Traffic Director to work with Certificate Authority Service

Updating the gcloud command-line tool

To update the gcloud command-line tool, run the following on your local machine:

gcloud components update

Setting up variables

Set the following variables so that you can copy and paste code with consistent values as you work through the example in this document.

  • PROJECT_ID: Substitute the ID of your project
  • CLUSTER_NAME: Substitute secure-td-cluster for the cluster name.
  • ZONE: Substitute us-east1-d for the zone in which your cluster is located.
  • GKE_CLUSTER_URL: Substitute https://container.googleapis.com/v1/projects/PROJECT_ID/locations/ZONE/clusters/CLUSTER_NAME
  • WORKLOAD_POOL: Substitute PROJECT_ID.svc.id.goog
  • K8S_NAMESPACE: Substitute default.
  • DEMO_CLIENT_KSA: Substitute alice.
  • DEMO_SERVER_KSA: Substitute bob.
  • PROJNUM: Substitute the project number of your project, which you can determine from the Google Cloud Console or with this command:

    gcloud projects describe PROJECT_ID --format="value(projectNumber)"
    
  • SA_GKE: Substitute service-PROJNUM@container-engine-robot.iam.gserviceaccount.com

  • CLUSTER_LOCATION: Substitute us-east1.

  • CLUSTER_VERSION: Substitute the most recent version available, which you can find in the Rapid channel release notes. The minimum required version is 1.20.6-gke.1000. This is the GKE cluster version used in this example.

Set the values here:

# Substitute your project ID
PROJECT_ID=PROJECT_ID


# GKE cluster name and zone for this example.
CLUSTER_NAME=CLUSTER_NAME
ZONE=ZONE

# GKE cluster URL derived from the above
GKE_CLUSTER_URL="https://container.googleapis.com/v1/projects/${PROJECT_ID}/locations/${ZONE}/clusters/${CLUSTER_NAME}"

# Workload pool to be used with the GKE cluster
WORKLOAD_POOL="${PROJECT_ID}.svc.id.goog"

# Kubernetes namespace to run client and server demo.
K8S_NAMESPACE=K8S_NAMESPACE
DEMO_CLIENT_KSA=DEMO_CLIENT_KSA
DEMO_SERVER_KSA=DEMO_SERVER_KSA

# Compute other values
# Project number for your project
PROJNUM=$(gcloud projects describe "${PROJECT_ID}" --format="value(projectNumber)")

CLUSTER_VERSION=CLUSTER_VERSION
CLUSTER_LOCATION=CLUSTER_LOCATION
SA_GKE=service-${PROJNUM}@container-engine-robot.iam.gserviceaccount.com
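
For example, with the demonstration values suggested in the preceding list, the assignments look like the following. The project ID my-project-id is a placeholder; substitute your own.

# Example values for the demonstration setup in this guide
PROJECT_ID=my-project-id   # placeholder; substitute your own project ID
CLUSTER_NAME=secure-td-cluster
ZONE=us-east1-d
K8S_NAMESPACE=default
DEMO_CLIENT_KSA=alice
DEMO_SERVER_KSA=bob
CLUSTER_LOCATION=us-east1
# Minimum supported version; prefer the most recent Rapid channel version
CLUSTER_VERSION=1.20.6-gke.1000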

Enabling the APIs

Use the gcloud services enable command to enable all of the APIs you need to set up Traffic Director security with Certificate Authority Service.

gcloud services enable \
   container.googleapis.com \
   cloudresourcemanager.googleapis.com \
   compute.googleapis.com \
   trafficdirector.googleapis.com \
   networkservices.googleapis.com \
   networksecurity.googleapis.com \
   privateca.googleapis.com

Creating a GKE cluster

Traffic Director security depends on the CA Service integration with GKE. The GKE cluster must meet the following requirements in addition to any requirements for configuring Traffic Director with Envoy:

  • The GKE cluster must be enabled and configured with workload certificates, as described in Creating certificate authorities to issue certificates.
  • Use the most recent cluster version that is available in the rapid channel. The minimum required cluster version is 1.20.6-gke.1000.
  • A special service account in your project for the cluster to use (impersonate).

Before you create the cluster, check the Rapid channel release notes to find the most recent available version, and use that version as the value of CLUSTER_VERSION.

  1. Run the following commands to create a GKE cluster with Workload Identity and the CA Service integration enabled.

    # Create a GKE cluster with GKE managed workload certificates.
    gcloud beta container clusters create CLUSTER_NAME \
      --release-channel=rapid \
      --scopes=cloud-platform \
      --image-type=cos_containerd \
      --zone=ZONE \
      --workload-pool=PROJECT_ID.svc.id.goog \
      --enable-workload-certificates \
      --cluster-version=CLUSTER_VERSION \
      --enable-ip-alias \
      --workload-metadata=GKE_METADATA
    
  2. Run the following command to switch to the new cluster as the default cluster for your kubectl commands:

    gcloud container clusters get-credentials CLUSTER_NAME --zone ZONE
    

Registering newly created clusters with the GKE Hub

Register the cluster you created in Creating a GKE cluster with a fleet, also known as the GKE Hub. Registering the cluster makes it easier for you to configure clusters across multiple projects.

Note that these steps can take up to ten minutes each to complete.

  1. Enable the GKE Hub API for your project:

    gcloud services enable \
      gkehub.googleapis.com
    
  2. Register your cluster with the Hub:

    gcloud beta container hub memberships register CLUSTER_NAME \
      --gke-cluster=ZONE/CLUSTER_NAME \
      --enable-workload-identity \
      --manifest-output-file=MANIFEST-FILE_NAME
    

    Replace the variables as follows:

    • CLUSTER_NAME: Your cluster's name.
    • ZONE: Your cluster's zone.
    • MANIFEST-FILE_NAME: The file path where these commands generate the manifest for registration.

    When the registration process succeeds, you see a message such as the following:

    Finished registering the cluster CLUSTER_NAME with the Hub.
  3. Apply the generated manifest file to your cluster:

    kubectl apply -f MANIFEST-FILE_NAME
    

    When the manifest is applied successfully, you see messages such as the following:

    namespace/gke-connect created
    serviceaccount/connect-agent-sa created
    podsecuritypolicy.policy/gkeconnect-psp created
    role.rbac.authorization.k8s.io/gkeconnect-psp:role created
    rolebinding.rbac.authorization.k8s.io/gkeconnect-psp:rolebinding created
    role.rbac.authorization.k8s.io/agent-updater created
    rolebinding.rbac.authorization.k8s.io/agent-updater created
    role.rbac.authorization.k8s.io/gke-connect-agent-20210416-01-00 created
    clusterrole.rbac.authorization.k8s.io/gke-connect-impersonation-20210416-01-00 created
    clusterrolebinding.rbac.authorization.k8s.io/gke-connect-impersonation-20210416-01-00 created
    clusterrolebinding.rbac.authorization.k8s.io/gke-connect-feature-authorizer-20210416-01-00 created
    rolebinding.rbac.authorization.k8s.io/gke-connect-agent-20210416-01-00 created
    role.rbac.authorization.k8s.io/gke-connect-namespace-getter created
    rolebinding.rbac.authorization.k8s.io/gke-connect-namespace-getter created
    secret/http-proxy created
    deployment.apps/gke-connect-agent-20210416-01-00 created
    service/gke-connect-monitoring created
    secret/creds-gcp created
    
  4. Get the membership resource from the cluster:

    kubectl get memberships membership -o yaml
    

    The output should include the Workload Identity pool assigned by the Hub, where PROJECT_ID is your project ID:

    workload_identity_pool: PROJECT_ID.svc.id.goog
    

    This means that the cluster registered successfully.

Creating certificate authorities to issue certificates

To issue certificates to your Pods, create the following certificate authorities (CAs) using CA Service:

  • Root CA. This is the root of trust for all issued workload certificates. You can use an existing root CA if you have one.
  • Subordinate CA. This CA issues certificates for workloads. Create the subordinate CA in the region where your cluster is deployed.

Creating a subordinate CA is optional, but we strongly recommend creating one rather than using your root CA to issue GKE workload certificates.

The subordinate CA can be in a different region from your cluster, but we strongly recommend creating it in the same region as your cluster to optimize performance. You can, however, create the root and subordinate CAs in different regions without any impact on performance or availability.

These regions are supported during the public preview for CA Service:

Region name        Region description
asia-southeast1    Singapore
europe-west1       Belgium
europe-west4       Netherlands
us-central1        Iowa
us-east1           South Carolina
us-west1           Oregon

You can also check the list of supported locations by running the following command:

gcloud beta privateca locations list

  1. Create a root CA:

    gcloud beta privateca roots create ROOT_CA_NAME \
      --subject "CN=ROOT_CA_NAME, O=ROOT_CA_ORGANIZATION" \
      --location ROOT_CA_LOCATION \
      --tier enterprise
    

    For this demonstration setup, use the following values for the variables:

    • ROOT_CA_NAME: "company-root"
    • ROOT_CA_ORGANIZATION: "TestOrgLLC"
    • ROOT_CA_LOCATION: "us-east1"
  2. Create a subordinate CA in the same region as your cluster:

    gcloud beta privateca subordinates create SUBORDINATE_CA_NAME \
      --issuer ROOT_CA_NAME \
      --issuer-location ROOT_CA_LOCATION \
      --subject "CN=SUBORDINATE_CA_NAME, O=SUBORDINATE_CA_ORGANIZATION" \
      --location SUBORDINATE_CA_LOCATION \
      --tier devops
    

    For this demonstration setup, use the following values for the variables:

    • SUBORDINATE_CA_NAME: "td-ca"
    • SUBORDINATE_CA_ORGANIZATION: "TestOrgLLC"
    • SUBORDINATE_CA_LOCATION: "us-east1"
  3. Grant the privateca.admin IAM role for CA Service to individuals who need to modify IAM policies. Here, MEMBER is an individual who needs this access; specifically, anyone who performs the following steps that grant the privateca.auditor and privateca.certificateManager roles:

    gcloud projects add-iam-policy-binding PROJECT_ID \
      --member=MEMBER \
      --role=roles/privateca.admin
    
  4. Grant the IAM privateca.auditor role for the root CA to allow access from the GKE service account:

    # Grant GKE the privateca.auditor IAM role for your root CA so that it can
    # get the root CA certificate:
    gcloud beta privateca roots add-iam-policy-binding ROOT_CA_NAME \
     --location ROOT_CA_LOCATION \
     --role roles/privateca.auditor \
     --member="serviceAccount:service-PROJNUM@container-engine-robot.iam.gserviceaccount.com"
    
  5. Grant the IAM privateca.certificateManager role for the subordinate CA to allow access from the GKE service account:

    # Grant GKE the privateca.certificateManager IAM role for the subordinate CA
    # so that it can request certificates for your workloads:
    gcloud beta privateca subordinates add-iam-policy-binding SUBORDINATE_CA_NAME \
      --location SUBORDINATE_CA_LOCATION \
      --role roles/privateca.certificateManager \
      --member="serviceAccount:service-PROJNUM@container-engine-robot.iam.gserviceaccount.com"
    
  6. Save the following WorkloadCertificateConfig YAML configuration to tell your cluster how to issue workload certificates:

    apiVersion: security.cloud.google.com/v1alpha1
    kind: WorkloadCertificateConfig
    metadata:
      name: default
    spec:
      # Required. The SPIFFE trust domain. This must match your cluster's
      # Workload Identity pool.
      trustDomain: PROJECT_ID.svc.id.goog
    
      # Required. The CA service that issues your certificates.
      certificateAuthorityConfig:
        certificateAuthorityServiceConfig:
          endpointURI: //privateca.googleapis.com/projects/PROJECT_ID/locations/SUBORDINATE_CA_LOCATION/certificateAuthorities/SUBORDINATE_CA_NAME
    
      # Required. The key algorithm to use. Choice of RSA or ECDSA.
      #
      # To maximize compatibility with various TLS stacks, your workloads
      # should use keys of the same family as your root and subordinate CAs.
      #
      # To use RSA, specify configuration such as:
      #   keyAlgorithm:
      #     rsa:
      #       modulusSize: 4096
      #
      # Currently, the only supported ECDSA curves are "P256" and "P384", and the only
      # supported RSA modulus sizes are 2048, 3072 and 4096.
      keyAlgorithm:
        rsa:
          modulusSize: 4096
    
      # Optional. Validity duration of issued certificates, in seconds.
      #
      # Defaults to 86400 (1 day) if not specified.
      validityDurationSeconds: 86400
    
      # Optional. Try to start rotating the certificate once this
      # percentage of validityDurationSeconds is remaining.
      #
      # Defaults to 50 if not specified.
      rotationWindowPercentage: 50
    
    

    Replace the following if the field is not auto-populated:

    • PROJECT_ID: the project ID of the project in which your cluster runs.
  7. Save the following TrustConfig YAML configuration to tell your cluster how to trust the issued certificates:

    apiVersion: security.cloud.google.com/v1alpha1
    kind: TrustConfig
    metadata:
      name: default
    spec:
      # You must include a trustStores entry for the trust domain that
      # your cluster is enrolled in.
      trustStores:
      - trustDomain: PROJECT_ID.svc.id.goog
        # Trust identities in this trustDomain if they appear in a certificate
        # that chains up to this root CA.
        trustAnchors:
        - certificateAuthorityServiceURI: //privateca.googleapis.com/projects/PROJECT_ID/locations/ROOT_CA_LOCATION/certificateAuthorities/ROOT_CA_NAME
    

    Replace the following if the field is not auto-populated:

    • PROJECT_ID: the project ID of your cluster.
  8. Apply the configurations to your cluster:

    kubectl apply -f WorkloadCertificateConfig.yaml
    kubectl apply -f TrustConfig.yaml
    
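    To confirm that the cluster accepted both objects, you can retrieve their status; a valid object reports a condition with type: Ready and status: "True" (the same check is described in the Troubleshooting section of this guide):

    kubectl get WorkloadCertificateConfig default -o yaml
    kubectl get TrustConfig default -o yaml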

Configuring Identity and Access Management

The following instructions grant the default service account access to the Traffic Director security API and create the Kubernetes service accounts.

  1. Configure IAM to allow the default service account to access the Traffic Director security API.

    GSA_EMAIL=$(gcloud iam service-accounts list --format='value(email)' \
       --filter='displayName:Compute Engine default service account')
    
    gcloud projects add-iam-policy-binding PROJECT_ID \
      --member serviceAccount:${GSA_EMAIL} \
      --role roles/trafficdirector.client
    
  2. Set up Kubernetes service accounts. The client and server deployments in the following sections use the Kubernetes service accounts alice and bob.

    kubectl create serviceaccount --namespace K8S_NAMESPACE DEMO_SERVER_KSA
    kubectl create serviceaccount --namespace K8S_NAMESPACE DEMO_CLIENT_KSA
    
  3. Allow the Kubernetes service accounts to impersonate the default Compute Engine service account by creating an IAM policy binding between the two. This binding allows the Kubernetes service account to act as the default Compute Engine service account.

    gcloud iam service-accounts add-iam-policy-binding  \
      --role roles/iam.workloadIdentityUser \
      --member "serviceAccount:PROJECT_ID.svc.id.goog[K8S_NAMESPACE/DEMO_SERVER_KSA]" ${GSA_EMAIL}
    
    gcloud iam service-accounts add-iam-policy-binding  \
      --role roles/iam.workloadIdentityUser  \
      --member "serviceAccount:PROJECT_ID.svc.id.goog[K8S_NAMESPACE/DEMO_CLIENT_KSA]" ${GSA_EMAIL}
    
  4. Annotate the Kubernetes service accounts to associate them with the default Compute Engine service account.

    kubectl annotate --namespace K8S_NAMESPACE \
      serviceaccount DEMO_SERVER_KSA \
      iam.gke.io/gcp-service-account=${GSA_EMAIL}
    
    kubectl annotate --namespace K8S_NAMESPACE \
      serviceaccount DEMO_CLIENT_KSA \
      iam.gke.io/gcp-service-account=${GSA_EMAIL}
    

Setting up Traffic Director

Use these instructions to deploy Traffic Director, set up a test service, and complete other deployment tasks.

Installing the Envoy sidecar injector in the cluster

Use the instructions in the Traffic Director setup for GKE Pods with automatic Envoy injection to deploy and enable Envoy sidecar injection in your cluster.

Setting up a test service

Use these instructions to set up a test service for your deployment.

wget -q -O -  https://storage.googleapis.com/traffic-director/security/public-preview/service_sample.yaml | sed -e s/DEMO_SERVER_KSA_PLACEHOLDER/DEMO_SERVER_KSA/g > service_sample.yaml

kubectl apply -f service_sample.yaml

The file service_sample.yaml contains the podspec for your demo server application. The file includes some annotations that are specific to Traffic Director security, described in the following sections.

Traffic Director proxy metadata

The podspec specifies the proxyMetadata annotation:

spec:
...
      annotations:
        cloud.google.com/proxyMetadata: '{"app": "payments"}'
...

When the Pod is initialized, the sidecar proxy picks up this annotation and transmits it to Traffic Director. Traffic Director can then use this information to send back filtered configuration:

  • Later in this guide, note that the endpoint policy specifies an endpoint matcher.
  • The endpoint matcher specifies that only clients that present a label with name app and value payments receive the filtered configuration.

Using workload certificates and keys signed by CA Service

The podspec specifies the enableManagedCerts annotation:

spec:
...
      annotations:
        ...
        cloud.google.com/enableManagedCerts: "true"
...

When the Pod is initialized, CA Service signed certificates and keys are automatically mounted on the local sidecar proxy filesystem.

Configuring the inbound traffic interception port

The podspec specifies the includeInboundPorts annotation:

spec:
...
      annotations:
        ...
        cloud.google.com/includeInboundPorts: "8000"
...

This is the port on which your server application listens for connections. When the Pod is initialized, the sidecar proxy picks up this annotation and transmits it to Traffic Director. Traffic Director can then use this information to send back a filtered configuration, which intercepts all incoming traffic to this port and can apply security policies to it.

The health check port must be different from the application port. Otherwise, the same security policies apply to incoming connections on the health check port, which can cause the connections to be refused and the server to be incorrectly marked as unhealthy.
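
As an illustration, the following hypothetical podspec excerpt keeps the two ports separate: the application serves on port 8000, which is listed in includeInboundPorts and is therefore intercepted and secured by Envoy, while health checks target port 8080, which is not intercepted. The container name and health check port here are illustrative and are not values from service_sample.yaml.

spec:
...
      annotations:
        cloud.google.com/includeInboundPorts: "8000"
...
      containers:
      - name: app                    # illustrative name
        ports:
        - containerPort: 8000        # application port, intercepted by Envoy
        - containerPort: 8080        # health check port, not intercepted
        readinessProbe:
          httpGet:
            port: 8080
...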

Configuring GKE services with NEGs

GKE services must be exposed through network endpoint groups (NEGs) so that you can configure them as backends of a Traffic Director backend service. The service_sample.yaml file provided with this setup guide uses the NEG name service-test-neg in the following annotation:

...
metadata:
  annotations:
    cloud.google.com/neg: '{"exposed_ports": {"80":{"name": "service-test-neg"}}}'
spec:
  ports:
  - port: 80
    name: service-test
    protocol: TCP
    targetPort: 8000

You do not need to change the service_sample.yaml file.

Saving the NEG's name

Save the NEG's name in the NEG_NAME variable:

NEG_NAME="service-test-neg"

Deploying a client application to GKE

Run the following command to launch a demonstration client with an Envoy proxy as a sidecar, which you use to demonstrate the security features.

wget -q -O -  https://storage.googleapis.com/traffic-director/security/public-preview/client_sample.yaml | sed -e s/DEMO_CLIENT_KSA_PLACEHOLDER/DEMO_CLIENT_KSA/g > client_sample.yaml

kubectl apply -f client_sample.yaml

The client podspec includes only the enableManagedCerts annotation, which is required to mount the volumes for the GKE-managed workload certificates and keys that are signed by the CA Service instance.

Configuring Traffic Director Google Cloud resources

Follow the steps in Configuring Traffic Director with Cloud Load Balancing components. Make sure to verify that the traffic from the sample client is routed to the sample service.
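
For reference, the following sketch condenses the general shape of those steps, using the resource names (td-gke-health-check, td-gke-service, td-gke-url-map, td-gke-proxy, and td-gke-forwarding-rule) that the cleanup section of this guide assumes. Treat it as a sketch only; the linked guide is authoritative for flags and ordering.

# Sketch only; see Configuring Traffic Director with Cloud Load Balancing
# components for the authoritative steps.
gcloud compute health-checks create http td-gke-health-check \
    --use-serving-port

gcloud compute backend-services create td-gke-service --global \
    --health-checks td-gke-health-check \
    --load-balancing-scheme INTERNAL_SELF_MANAGED

gcloud compute backend-services add-backend td-gke-service --global \
    --network-endpoint-group ${NEG_NAME} \
    --network-endpoint-group-zone ZONE \
    --balancing-mode RATE --max-rate-per-endpoint 5

gcloud compute url-maps create td-gke-url-map \
    --default-service td-gke-service

gcloud compute target-http-proxies create td-gke-proxy \
    --url-map td-gke-url-map

gcloud compute forwarding-rules create td-gke-forwarding-rule --global \
    --load-balancing-scheme=INTERNAL_SELF_MANAGED --address=0.0.0.0 \
    --target-http-proxy=td-gke-proxy --ports 80 --network default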

Traffic Director configuration is complete and you can now configure authentication and authorization policies.

Setting up service-to-service security

Use the instructions in the following sections to set up service-to-service security.

Enabling mTLS in the mesh

To set up mTLS in your mesh, you must secure outbound traffic to the backend service and secure inbound traffic to the endpoint.

Securing outbound traffic to the backend service

To secure outbound traffic, you first create a client TLS policy that does the following:

  • Uses google_cloud_private_spiffe as the plugin for clientCertificate, which programs Envoy to use GKE managed workload certificates as the client identity.
  • Uses google_cloud_private_spiffe as the plugin for serverValidationCa which programs Envoy to use GKE managed workload certificates for server validation.

Next, you attach the client TLS policy to the backend service. This does the following:

  • Applies the authentication policy from the client TLS policy to outbound connections to endpoints of the backend service.
  • The SAN (subject alternative name) values instruct the client to verify the exact identity of the server that it connects to.
  1. Create the client TLS policy in a file client_mtls_policy.yaml:

    name: "client_mtls_policy"
    clientCertificate:
      certificateProviderInstance:
        pluginInstance: google_cloud_private_spiffe
    serverValidationCa:
    - certificateProviderInstance:
        pluginInstance: google_cloud_private_spiffe
    
  2. Import the client TLS policy:

    gcloud beta network-security client-tls-policies import client_mtls_policy \
        --source=client_mtls_policy.yaml --location=global
    
  3. Attach the client TLS policy to the backend service. This enforces mTLS authentication on all outbound requests from the client to this backend service.

    gcloud compute backend-services export td-gke-service \
        --global --destination=demo-backend-service.yaml
    

    Append the following lines to demo-backend-service.yaml:

    securitySettings:
      clientTlsPolicy: projects/PROJECT_ID/locations/global/clientTlsPolicies/client_mtls_policy
      subjectAltNames:
        - "spiffe://PROJECT_ID.svc.id.goog/ns/K8S_NAMESPACE/sa/DEMO_SERVER_KSA"
    
  4. Import the values:

    gcloud compute backend-services import td-gke-service \
        --global --source=demo-backend-service.yaml
    
  5. Optionally, run the following command to check whether the request fails. This failure is expected, because the client expects certificates from the endpoint, but the endpoint is not yet programmed with a security policy.

    # Get the name of the Pod running Busybox.
    BUSYBOX_POD=$(kubectl get po -l run=client -o=jsonpath='{.items[0].metadata.name}')
    
    # Command to execute that tests connectivity to the service service-test.
    TEST_CMD="wget -q -O - service-test; echo"
    
    # Execute the test command on the pod.
    kubectl exec -it $BUSYBOX_POD -c busybox -- /bin/sh -c "$TEST_CMD"
    

    You see output such as this:

    wget: server returned error: HTTP/1.1 503 Service Unavailable
    

Securing inbound traffic to the endpoint

To secure inbound traffic, you first create a server TLS policy that does the following:

  • Uses google_cloud_private_spiffe as the plugin for serverCertificate, which programs Envoy to use GKE managed workload certificates as the server identity.
  • Uses google_cloud_private_spiffe as the plugin for clientValidationCa, which programs Envoy to use GKE managed workload certificates for client validation.
  1. Save the server TLS policy values in a file called server_mtls_policy.yaml.

    name: "server_mtls_policy"
    serverCertificate:
      certificateProviderInstance:
        pluginInstance: google_cloud_private_spiffe
    mtlsPolicy:
      clientValidationCa:
      - certificateProviderInstance:
          pluginInstance: google_cloud_private_spiffe
    
  2. Import the server TLS policy:

    gcloud beta network-security server-tls-policies import server_mtls_policy \
        --source=server_mtls_policy.yaml --location=global
    
  3. Create a file called ep_mtls.yaml that contains the endpoint matcher and attaches the server TLS policy.

    endpointMatcher:
      metadataLabelMatcher:
        metadataLabelMatchCriteria: MATCH_ALL
        metadataLabels:
        - labelName: app
          labelValue: payments
    name: "ep"
    serverTlsPolicy: projects/PROJECT_ID/locations/global/serverTlsPolicies/server_mtls_policy
    type: SIDECAR_PROXY
    
  4. Import the endpoint policy.

    gcloud beta network-services endpoint-policies import ep \
        --source=ep_mtls.yaml --location=global
    
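    Optionally, you can verify the imported endpoint policy; the describe command prints the attached server TLS policy and endpoint matcher:

    gcloud beta network-services endpoint-policies describe ep \
        --location=global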

Validating the setup

Run the following command. If the request succeeds, you see the x-forwarded-client-cert header in the output. The header is printed only when the connection is an mTLS connection.

# Get the name of the Pod running Busybox.
BUSYBOX_POD=$(kubectl get po -l run=client -o=jsonpath='{.items[0].metadata.name}')

# Command to execute that tests connectivity to the service service-test.
TEST_CMD="wget -q -O - service-test; echo"

# Execute the test command on the pod.
kubectl exec -it $BUSYBOX_POD -c busybox -- /bin/sh -c "$TEST_CMD"

You see output such as the following:

GET /get HTTP/1.1
Host: service-test
content-length: 0
x-envoy-internal: true
accept: */*
x-forwarded-for: 10.48.0.6
x-envoy-expected-rq-timeout-ms: 15000
user-agent: curl/7.35.0
x-forwarded-proto: http
x-request-id: redacted
x-forwarded-client-cert: By=spiffe://PROJECT_ID.svc.id.goog/ns/K8S_NAMESPACE/sa/DEMO_SERVER_KSA;Hash=Redacted;Subject="Redacted;URI=spiffe://PROJECT_ID.svc.id.goog/ns/K8S_NAMESPACE/sa/DEMO_CLIENT_KSA

Note that the x-forwarded-client-cert header is inserted by the server-side Envoy and contains its own (server) identity and the identity of the source client. Because both the client and server identities appear, this is a signal of an mTLS connection.

Configuring service-level access with an authorization policy

These instructions create an authorization policy that allows requests sent by the DEMO_CLIENT_KSA account when the hostname is service-test, the port is 8000, and the HTTP method is GET.

  1. Create an authorization policy by creating a file called authz_policy.yaml.

    action: ALLOW
    name: authz_policy
    rules:
    - sources:
      - principals:
        - spiffe://PROJECT_ID.svc.id.goog/ns/K8S_NAMESPACE/sa/DEMO_CLIENT_KSA
      destinations:
      - hosts:
        - service-test
        ports:
        - 8000
        methods:
        - GET
    
  2. Import the policy:

    gcloud beta network-security authorization-policies import authz_policy \
      --source=authz_policy.yaml \
      --location=global
    
  3. Update the endpoint policy to reference the new authorization policy by appending the following to the file ep_mtls.yaml:

    authorizationPolicy: projects/PROJECT_ID/locations/global/authorizationPolicies/authz_policy
    

    The endpoint policy now specifies that both mTLS and the authorization policy must be enforced on inbound requests to Pods whose Envoy sidecar proxies present the label app:payments.

  4. Import the policy:

    gcloud beta network-services endpoint-policies import ep \
        --source=ep_mtls.yaml --location=global
    

Validating the setup

Run the following commands to validate the setup.

# Get the name of the Pod running Busybox.
BUSYBOX_POD=$(kubectl get po -l run=client -o=jsonpath='{.items[0].metadata.name}')

# Command to execute that tests connectivity to the service service-test.
# This is a valid request and will be allowed.
TEST_CMD="wget -q -O - service-test; echo"

# Execute the test command on the pod.
kubectl exec -it $BUSYBOX_POD -c busybox -- /bin/sh -c "$TEST_CMD"

The expected output is similar to this:

GET /get HTTP/1.1
Host: service-test
content-length: 0
x-envoy-internal: true
accept: */*
x-forwarded-for: redacted
x-envoy-expected-rq-timeout-ms: 15000
user-agent: curl/7.35.0
x-forwarded-proto: http
x-request-id: redacted
x-forwarded-client-cert: By=spiffe://PROJECT_ID.svc.id.goog/ns/K8S_NAMESPACE/sa/DEMO_SERVER_KSA;Hash=Redacted;Subject="Redacted;URI=spiffe://PROJECT_ID.svc.id.goog/ns/K8S_NAMESPACE/sa/DEMO_CLIENT_KSA

Run the following commands to test whether the authorization policy is correctly refusing invalid requests:

# Failure case
# Command to execute that tests connectivity to the service service-test.
# This request is invalid and the server rejects it, because the server's
# authorization policy only allows GET requests.
TEST_CMD="wget -q -O - service-test --post-data='' ; echo"

# Execute the test command on the pod.
kubectl exec -it $BUSYBOX_POD -c busybox -- /bin/sh -c "$TEST_CMD"

The expected output is similar to this:

<RBAC: access denied HTTP/1.1 403 Forbidden>

Setting up ingress gateway security

This section assumes that you completed the service-to-service security section, including setting up your GKE cluster with the sidecar auto-injector, creating a certificate authority, and creating an endpoint policy.

In this section, you deploy an Envoy proxy as an ingress gateway that terminates TLS connections and authorizes requests from a cluster's internal clients.

Figure: Terminating TLS at an ingress gateway

To set up an ingress gateway to terminate TLS, do the following:

  1. Deploy a Kubernetes service that is reachable at a cluster-internal IP address. The deployment consists of a standalone Envoy proxy that is exposed as a Kubernetes service and connects to Traffic Director.
  2. Create a server TLS policy to terminate TLS.
  3. Create an authorization policy to authorize incoming requests.

Deploying an ingress gateway service to GKE

Run the following command to deploy the ingress gateway service on GKE:

wget -q -O -  https://storage.googleapis.com/traffic-director/security/public-preview/gateway_sample_xdsv3.yaml | sed -e s/PROJECT_NUMBER_PLACEHOLDER/PROJNUM/g | sed -e s/NETWORK_PLACEHOLDER/default/g | sed -e s/DEMO_CLIENT_KSA_PLACEHOLDER/DEMO_CLIENT_KSA/g > gateway_sample.yaml

kubectl apply -f gateway_sample.yaml

The file gateway_sample.yaml is the spec for the ingress gateway. The following sections describe some additions to the spec.

Disabling Traffic Director sidecar injection

The gateway_sample.yaml spec deploys an Envoy proxy as the sole container. In previous steps, Envoy was injected as a sidecar to an application container. To avoid having multiple Envoys handle requests, you can disable sidecar injection for this Kubernetes service using the following statement:

sidecar.istio.io/inject: "false"
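
In a Deployment spec, this annotation sits under the Pod template's metadata. A minimal sketch of the placement, with the surrounding fields elided as in the other excerpts in this guide:

spec:
...
  template:
    metadata:
      annotations:
        # Prevent the sidecar injector from adding a second Envoy.
        sidecar.istio.io/inject: "false"
...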

Mounting the correct volume

The gateway_sample.yaml spec mounts the volume gke-workload-certificates. This volume is also used in the sidecar deployments, where the sidecar injector adds it automatically when it sees the annotation cloud.google.com/enableManagedCerts: "true". The gke-workload-certificates volume contains the GKE-managed SPIFFE certificates and keys that are signed by the CA Service instance that you set up.

Setting the cluster's internal IP address

Configure the ingress gateway with a Kubernetes service of type ClusterIP. This creates an internally resolvable DNS hostname for mesh-gateway. When a client sends a request to mesh-gateway:443, Kubernetes routes the request to the ingress gateway Envoy deployment's port 8080.
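
A minimal sketch of such a Service follows; it maps port 443 to the Envoy deployment's port 8080. The selector label is illustrative; gateway_sample.yaml defines the actual labels.

apiVersion: v1
kind: Service
metadata:
  name: mesh-gateway
spec:
  type: ClusterIP          # cluster-internal IP address
  selector:
    app: mesh-gateway      # illustrative; match the labels in gateway_sample.yaml
  ports:
  - port: 443              # clients connect to mesh-gateway:443
    targetPort: 8080       # Envoy listens on 8080
    protocol: TCP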

Optionally, if you want to use an external IP address, change the type to LoadBalancer and then use this command to extract the externally-accessible IP address:

# Optional. Only do this if you intend to use an external (reachable from
# outside your GKE cluster) IP address. Shell variable names cannot contain
# hyphens, so the address is stored in MESH_GATEWAY_IP.
export MESH_GATEWAY_IP=$(kubectl get svc mesh-gateway -o jsonpath="{.status.loadBalancer.ingress[0].ip}")

Enabling TLS on an ingress gateway

Use these instructions to enable TLS on an ingress gateway.

  1. Create a server TLS policy resource to terminate TLS connections, with the values in a file called server_tls_policy.yaml:

    description: tls server policy
    name: server_tls_policy
    serverCertificate:
      certificateProviderInstance:
        pluginInstance: google_cloud_private_spiffe
    
  2. Import the server TLS policy:

    gcloud beta network-security server-tls-policies import server_tls_policy \
        --source=server_tls_policy.yaml --location=global
    
  3. Create a new URL map that routes all requests to the td-gke-service backend service. The ingress gateway handles incoming requests and sends them to Pods belonging to the td-gke-service backend service.

    gcloud compute url-maps create td-gke-ig-url-map \
       --default-service=td-gke-service
    
  4. Create a new target HTTPS proxy in the file td-gke-https-proxy.yaml and attach the previously created URL map and server TLS policy. This configures the Envoy proxy ingress gateway to terminate incoming TLS traffic.

    kind: compute#targetHttpsProxy
    name: td-gke-https-proxy
    proxyBind: true
    urlMap: https://www.googleapis.com/compute/beta/projects/PROJECT_ID/global/urlMaps/td-gke-ig-url-map
    serverTlsPolicy: projects/PROJECT_ID/locations/global/serverTlsPolicies/server_tls_policy
    
  5. Import the policy:

    gcloud compute target-https-proxies import td-gke-https-proxy \
       --global --source=td-gke-https-proxy.yaml
    
  6. Create a new forwarding rule and attach the target HTTPS proxy. This configures the Envoy proxy to listen on port 8080 and apply the routing and security policies defined in td-gke-https-proxy.

    gcloud compute forwarding-rules create td-gke-gateway-forwarding-rule --global \
      --load-balancing-scheme=INTERNAL_SELF_MANAGED --address=0.0.0.0 \
      --target-https-proxy=td-gke-https-proxy --ports 8080 \
      --network default
    
  7. Optionally, update the authorization policy on the backends to allow requests when all of the following conditions are met:

    • Requests sent by DEMO_CLIENT_KSA. (The ingress gateway deployment uses the DEMO_CLIENT_KSA service account.)
    • Requests with host mesh-gateway or service-test
    • Port: 8000

    You do not need to run these commands unless you configured an authorization policy for your backends. If there is no authorization policy on the endpoint, or if the authorization policy does not match on host or source principal, requests are allowed without this step. Add these values to authz_policy.yaml.

    action: ALLOW
    name: authz_policy
    rules:
    - sources:
      - principals:
        - spiffe://PROJECT_ID.svc.id.goog/ns/K8S_NAMESPACE/sa/DEMO_CLIENT_KSA
      destinations:
      - hosts:
        - service-test
        - mesh-gateway
        ports:
        - 8000
        methods:
        - GET
    
  8. Import the policy:

    gcloud beta network-security authorization-policies import authz_policy \
      --source=authz_policy.yaml \
      --location=global
    

Validating the ingress gateway deployment

You use a new container called debug to send requests to the ingress gateway to validate the deployment.

In the following spec, the annotation "sidecar.istio.io/inject":"false" prevents the Traffic Director sidecar injector from automatically injecting a sidecar proxy. Because there is no sidecar to route its requests, the debug container must connect to the ingress gateway directly.

The command includes the --no-check-certificate flag, which skips server certificate validation. The debug container does not have the certificate authority certificates that are necessary to validate the certificates signed by CA Service that the ingress gateway uses to terminate TLS.

In a production environment, we recommend that you download the CA Service validation certificate and mount or install it on your client. After you install the validation certificate, remove the --no-check-certificate option of the wget command.
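
For example, one way to do this is to export the root CA certificate that you created earlier and pass it to wget. This is a sketch only: it assumes that your gcloud version exposes the pemCaCertificates field on the describe output, and that the client uses GNU wget (BusyBox wget does not support --ca-certificate).

# Export the root CA certificate. The field name may vary by gcloud
# version; inspect the describe output if this prints nothing.
gcloud beta privateca roots describe ROOT_CA_NAME \
    --location ROOT_CA_LOCATION \
    --format="value(pemCaCertificates)" > root_ca_cert.pem

# Validate the gateway certificate instead of skipping the check.
wget --ca-certificate=root_ca_cert.pem -qS -O - https://mesh-gateway; echo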

Run the following command:

kubectl run -i --tty --rm debug --image=busybox --restart=Never  --overrides='{ "metadata": {"annotations": { "sidecar.istio.io/inject":"false" } } }'  -- /bin/sh -c "wget --no-check-certificate -qS -O - https://mesh-gateway; echo"

You see output similar to this:

GET / HTTP/1.1
Host: 10.68.7.132
x-forwarded-client-cert: By=spiffe://PROJECT_ID.svc.id.goog/ns/K8S_NAMESPACE/sa/DEMO_SERVER_KSA;Hash=Redacted;Subject="Redacted;URI=spiffe://PROJECT_ID.svc.id.goog/ns/K8S_NAMESPACE/sa/DEMO_CLIENT_KSA
x-envoy-expected-rq-timeout-ms: 15000
x-envoy-internal: true
x-request-id: 5ae429e7-0e18-4bd9-bb79-4e4149cf8fef
x-forwarded-for: 10.64.0.53
x-forwarded-proto: https
content-length: 0
user-agent: Wget

Run the following negative test command:

# Negative test
# Expect this to fail because gateway expects TLS.
kubectl run -i --tty --rm debug --image=busybox --restart=Never  --overrides='{ "metadata": {"annotations": { "sidecar.istio.io/inject":"false" } } }'  -- /bin/sh -c "wget --no-check-certificate -qS -O - http://mesh-gateway:443/headers; echo"

You see output similar to the following:

wget: error getting response: Connection reset by peer

Run the following negative test command:

# Negative test.
# The authorization policy applied on the endpoints expects a GET request.
# Otherwise, the request is denied.
kubectl run -i --tty --rm debug --image=busybox --restart=Never  --overrides='{ "metadata": {"annotations": { "sidecar.istio.io/inject":"false" } } }'  -- /bin/sh -c "wget --no-check-certificate -qS -O - https://mesh-gateway --post-data=''; echo"

You see output similar to the following:

HTTP/1.1 403 Forbidden
wget: server returned error: HTTP/1.1 403 Forbidden

Setting up an authorization policy for the ingress gateway

The authorization policy that you set up here allows requests into the mesh through the ingress gateway when all of the following conditions are met:

  • Host: mesh-gateway
  • Port: 8080
  • Path: *
  • HTTP method: GET
  1. Create an authorization policy in the file authz_gateway_policy.yaml:

    action: ALLOW
    name: authz_gateway_policy
    rules:
    - destinations:
      - hosts:
        - mesh-gateway
        ports:
        - 8080
        methods:
        - GET
    
  2. Import the values in the file:

    gcloud beta network-security authorization-policies import authz_gateway_policy \
       --source=authz_gateway_policy.yaml  --location=global
    
  3. Edit the file td-gke-https-proxy.yaml by adding the following line to it:

    authorizationPolicy: projects/PROJECT_ID/locations/global/authorizationPolicies/authz_gateway_policy
    
  4. Import the file td-gke-https-proxy.yaml again:

    gcloud compute target-https-proxies import td-gke-https-proxy \
       --global --source=td-gke-https-proxy.yaml
    

Validating the deployment

Run the following command to validate your deployment.

# On your localhost.
kubectl run -i --tty --rm debug --image=busybox --restart=Never  --overrides='{ "metadata": {"annotations": { "sidecar.istio.io/inject":"false" } } }'  -- /bin/sh -c "wget --no-check-certificate -qS -O - https://mesh-gateway; echo"

You see output similar to the following:

GET / HTTP/1.1
Host: 35.196.50.2
x-forwarded-client-cert: By=spiffe://PROJECT_ID.svc.id.goog/ns/K8S_NAMESPACE/sa/DEMO_SERVER_KSA;Hash=Redacted;Subject="Redacted;URI=spiffe://PROJECT_ID.svc.id.goog/ns/K8S_NAMESPACE/sa/DEMO_CLIENT_KSA
x-envoy-expected-rq-timeout-ms: 15000
user-agent: curl/7.72.0
x-forwarded-proto: https
content-length: 0
x-envoy-internal: true
x-request-id: 98bec135-6df8-4082-8edc-b2c23609295a
accept: */*
x-forwarded-for: 10.142.0.7

Run the following negative test command:

# Negative test. Expect failure because only the GET method is allowed by
# authz_gateway_policy, and this request uses POST.
kubectl run -i --tty --rm debug --image=busybox --restart=Never  --overrides='{ "metadata": {"annotations": { "sidecar.istio.io/inject":"false" } } }'  -- /bin/sh -c "wget --no-check-certificate -qS -O - https://mesh-gateway/ --post-data=''; echo"

You see output similar to the following:

wget: server returned error: HTTP/1.1 403 Forbidden

Deleting the deployment

You can optionally run these commands to delete the deployment you created using this guide.

To delete the cluster, run this command:

gcloud container clusters delete CLUSTER_NAME --zone ZONE --quiet

To delete the resources you created, run these commands:

gcloud compute forwarding-rules delete td-gke-forwarding-rule --global --quiet
gcloud compute forwarding-rules delete td-gke-gateway-forwarding-rule --global \
    --quiet
gcloud compute target-http-proxies delete td-gke-proxy  --quiet
gcloud compute target-https-proxies delete td-gke-https-proxy  --quiet
gcloud compute target-https-proxies delete td-gke-https-proxy-authz  --quiet
gcloud compute url-maps delete td-gke-url-map  --quiet
gcloud compute url-maps delete td-gke-ig-url-map  --quiet
gcloud compute backend-services delete td-gke-service --global --quiet
gcloud compute network-endpoint-groups delete service-test-neg --zone ZONE --quiet
gcloud compute firewall-rules delete fw-allow-health-checks --quiet
gcloud compute health-checks delete td-gke-health-check --quiet
gcloud beta network-services endpoint-policies delete ep \
    --location=global --quiet
gcloud beta network-security authorization-policies delete authz_gateway_policy \
   --location=global --quiet
gcloud beta network-security authorization-policies delete authz_policy \
    --location=global --quiet
gcloud beta network-security client-tls-policies delete client_mtls_policy \
    --location=global --quiet
gcloud beta network-security server-tls-policies delete server_tls_policy \
    --location=global --quiet
gcloud beta network-security server-tls-policies delete server_mtls_policy \
    --location=global --quiet

Troubleshooting

This section contains information on how to fix issues that you might encounter during service security setup.

Connection failures

If the connection fails with an upstream connect error or disconnect/reset before headers error, examine the Envoy logs, where you might see one of the following log messages:

gRPC config stream closed: 5, Requested entity was not found

gRPC config stream closed: 2, no credential token is found

If you see these errors in the Envoy log, it is likely that the service account token is mounted incorrectly, that it is using a different audience, or both.
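
To look for these messages, pull the logs from the Envoy sidecar container. This is a sketch that assumes the injected sidecar container is named envoy; list the Pod's containers first to confirm the actual name.

# List the containers in the client Pod to find the sidecar's name.
BUSYBOX_POD=$(kubectl get po -l run=client -o=jsonpath='{.items[0].metadata.name}')
kubectl get pod $BUSYBOX_POD -o jsonpath='{.spec.containers[*].name}'; echo

# Fetch the sidecar logs (container name assumed here).
kubectl logs $BUSYBOX_POD -c envoy | grep "gRPC config stream closed"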

For more information, see Error messages in the Envoy logs indicate a configuration problem.

Pods not created

To troubleshoot this issue, see Troubleshooting automatic deployments for GKE Pods.

Envoy not authenticating with Traffic Director

When Envoy (envoy-proxy) connects to Traffic Director to fetch the xDS configuration, it uses Workload Identity (WI) and the Compute Engine VM default service account (unless the bootstrap was changed). If the authentication fails, Envoy does not reach the ready state.

Unable to create a cluster with --workload-identity-certificate-authority flag

If you see this error, make sure that you're running the most recent version of the gcloud command-line tool:

gcloud components update

Pods stay in a pending state

If the Pods stay in a pending state during the setup process, increase the CPU and memory resources for the Pods in your deployment spec.

Unable to create cluster with the --enable-mesh-certificates flag

Ensure that you are running the latest version of the gcloud tool:

gcloud components update

Note that the --enable-mesh-certificates flag works only with gcloud beta.

Pods don't start

Pods that use GKE workload certificates might fail to start if certificate provisioning is failing. This can happen in situations like the following:

  • The WorkloadCertificateConfig or the TrustConfig is misconfigured or missing.
  • CSRs aren't being approved.

To check whether certificate provisioning is failing, inspect the Pod events.

  1. Check the status of your Pod:

    kubectl get pod -n POD_NAMESPACE POD_NAME
    

    Replace the following:

    • POD_NAMESPACE: the namespace of your Pod.
    • POD_NAME: the name of your Pod.
  2. Check recent events for your Pod:

    kubectl describe pod -n POD_NAMESPACE POD_NAME
    
  3. If certificate provisioning is failing, you will see an event with Type=Warning, Reason=FailedMount, From=kubelet, and a Message field that begins with MountVolume.SetUp failed for volume "gke-workload-certificates". The Message field contains troubleshooting information.

    Events:
      Type     Reason       Age                From       Message
      ----     ------       ----               ----       -------
      Warning  FailedMount  13s (x7 over 46s)  kubelet    MountVolume.SetUp failed for volume "gke-workload-certificates" : rpc error: code = Internal desc = unable to mount volume: store.CreateVolume, err: unable to create volume "csi-4d540ed59ef937fbb41a9bf5380a5a534edb3eedf037fe64be36bab0abf45c9c": caPEM is nil (check active WorkloadCertificateConfig)
    
  4. See the following troubleshooting steps if your Pods don't start because of misconfigured objects or rejected CSRs.

WorkloadCertificateConfig or TrustConfig is misconfigured

Ensure that you created the WorkloadCertificateConfig and TrustConfig objects correctly. You can diagnose misconfigurations on either of these objects using kubectl.

  1. Retrieve the current status.

    For WorkloadCertificateConfig:

    kubectl get WorkloadCertificateConfig default -o yaml
    

    For TrustConfig:

    kubectl get TrustConfig default -o yaml
    
  2. Inspect the status output. A valid object will have a condition with type: Ready and status: "True".

    status:
      conditions:
      - lastTransitionTime: "2021-03-04T22:24:11Z"
        message: WorkloadCertificateConfig is ready
        observedGeneration: 1
        reason: ConfigReady
        status: "True"
        type: Ready
    

    For invalid objects, status: "False" appears instead. The reason and message fields contain additional troubleshooting details.

CSRs are not approved

If something goes wrong during the CSR approval process, you can check the error details in the type: Approved and type: Issued conditions of the CSR.

  1. List relevant CSRs using kubectl:

    kubectl get csr \
      --field-selector='spec.signerName=spiffe.gke.io/spiffe-leaf-signer'
    
  2. Choose a CSR that is either Approved and not Issued, or is not Approved.

  3. Get details for the selected CSR using kubectl:

    kubectl get csr CSR_NAME -o yaml
    

    Replace CSR_NAME with the name of the CSR you chose.

A valid CSR has a condition with type: Approved and status: "True", and a valid certificate in the status.certificate field:

status:
  certificate: <base64-encoded data>
  conditions:
  - lastTransitionTime: "2021-03-04T21:58:46Z"
    lastUpdateTime: "2021-03-04T21:58:46Z"
    message: Approved CSR because it is a valid SPIFFE SVID for the correct identity.
    reason: AutoApproved
    status: "True"
    type: Approved

Troubleshooting information for invalid CSRs appears in the message and reason fields.

Pods are missing certificates

  1. Get the Pod spec for your Pod:

    kubectl get pod -n POD_NAMESPACE POD_NAME -o yaml
    

    Replace the following:

    • POD_NAMESPACE: the namespace of your Pod.
    • POD_NAME: the name of your Pod.
  2. Verify that the Pod spec contains the security.cloud.google.com/use-workload-certificates annotation described in Configure Pods to receive mTLS credentials.

  3. Verify that the GKE workload certificates admission controller successfully injected a CSI driver volume of type workloadcertificates.security.cloud.google.com into your Pod spec:

    volumes:
    ...
    - csi:
        driver: workloadcertificates.security.cloud.google.com
      name: gke-workload-certificates
    ...
    
  4. Check for the presence of a volume mount in each of the containers:

    containers:
    - name: ...
      ...
      volumeMounts:
      - mountPath: /var/run/secrets/workload-spiffe-credentials
        name: gke-workload-certificates
        readOnly: true
      ...
    
  5. Verify that the certificate bundles and the private key are available at the following locations in the Pod:

    • Certificate chain bundle: /var/run/secrets/workload-spiffe-credentials/certificates.pem
    • Private key: /var/run/secrets/workload-spiffe-credentials/private_key.pem
    • CA trust anchor bundle: /var/run/secrets/workload-spiffe-credentials/ca_certificates.pem
  6. If the files are not available, perform the following steps:

    1. Retrieve the CA Service (Preview) instance for the cluster:

      kubectl get workloadcertificateconfigs default -o jsonpath='{.spec.certificateAuthorityConfig.certificateAuthorityServiceConfig.endpointURI}'
      
    2. Retrieve the status of the CA Service (Preview) instance:

      gcloud privateca ISSUING_CA_TYPE describe ISSUING_CA_NAME \
        --location ISSUING_CA_LOCATION
      

      Replace the following:

      • ISSUING_CA_TYPE: the issuing CA type, which must be either subordinates or roots.
      • ISSUING_CA_NAME: the name of the issuing CA.
      • ISSUING_CA_LOCATION: the region of the issuing CA.
    3. Get the IAM policy for the root CA:

      gcloud privateca roots get-iam-policy ROOT_CA_NAME
      

      Replace ROOT_CA_NAME with the name of your root CA.

    4. In the IAM policy, verify that the privateca.auditor policy binding exists:

      ...
      - members:
        - serviceAccount:service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com
        role: roles/privateca.auditor
      ...
      

      In this example, PROJECT_NUMBER is your cluster's project number.

    5. Get the IAM policy for the subordinate CA:

      gcloud privateca subordinates get-iam-policy SUBORDINATE_CA_NAME
      

      Replace SUBORDINATE_CA_NAME with the subordinate CA name.

    6. In the IAM policy, verify that the privateca.certificateManager policy binding exists:

      ...
      - members:
        - serviceAccount: service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com
        role: roles/privateca.certificateManager
      ...
      

      In this example, PROJECT_NUMBER is your cluster's project number.

Applications cannot use issued mTLS credentials

  1. Verify that the certificate has not expired:

    cat /var/run/secrets/workload-spiffe-credentials/certificates.pem | openssl x509 -text -noout | grep "Not After"
    
  2. Check that the key type you used is supported by your application.

    cat /var/run/secrets/workload-spiffe-credentials/certificates.pem | openssl x509 -text -noout | grep "Public Key Algorithm" -A 3
    
  3. Check that the issuing CA uses the same key family as the certificate key.

    1. Get the status of the CA Service (Preview) instance:

      gcloud privateca ISSUING_CA_TYPE describe ISSUING_CA_NAME \
        --location ISSUING_CA_LOCATION
      

      Replace the following:

      • ISSUING_CA_TYPE: the issuing CA type, which must be either subordinates or roots.
      • ISSUING_CA_NAME: the name of the issuing CA.
      • ISSUING_CA_LOCATION: the region of the issuing CA.
    2. Check that the keySpec.algorithm in the output is the same key algorithm you defined in the WorkloadCertificateConfig YAML manifest. The output looks like this:

      config:
        ...
        subjectConfig:
          commonName: td-sub-ca
          subject:
            organization: TestOrgLLC
          subjectAltName: {}
      createTime: '2021-05-04T05:37:58.329293525Z'
      issuingOptions:
        includeCaCertUrl: true
      keySpec:
        algorithm: RSA_PKCS1_2048_SHA256
       ...
      

Certificates get rejected

  1. Verify that the peer application uses the same trust bundle to verify the certificate.
  2. Verify that the certificate has not expired:

    cat /var/run/secrets/workload-spiffe-credentials/certificates.pem | openssl x509 -text -noout | grep "Not After"
    
  3. Verify that the client code, if not using the gRPC Go Credentials Reloading API, periodically refreshes the credentials from the filesystem.

  4. Verify that your workloads are in the same trust domain as your CA. GKE workload certificates supports communication between workloads in a single trust domain.