Explore Anthos

The Anthos Sample Deployment on Google Cloud (Preview) is a Google Cloud Marketplace solution that you can preview now. It deploys a real Anthos hands-on environment with a GKE cluster, service mesh, and an application with multiple microservices. This tutorial guides you through these features, letting you learn about and explore Anthos deployed on Google Cloud by following the Anthos story of a fictional online retailer.

If you want to learn more about Anthos and its components first, see our technical overview. However, you don't need to be familiar with Anthos to follow this tutorial. You should be familiar with basic Kubernetes concepts such as clusters; if you're not, see Kubernetes basics, the Google Kubernetes Engine (GKE) documentation, and Preparing an application for Anthos Service Mesh.

When you're ready for a real production installation, see our Setup section.

When you finish this tutorial, please complete our survey.

Your journey

You are the platform lead at the Online Boutique, an online retailer of trendy lifestyle products. Online Boutique started as a small business running an e-commerce website on two servers almost ten years ago. Since then, it has grown into a successful national enterprise with thousands of employees and a growing engineering organization. Online Boutique now wants to expand globally.

Throughout this period, you and your team have found yourselves spending more time and money on maintaining infrastructure than on creating new business value. You have decades of cumulative experience invested in your existing stack; however, you know it's not the right technology to meet the scale of global deployment that your company needs as it expands.

Your team adopted Kubernetes a year ago, and since then has been running an on-premises cluster with a modest number of services. Your deployments still involve a number of manual stages, however, and this process has unfortunately resulted in occasional outages due to differences between developer and production environments.

Your development team wants the confidence to be able to deploy more frequently without worrying that their changes may cause problems elsewhere in the application.

You've adopted Anthos to modernize your application and migrate successfully to the cloud to achieve your expansion goals.

Objectives

In this tutorial, you're introduced to some of the key features of Anthos through the following tasks:

  • Deploy your Anthos environment with clusters, applications, and Anthos components: Anthos Service Mesh and Anthos Config Management.

  • Use the Google Cloud Console to explore the Anthos GKE resources used by your application.

  • Use Anthos Service Mesh to observe application services.

  • Use a service level objective (SLO) to monitor for unexpected behavior.

  • Enforce mutual TLS (mTLS) in your service mesh by using Anthos Config Management to ensure end-to-end secure communication.

  • Set up a security guardrail that ensures that pods with privileged containers are not inadvertently deployed.

  • Clean up the Anthos Sample Deployment environment.

Costs

The Anthos Sample Deployment on Google Cloud is not intended for production use. Anthos for non-production use, including exploring the Anthos Sample Deployment on Google Cloud, is free for up to 100 vCPUs until August 31, 2020.

On September 1, 2020 (or thirty days after first enabling the Anthos API, whichever comes later), you will be converted to pay-as-you-go and be charged the fees for Anthos on Google Cloud listed at Google Cloud Platform SKUs unless you have an Anthos subscription. You are responsible for other costs, such as Compute Engine, Cloud Logging, and Cloud Monitoring.

Before you begin

The Anthos Sample Deployment on Google Cloud requires that you use a new project with no existing resources.

The following additional project requirements apply:

  • You must have enough quota in the target deployment project and zone for at least 7 vCPUs, 24.6 GB of memory, 310 GB of disk space, one VPC, two firewall rules, and one Cloud NAT.
  • Your organization does not have a policy that explicitly restricts the use of click-to-deploy images.

Before you start the tutorial:

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. In the Cloud Console, on the project selector page, select or create a Cloud project.

    Go to the project selector page

  3. Make sure that billing is enabled for your project.

    Learn how to enable billing

  4. Ensure that the Service Management API is enabled.

    Enable Service Management API

Then do the following to ensure that your project meets the requirements for running the Anthos Sample Deployment:

  1. In your new project, launch Cloud Shell by clicking Activate Cloud Shell in the top toolbar.

    Cloud Shell is an interactive shell environment for Google Cloud that lets you manage your projects and resources from your web browser. You will use Cloud Shell again later in this tutorial to update your application's configuration.

  2. Configure Cloud Shell with the target deployment zone, replacing ZONE in the following command:

    gcloud config set compute/zone ZONE
    
  3. Enter the following command to run a script that checks that your project meets the necessary requirements:

    curl -sL https://github.com/GoogleCloudPlatform/anthos-sample-deployment/releases/latest/download/asd-prereq-checker.sh | sh -
    

    Output (example):

    Your active configuration is: [cloudshell-4100]
    Checking project my-project-id, region us-central1, zone us-central1-c
    
    PASS: User has permission to create service account with the required IAM policies.
    PASS: Org Policy will allow this deployment.
    PASS: Service Management API is enabled.
    PASS: Anthos Sample Deployment does not already exist.
    PASS: Project ID is valid, does not contain colon.
    PASS: Project has sufficient quota to support this deployment.
    

If anything doesn't PASS, see our troubleshooting guide. If you don't fix these errors, you might not be able to deploy the sample.
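If you prefer to verify one of these checks by hand, the "Service Management API is enabled" PASS line corresponds roughly to the following gcloud query (a sketch; the filter expression is an assumption about how the enabled service is named):

```shell
# Confirm that the Service Management API is enabled in the active project;
# the command prints a matching row only if the API is on.
gcloud services list --enabled \
  --filter="name:servicemanagement.googleapis.com"
```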

What's deployed?

The Anthos Sample Deployment on Google Cloud provisions your project with the following:

  • One GKE cluster running on Google Cloud: anthos-sample-cluster1.

  • Anthos Service Mesh installed on the cluster. You will use Anthos Service Mesh to manage the service mesh on anthos-sample-cluster1.

  • Online Boutique application running on the cluster. This is a web-based e-commerce app that uses a number of microservices written in various programming languages, including Java, Go, Python, and JavaScript.

  • A single Compute Engine instance (virtual machine) that performs a number of automated tasks to jump-start the tutorial environment after the cluster is created: asd-jump-server.

  • A VPC with a subnetwork within the target deployment region for the GKE cluster and Compute Engine instance. A Cloud NAT gateway on a Cloud Router, and firewall rules for connectivity to and between the deployment's components.

Launch the Anthos Sample Deployment on Google Cloud

Launch the Anthos Sample Deployment on Google Cloud through the Cloud Marketplace:

  1. Open the Anthos Sample Deployment on Google Cloud.

    Go to the Anthos Sample Deployment on Google Cloud

  2. Select and confirm the Google Cloud project to use. This should be the project that you created in the Before you begin section.

  3. Click LAUNCH. It can take several minutes to progress to the deployment configuration screen while the solution enables a few APIs.

  4. (Optional) In the deployment configuration screen, specify your chosen deployment name, zone, and Service Account. However, for your first deployment, we recommend that you accept all of the provided default values, including creating a new Service Account.

  5. Click Deploy. Deploying the trial can take up to 15 minutes, so don't be concerned if you have to wait for a while.

While the deployment is progressing, the Cloud Console transitions to the Deployment Manager view. After the sample is deployed, you can review the full deployment. You should see a list of all enabled resources, including one GKE cluster (anthos-sample-cluster1) and one Compute Engine instance (asd-jump-server).

If you encounter any deployment errors, see our troubleshooting guide.

Using the Anthos Dashboard

Anthos provides an out-of-the-box structured view of all your applications' resources, including clusters, services, and workloads, giving you an at-a-glance view of your resources at a high level, while letting you drill down when necessary to find the low-level information that you need. To see your deployment's top-level dashboard, go to your project's Anthos Dashboard in the Google Cloud Console.

Go to the Anthos Dashboard

You should see:

  • A Service mesh section that tells you that you have 11 services, along with a note that action is needed before you can see their health. You'll find out more about what this means later in the tutorial.

  • A Cluster status section that tells you that you have one healthy GKE cluster.

Screenshot of Anthos Dashboard

Explore Anthos GKE resources

The Anthos Clusters page shows you all the clusters in your project registered to Anthos, including clusters outside Google Cloud. You can also use the Google Kubernetes Engine Clusters page to see all the clusters in your project. In fact, the Anthos Clusters page lets you drill down to the GKE pages if you need to see more cluster and node details.

In this section, you'll take a closer look at the Online Boutique's GKE resources.

Cluster management

  1. In the Google Cloud Console, go to the Anthos Clusters page.

    Go to the Clusters page

  2. Click the anthos-sample-cluster1 cluster to view its basic details in the right pane, including its Type, Master version, and Location. You can also see which Anthos features are enabled in this cluster in the Cluster features section.

  3. For more detailed information about this cluster, click More details in GKE. This brings you to the cluster's page in the Google Kubernetes Engine console, with all the current settings for the cluster.

  4. In the Google Kubernetes Engine console, click the Nodes tab to view all the worker machines in your cluster. From here, you can drill down even further to see the workload Pods running on each node, as well as a resource summary of the node (CPU, memory, storage).

You can find out more about GKE clusters and nodes in the GKE documentation.

Cluster workloads

The Google Kubernetes Engine console has a Workloads view that shows an aggregated view of the workloads (Pods) running on all your GKE clusters.

The view shows workloads grouped by cluster and namespace. For example, workloads in the onlineboutique namespace are running in anthos-sample-cluster1.
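The same workloads can also be listed from Cloud Shell (this assumes your kubeconfig already points at anthos-sample-cluster1, which the initialization script later in this tutorial sets up for you):

```shell
# List the Online Boutique Pods running in the onlineboutique namespace:
kubectl get pods -n onlineboutique
```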

Services & Ingress

Finally, the Services & Ingress view shows the project's Service and Ingress resources. A Service exposes a set of Pods as a network service with an endpoint, while an Ingress manages external access to the services in a cluster. However, rather than a regular Kubernetes Ingress, the Online Boutique uses an ingress gateway service for traffic to the shop, which lets Anthos Service Mesh apply more sophisticated routing to inbound traffic. You can see this in action when you use the service mesh observability features later in this tutorial.

  1. In the Google Kubernetes Engine console, go to the Services & Ingress page.

    Go to the Services & Ingress page

  2. To find the Online Boutique ingress gateways, scroll down the list of available services to find the service with the name istio-ingressgateway.

  3. Select the ingress gateway service for anthos-sample-cluster1 in the list to open its Service details view, which shows more information about the service including all of its external endpoints. An ingress gateway manages inbound traffic for your application service mesh, so in this case we can use its details to visit the Online Boutique's web frontend.

  4. In the Service details view for istio-ingressgateway, click the external endpoint using port 80. You should be able to explore the Online Boutique web interface.
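The same external endpoint can be retrieved from the command line; this is a sketch that assumes the ingress gateway lives in the istio-system namespace, as is conventional for Istio-based meshes:

```shell
# Print the external IP of the ingress gateway, then browse to http://<IP>:
kubectl get service istio-ingressgateway -n istio-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```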

Observing services

Anthos's service management and observability is provided by Anthos Service Mesh, a suite of tools powered by Istio that helps you monitor and manage a reliable service mesh. To find out more about Anthos Service Mesh and how it helps you manage microservices, see the Anthos Service Mesh documentation. If you're not familiar with using microservices with containers and what they can do for you, see Preparing an application for Anthos Service Mesh.

In our example, the cluster in the sample deployment has the microservice-based Online Boutique sample application running on it. The application also includes a loadgenerator utility that simulates a small amount of load on the cluster so that you can see metrics and traffic in the dashboard.

In this section, you'll use the Anthos Service Mesh page to look at this application's services and traffic.

Observe the Services table view

  1. Go to the Anthos Service Mesh page.

    Go to the Anthos Service Mesh page

  2. The page displays the table view by default, which shows a list of all your project's microservices. To filter to only the Online Boutique services, select onlineboutique from the Namespace drop-down at the top of the page.

Each row in the table is one of the services that makes up the Online Boutique application; for example, the frontend service renders the application's web user interface, and the cartservice service tracks a user's cart of items for purchase.

Each service listing shows up-to-date metrics, such as Error rate and key latencies, for that service. These metrics are collected out-of-the-box for services deployed on Anthos. You do not need to write any application code to see these statistics.

You can drill down from this view to see even more details about each service. For example, to learn more about the shippingservice service:

  1. Click shippingservice in the services list. The service details page shows all the telemetry available for this service.

  2. On the shippingservice page, on the Navigation menu, select Connected Services. Here you can see both the Inbound and Outbound connections for the service. An unlocked lock icon indicates that some traffic has been observed on this port that is not using mutual TLS (mTLS). Different colors are used to indicate whether the traffic has a mix of plaintext and mTLS (orange) or only plaintext (red). We'll look in more detail at how this works in Enforcing mTLS in your service mesh.

Screenshot of Anthos Service Mesh Connected Services view

Observe the Services topology view

The table view isn't the only way to observe your services in Anthos. The topology view lets you focus on how the services interact.

  1. If you haven't done so already, return to the table view from the service details view by clicking the back arrow at the top of the page.

  2. At the top-right of the page, click Topology to switch from the table view to the workload/service graph visualization. As you can see from the legend, the graph shows both the application's Anthos Service Mesh services and the GKE workloads that implement them.

    Screenshot of Anthos Service Mesh topology view

Now you can explore the topology graph. Anthos Service Mesh automatically observes which services are communicating with each other to show service-to-service connection details:

  • Hold your mouse pointer over an item to see additional details, including outbound QPS from each service.

  • Drag nodes with your mouse to improve your view of particular parts of the graph.

  • Click service nodes for more service information.

  • Click Expand when you hold the pointer over a workload node to drill down for even more details, including the number of instances of this workload that are currently running.

Using SLOs to monitor for unexpected behavior

According to Google's Site Reliability Engineering (SRE) book:

It's impossible to manage a service correctly, let alone well, without understanding which behaviors really matter for that service and how to measure and evaluate those behaviors. To this end, we would like to define and deliver a given level of service to our users, whether they use an internal API or a public product.

Google SRE teams use service level indicators (SLIs), service level objectives (SLOs), and service level agreements (SLAs) to structure and guide the metrics that inform their work. An SLI is a quantitative measure of some aspect of how your service is performing, such as its latency or availability, while an SLO is a target value ("this should happen x% of the time") for a service level that is measured by an SLI. Anthos Service Mesh makes it easy to define and refine SLOs for your own services. It gives you the information that you need to identify appropriate SLIs and SLOs, and notifies you when your service isn't meeting its SLOs.

To find out more about SLOs and SLIs in Anthos Service Mesh, see the SLO overview and Designing SLOs.
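As a worked example of the error-budget arithmetic described above (the 90% compliance target matches the one you'll set later in this tutorial; the request volume is purely illustrative):

```shell
# With a 90% compliance target, the remaining 10% of requests form the
# error budget. At an illustrative 1,000 requests/day, up to 100 requests
# may exceed the latency threshold before the SLO is out of budget.
awk 'BEGIN { target = 0.90; requests = 1000;
             printf "error budget: %.0f requests/day\n",
                    (1 - target) * requests }'
```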

Identify SLIs

The first step in this journey is gathering the SLIs and defining your SLOs. Anthos Service Mesh makes this a simple, straightforward task. In this example, you first find information that you could use to identify an SLI for the Online Boutique's shoppingcartservice.

  1. Ensure that you're in the default Anthos Service Mesh table view. If you're still in the topology view, click Table view to return to the table view.

    The top part of this view shows the current status of services along with indicators for alerts and SLOs, including the count of services without SLOs; currently all of the services are under No SLOs set. In addition, in the Status column, all of the services have a black circle indicator. If you hold the pointer over that indicator for any service, you're informed that no SLO is set for the service.

  2. Note the value in ms for 99% latency for shoppingcartservice (you may need to scroll down and across to see it). This metric means that 99% of requests complete faster than this value; in other words, roughly one out of every 100 requests experiences at least this much delay. You will use this value in the next section.

Create an SLO

Now create an SLO against a latency SLI for the service. To see what happens when a service exceeds its error budget, set a threshold that's deliberately low, based on the information that you saw in the previous section. For a real production service, you would avoid setting the latency threshold any lower than your users need for a good experience with your application.

  1. In the Anthos Service Mesh Table view, click shoppingcartservice to go to the service overview page.

  2. Under Service status, click Create an SLO.

  3. In the SLI Type list, select Latency.

  4. Set Latency Threshold to an arbitrarily low value, such as 50 ms (something significantly lower than the 99% latency value you observed earlier).

  5. In SLO Goal, set the Compliance target to 90%. Anthos Service Mesh uses this value to calculate the error budget that you have for this SLO; that is, the maximum percentage of requests that should exceed your specified latency threshold.

  6. In Compliance Period, set Period Type to Rolling, and Period Length to 1 Day. The panel How your SLO would have performed appears. The Name your SLO section suggests a default name for your new SLO.

  7. To create the SLO and go to the Health page for the shoppingcartservice, click Submit.

Click the drop-down arrow to see more details about your SLO. You should see that the SLO is Out of Error Budget based on your settings. You can also edit or delete the SLO from this view.

Screenshot of Anthos Service Mesh service health view

Recheck SLO and alert indicators

  1. On the service overview page, click the back arrow to return to the table view. Now you can see that the service count for No SLOs set has been reduced by one and that SLOs out of error budget is no longer 0.

  2. If you scroll down to shoppingcartservice, notice that the adjacent indicator has changed to an orange warning triangle. If you hold the pointer over that indicator, you're told to investigate service reliability. Clicking the indicator brings you back to the service's Health page to review your SLO details. The same indicator also appears for your service in the topology view.

Screenshot of Anthos Service Mesh service list with SLO warning

Setting up your Cloud Shell environment

For the rest of this tutorial, you will use the Cloud Shell command line and editor to make changes to the cluster configuration.

To initialize the shell environment for the tutorial, the Anthos Sample Deployment provides a script that does the following:

  • Installs any missing command-line tools, such as kubectl and nomos, for interactively working with and verifying changes to the deployment.

  • Sets the Kubernetes context for anthos-sample-cluster1

  • Clones the repository that Anthos Config Management uses for synchronizing your configuration changes to your cluster. Changes that you commit and push to the upstream repository are synchronized to your infrastructure by Anthos Config Management. This is the recommended best practice for applying changes to your infrastructure.
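For orientation, the context-setting step roughly corresponds to the following commands (a sketch, not the script's actual code; the zone is an illustrative assumption):

```shell
# Fetch credentials for the sample cluster and make it the current context:
gcloud container clusters get-credentials anthos-sample-cluster1 \
  --zone us-central1-c
kubectl config current-context
```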

To set up your environment:

  1. Ensure that you have an active Cloud Shell session. If you closed Cloud Shell after running the prerequisites script, you can relaunch Cloud Shell by clicking Activate Cloud Shell from the Cloud Console in your tutorial project.

  2. Create a directory to work in:

    mkdir tutorial
    cd tutorial
    
  3. Download the initialization script:

    curl -sLO https://github.com/GoogleCloudPlatform/anthos-sample-deployment/releases/latest/download/init-anthos-sample-deployment.env
    
  4. Source the initialization script into your Cloud Shell environment:

    source init-anthos-sample-deployment.env
    

    Output:

    /google/google-cloud-sdk/bin/gcloud
    /google/google-cloud-sdk/bin/kubectl
    Your active configuration is: [cloudshell-13605]
    export PROJECT as anthos-launch-demo-1
    export KUBECONFIG as ~/.kube/anthos-launch-demo-1.anthos-trial-gcp.config
    Fetching cluster endpoint and auth data.
    kubeconfig entry generated for anthos-sample-cluster1.
    Copying gs://config-management-release/released/latest/linux_amd64/nomos...
    \ [1 files][ 40.9 MiB/ 40.9 MiB]
    Operation completed over 1 objects/40.9 MiB.
    Installed nomos into ~/bin.
    Cloned ACM config repo: ./anthos-sample-deployment-config-repo
    
  5. Change the directory to the configuration repository and use it as the working directory for the remainder of this tutorial:

    cd anthos-sample-deployment-config-repo
    

Enforcing mTLS in your service mesh

In anticipation of global expansion, your CIO has mandated that all user data must be encrypted in transit, both to safeguard sensitive information and to comply with regional data privacy and encryption laws.

Earlier, when you looked at the connected services for the shippingservice service, you saw an unlocked lock icon, indicating that traffic had been observed on a request port that was not using mutual TLS (mTLS). mTLS is a security protocol that ensures that traffic is secure and trusted in both directions between two services. Each service accepts only encrypted traffic from authenticated services.

With Anthos, you're only a few steps away from being in compliance. Rather than make changes at the source code level and rebuild and redeploy your application to address this situation, you can apply the new encryption policy declaratively through configuration by using Anthos Config Management to automatically deploy your new configuration from a central Git repository.

In this section, you'll do the following:

  1. Adjust the policy configuration in your Git repository to enforce that services use encrypted communications through mTLS.

  2. Rely on Anthos Config Management to automatically pick up the policy change from the repository and adjust the Anthos Service Mesh policy.

  3. Verify that the policy change occurred on your cluster that is configured to sync with the repository.

Confirm Anthos Config Management setup

  1. The nomos command is a command-line tool that lets you interact with the Config Management Operator and perform other useful Anthos Config Management tasks from your local machine or Cloud Shell. To verify that Anthos Config Management is properly installed and configured on your cluster, run nomos status:

    nomos status
    

    Output:

    Connecting to clusters...
    Current   Context                  Sync Status  Last Synced Token   Sync Branch   Resource Status
    -------   -------                  -----------  -----------------   -----------   ---------------
    *         anthos-sample-cluster1   SYNCED       abef0b01            master        Healthy
    

    The output confirms that Anthos Config Management is configured to sync your cluster to the master branch of your configuration repository. The asterisk in the first column indicates that the current context is set to anthos-sample-cluster1. If you don't see this, switch the current context to anthos-sample-cluster1:

    kubectl config use-context anthos-sample-cluster1
    

    Output:

    Switched to context "anthos-sample-cluster1".
    
  2. Ensure that you're on the master branch:

    git checkout master
    

    Output:

    Already on 'master'
    Your branch is up to date with 'origin/master'.
    
  3. Verify your upstream configuration repository:

    git remote -v
    

    Output:

    origin  https://source.developers.google.com/.../anthos-sample-deployment-config-repo (fetch)
    origin  https://source.developers.google.com/.../anthos-sample-deployment-config-repo (push)
    

You are now ready to commit policy changes to your repository. When you push these commits to your upstream repository (origin), Anthos Config Management ensures that these changes are applied to the cluster that you have configured it to manage.

Update a policy to encrypt all service traffic

Configuration for Anthos Service Mesh is specified declaratively by using YAML files. To encrypt all service traffic, you need to modify both the YAML that specifies the types of traffic that services can accept, and the YAML that specifies the type of traffic that services send to particular destinations.

  1. The first YAML file that you need to look at is namespaces/istio-system/peer-authentication.yaml, which is a mesh-level authentication policy that specifies the types of traffic that all services in your mesh accept by default.

    cat namespaces/istio-system/peer-authentication.yaml
    

    Output:

    apiVersion: "security.istio.io/v1beta1"
    kind: "PeerAuthentication"
    metadata:
      name: "default"
      namespace: "istio-system"
    spec:
      mtls:
        mode: PERMISSIVE
    

    As you can see, the PeerAuthentication mTLS mode is PERMISSIVE, which means that services accept both plaintext HTTP and mTLS traffic.

  2. Modify namespaces/istio-system/peer-authentication.yaml to allow only encrypted communication between services by removing PERMISSIVE mode:

    cat <<EOF> namespaces/istio-system/peer-authentication.yaml
    apiVersion: "security.istio.io/v1beta1"
    kind: "PeerAuthentication"
    metadata:
      name: "default"
      namespace: "istio-system"
    spec:
      mtls:
        mode: STRICT
    EOF
    
  3. Next, look at the DestinationRule in namespaces/istio-system/destination-rule.yaml. This resource specifies rules for traffic sent to the specified destinations, including whether the traffic is encrypted. Notice that the TLS mode is DISABLE, meaning that traffic is sent in plaintext to all matching hosts.

    cat namespaces/istio-system/destination-rule.yaml
    

    Output:

    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      annotations:
        meshsecurityinsights.googleapis.com/generated: "1561996419000000000"
      name: default
      namespace: istio-system
    spec:
      host: '*.local'
      trafficPolicy:
        tls:
          mode: DISABLE
    
  4. Modify namespaces/istio-system/destination-rule.yaml so that Istio sets a traffic policy that enables TLS for all matching hosts in the cluster by setting the TLS mode to ISTIO_MUTUAL:

    cat <<EOF> namespaces/istio-system/destination-rule.yaml
    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      annotations:
        meshsecurityinsights.googleapis.com/generated: "1561996419000000000"
      name: default
      namespace: istio-system
    spec:
      host: '*.local'
      trafficPolicy:
        tls:
          mode: ISTIO_MUTUAL
    EOF
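Before moving on, you can optionally spot-check the live mesh-level policy that these files manage (a sketch; it reads the default PeerAuthentication resource shown earlier):

```shell
# Read the current mTLS mode from the live PeerAuthentication resource;
# before your changes sync, it should still print PERMISSIVE.
kubectl get peerauthentication default -n istio-system \
  -o jsonpath='{.spec.mtls.mode}'
```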
    

Push your changes to the repository

You are almost ready to push your configuration changes; however, we recommend a few checks before you finally commit your updates.

  1. Run nomos vet to ensure that your configuration is valid:

    nomos vet
    

    No output indicates that there were no validation errors.

  2. As soon as you push your changes, Anthos Config Management picks them up and applies them to your system. To avoid unexpected results, we recommend checking that the current live state of your configuration hasn't changed since you made your edits. Use kubectl to check that the destinationrule reflects that mTLS is disabled for the cluster:

    kubectl get destinationrule default -n istio-system -o yaml
    

    Output:

    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    ...
    spec:
      host: '*.local'
      trafficPolicy:
        tls:
          mode: DISABLE
    
  3. Now commit and push these changes to the upstream repository. The following command uses a helper function called watchmtls that was sourced into your environment by the init script. This helper function runs a combination of nomos status and the kubectl command that you tried earlier. It watches the cluster for changes until you press Ctrl+C to quit. Monitor the display until you see that the changes are applied and synchronized on the cluster.

    git commit -am "enable mtls"
    git push origin master && watchmtls
    

    You can also see the changes reflected on the Anthos Service Mesh pages in Anthos. If you return to the Connected Services page for shippingservice (or any other service), you should see that the red unlocked lock icon has changed. The lock icon appears orange (mixed traffic) rather than green (entirely encrypted traffic) because we're looking by default at the last hour with a mix of mTLS and plaintext. If you check back after an hour, you should see a green lock that shows that you have successfully encrypted all the service traffic.
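The watchmtls helper comes from the initialization script you sourced earlier; its exact implementation isn't shown in this tutorial, but a rough manual stand-in could look like this (an assumption, not the helper's actual code):

```shell
# Poll the sync status and the live DestinationRule TLS mode every 5
# seconds; press Ctrl+C to stop once the mode reads ISTIO_MUTUAL.
watch -n 5 "nomos status; \
  kubectl get destinationrule default -n istio-system \
    -o jsonpath='{.spec.trafficPolicy.tls.mode}'"
```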

Using Policy Controller to set up guardrails

Your security team is concerned about potential root attacks that might occur when running pods with privileged containers (containers with root access). While the current configuration does not deploy any privileged containers, you want to guard against as many threat vectors as possible that could compromise performance or, even worse, customer data.

Despite the team's diligence, there is still a risk that you could find yourself vulnerable to root attacks unintentionally from future configuration updates through your continuous delivery process. You decide to set up a security guardrail to protect against this danger.

Apply guardrails

Guardrails are automated administrative controls intended to enforce policies that protect your environment. Anthos Config Management includes support for defining and enforcing custom rules not covered by native Kubernetes objects. The Anthos Config Management Policy Controller checks, audits, and enforces guardrails that you apply that correspond to your organization's unique security, regulatory compliance, and governance requirements.

Use Policy Controller

Anthos Config Management Policy Controller is built on an open source policy engine called Gatekeeper that is used to enforce policies each time a resource in the cluster is created, updated, or deleted. These policies are defined by using constraints from the Policy Controller template library or from other Gatekeeper constraint templates.

The Anthos Sample Deployment on Google Cloud already has Policy Controller installed and also has the Policy Controller template library enabled. You can take advantage of this when implementing your guardrail by using an existing constraint for privileged containers from the library.

Apply a policy constraint for privileged containers

To address your security team's concerns, you apply the K8sPSPPrivilegedContainer constraint. This constraint prevents pods that contain privileged containers from running.

  1. Using the Cloud Shell editor, navigate to anthos-sample-deployment-config-repo/cluster and create a file called constraint.yaml. Copy the contents from this library template to your new file in the editor and save.

    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: K8sPSPPrivilegedContainer
    metadata:
      name: psp-privileged-container
    spec:
      match:
        kinds:
          - apiGroups: [""]
            kinds: ["Pod"]
        excludedNamespaces: ["kube-system"]
    
  2. In the Cloud Shell terminal, use nomos vet to verify that the updated configuration is valid before you apply it.

    nomos vet
    

    The command returns silently as long as there are no errors.

  3. Commit and push the changes to apply the policy. You can use nomos status with the watch command to confirm that the changes are applied to your cluster. Press Ctrl+C to exit the watch command when finished.

    git add .
    git commit -m "add policy constraint for privileged containers"
    git push && watch nomos status
    

    Output:

    Connecting to clusters...
    Current   Context                  Sync Status  Last Synced Token   Sync Branch   Resource Status
    -------   -------                  -----------  -----------------   -----------   ---------------
    *         anthos-sample-cluster1   SYNCED       f2898e92            master        Healthy
    

Test your policy

After you've applied the policy, you can test it by attempting to run a pod with a privileged container.

  1. Using the Cloud Shell editor, create a new file in the tutorial directory: ~/tutorial/nginx-privileged.yaml. Copy the contents from this spec, and save the file.

      apiVersion: v1
      kind: Pod
      metadata:
        name: nginx-privileged
        labels:
          app: nginx-privileged
      spec:
        containers:
        - name: nginx
          image: nginx
          securityContext:
            privileged: true
    
  2. Using the Cloud Shell terminal, attempt to launch the pod with kubectl apply.

    kubectl apply -f ~/tutorial/nginx-privileged.yaml
    

    Output:

    Error from server ([denied by psp-privileged-container] Privileged container is not allowed: nginx, securityContext: {"privileged": true}): error when creating "~/tutorial/nginx-privileged.yaml": admission webhook "validation.gatekeeper.sh" denied the request: [denied by psp-privileged-container] Privileged container is not allowed: nginx, securityContext: {"privileged": true}
    

    The error shows that the Gatekeeper admission controller monitoring your Kubernetes environment enforced your new policy. It prevented the pod's execution due to the presence of a privileged container in the pod's specification.
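To confirm that the guardrail blocks only privileged workloads, you can try the same pod without the privileged security context. The following spec is a minimal sketch (the pod name is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-unprivileged
  labels:
    app: nginx-unprivileged
spec:
  containers:
  - name: nginx
    image: nginx
```

Applying this file with kubectl apply should succeed, because the constraint rejects only pods whose containers set privileged: true. You can remove the test pod afterwards with kubectl delete pod nginx-unprivileged.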

Version-controlled policies that set up guardrails with Anthos Config Management are powerful because they standardize, unify, and centralize the governance of your clusters, and they enforce your policies through active monitoring of your environment after deployment.

You can find many other types of policies to use as guardrails for your environment in the Gatekeeper repository.
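To see which constraint templates are already available on your cluster, you can query Gatekeeper directly. This is a sketch; it assumes you have kubectl access to the cluster:

```shell
# List the constraint templates installed by the Policy Controller
# template library.
kubectl get constrainttemplates

# Describe one template to see its schema and parameters, for example:
kubectl describe constrainttemplate k8spspprivilegedcontainer
```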

Exploring the deployment further

While this tutorial has covered many Anthos features, there is still much more to explore in the deployment. Feel free to continue exploring the Anthos Sample Deployment on Google Cloud before following the cleanup instructions in the next section.

Cleaning up

After you've finished the Explore Anthos tutorial, you can clean up the resources that you created on Google Cloud so they don't take up quota and you aren't billed for them in the future. The following sections describe how to delete or turn off these resources.

  • Option 1. You can delete the project. This is the recommended approach. However, if you want to keep the project around, you can use Option 2 to delete the deployment.

  • Option 2. (Experimental) If you're working within an existing but empty project, you may prefer to manually revert all the steps from this tutorial, starting with deleting the deployment.

  • Option 3. (Experimental) If you're an expert on Google Cloud or have existing resources in your cluster, you may prefer to manually clean up the resources that you created in this tutorial.

Delete the project (option 1)

  1. In the Cloud Console, go to the Manage resources page.

    Go to the Manage resources page

  2. In the project list, select the project that you want to delete and then click Delete.
  3. In the dialog, type the project ID and then click Shut down to delete the project.
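If you prefer the command line, you can delete the project with the gcloud CLI instead; PROJECT_ID is a placeholder for your project's ID:

```shell
# Shuts down the project and schedules its resources for deletion.
gcloud projects delete PROJECT_ID
```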

Delete the deployment (option 2)

This approach relies on allowing Deployment Manager to undo what it created. Even if the deployment had errors, you can use this approach to undo it.

  1. In the Cloud Console, on the Navigation menu, click Deployment Manager.

  2. Select your deployment, and then click Delete.

  3. Confirm by clicking Delete again.

  4. Even if the deployment had errors, you can still select and delete it.

  5. If clicking Delete doesn't work, as a last resort you can try Delete but preserve resources. If Deployment Manager is unable to delete any resources, you need to note these resources and attempt to delete them manually later.

  6. Wait for Deployment Manager to finish the deletion.

  7. (Temporary step) On the Navigation menu, click Network services > Load balancing, and then delete the forwarding rules created by the anthos-sample-cluster1 cluster.

  8. (Optional) Go to https://source.cloud.google.com/<project_id>. Delete the repository whose name includes config-repo if there is one.

  9. (Optional) Delete the Service Account that you created during the deployment and all of its IAM roles.

Perform a manual cleanup (option 3)

This approach relies on manually deleting the resources from the Google Cloud Console.

  1. In the Cloud Console, on the Navigation menu, click Kubernetes Engine.

  2. Select your cluster and click Delete, and then click Delete again to confirm.

  3. In the Cloud Console, on the Navigation menu, click Compute Engine.

  4. Select the jump server and click Delete, and then click Delete again to confirm.

  5. Follow Steps 7 and 8 of Option 2.
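The cluster and jump server deletions above can also be scripted with the gcloud CLI. This is a sketch: the cluster name comes from this tutorial, while ZONE and JUMP_SERVER_NAME are placeholders that you should replace with the values from your deployment:

```shell
# Delete the sample GKE cluster; replace ZONE with the cluster's zone.
gcloud container clusters delete anthos-sample-cluster1 --zone ZONE

# Delete the jump server VM; replace JUMP_SERVER_NAME and ZONE with
# the instance name and zone from your deployment.
gcloud compute instances delete JUMP_SERVER_NAME --zone ZONE
```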

If you plan to redeploy after the manual cleanup, verify that all requirements are met as described in the Before you begin section.

What's next

Take our survey

When you finish working on this tutorial, please complete our survey. We're interested in hearing about any issues you might have at any point in the tutorial. Thanks for using the survey to submit your feedback.

Thank you!

The Anthos Team