Anthos security blueprint: Auditing and monitoring for deviation from policy

This document describes how to audit and monitor your Anthos clusters to determine whether there has been deviation from security best practices and policies that you have attached to them. It includes an overview of how and why you audit and monitor policies, and it describes the Google Cloud controls that you use for this task.

The document is part of a series of blueprints that provide prescriptive guidance for working with Anthos.

Introduction

Your cluster has policies that help protect access to your assets. You can enhance security by auditing and monitoring for deviation from these policies. Auditing and monitoring give you insight into the current status of your cluster, but they don't prevent actions that circumvent your policies. To help guard against unwanted changes, you should also take steps to enforce policies.

Monitoring is similar to auditing, but it has a slightly different purpose. A typical monitoring solution consists of a way to collect metrics, dashboards to view the status of your systems and apps, and a way to send alerts when anomalies are detected. In contrast, auditing validates the status of your systems, typically against a set of policies that you've defined and that your systems must meet.

For auditing and monitoring, you need to consider the following requirements:

  • What enforcement controls you have in place and how to audit or monitor them for deviations from policy.
  • Whether you need a consolidated monitoring solution or a segregated one.

Understanding the security controls you need

This section discusses the controls required in order to do the following:

  • Implement auditing that complements the policy enforcement as described in the enforcing policies guide.

  • Implement a monitoring solution that works with Anthos GKE clusters no matter where they are deployed.

Namespaces

Labeling resources that should use the same policies

Namespaces let you provide a scope for related resources within a cluster—for example, pods, services, and replication controllers. By using namespaces, you can delegate administration responsibility for the related resources as a unit. Therefore, namespaces are integral to most security patterns.

Namespaces are an important feature for control plane isolation. However, they don't provide node isolation, data plane isolation, or network isolation.

A common approach is to create namespaces for individual applications. For example, you might create the namespace myapp-frontend for the UI component of an application.
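Kubernetes accepts manifests as JSON as well as YAML. The following minimal Python sketch builds the namespace from the example above as a JSON manifest. The label keys and values (`app`, `tier`) are hypothetical; choose labels that your policies can select on.

```python
import json


def namespace_manifest(name: str, labels: dict) -> dict:
    """Return a Kubernetes Namespace object as a plain dict.

    The labels are copied so that later changes to the caller's dict
    don't silently change the manifest.
    """
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": dict(labels)},
    }


if __name__ == "__main__":
    # Hypothetical labels for the UI component of "myapp".
    manifest = namespace_manifest("myapp-frontend", {"app": "myapp", "tier": "frontend"})
    print(json.dumps(manifest, indent=2))
```

Saving the output to a file such as namespace.json and running kubectl apply -f namespace.json creates the namespace.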

Anthos Config Management

Applying configurations to your Anthos clusters

A best practice when you manage Anthos clusters is to use Anthos Config Management, which keeps your enrolled clusters in sync with configs. A config is a YAML or JSON file that's stored in your repository and that contains the same types of configuration details that you can manually apply to a cluster by using the kubectl apply command. Anthos Config Management lets you manage your policies and infrastructure deployments like you do your apps—by adopting a policy-as-code approach.

You use Anthos Config Management in conjunction with a Git repository that acts as the single source of truth for your declared policies. Anthos Config Management can manage access-control policies like RBAC, resource quotas, namespaces, and platform-level infrastructure deployments. Anthos Config Management is declarative; it continuously checks cluster state and applies the state declared in the config in order to enforce policies.
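The declarative model can be illustrated with a small reconciliation sketch in Python. This is not how Anthos Config Management is implemented; it only shows the idea of comparing the state declared in the repository with the live state and deriving the actions that remove drift. Objects are keyed by hypothetical strings such as "Namespace/myapp-frontend". The sketch assumes that every live object is managed from the repository; a real syncer only prunes objects that it manages.

```python
def plan_reconcile(declared: dict, live: dict) -> list:
    """Return the (action, key) pairs that make `live` match `declared`.

    Both arguments map an object key to that object's desired or
    current spec.
    """
    actions = []
    for key, spec in sorted(declared.items()):
        if key not in live:
            actions.append(("create", key))
        elif live[key] != spec:
            # Drift: the live object deviates from the declared config.
            actions.append(("update", key))
    # Objects that exist on the cluster but are no longer declared.
    for key in sorted(live.keys() - declared.keys()):
        actions.append(("delete", key))
    return actions


if __name__ == "__main__":
    declared = {"Namespace/myapp-frontend": {"labels": {"app": "myapp"}}}
    live = {"Namespace/myapp-frontend": {"labels": {}}, "Namespace/scratch": {}}
    print(plan_reconcile(declared, live))
```

Running this check continuously, rather than once, is what lets a declarative tool undo manual changes that deviate from the repository.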

Anthos Policy Controller

Enforcing compliance with policies

Anthos Policy Controller is a dynamic admission controller for Kubernetes that enforces CustomResourceDefinition-based (CRD-based) policies that are executed by the Open Policy Agent (OPA).

Admission controllers are Kubernetes plugins that intercept requests to the Kubernetes API server before an object is persisted, but after the request is authenticated and authorized. You can use admission controllers to limit how a cluster is used.

To use Policy Controller, you first deploy a constraint template, which defines the policy logic and the parameters that the policy accepts. When the constraint template has been deployed in the cluster, it creates a CRD, and you can then create individual constraints, which are instances of that CRD, to apply the policy with specific parameters.
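As an example of a constraint, the following Python sketch builds a constraint for the K8sRequiredLabels template from the Gatekeeper template library, requiring namespaces to carry an owner label. The constraint's kind is the kind of the CRD that the template creates, and its parameters must match the schema that the template declares. The parameter shape used here (a list of objects with a key field) is an assumption based on the library template; check it against the template version installed in your cluster. The constraint name and label key are hypothetical.

```python
import json


def required_labels_constraint(name: str, label_keys: list, kinds: list) -> dict:
    """Build a K8sRequiredLabels constraint for core-group resource kinds.

    Assumes the K8sRequiredLabels constraint template is already deployed,
    so that its CRD (and therefore this kind) exists in the cluster.
    """
    return {
        "apiVersion": "constraints.gatekeeper.sh/v1beta1",
        "kind": "K8sRequiredLabels",
        "metadata": {"name": name},
        "spec": {
            # "" is the core API group, which contains Namespace.
            "match": {"kinds": [{"apiGroups": [""], "kinds": list(kinds)}]},
            # Assumed parameter schema: one object per required label key.
            "parameters": {"labels": [{"key": key} for key in label_keys]},
        },
    }


if __name__ == "__main__":
    constraint = required_labels_constraint("ns-must-have-owner", ["owner"], ["Namespace"])
    print(json.dumps(constraint, indent=2))
```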

The following diagram shows how Policy Controller uses the OPA Constraint Framework to define and enforce policy.

Figure: The OPA Constraint Framework receives requests and enforces policies for access to other resources.

The diagram shows the following:

  1. Constraints are created from constraint templates.
  2. Policies are enabled on the cluster by applying constraints.
  3. A request comes in and an admission review is triggered, resulting in an allow or deny decision.
  4. A continuous audit evaluates all active objects on the cluster against policies.
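The admission review and the audit in the list above apply the same rule at different times. The following Python sketch simulates that with a required-labels rule: admission_review judges a single incoming object, and audit scans objects that already exist, which catches objects created before the constraint was applied. Policy Controller actually evaluates constraints written in Rego through OPA; the function names here are hypothetical.

```python
def missing_labels(obj: dict, required: list) -> list:
    """Return the required label keys that the object doesn't carry."""
    labels = obj.get("metadata", {}).get("labels") or {}
    return [key for key in required if key not in labels]


def admission_review(obj: dict, required: list) -> tuple:
    """Step 3: return (allowed, message) for one incoming object."""
    missing = missing_labels(obj, required)
    if missing:
        return False, "missing required labels: " + ", ".join(missing)
    return True, "allowed"


def audit(objects: list, required: list) -> list:
    """Step 4: return the names of existing objects that violate the rule."""
    return [obj["metadata"]["name"] for obj in objects if missing_labels(obj, required)]


if __name__ == "__main__":
    existing = [
        {"metadata": {"name": "team-a", "labels": {"owner": "alice"}}},
        {"metadata": {"name": "legacy"}},  # created before the constraint
    ]
    print(admission_review({"metadata": {"name": "new-ns"}}, ["owner"]))
    print(audit(existing, ["owner"]))
```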

Using Policy Controller, you can enforce custom policies, such as requiring specific labels on resources. Policy Controller also lets you apply most of the constraints that you can apply by using PodSecurityPolicies, and these constraints typically require less operational overhead for the following reasons:

  • Policy Controller includes a default library of constraint templates, so for common cases you don't need to write your own policies as you do with PodSecurityPolicies.
  • You don't have to manage RoleBindings as you do when you use PodSecurityPolicies.
  • Policy Controller supports dry run mode so that you can validate the effect of a constraint before you apply it.
  • You can scope policies to namespaces, which lets you roll out more restrictive policies gradually. This approach is similar to a canary release strategy: you limit the exposure of a policy rollout that might have unanticipated effects. For example, your rollout might reveal that a policy blocks a pod from accessing a volume that the pod needs.
  • Policy Controller provides a single way to apply policies whether they're custom constraints or they're PodSecurityPolicies constraints that are defined in the Gatekeeper repository.
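Dry run mode and namespace scoping can be combined into a staged rollout. The following Python sketch models how a request is treated under a constraint's enforcementAction (deny, dryrun, or warn) and its match.namespaces list. It is a simplified model: the real match block supports more selectors than a plain namespace list, and the outcome strings are hypothetical labels, not values that Policy Controller returns.

```python
def enforcement_outcome(constraint: dict, namespace: str, violates: bool) -> str:
    """Return how a request in `namespace` is treated under `constraint`."""
    spec = constraint.get("spec", {})
    scoped = spec.get("match", {}).get("namespaces")
    if scoped is not None and namespace not in scoped:
        return "allow"  # the constraint doesn't apply to this namespace
    if not violates:
        return "allow"
    action = spec.get("enforcementAction", "deny")  # deny is the default
    if action == "dryrun":
        return "allow-and-record"  # violation is only reported by the audit
    if action == "warn":
        return "allow-with-warning"
    return "deny"


if __name__ == "__main__":
    canary = {"spec": {"enforcementAction": "dryrun", "match": {"namespaces": ["canary"]}}}
    print(enforcement_outcome(canary, "canary", violates=True))
    print(enforcement_outcome(canary, "prod", violates=True))
```

A typical progression is to start with enforcementAction set to dryrun and the constraint scoped to a canary namespace, review the audit results, widen the namespace list, and then switch to deny.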

For more information about how to use Policy Controller to enforce policies that you define, see Anthos Config Management Policy Controller.

Kubernetes Engine Operations

Monitoring GKE clusters

Kubernetes Engine Operations is designed to monitor GKE clusters. It manages Cloud Monitoring and Cloud Logging services together and features a Kubernetes Engine Operations dashboard that's customized for GKE clusters. Kubernetes Engine Operations has a set of GKE monitored resources that represent resources such as clusters, nodes, pods, and containers. Although you can disable Kubernetes Engine Operations for GKE and for GKE on-prem clusters, we recommend that you keep it enabled for these products.
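The GKE monitored resources carry labels, such as the cluster and namespace name, that you can filter on. The following Python sketch builds a Cloud Logging query that selects the logs of all containers in one namespace by using the k8s_container resource type. The cluster name in the usage example is hypothetical.

```python
def container_logs_query(cluster: str, namespace: str) -> str:
    """Build a Cloud Logging query for container logs in one namespace."""
    return (
        'resource.type="k8s_container"'
        f' AND resource.labels.cluster_name="{cluster}"'
        f' AND resource.labels.namespace_name="{namespace}"'
    )


if __name__ == "__main__":
    print(container_logs_query("prod-cluster", "myapp-frontend"))
```

Scoping queries by namespace mirrors the way policies are scoped, so you can review logs for the same set of resources that a policy covers.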

Security Health Analytics

Identifying vulnerabilities

Security Health Analytics helps you prevent incidents by identifying potential misconfigurations and compliance violations in your Google Cloud resources and by suggesting appropriate corrective action. Security Health Analytics scanners generate vulnerability findings that are available in Security Command Center. The container scanner findings relate to GKE container configurations and belong to the CONTAINER_SCANNER scanner type.
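To review container findings regularly, you can export them from Security Command Center and group them for triage. The following Python sketch assumes that each finding is a dict with state, category, and resourceName fields and a sourceProperties.ScannerName field; verify these field names against your exported findings. The sample findings and resource names are hypothetical.

```python
from collections import defaultdict


def container_findings_by_category(findings: list) -> dict:
    """Group active CONTAINER_SCANNER findings by category.

    Returns a dict that maps each category to the sorted list of
    affected resource names.
    """
    grouped = defaultdict(set)
    for finding in findings:
        scanner = finding.get("sourceProperties", {}).get("ScannerName")
        if finding.get("state") != "ACTIVE" or scanner != "CONTAINER_SCANNER":
            continue
        grouped[finding["category"]].add(finding["resourceName"])
    return {category: sorted(names) for category, names in grouped.items()}


if __name__ == "__main__":
    sample = [
        {"state": "ACTIVE", "category": "NETWORK_POLICY_DISABLED",
         "resourceName": "cluster-a", "sourceProperties": {"ScannerName": "CONTAINER_SCANNER"}},
        {"state": "INACTIVE", "category": "WEB_UI_ENABLED",
         "resourceName": "cluster-b", "sourceProperties": {"ScannerName": "CONTAINER_SCANNER"}},
    ]
    print(container_findings_by_category(sample))
```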

Security Command Center integration with Pub/Sub

You can get alerts on findings from Security Command Center by using the notification app. The app subscribes to a notifications Pub/Sub topic and sends notifications to a configured channel, such as email or SMS.
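The receiving side of such a notification can be sketched in Python, assuming a Pub/Sub push subscription. Pub/Sub delivers the message base64-encoded in message.data; the Security Command Center notification inside it is assumed to carry the finding in a finding field. The function names are hypothetical, and sending the formatted alert to email or SMS is omitted.

```python
import base64
import json


def finding_from_push(body: dict) -> dict:
    """Extract the Security Command Center finding from a Pub/Sub push body."""
    data = base64.b64decode(body["message"]["data"])
    notification = json.loads(data)
    return notification["finding"]


def format_alert(finding: dict) -> str:
    """Render a one-line alert for a notification channel."""
    severity = finding.get("severity", "UNSPECIFIED")
    return f"[{severity}] {finding['category']} on {finding['resourceName']}"


if __name__ == "__main__":
    payload = {"finding": {"category": "NETWORK_POLICY_DISABLED",
                           "resourceName": "cluster-a", "severity": "MEDIUM"}}
    push_body = {"message": {"data": base64.b64encode(json.dumps(payload).encode()).decode()}}
    print(format_alert(finding_from_push(push_body)))
```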

Cloud Asset Inventory

Monitoring Google Cloud resources

Cloud Asset Inventory lets you monitor changes to resources and policies through real-time notifications that you subscribe to. You can monitor changes to supported resource types and policy types within an organization, folder, or project, or to individual resources that you specify. You set up subscriptions by creating a feed. Supported asset types include GKE resource types and Cloud IAM policy types.

You can monitor security-sensitive resources such as firewall rules and changes to IAM policies. Any change to these resources immediately sends a notification through Pub/Sub, allowing you to take quick action if needed.
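A feed for firewall rules can be sketched as a request to the Cloud Asset Inventory REST API. The following Python sketch only builds the URL and JSON body; authentication and sending the request are omitted. The field names follow the v1 feeds.create request as we understand it, and the project, feed ID, and topic names are hypothetical. To watch IAM policy changes, you would create a feed with the content type IAM_POLICY instead of RESOURCE.

```python
import json


def feed_request(parent: str, feed_id: str, asset_types: list,
                 content_type: str, topic: str) -> tuple:
    """Return (url, body) for creating a Cloud Asset Inventory feed.

    `parent` is a project, folder, or organization, for example
    "projects/my-project". `topic` is the Pub/Sub topic that receives
    change notifications.
    """
    url = f"https://cloudasset.googleapis.com/v1/{parent}/feeds"
    body = {
        "feedId": feed_id,
        "feed": {
            "assetTypes": list(asset_types),
            "contentType": content_type,
            "feedOutputConfig": {"pubsubDestination": {"topic": topic}},
        },
    }
    return url, body


if __name__ == "__main__":
    url, body = feed_request(
        "projects/my-project", "firewall-changes",
        ["compute.googleapis.com/Firewall"], "RESOURCE",
        "projects/my-project/topics/asset-changes",
    )
    print(url)
    print(json.dumps(body, indent=2))
```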

Real-time notifications can connect to your existing workloads, which lets you automate your response to a change. For example, you can create a Cloud Function that reverses a resource change after the change is detected.
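Before a function can reverse an IAM policy change, it needs to know what changed. The following Python sketch compares the bindings in a feed notification's asset and priorAsset fields and returns the members that were newly granted each role. It assumes the notification has already been decoded from Pub/Sub into a dict and that both assets carry an iamPolicy with bindings; the function name is hypothetical, and actually writing the reverted policy back is omitted.

```python
def added_members(notification: dict) -> dict:
    """Return {role: set_of_members} granted by this IAM policy change."""

    def bindings(asset) -> dict:
        # Collapse the policy's bindings into role -> set of members.
        result = {}
        policy = (asset or {}).get("iamPolicy") or {}
        for binding in policy.get("bindings", []):
            result.setdefault(binding["role"], set()).update(binding.get("members", []))
        return result

    before = bindings(notification.get("priorAsset"))
    after = bindings(notification.get("asset"))
    added = {}
    for role, members in after.items():
        new = members - before.get(role, set())
        if new:
            added[role] = new
    return added


if __name__ == "__main__":
    change = {
        "priorAsset": {"iamPolicy": {"bindings": [
            {"role": "roles/viewer", "members": ["user:a@example.com"]}]}},
        "asset": {"iamPolicy": {"bindings": [
            {"role": "roles/viewer", "members": ["user:a@example.com"]},
            {"role": "roles/owner", "members": ["user:b@example.com"]}]}},
    }
    print(added_members(change))
```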

Alerting using Cloud Audit Logs

GKE clusters integrate Kubernetes Audit Logging with Cloud Audit Logs and Cloud Logging. You can use Kubernetes Engine Operations to set up metrics based on your log entries. You can then use logs-based metrics to set up an alerting policy.
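As an example of a logs-based metric, the following Python sketch builds the body for a metric that counts RBAC binding changes recorded in the Kubernetes audit logs. The body uses the name, description, and filter fields of a Cloud Logging LogMetric. The metric name is hypothetical, and matching on a methodName that contains "rolebindings" is an assumption about how GKE records these calls, so check the filter against entries in your own logs before you alert on it. Sending the body to the Cloud Logging API is omitted.

```python
import json


def rbac_change_metric(project: str) -> dict:
    """Build a LogMetric body that counts RBAC binding changes in GKE."""
    log_name = f"projects/{project}/logs/cloudaudit.googleapis.com%2Factivity"
    conditions = [
        'resource.type="k8s_cluster"',
        f'logName="{log_name}"',
        # Assumed: GKE audit method names for RoleBinding and
        # ClusterRoleBinding calls contain "rolebindings".
        'protoPayload.methodName:"rolebindings"',
    ]
    return {
        "name": "rbac-binding-changes",
        "description": "Count of RoleBinding and ClusterRoleBinding API calls",
        "filter": " AND ".join(conditions),
    }


if __name__ == "__main__":
    print(json.dumps(rbac_change_metric("my-project"), indent=2))
```

An alerting policy can then fire whenever this metric is greater than zero in a given window.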

Alerting policies specify notification channels, which define how you're informed when an alerting policy has been triggered. You can set up a notification handler by using Cloud Run or Cloud Functions to carry out an action in response, for example, to revert the change or to notify you by email.
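The core of such a handler can be sketched in Python, assuming the alert arrives through a webhook notification channel. The payload fields used here (an incident object with state, policy_name, and summary) follow the Cloud Monitoring webhook format as we understand it; verify them against a real notification. The function name is hypothetical, and the HTTP wrapper that Cloud Run or Cloud Functions would provide, as well as the remediation itself, are omitted.

```python
import json
from typing import Optional


def handle_alert(raw_body: bytes) -> Optional[str]:
    """Return a message for an open incident, or None for other states."""
    payload = json.loads(raw_body)
    incident = payload.get("incident", {})
    if incident.get("state") != "open":
        return None  # closed incidents need no action in this sketch
    policy = incident.get("policy_name", "unknown policy")
    return f"ALERT {policy}: {incident.get('summary', '')}"


if __name__ == "__main__":
    body = json.dumps({"incident": {"state": "open",
                                    "policy_name": "rbac-binding-changes",
                                    "summary": "RBAC binding changed"}}).encode()
    print(handle_alert(body))
```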

Bringing it all together

To integrate the controls, determine your auditing and monitoring needs. Then map out the scope of the controls discussed in this guide and the stage at which they need to be configured, as described in the steps that follow.

  1. Before you start configuring your clusters, see Hybrid and multi-cloud monitoring and logging patterns to help you determine the level of isolation that you require.

  2. Create your clusters by following the guidance in the applicable cluster hardening guide (GKE or GKE on-prem). When you create a cluster, use the --enable-network-policy flag, because network policies are required. Enabling network policy lets you later implement firewall rules that restrict the traffic that flows between pods in the cluster.

  3. Define the namespaces and labels that are required for the pods. This provides a name scope that allows you to work with policies and Kubernetes service accounts.

  4. Install Policy Controller using Anthos Config Management.

    Follow the guidance described in the enforcing policies guide.

  5. Configure Kubernetes Engine Operations to meet your requirements.

  6. Configure Kubernetes Engine Operations alert policies, notifications, and handlers.

  7. Configure real-time notifications from Cloud Asset Inventory.

  8. Set up a process to regularly review the container findings that are generated from the Security Health Analytics scans.
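Once network policy enforcement is enabled (step 2) and namespaces are defined (step 3), a common starting point for restricting pod-to-pod traffic is a default-deny ingress policy in each namespace, to which you then add explicit allow rules. The following Python sketch builds such a NetworkPolicy as JSON; you could keep the output in the repository that Anthos Config Management syncs. The policy name is hypothetical.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks all ingress to pods in `namespace`.

    An empty podSelector selects every pod in the namespace, and listing
    only "Ingress" with no ingress rules allows no incoming traffic.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


if __name__ == "__main__":
    print(json.dumps(default_deny_ingress("myapp-frontend"), indent=2))
```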