Anthos security blueprint: Restricting traffic

This document describes how to restrict traffic on Anthos clusters. It includes an overview of how and why you need to restrict traffic in Anthos clusters, and it describes the Google Cloud controls that you use for this task. The document is part of a series of security blueprints that provide prescriptive guidance for working with Anthos.

This blueprint is relevant for security and network administrators who work with Anthos. For more information about these blueprints, see Anthos Security Blueprints: Frequently asked questions.

Introduction

Applications in your Anthos clusters need to exchange data with each other, with workloads in the cloud, or with on-premises environments. They can do this through private connectivity options or through the internet.

Allowing traffic only to and from specific applications helps you make sure that you follow the principle of least privilege. Deploying traffic restrictions is part of a defense-in-depth strategy. For certain workloads and industries, showing that traffic is restricted and that firewalls or similar mechanisms are deployed is also required for compliance purposes. In addition, restricting traffic to and from applications can be part of your data-loss prevention strategy.

In an Anthos environment, there are multiple ways to restrict traffic to and from applications. You can deploy these restrictions in parallel or by themselves.

To apply a defense-in-depth approach to enforcing traffic restrictions, you need to consider the following:

  • Which cluster-level restrictions need to be enforced at the node level by filtering traffic at the virtual machine (VM) level. You implement these restrictions by using firewalls.
  • What traffic should be restricted between workloads or namespaces in your cluster. You implement this restriction by using network policies.
  • If you're using Anthos Service Mesh to control traffic within the service mesh, what authorization policies need to be applied for traffic in and out of the service mesh. You implement this restriction by deploying gateways.

The content in the restricting-traffic directory in the GitHub repository that's associated with this blueprint provides guidance for how to implement the following common approaches by using network policies:

  • Denying by default all but necessary system traffic.
  • Restricting access to the internet.
  • Restricting access within the kube-system namespace to just the traffic that's required by the Pods in the namespace.

The following diagram shows the different layers at which the security controls that are discussed in this guide apply.

Relationship of security controls that help restrict traffic between elements of an on-premises and cloud-based system.

You can restrict traffic between Pods in GKE clusters by using network policies. On a cluster infrastructure level, you can restrict traffic using firewalls; if you're using Anthos on Google Cloud, you use Virtual Private Cloud (VPC) firewall rules. Anthos Config Management and Anthos Policy Controller provide additional control of the cluster configuration across your different Anthos clusters regardless of the environment.

You can control traffic in Anthos Service Mesh within or between clusters using authorization policies. If you're using Anthos on Google Cloud, there are additional controls available through organizational policies or hierarchical firewall policies.

Understanding the security controls you need

VPC firewall rules

Restricting traffic between virtual machines

Virtual Private Cloud (VPC) firewall rules govern which traffic is allowed to or from Compute Engine VMs. The rules let you filter traffic at VM granularity based on Layer 4 attributes such as protocol and port.

VPC firewall rules are specific to Anthos on Google Cloud. If you're using Anthos on-premises or in another cloud environment, use the firewall functionality for that environment instead.

When you're using Anthos on Google Cloud, you apply rules to a target group of VMs that are selected by using network tags, by using service accounts, or by specifying all instances in the network. In these rules, you choose the protocols and ports that are allowed or denied, and which source (for ingress rules) or destination (for egress rules) these rules apply to. The source or destination can be selected through network tags, through service accounts, or by IP address ranges.

Firewall rules are ordered by priority, which means that they are evaluated from highest to lowest priority until a matching rule is found. If no rule matches, by default, ingress traffic to a VM is denied and egress traffic from a VM is allowed.

VPC firewall rules are also stateful. This means that if specific traffic (a request) to or from a VM is allowed, return traffic (a response) using the same connection is allowed as well.

Using network tags to create rules is flexible. However, we recommend that you use service accounts for firewall rules. Using service accounts provides stricter control over how firewall rules are applied because there is more granular access control over who can apply service accounts to VMs.
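As an illustration, the following sketch shows the Compute Engine API representation of an ingress rule that uses a target service account. The project, network, IP range, and service account names are all hypothetical placeholders.

```yaml
# Hypothetical VPC firewall rule (Compute Engine API representation):
# allow HTTPS ingress from an internal range only to VMs that run as
# the gke-nodes service account. All names and ranges are examples.
name: allow-internal-https-to-gke-nodes
network: projects/example-project/global/networks/example-vpc
direction: INGRESS
priority: 1000
allowed:
- IPProtocol: tcp
  ports:
  - "443"
sourceRanges:
- 10.0.0.0/8
targetServiceAccounts:
- gke-nodes@example-project.iam.gserviceaccount.com
```

Because the rule targets a service account rather than a network tag, it applies only to VMs that an administrator has explicitly authorized to run as that service account.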

Hierarchical firewall policies

Restricting traffic for the whole organization or for individual teams

Hierarchical firewall policies let you define VPC firewall rules at the Google Cloud organizational level and at the folder level of the resource hierarchy. Setting policies at these levels lets you control which traffic is always allowed or always denied within an organization, for a team, or for a group of applications. In addition to letting you allow or deny actions, hierarchical firewall policy rules let you delegate decisions about specific traffic patterns to lower-level folders of the resource hierarchy or to VPC firewall rules that are defined at the project level.

Hierarchical firewall policies are specific to Anthos on Google Cloud. If you're using Anthos on-premises or in another cloud environment, use the firewall functionality for that environment instead.

When you're using Anthos on Google Cloud, typical use cases for hierarchical firewall policies are to set organization-wide firewall policies to do the following:

  • Disallow traffic for the whole organization from specific IP address ranges.
  • Disallow traffic to the internet that does not pass through organization-controlled secure proxies or gateways.
  • Explicitly allow traffic from probes made by the security team to discover application-level vulnerabilities.

If you organize workloads by folders and set security policy on those folders, hierarchical firewall policies let you set folder-wide traffic restrictions depending on the sensitivity of the workloads.
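For example, a rule inside an organization-level hierarchical firewall policy might deny ingress from a known-bad address range for every project below it. The following sketch shows the API representation of one such rule; the IP range is a placeholder.

```yaml
# Hypothetical rule in a hierarchical firewall policy (API representation):
# deny all ingress from an example IP range for every resource under the
# organization or folder that the policy is attached to.
priority: 1000
direction: INGRESS
action: deny
match:
  srcIpRanges:
  - 198.51.100.0/24   # placeholder range to block
  layer4Configs:
  - ipProtocol: all
```

A rule with action goto_next instead of allow or deny delegates the decision for matching traffic to lower-level policies or to project-level VPC firewall rules.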

Namespaces

Labeling resources that should use the same policies

Namespaces let you provide a scope for related resources within a cluster—for example, Pods, Services, and replication controllers. By using namespaces, you can delegate administration responsibility for the related resources as a unit. Therefore, namespaces are integral to most security patterns.

Namespaces are an important feature for control plane isolation. However, they don't provide node isolation, data plane isolation, or network isolation.

A common approach is to create namespaces for individual applications. For example, you might create the namespace myapp-frontend for the UI component of an application.
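For example, the myapp-frontend namespace mentioned above could be declared as follows; the labels are illustrative and can be whatever your policies select on.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: myapp-frontend
  labels:
    app: myapp        # illustrative labels that network policies
    tier: frontend    # and Policy Controller constraints can select on
```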

Network policies

Enforcing network traffic flow within clusters

Network policies enforce Layer 4 network traffic flows by using Pod-level firewall rules. Network policies are scoped to a namespace.

By default, even if network policy enforcement is enabled for the cluster, access to Pods in a namespace is unrestricted. Enforcement is applied only when at least one NetworkPolicy object selects a Pod.

A best practice is to adopt a least-privilege approach. When you implement network policies, we recommend that you create a default deny-all rule in the namespace that matches all Pods. This makes the namespace fail closed: all traffic is blocked unless it is explicitly allowed. You then explicitly set up network policies in each namespace to allow the required traffic flows.
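A minimal default deny-all policy for a namespace looks like the following; the namespace name is an example.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: myapp-frontend   # example namespace
spec:
  podSelector: {}             # an empty selector matches all Pods in the namespace
  policyTypes:
  - Ingress
  - Egress
```

With this policy in place, Pods in the namespace can send or receive traffic only if another NetworkPolicy explicitly allows it.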

The following diagram shows that by configuring network policies for each namespace, you can implement policies that manage the traffic flow between applications.

Using network policies to manage traffic flow between namespaces.

In the example, traffic is permitted to flow in both directions between the application with the app:transactions label and the application with the app:shopfront label. However, traffic is permitted to flow only from the application with the app:shopfront label to the logging application; traffic is not permitted from the logging application to the shopfront application.
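The one-way flow in this example could be expressed with an ingress policy on the logging application like the following sketch. The namespace and label names follow the diagram, and the sketch assumes a default deny-all baseline so that traffic not matched by this policy is blocked.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-shopfront-to-logging
  namespace: logging            # example namespace for the logging app
spec:
  podSelector:
    matchLabels:
      app: logging
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector: {}     # any namespace...
      podSelector:
        matchLabels:
          app: shopfront        # ...but only Pods labeled app: shopfront
```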

Network policies are stateful. This means that if traffic (a request) in a specific direction is allowed, return traffic (a response) for the same connection is allowed automatically as well.

For examples of network policies that show typical deployment approaches for Anthos Config Management, see the restricting traffic directory in the security blueprint repository on GitHub.

Anthos Config Management

Applying configurations to your Anthos clusters

A best practice when you manage Anthos clusters is to use Anthos Config Management, which keeps your enrolled clusters in sync with configs. A config is a YAML or JSON file that's stored in your repository and that contains the same types of configuration details that you can manually apply to a cluster by using the kubectl apply command. Anthos Config Management lets you manage your policies and infrastructure deployments like you do your apps—by adopting a declarative approach.

You use Anthos Config Management in conjunction with a Git repository that acts as the single source of truth for your declared policies. Anthos Config Management can manage access-control policies like RBAC, resource quotas, namespaces, and platform-level infrastructure deployments. Anthos Config Management is declarative; it continuously checks cluster state and applies the state declared in the config in order to enforce policies.
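Enrolling a cluster with the ConfigManagement operator is itself declarative. The following is a sketch, assuming a hypothetical Git repository that uses the hierarchical repo layout.

```yaml
apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: config-management
spec:
  sourceFormat: hierarchy        # repo uses the hierarchical layout
  git:
    syncRepo: https://github.com/example-org/anthos-config.git  # hypothetical repo
    syncBranch: main
    secretType: none             # public repo; use ssh or a token for private repos
```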

Anthos Policy Controller

Enforcing compliance with policies

Anthos Policy Controller is a dynamic admission controller for Kubernetes that enforces CustomResourceDefinition-based (CRD-based) policies that are executed by the Open Policy Agent (OPA).

Admission controllers are Kubernetes plugins that intercept requests to the Kubernetes API server before an object is persisted, but after the request is authenticated and authorized. You can use admission controllers to limit how a cluster is used.

To use Policy Controller, you declare a set of constraints in a constraint template. When the constraint template has been deployed in the cluster, you can create individual constraint CRDs that are defined by the constraint template.
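For example, assuming the K8sRequiredLabels template from the default library is installed, a constraint that requires every namespace to carry an owner label could look like this sketch:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels          # the kind comes from the constraint template
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Namespace"]
  parameters:
    labels:
    - key: owner                 # every matched namespace must carry this label
```

Admission requests that create or update a namespace without the owner label are then denied, and the continuous audit flags existing namespaces that violate the constraint.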

The following diagram shows how Policy Controller uses the OPA Constraint Framework to define and enforce policy.

The OPA Constraint Framework receives requests and enforces policies for access to other resources.

The diagram shows the following:

  1. Constraints are created from constraint templates.
  2. Policies are enabled on the cluster by applying constraints.
  3. A request comes in and an admission review is triggered, resulting in an allow or deny decision.
  4. A continuous audit evaluates all active objects on the cluster against policies.

Using Policy Controller, you can enforce custom policies, such as requiring labels. Policy Controller lets you apply the majority of the constraints that you can apply by using PodSecurityPolicies, but typically with less operational overhead, for the following reasons:

  • Policy Controller ships with a default library of constraint templates, meaning that you don't need to write your own policies for common cases as you do with PodSecurityPolicies.
  • You don't have to manage RoleBindings as you do when you use PodSecurityPolicies.
  • Policy Controller supports dry run mode so that you can validate the effect of a constraint before you apply it.
  • You can scope policies to namespaces, which gives you the opportunity to perform a slower ramp-up of more restrictive policies. This is similar to a canary release strategy, where you manage the exposure of rolling out policies that might have unanticipated effects. For example, your rollout might uncover that you've restricted access to a volume from a Pod, but that the Pod should have access to the volume.
  • Policy Controller provides a single way to apply policies whether they're custom constraints or they're PodSecurityPolicies constraints that are defined in the Gatekeeper repository.

For more information about how to use Policy Controller to enforce policies that you define, see Anthos Config Management Policy Controller.

Anthos Service Mesh

Managing secure communications between services

Anthos Service Mesh helps you monitor and manage an Istio-based service mesh. A service mesh is an infrastructure layer that enables managed, observable, and secure communication across your services.

Anthos Service Mesh helps simplify the management of secure communications across services in the following ways:

  • Managing authentication and encryption of traffic within the cluster for supported protocols by using mutual Transport Layer Security (mTLS). Anthos Service Mesh manages the provisioning and rotation of mTLS keys and certificates for Anthos workloads without disrupting communications. Regularly rotating mTLS keys is a security best practice that helps reduce exposure in the event of an attack.
  • Allowing you to configure network security policies based on service identity rather than on the IP address of the peer. Anthos Service Mesh is used to configure identity-aware access control (firewall) policies that let you create policies that are independent of the network location of the workload. This simplifies the process of setting up service-to-service communications.
  • Allowing you to configure policies that permit access from certain clients.
  • Managing user authentication by using Identity-Aware Proxy or a custom policy engine. This helps you control access to the applications that you've deployed on Anthos GKE clusters by verifying user identity and the context of the request to determine whether a user should be allowed access.
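The identity-based controls described above are expressed as Istio AuthorizationPolicy resources. The following sketch allows only the shopfront service account to call the transactions workloads; the namespace, label, and service account names are hypothetical.

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-shopfront-only
  namespace: transactions         # example namespace of the called service
spec:
  selector:
    matchLabels:
      app: transactions
  action: ALLOW                   # once an ALLOW policy matches a workload,
                                  # traffic that matches no rule is denied
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/shopfront/sa/shopfront   # mTLS identity of the caller
```

Because the rule matches on the caller's mTLS identity rather than its IP address, the policy keeps working when the shopfront Pods are rescheduled or scaled.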

In addition to managing secure communications between services, Anthos Service Mesh helps reduce noise in access logs by logging only successful accesses once for each configurable time window. Requests that are denied by a security policy or that result in an error are always logged. Access logs and metrics are available in Google Cloud's operations suite.

For more information on Anthos Service Mesh security features, see the Anthos Service Mesh security overview.

Bringing it all together

VPC firewall rules and hierarchical firewall policies apply only to workloads that run in Google Cloud. The other controls discussed earlier apply to both Anthos GKE and GKE on-prem.

To integrate the controls discussed in this guide, map out their scope and the stage at which they need to be configured, as described in the steps that follow.

  1. Define hierarchical firewall policies at an organizational level and for each folder that's defined in your resource hierarchy to allow or deny traffic as needed for your workloads.
  2. For traffic restrictions that apply to whole clusters, define VPC firewall rules.
  3. Create your clusters using the guidance in the applicable cluster-hardening guide (GKE or GKE on-prem). When you create your cluster, be sure that you follow the hardening guide and use the --enable-network-policy flag. This flag enables network policy enforcement, which is required to implement further traffic restrictions at the Pod level.
  4. Define the namespaces and labels that are required for the Pods. This provides a name scope that lets you work with policies and with Kubernetes service accounts.
  5. Install Policy Controller using Anthos Config Management.
  6. Apply your network policies by using Anthos Config Management. For information about typical approaches for restricting traffic by using network policies, see the restricting-traffic directory in the GitHub repository that's associated with this blueprint.
  7. If you're using Anthos Service Mesh, use authorization policies to define which traffic can pass within the service mesh, and use gateways to define which traffic can enter or leave the service mesh. Use network policies to ensure that traffic cannot bypass your egress gateways.