Anthos security blueprint: Protecting API endpoints

This document describes how you protect the API endpoints for your application services that run in Anthos clusters. It includes an overview of the considerations that you need to make when exposing API endpoints, and it describes the Google Cloud controls that you use for this task.

The document is part of a series of security blueprints that provide prescriptive guidance for working with Anthos. For more information about these blueprints, see Anthos Security Blueprints: Frequently asked questions.

Introduction

When you deploy an application on Anthos clusters, you might need to expose the application's API endpoints to customers, to partners, or to other applications that run outside the cluster. It's important to protect these endpoints so that they're available and accessible only to authorized users and to the services that you designate.

Before you expose API endpoints, you must understand the flows and operations that you want to permit into the application, within the application, and out of the application. You need to consider the following:

  • Whether other services within the cluster need access to your service.
  • What identities will be accessing your API endpoints.
  • How authentication will be managed.
  • How resources will be consumed.
  • How you can help defend against DDoS attacks.

This blueprint recommends a defense-in-depth approach that starts at the edge of the cluster and provides layers of protection for services within the cluster.

The content in the protecting-api-endpoints directory in the GitHub repository that is associated with this blueprint provides instructions on how to configure the security controls that you need in order to protect your API endpoints.

Understanding the security controls that you need

This section discusses the controls that you must apply to help protect your API endpoints.

Apigee hybrid

Applying policies to your API endpoints

Apigee hybrid, a full-featured API management platform, provides controls to manage security, rate limiting, quotas, and analytics for the API endpoints that are deployed across your clusters. The API runtime planes are deployed on Anthos clusters to process your API traffic, giving you control of runtime capabilities such as message transformation, traffic management, and OAuth.

In this hybrid deployment model, the runtime planes are tethered to the Apigee management plane in Google Cloud. As a result, you can take advantage of the features and scale of the cloud to control and manage your APIs across multiple runtime planes that are deployed to Anthos clusters.

When you expose API proxies on Apigee, you can configure the following built-in policies to help secure the API endpoints and further manage the flow of traffic to your backend systems:

  • OAuth, JWT, and JWS policies can be used to build your OAuth 2.0 and OpenID Connect flows. For more information, see the OAuth for Apigee home page.
  • BasicAuthentication and VerifyAPIKey policies can be used for less complex authentication or when client identification alone is sufficient.
  • JSONThreatProtection, XMLThreatProtection, and RegularExpressionProtection policies can help protect against API requests that could overwhelm parsers or that could be attempting content-level application attacks. You configure these policies to set the upper boundaries of known payload structures and to reject potentially malicious data.
  • SpikeArrest policies can smooth the rate of traffic that's sent to your backend endpoints, which helps protect against sharp traffic spikes.
  • Quota policies can impose quotas on your API consumers, which helps ensure that they stay within your traffic entitlements.

For more information about available policies, see the policy reference overview page.
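For example, a SpikeArrest policy is declared as XML in the API proxy configuration. The following is a minimal sketch; the policy name and rate value are illustrative, not prescriptive:

```xml
<!-- Illustrative SpikeArrest policy: smooths traffic to roughly 30 requests per minute -->
<SpikeArrest async="false" continueOnError="false" enabled="true" name="Spike-Arrest-Backend">
  <DisplayName>Spike-Arrest-Backend</DisplayName>
  <Rate>30pm</Rate>
  <UseEffectiveCount>true</UseEffectiveCount>
</SpikeArrest>
```

You attach a policy like this to a flow in the API proxy so that it runs before requests reach your target backend.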

In addition to proxy-level security configurations, Apigee provides other platform- and environment-scoped security controls.

In addition to using the application-layer security policies that are configured in API proxies, you can configure transport-layer security controls to secure connectivity to your backend systems. For example, you can define target servers to establish mutual TLS (mTLS) connections with backend systems. You can use the Apigee APIs to automate these configuration changes, including the rotation of certificates and keys used for mTLS connections. For more information about connectivity options and related patterns, see Apigee Southbound Connectivity Patterns on the Apigee site.

Apigee hybrid uses HTTPS and OAuth to secure the connections between its own components and the Apigee management plane. Components of the runtime plane authenticate against the Apigee APIs that are available on the management plane, using service accounts that are granted permissions following the principle of least privilege.

Anthos Service Mesh

Managing secure communications between services

Anthos Service Mesh helps you monitor and manage an Istio-based service mesh. A service mesh is an infrastructure layer that enables managed, observable, and secure communication across your services.

Anthos Service Mesh helps simplify the management of secure communications across services in the following ways:

  • Managing authentication and encryption of traffic (supported protocols within the cluster) using mutual Transport Layer Security (mTLS). Anthos Service Mesh manages the provisioning and rotation of mTLS keys and certificates for Anthos workloads without disrupting communications. Regularly rotating mTLS keys is a security best practice that helps reduce exposure in the event of an attack.
  • Allowing you to configure network security policies based on service identity rather than on the IP address of the peer. Anthos Service Mesh is used to configure identity-aware access control (firewall) policies that let you create policies that are independent of the network location of the workload. This simplifies the process of setting up service-to-service communications.
  • Allowing you to configure policies that permit access from certain clients.
  • Managing user authentication by using Identity-Aware Proxy or a custom policy engine. This helps you control access to the applications that you've deployed on Anthos clusters by verifying user identity and the context of the request to determine whether a user should be allowed access.
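
These controls are expressed as Istio resources. The following minimal sketch (the namespace and service account names are illustrative) enforces strict mTLS for a namespace and allows only a designated client workload, identified by its mTLS-derived service identity, to call services in it:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: myapp-backend        # illustrative namespace
spec:
  mtls:
    mode: STRICT                  # reject plaintext traffic to workloads in this namespace
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-only
  namespace: myapp-backend
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        # A service identity derived from mTLS certificates, not an IP address
        principals: ["cluster.local/ns/myapp-frontend/sa/frontend-sa"]
```

Because the policy matches on service identity, it continues to apply even if the client workload is rescheduled to a node with a different IP address.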

In addition to managing secure communications between services, Anthos Service Mesh helps reduce noise in access logs by logging only successful accesses once for each configurable time window. Requests that are denied by a security policy or that result in an error are always logged. Access logs and metrics are available in Google Cloud's operations suite.

For more information on Anthos Service Mesh security features, see the Anthos Service Mesh security overview.

Identity Platform

Adding identity and access management functionality to applications

Identity Platform is a customer identity and access management (CIAM) platform that helps you add identity and access management functionality to your applications. It helps protect user accounts and scales on Google Cloud. Identity Platform lets users authenticate to your apps and services, such as multi-tenant SaaS applications, mobile and web applications, and APIs.

Google Cloud Armor

Protecting against DDoS attacks and enforcing Layer 7 security policies for your application endpoints

Google Cloud Armor works with the Cloud Load Balancing infrastructure. It provides always-on attack detection and mitigation at the edge, and it provides a defense-in-depth approach to protecting endpoints that are deployed on Google Cloud, in a hybrid deployment, or in a multi-cloud architecture.

You can use a flexible rules language to create rules using any combination of Layer 3 through Layer 7 parameters and geolocation to protect your deployment. In addition, you can use predefined rules to defend against cross-site scripting (XSS) and SQL injection.
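As a sketch, rules like these can be attached to a security policy by using the gcloud CLI. The policy name here is illustrative, and the exact flags can vary by gcloud release:

```
# Deny requests that match the preconfigured SQL-injection WAF expression
gcloud compute security-policies rules create 1000 \
    --security-policy=api-edge-policy \
    --expression="evaluatePreconfiguredExpr('sqli-stable')" \
    --action=deny-403

# Deny requests from a region you don't serve
# (replace XX with an ISO 3166-1 alpha-2 region code)
gcloud compute security-policies rules create 2000 \
    --security-policy=api-edge-policy \
    --expression="origin.region_code == 'XX'" \
    --action=deny-403
```

The security policy is then attached to the backend service behind your external load balancer, so that rules are evaluated at the edge before traffic reaches your cluster.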

Namespaces

Labeling resources that should use the same policies

Namespaces let you provide a scope for related resources within a cluster—for example, Pods, Services, and replication controllers. By using namespaces, you can delegate administration responsibility for the related resources as a unit. Therefore, namespaces are integral to most security patterns.

Namespaces are an important feature for control plane isolation. However, they don't provide node isolation, data plane isolation, or network isolation.

A common approach is to create namespaces for individual applications. For example, you might create the namespace myapp-frontend for the UI component of an application.
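For example, such a namespace can be declared and then used to scope administrative access. The following sketch is illustrative; the group name and namespace are assumptions:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: myapp-frontend
---
# Delegate administration of resources in this namespace to one team
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: frontend-admins
  namespace: myapp-frontend
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin                     # built-in role, scoped here to a single namespace
subjects:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: frontend-team@example.com # illustrative group
```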

Bringing it all together

For protecting API endpoints for your applications that run in Anthos clusters, Anthos Service Mesh and Apigee hybrid serve separate purposes and provide complementary capabilities. Anthos Service Mesh is for service management, whereas Apigee is for API management. Anthos Service Mesh enables service-to-service communication and security within a cluster. A service mesh is business-function independent. In contrast, an API management platform like Apigee lets you define how you want your APIs exposed, managed, and consumed at scale.

You can use the controls described in this document to protect API endpoints for services running in Anthos clusters that are deployed on-premises, on Google Cloud, or across other public clouds.

For services running on-premises

The following diagram illustrates how you use the controls discussed in this blueprint for applications that run on-premises. (In the diagram, ASM refers to Anthos Service Mesh.)

Architecture showing how controls are used to provide an on-premises solution for protecting API endpoints.

When you manage your applications on premises, make your APIs accessible only through Apigee API proxies, without an alternative route to bypass Apigee. You use Apigee hybrid to help secure and control the traffic that is allowed to reach the Anthos Service Mesh of backend services that run in your Anthos cluster.

Note the following about this configuration:

  • This architecture consists of two Anthos clusters on VMware user clusters. One hosts the Apigee hybrid runtime plane, and the other runs your services (application workloads).
  • The load balancer in front of your Apigee hybrid cluster acts as a proxy and forwards client requests to the Apigee hybrid runtime plane. The runtime plane invokes the appropriate API proxy to process the request and to execute other policies that you might have configured.
  • The API proxy uses its target configuration to route requests to the ingress of your application workloads cluster, using the load balancer that's in front of your application workloads cluster. You must use firewall rules to ensure that only the Apigee hybrid runtime is able to reach this ingress and the services that run in the cluster.
  • The Anthos Service Mesh ingress of your application workloads cluster accepts requests from the Apigee hybrid runtime plane and routes them to the appropriate services in the cluster. The Anthos Service Mesh policies that are configured in your application workloads cluster are also applied to requests.

For services running on Google Cloud

The following diagram illustrates the controls you use to protect the API endpoints for applications that are deployed to an Anthos cluster on Google Cloud. (In the diagram, ASM refers to Anthos Service Mesh.)

Architecture showing how controls are used to provide a Google Cloud solution for protecting API endpoints.

To protect the exposed APIs for these services, you use Google Cloud Armor in conjunction with an Apigee hybrid runtime plane that runs in GKE. Google Cloud Armor determines whether incoming API traffic should be allowed at the edge. Requests are blocked if a Google Cloud Armor security policy produces a deny decision.

Note the following about this configuration:

  • This architecture consists of two GKE clusters. One hosts the Apigee hybrid runtime plane, and the other runs your services (application workloads).
  • Requests are first evaluated by Google Cloud Armor. If this results in an allow decision, the request is forwarded from the Google Cloud load balancer to the Apigee hybrid runtime plane in GKE.
  • The Apigee hybrid runtime plane invokes the appropriate API proxy to process the request and to execute other policies that you might have configured.
  • The API proxy uses its target configuration to route requests to the ingress of your application workloads cluster, using an internal Google Cloud load balancer. You must use firewall rules to ensure that only the Apigee hybrid runtime is able to reach this ingress and the services that run in the cluster.
  • The Anthos Service Mesh ingress of your application workloads cluster accepts requests from the Apigee hybrid runtime plane and routes them to the appropriate services in the cluster.
  • The Anthos Service Mesh policies that are configured in your application workloads cluster are also applied to requests.

Steps to apply the controls

The controls discussed earlier apply both to Anthos clusters on Google Cloud and to Anthos clusters on VMware. To integrate the controls discussed in this guide, map out their scope and the stage at which they need to be configured, as described in the steps that follow.

  1. Create a cluster by using the guidance in the applicable cluster hardening guide (GKE or Anthos clusters on VMware). When you create your cluster, be sure you follow the hardening guide and use the --enable-network-policy flag; network policies are required. This step lets you implement further traffic restrictions at the Pod level.

  2. Install Anthos Service Mesh in the cluster that's running your application services.

  3. Configure Anthos Service Mesh features to protect your services:

    1. Annotate the namespaces in your cluster where your application services are running to enable auto-injection of the sidecar proxy. Because sidecars are injected when Pods are created, you must restart any Pods that are already running in order for the change to take effect.
    2. Use authorization policies to define which traffic can pass within the service mesh, and use gateways to define which traffic can enter or leave the service mesh. Use network policies to ensure that traffic cannot bypass your egress gateways.
    3. Enable mutual TLS for service-to-service authentication based on identities provided by Anthos Service Mesh.
    4. Annotate the istio-ingressgateway service to configure an internal load balancer. You do this so that the cluster that runs Apigee hybrid can route traffic to the services in this cluster through the internal load balancer.
  4. Create and configure your cluster to run Apigee hybrid using the guidance in the applicable guide (GKE or Anthos clusters on VMware).

  5. When you install your Apigee hybrid runtime plane in an Anthos cluster on Google Cloud, configure Google Cloud Armor security policies. These policies help protect the Apigee hybrid runtime plane that runs on GKE.

  6. Configure Apigee hybrid to provide external authentication, quotas, and overall API policy management. For guidance on how to implement common policies to protect APIs, see the content in the protecting-api-endpoints directory in the GitHub repository that's associated with this blueprint.

  7. Configure firewall rules to ensure that only the Apigee hybrid runtime is able to reach the cluster that runs your application services, using the configured internal load balancer.
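
Parts of step 3 can be sketched as Kubernetes configuration. The label and annotation values below depend on your Anthos Service Mesh version and platform, so treat them as illustrative:

```yaml
# Enable sidecar auto-injection for an application namespace (step 3.1)
apiVersion: v1
kind: Namespace
metadata:
  name: myapp-frontend            # illustrative namespace
  labels:
    istio-injection: enabled      # or istio.io/rev: <revision> for revision-based injection
---
# Expose the mesh ingress through an internal load balancer (step 3.4; GKE annotation shown)
apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway
  namespace: istio-system
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    istio: ingressgateway
  ports:
  - name: https
    port: 443
    targetPort: 8443
```

Remember that sidecars are injected only when Pods are created, so restart existing Pods after labeling the namespace.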