This reference architecture shows how to combine Anthos Service Mesh with Cloud Load Balancing to expose applications in a service mesh to internet clients.
Anthos Service Mesh is a managed service mesh, based on Istio, that provides a security-enhanced, observable, and standardized communication layer for applications. A service mesh provides a holistic communications platform for clients that are communicating in the mesh. However, a challenge remains in how to connect clients that are outside the mesh to applications hosted inside the mesh.
You can expose an application to clients in many ways, depending on where the client is. This reference architecture is intended for advanced practitioners who run Anthos Service Mesh, but it also works for Istio on Google Kubernetes Engine (GKE).
Mesh ingress gateway
Istio 0.8 introduced the mesh ingress gateway. The gateway provides a dedicated set of proxies whose ports are exposed to traffic coming from outside the service mesh. These mesh ingress proxies let you control network exposure behavior separately from application routing behavior.
The proxies also let you apply routing and policy to mesh-external traffic before it arrives at an application sidecar. Mesh ingress defines the treatment of traffic when it reaches a node in the mesh, but external components must define how traffic first arrives at the mesh.
To manage this external traffic, you need a load balancer that is external to the mesh. This reference architecture uses Google Cloud Load Balancing provisioned through GKE Gateway resources to automate deployment.
For Google Cloud, the canonical example of this setup is an external load balancing service that deploys a public network load balancer (L4). That load balancer points at the NodePorts of a GKE cluster. These NodePorts expose the Istio ingress gateway Pods, which route traffic to downstream mesh sidecar proxies.
The following diagram illustrates this topology. Load balancing for internal private traffic looks similar to this architecture, except that you deploy an internal passthrough Network Load Balancer instead.
The preceding diagram shows that using L4 transparent load balancing with a mesh ingress gateway offers the following advantages:
- The setup simplifies deploying the load balancer.
- The load balancer provides a stable virtual IP address (VIP), health checking, and reliable traffic distribution when cluster changes, node outages, or process outages occur.
- All routing rules, TLS termination, and traffic policy are handled in a single location at the mesh ingress gateway.
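The L4 topology described above can be sketched as a Kubernetes Service of type LoadBalancer that selects the mesh ingress gateway Pods. The following Python snippet only builds the manifest as a dictionary. The Service name, namespace, Pod label, and target ports are assumptions chosen for illustration, not values taken from this architecture.

```python
import json


def l4_ingress_service(name="istio-ingressgateway",
                       namespace="istio-system",
                       internal=False):
    """Build a Service manifest that fronts mesh ingress Pods with an L4 load balancer.

    All default names, labels, and ports are illustrative assumptions.
    """
    annotations = {}
    if internal:
        # GKE annotation that requests an internal passthrough Network Load Balancer
        # instead of a public one.
        annotations["networking.gke.io/load-balancer-type"] = "Internal"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": annotations,
        },
        "spec": {
            "type": "LoadBalancer",
            # Assumed label on the ingress gateway Pods.
            "selector": {"istio": "ingressgateway"},
            "ports": [
                {"name": "http2", "port": 80, "targetPort": 8080},
                {"name": "https", "port": 443, "targetPort": 8443},
            ],
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON, which kubectl also accepts.
    print(json.dumps(l4_ingress_service(), indent=2))
```

Passing `internal=True` produces the variant for internal private traffic. Routing, TLS termination, and traffic policy still live in the mesh ingress gateway configuration, not in this Service.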
GKE Gateway and Services
You can provide access to applications for clients that are outside the cluster in many ways. GKE Gateway is an implementation of the Kubernetes Gateway API, which evolves and improves on the Ingress resource.
As you deploy GKE Gateway resources to your GKE cluster, the Gateway controller watches the Gateway API resources and reconciles Cloud Load Balancing resources to implement the networking behavior that's specified by the GKE Gateway resources.
When using GKE Gateway, the type of load balancer you use to expose applications to clients depends largely on the following factors:
- The status of the clients (external or internal).
- The required capabilities of the load balancer, including the capability to integrate with Google Cloud Armor security policies.
- The spanning requirements of the service mesh. Service meshes can span multiple GKE clusters or can be contained in a single cluster.
In GKE Gateway, this behavior is controlled by specifying the appropriate GatewayClass.
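As a rough illustration of how these factors map to a GatewayClass, the following sketch picks a class name from two of them: client location and cluster span. The function name is hypothetical, and the mapping covers only a subset of the GatewayClasses that GKE offers. Check the GKE documentation for the capabilities of each class (for example, Google Cloud Armor support) before relying on it.

```python
def choose_gateway_class(external: bool, multi_cluster: bool) -> str:
    """Return a GKE GatewayClass name for the given requirements.

    Simplified sketch: external clients get the global external Application
    Load Balancer, and internal clients get the regional internal Application
    Load Balancer. The "-mc" suffix selects the multi-cluster variant.
    """
    base = "gke-l7-global-external-managed" if external else "gke-l7-rilb"
    return base + ("-mc" if multi_cluster else "")


if __name__ == "__main__":
    for external in (True, False):
        for multi_cluster in (False, True):
            print(external, multi_cluster,
                  choose_gateway_class(external, multi_cluster))
```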
Although the default load balancer for Anthos Service Mesh is the Network Load Balancer, this reference architecture focuses on the external Application Load Balancer (L7). The external Application Load Balancer provides integration with edge services like Identity-Aware Proxy and Google Cloud Armor, URL redirects and rewrites, as well as a globally distributed network of edge proxies. The next section describes the architecture and advantages of using two layers of HTTP load balancing.
Cloud ingress and mesh ingress
Deploying external L7 load balancing outside of the mesh along with a mesh ingress layer offers significant advantages, especially for internet traffic. Even though Anthos Service Mesh and Istio ingress gateways provide advanced routing and traffic management in the mesh, some functions are better served at the edge of the network. Taking advantage of internet-edge networking through Google Cloud's external Application Load Balancer might provide significant performance, reliability, or security-related benefits over mesh-based ingress. These benefits include the following:
- Global Anycast VIP advertisement and globally distributed TLS and HTTP termination
- DDoS defense and traffic filtering at the edge with Google Cloud Armor
- API gateway functionality with IAP
- Automatic public certificate creation and rotation with Google Certificate Manager
- Multi-cluster and multi-regional load balancing at the edge with Multi Cluster Gateway
This external layer of L7 load balancing is referred to as cloud ingress because it is built on cloud-managed load balancers rather than the self-hosted proxies that are used by mesh ingress. The combination of cloud ingress and mesh ingress uses complementary capabilities of the Google Cloud infrastructure and the mesh. The following diagram illustrates how you can combine cloud ingress (through GKE Gateway) and mesh ingress to serve as two load balancing layers for internet traffic.
In the topology of the preceding diagram, the cloud ingress layer sources traffic from outside of the service mesh and directs that traffic to the mesh ingress layer. The mesh ingress layer then directs traffic to the mesh-hosted application backends.
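The hand-off from cloud ingress to mesh ingress can be sketched as a Gateway API HTTPRoute whose only backend is the mesh ingress Service. The snippet below builds such a manifest as a Python dictionary. The Gateway name, Service name, namespace, hostname, and port are hypothetical placeholders, and application-level routing is assumed to be configured separately in the mesh.

```python
import json


def mesh_ingress_route(gateway_name="external-http",
                       mesh_service="asm-ingressgateway",
                       namespace="asm-ingress",
                       hostname="frontend.example.com",
                       port=443):
    """Build an HTTPRoute that sends all traffic for a hostname to mesh ingress.

    Every default argument is an illustrative assumption.
    """
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": "default-httproute", "namespace": namespace},
        "spec": {
            # Attach the route to the cloud ingress Gateway.
            "parentRefs": [{"name": gateway_name, "namespace": namespace}],
            "hostnames": [hostname],
            "rules": [{
                # Catch-all match: the cloud layer keeps its routing logic simple.
                "matches": [{"path": {"type": "PathPrefix", "value": "/"}}],
                # The only backend is the mesh ingress Service.
                "backendRefs": [{"name": mesh_service, "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(mesh_ingress_route(), indent=2))
```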
Cloud and mesh ingress topology
This section describes the complementary roles that each ingress layer fulfills when you use them together. These roles aren't concrete rules, but rather guidelines that use the advantages of each layer. Variations of this pattern are likely, depending on your use case.
- Cloud ingress: When paired with mesh ingress, the cloud ingress layer is best used for edge security and global load balancing. Because the cloud ingress layer is integrated with DDoS protection, cloud firewalls, authentication, and encryption products at the edge, this layer excels at running these services outside of the mesh. The routing logic is typically straightforward at this layer, but the logic can be more complex for multi-cluster and multi-region environments. Because of the critical function of internet-facing load balancers, the cloud ingress layer is likely managed by an infrastructure team that has exclusive control over how applications are exposed and secured on the internet. This control also makes this layer less flexible and dynamic than a developer-driven infrastructure, a consideration that could affect to whom and how you provide administrative access to this layer.
- Mesh ingress: When paired with cloud ingress, the mesh ingress layer provides flexible routing that is close to the application. Because of this flexibility, the mesh ingress is better than cloud ingress for complex routing logic and application-level visibility. The separation between ingress layers also makes it easier for application owners to directly control this layer without affecting other teams. To help secure applications when you expose service mesh applications through an L4 load balancer instead of an L7 load balancer, you should terminate client TLS at the mesh ingress layer inside the mesh.
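As one example of edge security at the cloud ingress layer, GKE Gateway lets you attach a Google Cloud Armor security policy to a backend Service through a GCPBackendPolicy resource. The sketch below builds that manifest as a Python dictionary. The security policy name, Service name, and namespace are hypothetical, and the Cloud Armor policy itself is assumed to already exist in the project.

```python
import json


def cloud_armor_backend_policy(security_policy, service, namespace):
    """Build a GCPBackendPolicy that applies a Cloud Armor policy to a Service."""
    return {
        "apiVersion": "networking.gke.io/v1",
        "kind": "GCPBackendPolicy",
        "metadata": {
            "name": f"{service}-backend-policy",
            "namespace": namespace,
        },
        "spec": {
            # Name of an existing Cloud Armor security policy.
            "default": {"securityPolicy": security_policy},
            # The Service that the load balancer uses as its backend,
            # here the mesh ingress Service.
            "targetRef": {"group": "", "kind": "Service", "name": service},
        },
    }


if __name__ == "__main__":
    policy = cloud_armor_backend_policy(
        security_policy="edge-fw-policy",   # hypothetical policy name
        service="asm-ingressgateway",       # hypothetical Service name
        namespace="asm-ingress",            # hypothetical namespace
    )
    print(json.dumps(policy, indent=2))
```

Keeping this policy next to the Gateway, rather than in the mesh configuration, fits the ownership split described above: the infrastructure team controls edge protection, and application owners control mesh routing.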
One complexity of using two layers of L7 load balancing is health checking. You must configure each load balancer to check the health of the next layer to ensure that it can receive traffic. The topology in the following diagram shows how cloud ingress checks the health of the mesh ingress proxies, and the mesh, in turn, checks the health of the application backends.
The preceding topology has the following considerations:
- Cloud ingress: In this reference architecture, you configure the Google Cloud load balancer through the Gateway to check the health of the mesh ingress proxies on their exposed health check ports. If a mesh proxy is down, or if the cluster, mesh, or region is unavailable, the Google Cloud load balancer detects this condition and doesn't send traffic to the mesh proxy.
- Mesh ingress: In the mesh application, you perform health checks on the backends directly so that you can execute load balancing and traffic management locally.
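The cloud ingress health check can be sketched with a GKE HealthCheckPolicy resource that targets the mesh ingress Service. The snippet below assumes that the ingress proxies expose the Istio readiness endpoint `/healthz/ready` on port 15021, which is the Istio default. The Service name, namespace, and check interval are hypothetical.

```python
import json


def mesh_ingress_health_check(service="asm-ingressgateway",
                              namespace="asm-ingress",
                              port=15021,
                              path="/healthz/ready",
                              interval_sec=15):
    """Build a HealthCheckPolicy so the load balancer probes the ingress proxies.

    Defaults are illustrative assumptions; adjust them to your gateway deployment.
    """
    return {
        "apiVersion": "networking.gke.io/v1",
        "kind": "HealthCheckPolicy",
        "metadata": {"name": "ingress-gateway-healthcheck",
                     "namespace": namespace},
        "spec": {
            "default": {
                "checkIntervalSec": interval_sec,
                "config": {
                    "type": "HTTP",
                    "httpHealthCheck": {
                        # Probe the dedicated health port, not the traffic port.
                        "portSpecification": "USE_FIXED_PORT",
                        "port": port,
                        "requestPath": path,
                    },
                },
            },
            "targetRef": {"group": "", "kind": "Service", "name": service},
        },
    }


if __name__ == "__main__":
    print(json.dumps(mesh_ingress_health_check(), indent=2))
```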
The preceding topology also involves several security elements. One of the most critical elements is how you configure encryption and deploy certificates. You can refer to a Certificate Manager CertificateMap in your Gateway definition. Internet clients authenticate against the public certificates and connect to the external load balancer as the first hop in the Virtual Private Cloud (VPC).
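A minimal sketch of a Gateway that refers to a CertificateMap follows. On GKE, the reference is made with the `networking.gke.io/certmap` annotation, and the HTTPS listener then carries no inline TLS block. The Gateway name, namespace, and certificate map name are hypothetical, and the map is assumed to already exist in Certificate Manager.

```python
import json


def external_gateway(cert_map,
                     name="external-http",
                     namespace="asm-ingress",
                     gateway_class="gke-l7-global-external-managed"):
    """Build a Gateway whose public certificates come from a CertificateMap."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "Gateway",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Points the load balancer at a Certificate Manager CertificateMap.
            "annotations": {"networking.gke.io/certmap": cert_map},
        },
        "spec": {
            "gatewayClassName": gateway_class,
            "listeners": [
                {"name": "https", "protocol": "HTTPS", "port": 443},
            ],
        },
    }


if __name__ == "__main__":
    # "edge2mesh-cert-map" is a hypothetical CertificateMap name.
    print(json.dumps(external_gateway("edge2mesh-cert-map"), indent=2))
```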
The next hop, which is between the Google Front End (GFE) and the mesh ingress proxy, is encrypted by default. Network-level encryption between the GFEs and their backends is applied automatically. However, if your security requirements dictate that the platform owner retain ownership of the encryption keys, then you can enable HTTP/2 with TLS encryption between the cluster gateway (the GFE) and the mesh ingress (the Envoy proxy instance). When you enable HTTP/2 with TLS encryption for this path, you can use a self-signed or public certificate to encrypt traffic because the GFE doesn't authenticate against it. This additional layer of encryption is demonstrated in the associated deployment guide. To help prevent the mishandling of certificates, don't use the public certificate for the public load balancer elsewhere. Instead, we recommend that you use separate certificates in the service mesh.
If the service mesh mandates TLS, then all traffic is encrypted between sidecar proxies and to the mesh ingress. The following diagram illustrates HTTPS encryption from the client to the Google Cloud load balancer, from the load balancer to the mesh ingress proxy, and from the ingress proxy to the sidecar proxy.
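The second and third encrypted hops can be sketched with three manifests, built here as Python dictionaries: a Service port that asks the load balancer to use HTTP/2 with TLS toward the mesh ingress, an Istio Gateway that terminates that TLS with a mesh-owned certificate, and a PeerAuthentication policy that mandates mutual TLS between proxies. All names, the Pod label, and the secret name are hypothetical. The use of `appProtocol: HTTP2` to select the backend protocol is an assumption about GKE Gateway behavior that you should verify against the GKE documentation.

```python
import json


def mesh_ingress_tls_manifests(service="asm-ingressgateway",
                               namespace="asm-ingress",
                               credential="edge2mesh-credential"):
    """Build manifests for load-balancer-to-mesh TLS and mesh-wide mTLS.

    Every default argument is an illustrative assumption.
    """
    ingress_service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {
            "type": "ClusterIP",
            "selector": {"asm": "ingressgateway"},  # assumed Pod label
            "ports": [{
                "name": "https",
                "port": 443,
                "targetPort": 8443,
                # Assumed hint for the load balancer to use HTTP/2 with TLS.
                "appProtocol": "HTTP2",
            }],
        },
    }
    istio_gateway = {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "Gateway",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {
            "selector": {"asm": "ingressgateway"},
            "servers": [{
                "port": {"number": 443, "name": "https", "protocol": "HTTPS"},
                "hosts": ["*"],
                # Kubernetes secret holding a self-signed or mesh-specific
                # certificate, separate from the public certificate.
                "tls": {"mode": "SIMPLE", "credentialName": credential},
            }],
        },
    }
    strict_mtls = {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        # Placing the policy in the mesh root namespace (istio-system by
        # default) applies it mesh-wide.
        "metadata": {"name": "default", "namespace": "istio-system"},
        "spec": {"mtls": {"mode": "STRICT"}},
    }
    return [ingress_service, istio_gateway, strict_mtls]


if __name__ == "__main__":
    for manifest in mesh_ingress_tls_manifests():
        print(json.dumps(manifest, indent=2))
```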
In this document, you use the following billable components of Google Cloud:
- Google Kubernetes Engine
- Compute Engine
- Cloud Load Balancing
- Anthos Service Mesh
- Google Cloud Armor
- Cloud Endpoints
To deploy this architecture, see From edge to mesh: Deploy service mesh applications through GKE Gateway.
- Learn about the additional features offered by GKE Gateway that you can use with your service mesh.
- Learn about the different types of cloud load balancing available for GKE.
- Learn about the features and functionality offered by Anthos Service Mesh.
- For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.