Centralized Multi Cluster Ingress with Anthos Service Mesh
Sr. Cloud Architect at Google
We have never seen a tool spark a proliferation of open source products the way Kubernetes has. These products are the building blocks that make Kubernetes the de facto container orchestrator for running services securely and reliably. Organizations rarely run Kubernetes on its own; it is usually deployed alongside other tools to solve real business challenges that Kubernetes doesn't natively solve. Google offers a managed collection of these products under the Anthos umbrella.
In this article, we will demonstrate how organizations can leverage Anthos to centralize the management of internet traffic using Multi Cluster Ingress (MCI) and Anthos Service Mesh (ASM).
For large organizations, separation of duties is a major concern, and it drives the design of the cloud foundation. Projects are the boundary for applying a least-privilege strategy to most of the resources Google Cloud Platform provides. Hence, it's fair to assume that separate teams have separate projects to run their workloads.
This separation of duties also applies to network management. Shared VPC is one of the tools teams can use to manage networking centrally: it allows multiple projects to share the same VPC. Sharing the same VPC is also required for some of the features Anthos provides.
Applications running in Kubernetes sometimes need to be exposed to the internet, which can be achieved using a load balancer. The lifecycle and configuration of the load balancer are managed through annotations and custom resource definitions (CRDs). MCI is a managed controller that manages the lifecycle of the Google Cloud Global Load Balancer (GLBC) in order to expose services running on Kubernetes to the internet.
Unfortunately, GLBC doesn't support cross-project backends. Since projects are the foundation for implementing a least-privilege strategy, one solution is to deploy a GLBC per project, but the management overhead this would create for the network and security teams is substantial. The second solution is to deploy one GLBC using MCI and use ASM to route the traffic across projects. This also lets security teams achieve the required separation of duties, with a clean split of responsibilities that minimizes confusion and accidental configuration errors.
Now that we have covered the problem we are solving and the tools we are going to use, let's dive into how we are going to do it.
This architecture shows the suggested solution for supporting MCI in a multi-project setup. Both the fleet project and the service project share the same VPC. The Shared VPC host project is a separate project and is not depicted in the picture. The fleet project is the project where the Fleet API is enabled. All the clusters that are part of the same fleet are also part of the same mesh. One of the fleet GKE clusters hosts the MCI configuration, but the configuration is deployed in both fleet GKE clusters for high availability. Deploying the same configuration in both clusters makes it easy to swap the configuration cluster.
The service projects, on the other hand, host workloads. Development teams would have access to a namespace in which to deploy the services they manage.
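Which fleet cluster acts as the configuration cluster is a setting on the fleet's Multi Cluster Ingress feature. As a rough sketch, the Python helper below only builds (it does not run) the `gcloud container fleet ingress` commands that would enable MCI with one fleet cluster as the configuration cluster and later swap to the other. The project ID and membership names are hypothetical placeholders, and the exact flags should be checked against the gcloud reference for your SDK version.

```python
import shlex

# Hypothetical placeholders, not values from this article.
FLEET_PROJECT = "fleet-project-id"
FLEET_MEMBERSHIPS = ["fleet-gke-1", "fleet-gke-2"]


def mci_config_command(action, config_membership, project=FLEET_PROJECT):
    """Return the gcloud argv that enables MCI or changes its configuration cluster."""
    if action not in ("enable", "update"):
        raise ValueError(f"unsupported action: {action}")
    return [
        "gcloud", "container", "fleet", "ingress", action,
        f"--config-membership={config_membership}",
        f"--project={project}",
    ]


# Enable MCI with the first fleet cluster as the configuration cluster.
enable_cmd = mci_config_command("enable", FLEET_MEMBERSHIPS[0])
# Later, swap the configuration cluster to the second fleet cluster.
swap_cmd = mci_config_command("update", FLEET_MEMBERSHIPS[1])

print(shlex.join(enable_cmd))
print(shlex.join(swap_cmd))
```

Because both fleet clusters already carry the same MCI resources, the swap is a single feature update rather than a redeployment.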
How to Implement this Solution
To implement this solution, a deep understanding of how the traffic flows is key. The following steps describe the process to make sure everything is working. Before starting, take the following actions:
- Create a namespace for the ingress gateway and other resources.
- In the ingress namespace, create the ingress deployment and the ingress gateway. These should be deployed only in the fleet GKE clusters.
- Deploy the multi cluster service resource with a selector that sends traffic to the ingress deployment created above. The cluster spec should link only the fleet GKE clusters, because this is where the ASM ingress resources are deployed. The multi cluster service resource creates Network Endpoint Groups (NEGs).
- Deploy a multi cluster ingress resource to create the GLBC.
- Create a namespace for the service. This namespace should be created in all clusters. Anthos Config Management (ACM) can be used to achieve this.
- Create a virtual service that listens on the gateway deployed for MCI. The gateways field takes the namespace and the name of the gateway, in the form namespace/name. This virtual service should be created in the fleet GKE clusters.
- Create a second virtual service in the cluster where the workload is running. The difference from the first virtual service is that the gateway in this case should be mesh, because the ingress gateway is deployed in a different project.
- Deploy the workload in the service GKE clusters.
- Deploy the service for the deployment in both the fleet GKE clusters and the service GKE clusters. This is required for service discovery.
- The last step is to test that the service responds. curl can be used for this.
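A check along the lines of `curl -H "Host: app.example.com" http://LOAD_BALANCER_IP/` should return the workload's response once the load balancer is provisioned; both the hostname and the IP address here are placeholders.

For reference, the resources described in the steps above can be sketched as follows. This is a minimal illustration rather than the exact configuration used for this architecture: it builds the manifests as Python dictionaries and serializes them as a JSON `List`. All names, namespaces, labels, ports, hostnames, and cluster links are hypothetical, and the ingress deployment and the workload deployment are left out for brevity.

```python
import json

# Every value below is a hypothetical placeholder, not taken from the article.
INGRESS_NS = "asm-ingress"
GATEWAY = "mci-gateway"
APP_NS = "sample-app"
APP = "sample-app"
APP_HOST = "app.example.com"
APP_PORT = 8080
GATEWAY_POD_LABELS = {"asm": "ingressgateway"}  # must match the ingress deployment's pods
GATEWAY_PORT = 8080                             # port the ingress gateway pods listen on
FLEET_CLUSTER_LINKS = ["us-central1-a/fleet-gke-1", "us-east1-b/fleet-gke-2"]


def resource(api_version, kind, name, namespace=None, spec=None):
    """Build one Kubernetes object as a plain dict."""
    obj = {"apiVersion": api_version, "kind": kind, "metadata": {"name": name}}
    if namespace:
        obj["metadata"]["namespace"] = namespace
    if spec is not None:
        obj["spec"] = spec
    return obj


def virtual_service(name, gateway):
    """Route APP_HOST to the workload's Service; only the gateway differs per cluster type."""
    return resource("networking.istio.io/v1beta1", "VirtualService", name, APP_NS, {
        "hosts": [APP_HOST],
        "gateways": [gateway],
        "http": [{"route": [{"destination": {
            "host": f"{APP}.{APP_NS}.svc.cluster.local",
            "port": {"number": APP_PORT},
        }}]}],
    })


def app_service():
    """Service for the workload, deployed in every cluster for service discovery."""
    return resource("v1", "Service", APP, APP_NS, {
        "selector": {"app": APP},
        "ports": [{"name": "http", "port": APP_PORT, "targetPort": APP_PORT}],
    })


def fleet_cluster_resources():
    """Resources for the fleet GKE clusters: ingress side plus the first virtual service."""
    port = {"name": "http", "protocol": "TCP", "port": GATEWAY_PORT, "targetPort": GATEWAY_PORT}
    return [
        resource("v1", "Namespace", INGRESS_NS),
        resource("networking.istio.io/v1beta1", "Gateway", GATEWAY, INGRESS_NS, {
            "selector": GATEWAY_POD_LABELS,
            "servers": [{"port": {"number": GATEWAY_PORT, "name": "http", "protocol": "HTTP"},
                         "hosts": ["*"]}],
        }),
        resource("networking.gke.io/v1", "MultiClusterService", "ingress-mcs", INGRESS_NS, {
            "template": {"spec": {"selector": GATEWAY_POD_LABELS, "ports": [port]}},
            "clusters": [{"link": link} for link in FLEET_CLUSTER_LINKS],
        }),
        resource("networking.gke.io/v1", "MultiClusterIngress", "ingress-mci", INGRESS_NS, {
            "template": {"spec": {"backend": {"serviceName": "ingress-mcs",
                                              "servicePort": GATEWAY_PORT}}},
        }),
        resource("v1", "Namespace", APP_NS),
        virtual_service(f"{APP}-ingress", f"{INGRESS_NS}/{GATEWAY}"),
        app_service(),
    ]


def service_cluster_resources():
    """Resources for the service GKE clusters: the mesh virtual service and the Service."""
    return [
        resource("v1", "Namespace", APP_NS),
        virtual_service(f"{APP}-mesh", "mesh"),
        app_service(),
    ]


def as_list(items):
    """Serialize resources as a JSON List, a format kubectl apply -f accepts."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)
```

Writing `as_list(fleet_cluster_resources())` and `as_list(service_cluster_resources())` to files gives one manifest per cluster type. Keeping the two virtual services in a single helper makes the only intended difference, the gateway, easy to see.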