Traffic Director new service routing APIs overview

This document is intended for mesh or platform administrators and service developers who have an intermediate to advanced level of familiarity with Traffic Director and service mesh concepts. This document applies to deployments using Envoy and gRPC clients. For more information on Traffic Director concepts, see the general overview and the proxyless gRPC services overview.

Traffic Director provides service networking capabilities to your applications, including advanced traffic management, observability, and security. However, configuring and operating a service mesh is a complex task for mesh administrators and service developers.

This preview introduces a new set of Traffic Director service routing APIs to configure Traffic Director. The new APIs are designed to simplify and improve your overall mesh configuration experience.

The new API model replaces the existing forwarding rule, target proxy, and URL map resources with three new API resources called Mesh, Gateway, and Route. These new resources provide a more contextually relevant configuration experience when you define your service networking control plane.

This document introduces the following service routing API model and resources.

  • Mesh
    • Service-to-service (east-west) traffic management and security configuration for Envoy sidecar proxies and proxyless gRPC clients.
  • Gateway
    • Traffic management and security configuration for Envoy proxies acting as ingress gateways, allowing external clients to connect to the service mesh (north-south).
  • Route, with the following types:
    • HTTPRoute
    • GRPCRoute
    • TCPRoute
    • TLSRoute

The Google Cloud console does not support the new APIs. You must create the new resources by using the Google Cloud CLI or the REST APIs. Additionally, there is no automated migration path from the older APIs to the new APIs. To replace an existing deployment, you must create a new Traffic Director deployment that uses the service routing APIs, and then shut down the old deployment.

Use cases and benefits

The new service routing APIs let you configure Traffic Director for both proxyless gRPC and Envoy proxy deployments. The new API model enables several key benefits.

In the following diagram, two services in the service mesh are connected by a Mesh resource. The two HTTPRoute resources configure routing. The mesh or platform admin manages the Mesh resource and the two service owners create the routing configuration for their services.

East-west service-to-service traffic in a service mesh

Role-oriented API design enables clear separation of responsibilities

The new service routing APIs let you separate mesh configuration responsibilities based on organizational roles:

  • Mesh administrators can define the logical mesh as well as the ingress gateway infrastructure.
  • Service owners (application developers) can independently define access patterns for their services. They can also define and apply traffic management policies for their services.

In the following diagram, Cloud Load Balancing and a Gateway resource provide an ingress gateway for traffic entering the mesh from a client that is not in the mesh. The mesh administrator configures and manages the Gateway resource, while the service owners configure and manage their own services and traffic routing.

North-south traffic into the mesh through a gateway

Enhanced reliability with self-serve model

In the older Traffic Director API, the URL map defines routing for service-to-service communication in the mesh, as well as for external traffic entering the mesh through a managed load balancer. Multiple teams might edit a single URL map resource, which presents potential reliability issues and complicates delegating per-service configuration to service owners.

The new APIs introduce per-protocol, per-route resources that can be configured and owned by independent service owners. The new approach has several advantages.

  • Service owners now have autonomy over how they want to configure policies and traffic management for the services they own.
  • Updating one Route resource does not affect other Route resources in the mesh. The update process is also less error-prone because service owners manage much smaller configurations.
  • Each Route resource is owned by the service owner who is responsible for the destination service or hostname.
  • Service owners do not have to depend on mesh administrators to update routing using the centralized URL map resource.

Configure only what's relevant

The new APIs replace forwarding rules, target proxies, and URL maps. You no longer need to allocate virtual IP addresses from your Virtual Private Cloud (VPC) network for service-to-service communication with sidecar proxies and proxyless gRPC.

Enable a service mesh spanning multiple projects in Shared VPC environments

The new API model lets service owners participate in a shared mesh infrastructure by using Shared VPC and other means of connectivity while maintaining independent control over their services. For example, service owners can define Route resources in their own projects. Platform administrators can define a Mesh in a centrally administered host project, and then grant service owners IAM permissions to attach their Route resources to a shared Mesh or Gateway. The following diagram shows an example with Shared VPC.

Cross-project referencing with Shared VPC

The new APIs also support service mesh clients that are connected to different networks by using VPC Network Peering.

Route traffic based on Server Name Indication

The TLSRoute resource lets you route TLS-encrypted traffic based on the Server Name Indication (SNI) in the TLS handshake. You can configure TLS traffic to be routed to the appropriate backend services by configuring the SNI match in the TLSRoute resource. In these deployments, proxies only route traffic and the TLS session is terminated at the destination backend instance.

The TLSRoute resource is supported only with Envoy proxies that are deployed as sidecar proxies or gateways.

TLSRoute resource attached to a Mesh resource

The deployment shown in the following diagram routes any service mesh traffic where the SNI extension has the value service1 to the backend service service1. Additionally, any service mesh traffic where the SNI extension has the value service2 is routed to the backend service service2. The SNI extension value and the backend service name are independent of each other.

TLSRoute resource and Mesh resource
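The deployment described above can be sketched as a TLSRoute specification. The following is a hypothetical example, not a definitive configuration: the route name, project ID, mesh name, and backend service names are all placeholders, and field names follow the service routing APIs as documented at the time of writing.

```yaml
# Hypothetical TLSRoute sketch matching the diagram: traffic whose SNI
# value is service1 goes to one backend service, and traffic whose SNI
# value is service2 goes to another. All names are placeholders.
name: sni-tls-route
meshes:
- projects/PROJECT_ID/locations/global/meshes/sidecar-mesh
rules:
- matches:
  - sniHost:
    - service1
  action:
    destinations:
    - serviceName: projects/PROJECT_ID/locations/global/backendServices/service1
- matches:
  - sniHost:
    - service2
  action:
    destinations:
    - serviceName: projects/PROJECT_ID/locations/global/backendServices/service2
```

Because this is a passthrough configuration, the Envoy proxy reads only the plain-text SNI value from the ClientHello; TLS is terminated at the backend.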

TLSRoute resource attached to a Gateway resource

The deployment shown in the following diagram routes any inbound traffic to the Gateway resource where the SNI extension has the value serviceA to the backend service serviceA. Additionally, any inbound traffic to the Gateway where the SNI extension has the value serviceB is routed to the backend service serviceB. The SNI extension value and the backend service name are independent of each other. The SNI extension value and the host header in HTTP requests are also independent.

The Gateway resource does not terminate the TLS connection at the Gateway's Envoy proxy. Instead, the TLS connection is terminated at the corresponding backend. The Gateway cannot inspect any information that is encrypted in the TLS layer; it sees only the ClientHello, which contains the SNI extension in plain text. The Gateway performs TLS passthrough in this mode. Encrypted ClientHello is not supported.

TLSRoute resource and Gateway resource

First-class gRPC support

You can configure proxyless gRPC clients by using first-class gRPC attributes such as matching by method, instead of translating to equivalent paths and using path matchers.
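As a sketch of method-based matching, the following hypothetical GRPCRoute specification routes calls to one gRPC method to a specific backend service. The route name, hostnames, project ID, and service names are placeholders, not values from this document.

```yaml
# Hypothetical GRPCRoute sketch: match on the fully qualified gRPC
# service and method instead of translating to an equivalent URL path.
# All names and project IDs are placeholders.
name: helloworld-grpc-route
hostnames:
- helloworld
meshes:
- projects/PROJECT_ID/locations/global/meshes/sidecar-mesh
rules:
- matches:
  - method:
      grpcService: helloworld.Greeter
      grpcMethod: SayHello
  action:
    destinations:
    - serviceName: projects/PROJECT_ID/locations/global/backendServices/helloworld-grpc
```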

Traffic splitting for TCP traffic

You can now implement weight-based traffic splitting for TCP traffic across multiple backend services. You can configure patterns such as canary (blue-green) rollouts when you update your service. Traffic splitting also lets you migrate traffic in a controlled manner without downtime.
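A weighted TCP split might look like the following hypothetical TCPRoute specification. The address, port, names, and project ID are placeholders chosen for illustration; field names follow the service routing APIs as documented at the time of writing.

```yaml
# Hypothetical TCPRoute sketch: send 90% of TCP traffic on port 8000
# to the stable version and 10% to the canary version.
# The address, names, and project IDs are placeholders.
name: tcp-canary-route
meshes:
- projects/PROJECT_ID/locations/global/meshes/sidecar-mesh
rules:
- matches:
  - address: 10.0.0.1/32
    port: '8000'
  action:
    destinations:
    - serviceName: projects/PROJECT_ID/locations/global/backendServices/tcp-service-stable
      weight: 90
    - serviceName: projects/PROJECT_ID/locations/global/backendServices/tcp-service-canary
      weight: 10
```

To complete a migration, you would gradually shift the weights until the canary destination receives all of the traffic.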

Traffic interception

When you use the new service routing API Mesh and Gateway resources, all traffic is automatically intercepted. For more information, see Options for Compute Engine VM setup with automatic Envoy deployment.

Architecture and resources

This section describes the new API model and its resources, and helps you to understand how the new API resources work together.

Mesh resource

The Mesh resource represents an instance of a service mesh. You use it to create a logical service mesh in your project. Each Mesh resource must have a unique name in the project. After a Mesh resource is created, its name cannot be modified.
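A minimal Mesh specification can be sketched as follows. This is a hypothetical example: the mesh name is a placeholder, and the field names follow the service routing APIs as documented at the time of writing. A specification like this is typically imported with the Google Cloud CLI.

```yaml
# Hypothetical Mesh specification; sidecar-mesh is a placeholder name.
# Imported with a command along the lines of:
#   gcloud network-services meshes import sidecar-mesh \
#       --source=mesh.yaml --location=global
name: sidecar-mesh
# Optional: the port on which sidecar proxies intercept outbound
# traffic. Defaults to 15001 if unspecified.
interceptionPort: 15001
```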

Mesh API resource with Envoy sidecar and proxyless gRPC deployments

The Mesh resource is referenced in the Route resource to add routes for services that are part of the mesh.

Envoy proxies and proxyless gRPC clients receive configuration from Traffic Director by joining the service mesh identified by the Mesh resource's name. The Mesh name is supported as a bootstrap parameter by automated Envoy deployment on Compute Engine and by the automatic Envoy injector on GKE.

The Mesh resource supports the following data plane deployments:

  • Envoy running alongside the application as sidecar proxies
  • Proxyless gRPC clients
  • Mix of Envoy sidecar and proxyless gRPC clients

Route resource

The Route resource is used to set up routing to services. There are four types of Route resources; each defines the protocol used to route traffic to a backend service.

  • HTTPRoute
  • GRPCRoute
  • TCPRoute
  • TLSRoute

The API does not contain a literal Route resource. The only configurable API resources are HTTPRoute, GRPCRoute, TCPRoute, and TLSRoute.

The Route resource references one or more Mesh and Gateway resources to add the routes that are part of the corresponding Mesh or Gateway configuration. A Route resource can reference both Gateway and Mesh resources.

The Route resource also references one or more backend service resources. The services are configured by using the backend service API with the existing configuration flow. The new APIs do not change how backend services and health checks are defined in the Traffic Director configuration. You create a backend service resource that points to one or more managed instance group (MIG) or network endpoint group (NEG) backends.

The following diagram shows the relationships among the new Mesh, Gateway, and Route resources and the backend service resource and its backends.

Route API resources

You define other traffic management capabilities, such as routing, header modifications, timeouts, and weight-based traffic splitting, in Route resources. For example, in the following diagram, an HTTPRoute resource defines a 70% / 30% traffic split between two backend services.

Weight-based traffic splitting
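The 70% / 30% split described above might be expressed as the following hypothetical HTTPRoute specification. All names, hostnames, and project IDs are placeholders; field names follow the service routing APIs as documented at the time of writing.

```yaml
# Hypothetical HTTPRoute sketch: a 70% / 30% weighted split between
# two backend services. All names and project IDs are placeholders.
name: helloworld-http-route
hostnames:
- helloworld
meshes:
- projects/PROJECT_ID/locations/global/meshes/sidecar-mesh
rules:
- action:
    destinations:
    - serviceName: projects/PROJECT_ID/locations/global/backendServices/helloworld-v1
      weight: 70
    - serviceName: projects/PROJECT_ID/locations/global/backendServices/helloworld-v2
      weight: 30
```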

TLSRoute resource

Use the TLSRoute resource to route TLS traffic to backend services based on SNI hostnames and Application-Layer Protocol Negotiation (ALPN) name. TLSRoute configuration implies TLS passthrough, in which the Envoy proxy does not terminate TLS traffic.

The TLSRoute resource references one or more Mesh and Gateway resources to add the routes that are part of the corresponding Mesh or Gateway configuration.

The TLSRoute resource also references one or more backend service resources. The services are configured using the backend service API resource using the existing configuration flow and APIs.

Gateway resource

The Gateway resource represents Envoy proxies acting as ingress gateways, which allow external clients to connect to the service mesh (north-south traffic). This resource has listening ports along with a scope parameter. The Envoy proxy that acts as an ingress gateway binds to the specified ports and to 0.0.0.0, which represents all of the IP addresses on the local VM. The following diagram shows Envoy proxies deployed as an ingress gateway and configured by the Gateway resource. In this example, the Envoy proxies are configured to listen on port 80 for incoming connections from clients.

The Gateway API resource supports only the Envoy proxy data plane. It does not support proxyless gRPC. GRPCRoute resources are supported in the Gateway resource, but the gRPC traffic is routed by the Envoy proxy acting as a middle proxy.

Service mesh ingress through a Gateway resource
Gateway resource

What are the Gateway scope and merged Gateway configuration?

A Gateway resource instance represents the ports and configuration specific to traffic received on those ports. The Gateway API resource has a parameter, scope, that is used to logically group and merge the configuration of two or more Gateway resources.

For example, if you want the Gateway proxies to listen on ports 80 and 443 to receive HTTP and HTTPS traffic respectively, you create two Gateway resources: one with port 80, for HTTP traffic, and one with port 443, for HTTPS traffic. Give the scope field in each the same value. Traffic Director dynamically merges the configurations of all Gateway resources that have the same scope. On the data plane side, the Envoy proxies that run in ingress gateway mode must also present the same scope parameter to Traffic Director to receive the merged Gateway configuration. You specify the scope when you create the Gateway resource, and you specify the same scope as a bootstrap parameter for the proxies.

Gateway resource merge behavior
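The two-port example above can be sketched as two Gateway specifications that share a scope. This is a hypothetical sketch: the resource names and scope value are placeholders, and field names follow the service routing APIs as documented at the time of writing.

```yaml
# Hypothetical sketch of two Gateway resources that share a scope.
# Traffic Director merges them into one configuration for any Envoy
# ingress gateway bootstrapped with scope my-ingress-scope.
# All names are placeholders.
name: gateway-http
scope: my-ingress-scope
ports:
- 80
type: OPEN_MESH
---
name: gateway-https
scope: my-ingress-scope
ports:
- 443
type: OPEN_MESH
```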

The following are key considerations for the Gateway resource:

  • The Gateway scope parameter is mandatory. Specify the scope in the Gateway resource and in the bootstrap configuration of the Envoy proxies even when only one Gateway exists.
  • Creating a Gateway resource does not deploy a service with an Envoy proxy. Deploying the Envoy proxy is a separate step.
  • The Gateway resource has a type field that represents the type of ingress deployment. This field is reserved for future use. The only supported value is OPEN_MESH, which is the default and cannot be modified.

Mesh deployments with mixed protocols and data planes

You can have a mixed data plane deployment, with Envoy proxy and proxyless gRPC in the same mesh. When you create such deployments, consider the following.

  • Envoy sidecar deployments support all Routes (HTTPRoute, GRPCRoute, TCPRoute, and TLSRoute).
  • Proxyless gRPC deployments only support GRPCRoute.
  • GRPCRoute supports only the features that proxyless gRPC deployments support.

Supported topologies in multi-project Shared VPC environments

Traffic Director supports adding Route resources that are defined in other projects to a Mesh or Gateway resource defined in a centralized admin project. Authorized service owners can directly add their service routing configurations to the Mesh or Gateway.

Cross-project referencing between Mesh and Route resources

In a typical cross-project scenario, you choose a project (host project or centrally controlled admin project) as the mesh admin project where you create a Mesh resource. The mesh admin project owner authorizes Route resources from other projects to reference the Mesh resource, allowing the routing configuration from other projects to be part of the mesh. A mesh data plane, whether Envoy or gRPC, requests configuration from the admin project and receives a union of all of the routes attached to the Mesh. For a Gateway, the routes are also merged across all Gateways that use the same scope.

The Mesh admin project can be any project that you choose, and the configuration works as long as the underlying projects have VPC network connectivity, either through Shared VPC or VPC Network Peering.
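Cross-project attachment works by having the Route resource reference the full resource path of the Mesh in the admin project. The following hypothetical HTTPRoute sketch shows this; the project IDs, hostnames, and resource names are placeholders.

```yaml
# Hypothetical sketch: an HTTPRoute created in a service project that
# attaches to a Mesh in a central admin project by referencing the
# Mesh's full resource path. Project IDs and names are placeholders.
name: payments-route
hostnames:
- payments
meshes:
- projects/ADMIN_PROJECT_ID/locations/global/meshes/shared-mesh
rules:
- action:
    destinations:
    - serviceName: projects/SERVICE_PROJECT_ID/locations/global/backendServices/payments-backend
```

For the attachment to succeed, the service owner needs permission to use the Mesh resource in the admin project, as described in the IAM section.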

IAM permissions and roles

The following are the IAM permissions that are required to securely get, create, update, delete, list, and use the Mesh and Route resources.

  • Mesh admins need to have networkservices.meshes.* permissions.
  • Gateway admins need to have networkservices.gateways.* permissions.
  • Service owners need to have networkservices.grpcRoutes.*, networkservices.httpRoutes.*, or networkservices.tcpRoutes.* permissions.

Mesh admins need to grant the networkservices.meshes.use permission to service owners so that the service owners can attach their Route resources to the Mesh resource. The same model applies to Gateway resources.

To see all IAM permissions for Mesh resources, go to the IAM permissions reference page and search for meshes.

There are no new predefined roles required. The existing, predefined role Compute Network Admin (roles/compute.networkAdmin) has networkservices.* permissions by default. You might need to add the previously described permissions to your custom roles.

Comparison of the new and older API models

This section provides a topic-by-topic comparison between the existing and new Traffic Director API models.

  • API resources
    • Existing APIs: Forwarding rule, target proxy, URL map, and backend service.
    • New APIs: Gateway, Mesh, Route, and backend service.
  • IP addresses and port numbers of services
    • Existing APIs: You must provision IP addresses and port numbers for your services and configure forwarding rules that match the IP:port pairs for all use cases. You must manually map the IP addresses to hostnames, or use the catch-all IP address 0.0.0.0.
    • New APIs: You do not need to configure IP addresses for Mesh or Gateway use cases. Gateway does require configuring port numbers.
  • Service mesh scope
    • Existing APIs: Traffic Director programs all proxies attached to the VPC network, so the mesh scope is the VPC network.
    • New APIs: Traffic Director does not program proxies based on the VPC network. For east-west service-to-service communication, Envoy and proxyless gRPC clients use the name of the Mesh resource. For north-south ingress gateway use cases, the scope parameter in the Gateway API allows multiple Gateways to be grouped together with merged configuration.
  • Cross-project referencing in Shared VPC environments
    • Existing APIs: Cross-project referencing is not supported. All API resources must be configured in the same project.
    • New APIs: You can create Mesh or Gateway resources in a centrally managed project (host project), and service owners can create Route resources in service projects in a Shared VPC environment. The Route resources can refer to a Mesh or Gateway located in another project.
  • Interception port
    • Existing APIs: The TRAFFICDIRECTOR_INTERCEPTION_PORT bootstrap parameter must be specified for every Envoy that connects to Traffic Director. With automatic Envoy deployment on Compute Engine and automatic sidecar injection on GKE, this value defaults to 15001.
    • New APIs: The interception port is configured in the Mesh resource and automatically applies to all Envoys that request configuration for that Mesh. The value continues to default to 15001 if unspecified.

Bootstrapping Envoy and gRPC clients on Compute Engine and GKE

  • Using automatic Envoy deployment on Compute Engine
    • Existing APIs: When you create the VM template, you specify a command-line parameter, --service-proxy=enabled, that dynamically bootstraps the Envoy proxy with the required attributes.
    • New APIs: When you create the VM template, you specify additional parameters, for example --service-proxy=enabled,mesh=[MESH_NAME] (for a Mesh) or --service-proxy=enabled,scope=[SCOPE_NAME] (for Gateways). Other required attributes are dynamically bootstrapped. For Envoys serving as a Gateway, make sure that serving_ports is not specified in the --service-proxy command-line argument. For more information, see Options for Compute Engine VM setup with automatic Envoy deployment.
  • Using automatic sidecar injection on GKE
    • Existing APIs: You specify the required bootstrap attributes in the configMap of the sidecar injector.
    • New APIs: Same workflow, with the new attributes specified in the configMap.
  • Using manual sidecar injection on GKE
    • Existing APIs: As explained here, the application Pod needs to have an Envoy sidecar container bootstrapped with the required attributes.
    • New APIs: Same workflow, with the new attributes.
  • Using Compute Engine or GKE to deploy gRPC clients
    • Existing APIs: The client application must be bootstrapped with the required attributes.
    • New APIs: Same workflow, with the new attributes.

Configuring mesh and gateway security use cases

  • Service-to-service mTLS in GKE
    • Existing APIs: Follow the instructions here for Envoy sidecar-based deployments. Follow the instructions here for proxyless gRPC-based deployments.
    • New APIs: The same instructions apply. Client TLS policy and server TLS policy must be applied to the backend service and endpoint policy resources, respectively. Because both of these APIs are orthogonal to the new APIs, the configuration flow remains the same as before.
  • Securing middle-proxy (ingress or egress gateway) deployments
    • Existing APIs: Follow the instructions here. The server TLS policy and authorization policy resources are attached to the target HTTPS proxy resource.
    • New APIs: You attach the server TLS policy and the authorization policy resources to the Gateway resource.

Considerations and limitations

  • The Google Cloud console does not support the new APIs in this release.
  • Use the xDS API version 3 or later.
    • Minimum Envoy version of 1.20.0 (since the new APIs are supported only on xDS version 3)
    • Minimum gRPC bootstrap generator version of v0.14.0
  • The TLSRoute resource is supported only with Envoy proxies that are deployed as sidecar proxies or gateways.
  • Only Compute Engine VMs with automatic Envoy deployment and GKE Pods with automatic Envoy injection are supported. You cannot use manual deployment with the new APIs.
  • Terraform is not supported in this release.
  • The new APIs are not backward compatible with the existing APIs.
  • When a TCPRoute resource is attached to a Mesh resource, the port used to match TCP traffic cannot be used to serve anything except the traffic described by that TCPRoute.
    • For example, your deployment might include a TCPRoute resource that matches port 8000 and an HTTPRoute resource. When both are attached to the same Mesh resource, traffic routed by the HTTPRoute resource cannot use port 8000, even when the underlying IP addresses are different. This limitation comes from the Envoy proxy implementation, which gives precedence to the port-matched route.
  • Control plane telemetry for the new resources is not supported.
  • The Gateway resource does not provision a managed load balancer and it does not dynamically create an Envoy service.
  • Automatically deployed Envoys serving as ingress gateways must not include the serving_ports argument in the --service-proxy flag.
  • Automatic Envoy deployment does not support providing a project number different from the project of the VM.

What's next