Traffic Director for multi-environment deployments

Traffic Director supports environments that extend beyond Google Cloud, including on-premises data centers and other public clouds that are reachable using hybrid connectivity. You configure Traffic Director so that your service mesh can send traffic to endpoints that are outside of Google Cloud. These endpoints can be on-premises load balancers, server applications on a virtual machine in another cloud, or any other destination that is reachable using hybrid connectivity and can be represented by an IP address and a port. You just add each endpoint's IP address and port to a hybrid connectivity network endpoint group (NEG).

Traffic Director's support for on-premises and multi-cloud services allows you to:

  • Route traffic globally, including to the endpoints of on-premises and multi-cloud services
  • Bring the benefits of Traffic Director and service mesh — including capabilities such as service discovery and advanced traffic management — to services running on your existing infrastructure outside of Google Cloud
  • Combine Traffic Director capabilities with Cloud Load Balancing to bring Google Cloud networking services to multi-environment deployments

Use cases

Traffic Director can configure networking between VM- and container-based services in multiple environments, including:

  • Google Cloud
  • On-premises data centers
  • Other public clouds

Route mesh traffic to an on-premises location or another cloud

The simplest use case for this feature is traffic routing. Your application is running a Traffic Director client (either Envoy proxy or proxyless gRPC). Traffic Director tells the client about your services and each service's endpoints.

Figure: Routing mesh traffic to an on-premises location or another cloud

In the preceding diagram, when your application sends a request to the on-premises service, the Traffic Director client inspects the outbound request and sets its destination to an endpoint associated with the on-premises service (in this case, 10.2.0.1). The request then travels over Cloud VPN or Cloud Interconnect to its intended destination.

If you need to add more endpoints, you just add them to your service by updating Traffic Director. You don't need to make any changes to your application code.
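
For example, if the on-premises service's endpoints are registered in a hybrid connectivity NEG, adding an endpoint is a single update to that NEG. The following sketch uses gcloud; the NEG name (on-prem-neg), zone, new IP address, and port are placeholders for illustration, not values defined on this page.

    # Register an additional on-premises endpoint with the hybrid
    # connectivity NEG. The NEG name, zone, IP address, and port are
    # placeholders.
    gcloud compute network-endpoint-groups update on-prem-neg \
        --zone=europe-west3-a \
        --add-endpoint="ip=10.2.0.2,port=80"

Traffic Director then distributes the updated endpoint list to your Traffic Director clients.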

Migrate an existing on-premises service to Google Cloud

Because Traffic Director can send traffic to endpoints outside of Google Cloud, you can route traffic between environments. You can combine this capability with advanced traffic management to support use cases such as migrating services between environments.

Figure: Migrating from an on-premises location to Google Cloud

The preceding diagram extends the previous pattern. Instead of configuring Traffic Director to send all traffic to the on-premises service, you configure it to split traffic across two services by using weight-based traffic splitting.

Traffic splitting lets you start by sending 0% of traffic to the cloud service and 100% to the on-premises service. You can then gradually increase the proportion of traffic sent to the cloud service until it receives 100% of the traffic, at which point you can retire the on-premises service.
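
One way to express such a split is with weighted backend services in a URL map that you import by using gcloud. The following sketch is illustrative only: the URL map and backend service names, the hostname, the 90/10 weights, and PROJECT_ID are all assumptions.

    # Sketch of a URL map with weight-based traffic splitting. All names,
    # the hostname, and the weights are placeholders.
    name: td-url-map
    defaultService: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/global/backendServices/on-prem-service
    hostRules:
    - hosts:
      - example-service.internal
      pathMatcher: traffic-split
    pathMatchers:
    - name: traffic-split
      defaultRouteAction:
        weightedBackendServices:
        - backendService: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/global/backendServices/on-prem-service
          weight: 90
        - backendService: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/global/backendServices/cloud-service
          weight: 10

Saved as td-url-map.yaml, this configuration could be applied with gcloud compute url-maps import td-url-map --source=td-url-map.yaml --global, and the weights adjusted over time until cloud-service receives all of the traffic.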

Google Cloud network edge services for on-premises and multi-cloud deployments

Finally, you can combine this functionality with Google Cloud's existing networking solutions. Google Cloud offers a wide range of network services, such as global external load balancing with Google Cloud Armor for DDoS protection, that you can now use in conjunction with Traffic Director to bring new capabilities to your on-premises or multi-cloud services. Best of all, you don't need to expose these on-premises or multi-cloud services to the public internet.

Figure: Deployments spanning multiple environments

In the preceding diagram, traffic from clients on the public internet enters Google Cloud's network through a Google Cloud load balancer, such as the global external HTTP(S) load balancer. When traffic reaches the load balancer, you can apply network edge services such as Google Cloud Armor DDoS protection or Identity-Aware Proxy user authentication. For more information, see Network edge services for multi-environment deployments.

After you've applied these services, the traffic makes a brief stop in Google Cloud, where an application or standalone proxy (configured by Traffic Director) forwards the traffic across Cloud VPN or Cloud Interconnect to your on-premises service.

Architecture and resources

This section provides background information on the Google Cloud resources used to provide a Traffic Director-managed service mesh for on-premises and multi-cloud environments.

Google Cloud resources

The following diagram depicts the Google Cloud resources that enable on-premises and multi-cloud services support for Traffic Director. The key resource is the NEG and its network endpoints. The other resources are those that you configure as part of a standard Traffic Director setup.

Figure: Compute Engine resources for on-premises and multi-cloud services

For simplicity, options such as multiple global backend services are not shown in the diagram.

When you configure Traffic Director, you create services using the global backend services API resource. A service is just a logical construct that combines:

  1. Policies to apply when a client tries to send traffic to the service
  2. One or more backends or endpoints that handle the traffic that is destined for the service

On-premises and multi-cloud services are just like any other service configured with Traffic Director. The key difference is that you configure the endpoints of these services using a hybrid connectivity NEG. These are NEGs that have the network endpoint type set to non-gcp-private-ip-port. The endpoints that you add to hybrid connectivity NEGs must be valid IP:port combinations that are reachable by your clients (for example, through hybrid connectivity such as Cloud VPN or Cloud Interconnect).
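
As a minimal sketch of that wiring, the following gcloud commands create a global backend service for Traffic Director and attach a hybrid connectivity NEG to it. The resource names, zone, protocol, and rate limit are assumptions; creating the NEG itself and health checking are covered later on this page.

    # Create a health check and a global backend service for Traffic
    # Director (names, protocol, and other values are placeholders).
    gcloud compute health-checks create http on-prem-health-check

    gcloud compute backend-services create on-prem-service \
        --global \
        --load-balancing-scheme=INTERNAL_SELF_MANAGED \
        --protocol=HTTP \
        --health-checks=on-prem-health-check

    # Attach the hybrid connectivity NEG that holds the on-premises or
    # multi-cloud endpoints.
    gcloud compute backend-services add-backend on-prem-service \
        --global \
        --network-endpoint-group=on-prem-neg \
        --network-endpoint-group-zone=europe-west3-a \
        --balancing-mode=RATE \
        --max-rate-per-endpoint=100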

Each NEG has a network endpoint type and can contain only network endpoints of that type. The network endpoint type determines:

  • The destination to which your services can send traffic
  • Health checking behavior

When you create your NEG, configure it as follows so that you can send traffic to an on-premises or multi-cloud destination (a gcloud sketch follows this list):

  • Set the network endpoint type to non-gcp-private-ip-port. This type represents a reachable IP address and port. If the IP address is on-premises or at another cloud provider, it must be reachable from Google Cloud through hybrid connectivity, such as the connectivity provided by Cloud VPN or Cloud Interconnect.
  • Specify a Google Cloud zone that minimizes the geographic distance between Google Cloud and your on-premises or multi-cloud environment. For example, if you are hosting a service in an on-premises environment in Frankfurt, Germany, you might specify the europe-west3-a Google Cloud zone when you create the NEG.
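
A gcloud sketch of this NEG creation follows; the NEG name and network are placeholders, and the zone matches the Frankfurt example above.

    # Create a hybrid connectivity NEG for endpoints that are reachable
    # over Cloud VPN or Cloud Interconnect. The name and network are
    # placeholders.
    gcloud compute network-endpoint-groups create on-prem-neg \
        --network-endpoint-type=non-gcp-private-ip-port \
        --zone=europe-west3-a \
        --network=default

You then add the on-premises or multi-cloud endpoints to this NEG by using the --add-endpoint flag, as shown earlier on this page.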

Health checking behavior for network endpoints of this type differs from the behavior for other types of network endpoints. While other network endpoint types use Google Cloud's centralized health checking system, non-gcp-private-ip-port network endpoints use Envoy's distributed health checking mechanism. For more details, see Limitations and other considerations.

Connectivity and networking considerations

  • Your Traffic Director clients, such as Envoy proxies and proxyless gRPC libraries, must be able to connect to Traffic Director at trafficdirector.googleapis.com:443. If you lose connectivity to the Traffic Director control plane:
    • Existing Traffic Director clients cannot receive configuration updates from Traffic Director. They continue to operate based on their current configuration.
    • New Traffic Director clients cannot connect to Traffic Director. They cannot use the service mesh until connectivity is re-established.
  • If you want to send traffic between Google Cloud and on-premises or multi-cloud environments, the environments must be connected through hybrid connectivity. We recommend a high availability connection enabled by Cloud Interconnect or Cloud VPN.
  • On-premises, other cloud, and Google Cloud subnet IP addresses and IP address ranges must not overlap.

Limitations and other considerations

Setting proxyBind

You can set the value of proxyBind only when you create a targetHttpProxy; you can't update the value on an existing targetHttpProxy.
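
For example, with gcloud you set proxyBind at creation time by including the --proxy-bind flag. The proxy and URL map names below are placeholders.

    # proxyBind can be set only when the target proxy is created.
    gcloud compute target-http-proxies create td-middle-proxy \
        --url-map=td-url-map \
        --proxy-bind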

Connectivity and disruption to connectivity

For details on connectivity requirements and limitations, see Connectivity and networking considerations.

Mixed backend types

A backend service can have VM (instance group) backends or NEG backends. If a backend service has NEG backends, all of its NEGs must contain the same network endpoint type; you cannot attach NEGs with different endpoint types to a single backend service.

Note that a URL map can have host rules that resolve to different backend services. For example, you might have one backend service with only hybrid connectivity NEGs (containing on-premises endpoints) and another with standalone NEGs (containing GKE endpoints). The URL map can contain rules, such as weight-based traffic splitting, that split traffic across these backend services.
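
As a sketch, the following gcloud commands add host rules that send one hostname to the backend service with hybrid connectivity NEG backends and another hostname to the backend service with GKE endpoints. The URL map, backend service, and host names are assumptions.

    # Route legacy.example.internal to the on-premises backend service
    # and new.example.internal to the GKE-backed backend service.
    gcloud compute url-maps add-path-matcher td-url-map \
        --global \
        --path-matcher-name=on-prem-matcher \
        --default-service=on-prem-service \
        --new-hosts=legacy.example.internal

    gcloud compute url-maps add-path-matcher td-url-map \
        --global \
        --path-matcher-name=gke-matcher \
        --default-service=gke-service \
        --new-hosts=new.example.internal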

Using a NEG with endpoints of type non-gcp-private-ip-port with Google Cloud backends

It is possible to create a backend service with a hybrid connectivity NEG whose endpoints are in Google Cloud, but we don't recommend this pattern, because those endpoints don't benefit from centralized health checking. For an explanation of centralized and distributed health checking, see Health checking.

Endpoint registration

If you want to add an endpoint to a NEG, you must update the NEG. You can do this manually, or you can automate it by using the Google Cloud network endpoint group REST APIs or the gcloud command-line tool. For example, when a new instance of a service starts, you can use the Google Cloud APIs to register the instance with the NEG that you've configured. When you use Compute Engine MIGs or GKE (in Google Cloud), endpoint registration is handled automatically by the MIG or NEG controller, respectively.
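
For example, a startup or shutdown hook for a service instance outside Google Cloud might register and deregister that instance's endpoint with gcloud. The NEG name, zone, IP address, and port are placeholders.

    # Register the instance's endpoint when it starts serving.
    gcloud compute network-endpoint-groups update on-prem-neg \
        --zone=europe-west3-a \
        --add-endpoint="ip=10.2.0.3,port=80"

    # Deregister the endpoint when the instance shuts down.
    gcloud compute network-endpoint-groups update on-prem-neg \
        --zone=europe-west3-a \
        --remove-endpoint="ip=10.2.0.3,port=80"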

Health checking

Health checking behavior differs from the standard centralized health checking behavior when you use hybrid connectivity NEGs:

  • For network endpoints of type gce-vm-ip-port, Traffic Director receives endpoint health information from Google Cloud's centralized health checking system. Traffic Director provides this information to your Traffic Director clients, eliminating the need for potentially costly data plane-based health checking.
  • For network endpoints of type non-gcp-private-ip-port, Traffic Director configures its clients to handle health checking using the data plane. Envoy instances perform their own health checks and use their own mechanisms to avoid sending requests to unhealthy backends.
  • Because your data plane handles health checking, you cannot retrieve health check status by using the Google Cloud Console, the API, or gcloud.

In practice, using non-gcp-private-ip-port means:

  • Only HTTP and TCP health checks are supported (a configuration sketch follows this list).
  • Because Traffic Director clients each handle health checking in a distributed fashion, you may see an increase in network traffic because of health checking. The increase depends on the number of Traffic Director clients as well as the number of endpoints that each client needs to health check. For example:
    • When you add another endpoint to a hybrid connectivity NEG, existing Traffic Director clients might begin to health check the endpoints in hybrid connectivity NEGs.
    • When you add another instance to your service mesh (for example, a virtual machine instance that runs your application code as well as a Traffic Director client), the new instance might begin to health check the endpoints in hybrid connectivity NEGs.
    • Because each Traffic Director client health checks each endpoint, network traffic from health checking grows at a quadratic (O(n^2)) rate as the mesh grows.
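
For example, a TCP health check that Traffic Director clients run against the on-premises endpoints might be configured as follows; the health check name, port, and backend service name are assumptions.

    # Only HTTP and TCP health checks are supported for
    # non-gcp-private-ip-port endpoints.
    gcloud compute health-checks create tcp on-prem-tcp-check \
        --port=80

    # Attach the health check to the backend service. Traffic Director
    # programs each Envoy client to run this check itself.
    gcloud compute backend-services update on-prem-service \
        --global \
        --health-checks=on-prem-tcp-check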

Virtual Private Cloud network

A service mesh is uniquely identified by its VPC network name. Traffic Director clients receive configuration from Traffic Director based on the VPC network specified in the bootstrap configuration. Consequently, even if your mesh is entirely outside of a Google Cloud data center, you must supply a valid VPC network name in your bootstrap configuration.

Service account

Within Google Cloud, the default Envoy bootstrap is configured to read service account information from the Compute Engine or GKE environment in which it runs. When you run a Traffic Director client outside of Google Cloud, you must explicitly specify a service account, network name, and project number in your Envoy bootstrap. This service account must have sufficient permissions to access the Traffic Director API.
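
As a sketch, you might create such a service account and grant it the Traffic Director Client role (roles/trafficdirector.client), which is the role typically used for access to the Traffic Director API. The account name and PROJECT_ID are placeholders.

    # Create a service account for Traffic Director clients that run
    # outside Google Cloud (account name and project are placeholders).
    gcloud iam service-accounts create td-client --project=PROJECT_ID

    # Grant the Traffic Director Client role so that the account can
    # fetch configuration from trafficdirector.googleapis.com.
    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="serviceAccount:td-client@PROJECT_ID.iam.gserviceaccount.com" \
        --role="roles/trafficdirector.client"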

What's next

For instructions about configuring Traffic Director for on-premises and multi-cloud deployments, see Network edge services for multi-environment (on-prem, multi-cloud) deployments.