Set up external backends with internet network endpoint groups
This document provides instructions for configuring external backends for Cloud Service Mesh by using internet network endpoint groups (NEGs), which require a fully qualified domain name (FQDN). This document is intended for users who have an intermediate to advanced level of familiarity with Cloud Service Mesh, Envoy proxies, and Google Cloud networking.
This setup guide provides you with basic instructions for the following:
- Configuring Cloud Service Mesh to use an internet NEG and unauthenticated TLS for outbound traffic
- Routing traffic to a Cloud Run service from your service mesh
Before you begin
Review the Cloud Service Mesh with internet network endpoint groups overview.
For the purposes of this guide, the example configurations assume the following:
- All relevant Compute Engine resources, such as middle proxies, Cloud Service Mesh resources, Cloud DNS zones, and hybrid connectivity, are attached to the default Virtual Private Cloud (VPC) network.
- The service `example.com:443` is running in your on-premises infrastructure. The domain `example.com` is served by three endpoints: `10.0.0.100`, `10.0.0.101`, and `10.0.0.102`. Routes exist that ensure connectivity from the Envoy proxies to these remote endpoints.
The resulting deployment is similar to the following.
Traffic routing with an internet NEG and TLS with SNI
After you configure Cloud Service Mesh with an internet NEG by using the FQDN and TLS for outbound traffic, the example deployment behaves as illustrated in the following diagram and description of the traffic.
The steps in the following legend correspond to the numbering in the previous diagram.
| Step | Description |
|---|---|
| 0 | Envoy receives the FQDN backend configuration from Cloud Service Mesh through xDS. |
| 0 | Envoy, running on the VM, continuously queries DNS for the configured FQDN. |
| 1 | The user application initiates a request. |
| 2 | The request is destined for the service `example.com` on port `443`. |
| 3 | The Envoy proxy intercepts the request. The example assumes that you are using `0.0.0.0` as the forwarding rule virtual IP address (VIP). When `0.0.0.0` is the VIP, Envoy intercepts all requests. Request routing is based only on Layer 7 parameters, regardless of the destination IP address in the original request generated by the application. |
| 4 | Envoy selects a healthy remote endpoint and performs a TLS handshake with the SNI obtained from the client TLS policy. |
| 5 | Envoy proxies the request to the remote endpoint. |
Although it isn't shown in the diagram, if health checks are configured, Envoy continuously health checks the remote endpoints and routes requests only to healthy endpoints.
Set up hybrid connectivity
This document also assumes that hybrid connectivity is already established:
- Hybrid connectivity between the VPC network and on-premises services or a third-party public cloud is established with Cloud VPN or Cloud Interconnect.
- VPC firewall rules and routes are correctly configured to establish bi-directional reachability from Envoy to private service endpoints and, optionally, to an on-premises DNS server.
- For a successful regional HA failover scenario, global dynamic routing is enabled. For more details, see dynamic routing mode.
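The following commands are a minimal sketch of these connectivity prerequisites, not a definitive setup. The firewall rule name `allow-envoy-to-onprem` and the destination range `10.0.0.0/24` are assumptions; replace them with values from your environment:

```
# Enable global dynamic routing on the default VPC network so that routes
# learned in one region are usable in all regions (needed for regional
# HA failover).
gcloud compute networks update default \
    --bgp-routing-mode=global

# Hypothetical egress rule that lets the Envoy proxies reach the on-premises
# endpoints on port 443. Adjust the destination range to your on-premises CIDR.
gcloud compute firewall-rules create allow-envoy-to-onprem \
    --network=default \
    --direction=EGRESS \
    --destination-ranges=10.0.0.0/24 \
    --allow=tcp:443
```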
Set up Cloud DNS configuration
Use the following commands to set up a Cloud DNS private zone for the domain (FQDN) `example.com` that has `A` records pointing to the endpoints `10.0.0.100`, `10.0.0.101`, `10.0.0.102`, and `10.0.0.103`.
gcloud
- Create a DNS managed private zone and attach it to the default network:
```
gcloud dns managed-zones create example-zone \
    --description=test \
    --dns-name=example.com \
    --networks=default \
    --visibility=private
```
- Add DNS records to the private zone:
```
gcloud dns record-sets transaction start \
    --zone=example-zone

gcloud dns record-sets transaction add 10.0.0.100 10.0.0.101 10.0.0.102 10.0.0.103 \
    --name=example.com \
    --ttl=300 \
    --type=A \
    --zone=example-zone

gcloud dns record-sets transaction execute \
    --zone=example-zone
```
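As a quick, optional check, you can list the records in the zone and, from a VM attached to the `default` network, confirm that the private zone answers the query:

```
# List the record sets in the private zone:
gcloud dns record-sets list --zone=example-zone

# From a VM on the default network, verify that the FQDN resolves to the
# configured endpoints (private zones are visible only from attached networks):
dig +short example.com
```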
Configure Cloud Service Mesh with an internet FQDN NEG
In this section, you configure Cloud Service Mesh with an internet FQDN NEG.
Create the NEG, health check, and backend service
gcloud
- Create the internet NEG:
```
gcloud compute network-endpoint-groups create on-prem-service-a-neg \
    --global \
    --network-endpoint-type INTERNET_FQDN_PORT
```
- Add the `FQDN:Port` endpoint to the internet NEG:
```
gcloud compute network-endpoint-groups update on-prem-service-a-neg \
    --global \
    --add-endpoint=fqdn=example.com,port=443
```
- Create a global health check:
```
gcloud compute health-checks create http service-a-http-health-check \
    --global
```
- Create a global backend service called `on-prem-service-a` and associate the health check with it:
```
gcloud compute backend-services create on-prem-service-a \
    --global \
    --load-balancing-scheme=INTERNAL_SELF_MANAGED \
    --health-checks service-a-http-health-check
```
- Add the NEG called `on-prem-service-a-neg` as the backend of the backend service:
```
gcloud compute backend-services add-backend on-prem-service-a \
    --global \
    --global-network-endpoint-group \
    --network-endpoint-group on-prem-service-a-neg
```
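Once DNS and hybrid connectivity are in place, you can confirm that the resolved endpoints pass their health checks. A minimal check:

```
# Show the health status of the endpoints behind the backend service.
# Each resolved IP address should report HEALTHY before you route traffic.
gcloud compute backend-services get-health on-prem-service-a \
    --global
```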
Create a routing rule map
The URL map, target HTTP proxy, and forwarding rule constitute a routing rule map, which provides routing information for traffic in your mesh.
This URL map contains a rule that routes all HTTP traffic to `on-prem-service-a`.
gcloud
- Create the URL map:
```
gcloud compute url-maps create td-url-map \
    --default-service on-prem-service-a
```
- Create the target HTTP proxy and associate the URL map with the target proxy:
```
gcloud compute target-http-proxies create td-proxy \
    --url-map td-url-map
```
- Create the global forwarding rule by using the IP address `0.0.0.0`. This is a special IP address that causes your data plane to ignore the destination IP address and route requests based only on the request's HTTP parameters.
```
gcloud compute forwarding-rules create td-forwarding-rule \
    --global \
    --load-balancing-scheme=INTERNAL_SELF_MANAGED \
    --address=0.0.0.0 \
    --target-http-proxy=td-proxy \
    --ports=443 \
    --network=default
```
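To see the `0.0.0.0` VIP behavior in action, you can send a test request from a VM whose outbound traffic is intercepted by an Envoy sidecar. This sketch assumes such a VM exists; it uses the TEST-NET address `192.0.2.1` purely to demonstrate that the destination IP address is ignored:

```
# The destination IP address is arbitrary; Envoy matches the 0.0.0.0 VIP and
# routes the request based on the Host header and port alone.
curl -v -H "Host: example.com" http://192.0.2.1:443/
```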
Configure unauthenticated TLS and HTTPS
Optionally, if you want to configure unauthenticated HTTPS between your Envoy proxies and your on-premises services, use these instructions. These instructions also demonstrate how to configure SNI in the TLS handshake.
A client TLS policy specifies the client identity and authentication mechanism when a client sends outbound requests to a particular service. A client TLS policy is attached to a backend service resource by using the `securitySettings` field.
gcloud
- Create and import the client TLS policy; set the SNI to the FQDN that you configured in the NEG:
```
cat << EOF > client_unauthenticated_tls_policy.yaml
name: "client_unauthenticated_tls_policy"
sni: "example.com"
EOF

gcloud beta network-security client-tls-policies import client_unauthenticated_tls_policy \
    --source=client_unauthenticated_tls_policy.yaml \
    --location=global
```
- If you configured an `HTTP` health check with the backend service in the previous section, detach the health check from the backend service:
```
gcloud compute backend-services update on-prem-service-a \
    --global \
    --no-health-checks
```
- Create an `HTTPS` health check:
```
gcloud compute health-checks create https service-a-https-health-check \
    --global
```
- Attach the client TLS policy to the backend service that you created previously; this enforces unauthenticated HTTPS on all outbound requests from the client to this backend service:
```
gcloud compute backend-services export on-prem-service-a \
    --global \
    --destination=on-prem-service-a.yaml

cat << EOF >> on-prem-service-a.yaml
securitySettings:
  clientTlsPolicy: projects/${PROJECT_ID}/locations/global/clientTlsPolicies/client_unauthenticated_tls_policy
healthChecks:
- projects/${PROJECT_ID}/global/healthChecks/service-a-https-health-check
EOF

gcloud compute backend-services import on-prem-service-a \
    --global \
    --source=on-prem-service-a.yaml
```
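You can confirm that the policy took effect by describing the backend service; the output should include a `securitySettings` block that references the client TLS policy:

```
# Verify that the client TLS policy is attached to the backend service:
gcloud compute backend-services describe on-prem-service-a \
    --global
```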
You can use internet FQDN NEGs to route traffic to any service that is reachable through an FQDN, such as DNS-resolvable external and internal services or Cloud Run services.
Migrate from an `IP:Port` NEG to an `FQDN:Port` NEG
A `NON_GCP_PRIVATE_IP_PORT` NEG requires you to program service endpoints into the NEG as static `IP:Port` pairs, whereas an `INTERNET_FQDN_PORT` NEG lets the endpoints be resolved dynamically by using DNS. You can migrate to the internet NEG by setting up DNS records for your on-premises service endpoints and configuring Cloud Service Mesh as described in the following steps:
1. Set up DNS records for your FQDN.
2. Create a new internet NEG with the FQDN.
3. Create a new backend service with the internet NEG that you created in step 2 as its backend. Associate the same health check that you used with the hybrid connectivity NEG backend service with the new backend service. Verify that the new remote endpoints are healthy.
4. Update your routing rule map to reference the new backend service by replacing the old backend that includes the hybrid connectivity NEG.
5. If you want zero downtime during live migration in a production deployment, you can use weight-based traffic splitting. Initially, configure your new backend service to receive only a small percentage of traffic, for example, 5%. Use the instructions for setting up traffic splitting, as sketched after this list.
6. Verify that the new remote endpoints are serving traffic correctly.
7. If you are using weight-based traffic splitting, configure the new backend service to receive 100% of traffic. This step drains the old backend service.
8. After you verify that the new backends are serving traffic without any issues, delete the old backend service.
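The following URL map is a hypothetical sketch of step 5, not a definitive configuration: `on-prem-service-a-hybrid` stands in for your old hybrid NEG backend service, and `PROJECT_ID` is a placeholder. It initially sends 5% of traffic to the new FQDN backend service:

```
# td-url-map.yaml: weighted split between the old and new backend services.
name: td-url-map
defaultService: projects/PROJECT_ID/global/backendServices/on-prem-service-a-hybrid
hostRules:
- hosts:
  - '*'
  pathMatcher: matcher1
pathMatchers:
- name: matcher1
  defaultRouteAction:
    weightedBackendServices:
    # Old backend service with the hybrid connectivity NEG (hypothetical name):
    - backendService: projects/PROJECT_ID/global/backendServices/on-prem-service-a-hybrid
      weight: 95
    # New backend service with the internet FQDN NEG:
    - backendService: projects/PROJECT_ID/global/backendServices/on-prem-service-a
      weight: 5
```

You would import this with `gcloud compute url-maps import td-url-map --source=td-url-map.yaml --global`, then raise the weight on the new backend service as you gain confidence.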
Troubleshooting
To resolve deployment issues, use the instructions in this section. If your issues are not resolved with this information, see Troubleshooting deployments that use Envoy.
An on-premises endpoint is not receiving traffic
If an endpoint is not receiving traffic, make sure that it is passing health checks, and that DNS queries from the Envoy client return its IP address consistently.
Envoy uses `strict_dns` mode to manage connections. It load balances traffic across all resolved endpoints that are healthy. The order in which endpoints are resolved does not matter in `strict_dns` mode, but Envoy drains traffic to any endpoint that is no longer present in the list of returned IP addresses.
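As a quick diagnostic, you can query DNS repeatedly from the VM that runs Envoy and confirm that every answer contains the full endpoint set:

```
# Each query should return the same, complete set of endpoint IP addresses;
# a missing address causes Envoy to drain traffic to that endpoint.
for i in 1 2 3; do
  dig +short example.com
  echo ---
done
```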
HTTP Host header does not match the FQDN when the request reaches my on-premises server
Consider an example in which the domain `example.com` resolves to `10.0.0.1`, which is the forwarding rule's IP address, and the domain `altostrat.com` resolves to `10.0.0.100`, which is your on-premises service endpoint. You want to send traffic to the domain `altostrat.com`, which is configured in your NEG.

It's possible that the application in Compute Engine or GKE sets the HTTP `Host` header to `example.com` (`Host: example.com`), which gets carried forward to the on-premises endpoint. If you are using HTTPS, Envoy sets the SNI to `altostrat.com` during the TLS handshake. Envoy obtains the SNI from the client TLS policy resource.
If this conflict causes issues in processing or routing the request when it reaches the on-premises endpoint, you can work around it by rewriting the `Host` header to `altostrat.com` (`Host: altostrat.com`). You can do this either in Cloud Service Mesh by using URL rewrite, as sketched in the following example, or on the remote endpoint if it has header rewrite capability.
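The following URL map fragment is a hypothetical sketch of the URL rewrite approach; `PROJECT_ID` is a placeholder:

```
# In the URL map, rewrite the Host header before Envoy forwards the request.
pathMatchers:
- name: matcher1
  defaultService: projects/PROJECT_ID/global/backendServices/on-prem-service-a
  routeRules:
  - priority: 1
    matchRules:
    - prefixMatch: /
    service: projects/PROJECT_ID/global/backendServices/on-prem-service-a
    routeAction:
      urlRewrite:
        hostRewrite: altostrat.com
```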
Another, less complex workaround is to set the `Host` header to `altostrat.com` (`Host: altostrat.com`) and use the special address `0.0.0.0` as the forwarding rule's IP address.
Envoy returns many 5xx errors
If Envoy returns an excessive number of 5xx errors, do the following:
- Check the Envoy logs to determine whether the response is coming from the on-premises backend and what the reason for the 5xx error is.
- Make sure that DNS queries are successful and that there are no `SERVFAIL` or `NXDOMAIN` errors.
- Make sure that all the remote endpoints are passing health checks.
- If health checks are not configured, make sure that all endpoints are reachable from Envoy. Check your firewall rules and routes on the Google Cloud side as well as on the on-premises side.
Cannot reach external services over the public internet from the service mesh
You can send traffic to services located on the public internet by using FQDN backends in Cloud Service Mesh. You must first establish internet connectivity between Envoy clients and the external service. If you are getting a `502` error during connections to the external service, do the following:
- Make sure that you have the correct routes, specifically the default route `0.0.0.0/0`, and firewall rules configured.
- Make sure that DNS queries are successful and that there are no `SERVFAIL` or `NXDOMAIN` errors.
- If the Envoy proxy is running on a Compute Engine VM that doesn't have an external IP address or in a private GKE cluster, you need to configure Cloud NAT or another means to establish outbound internet connectivity, as sketched after this list.
If the errors persist, or if you are getting other 5xx errors, check the Envoy logs to narrow down the source of the errors.
Cannot reach Serverless services from the service mesh
You can send traffic to serverless services (Cloud Run, Cloud Run functions, and App Engine) by using FQDN backends in Cloud Service Mesh. If the Envoy proxy is running on a Compute Engine VM that doesn't have an external IP address or in a private GKE cluster, you need to configure Private Google Access on the subnet to be able to access Google APIs and services, as shown in the following sketch.
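A hedged example of enabling Private Google Access, assuming the subnet `default` in the region `us-central1`:

```
# Enable Private Google Access on the subnet that hosts the Envoy clients so
# that they can reach Google APIs and services without external IP addresses:
gcloud compute networks subnets update default \
    --region=us-central1 \
    --enable-private-ip-google-access
```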
What's next
- To learn more about client TLS policies, see Cloud Service Mesh service security and the Network Security API.