Troubleshooting proxyless Traffic Director deployments

This document provides information to help you resolve configuration issues involving deployment of proxyless gRPC services with Traffic Director.

Troubleshooting RPC failures in a gRPC application

There are two common ways to troubleshoot RPC failures in a gRPC application:

  1. Review the status returned when an RPC fails. In most cases, the status contains enough information to help you understand the cause of an RPC failure. Status error handling in gRPC is explained in the gRPC error handling documentation.
  2. Enable logging in the gRPC runtime. Sometimes you need to review the gRPC runtime logs to understand a failure that is not propagated back to the RPC return status. For example, when an RPC fails with a status indicating that the deadline was exceeded, the logs can help you understand the underlying failure that caused the deadline to be exceeded. Each language implementation of gRPC has its own way to enable logging in the gRPC runtime:

    • gRPC in Java: gRPC uses java.util.logging for logging. Set io.grpc.level to FINE to enable sufficiently verbose logging in the gRPC runtime. A typical way to enable logging in Java is to load the logging configuration from a file and pass the file location to the JVM with a command-line flag. For example:
    # Create a file called logging.properties with the following contents.
    handlers=java.util.logging.ConsoleHandler
    io.grpc.level=FINE
    io.grpc.xds.level=FINEST
    java.util.logging.ConsoleHandler.level=ALL
    java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
    
    # Pass the location of the file to the JVM with this command-line flag.
    -Djava.util.logging.config.file=logging.properties
    

    To enable logging specific to xDS modules, set io.grpc.xds.level to FINE. To see more detailed logging, set the level to FINER or FINEST.

    • gRPC in Go: Enable logging by setting the following environment variables. For example:
    GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info
    
    • gRPC in C++: To enable logging, see the gRPC C++ logging instructions. To enable logging specific to xDS modules, use the GRPC_TRACE environment variable to turn on the following tracers: xds_client, xds_resolver, cds_lb, eds_lb, priority_lb, weighted_target_lb, and lrs_lb.
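gRPC in C++, and other implementations built on the same C core such as gRPC Python, reads the GRPC_VERBOSITY and GRPC_TRACE environment variables at startup. The following Python sketch builds an environment that turns on the xDS tracers listed above, which you could pass to a client process you launch for debugging. The helper name xds_debug_env is hypothetical; GRPC_VERBOSITY and GRPC_TRACE are the gRPC core variables, and GRPC_TRACE takes a comma-separated list of tracer names.

```python
import os

# Tracers listed above for xDS-specific logging in gRPC C core.
XDS_TRACERS = (
    "xds_client",
    "xds_resolver",
    "cds_lb",
    "eds_lb",
    "priority_lb",
    "weighted_target_lb",
    "lrs_lb",
)


def xds_debug_env(base=None, extra_tracers=()):
    """Return a copy of `base` (default: os.environ) with xDS tracing enabled.

    Hypothetical helper. GRPC_VERBOSITY=DEBUG lets debug-level messages
    through, and GRPC_TRACE is a comma-separated list of tracer names.
    Tracers already present in GRPC_TRACE are kept, and duplicates are
    dropped while preserving order. The input mapping is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["GRPC_VERBOSITY"] = "DEBUG"
    existing = [t for t in env.get("GRPC_TRACE", "").split(",") if t]
    tracers = dict.fromkeys(existing + list(XDS_TRACERS) + list(extra_tracers))
    env["GRPC_TRACE"] = ",".join(tracers)
    return env
```

For example, subprocess.run(["./my_grpc_client"], env=xds_debug_env()) would start a hypothetical client binary with xDS tracing on, without changing the environment of the calling process.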

Depending on the error in the RPC status or in the runtime logs, your issue might fall into one of the following categories.

Unable to connect to Traffic Director

To troubleshoot connection issues, try the following:

  • Check that the server_uri value in the bootstrap file is trafficdirector.googleapis.com:443.
  • Ensure that the environment variable GRPC_XDS_BOOTSTRAP is defined and points to the bootstrap file.
  • Ensure that you are using the xds scheme in the target URI (for example, xds:///) when you create a gRPC channel.
  • Make sure that you granted the required IAM permissions for creating compute instances and modifying a network in a project.
  • Make sure that you enabled the Traffic Director API for the project. In the Google Cloud Console, under APIs & Services for your project, look for errors reported for the Traffic Director API.
  • Confirm that the service account has the correct permissions. The gRPC applications running in the VM or the Pod use the service account of the Compute Engine VM host or of the Google Kubernetes Engine node instance.
  • Confirm that the API access scope of the Compute Engine VMs or GKE clusters is set to allow full access to the Compute Engine APIs. To do this, specify --scopes=https://www.googleapis.com/auth/cloud-platform when you create the VMs or cluster.
  • Confirm that you can access trafficdirector.googleapis.com:443 from the VM. If there are access issues, the possible reasons include a firewall preventing access to trafficdirector.googleapis.com over TCP port 443 or DNS resolution issues for the trafficdirector.googleapis.com hostname.
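Two of the checks above, the bootstrap file and network access to Traffic Director, can be scripted on the VM or Pod. The following Python sketch uses only the standard library; the function names are hypothetical. It assumes the gRPC xDS bootstrap layout in which server_uri sits inside the first entry of the xds_servers list, and it inspects only that first entry. can_reach opens a plain TCP connection, so a False result points to the firewall or DNS causes described above, not to an IAM or API problem.

```python
import json
import os
import socket

EXPECTED_SERVER_URI = "trafficdirector.googleapis.com:443"


def check_bootstrap(env=None):
    """Return a list of problems with the xDS bootstrap setup (empty if none).

    Hypothetical helper. Reads GRPC_XDS_BOOTSTRAP from `env`
    (default: os.environ) and checks the first xds_servers entry.
    """
    env = os.environ if env is None else env
    path = env.get("GRPC_XDS_BOOTSTRAP")
    if not path:
        return ["GRPC_XDS_BOOTSTRAP is not set"]
    try:
        with open(path, encoding="utf-8") as f:
            bootstrap = json.load(f)
    except (OSError, ValueError) as exc:  # missing file or invalid JSON
        return [f"cannot read bootstrap file {path!r}: {exc}"]
    servers = bootstrap.get("xds_servers") if isinstance(bootstrap, dict) else None
    if not isinstance(servers, list) or not servers or not isinstance(servers[0], dict):
        return ["bootstrap file has no usable xds_servers entry"]
    uri = servers[0].get("server_uri")
    if uri != EXPECTED_SERVER_URI:
        return [f"server_uri is {uri!r}, expected {EXPECTED_SERVER_URI!r}"]
    return []


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    DNS failures, refused connections, and timeouts all raise OSError
    subclasses, which are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On the VM, print(check_bootstrap()) and print(can_reach("trafficdirector.googleapis.com", 443)) give a quick first pass before you look at IAM permissions or API enablement.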

Hostname specified in the URI cannot be resolved

To troubleshoot hostname resolution issues, try the following:

  • Ensure that your gRPC client applications are upgraded to gRPC version 1.30.0 or higher.
  • Ensure that the port used in the URI to create a gRPC channel matches the port value in the forwarding rule used in your configuration. If a port is not specified in the URI, then the value 80 is used to match a forwarding rule.
  • Ensure that the hostname and port used in the URI to create a gRPC channel exactly match a host rule in the URL map used in your configuration.
  • Ensure that the same host rule is not configured in more than one URL map.
  • Ensure that no wildcards are in use. Host rules containing a * wildcard character are ignored.
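The rules above can be expressed as a small offline check. The following Python sketch is a simplified model, not Traffic Director's actual matching logic: it assumes a host rule must equal the authority in the channel URI verbatim (hostname, plus :port if the URI includes one), that a URI without a port is matched against forwarding rules on port 80, and that wildcard host rules are ignored. The function names and the dictionary inputs, which you would fill in by hand from your URL maps and forwarding rules, are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 80  # port used to match a forwarding rule when the URI has none


def parse_xds_target(uri):
    """Split a target such as "xds:///helloworld:8000".

    Returns (authority, host, port); authority is the exact string
    that this model expects to find as a host rule.
    """
    parts = urlsplit(uri)
    if parts.scheme != "xds":
        raise ValueError(f"expected the xds scheme, got {parts.scheme!r}")
    authority = parts.path.lstrip("/")
    host, sep, port_text = authority.rpartition(":")
    if not sep:  # no port in the URI
        return authority, authority, DEFAULT_PORT
    if not port_text.isdigit():
        raise ValueError(f"invalid port in {uri!r}")
    return authority, host, int(port_text)


def check_host_rules(uri, url_maps, forwarding_rule_ports):
    """Return a list of problems (empty if none were found).

    url_maps: hypothetical dict of URL map name -> list of host rules.
    forwarding_rule_ports: hypothetical set of forwarding rule ports.
    """
    authority, _host, port = parse_xds_target(uri)
    problems = []
    if port not in forwarding_rule_ports:
        problems.append(f"no forwarding rule uses port {port}")
    matches = []
    for name, host_rules in url_maps.items():
        for rule in host_rules:
            if "*" in rule:
                problems.append(f"{name}: wildcard host rule {rule!r} is ignored")
            elif rule == authority:
                matches.append(name)
    if not matches:
        problems.append(f"no URL map has a host rule for {authority!r}")
    elif len(matches) > 1:
        problems.append(f"host rule {authority!r} is in several URL maps: {matches}")
    return problems
```

For example, check_host_rules("xds:///helloworld:8000", {"map-a": ["helloworld"]}, {8000}) reports that no URL map has a host rule for "helloworld:8000", because in this model the port in the URI must also appear in the host rule.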

RPC fails because the service isn't available

To troubleshoot RPC failures when a service isn't available, try the following:

  • Check the overall status of Traffic Director and the status of your backend services in the Cloud Console. Ensure that the correct URL maps reference the backend services; these are listed in the Associated routing rule maps column.
  • Check that the backends associated with your backend services are healthy, as shown in the Backends column.
  • If the backends are unhealthy, click the corresponding backend service and ensure that the correct health check is configured. Health checks commonly fail because of incorrect or missing firewall rules, or a mismatch in the tags specified in the VM and in the firewall rules. For more information, see Creating Health Checks.
  • For gRPC health checks to work correctly, the gRPC backends must implement the gRPC health checking protocol. If this protocol is not implemented, use a TCP health check instead. Do not use an HTTP, HTTPS, or HTTP/2 health check with gRPC services.
  • When you use instance groups, ensure that the named port specified in the instance group matches the port used in the health check. When you use NEGs, ensure that the GKE service spec has the correct NEG annotation and the health check is configured to use the NEG serving port.
  • Check that the endpoint protocol is configured as GRPC.
  • Click Associated routing rule maps and check that the backend services specified in the host matching rules are correct.
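Several of the checks above for instance-group backends are consistency rules between the backend service, its health check, and the instance group's named ports. The following Python sketch encodes those rules over plain dictionaries. The dictionary fields (protocol, port_name, type, port) and the implements_grpc_health flag are hypothetical simplifications that you would fill in by hand; they are not the JSON shape returned by the Compute Engine API. NEG-based backends are not covered.

```python
# Health check types that the text above says are usable with gRPC backends.
ALLOWED_HC_TYPES = {"GRPC", "TCP"}


def check_backend(backend, health_check, named_ports, implements_grpc_health):
    """Return a list of problems (empty if none were found).

    backend: hypothetical dict, e.g. {"protocol": "GRPC", "port_name": "grpc"}
    health_check: hypothetical dict, e.g. {"type": "GRPC", "port": 50051}
    named_ports: the instance group's named ports, e.g. {"grpc": 50051}
    implements_grpc_health: whether the server implements the gRPC
        health checking protocol.
    """
    problems = []
    protocol = backend.get("protocol")
    if protocol != "GRPC":
        problems.append(f"backend protocol is {protocol!r}, expected 'GRPC'")

    hc_type = health_check.get("type")
    if hc_type not in ALLOWED_HC_TYPES:
        problems.append(f"health check type {hc_type!r} should be GRPC or TCP")
    elif hc_type == "GRPC" and not implements_grpc_health:
        problems.append(
            "GRPC health check used, but the backend does not implement "
            "the gRPC health checking protocol; use a TCP health check"
        )

    port_name = backend.get("port_name")
    serving_port = named_ports.get(port_name)
    if serving_port is None:
        problems.append(f"instance group has no named port {port_name!r}")
    elif health_check.get("port") != serving_port:
        problems.append(
            f"health check port {health_check.get('port')} does not match "
            f"named port {port_name!r} ({serving_port})"
        )
    return problems
```

An empty result only means the values you supplied are consistent with each other; it does not rule out firewall or tag mismatches, which still need to be checked separately.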

What's next

For general Traffic Director troubleshooting information, see Troubleshooting Traffic Director deployments.