Troubleshooting deployments that use proxyless gRPC

This document provides information to help you resolve configuration issues when you deploy proxyless gRPC services with Traffic Director. For information about how to use the Client Status Discovery Service (CSDS) API to help you investigate issues with Traffic Director, see Understanding Traffic Director client status.

Troubleshooting RPC failures in a gRPC application

There are two common ways to troubleshoot remote procedure call (RPC) failures in a gRPC application:

  1. Review the status returned when an RPC fails. Usually, the status contains enough information to help you understand the cause of an RPC failure.

  2. Enable logging in the gRPC runtime. Sometimes you need to review the gRPC runtime logs to understand a failure that is not propagated back to an RPC return status. For example, when an RPC fails with a status indicating that the deadline has been exceeded, the logs can help you to understand the underlying failure that caused the deadline to be exceeded.

    Different language implementations of gRPC have different ways to enable logging in the gRPC runtime:

    • gRPC in Java: gRPC uses java.util.logging for logging. Set io.grpc.level to FINE to enable sufficiently verbose logging in the gRPC runtime. A typical way to enable logging in Java is to load the logging configuration from a file and provide the file location to the JVM by using a command-line flag. For example:

      # Create a file called logging.properties with the following contents:
      handlers=java.util.logging.ConsoleHandler
      io.grpc.level=FINE
      io.grpc.xds.level=FINEST
      java.util.logging.ConsoleHandler.level=ALL
      java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
      
      # Pass the location of the file to the JVM by using this command-line flag:
      -Djava.util.logging.config.file=logging.properties
      

      To enable logging specific to xDS modules, set io.grpc.xds.level to FINE. To see more detailed logging, set the level to FINER or FINEST.

    • gRPC in Go: Turn on logging by setting environment variables.

      GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info
      
    • gRPC in C++: To enable logging with gRPC in C++, see the instructions in Troubleshooting gRPC. To enable logging specific to xDS modules, use the GRPC_TRACE environment variable to enable the following tracers: xds_client, xds_resolver, cds_lb, eds_lb, priority_lb, weighted_target_lb, and lrs_lb.

    • gRPC in Node.js: To enable logging with gRPC in Node.js, see the instructions in Troubleshooting gRPC-JS. To enable logging specific to xDS modules, use the GRPC_TRACE environment variable to enable the following tracers: xds_client, xds_resolver, cds_balancer, eds_balancer, priority, and weighted_target.
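
For the C++ and Node.js cases, GRPC_TRACE takes the tracer names as a single comma-separated value. The following is a minimal sketch, written in Python only for illustration, that builds such an environment for a child process. The trace_env helper is hypothetical, and setting GRPC_VERBOSITY to DEBUG is an assumption based on general gRPC troubleshooting guidance, not something this document states.

```python
import os

# Tracer names copied from the C++ and Node.js items above.
CPP_XDS_TRACERS = ["xds_client", "xds_resolver", "cds_lb", "eds_lb",
                   "priority_lb", "weighted_target_lb", "lrs_lb"]
NODE_XDS_TRACERS = ["xds_client", "xds_resolver", "cds_balancer",
                    "eds_balancer", "priority", "weighted_target"]


def trace_env(tracers, base_env=None):
    """Return a copy of base_env with the gRPC tracing variables added."""
    env = dict(os.environ if base_env is None else base_env)
    # GRPC_TRACE is a comma-separated list of tracer names.
    env["GRPC_TRACE"] = ",".join(tracers)
    # Assumption: DEBUG verbosity is required for tracer output to appear.
    env["GRPC_VERBOSITY"] = "DEBUG"
    return env


if __name__ == "__main__":
    # Print the value that would be exported for a C++ client.
    print(trace_env(CPP_XDS_TRACERS, base_env={})["GRPC_TRACE"])
```

The returned dictionary can be passed as the env argument of subprocess.run to start a client with the xDS tracers enabled without changing the environment of the calling process.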

Depending on the error in the RPC status or in the runtime logs, your issue might fall into one of the following categories.

Unable to connect to Traffic Director

To troubleshoot connection issues, try the following:

  • Check that the server_uri value in the bootstrap file is trafficdirector.googleapis.com:443.
  • Ensure that the environment variable GRPC_XDS_BOOTSTRAP is defined and pointing to the bootstrap file.
  • Ensure that you are using the xds scheme in the URI when you create a gRPC channel.
  • Make sure that you granted the required IAM permissions for creating compute instances and modifying a network in a project.
  • Make sure that you enabled the Traffic Director API for the project. In the Google Cloud Console, under APIs & services for your project, look for errors in the Traffic Director API.
  • Confirm that the service account has the correct permissions. The gRPC applications running in the VM or the Pod use the service account of the Compute Engine VM host or the Google Kubernetes Engine (GKE) node instance.
  • Confirm that the API access scope of the Compute Engine VMs or GKE clusters is set to allow full access to the Compute Engine APIs. Do this by specifying the following when you create the VMs or cluster:

    --scopes=https://www.googleapis.com/auth/cloud-platform
    
  • Confirm that you can access trafficdirector.googleapis.com:443 from the VM. If there are access issues, the possible reasons include a firewall preventing access to trafficdirector.googleapis.com over TCP port 443 or DNS resolution issues for the trafficdirector.googleapis.com hostname.
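
The first two checks and the last one in the list above can be scripted. The following is a minimal sketch under stated assumptions: it assumes that the bootstrap file is JSON with a top-level xds_servers list whose entries carry server_uri, and the function names check_bootstrap and can_reach are hypothetical. It does not verify IAM permissions, API enablement, or access scopes.

```python
import json
import os
import socket

EXPECTED_SERVER_URI = "trafficdirector.googleapis.com:443"


def check_bootstrap(environ=None):
    """Return a list of problems found with the xDS bootstrap configuration."""
    environ = os.environ if environ is None else environ
    path = environ.get("GRPC_XDS_BOOTSTRAP")
    if not path:
        return ["GRPC_XDS_BOOTSTRAP is not set"]
    try:
        with open(path, encoding="utf-8") as f:
            bootstrap = json.load(f)
    except (OSError, ValueError) as err:  # missing file or invalid JSON
        return [f"cannot read bootstrap file {path}: {err}"]
    # Assumption: server_uri lives in entries of a top-level "xds_servers" list.
    servers = bootstrap.get("xds_servers", []) if isinstance(bootstrap, dict) else []
    uris = [s.get("server_uri") for s in servers if isinstance(s, dict)]
    if EXPECTED_SERVER_URI not in uris:
        return [f"server_uri is {uris}, expected {EXPECTED_SERVER_URI}"]
    return []


def can_reach(host="trafficdirector.googleapis.com", port=443, timeout=5.0):
    """Return (True, "") if DNS resolution and a TCP connect both succeed."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as err:  # covers DNS failures, timeouts, and refusals
        return False, str(err)


if __name__ == "__main__":
    for problem in check_bootstrap():
        print("bootstrap:", problem)
    ok, reason = can_reach()
    print("reachable" if ok else f"not reachable: {reason}")
```

Run the script on the VM or in the Pod that hosts the gRPC application so that it sees the same environment variables, firewall rules, and DNS configuration as the application.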

Hostname specified in the URI cannot be resolved

You might encounter an error message like the following one in your logs:

[Channel<1>: (xds:///my-service:12400)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=NameResolver returned no usable address. addrs=[], attrs={}

To troubleshoot hostname resolution issues, try the following:

  • Ensure that you are using a supported gRPC version and language.
  • Ensure that the port used in the URI to create a gRPC channel matches the port value in the forwarding rule used in your configuration. If a port is not specified in the URI, then the value 80 is used to match a forwarding rule.
  • Ensure that the hostname and port used in the URI to create a gRPC channel exactly match a host rule in the URL map used in your configuration.
  • Ensure that the same host rule is not configured in more than one URL map.
  • Ensure that no wildcards are in use. Host rules containing a * wildcard character are ignored.
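
The matching rules in the list above can be checked offline against a copy of your configuration. The following sketch is illustrative only and is not how Traffic Director itself performs matching. The helpers parse_xds_target and url_maps_matching are hypothetical, the URL map names in the usage note are made up, and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 80  # used to match a forwarding rule when the URI has no port


def parse_xds_target(uri):
    """Return (target, port) for an xds URI such as xds:///my-service:12400.

    target is the hostname and optional port exactly as written in the URI;
    port is the value to compare against the forwarding rule.
    """
    parts = urlsplit(uri)
    if parts.scheme != "xds":
        raise ValueError(f"expected the xds scheme, got {parts.scheme!r}")
    target = parts.path.lstrip("/")
    _, sep, port = target.rpartition(":")
    return target, (int(port) if sep else DEFAULT_PORT)


def url_maps_matching(uri, url_maps):
    """Return the names of URL maps with a host rule equal to the URI target.

    url_maps maps a URL map name to its list of host rules. Rules that
    contain a * wildcard are skipped because they are ignored.
    """
    target, _ = parse_xds_target(uri)
    return [name for name, rules in url_maps.items()
            if any("*" not in rule and rule == target for rule in rules)]


if __name__ == "__main__":
    maps = {"map-a": ["my-service:12400"], "map-b": ["*"]}
    print(url_maps_matching("xds:///my-service:12400", maps))
```

An empty result points to a missing or wildcard-only host rule, and more than one name in the result means that the same host rule is configured in several URL maps.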

RPC fails because the service isn't available

To troubleshoot RPC failures when a service isn't available, try the following:

  • Check the overall status of Traffic Director and the status of your backend services in the Google Cloud Console:

    • In the Associated routing rule maps column, ensure that the correct URL maps reference the backend services. Click the column to check that the backend services specified in the host matching rules are correct.
    • In the Backends column, check that the backends associated with your backend services are healthy.
    • If the backends are unhealthy, click the corresponding backend service and ensure that the correct health check is configured. Health checks commonly fail because of incorrect or missing firewall rules or a mismatch in the tags specified in the VM and in the firewall rules. For more information, see Creating health checks.
  • For gRPC health checks to work correctly, the gRPC backends must implement the gRPC health checking protocol. If this protocol is not implemented, use a TCP health check instead. Do not use an HTTP, HTTPS, or HTTP/2 health check with gRPC services.

  • When you use instance groups, ensure that the named port specified in the instance group matches the port used in the health check. When you use network endpoint groups (NEGs), ensure that the GKE service spec has the correct NEG annotation, and the health check is configured to use the NEG serving port.

  • Check that the endpoint protocol is configured as GRPC.

What's next