Troubleshoot elevated latency in your app

In many cases, elevated latency in your application eventually results in 5xx server errors. Because error spikes and latency spikes often share a root cause, it makes sense to follow a similar set of troubleshooting steps to narrow down both.

Scope the issue

First, define the scope of the issue as narrowly as possible by gathering relevant information. Below are some suggestions for information that might be relevant.

  • What application IDs, services, and versions are impacted?
  • Which specific endpoints on the app are impacted?
  • Did this impact all clients globally, or a specific subset of clients?
  • What are the start and end times of the incident? Specify the time zone.
  • What specific errors are you seeing?
  • What is the observed latency delta, which is usually specified as an increase at a specific percentile? For example, latency increased by 2 seconds at the 90th percentile.
  • How did you measure the latency? In particular, was it measured at the client or is it visible in Cloud Logging and/or in the Cloud Monitoring latency data provided by the App Engine serving infrastructure?
  • What are the dependencies of your application and did any of them experience incidents?
  • Did you make any recent code, configuration or workload changes that could have triggered this issue?

An application may have its own custom monitoring and logging that you can use to further narrow down the issue scope beyond the suggestions above. Defining the scope of the problem will guide you towards the likely root cause and determine your next troubleshooting steps.

Determine what failed

Next, determine which component in the request path is most likely to be causing the latency or errors. The main components in the request path are:

Client -> Internet -> Google Front End (GFE) -> App Engine serving infrastructure -> Application instance

If the information gathered while scoping the issue does not point you to the source of the failure, you should generally start by looking at the health and performance of your application instances.

One way to determine whether the problem lies in your application instance is to look at the App Engine request logs: if you see HTTP status code errors or elevated latency in those logs, the issue generally lies in the instance that runs your application.

There is one scenario in which elevated errors and latency in the request logs may not be caused by the application instance itself: If the number of instances of your application has not scaled up to match traffic levels, then your instances may be overloaded, resulting in elevated errors and latency.

If you see elevated errors or latency in Cloud Monitoring then you can generally conclude that the problem lies upstream of the load balancer, which records the App Engine metrics. In most cases, this points to a problem in the application instances.

However, if you see elevated latency or errors in monitoring metrics but not request logs, further investigation may be needed. It may indicate a failure in the load balancing layer, or that the instances are experiencing such a severe failure that the load balancer cannot route requests to them. To distinguish between these cases, you can look at the request logs right before the incident starts. If the request logs show increasing latency right before the failure, it indicates that the application instances themselves were beginning to fail before the load balancer stopped routing requests to them.
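For example, one way to list recent failing requests from the App Engine request logs is with the gcloud CLI; this is a sketch, and the filter assumes the default App Engine request log format:

$ gcloud logging read 'resource.type="gae_app" AND protoPayload.status>=500' \
    --limit=20 --format='table(timestamp, protoPayload.status, protoPayload.latency)'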

Scenarios that might cause incidents

Here are some scenarios that users have encountered.

Client

Map a client IP to a geographical region

Google resolves the hostname for the App Engine application to the closest GFE to the client, based on the client IP address used in the DNS lookup. If the client's DNS resolver is not using the EDNS0 protocol, then client requests may not be routed to the closest GFE.
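To check whether your client's resolver forwards an EDNS0 client subnet, you can query a well-known Google diagnostic record; a sketch, assuming the dig tool is installed:

$ dig +short TXT o-o.myaddr.l.google.com

The TXT response reports the resolver address that Google sees and, if the resolver forwards an EDNS0 client subnet, an edns0-client-subnet entry as well.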

Internet

Poor internet connectivity

Run the following command on your client to determine if the issue is poor internet connectivity.

$ curl -s -o /dev/null -w '%{time_connect}\n' <hostname>

The value for time_connect generally represents the latency of the client's connection to the nearest Google Front End. If this connection is slow, you can troubleshoot further using traceroute to determine which hop on the network causes the delay.
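To break down where the time is spent, you can request additional timing variables from curl and then trace the path; a sketch, with <hostname> as a placeholder:

$ curl -s -o /dev/null -w 'dns=%{time_namelookup} connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total}\n' https://<hostname>/
$ traceroute <hostname>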

You can run tests from clients in different geographical locations. Requests will be automatically routed to the closest Google data center, which will vary based on the client's location.

Clients with low bandwidth

The application may be responding quickly, but network bottlenecks can prevent the App Engine serving infrastructure from sending packets across the network as fast as it otherwise could, slowing down responses.

Google Front End (GFE)

HTTP/2 head-of-line blocking

HTTP/2 clients sending multiple requests in parallel might see elevated latency due to head-of-line blocking at the GFE. The best solution is for clients to upgrade to the QUIC protocol.
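To confirm which protocol version a client actually negotiates, you can use curl; a minimal sketch (checking for HTTP/3/QUIC via --http3 requires a curl build with HTTP/3 support):

$ curl -sI --http2 -o /dev/null -w '%{http_version}\n' https://<hostname>/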

SSL termination for custom domains

The GFE terminates the SSL connection. An extra hop is required for SSL termination if you are using a custom domain, rather than an appspot.com domain. This might add latency for applications running in some regions.

App Engine serving infrastructure

Service-wide incident

Google posts details of severe service-wide incidents at https://status.cloud.google.com/. Note that Google rolls out changes gradually, so a service-wide incident is unlikely to affect all of your instances at once.

Autoscaling

Traffic scales up too fast

App Engine autoscaling may not scale your instances as fast as traffic increases, leading to temporary overloading. Typically, this occurs when traffic is generated by a computer program rather than organically by end users. The best way to resolve this is to throttle the system that generates the traffic.
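For example, the traffic-generating client can rate-limit itself. Below is a minimal token-bucket sketch in Python; the Throttle class and its rate and burst values are illustrative, not part of any App Engine API:

import time

class Throttle:
    # Simple token bucket: allows short bursts while enforcing a steady average rate.
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst
        self.last = time.monotonic()

    def acquire(self):
        # Refill tokens for the elapsed time, then take one, sleeping if none are available.
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)

throttle = Throttle(rate_per_sec=50, burst=10)
# Call throttle.acquire() before each outbound request.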

Spikes in traffic

Spikes in traffic might cause elevated latency in cases where an autoscaled application needs to scale up more quickly than is possible without affecting latency. End user traffic does not usually cause frequent traffic spikes, so if you see them, investigate what is causing them. If a batch system runs at intervals, you may be able to smooth out the traffic or use different scaling settings.

Autoscaler settings

The autoscaler can be configured based on the scaling characteristics of your application. These scaling parameters may become suboptimal in certain scenarios.

App Engine flexible environment applications scale based on CPU utilization. However, an application may become I/O bound during an incident; because CPU utilization stays low, CPU-based scaling does not occur, and instances become overloaded with requests.

App Engine standard environment scaling settings might cause latency if set too aggressively. If you see server responses with status code 500 and the message "Request was aborted after waiting too long to attempt to service your request" in your logs, it means that the request timed out on the pending queue while waiting for an idle instance.

Don't use App Engine standard environment manual scaling if your app serves end-user traffic. Manual scaling is better for workloads such as task queues. You might see increased pending time with manual scaling even when you have provisioned sufficient instances.

Don't use App Engine standard environment basic scaling for latency sensitive applications. This scaling type is designed to minimize costs at the expense of latency.

App Engine standard environment's default scaling settings provide optimal latency for most applications. If you still see requests with high pending time, specify a minimum number of instances. If you tune the scaling settings to reduce costs by minimizing idle instances, you run the risk of latency spikes if the load increases suddenly.
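For example, in the standard environment these settings live under automatic_scaling in app.yaml; a sketch with illustrative values only:

automatic_scaling:
  min_instances: 2            # keep a floor of running instances
  min_idle_instances: 1       # keep warm capacity for traffic spikes
  max_pending_latency: 250ms  # scale up before requests wait longer than this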

We recommend that you benchmark performance with the default scaling settings, then run a new benchmark after each change to these settings.

Deployments

Elevated latency shortly after a deployment indicates that you have not sufficiently scaled up before migrating traffic. Newer instances may not have warmed up local caches and hence may serve more slowly than older instances.

To avoid latency spikes, don't deploy an App Engine app using the same version name as an existing version of the app. If you reuse an existing version name, you won't be able to slowly migrate traffic to the new version. Requests may be slower because every instance will be restarted within a short period of time. You will also have to redeploy if you want to revert to the previous version.
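For example, with the gcloud CLI you can deploy under a new version name without promoting it, then shift traffic gradually; the version names here are illustrative:

$ gcloud app deploy --version=v2 --no-promote
$ gcloud app services set-traffic default --splits=v1=0.9,v2=0.1
$ gcloud app services set-traffic default --splits=v2=1 --migrate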

Application instance

Application code

Issues in application code can be very challenging to debug, particularly if they are intermittent or not easily reproduced. To help diagnose issues, we recommend instrumenting your application with logging, monitoring, and tracing. You can also try using Cloud Profiler to diagnose issues. See this example of diagnosing loading request latency, which uses Cloud Trace to upload additional timing information for each request.

You can also try to reproduce the issue in a local development environment, which may allow you to run language-specific debugging tools that cannot run within App Engine.

If you are running in the App Engine flexible environment, you can SSH to an instance and take a thread dump to see the current state of your application. You can try to reproduce the problem in a load test or by running the app locally. You can increase the instance size to see if this resolves the problem. For example, increased RAM may resolve issues for applications that are experiencing delays due to garbage collection.
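For example, to find and connect to a flexible environment instance (the instance, service, and version names are placeholders):

$ gcloud app instances list
$ gcloud app instances ssh <instance-id> --service=<service> --version=<version>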

To better understand how your app fails and what bottlenecks occur, you can load test your application until failure. Set a maximum instance count and then gradually increase load until the application fails.

If the latency issue correlates with the deployment of a new version of your application code, roll back to determine whether the new version caused the incident. If you deploy continuously, your deployments may be frequent enough that it is hard to determine, based on the time of onset, whether a particular deployment caused the incident.

Your application may store configuration settings in Datastore or elsewhere. Create a timeline of configuration changes to determine whether any of them line up with the onset of elevated latency.

Workload change

A workload change might cause elevated latency. Monitoring metrics that may indicate a workload change include queries per second (QPS), API usage, and API latency. You can also check for changes in request and response sizes.

Health check failures

The App Engine flexible environment load balancer will stop routing requests to instances that fail health checks. This might increase load on other instances, potentially resulting in a cascading failure. The App Engine flexible environment Nginx logs show instances that fail health checks. Analyze your logs and monitoring to determine why the instance went unhealthy, or configure the health checks to be less sensitive to transient failures. Note that there will be a short delay before the load balancer stops routing traffic to an unhealthy instance. This delay might cause an error spike if the load balancer cannot retry requests.
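For example, in the flexible environment you can tune liveness checks in app.yaml; a sketch with illustrative values:

liveness_check:
  check_interval_sec: 30  # probe less frequently
  timeout_sec: 4          # allow slower probe responses
  failure_threshold: 4    # require several consecutive failures before marking unhealthy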

App Engine standard environment does not use health checks.

Memory pressure

If monitoring shows either a saw-tooth pattern in memory usage, or a drop in memory usage that correlates to deployments, then performance issues may be caused by a memory leak. A memory leak might cause frequent garbage collection leading to higher latency. Provisioning larger instances with more memory may resolve the issue if you can't easily trace it to a problem in the code.
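For example, instance sizes are set in app.yaml; a sketch (instance_class applies to the standard environment, resources to the flexible environment):

# Standard environment:
instance_class: F4_1G

# Flexible environment:
resources:
  memory_gb: 4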

Resource leak

If an instance of your application shows rising latency that correlates with instance age, you may have a resource leak that causes performance issues. With this type of issue, you will also see latency drop right after a deployment. For example, a data structure whose operations slow down as it grows consumes more CPU over time, slowing any CPU-bound workload.

Code optimization

Some ways that you can optimize code on App Engine to reduce latency:

  • Offline work: Use Cloud Tasks so that user requests do not block waiting for completion of work such as sending mail.

  • Asynchronous API calls: Ensure that your code does not block waiting for an API call to complete. Libraries such as ndb offer built-in support for this; see the sketch after this list.

  • Batch API calls: The batch version of API calls is usually faster than sending individual calls.

  • Denormalize data models: Reduce the latency of calls made to the data persistence layer by denormalizing your data models.
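As a rough illustration of the asynchronous and batch patterns above, here is a sketch using the google-cloud-ndb client library; the function and key names are hypothetical:

from google.cloud import ndb

client = ndb.Client()

def fetch_user_and_items(user_key, item_keys):
    with client.context():
        # Asynchronous: start the user lookup without blocking on it.
        user_future = user_key.get_async()
        # Batch: fetch all items in one round trip instead of one call per key.
        item_futures = ndb.get_multi_async(item_keys)
        user = user_future.get_result()
        items = [f.get_result() for f in item_futures]
        return user, items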

Dependencies

Monitor your application's dependencies so that you can detect whether latency spikes correlate with a dependency failure.

An increase in latency for a dependency may be caused by a change in the workload or by an increase in traffic.

Non-scaling dependency

If your dependency does not scale as the number of App Engine instances scales up then the dependency may become overloaded when traffic increases. An example of a dependency that may not scale is a SQL database. A higher number of application instances will lead to a higher number of database connections which might cause cascading failure by preventing the database from starting up.

One way to recover from this is as follows; a sketch of the corresponding commands appears after the steps:

  1. Deploy a new default version that does not connect to the database.
  2. Shut down the previous default version.
  3. Deploy a new non-default version that does connect to the database.
  4. Slowly migrate traffic to the new version.
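A sketch of these steps with the gcloud CLI; version names are illustrative:

$ gcloud app deploy --version=no-db             # 1. new default version that skips the database
$ gcloud app versions stop <previous-version>   # 2. shut down the previous default version
$ gcloud app deploy --version=v2 --no-promote   # 3. new non-default version that connects to the database
$ gcloud app services set-traffic default --splits=no-db=0.9,v2=0.1   # 4. migrate traffic slowly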

A potential preventative measure is to design your application to drop requests to the dependency using Adaptive Throttling.

Caching layer failure

A good way to speed up requests is to make use of multiple caching layers:

  • Edge caching
  • Memcache
  • In-instance memory

A sudden increase in latency might be caused by a failure in one of these caching layers. For example, a Memcache flush may cause more requests to go to the slower Datastore.
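As a rough sketch of how these layers compose on the read path, with each miss falling through to the next, slower layer; memcache_client and load_from_datastore are hypothetical stand-ins for your own clients:

import time

_local_cache = {}          # in-instance memory: fastest, but per instance
_LOCAL_TTL_SECONDS = 60

def get_value(key, memcache_client, load_from_datastore):
    entry = _local_cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                        # hit in instance memory
    value = memcache_client.get(key)           # shared Memcache layer
    if value is None:
        value = load_from_datastore(key)       # slower persistent layer
        memcache_client.set(key, value)        # repopulate Memcache on a miss
    _local_cache[key] = (value, time.time() + _LOCAL_TTL_SECONDS)
    return value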