Designing for Scale

John Lowry
December 2013

App Engine is a scalable system which will automatically add more capacity as workloads increase.

If you are expecting high traffic, we strongly recommend that you run load tests to reduce the risk of hitting a bottleneck in your code or in App Engine. Your tests should be designed to simulate real world traffic as closely as possible. In addition to testing at the maximum load that you expect to encounter, we also recommend that you test under various scenarios. For example, what happens if your application moves to a new datacenter, causing a Memcache flush and all instances to be stopped and restarted? In the Google Cloud Platform Console, you can flush Memcache and stop all instances to simulate this.

Here are some best practices to ensure that your app will scale to high load.

A single entity group in the Datastore should not be updated too rapidly

If you are using the Datastore, Google recommends that you design your application so that it will not need to update an entity group more than once per second for a sustained period of time. Remember that an entity with no parent and no children is its own entity group. If you update an entity group too rapidly then your Datastore writes will have higher latency, timeouts, and other types of error. This is known as contention.

Datastore write rates to a single entity group can exceed the one per second limit so load tests may not show this problem. Some suggestions for designing your application to reduce write rates on entity groups are in the Datastore contention article.

Avoid high write rates to Datastore keys that are lexicographically close

The Datastore is built on top of Google’s NoSQL database, Bigtable, and is subject to Bigtable's performance characteristics. Bigtable scales by sharding rows onto separate tablet servers, and these rows are lexicographically ordered by key. If you are using the Datastore, you can get slow writes due to a hot tablet server if you have a write rate to a small range of keys that exceeds the capacity of a single tablet server.

By default, App Engine allocates keys using a scattered algorithm. Thus you will not normally encounter this problem if you create new entities at a high write rate using the default ID allocation policy. There are some corner cases where you can hit this problem:

  • If you create new entities at a very high rate using the legacy sequential ID allocation policy.
  • If you create new entities at a very high rate and you are allocating your own IDs which are monotonically increasing.
  • If you create new entities at a very high rate for a kind which previously had very few existing entities. Bigtable will start off with all entities on the same tablet server and will take some time to split the range of keys onto separate tablet servers.
  • You will also see this problem if you create new entities at a high rate with a monotonically increasing indexed property like a timestamp, because these properties are the keys for rows in the index tables in Bigtable.
For a more detailed explanation of this issue, see Ikai Lan's blog posting on saving monotonically increasing values in the Datastore.

Do not set a spending limit that could be exceeded

You can configure a daily spending limit for your application if you are paying online. You can view the spending limit in the Cloud Platform Console. Your application will serve errors if your spending limit is exceeded, so ensure that your limit is sufficient to handle the maximum possible daily usage. You should not wait until the last minute to try to increase your spending limit, because Google needs an approval from your credit card issuer. If there are problems getting this approval then your spending limit increase will be delayed.

Ensure that you will not hit quota limits on API calls

Some API calls have per-minute and per-day quota limits in order to prevent a single application from using up more than its share of available resources. In the Cloud Platform Console, you can view your quota details for all API calls. You will get a quota denied error if you exceed quota limits.

APIs that are not yet generally available are usually subject to strict quota limits. You can visit the App Engine features page to find out which APIs are generally available, and which are still in preview or experimental stages.

In addition, there are some APIs that have relatively strict quotas, despite being generally available. APIs in this category include URL Fetch, Sockets, and Channels. If you are using these APIs, you should pay particular attention to quota limits.

The per-minute quota limits are not shown in the Cloud Platform Console. Your application will not hit a per-minute quota unless you have an unexpected usage pattern. Load testing can be used to determine whether you would hit a per-minute quota limit.

Shard task queues if high throughput is needed

You can shard task queues if you want higher throughput than is possible with a single queue. You can use the same principles described in the sharding counters article to do this. Your application should not depend on tasks executing immediately after creation so that you can handle a short term backlog in the task queues without affecting the experience of your end users.

Use the default performance settings unless you have tested the impact of changes

We recommend that you use the default settings for automatic scaling for max idle instances and min/max pending latency on automatic unless you have done load testing with other settings to verify their effects. The default performance settings will, in most cases, enable the lowest possible latency. A trade-off for low latency is usually higher costs due to having additional idle instances that can handle temporary spikes in load.

You should add resident instances if you want to minimize latency, particularly if you expect sudden spikes in traffic. The number of resident instances that are needed will depend on your traffic and it is best to do load tests to determine the optimal number.

Use traffic splitting when switching to a new default version

A high traffic application may get errors or higher latency when updating to a new version in the following scenarios:

  • Complete update of a new default version
  • Set the default version

Once the update is complete, App Engine will send requests to the new version. However, the new version may take some time to spin up enough instances to handle all traffic. During this period, requests can potentially sit on the pending queue and may time out.

Therefore, in order to minimize latency and errors, we recommend that customers use traffic splitting to move traffic gradually to a new version before making it the default.

An application may serve requests from both versions while you are moving traffic to the new version. In most cases, this will not cause any problems. However, if you have an incompatibility in the cached objects used by an application then you will need to ensure that users go to the same version of an application during their session. You will need to code this into your application logic.

Avoid Memcache hot keys

You can use Dedicated Memcache in order to get guaranteed capacity and more consistent performance. If your load is unevenly distributed across the Memcache keyspace then you may not see the expected performance from Dedicated Memcache.

For Dedicated Memcache, we recommend that the peak access rate on a single key should be 1-2 orders of magnitude less than the per-GB rating. For example, the rating for 1 KB sized items is 10,000 operations per second per GB of Dedicated Memcache. Therefore, the load on a single key should not be higher than 100 - 1,000 operations per second for items that are 1 KB in size.

In load tests, you may see better performance. However, you should design your code so that it complies with the published ratings because the performance of Dedicated Memcache can change.

Here are some strategies for reducing the operations per second on frequently-used keys:

  • Organize data per user so that a single HTTP request only hits that user's data. Avoid storing global data which must be accessed from all HTTP requests.
  • Shard your frequently-used keys to keep under the per-key guideline.
  • Cache frequently-used keys in your instances' global memory not in Memcache.

Test third-party dependencies

If you depend on a system outside App Engine for handling requests then you should ensure that this system has been tested to handle high load. For example, if you are using URL Fetch to get data from a third-party web server then you can determine the impact of various load testing scenarios on the third party web server's throughput and latency.

Implement backoff on retry

Your code may retry on failure, whether calling an App Engine service such as the Datastore or an external service using URL Fetch or the Socket API. In these cases, you should always implement a randomized exponential backoff policy in order to avoid the thundering herd problem. You should also limit the total number of retries and handle failures after the maximum retry limit is reached.

An example implementation for backoff on retry is in the Cloud Storage client library :

Send feedback about...