Avoiding datastore contention

Jason Cooper
June 2009

App Engine's datastore is a powerful distributed data storage service built on top of the high-performance database management system known as Bigtable. Although the datastore is built to scale, you must take care when designing your data models to avoid the prospect of contention as your application grows.


Datastore contention occurs when a single entity or entity group is updated too rapidly. The datastore will queue concurrent requests to wait their turn. Requests waiting in the queue past the timeout period will throw a concurrency exception. If you're expecting to update a single entity or write to an entity group more than several times per second, it's best to re-work your design early-on to avoid possible contention once your application is deployed.


Here are some tips you can use to reduce the possibility of datastore contention in your application:

Keep entity groups small

App Engine's datastore has limited support for transactions. In order to guarantee that updates to two or more entities are atomic (i.e. all updates are applied or none at all), these entities must be in the same entity group. When a transaction starts, App Engine uses optimistic concurrency control by checking the last update time for the entity groups used in the transaction. Upon committing a transaction for the entity groups, App Engine again checks the last update time for the entity groups used in the transaction. If it has changed since our initial check, an exception is thrown.

Given this, it's clear that frequent updates to one or more entities within a single hierarchy can easily lead to contention. For that reason, you should work to keep your entity groups small and only create them when transactions are absolutely necessary. The documentation recommends keeping entity groups no larger than a single user's worth of data. Note that entity groups are not required if you simply plan to reference one entity from another.

Shard oft-written entities

In many cases, it isn't very difficult to re-work your initial data model to avoid the potential of simultaneous writes to single entities or entity groups. But there are legitimate cases where updating a single entity makes sense, e.g. a hit counter that is incremented in every request. If you expect more than one or two requests per second, such a counter could easily lead to contention. For cases like this, sharding is the answer. Sharding takes advantage of two very important principles: the commutative and associative property of addition and App Engine's aptitude at handling many parallel requests distributed across distinct entities.

In essence, sharding effectively splits a single entity into many. In the hit counter example, you would use N separate entities instead of 1. As each request comes in, one of these "shards" is selected at random and the associated count is incremented. To get a final tally, directly fetch each shard, which is especially efficient in App Engine, and sum each individual counter to arrive at the total count. See "Sharding Counters" for more information, and check out the SDK demos for a sample implementation in your favorite runtime.


App Engine is a great "engine" for building highly scalable web applications backed by a world-class infrastructure, but it's your responsibility to use the tools provided as effectively and efficiently as possible. A large part of this is designing your data model to leverage the core strengths of App Engine's underlying datastore and doing so early-on so you can reap the rewards as your application's traffic skyrockets.