Async Datastore API (Python)

Note: Developers building new applications are strongly encouraged to use the NDB Client Library, which has several benefits compared to this client library, such as automatic entity caching via the Memcache API. If you are currently using the older DB Client Library, read the DB to NDB Migration Guide.

The Async Datastore API allows you to make parallel, non-blocking calls to the datastore and to retrieve the results of these calls at a later point in the handling of the request. This documentation covers working with the async datastore service, async transactions, async queries, and when to use async datastore calls.

Working with the Async Datastore Service

With the async datastore API, you make datastore calls using methods of the form *_async (such as get_async() in the google.appengine.ext.db package). The following code sample demonstrates some simple datastore operations using the asynchronous methods:

from google.appengine.ext import db

get_future = db.get_async(key)
put_future = db.put_async(model)
delete_future = db.delete_async(key)
allocate_ids_future = db.allocate_ids_async(key, 10)

These functions perform the same operations as the synchronous versions, except they immediately return an asynchronous object that you can use to get the real result at some later point. The following code sample demonstrates how to get the asynchronous result using get_result():

entity = get_future.get_result()
key = put_future.get_result()
id_range = allocate_ids_future.get_result()  # Avoids shadowing the built-in range()

# Wait for the operation to complete without returning a value.
# Exceptions that occurred in the call are thrown here. Calling
# get_result() allows you to verify that the deletion succeeded.
delete_future.get_result()

Note: Exceptions are not thrown until you call get_result(). Calling this method allows you to verify that the async operation succeeded.
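
The deferred-exception behavior described in the note can be tried out without the App Engine SDK. The following is a minimal sketch that uses the standard library's concurrent.futures module as a stand-in: Future.result() plays the role of get_result(), and failing_delete() is a hypothetical function invented for this illustration. It is not the datastore API itself.

```python
from concurrent.futures import ThreadPoolExecutor


def failing_delete(key):
    # Hypothetical stand-in for a datastore call that fails.
    raise ValueError("cannot delete %r" % (key,))


executor = ThreadPoolExecutor(max_workers=1)

# Submitting the call does not raise, even though the call itself fails.
delete_future = executor.submit(failing_delete, "employee-42")

try:
    # The stored exception is re-raised only when the result is requested.
    delete_future.result()
    outcome = "succeeded"
except ValueError as error:
    outcome = "failed: %s" % error

executor.shutdown()
print(outcome)
```

If the result were never requested, the failure would go unnoticed, which is why calling get_result() on every async object matters outside of a transaction.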

Working with Async Transactions

Async datastore API calls can participate in transactions just like synchronous calls. Here's a function that adjusts the salary of an Employee and writes an additional SalaryAdjustment entity in the same entity group as the Employee, all within a single transaction.

def adjust_salary(employee_key, raise_amount):
    def runner():
        # Async call to lookup the Employee entity
        employee_entity_future = db.get_async(employee_key)

        # Create and put a SalaryAdjustment entity in parallel with the lookup
        adjustment_entity = SalaryAdjustment(parent=employee_key)
        adjustment_entity.adjustment = raise_amount
        db.put_async(adjustment_entity)

        # Fetch the result of our lookup to make the salary adjustment
        employee_entity = employee_entity_future.get_result()
        employee_entity.salary += raise_amount

        # Re-put the Employee entity with the adjusted salary.
        db.put_async(employee_entity)
    db.run_in_transaction(runner)

This sample illustrates an important difference between async calls with no transactions and async calls within transactions. When you are not using a transaction, the only way to ensure that an individual async call has completed is to fetch the result of the async object using get_result(). When you are using a transaction, committing that transaction blocks on the results of all async calls made within the transaction.

So, in our example above, even though our async call to insert the SalaryAdjustment entity may still be outstanding when runner() finishes, the commit will not happen until the insert completes.
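
The commit behavior described above can be pictured with a toy model. This is only a sketch under stated assumptions: ToyTransaction and run_in_toy_transaction() are hypothetical names invented for this illustration, built on the standard library's concurrent.futures module, and they do not reflect how db.run_in_transaction() is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor
import time


class ToyTransaction(object):
    """Hypothetical model of a transaction; not the App Engine implementation."""

    def __init__(self):
        self._executor = ThreadPoolExecutor(max_workers=4)
        self._pending = []
        self.log = []

    def put_async(self, name, delay):
        def write():
            time.sleep(delay)  # Simulated datastore latency.
            self.log.append("put %s" % name)

        future = self._executor.submit(write)
        self._pending.append(future)
        return future

    def commit(self):
        # Block on every outstanding call before recording the commit.
        for future in self._pending:
            future.result()
        self.log.append("commit")
        self._executor.shutdown()


def run_in_toy_transaction(function):
    txn = ToyTransaction()
    function(txn)
    txn.commit()
    return txn.log


def runner(txn):
    # Neither result is fetched explicitly inside the function.
    txn.put_async("SalaryAdjustment", 0.05)
    txn.put_async("Employee", 0.01)


log = run_in_toy_transaction(runner)
print(log)
```

In this model the "commit" entry is always the last item in the log, because commit() waits for both writes even though runner() returned while they were still outstanding.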

Async Queries

We do not currently expose an explicitly async API for queries. However, when you invoke Query.run(), the query immediately returns and asynchronously prefetches results. This allows your application to perform work in parallel while query results are fetched.

# ...

q1 = Salesperson.all().filter('date_of_hire <', one_month_ago)

# Returns instantly, query is executing in the background.
recent_hires = q1.run()

q2 = Customer.all().filter('last_contact >', one_year_ago)

# Also returns instantly, query is executing in the background.
needs_followup = q2.run()

schedule_phone_call(recent_hires, needs_followup)

Unfortunately, Query.fetch() does not have the same behavior: it blocks until the results have been retrieved.
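
The prefetching behavior of Query.run() can be imitated with a function that starts work immediately and hands back a lazy iterator. The sketch below is an illustration only: run(), slow_fetch(), and FAKE_DATA are hypothetical names, the background work uses the standard library's concurrent.futures module, and none of it reflects the SDK's internals.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical in-memory data standing in for datastore entities.
FAKE_DATA = {
    "Salesperson": ["Ada", "Grace"],
    "Customer": ["Linus"],
}

executor = ThreadPoolExecutor(max_workers=2)


def slow_fetch(kind):
    time.sleep(0.05)  # Simulated datastore latency.
    return list(FAKE_DATA[kind])


def run(kind):
    # The fetch is submitted here, before any result is requested.
    future = executor.submit(slow_fetch, kind)

    def results():
        # Blocks only when the caller starts iterating.
        for item in future.result():
            yield item

    return results()


recent_hires = run("Salesperson")
needs_followup = run("Customer")

# Both fetches are already in flight; other work could happen here.
names = list(recent_hires) + list(needs_followup)
executor.shutdown()
print(names)
```

The design point is that the submission happens in run() itself rather than inside the generator body, so both fetches overlap instead of starting one after the other when iteration begins.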

When To Use Async Datastore Calls

When you call a synchronous google.appengine.ext.db method, such as db.get(), your code blocks until the call to the datastore completes. If the only thing your application needs to do is render the result of the get() in HTML, blocking until the call is complete is a perfectly reasonable thing to do. However, if your application needs the result of the get() plus the result of a Query to render the response, and the get() and the Query don't have any data dependencies, then waiting until the get() completes to initiate the Query is a waste of time. Here is an example of some code that can be improved by using the async API:

# Read employee data from the Datastore
employee = Employee.get_by_key_name('Max')  # Blocking for no good reason!

# Fetch payment history
query = PaymentHistory.all().ancestor(employee.key())
history = query.fetch(10)
render_html(employee, history)

Instead of waiting for the get() to complete, use db.get_async() to execute the call asynchronously:

from google.appengine.datastore import datastore_query
from google.appengine.ext import db

employee_key = db.Key.from_path(Employee.kind(), 'Max')

# Read employee data from the Datastore
employee_future = db.get_async(employee_key)  # Returns immediately!

# Fetch payment history
query = PaymentHistory.all().ancestor(employee_key)

# Implicitly performs query asynchronously
history_itr = query.run(config=datastore_query.QueryOptions(limit=10))
employee = employee_future.get_result()
render_html(employee, history_itr)

The synchronous and asynchronous versions of this code use similar amounts of CPU (after all, they both perform the same amount of work), but since the asynchronous version allows the two datastore operations to execute in parallel, the asynchronous version has lower latency. In general, if you need to perform multiple datastore operations that don't have any data dependencies, the asynchronous API can significantly improve latency.
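
The latency claim can be checked with a rough simulation that runs without the App Engine SDK. In this sketch, fake_get() and fake_query() are hypothetical stand-ins that simply sleep for an arbitrary SIMULATED_LATENCY, and the standard library's concurrent.futures module provides the parallelism. The numbers it prints say nothing about real datastore performance.

```python
from concurrent.futures import ThreadPoolExecutor
import time

SIMULATED_LATENCY = 0.1  # Seconds; an arbitrary value chosen for illustration.


def fake_get(key):
    time.sleep(SIMULATED_LATENCY)
    return {"key": key}


def fake_query(ancestor_key):
    time.sleep(SIMULATED_LATENCY)
    return [{"ancestor": ancestor_key, "amount": 100}]


def sequential(key):
    start = time.time()
    employee = fake_get(key)    # Blocks for the full latency.
    history = fake_query(key)   # Starts only after the get completes.
    return employee, history, time.time() - start


def parallel(key):
    start = time.time()
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Both calls depend only on the key, so they can overlap.
        employee_future = executor.submit(fake_get, key)
        history_future = executor.submit(fake_query, key)
        employee = employee_future.result()
        history = history_future.result()
    return employee, history, time.time() - start


seq_employee, seq_history, seq_elapsed = sequential("Max")
par_employee, par_history, par_elapsed = parallel("Max")
print("sequential: %.2fs, parallel: %.2fs" % (seq_elapsed, par_elapsed))
```

Both versions return the same data. The parallel version finishes in roughly the time of the slower call, while the sequential version takes the sum of the two.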