Updating Your Model's Schema

Justin McWilliams and Mark Ivey, Google Engineers
December 2012

If you are maintaining a successful app, you will eventually find a reason to change your schema. This article walks through an example showing the two basic steps needed to update an existing schema:

  1. Updating the Model class
  2. Updating existing Entities in the Datastore (this step isn't always necessary; we'll discuss when to do it below).

Before We Start

While updating your schema, you may need to disable the ability for your users to edit data in your application. Whether or not this is necessary depends on your application, but there are a few situations (like trying to add a sequential index value to each entity) where it is much easier to correctly update existing entities if no other edits are happening.

Updating Your Models

Here's an example of a simple picture model:

class Picture(ndb.Model):
    author = ndb.StringProperty()
    name = ndb.StringProperty(default='')

Let's update this so each picture can have a rating. To store the ratings, we'll store the number of votes and the average value of the votes. Updating the model is fairly easy: we just add two new properties.

class Picture(ndb.Model):
    author = ndb.StringProperty()
    name = ndb.StringProperty(default='')
    # Two new fields
    num_votes = ndb.IntegerProperty(default=0)
    avg_rating = ndb.FloatProperty(default=0)

Now whenever a Picture entity is written to Datastore, it will be written with a value for num_votes and avg_rating. Whenever a Picture is read from Datastore, num_votes and avg_rating will be populated (either with the value from Datastore or with the default specified in the model). Note that existing entities are not modified automatically: an entity must be read and written back before the new properties are persisted to Datastore.
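The article does not show how a vote updates these two properties. As a minimal sketch, assuming avg_rating is a plain arithmetic mean, a new vote can be folded in without storing the individual votes. The helper name add_vote is hypothetical and is not part of the sample application.

```python
def add_vote(num_votes, avg_rating, new_rating):
    """Return (num_votes, avg_rating) after folding in one new vote.

    Hypothetical helper: assumes avg_rating is the arithmetic mean of all
    votes recorded so far.
    """
    # Recover the running total from the stored mean, then add the new vote.
    total = avg_rating * num_votes + new_rating
    num_votes += 1
    return num_votes, float(total) / num_votes


# Starting from the model defaults (0 votes, average 0):
votes, avg = add_vote(0, 0.0, 4)      # first vote: a rating of 4
votes, avg = add_vote(votes, avg, 2)  # second vote: a rating of 2
```

In a real handler, the returned values would be assigned to picture.num_votes and picture.avg_rating before the entity is saved.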

Updating Existing Entities

Datastore doesn't require all entities to have the same set of properties. After updating your models to add new properties, existing entities will continue to exist without these properties. In some situations, this is fine, and you don't need to do any more work. When would you want to go back and update existing entities so they also have the new properties? One situation would be when you want to do a query based on the new properties. In our example with Pictures, queries like "Most popular" or "Least popular" wouldn't return existing pictures, because they don't (yet) have the ratings properties. To fix this, we'll need to update the existing entities in Datastore.
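To make the query behavior concrete, here is a toy in-memory model of it. This is not Datastore code: entities are plain dicts, and the function query_ordered_by is a hypothetical stand-in that simply skips any entity lacking the property being sorted on, which is the effect described above.

```python
def query_ordered_by(entities, prop, descending=False):
    """Toy model of an ordered query: entities lacking `prop` are skipped."""
    # Only entities that actually carry the property can be matched.
    indexed = [e for e in entities if prop in e]
    return sorted(indexed, key=lambda e: e[prop], reverse=descending)


pictures = [
    {'name': 'old'},                       # saved before the schema change
    {'name': 'new_a', 'avg_rating': 4.5},  # saved after the schema change
    {'name': 'new_b', 'avg_rating': 2.0},
]
most_popular = query_ordered_by(pictures, 'avg_rating', descending=True)
names = [p['name'] for p in most_popular]
```

The picture named 'old' is missing from names until it is re-saved with a value for avg_rating.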

Conceptually, updating existing entities is easy. You just need to write a request handler to load all entities, set the value of the new property, and save them back to Datastore. However, if you need to update more than a couple thousand entities, you'll likely need to work around the short request deadline.

To do this, we can take advantage of the Task Queue API (Python, Java, Go) and Query Cursors. These will allow us to easily update small batches of entities in multiple different requests. First, we can write a small request handler which simply inserts a Task into the Task Queue. Each Task will then perform the following:

  1. Initialize a query for entities to update.
  2. If not the first Task, position the query where the previous Task left off, using the passed Query Cursor.
  3. Perform schema updates on a batch of entities; save to Datastore.
  4. Insert a Task to continue with the next batch in a new request.

import logging

from google.appengine.ext import deferred
from google.appengine.ext import ndb

# models_v2 is the module that holds the updated Picture model.
import models_v2


def update_schema_task(cursor=None, num_updated=0, batch_size=100):
    """Task that handles updating the models' schema.

    This is started by
    UpdateSchemaHandler. It scans every entity in the datastore for the
    Picture model and re-saves it so that it has the new schema fields.
    """

    # Force ndb to use v2 of the model by re-loading it.
    reload(models_v2)

    # Get all of the entities for this Model.
    query = models_v2.Picture.query()
    pictures, next_cursor, more = query.fetch_page(
        batch_size, start_cursor=cursor)

    to_put = []
    for picture in pictures:
        # Give the new fields default values.
        # If you added new fields and were okay with the default values, you
        # would not need to do this.
        picture.num_votes = 1
        picture.avg_rating = 5
        to_put.append(picture)

    # Save the updated entities.
    if to_put:
        ndb.put_multi(to_put)
        num_updated += len(to_put)
        logging.info(
            'Put {} entities to Datastore for a total of {}'.format(
                len(to_put), num_updated))

    # If there are more entities, re-queue this task for the next page.
    if more:
        deferred.defer(
            update_schema_task, cursor=next_cursor, num_updated=num_updated)
    else:
        logging.debug(
            'update_schema_task complete with {0} updates!'.format(
                num_updated))
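To see the batching logic in isolation, the following toy version runs without App Engine. It is only a sketch under stated assumptions: the local fetch_page function is a hypothetical stand-in for the query's page fetch, the cursor is a plain list offset (real query cursors are opaque objects), and a loop stands in for each task re-queuing the next one.

```python
def fetch_page(items, batch_size, start_cursor=None):
    """Toy stand-in for a paged query: returns (page, next_cursor, more)."""
    start = start_cursor or 0
    end = start + batch_size
    return items[start:end], end, end < len(items)


def run_batches(items, batch_size):
    """Process items one page at a time, the way chained tasks would."""
    cursor, num_updated, batch_sizes = None, 0, []
    while True:
        # Each loop iteration plays the role of one task in the queue.
        page, cursor, more = fetch_page(items, batch_size, start_cursor=cursor)
        num_updated += len(page)
        batch_sizes.append(len(page))
        if not more:
            return num_updated, batch_sizes


total, sizes = run_batches(list(range(250)), batch_size=100)
```

With 250 items and a batch size of 100, the work is split into three pages, and the running total is carried from one page to the next just as num_updated is passed between tasks.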

Next, create a request handler that uses deferred to kick-start the new update_schema_task function. As the deferred documentation mentions, you can't call a method in the request handler module, so it's important that the request handler and the update_schema_task function above live in different modules.

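class UpdateSchemaHandler(webapp2.RequestHandler):
    """Queues a task to start updating the model schema."""
    def post(self):
        # update_schema is assumed to be the separate module that defines
        # update_schema_task; import it alongside webapp2 and deferred.
        deferred.defer(update_schema.update_schema_task)
        self.response.write("""
        Schema update started. Check the console for task progress.
        <a href="/">View entities</a>.
        """)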

Finally, you'll need to enable the deferred builtin:

runtime: python27
api_version: 1
threadsafe: true

builtins:
# Deferred is required to use google.appengine.ext.deferred.
- deferred: on

handlers:
- url: /.*
  script: main.app

libraries:
- name: webapp2
  version: "2.5.2"
- name: jinja2
  version: "2.6"

You can also add a URL mapping in app.yaml with "login: admin", to ensure only administrators of your app can perform the schema migration. To do so, change the 'handlers' section of app.yaml to the following:

handlers:
- url: /update_schema
  script: main.app
  login: admin
  secure: always
- url: /.*
  script: main.app

When you're ready to kick off the schema migration, upload the new source to your App Engine application using appcfg and visit the /update_schema handler in your web browser. Click "Add Entities", then "View Entities", then "Update Schema", and finally "View Entities" again, and your schema migration will be complete.

Removing Deleted Properties from the Datastore

If you remove a property from your model, you will find that existing entities still have the property. It will still be shown in the admin console and will still be present in Datastore. To really clean out the old data, you need to cycle through your entities and remove the data from each one.

  1. Make sure you have removed the properties from the model definition.
  2. If your model class inherits from ndb.Model, temporarily switch it to inherit from ndb.Expando. (ndb.Model instances can't be modified dynamically, which is what we need to do in the next step.)
  3. Cycle through existing entities (as described above). For each entity, use delattr to delete the obsolete property and then save the entity.
  4. If your model originally inherited from ndb.Model, don't forget to change it back after updating all the data.
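
Step 3 above can be sketched roughly as follows. This is an illustration under assumptions, not code from the sample application: the function strip_properties and the property name caption are hypothetical, the save function is passed in as a parameter so the sketch runs without App Engine (in a real task you would presumably pass ndb.put_multi), and FakeEntity is a plain object standing in for an ndb.Expando entity.

```python
def strip_properties(entities, obsolete_names, put_multi):
    """Delete obsolete attributes from each entity, then save the changed ones.

    `put_multi` is injected so the sketch can run without App Engine.
    Returns the number of entities that were modified and saved.
    """
    changed = []
    for entity in entities:
        removed = False
        for name in obsolete_names:
            # Skip entities that never had the property.
            if hasattr(entity, name):
                delattr(entity, name)
                removed = True
        if removed:
            changed.append(entity)
    if changed:
        put_multi(changed)
    return len(changed)


class FakeEntity(object):
    """Plain object standing in for an ndb.Expando entity (assumption)."""
    def __init__(self, **props):
        self.__dict__.update(props)


saved = []  # records what would have been written to Datastore
entities = [FakeEntity(name='a', caption='x'), FakeEntity(name='b')]
num_changed = strip_properties(entities, ['caption'], saved.extend)
```

Only the first entity carried the obsolete property, so only that entity is re-saved. For large datasets, the same function could be called once per page inside a batched task like the one shown earlier.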