Data Modeling in Python

Note: Developers building new applications are strongly encouraged to use the NDB Client Library, which has several benefits compared to this client library, such as automatic entity caching via the Memcache API. If you are currently using the older DB Client Library, read the DB to NDB Migration Guide.

Overview

A datastore entity has a key and a set of properties. An application uses the datastore API to define data models, and create instances of those models to be stored as entities. Models provide a common structure to the entities created by the API, and can define rules for validating property values.

Model Classes

The Model Class

An application describes the kinds of data it uses with models. A model is a Python class that inherits from the Model class. The model class defines a new Kind of datastore entity and the properties the Kind is expected to take. The Kind name is the name of the class that inherits from db.Model.

Model properties are defined using class attributes on the model class. Each class attribute is an instance of a subclass of the Property class, usually one of the provided property classes. A property instance holds configuration for the property, such as whether or not the property is required for the instance to be valid, or a default value to use for the instance if none is provided.

from google.appengine.ext import db

class Pet(db.Model):
    name = db.StringProperty(required=True)
    type = db.StringProperty(required=True, choices=set(["cat", "dog", "bird"]))
    birthdate = db.DateProperty()
    weight_in_pounds = db.IntegerProperty()
    spayed_or_neutered = db.BooleanProperty()

An entity of one of the defined entity kinds is represented in the API by an instance of the corresponding model class. The application can create a new entity by calling the constructor of the class. The application accesses and manipulates properties of the entity using attributes of the instance. The model instance constructor accepts initial values for properties as keyword arguments.

# Required properties are passed to the constructor as keyword arguments.
pet = Pet(name="Fluffy",
          type="cat")
# Optional properties can be assigned after construction.
pet.weight_in_pounds = 24

Note: The attributes of the model class are configuration for the model properties, whose values are Property instances. The attributes of the model instance are the actual property values, whose values are of the type accepted by the Property class.

The Model class uses the Property instances to validate values assigned to the model instance attributes. Property value validation occurs when a model instance is first constructed, and when an instance attribute is assigned a new value. This ensures that a property can never have an invalid value.

Because validation occurs when the instance is constructed, any property that is configured to be required must be initialized in the constructor. In this example, name and type are required values, so their initial values are specified in the constructor. weight_in_pounds is not required by the model, so it starts out unassigned, then is assigned a value later.
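The validation behavior described above can be pictured with a plain-Python descriptor. This is only a sketch of the idea, not the db library's implementation: StringProp, PetSketch, and the local BadValueError are hypothetical stand-ins, and the real Property classes do considerably more.

```python
class BadValueError(Exception):
    """Local stand-in for a validation error (hypothetical, not db.BadValueError)."""


class StringProp(object):
    """Hypothetical property descriptor: holds configuration, validates values."""

    def __init__(self, name, required=False, choices=None):
        self.name = name
        self.required = required
        self.choices = choices

    def validate(self, value):
        if value is None:
            if self.required:
                raise BadValueError("Property %s is required" % self.name)
            return None
        if not isinstance(value, str):
            raise BadValueError("Property %s must be a str" % self.name)
        if self.choices is not None and value not in self.choices:
            raise BadValueError("Property %s must be one of %r"
                                % (self.name, sorted(self.choices)))
        return value

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class access returns the configuration object
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        # Runs on every attribute assignment, including in the constructor.
        instance.__dict__[self.name] = self.validate(value)


class PetSketch(object):
    name = StringProp("name", required=True)
    type = StringProp("type", required=True, choices=("cat", "dog", "bird"))
    nickname = StringProp("nickname")

    def __init__(self, **kwargs):
        # Assign every declared property so that required ones are checked now.
        for attr in ("name", "type", "nickname"):
            setattr(self, attr, kwargs.get(attr))


pet = PetSketch(name="Fluffy", type="cat")
pet.nickname = "Fluff"   # validated on assignment
```

In this sketch, PetSketch.name (on the class) is the configuration object, while pet.name (on the instance) is the stored value, mirroring the distinction drawn in the note above.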

An instance of a model created using the constructor does not exist in the datastore until it is "put" for the first time.

Note: As with all Python class attributes, model property configuration is initialized when the script or module is first imported. Because App Engine caches imported modules between requests, module configuration may be initialized during a request for one user, and re-used during a request for another. Do not initialize model property configuration, such as default values, with data specific to the request or the current user. See App Caching for more information.
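The pitfall in this note is ordinary Python behavior: a class body runs once, when the module is first imported. The sketch below uses plain classes and a hypothetical current_user() helper (nothing here touches the datastore) to contrast a default captured at import time with one computed for each request.

```python
_requests = ["alice", "bob"]   # hypothetical sequence of requesting users
_state = {"index": 0}          # which request is currently being served


def current_user():
    """Hypothetical helper: returns the user for the request being served."""
    return _requests[_state["index"]]


class BadConfig(object):
    # Evaluated once, when the class body runs at import time.
    default_owner = current_user()


class GoodConfig(object):
    @staticmethod
    def default_owner():
        # Evaluated on each call, so it reflects the current request.
        return current_user()


_state["index"] = 1                   # a later request arrives, made by "bob"
stale = BadConfig.default_owner       # still the value captured at import time
fresh = GoodConfig.default_owner()    # computed for the current request
```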

The Expando Class

A model defined using the Model class establishes a fixed set of properties that every instance of the class must have (perhaps with default values). This is a useful way to model data objects, but the datastore does not require that every entity of a given kind have the same set of properties.

Sometimes it is useful for an entity to have properties that aren't necessarily like the properties of other entities of the same kind. Such an entity is represented in the datastore API by an "expando" model. An expando model class subclasses the Expando superclass. Any value assigned to an attribute of an instance of an expando model becomes a property of the datastore entity, using the name of the attribute. These properties are known as dynamic properties. Properties defined using Property class instances in class attributes are fixed properties.

An expando model can have both fixed and dynamic properties. The model class simply sets class attributes with Property configuration objects for the fixed properties. The application creates dynamic properties when it assigns them values.

class Person(db.Expando):
    first_name = db.StringProperty()
    last_name = db.StringProperty()
    hobbies = db.StringListProperty()

p = Person(first_name="Albert", last_name="Johnson")
p.hobbies = ["chess", "travel"]

p.chess_elo_rating = 1350

p.travel_countries_visited = ["Spain", "Italy", "USA", "Brazil"]
p.travel_trip_count = 13

Because dynamic properties do not have model property definitions, dynamic properties are not validated. Any dynamic property can have a value of any of the datastore base types, including None. Two entities of the same kind can have different types of values for the same dynamic property, and one can leave a property unset that the other sets.

Unlike fixed properties, dynamic properties need not exist. A dynamic property with a value of None is different from a non-existent dynamic property. If an expando model instance does not have an attribute for a property, the corresponding data entity does not have that property. You can delete a dynamic property by deleting the attribute.

Attributes whose names begin with an underscore (_) are not saved to the datastore entity. This allows you to store values on the model instance for temporary internal use without affecting the data saved with the entity.

Note: Static properties are always saved to the datastore entity, whether the model class is an Expando or a Model, and even if the property name begins with an underscore (_).

del p.chess_elo_rating
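The rules for dynamic properties can be modeled with ordinary instance attributes. The following is a plain-Python sketch under that assumption; ExpandoSketch is a hypothetical stand-in and does not talk to the datastore.

```python
class ExpandoSketch(object):
    """Hypothetical stand-in: every public attribute is a dynamic property."""

    def dynamic_properties(self):
        # Attributes starting with an underscore are kept off the entity.
        return sorted(k for k in vars(self) if not k.startswith("_"))


p = ExpandoSketch()
p.chess_elo_rating = 1350
p.nickname = None          # exists, with the value None
p._cached_score = 0.75     # temporary, never saved

before = p.dynamic_properties()
del p.chess_elo_rating     # the property no longer exists at all
after = p.dynamic_properties()
```

Here nickname remains a property even though its value is None, while chess_elo_rating disappears entirely once the attribute is deleted.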

A query that uses a dynamic property in a filter returns only entities whose value for the property is of the same type as the value used in the query. Similarly, the query returns only entities with that property set.

p1 = Person()
p1.favorite = 42
p1.put()

p2 = Person()
p2.favorite = "blue"
p2.put()

p3 = Person()
p3.put()

people = db.GqlQuery("SELECT * FROM Person WHERE favorite < :1", 50)
# people has p1, but not p2 or p3

people = db.GqlQuery("SELECT * FROM Person WHERE favorite > :1", 50)
# people has no results

Note: The example above uses queries across entity groups, which may return stale results. For strongly consistent results, use ancestor queries within entity groups.
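The outcome of the two queries above can be reproduced with a small in-memory model. This is a simplified sketch: the real datastore evaluates filters against indexes, and run_filter and the dictionaries below are hypothetical stand-ins for entities.

```python
import operator


def run_filter(entities, prop, op, bound):
    """Keep entities that set `prop` to a value of bound's type satisfying op."""
    matches = []
    for entity in entities:
        if prop not in entity:
            continue  # the property is not set on this entity
        value = entity[prop]
        if type(value) is not type(bound):
            continue  # values of a different type never match
        if op(value, bound):
            matches.append(entity)
    return matches


p1 = {"id": "p1", "favorite": 42}
p2 = {"id": "p2", "favorite": "blue"}
p3 = {"id": "p3"}
people = [p1, p2, p3]

below = run_filter(people, "favorite", operator.lt, 50)   # only p1
above = run_filter(people, "favorite", operator.gt, 50)   # no results
```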

The Expando class is a subclass of the Model class, and inherits all of its methods.

The PolyModel Class

The Python API includes another class for data modeling that allows you to define hierarchies of classes, and perform queries that can return entities of a given class or any of its subclasses. Such models and queries are called "polymorphic," because they allow instances of one class to be results for a query of a parent class.

The following example defines a Contact class, with the subclasses Person and Company:

from google.appengine.ext import db
from google.appengine.ext.db import polymodel

class Contact(polymodel.PolyModel):
    phone_number = db.PhoneNumberProperty()
    address = db.PostalAddressProperty()

class Person(Contact):
    first_name = db.StringProperty()
    last_name = db.StringProperty()
    mobile_number = db.PhoneNumberProperty()

class Company(Contact):
    name = db.StringProperty()
    fax_number = db.PhoneNumberProperty()

This model ensures that all Person entities and all Company entities have phone_number and address properties, and queries for Contact entities can return either Person or Company entities. Only Person entities have mobile_number properties.

The subclasses can be instantiated just like any other model class:

p = Person(phone_number='1-206-555-9234',
           address='123 First Ave., Seattle, WA, 98101',
           first_name='Alfred',
           last_name='Smith',
           mobile_number='1-206-555-0117')
p.put()

c = Company(phone_number='1-503-555-9123',
            address='P.O. Box 98765, Salem, OR, 97301',
            name='Data Solutions, LLC',
            fax_number='1-503-555-6622')
c.put()

A query for Contact entities can return instances of either Contact, Person, or Company. The following code prints information for both entities created above:

for contact in Contact.all():
    print 'Phone: %s\nAddress: %s\n\n' % (contact.phone_number,
                                          contact.address)

A query for Company entities returns only instances of Company:

for company in Company.all():
    # ...
    pass

For now, polymorphic models should not be passed to the Query class constructor directly. Instead, use the all() method, as in the example above.

For more information on how to use polymorphic models, and how they are implemented, see The PolyModel Class.
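One way to picture a polymorphic query is to have each stored entity record its class hierarchy, and to match any entity whose recorded hierarchy includes the queried class. The sketch below is plain Python written under that assumption; the class names, class_path(), and query() are hypothetical, and the sketch makes no claim about the library's internals.

```python
class ContactSketch(object):
    @classmethod
    def class_path(cls):
        """Class names from the root of the hierarchy down to cls."""
        return [c.__name__ for c in reversed(cls.__mro__)
                if issubclass(c, ContactSketch)]


class PersonSketch(ContactSketch):
    pass


class CompanySketch(ContactSketch):
    pass


def query(cls, stored):
    """Return stored entities whose class path includes cls."""
    return [e for e in stored if cls.__name__ in e["class"]]


stored = [
    {"class": PersonSketch.class_path(), "phone_number": "1-206-555-9234"},
    {"class": CompanySketch.class_path(), "phone_number": "1-503-555-9123"},
]

contacts = query(ContactSketch, stored)    # both entities
companies = query(CompanySketch, stored)   # only the company
```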

Property Classes and Types

The datastore supports a fixed set of value types for entity properties, including Unicode strings, integers, floating point numbers, dates, entity keys, byte strings (blobs), and various GData types. Each of the datastore value types has a corresponding Property class provided by the google.appengine.ext.db module.

Types and Property Classes describes all of the supported value types and their corresponding Property classes. Several special value types are described below.

Strings and Blobs

The datastore supports two value types for storing text: short text strings up to 1500 bytes in length, and long text strings up to one megabyte in length. Short strings are indexed and can be used in query filter conditions and sort orders. Long strings are not indexed and cannot be used in filter conditions or sort orders.

A short string value can be either a unicode value or a str value. If the value is a str, an encoding of 'ascii' is assumed. To specify a different encoding for a str value, you can convert it to a unicode value with the unicode() type constructor, which takes the str and the name of the encoding as arguments. Short strings can be modeled using the StringProperty class.

class MyModel(db.Model):
    string = db.StringProperty()

obj = MyModel()

# Python Unicode literal syntax fully describes characters in a text string.
obj.string = u"kittens"

# unicode() converts a byte string to a Unicode string using the named codec.
obj.string = unicode("kittens", "latin-1")

# A byte string is assumed to be text encoded as ASCII (the 'ascii' codec).
obj.string = "kittens"

# Short string properties can be used in query filters.
results = db.GqlQuery("SELECT * FROM MyModel WHERE string = :1", u"kittens")
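The reason to name an encoding explicitly is that a byte string containing non-ASCII bytes cannot be interpreted as ASCII. The short sketch below uses only the built-in decode() method (available on Python 2 str and Python 3 bytes) and does not involve the datastore; the sample bytes are an illustrative assumption.

```python
raw = b"caf\xe9"               # "cafe" with an acute accent, encoded as Latin-1

text = raw.decode("latin-1")   # explicit codec: succeeds

try:
    raw.decode("ascii")        # byte 0xE9 is outside the ASCII range
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```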

A long string value is represented by a db.Text instance. Its constructor takes either a unicode value, or a str value and optionally the name of the encoding used in the str. Long strings can be modeled using the TextProperty class.

class MyModel(db.Model):
    text = db.TextProperty()

obj = MyModel()

# Text() can take a Unicode string.
obj.text = u"lots of kittens"

# Text() can take a byte string and the name of an encoding.
obj.text = db.Text("lots of kittens", "latin-1")

# If no encoding is specified, a byte string is assumed to be ASCII text.
obj.text = "lots of kittens"

# Text properties can store large values.
obj.text = db.Text(open("a_tale_of_two_cities.txt").read(), "utf-8")

The datastore also supports two similar types for non-text byte strings: db.ByteString and db.Blob. These values are strings of raw bytes, and are not treated as encoded text (such as UTF-8).

Like db.StringProperty values, db.ByteString values are indexed and are limited to 1500 bytes. A ByteString instance represents a short string of bytes, and takes a str value as an argument to its constructor. Byte strings are modeled using the ByteStringProperty class.

Like db.Text, a db.Blob value can be as large as one megabyte, but is not indexed, and cannot be used in query filters or sort orders. The db.Blob class takes a str value as an argument to its constructor, or you can assign the value directly. Blobs are modeled using the BlobProperty class.

class MyModel(db.Model):
    blob = db.BlobProperty()

obj = MyModel()

# Open the file in binary mode so the bytes are read unmodified.
obj.blob = open("image.png", "rb").read()

Lists

A property can have multiple values, represented in the datastore API as a Python list. The list can contain values of any of the value types supported by the datastore. A single list property may even have values of different types.

Order is generally preserved, so when entities are returned by queries and get(), the list property's values are in the same order as when they were stored. There's one exception to this: Blob and Text values are moved to the end of the list; however, they retain their original order relative to each other.
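The ordering rule amounts to a stable partition of the list. The following plain-Python sketch illustrates it; TextSketch and BlobSketch are hypothetical local stand-ins for db.Text and db.Blob, and stored_order() is not a library function.

```python
class TextSketch(str):
    """Local stand-in for db.Text (hypothetical)."""


class BlobSketch(bytes):
    """Local stand-in for db.Blob (hypothetical)."""


def stored_order(values):
    """Move Text and Blob values to the end, keeping relative order."""
    unindexed_types = (TextSketch, BlobSketch)
    front = [v for v in values if not isinstance(v, unindexed_types)]
    back = [v for v in values if isinstance(v, unindexed_types)]
    return front + back


values = [TextSketch("long a"), 1, BlobSketch(b"raw"), "short",
          TextSketch("long b"), 2]
result = stored_order(values)
# The other values keep their order; Text and Blob values follow, in order.
```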

The ListProperty class models a list, and enforces that all values in the list are of a given type. For convenience, the library also provides StringListProperty, similar to ListProperty(basestring).

class MyModel(db.Model):
    numbers = db.ListProperty(long)

obj = MyModel()
obj.numbers = [2, 4, 6, 8, 10]

obj.numbers = ["hello"]  # ERROR: MyModel.numbers must be a list of longs.

A query with filters on a list property tests each value in the list individually. The entity will match the query only if some value in the list passes all of the filters on that property. See the Datastore Queries page for more information.

# Get all entities where numbers contains a 6.
results = db.GqlQuery("SELECT * FROM MyModel WHERE numbers = 6")

# Get all entities where numbers contains at least one element less than 10.
results = db.GqlQuery("SELECT * FROM MyModel WHERE numbers < 10")

Query filters only operate on list members. There is no way to test two lists for similarity in a query filter.
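The matching rule stated above (some single value in the list must pass all of the filters on the property) can be expressed in a few lines of plain Python. This is an illustrative sketch only; matches() is a hypothetical helper, not part of the datastore API.

```python
def matches(values, *predicates):
    """True if at least one list member passes every predicate."""
    return any(all(pred(v) for pred in predicates) for v in values)


numbers = [2, 4, 6, 8, 10]

contains_six = matches(numbers, lambda v: v == 6)
has_small = matches(numbers, lambda v: v < 10)

# No single member of [1, 20] is both greater than 5 and less than 15,
# so the combined filters do not match under this rule.
between = matches([1, 20], lambda v: v > 5, lambda v: v < 15)
```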

Internally, the datastore represents a list property value as multiple values for the property. If a list property value is the empty list, then the property has no representation in the datastore. The datastore API treats this situation differently for static properties (with ListProperty) and dynamic properties:

  • A static ListProperty can be assigned the empty list as a value. The property does not exist in the datastore, but the model instance behaves as if the value is the empty list. A static ListProperty cannot have a value of None.
  • A dynamic property with a list value cannot be assigned an empty list value. However, it can have a value of None, and can be deleted (using del).

The ListProperty model tests that a value added to the list is of the correct type, and raises a BadValueError if it isn't. This test occurs (and potentially fails) even when a previously stored entity is retrieved and loaded into the model. Because str values are converted to unicode values (as ASCII text) prior to storage, ListProperty(str) is treated as ListProperty(basestring), the Python data type which accepts both str and unicode values. You can also use StringListProperty() for this purpose.

For storing non-text byte strings, use db.Blob values. The bytes of a blob string are preserved when they are stored and retrieved. You can declare a property that is a list of blobs as ListProperty(db.Blob).

List properties can interact with sort orders in unusual ways; see the Datastore Queries page for details.

References

A property value can contain the key of another entity. The value is a Key instance.

The ReferenceProperty class models a key value, and enforces that all values refer to entities of a given kind. For convenience, the library also provides SelfReferenceProperty, equivalent to a ReferenceProperty that refers to the same kind as the entity with the property.

Assigning a model instance to a ReferenceProperty property automatically uses its key as the value.

class FirstModel(db.Model):
    prop = db.IntegerProperty()

class SecondModel(db.Model):
    reference = db.ReferenceProperty(FirstModel)

obj1 = FirstModel()
obj1.prop = 42
obj1.put()

obj2 = SecondModel()

# A reference value is the key of another entity.
obj2.reference = obj1.key()

# Assigning a model instance to a property uses the entity's key as the value.
obj2.reference = obj1
obj2.put()

A ReferenceProperty property value can be used as if it were the model instance of the referenced entity. If the referenced entity is not in memory, using the property as an instance automatically fetches the entity from the datastore. The property itself stores only the key; it is the access that causes the related entity to be loaded.

obj2.reference.prop = 999
obj2.reference.put()

results = db.GqlQuery("SELECT * FROM SecondModel")
another_obj = results.fetch(1)[0]
v = another_obj.reference.prop

If a key points to a non-existent entity, then accessing the property raises an error. If an application expects that a reference could be invalid, it can test for the existence of the object using a try/except block:

try:
    obj1 = obj2.reference
except db.ReferencePropertyResolveError:
    # Referenced entity was deleted or never existed.
    obj1 = None
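The store-a-key, load-on-access behavior can be sketched with a descriptor over an in-memory dictionary. Everything below is a hypothetical stand-in (DATASTORE, FETCHES, ReferenceSketch, ResolveError); the sketch performs no caching and is not how the library itself is written.

```python
class ResolveError(Exception):
    """Local stand-in for db.ReferencePropertyResolveError (hypothetical)."""


DATASTORE = {}   # hypothetical in-memory store: key -> entity dict
FETCHES = []     # records each lookup, to show when loading happens


class ReferenceSketch(object):
    """Hypothetical descriptor: stores a key, loads the entity on access."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        key = instance.__dict__.get(self.name)
        if key is None:
            return None
        FETCHES.append(key)
        if key not in DATASTORE:
            raise ResolveError("No entity for key %r" % (key,))
        return DATASTORE[key]

    def __set__(self, instance, value):
        # Accept either a key or an entity dict that carries its own key.
        if isinstance(value, dict):
            value = value["key"]
        instance.__dict__[self.name] = value


class SecondSketch(object):
    reference = ReferenceSketch("reference")


first = {"key": "first-1", "prop": 42}
DATASTORE[first["key"]] = first

obj2 = SecondSketch()
obj2.reference = first                    # only the key is kept
loaded_prop = obj2.reference["prop"]      # access triggers the lookup

del DATASTORE["first-1"]                  # the referenced entity is deleted
try:
    obj2.reference
    resolved = True
except ResolveError:
    resolved = False
```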

ReferenceProperty has another handy feature: back-references. When a model has a ReferenceProperty to another model, each referenced entity gets a property whose value is a Query that returns all of the entities of the first model that refer to it.

# To fetch and iterate over every SecondModel entity that refers to the
# FirstModel instance obj1:
for obj in obj1.secondmodel_set:
    # ...
    pass

The name of the back-reference property defaults to modelname_set (with the name of the model class in lowercase letters, and "_set" added to the end), and can be adjusted using the collection_name argument to the ReferenceProperty constructor.

If you have multiple ReferenceProperty values that refer to the same model class, the default construction of the back-reference property raises an error:

class FirstModel(db.Model):
    prop = db.IntegerProperty()

# This class raises a DuplicatePropertyError with the message
# "Class FirstModel already has property secondmodel_set"
class SecondModel(db.Model):
    reference_one = db.ReferenceProperty(FirstModel)
    reference_two = db.ReferenceProperty(FirstModel)

To avoid this error, you must explicitly set the collection_name argument:

class FirstModel(db.Model):
    prop = db.IntegerProperty()

# This class runs fine
class SecondModel(db.Model):
    reference_one = db.ReferenceProperty(FirstModel,
        collection_name="secondmodel_reference_one_set")
    reference_two = db.ReferenceProperty(FirstModel,
        collection_name="secondmodel_reference_two_set")
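The naming rule and the resulting collision can be illustrated without the datastore. In this sketch, add_back_reference() and the local DuplicatePropertyError are hypothetical stand-ins, and a plain dictionary stands in for the set of properties on the referenced model class.

```python
class DuplicatePropertyError(Exception):
    """Local stand-in for db.DuplicatePropertyError (hypothetical)."""


def add_back_reference(target, source_name, collection_name=None):
    """Record a back-reference name on `target`, a dict of existing names."""
    if collection_name is None:
        # Default: model class name in lowercase, with "_set" appended.
        collection_name = source_name.lower() + "_set"
    if collection_name in target:
        raise DuplicatePropertyError(
            "Class already has property %s" % collection_name)
    target[collection_name] = source_name
    return collection_name


# Two references with default names collide on "secondmodel_set".
first_model = {}
add_back_reference(first_model, "SecondModel")
try:
    add_back_reference(first_model, "SecondModel")
    collided = False
except DuplicatePropertyError:
    collided = True

# Explicit collection names keep the two back-references distinct.
explicit = {}
add_back_reference(explicit, "SecondModel", "secondmodel_reference_one_set")
add_back_reference(explicit, "SecondModel", "secondmodel_reference_two_set")
```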

Automatic referencing and dereferencing of model instances, type checking and back-references are only available using the ReferenceProperty model property class. Keys stored as values of Expando dynamic properties or ListProperty values do not have these features.