This document describes the data model for objects stored in Cloud Datastore, how queries are structured using the API, and how transactions are processed.
Objects in Cloud Datastore are known as entities. An entity has one or more named properties, each of which can have one or more values. Property values can belong to a variety of data types, including integers, floating-point numbers, strings, dates, and binary data, among others. A query on a property with multiple values tests whether any of the values meets the query criteria. This makes such properties useful for membership testing.
Kinds, keys, and identifiers
Each Cloud Datastore entity is of a particular kind, which categorizes the entity for the purpose of queries; for instance, a human resources application might represent each employee at a company with an entity of kind
Employee. In addition, each entity has its own key, which uniquely identifies it. The key consists of the following components:
- The entity's kind
- An identifier, which can be either
- a key name string
- an integer ID
- An optional ancestor path locating the entity within the Cloud Datastore hierarchy
The identifier is assigned when the entity is created. Because it is part of the entity's key, it is associated permanently with the entity and cannot be changed. It can be assigned in either of two ways:
- Your application can specify its own key name string for the entity.
- You can have Cloud Datastore automatically assign the entity an integer numeric ID.
Entities in Cloud Datastore form a hierarchically structured space similar to the directory structure of a file system. When you create an entity, you can optionally designate another entity as its parent; the new entity is a child of the parent entity (note that unlike in a file system, the parent entity need not actually exist). An entity without a parent is a root entity. The association between an entity and its parent is permanent, and cannot be changed once the entity is created. Cloud Datastore will never assign the same numeric ID to two entities with the same parent, or to two root entities (those without a parent).
An entity's parent, parent's parent, and so on recursively, are its ancestors; its children, children's children, and so on, are its descendants. A root entity and all of its descendants belong to the same entity group. The sequence of entities beginning with a root entity and proceeding from parent to child, leading to a given entity, constitute that entity's ancestor path. The complete key identifying the entity consists of a sequence of kind-identifier pairs specifying its ancestor path and terminating with those of the entity itself:
[Person:GreatGrandpa, Person:Grandpa, Person:Dad, Person:Me]
For a root entity, the ancestor path is empty and the key consists solely of the entity's own kind and identifier:
This concept is illustrated by the following diagram:
Queries and indexes
In addition to retrieving entities from Cloud Datastore directly by their keys, an application can perform a query to retrieve them by the values of their properties. The query operates on entities of a given kind; it can specify filters on the entities' property values, keys, and ancestors, and can return zero or more entities as results. A query can also specify sort orders to sequence the results by their property values. The results include all entities that have at least one (possibly null) value for every property named in the filters and sort orders, and whose property values meet all the specified filter criteria. The query can return entire entities, projected entities, or just entity keys.
A typical query includes the following:
- An entity kind to which the query applies
- Zero or more filters based on the entities' property values, keys, and ancestors
- Zero or more sort orders to sequence the results
Note: To conserve memory and improve performance, a query should, whenever possible, specify a limit on the number of results returned.
A query can also include an ancestor filter limiting the results to just the entity group descended from a specified ancestor. Such a query is known as an ancestor query. By default, ancestor queries return strongly consistent results, which are guaranteed to be up to date with the latest changes to the data. Non-ancestor queries, by contrast, can span the entire Cloud Datastore rather than just a single entity group, but are only eventually consistent and may return stale results. If strong consistency is important to your application, you may need to take this into account when structuring your data, placing related entities in the same entity group so they can be retrieved with an ancestor rather than a non-ancestor query; see Structuring Data for Strong Consistency for more information.
App Engine predefines a simple index on each property of an entity.
An App Engine application can define further custom indexes in an
configuration file named
which is generated in your application's
. The development server automatically
adds suggestions to this file as it encounters queries that cannot be executed with the existing indexes.
You can tune indexes manually by editing the file before uploading the application.
Note: The index-based query mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of query common in other database technologies: in particular, joins and aggregate queries aren't supported within the Cloud Datastore query engine. See Datastore Queries page for limitations on Cloud Datastore queries.
Every attempt to insert, update, or delete an entity takes place in the context of a transaction. A single transaction can include any number of such operations. To maintain the consistency of the data, the transaction ensures that all of the operations it contains are applied to Cloud Datastore as a unit or, if any of the operations fails, that none of them are applied.
You can perform multiple actions on an entity within a single transaction. For example, to increment a counter field in an object, you need to read the value of the counter, calculate the new value, and then store it back. Without a transaction, it is possible for another process to increment the counter between the time you read the value and the time you update it, causing your application to overwrite the updated value. Doing the read, calculation, and write in a single transaction ensures that no other process can interfere with the increment.
Transactions and entity groups
Only ancestor queries are allowed within a transaction: that is, each transactional query must be limited to a single entity group. The transaction itself can apply to multiple entities, which can belong either to a single entity group or (in the case of a cross-group transaction) to as many as twenty-five different entity groups.
Cloud Datastore uses optimistic concurrency to manage transactions. When two or more transactions try to change the same entity group at the same time (either updating existing entities or creating new ones), the first transaction to commit will succeed and all others will fail on commit. These other transactions can then be retried on the updated data. Note that this limits the number of concurrent writes you can do to any entity in a given entity group.
A transaction on entities belonging to different entity groups is called a cross-group (XG) transaction. The transaction can be applied across a maximum of twenty-five entity groups, and will succeed as long as no concurrent transaction touches any of the entity groups to which it applies. This gives you more flexibility in organizing your data, because you aren't forced to put disparate pieces of data under the same ancestor just to perform atomic writes on them.
As in a single-group transaction, you cannot perform a non-ancestor query in an XG transaction. You can, however, perform ancestor queries on separate entity groups. Nontransactional (non-ancestor) queries may see all, some, or none of the results of a previously committed transaction. (For background on this issue, see Cloud Datastore Writes and Data Visibility.) However, such nontransactional queries are more likely to see the results of a partially committed XG transaction than those of a partially committed single-group transaction.
An XG transaction that touches only a single entity group has exactly the same performance and cost as a single-group, non-XG transaction. In an XG transaction that touches multiple entity groups, operations cost the same as if they were performed in a non-XG transaction, but may experience higher latency.
Cloud Datastore writes and data visibility
Data is written to Cloud Datastore in two phases:
- In the Commit phase, the entity data is recorded in the transaction logs of a majority of replicas, and any replicas in which it was not recorded are marked as not having up-to-date logs.
- The Apply phase occurs independently in each replica, and consists of two actions performed in parallel:
- The entity data is written in that replica.
- The index rows for the entity are written in that replica. (Note that this can take longer than writing the data itself.)
The write operation returns immediately after the Commit phase and the Apply phase then takes place asynchronously, possibly at different times in each replica, and possibly with delays of a few hundred milliseconds or more from the completion of the Commit phase. If a failure occurs during the Commit phase, there are automatic retries; but if failures continue, Cloud Datastore returns an error message that your application receives as an exception. If the Commit phase succeeds but the Apply fails in a particular replica, the Apply is rolled forward to completion in that replica when one of the following occurs:
- Periodic Cloud Datastore sweeps check for uncompleted Commit jobs and apply them.
- Certain operations (
delete, and ancestor queries) that use the affected entity group cause any changes that have been committed but not yet applied to be completed in the replica in which they are executing before proceeding with the new operation.
This write behavior can have several implications on how and when data is visible to your application at different parts of the Commit and Apply phases:
- If a write operation reports a timeout error, it cannot be determined (without attempting to read the data) whether the operation succeeded or failed.
- Because Cloud Datastore gets and ancestor queries apply any outstanding modifications to the replica on which they are executing, these operations always see a consistent view of all previous successful transactions. This means that a
getoperation (looking up an updated entity by its key) is guaranteed to see the latest version of that entity.
- Non-ancestor queries may return stale results because they may be executing on a replica on which the latest transactions have not yet been applied. This can occur even if an operation was performed that is guaranteed to apply outstanding transactions because the query may execute on a different replica than the previous operation.
- The timing of concurrent changes may affect the results of non-ancestor queries. If an entity initially satisfies a query but is later changed so that it no longer does, the entity may still be included in the query's result set if the changes had not yet been applied to the indexes in the replica on which the query was executed.=
Cloud Datastore statistics
Cloud Datastore maintains statistics about the data stored for an application, such as how many entities there are of a given kind or how much space is used by property values of a given type. You can view these statistics in the GCP Console Cloud Datastore dashboard page. You can also use Cloud Datastore API to access these values programmatically from within the application by querying for specially named entities; see Datastore Statistics in Java for more information.