Note: Developers building new applications are strongly encouraged to use the NDB Client Library, which has several benefits compared to this client library, such as automatic entity caching via the Memcache API. If you are currently using the older DB Client Library, read the DB to NDB Migration Guide
App Engine predefines a simple index on each property of an entity.
An App Engine application can define further custom indexes in an
index
configuration file named
index.yaml
. The development server automatically
adds suggestions to this file as it encounters queries that cannot be executed with the existing indexes.
You can tune indexes manually by editing the file before uploading the application.
Note: The index-based query mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of query common in other database technologies: in particular, joins and aggregate queries aren't supported within the Datastore query engine. See Datastore Queries page for limitations on Datastore queries.
Index definition and structure
An index is defined on a list of properties of a given entity kind, with a corresponding order (ascending or descending) for each property. For use with ancestor queries, the index may also optionally include an entity's ancestors.
An index table contains a column for every property named in the index's definition. Each row of the table represents an entity in Datastore that is a potential result for queries based on the index. An entity is included in the index only if it has an indexed value set for every property used in the index; if the index definition refers to a property for which the entity has no value, that entity will not appear in the index and hence will never be returned as a result for any query based on the index.
Note: Datastore distinguishes
between an entity that does not possess a property and one that possesses the
property with a null value (None
). If you explicitly assign a
null value to an entity's property, that entity may be included in the results
of a query referring to that property.
Note: Indexes composed of multiple properties require that each individual property must not be set to unindexed.
The rows of an index table are sorted first by ancestor and then by property values, in the order specified in the index definition. The perfect index for a query, which allows the query to be executed most efficiently, is defined on the following properties, in order:
- Properties used in equality filters
- Property used in an inequality filter (of which there can be no more than one)
- Properties used in sort orders
This ensures that all results for every possible execution of the query appear in consecutive rows of the table. Datastore executes a query using a perfect index by the following steps:
- Identifies the index corresponding to the query's kind, filter properties, filter operators, and sort orders.
- Scans from the beginning of the index to the first entity that meets all of the query's filter conditions.
- Continues scanning the index, returning each entity in turn, until it
- encounters an entity that does not meet the filter conditions, or
- reaches the end of the index, or
- has collected the maximum number of results requested by the query.
For example, consider the following query (stated in GQL):
SELECT * FROM Person WHERE last_name = "Smith"
AND height < 72
ORDER BY height DESC
The perfect index for this query is a table of keys for entities of kind
Person
, with columns for the values of the
last_name
and
height
properties. The index
is sorted first in ascending order by
last_name
and then in descending order by
height
.
To generate these indexes, configure your indexes like this:
indexes:
- kind: Person
properties:
- name: last_name
direction: asc
- name: height
direction: desc
Two queries of the same form but with different filter values use the same index. For example, the following query uses the same index as the one above:
SELECT * FROM Person WHERE last_name = "Jones"
AND height < 63
ORDER BY height DESC
The following two queries also use the same index, despite their different forms:
SELECT * FROM Person WHERE last_name = "Friedkin"
AND first_name = "Damian"
ORDER BY height ASC
and
SELECT * FROM Person WHERE last_name = "Blair"
ORDER BY first_name, height ASC
Index configuration
By default, Datastore automatically predefines an index for each
property of each entity kind. These predefined indexes are sufficient to perform
many simple queries, such as equality-only queries and simple inequality
queries. For all other queries, the application must define the indexes it needs
in an index configuration
file
named index.yaml
. If the application tries to perform a query
that cannot be executed with the available indexes (either predefined or
specified in the index configuration file), the query will fail
with a NeedIndexError
exception.
Datastore builds automatic indexes for queries of the following forms:
- Kindless queries using only ancestor and key filters
- Queries using only ancestor and equality filters
- Queries using only inequality filters (which are limited to a single property)
- Queries using only ancestor filters, equality filters on properties, and inequality filters on keys
- Queries with no filters and only one sort order on a property, either ascending or descending
Other forms of query require their indexes to be specified in the index configuration file, including:
- Queries with ancestor and inequality filters
- Queries with one or more inequality filters on a property and one or more equality filters on other properties
- Queries with a sort order on keys in descending order
- Queries with multiple sort orders
Indexes and properties
Here are a few special considerations to keep in mind about indexes and how they relate to the properties of entities in Datastore:
Properties with mixed value types
When two entities have properties of the same name but different value types, an
index of the property sorts the entities first by
value type and then by a
secondary ordering
appropriate to each type. For example, if two entities each have a property
named age
, one with an integer value and one with a string value, the entity
with the integer value always precedes the one with the string value when sorted
by the age
property, regardless of the property values themselves.
This is especially worth noting in the case of integers and floating-point
numbers, which are treated as separate types by Datastore.
Because all integers are sorted before all floats, a property with the integer
value 38
is sorted before one with the floating-point value 37.5
.
Unindexed properties
If you know you will never have to filter or sort on a particular property, you can tell Datastore not to maintain index entries for that property by declaring the property unindexed. This lowers the cost of running your application by decreasing the number of Datastore writes it has to perform. An entity with an unindexed property behaves as if the property were not set: queries with a filter or sort order on the unindexed property will never match that entity.
Note:
If a property appears in an index composed of multiple properties, then setting
it to unindexed will prevent it from being indexed in the composed index.
For example, suppose that an entity has properties a and b and
that you want to create an index able to satisfy queries like
WHERE a ="bike" and b="red"
. Also suppose that you don't care
about the queries WHERE a="bike"
and WHERE b="red"
.
If you set a to unindexed and create an index for a and b
Datastore will not create index entries for the a and
b index and so the WHERE a="bike" and b="red"
query won't
work. For Datastore to create entries for the a and
b indexes, both a and b must be indexed.
You declare a property unindexed by setting indexed=False
in the
property constructor:
class Person(db.Model):
name = db.StringProperty()
age = db.IntegerProperty(indexed=False)
You can later change the property back to indexed by calling the constructor
again with indexed=True
:
class Person(db.Model):
name = db.StringProperty()
age = db.IntegerProperty(indexed=True)
Note, however, that changing a property from unindexed to indexed does not affect any existing entities that may have been created before the change. Queries filtering on the property will not return such existing entities, because the entities weren't written to the query's index when they were created. To make the entities accessible by future queries, you must rewrite them to Datastore so that they will be entered in the appropriate indexes. That is, you must do the following for each such existing entity:
- Retrieve (get) the entity from Datastore.
- Write (put) the entity back to Datastore.
Similarly, changing a property from indexed to unindexed only affects entities subsequently written to Datastore. The index entries for any existing entities with that property will continue to exist until the entities are updated or deleted. To avoid unwanted results, you must purge your code of all queries that filter or sort by the (now unindexed) property.
Index limits
Datastore imposes limits on the number and overall size of index entries that can be associated with a single entity. These limits are large, and most applications are not affected. However, there are circumstances in which you might encounter the limits.
As described above,
Datastore creates an entry in a predefined index for every
property of every entity except
long text strings
(Text
) and long byte strings
(Blob
)
and those you have explicitly
declared as unindexed. The property
may also be included in additional, custom indexes declared in your
index.yaml
configuration
file. Provided that
an entity has no list properties, it will have at most one entry in each such
custom index (for non-ancestor indexes) or one for each of the entity's
ancestors (for ancestor indexes). Each of these index entries must be updated
every time the value of the property changes.
For a property that has a single value for each entity, each possible value needs to be stored just once per entity in the property's predefined index. Even so, it is possible for an entity with a large number of such single-valued properties to exceed the index entry or size limit. Similarly, an entity that can have multiple values for the same property requires a separate index entry for each value; again, if the number of possible values is large, such an entity can exceed the entry limit.
The situation becomes worse in the case of entities with multiple properties, each of which can take on multiple values. To accommodate such an entity, the index must include an entry for every possible combination of property values. Custom indexes that refer to multiple properties, each with multiple values, can "explode" combinatorially, requiring large numbers of entries for an entity with only a relatively small number of possible property values. Such exploding indexes can dramatically increase the cost of writing an entity to Datastore, because of the large number of index entries that must be updated, and also can easily cause the entity to exceed the index entry or size limit.
Consider the query
SELECT * FROM Widget WHERE x=1 AND y=2 ORDER BY date
which causes the SDK to suggest the following index:
indexes:
- kind: Widget
properties:
- name: x
- name: y
- name: date
|x|
*
|y|
*
|date|
entries for each
entity (where |x|
denotes the number of values associated with the entity for
property x
). For example, the following code
class Widget(db.Expando):
pass
e2 = Widget()
e2.x = [1, 2, 3, 4]
e2.y = ['red', 'green', 'blue']
e2.date = datetime.datetime.now()
e2.put()
creates an entity with four values for property x
, three values for property
y
, and date
set to the current date. This will require 12 index entries, one
for each possible combination of property values:
(1
, "red"
, <now>
)
(1
, "green"
, <now>
)
(1
, "blue"
, <now>
)
(2
, "red"
, <now>
)
(2
, "green"
, <now>
)
(2
, "blue"
, <now>
)
(3
, "red"
, <now>
)
(3
, "green"
, <now>
)
(3
, "blue"
, <now>
)
(4
, "red"
, <now>
)
(4
, "green"
, <now>
)
(4
, "blue"
, <now>
)
When the same property is repeated multiple times, Datastore can detect exploding indexes and suggest an alternative index. However, in all other circumstances (such as the query defined in this example), Datastore will generate an exploding index. In this case, you can circumvent the exploding index by manually configuring an index in your index configuration file:
indexes:
- kind: Widget
properties:
- name: x
- name: date
- kind: Widget
properties:
- name: y
- name: date
(|x|
*
|date|
+
|y|
*
|date|)
, or 7 entries instead of 12:
(1
, <now>
)
(2
, <now>
)
(3
, <now>
)
(4
, <now>
)
("red"
, <now>
)
("green"
, <now>
)
("blue"
, <now>
)
Any put operation that would cause an index to exceed the index entry or size
limit will fail with
a BadRequestError
exception. The text of the
exception describes which limit was
exceeded ("Too many indexed properties"
or "Index entries too large"
) and
which custom index was the cause. If you create a new index that would exceed
the limits for any entity when built, queries against the index will fail and
the index will appear in the Error
state in the Google Cloud console. To
resolve indexes in the Error
state:
Remove the index in the
Error
state from yourindex.yaml
file.Run the following command from the directory where your
index.yaml
is located to remove that index from Datastore:gcloud datastore indexes cleanup index.yaml
Resolve the cause of the error. For example:
- Reformulate the index definition and corresponding queries.
- Remove the entities that are causing the index to explode.
Add the index back to your
index.yaml
file.Run the following command from the directory where your
index.yaml
is located to create the index in Datastore:gcloud datastore indexes create index.yaml
You can avoid exploding indexes by avoiding queries that would require a custom index using a list property. As described above, this includes queries with multiple sort orders or queries with a mix of equality and inequality filters.