Index Configuration

Google Cloud Datastore uses indexes for every query your application makes. These indexes are updated whenever an entity changes, so the results can be returned quickly when the app makes a query. To do this, the datastore needs to know in advance which queries the application will make. You specify which indexes your app needs in a configuration file. If you're using the protocol buffer API, the development server can generate the datastore index configuration automatically as you test your app. If you're using the JSON API, you'll need to configure indexes manually, as described under Manual index configuration. We hope to reach feature parity between these two APIs in an upcoming release.

  1. System requirements
  2. About datastore-indexes.xml
  3. Using automatic index configuration
  4. Manual index configuration
  5. Updating indexes
  6. Deleting unused indexes
  7. Passwordless login with OAuth2
  8. Command-line arguments

System requirements

To use the gcd tool, you must have Java 7 installed on your computer.

About datastore-indexes.xml

You specify configuration for datastore indexes in WEB-INF/datastore-indexes.xml, in your dataset directory. This is an XML file whose root element is <datastore-indexes>. It contains zero or more <datastore-index> elements, one for each index that the Datastore should maintain.

As described on the Datastore Indexes page, an index is a table of values for a set of given properties for entities of a given kind. Each column of property values is sorted either in ascending or descending order. Configuration for an index specifies the kind of the entities, and the names of the properties and their sort orders.
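To make the "table of values" idea concrete, here is a minimal Python sketch that orders a few made-up Employee entities the way an index on lastName (ascending) and hireDate (descending) would. The function name build_index_rows and the sample data are hypothetical, and real index rows also reference the entity key, which this sketch leaves out.

```python
from functools import cmp_to_key


def build_index_rows(entities, properties):
    """Return one row per entity, ordered the way the index would order them.

    `properties` is a list of (name, direction) pairs, where direction is
    "asc" or "desc". Only the property values are kept in each row.
    """
    names = [name for name, _ in properties]
    rows = [tuple(entity[name] for name in names) for entity in entities]

    def compare(a, b):
        # Compare column by column; the first differing column decides.
        for column, (_, direction) in enumerate(properties):
            if a[column] == b[column]:
                continue
            a_first = a[column] < b[column]
            if direction == "desc":
                a_first = not a_first
            return -1 if a_first else 1
        return 0

    return sorted(rows, key=cmp_to_key(compare))


# Made-up sample entities, for illustration only.
employees = [
    {"lastName": "Smith", "hireDate": "2011-03-01"},
    {"lastName": "Jones", "hireDate": "2012-07-15"},
    {"lastName": "Smith", "hireDate": "2013-01-20"},
]
rows = build_index_rows(employees, [("lastName", "asc"), ("hireDate", "desc")])
for row in rows:
    print(row)
```

The two Smith rows share a lastName, so the second column breaks the tie, with the later hire date first.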

Here is an example that specifies two indexes:

<?xml version="1.0" encoding="utf-8"?>
<datastore-indexes autoGenerate="true">
    <datastore-index kind="Employee" ancestor="false">
        <property name="lastName" direction="asc" />
        <property name="hireDate" direction="desc" />
    </datastore-index>
    <datastore-index kind="Project" ancestor="false">
        <property name="dueDate" direction="asc" />
        <property name="cost" direction="desc" />
    </datastore-index>
</datastore-indexes>

The <datastore-indexes> element has an autoGenerate attribute that controls whether this file should be considered along with automatically generated index configuration. See Using Automatic Index Configuration below.

Each <datastore-index> element represents an index. The kind attribute specifies the kind of the entities to index. The ancestor attribute is true if the index supports queries that filter by ancestor-key to constrain results to a single entity group, false otherwise.

The <property> elements in a <datastore-index> represent the entity properties to index. The name attribute is the property name, and the direction attribute is the sort order, either asc for ascending or desc for descending. The order of the property elements specifies the order in the index: rows are sorted by the first property, then the second property, and so on.
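The element and attribute layout described above can be read with a few lines of Python. This is only a sketch using the standard library's xml.etree.ElementTree; the function name parse_indexes and the returned dictionary shape are hypothetical, and treating a missing direction attribute as asc is an assumption.

```python
import xml.etree.ElementTree as ET

CONFIG = """<?xml version="1.0" encoding="utf-8"?>
<datastore-indexes autoGenerate="true">
    <datastore-index kind="Employee" ancestor="false">
        <property name="lastName" direction="asc" />
        <property name="hireDate" direction="desc" />
    </datastore-index>
</datastore-indexes>
"""


def parse_indexes(xml_text):
    """Return (auto_generate, indexes) read from a datastore-indexes.xml string."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != "datastore-indexes":
        raise ValueError("unexpected root element: " + root.tag)
    indexes = []
    for index in root.findall("datastore-index"):
        # Property order matters: it is the column order of the index.
        properties = [
            (prop.get("name"), prop.get("direction", "asc"))  # default is an assumption
            for prop in index.findall("property")
        ]
        indexes.append({
            "kind": index.get("kind"),
            "ancestor": index.get("ancestor") == "true",
            "properties": properties,
        })
    return root.get("autoGenerate") == "true", indexes


auto_generate, indexes = parse_indexes(CONFIG)
print(auto_generate, indexes)
```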

Using automatic index configuration

Determining the indexes required by your application's queries manually can be tedious and error-prone. Thankfully, the development server can determine the index configuration for you. To use automatic index configuration, add the attribute autoGenerate="true" to your WEB-INF/datastore-indexes.xml file's <datastore-indexes> element. Automatic index configuration is also used if your dataset does not have a datastore-indexes.xml file.

With automatic index configuration enabled, the development server maintains a file named WEB-INF/appengine-generated/datastore-indexes-auto.xml in your dataset directory. When your app, running against the development server, attempts a datastore query for which there is no corresponding index in either datastore-indexes.xml or datastore-indexes-auto.xml, the server adds the appropriate configuration to datastore-indexes-auto.xml.

If automatic index configuration is enabled when you update your production indexes (see Updating Indexes), the tool uses both datastore-indexes.xml and datastore-indexes-auto.xml to determine which indexes need to be built for your dataset in production.

If autoGenerate="false" is in your datastore-indexes.xml, the development server and the command line tool that updates your indexes in production (see Updating Indexes) ignore the contents of datastore-indexes-auto.xml. If the app running locally performs a query whose index is not specified in datastore-indexes.xml, the development server throws an exception, just as the production Datastore would.
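The rules for which files are consulted can be summarized in a short Python sketch. The function names index_keys and effective_indexes are hypothetical, the sample XML is made up, and treating a missing autoGenerate attribute as false is an assumption rather than documented behavior.

```python
import xml.etree.ElementTree as ET


def index_keys(xml_text):
    """Return (auto_generate, set of hashable index definitions)."""
    root = ET.fromstring(xml_text)
    keys = set()
    for index in root.findall("datastore-index"):
        properties = tuple(
            (prop.get("name"), prop.get("direction"))
            for prop in index.findall("property")
        )
        keys.add((index.get("kind"), index.get("ancestor"), properties))
    # Assumption: a missing autoGenerate attribute is treated as "false".
    return root.get("autoGenerate") == "true", keys


def effective_indexes(manual_xml, auto_xml):
    """Return the index definitions that are taken into account.

    manual_xml is the content of datastore-indexes.xml (None if the file does
    not exist); auto_xml is the content of datastore-indexes-auto.xml (or None).
    """
    auto_keys = index_keys(auto_xml)[1] if auto_xml else set()
    if manual_xml is None:
        return auto_keys  # no manual file: automatic configuration applies
    auto_generate, manual_keys = index_keys(manual_xml)
    if auto_generate:
        return manual_keys | auto_keys  # both files are used
    return manual_keys  # the auto file is ignored


MANUAL = """<datastore-indexes autoGenerate="{flag}">
    <datastore-index kind="Employee" ancestor="false">
        <property name="lastName" direction="asc" />
    </datastore-index>
</datastore-indexes>"""
AUTO = """<datastore-indexes autoGenerate="true">
    <datastore-index kind="Project" ancestor="false">
        <property name="dueDate" direction="asc" />
    </datastore-index>
</datastore-indexes>"""

with_auto = effective_indexes(MANUAL.format(flag="true"), AUTO)
without_auto = effective_indexes(MANUAL.format(flag="false"), AUTO)
no_manual = effective_indexes(None, AUTO)
print(len(with_auto), len(without_auto), len(no_manual))
```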

It's a good idea to occasionally move index configuration from datastore-indexes-auto.xml to datastore-indexes.xml, then disable automatic index configuration and test your app against the development server. This makes it easy to maintain indexes without having to manage two files, and ensures that your testing will reproduce errors caused by missing index configuration.
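Moving the generated definitions into datastore-indexes.xml could be scripted along these lines. This is a sketch only: the function name consolidate is hypothetical, and it assumes the generated file uses the same datastore-index and property elements as the manual file.

```python
import xml.etree.ElementTree as ET


def index_key(index):
    """Hashable identity of a <datastore-index> element."""
    properties = tuple(
        (prop.get("name"), prop.get("direction"))
        for prop in index.findall("property")
    )
    return (index.get("kind"), index.get("ancestor"), properties)


def consolidate(manual_xml, auto_xml):
    """Copy indexes from the auto file into the manual one and disable autoGenerate."""
    manual = ET.fromstring(manual_xml)
    auto = ET.fromstring(auto_xml)
    seen = {index_key(index) for index in manual.findall("datastore-index")}
    for index in auto.findall("datastore-index"):
        if index_key(index) not in seen:  # skip definitions already present
            manual.append(index)
            seen.add(index_key(index))
    manual.set("autoGenerate", "false")
    return ET.tostring(manual, encoding="unicode")


MANUAL = """<datastore-indexes autoGenerate="true">
    <datastore-index kind="Employee" ancestor="false">
        <property name="lastName" direction="asc" />
    </datastore-index>
</datastore-indexes>"""
AUTO = """<datastore-indexes autoGenerate="true">
    <datastore-index kind="Employee" ancestor="false">
        <property name="lastName" direction="asc" />
    </datastore-index>
    <datastore-index kind="Project" ancestor="false">
        <property name="dueDate" direction="asc" />
    </datastore-index>
</datastore-indexes>"""

merged = consolidate(MANUAL, AUTO)
print(merged)
```

After writing the merged result back to datastore-indexes.xml, the generated file can be emptied or removed so that only one file needs maintaining.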

Manual index configuration

The legacy App Engine Datastore viewer allows you to interactively query the Datastore using a query language called GQL. These interactive queries succeed if your dataset has the index needed to fulfill the query; if it does not, they fail with an error containing the XML definition of the missing index. By translating your Google Cloud Datastore queries to GQL, you can use interactive queries and the detailed error messages they return to determine which indexes need to be added to your dataset's WEB-INF/datastore-indexes.xml file. GQL provides a superset of the query functionality available in the Google Cloud Datastore query API, so this translation should always be possible.
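As an illustration of the translation step, the following Python sketch builds a GQL string from a kind, a list of filters, and a list of sort orders. The helper names to_gql and gql_literal are hypothetical, only a few value types are handled, and the output follows the basic SELECT ... WHERE ... ORDER BY form of GQL; check the result against the GQL reference before relying on it.

```python
def gql_literal(value):
    """Render a Python value as a GQL literal (strings, booleans, numbers only)."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Single quotes inside a GQL string are escaped by doubling them.
    return "'" + str(value).replace("'", "''") + "'"


def to_gql(kind, filters=(), orders=()):
    """Build a GQL query string.

    filters is a sequence of (property, operator, value) triples;
    orders is a sequence of (property, direction) pairs with "asc" or "desc".
    """
    parts = ["SELECT * FROM " + kind]
    if filters:
        conditions = [
            "{} {} {}".format(name, op, gql_literal(value))
            for name, op, value in filters
        ]
        parts.append("WHERE " + " AND ".join(conditions))
    if orders:
        ordering = ["{} {}".format(name, direction.upper()) for name, direction in orders]
        parts.append("ORDER BY " + ", ".join(ordering))
    return " ".join(parts)


gql = to_gql("Employee", [("lastName", "=", "Smith")], [("hireDate", "desc")])
print(gql)
```

The resulting string can be pasted into the interactive query form to find out whether the dataset already has a matching index.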

To run a GQL query using the legacy App Engine Datastore viewer:

  1. Go to the App Engine Administration Console.
  2. In the list of projects, click the project for your dataset (the project ID will match your dataset ID).
  3. Click Datastore Viewer on the left-hand side of the page.
  4. Click +Options to open up the interactive query form.
  5. Type your query in the form and click Run Query.
  6. If the query succeeds, no further action is necessary: you have the indexes you need for that query.
  7. If the query fails, copy the XML index definition contained in the error message and add it to your WEB-INF/datastore-indexes.xml file.

Updating indexes

Google Cloud Datastore provides the gcd command line tool for updating the indexes that are available to your production dataset. You can download gcd from here. This tool looks at your dataset index configuration (the datastore-indexes.xml and appengine-generated/datastore-indexes-auto.xml files), and if the index configuration defines an index that doesn't exist yet in your production dataset, the Datastore creates the new index.

gcd-v1beta2-rev1-4.0.0/ updateindexes [options] <dataset-directory>

where options are command line arguments supplied to the gcd tool.
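Conceptually, updating indexes is a comparison between the configured indexes and those that already exist in production. The Python sketch below shows only that comparison; it is not how gcd is implemented, and the function name indexes_to_create, the tuple representation of an index, and the sample data are all hypothetical.

```python
def indexes_to_create(configured, production):
    """Return configured indexes that do not exist in production yet.

    Each index is a hashable tuple: (kind, ancestor, ((property, direction), ...)).
    """
    existing = set(production)
    return [index for index in configured if index not in existing]


# Made-up index definitions, for illustration only.
configured = [
    ("Employee", False, (("lastName", "asc"), ("hireDate", "desc"))),
    ("Project", False, (("dueDate", "asc"), ("cost", "desc"))),
]
production = [
    ("Employee", False, (("lastName", "asc"), ("hireDate", "desc"))),
]
missing = indexes_to_create(configured, production)
print(missing)
```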

Depending on how much data is already in the Datastore that belongs in the new index, the process of creating the index may take a while. If the app performs a query that requires an index that hasn't finished building yet, the query will raise an exception. To prevent this, wait until the new index has finished building before you deploy a version of your app that requires it.

You can check the status of the dataset's indexes from the Indexes page in the Cloud Platform Console.

Deleting unused indexes

When you change or remove an index from the index configuration, the original index is not deleted from the Datastore automatically. This gives you the opportunity to leave an older version of the app running while new indexes are being built, or to revert to the older version immediately if a problem is discovered with a newer version.

When you are sure that old indexes are no longer needed, you can delete them from the Datastore using the vacuumindexes action. The command to vacuum indexes is as follows:

gcd-v1beta2-rev1-4.0.0/ vacuumindexes [options] <dataset-directory>

where options are command line arguments supplied to the gcd tool.

This command deletes all indexes for the dataset that are not mentioned in the local versions of datastore-indexes.xml and appengine-generated/datastore-indexes-auto.xml.
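The selection rule for vacuuming can be sketched in Python as a set difference: anything in production that neither local file mentions is a candidate for deletion. The function name indexes_to_vacuum, the tuple representation, and the sample data are hypothetical; this illustrates the rule, not the tool's actual code.

```python
def indexes_to_vacuum(production, manual, auto):
    """Return production indexes mentioned in neither local configuration file.

    manual and auto hold the indexes from datastore-indexes.xml and
    datastore-indexes-auto.xml; each index is a hashable tuple.
    """
    mentioned = set(manual) | set(auto)
    return [index for index in production if index not in mentioned]


# Made-up index definitions, for illustration only.
employee_index = ("Employee", False, (("lastName", "asc"), ("hireDate", "desc")))
project_index = ("Project", False, (("dueDate", "asc"), ("cost", "desc")))
old_index = ("Employee", False, (("firstName", "asc"),))

stale = indexes_to_vacuum(
    production=[employee_index, project_index, old_index],
    manual=[employee_index],
    auto=[project_index],
)
print(stale)
```

Reviewing such a list before running vacuumindexes is a cheap way to confirm that no index still used by a running version of the app is about to be removed.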

Passwordless login with OAuth2

If you don't want to enter your login credentials, you can use an OAuth 2.0 token instead. This token gives access to the Datastore, but not to other parts of your Google account; if your Google account uses two-factor authentication, you'll find this especially convenient. You can store this token to permanently log in on this machine.

To set this up, set the --auth_mode option to oauth2.

gcd-v1beta2-rev1-4.0.0/ updateindexes --auth_mode=oauth2 <dataset-directory>

A page will appear in your web browser prompting you for authorization. If no browser can be started, gcd instead shows you a URL to copy and paste into your browser. Log in if necessary. The page asks whether you wish to give the Datastore access. Click OK; you are then given a token, which you supply at the gcd prompt.

From then on, gcd uses the saved credentials whenever you pass the --auth_mode=oauth2 option.

Command-line arguments

The gcd tool accepts the following options for index management:

Dataset ID to use instead of the one in the project directory. If your local dataset ID and the Cloud Datastore project ID don't match, you need to use this option in order to update indexes in the Cloud Datastore project.
--auth_mode: Authentication mode for connecting to Cloud Datastore. oauth2 takes you through an OAuth2 flow using a web browser; password prompts you for a password on the command line.
Force deletion of indexes without being prompted (for vacuumindexes only).