Composite Index Configuration

Firestore in Datastore mode uses indexes for every query your application makes. These indexes are updated whenever an entity changes, so the results can be returned quickly when the application makes a query. Datastore mode provides built-in indexes automatically, but needs to know in advance which composite indexes the application will require. You specify which composite indexes your application needs in a configuration file. The Datastore emulator can generate the Datastore mode composite index configuration automatically as you test your application. The gcloud command-line tool provides commands to update the indexes that are available to your production Datastore mode database.

System requirements

The gcloud commands in this document require that you have installed the Google Cloud CLI.

About index.yaml

Every Datastore mode query made by an application needs a corresponding index. Indexes for simple queries, such as queries over a single property, are created automatically. Composite indexes for complex queries must be defined in a configuration file named index.yaml. This file is uploaded with the application to create composite indexes in a Datastore mode database.

The Datastore emulator automatically adds items to this file when the application tries to execute a query that needs a composite index and the configuration file has no appropriate entry for it. You can adjust composite indexes or create new ones manually by editing the file. The index.yaml file is located in the <project-directory>/WEB-INF/ folder. By default, the data directory that contains WEB-INF/appengine-generated/index.yaml is ~/.config/gcloud/emulators/datastore/. See Datastore emulator project directories for additional details.

The following is an example of an index.yaml file:

indexes:

- kind: Task
  ancestor: no
  properties:
  - name: done
  - name: priority
    direction: desc

- kind: Task
  properties:
  - name: collaborators
    direction: asc
  - name: created
    direction: desc

- kind: TaskList
  ancestor: yes
  properties:
  - name: percent_complete
    direction: asc
  - name: type
    direction: asc

The syntax of index.yaml is the YAML format. For more information about this syntax, see the YAML website.

Composite Index definitions

index.yaml has a single list element called indexes. Each element in the list represents a composite index for the application.

An index element can have the following elements:

kind
The kind of the entity for the query. This element is required.

properties
A list of properties to include as columns of the composite index, in the order to be sorted: properties used in equality filters first, followed by the property used in inequality filters, then the sort orders and their directions.

Each element in this list has the following elements:

  name
  The Datastore mode name of the property.

  direction
  The direction to sort, either asc for ascending or desc for descending. This is only required for properties used in sort orders of the query, and must match the direction used by the query. The default is asc.

ancestor
yes if the query has an ancestor clause. The default is no.
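The property ordering rule above (equality filters, then the inequality filter, then sort orders) can be sketched in Python. This is a minimal illustration, not part of any Google tool: the function name build_index_definition and its parameters are hypothetical, and it only renders the text of a single index entry.

```python
def build_index_definition(kind, equality=(), inequality=None, orders=(), ancestor=False):
    """Render one index.yaml entry (hypothetical helper, for illustration only).

    equality:   names of properties used in equality filters
    inequality: name of the property used in inequality filters, or None
    orders:     (name, direction) pairs, direction being "asc" or "desc"
    """
    order_dirs = dict(orders)
    # 1. Equality-filter properties come first; they need no direction.
    columns = [(name, None) for name in equality]
    # 2. The inequality-filter property follows; if the query also sorts
    #    on it, reuse the direction requested by that sort order.
    if inequality is not None:
        columns.append((inequality, order_dirs.get(inequality)))
    # 3. Remaining sort orders, in the order the query lists them.
    for name, direction in orders:
        if name != inequality:
            columns.append((name, direction))

    lines = [f"- kind: {kind}"]
    if ancestor:
        lines.append("  ancestor: yes")
    lines.append("  properties:")
    for name, direction in columns:
        lines.append(f"  - name: {name}")
        if direction is not None:
            lines.append(f"    direction: {direction}")
    return "\n".join(lines)


# A query on Task filtering on done and sorting by priority, descending.
print(build_index_definition("Task", equality=["done"], orders=[("priority", "desc")]))
```

The printed entry lists done before priority, with a direction only on the sorted property.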

Automatic and manual composite indexes

When the Datastore emulator adds a generated composite index definition to index.yaml, it does so below the following line, inserting it if necessary:

# AUTOGENERATED

The emulator considers all composite index definitions below this line to be automatic, and it may update existing definitions below this line as the application makes queries.

All composite index definitions above this line are considered to be under manual control, and are not updated by the emulator. The emulator will only make changes below the line, and will only do so if the complete index.yaml file does not describe a composite index that accounts for a query executed by the application. To take control of an automatic composite index definition, move it above this line.
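To see which definitions in a file are under manual control and which are automatic, you could split the file text at the marker line. The following is a small sketch of that idea; split_index_yaml is a hypothetical helper, not something the emulator exposes, and it treats the file as plain text rather than parsing YAML.

```python
AUTOGENERATED_MARKER = "# AUTOGENERATED"


def split_index_yaml(text):
    """Return (manual, automatic) text around the marker line.

    Hypothetical helper: everything above the marker is manual,
    everything below it is automatic. Without a marker, the whole
    file counts as manual.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == AUTOGENERATED_MARKER:
            return "\n".join(lines[:i]), "\n".join(lines[i + 1:])
    return "\n".join(lines), ""


SAMPLE = """indexes:

- kind: Task
  properties:
  - name: done
  - name: priority
    direction: desc

# AUTOGENERATED

- kind: TaskList
  ancestor: yes
  properties:
  - name: percent_complete
  - name: type
"""

manual, automatic = split_index_yaml(SAMPLE)
print("manual part mentions priority:", "priority" in manual)
print("automatic part mentions TaskList:", "TaskList" in automatic)
```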

Updating composite indexes

The datastore indexes create command reads your local composite index configuration (the index.yaml file). If the configuration defines a composite index that doesn't exist yet in your production Datastore mode database, the database creates the new composite index. See the development workflow using the gcloud CLI for an example of how to use indexes create.

To create a composite index, the database must set up the composite index and then backfill the composite index with existing data. Composite index creation time is the sum of setup time and backfill time:

  • Setting up a composite index takes a few minutes. The minimum creation time for a composite index is a few minutes, even for an empty database.

  • Backfill time depends on how much existing data belongs in the new composite index. The more property values that belong in the composite index, the longer it takes to backfill the composite index.

If the application performs a query that requires a composite index that hasn't finished building yet, the query raises an exception. To prevent this, avoid deploying a new version of your application that requires a composite index until the new composite index finishes building.

You can check the status of the composite indexes from the Indexes page in the Google Cloud console.
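A deployment script could apply the same check before releasing a version that depends on new composite indexes. The sketch below assumes you already have a list of index descriptions as Python dictionaries with indexId and state fields, shaped like the Index resource that appears in operation responses; the function name and the second sample entry are hypothetical.

```python
def indexes_not_ready(indexes):
    """Return the IDs of indexes whose state is anything other than READY.

    `indexes` is assumed to be a list of dicts with "indexId" and
    "state" keys (an assumption about the input, for illustration).
    """
    return [ix.get("indexId", "<unknown>") for ix in indexes if ix.get("state") != "READY"]


# Sample data: the second entry is made up to show a build in progress.
sample_indexes = [
    {"indexId": "CICAJiUpoMK", "kind": "Task", "state": "READY"},
    {"indexId": "hypothetical-index-id", "kind": "Task", "state": "CREATING"},
]

pending = indexes_not_ready(sample_indexes)
if pending:
    print("Hold the deployment; still building:", pending)
else:
    print("All composite indexes are ready.")
```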

Deleting unused composite indexes

When you change or remove a composite index from the composite index configuration, the original composite index is not deleted from your Datastore mode database automatically. This gives you the opportunity to leave an older version of the application running while new composite indexes are being built, or to revert to the older version immediately if a problem is discovered with a newer version.

When you are sure that old composite indexes are no longer needed, you can delete them by using the datastore indexes cleanup command. This command deletes all composite indexes for the production Datastore mode instance that are not mentioned in the local version of index.yaml. See the development workflow using the gcloud CLI for an example of how to use indexes cleanup.
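Conceptually, indexes create and indexes cleanup each act on one side of a set difference between the local configuration and the production database. The sketch below models that idea only; it is not how the commands are implemented, and index_key and plan_index_changes are hypothetical names.

```python
def index_key(kind, properties, ancestor=False):
    """Normalize an index definition into a hashable key (illustrative model)."""
    return (kind, ancestor, tuple((name, direction) for name, direction in properties))


def plan_index_changes(local, production):
    """Model which indexes each command would act on.

    create:  defined locally but missing in production
    cleanup: present in production but not mentioned locally
    """
    local, production = set(local), set(production)
    return {"create": local - production, "cleanup": production - local}


local = {
    index_key("Task", [("done", "asc"), ("priority", "desc")]),
    index_key("Task", [("collaborators", "asc"), ("created", "desc")]),
}
production = {
    index_key("Task", [("done", "asc"), ("priority", "desc")]),
    index_key("TaskList", [("percent_complete", "asc"), ("type", "asc")], ancestor=True),
}

plan = plan_index_changes(local, production)
print("would be created:", plan["create"])
print("would be cleaned up:", plan["cleanup"])
```

An index present on both sides appears in neither set, which matches the behavior described above: existing indexes are left alone until you explicitly clean up.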

Command-line arguments

For details on command-line arguments for creating and cleaning composite indexes, see datastore indexes create and datastore indexes cleanup, respectively. For details on command-line arguments for the gcloud CLI, see the gcloud CLI reference.

Managing long-running operations

Composite index builds are long-running operations and can take a substantial amount of time to complete.

After you start a composite index build, Datastore mode assigns the operation a unique name. Operation names are prefixed with projects/[PROJECT_ID]/databases/(default)/operations/, for example:

projects/project-id/databases/(default)/operations/ASA1MTAwNDQxNAgadGx1YWZlZAcSeWx0aGdpbi1zYm9qLW5pbWRhEgopEg

However, you can leave out the prefix when specifying an operation name for the describe command.
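If a script stores full operation names, it can derive the short form by removing the prefix described above. This is a minimal sketch with hypothetical helper names; it uses the example operation name from this section.

```python
def operation_prefix(project_id):
    """Build the operation-name prefix for the (default) database."""
    return f"projects/{project_id}/databases/(default)/operations/"


def short_operation_name(full_name, project_id):
    """Strip the prefix if present; otherwise return the name unchanged."""
    prefix = operation_prefix(project_id)
    return full_name[len(prefix):] if full_name.startswith(prefix) else full_name


full = ("projects/project-id/databases/(default)/operations/"
        "ASA1MTAwNDQxNAgadGx1YWZlZAcSeWx0aGdpbi1zYm9qLW5pbWRhEgopEg")
print(short_operation_name(full, "project-id"))
```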

Listing all long-running operations

To list long-running operations, use the gcloud datastore operations list command. This command lists ongoing and recently completed operations. Operations are listed for a few days after completion:

gcloud

gcloud datastore operations list

REST

Before using any of the request data, make the following replacements:

  • project-id: your project ID

HTTP method and URL:

GET https://datastore.googleapis.com/v1/projects/project-id/operations
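As one possible way to issue this request from Python, the sketch below builds it with the standard library. It assumes you supply an OAuth 2.0 access token yourself (for example, one printed by gcloud auth print-access-token); the function name is hypothetical, and the code only constructs the request without sending it.

```python
import urllib.request


def build_list_operations_request(project_id, access_token):
    """Build (but do not send) the GET request that lists operations."""
    url = f"https://datastore.googleapis.com/v1/projects/{project_id}/operations"
    request = urllib.request.Request(url, method="GET")
    # The access token is an input you must obtain separately.
    request.add_header("Authorization", f"Bearer {access_token}")
    return request


request = build_list_operations_request("project-id", "ACCESS_TOKEN")
print(request.get_method(), request.full_url)

# To send it and parse the JSON body (requires a valid token and network access):
#   import json
#   with urllib.request.urlopen(request) as response:
#       operations = json.load(response)
```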

See information about the response below.

For example, a recently completed composite index build shows the following information:

{
  "operations": [
  {
    "name": "projects/project-id/operations/S01vcFVpSmdBQ0lDDCoDIGRiNTdiZDQNmE4YS0yMTVmNWUzZSQadGx1YWZlZAcSMXRzYWVzdS1yZXhlZG5pLW5pbWRhFQpWEg",
    "done": true,
    "metadata": {
      "@type": "type.googleapis.com/google.datastore.admin.v1.IndexOperationMetadata",
      "common": {
        "endTime": "2020-06-23T16:55:29.923562Z",
        "operationType": "CREATE_INDEX",
        "startTime": "2020-06-23T16:55:10Z",
        "state": "SUCCESSFUL"
      },
      "indexId": "CICAJiUpoMK",
      "progressEntities": {
        "workCompleted": "2193027",
        "workEstimated": "2198182"
      }
    },
    "response": {
      "@type": "type.googleapis.com/google.datastore.admin.v1.Index",
      "ancestor": "NONE",
      "indexId": "CICAJiUpoMK",
      "kind": "Task",
      "projectId": "project-id",
      "properties": [
        {
          "direction": "ASCENDING",
          "name": "priority"
        },
        {
          "direction": "ASCENDING",
          "name": "done"
        },
        {
          "direction": "DESCENDING",
          "name": "created"
        }
      ],
      "state": "READY"
    }
  }
  ]
}
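A response like this one can be condensed into one row per operation. The following sketch assumes the field layout shown in the sample above; summarize_operations is a hypothetical helper, and the embedded response is a trimmed copy of that sample with a placeholder operation ID.

```python
import json


def summarize_operations(response_text):
    """Return one small dict per operation in a list response."""
    response = json.loads(response_text)
    rows = []
    for op in response.get("operations", []):
        common = op.get("metadata", {}).get("common", {})
        rows.append({
            # The last path segment of the name identifies the operation.
            "id": op["name"].rsplit("/", 1)[-1],
            "type": common.get("operationType"),
            "state": common.get("state"),
            "started": common.get("startTime"),
        })
    return rows


# Trimmed sample; "operation-id" is a placeholder, not a real ID.
SAMPLE_RESPONSE = """
{
  "operations": [
    {
      "name": "projects/project-id/operations/operation-id",
      "done": true,
      "metadata": {
        "common": {
          "endTime": "2020-06-23T16:55:29.923562Z",
          "operationType": "CREATE_INDEX",
          "startTime": "2020-06-23T16:55:10Z",
          "state": "SUCCESSFUL"
        }
      }
    }
  ]
}
"""

for row in summarize_operations(SAMPLE_RESPONSE):
    print(row["id"], row["type"], row["state"], row["started"])
```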

Describing a single operation

Instead of listing all long-running operations, you can list the details of a single operation:

gcloud

Use the operations describe command to show the status of a composite index build.

gcloud datastore operations describe operation-name

REST

Before using any of the request data, make the following replacements:

  • project-id: your project ID

  • operation-id: the ID of the operation to describe

HTTP method and URL:

GET https://datastore.googleapis.com/v1/projects/project-id/operations/operation-id

See information about the response below.

Estimating the completion time

While an operation runs, check the value of the state field for the overall status of the operation.

A request for the status of a long-running operation also returns the metrics workEstimated and workCompleted. Both metrics count entities. workEstimated shows the estimated total number of entities an operation will process, based on database statistics. workCompleted shows the number of entities processed so far. After the operation completes, workCompleted reflects the total number of entities that were actually processed, which might differ from the value of workEstimated.

Divide workCompleted by workEstimated for a rough progress estimate. The estimate might be inaccurate because it depends on delayed statistics collection.
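The division can be sketched as follows. Note that the API returns both metrics as JSON strings, so they need converting to integers first. The helper name is hypothetical, and capping the result at 1.0 is a choice made for this sketch, since workCompleted can end up different from the estimate.

```python
def progress_fraction(operation):
    """Return workCompleted / workEstimated as a float, or None if unknown."""
    progress = operation.get("metadata", {}).get("progressEntities", {})
    # Both values arrive as strings such as "219327".
    completed = int(progress.get("workCompleted", 0))
    estimated = int(progress.get("workEstimated", 0))
    if estimated <= 0:
        return None  # No usable estimate yet.
    # Cap at 1.0 because the estimate can be lower than the actual count.
    return min(completed / estimated, 1.0)


# Values taken from the in-progress index build sample in this section.
in_progress = {
    "metadata": {
        "progressEntities": {"workCompleted": "219327", "workEstimated": "2198182"}
    }
}

fraction = progress_fraction(in_progress)
print(f"roughly {fraction:.0%} complete")
```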

For example, here is the progress status of a composite index build:

{
  "operations": [
    {
      "name": "projects/project-id/operations/AyAyMDBiM2U5NTgwZDAtZGIyYi0zYjc0LTIzYWEtZjg1ZGdWFmZWQHEjF0c2Flc3UtcmV4ZWRuaS1uaW1kYRUKSBI",
      "metadata": {
        "@type": "type.googleapis.com/google.datastore.admin.v1.IndexOperationMetadata",
        "common": {
          "operationType": "CREATE_INDEX",
          "startTime": "2020-06-23T16:52:25.697539Z",
          "state": "PROCESSING"
        },
        "progressEntities": {
          "workCompleted": "219327",
          "workEstimated": "2198182"
        }
      }
    },
    ...

When an operation is done, the operation description will contain "done": true. See the value of the state field for the result of the operation. If the done field is not set in the response, then its value is false. Do not depend on the existence of the done value for in-progress operations.
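In client code, this means reading done with a default of false rather than indexing the field directly. A minimal sketch, with a hypothetical helper name and operations represented as dictionaries parsed from the JSON response:

```python
def operation_result(operation):
    """Return the final state if the operation is done, else None."""
    # A missing "done" field means the operation is still in progress.
    if not operation.get("done", False):
        return None
    return operation.get("metadata", {}).get("common", {}).get("state")


finished = {"done": True, "metadata": {"common": {"state": "SUCCESSFUL"}}}
running = {"metadata": {"common": {"state": "PROCESSING"}}}  # no "done" key

print(operation_result(finished))
print(operation_result(running))
```

The first call returns the recorded state and the second returns None, because the in-progress operation carries no done field.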