An aggregation query processes the data from multiple indexed entities to return a single summary value. Firestore in Datastore mode supports the following aggregation queries:
count()
sum()
avg()
Aggregation queries simplify your application code and cost less than fetching each entity for processing. Read this page to learn how to use aggregation queries.
count()
aggregation
Use the count()
aggregation to return the total number of indexed entities
that match a given query. For example, this count()
aggregation returns the
total number of entities in a kind.
Java
Python
Go
aggregationCountQuery := datastore.NewQuery("Task"). NewAggregationQuery(). WithCount("total_tasks") countResults, err := client.RunAggregationQuery(ctx, aggregationCountQuery) count := countResults["total_tasks"] countValue := count.(*datastorepb.Value) fmt.Printf("Number of results from query: %d\n", countValue.GetIntegerValue())
GQL
AGGREGATE COUNT(*) AS total OVER ( SELECT * AS total FROM tasks )
GQL supports a simplified form of count()
queries:
SELECT COUNT(*) AS total FROM tasks
This example uses an optional alias of total
.
The simplified form supports only FROM
and WHERE
clauses. See the GQL reference for more information.
The count()
aggregation takes into account any filters on the query and any
limit
clauses. For example, the following aggregation returns a count of
the number of entities that match the given filters.
Java
Python
Go
aggregationCountQuery := datastore.NewQuery("Task"). FilterField("done", "=", true). NewAggregationQuery(). WithCount("total_tasks_done") countResults, err := client.RunAggregationQuery(ctx, aggregationCountQuery) count := countResults["total_tasks_done"] countValue := count.(*datastorepb.Value) fmt.Printf("Number of results from query: %d\n", countValue.GetIntegerValue())
GQL
AGGREGATE COUNT(*) OVER ( SELECT * FROM tasks WHERE is_done = false AND tag = 'house')
GQL supports a simplified form of count()
queries:
SELECT COUNT(*) AS total FROM tasks WHERE is_done = false AND tag = 'house'
This example uses an optional alias of total
.
The simplified form supports only FROM
and WHERE
clauses. See the GQL reference for more information.
This example shows how to count up to a certain value. You can use this to, for example, stop counting at a certain number and inform users that they exceeded that number.
Java
Python
Go
aggregationCountQuery := datastore.NewQuery("Task"). Limit(2). NewAggregationQuery(). WithCount("at_least") countResults, err := client.RunAggregationQuery(ctx, aggregationCountQuery) count := countResults["at_least"] countValue := count.(*datastorepb.Value) fmt.Printf("We have at least %d tasks\n", countValue.GetIntegerValue())
GQL
AGGREGATE COUNT_UP_TO(1000) OVER ( SELECT * FROM tasks WHERE is_done = false)
GQL supports a simplified form of count_up_to()
queries:
SELECT COUNT_UP_TO(1000) AS total FROM tasks WHERE is_done = false AND tag = 'house'
This example uses an optional alias of total
.
The simplified form supports only FROM
and WHERE
clauses. See the GQL reference for more information.
sum()
aggregation
Use the sum()
aggregation to return the total sum of numeric values that match
a given query. For example, the following sum()
aggregation returns the total sum of numeric values of the given property from
entities of the given kind:
Java
Python
Go
aggregationSumQuery := datastore.NewQuery("Task"). NewAggregationQuery(). WithSum("hours", "total_hours") sumResults, err := client.RunAggregationQuery(ctx, aggregationSumQuery) sum := sumResults["total_hours"] sumValue := sum.(*datastorepb.Value) fmt.Printf("Sum of results from query: %d\n", sumValue.GetIntegerValue())
GQL
AGGREGATE SUM(hours) AS total_hours OVER ( SELECT * FROM tasks )
GQL supports a simplified form of sum()
queries:
SELECT SUM(hours) AS total_hours FROM tasks
This example uses an optional alias of total_hours
.
The simplified form supports only FROM
and WHERE
clauses. See the GQL reference for more information.
The sum()
aggregation takes into account any filters on the query and any
limit
clauses. For example, the following aggregation returns a sum of
the specified property with a numeric value in entities that match the given
filters.
Java
Python
This query requires an index such as:
- kind: Task properties: - name: done - name: hours
Go
aggregationSumQuery := datastore.NewQuery("Task"). FilterField("done", "=", false). FilterField("tag", "=", "house"). NewAggregationQuery(). WithSum("hours", "total_hours") sumResults, err := client.RunAggregationQuery(ctx, aggregationSumQuery) sum := sumResults["total_hours"] sumValue := sum.(*datastorepb.Value) fmt.Printf("Sum of results from query: %d\n", sumValue.GetIntegerValue())
GQL
AGGREGATE SUM(hours) AS total_hours OVER ( SELECT * FROM tasks WHERE is_done = false AND tag = 'house' )
GQL supports a simplified form of sum()
queries:
SELECT SUM(hours) AS total_hours FROM tasks WHERE is_done = false AND tag = 'house'
This example uses an optional alias of total_hours
.
The simplified form supports only FROM
and WHERE
clauses. See the GQL reference for more information.
avg()
aggregation
Use the avg()
aggregation to return the average of numeric values that
match a given query. For example, the following avg()
aggregation returns the arithmetic
mean of the specified property from the numeric property values of entities
that match the query:
Java
Python
Go
aggregationAvgQuery := datastore.NewQuery("Task"). NewAggregationQuery(). WithAvg("hours", "avg_hours") avgResults, err := client.RunAggregationQuery(ctx, aggregationAvgQuery) avg := avgResults["avg_hours"] avgValue := avg.(*datastorepb.Value) fmt.Printf("average hours: %f\n", avgValue.GetDoubleValue())
GQL
AGGREGATE AVG(hours) as avg_hours OVER ( SELECT * FROM tasks )
GQL supports a simplified form of avg()
queries:
SELECT AVG(hours) as avg_hours
This example uses an optional alias of avg_hours
.
The simplified form supports only FROM
and WHERE
clauses. See the GQL reference for more information.
The avg()
aggregation takes into account any filters on the query and any
limit
clauses. For example, the following aggregation returns the arithmetic
mean of the specified property from the numeric property values of entities that
match the query filters.
Java
Python
This query requires an index such as:
- kind: Task properties: - name: done - name: hours
Go
aggregationAvgQuery := datastore.NewQuery("Task"). FilterField("done", "=", false). FilterField("tag", "=", "house"). NewAggregationQuery(). WithAvg("hours", "avg_hours") avgResults, err := client.RunAggregationQuery(ctx, aggregationAvgQuery) avg := avgResults["avg_hours"] avgValue := avg.(*datastorepb.Value) fmt.Printf("average hours: %f\n", avgValue.GetDoubleValue())
GQL
AGGREGATE AVG(hours) as avg_hours OVER ( SELECT * FROM tasks WHERE is_done = false AND tag = 'house' )
GQL supports a simplified form of avg()
queries:
SELECT AVG(hours) as avg_hours FROM tasks WHERE is_done = false AND tag = 'house'
This example uses an optional alias of avg_hours
.
The simplified form supports only FROM
and WHERE
clauses. See the GQL reference for more information.
Calculate multiple aggregations in a query
You can combine multiple aggregations in a single aggregation pipeline. This can reduce the number of index reads required. If the query includes aggregations on multiple fields, the query requires a composite index, and each aggregation calculation includes only those entities that contain all the fields used by each aggregation.
The following example performs multiple aggregations in a single aggregation query:
Java
Python
Go
aggregationQuery := datastore.NewQuery("Task"). NewAggregationQuery(). WithCount("total_tasks"). WithSum("hours", "total_hours"). WithAvg("hours", "avg_hours") Results, err := client.RunAggregationQuery(ctx, aggregationQuery) fmt.Printf("Number of results from query: %d\n", Results["total_tasks"].(*datastorepb.Value).GetIntegerValue()) fmt.Printf("Sum of results from query: %d\n", Results["total_hours"].(*datastorepb.Value).GetIntegerValue()) fmt.Printf("Avg of results from query: %f\n", Results["avg_hours"].(*datastorepb.Value).GetDoubleValue())
GQL
AGGREGATE SUM(hours) AS total_hours, COUNT(*) AS total_tasks OVER ( SELECT * FROM tasks WHERE is_done = false AND tag = 'house' )
GQL supports a simplified form for aggregation queries:
SELECT SUM(hours) AS total_hours, COUNT(*) AS total_tasks FROM tasks WHERE is_done = false AND tag = 'house'
This example uses the optional aliases of total_hours
and total_tasks
.
The simplified form supports only FROM
and WHERE
clauses. See the GQL reference for more information.
Queries with multiple aggregation include only the entities that contain all the properties in each aggregation. This might lead to different results from performing each aggregation separately.
Behavior and limitations
As you work with aggregation queries, note the following behavior and limitations:
- The query you provide to the aggregation must meet the restrictions on queries.
If an aggregation query cannot resolve within 60 seconds, it returns a
DEADLINE_EXCEEDED
error. Performance depends on your index configuration and on the size of the dataset.If the operation cannot be completed within the 60 second deadline, a possible workaround is to use cursors to merge multiple aggregations.
Aggregation queries read from index entries and include only indexed properties in the calculation.
Adding an
OrderBy
clause to the query limits the aggregation to the entities where the sorting property exists.In GQL, the simplified form does not support
ORDER BY
,LIMIT
, orOFFSET
clauses.In a projection query, you can aggregate data only from the properties in the projection. For example, in the GQL query
SELECT a, b FROM k WHERE c = 1
, you can aggregate data only from amonga
orb
.A
count()
aggregation does not de-duplicate entities with array properties. Each array value that matches the query adds one to the count.For
sum()
andavg()
aggregations, Non-numeric values are ignored.sum()
andavg()
aggregation take into account only integer values, floating-point number values, and timestamps. Timestamps are converted to microsecond integer values forsum()
,avg()
, and projections.When combining multiple aggregations in a single query, note that
sum()
andavg()
ignore non-numeric values whilecount()
includes non-numeric values.If you combine aggregations that are on different properties, the calculation includes only the entities that contain all those properties. This might lead to different results from performing each aggregation separately.
Pricing
Pricing for the count()
, sum()
, and avg()
aggregation queries depends on
the number of index entries scanned during the operation. You are billed one
entity read for up to 1,000 index entries matched. Subsequent index entries
matched cost additional read units. There is a minimum cost of one read unit
for every query. For pricing
information, see Firestore in Datastore mode pricing.
If you combine multiple aggregation in a single query, the query uses the same index for each aggregation and performs a single scan over the data. This can help reduce the number of index scans and reads billed when compared to performing each aggregation separately. However, queries with multiple aggregations include only the entities that contain all those properties. This might lead to different results from performing each aggregation separately.
What's next
- Learn about queries.
- Learn about best practices for Firestore in Datastore mode.