Aggregate values at write time

If you want to aggregate your data in Bigtable at write time, you can use aggregates. Aggregates are Bigtable table cells that aggregate cell values as the data is written. When you add a new value, an aggregation function merges the value with the aggregated value that is already in the cell. Other databases refer to similar functionality as counters or distributed counters.

The type of aggregation that Bigtable offers is sum. When you add a value to a sum aggregate cell, the cell value is replaced with the sum of the newly added value and the current cell value.

During the Preview, you can work with aggregates using the cbt CLI and the Bigtable client libraries for C++, Go, and Java.

This document provides an overview of aggregates, shows how to create an aggregate column family, and provides examples showing how to add a value to an aggregate cell. Before you read this document, you should be familiar with the Bigtable overview and Writes.

When to use aggregates

Bigtable aggregates are useful for situations where you care about data for an entity in aggregate and not as individual data points. If you're migrating to Bigtable from databases such as Apache Cassandra or Redis, you can use Bigtable aggregates in places where you previously relied on counters in these systems.

Time buckets

You can use time buckets to get aggregate values for a period of time, such as an hour, day, or week. Instead of aggregating data before or after it is written to your table, you add new values to aggregate cells in the table.

For example, if you run a service that helps charities raise money, you might want to know the amount of online donations per day for each campaign, but you don't need to know the exact time of each donation or the amount per hour. In your table, row keys represent charity IDs, and you create an aggregate column family called donations. The column qualifiers in the row are campaign IDs.

As each donation amount received for a given day for a campaign is received, it's added to the sum in the aggregate cell in the column for that day. Each add request for the cell uses a timestamp truncated to the beginning of the day, so that in effect each request has the same timestamp. Truncating the timestamps ensures that all of the donations from that day are added to the same cell. The next day, all of your requests go into a new cell, using timestamps that are truncated down to the new date, and that pattern continues.

Depending on your use case, you might choose to create new columns for your new aggregates instead. For more information on time buckets, see Schema design for time series data.

Streamlining workflows

Aggregates let you aggregate your data in your Bigtable table without needing to use any ETL or streaming processing software to aggregate your data before or after you write it to Bigtable. For example, if your application previously published messages to Pub/Sub and then used Dataflow to read the messages and aggregate the data before writing it to Bigtable, you could instead send the data directly to aggregate cells in Bigtable.

Aggregate column families

To create and update aggregate cells, you must have one or more aggregate column families in your table – column families that contain only aggregate cells. You can create them when you create a table, or you can add an aggregate column family to a table that is already in use. When you create the column family, you specify the aggregation type, such as sum.

You can't convert a column family that contains non-aggregate data into an aggregate column family. Columns in aggregate column families can't contain non-aggregate cells, and standard column families can't contain aggregate cells.

To create a new table with an aggregate column family, see Create a table. To add an aggregate column family to a table, see Add column families.

Aggregation types

Bigtable supports the aggregation type sum. The input type that is supported for sums is Int64.

Timestamps

An aggregate cell is defined by row key, column family, column qualifier, and timestamp. You use the same timestamp each time you add data to the cell. If you send a value to the same row key, column family, and column qualifier but with a different timestamp, a new aggregate cell is created in the column.

An add request sent to an aggregate cell must include a timestamp.

Input type

The input type of the value in the add request must match the input type that the column family is created with. For example, if you send a string value to a column family configured for Int64, the request is rejected.

AddToCell

An add request sends an AddToCell mutation in the Bigtable Data API. In contrast, a non-aggregate write request sends a SetCell mutation. For more information, see the Data API reference. AddToCell operations are subject to the same operations limits as other table mutations.

In a replicated table, an aggregate cell converges on the same total value in all clusters within the current replication delay. The total value is the aggregate of all AddToCell mutations sent to that cell in all clusters since the last delete operation, or since the cell was created.

Add request examples

The following examples show you how to add a value to an aggregate cell. The examples add to a sum in a column family that expects input type Int64.

cbt

cbt addtocell TABLE_ID ROW_KEY FAMILY_NAME:COLUMN_QUALIFER=VALUE@TIMESTAMP

Replace the following:

  • TABLE_ID: the permanent identifier for the table
  • ROW_KEY: the row key
  • FAMILY_NAME: the name of the aggregate column family
  • COLUMN_QUALIFIER: an identifier for the column
  • VALUE: the value to add to the cell
  • TIMESTAMP: a Unix timestamp in microseconds, such as 1710868850000000

Example:

cbt addtocell mobile-data device-1 updates:week12=100@1710868850000000

Go

To learn how to install and use the client library for Bigtable, see Bigtable client libraries.

To authenticate to Bigtable, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import (
	"context"
	"fmt"
	"io"
	"time"

	"cloud.google.com/go/bigtable"
)

func writeAggregate(w io.Writer, projectID, instanceID string, tableName string) error {
	// projectID := "my-project-id"
	// instanceID := "my-instance-id"
	// tableName := "mobile-time-series"

	ctx := context.Background()
	client, err := bigtable.NewClient(ctx, projectID, instanceID)
	if err != nil {
		return fmt.Errorf("bigtable.NewClient: %w", err)
	}
	defer client.Close()
	tbl := client.Open(tableName)
	columnFamilyName := "view_count"
	viewTimestamp, err := time.Parse(time.RFC3339, "2024-03-13T12:41:34Z")
	if err != nil {
		return err
	}
	hourlyBucket := viewTimestamp.Truncate(time.Hour)

	mut := bigtable.NewMutation()
	mut.AddIntToCell(columnFamilyName, "views", bigtable.Time(hourlyBucket), 1)

	rowKey := "page#index.html"
	if err := tbl.Apply(ctx, rowKey, mut); err != nil {
		return fmt.Errorf("Apply: %w", err)
	}

	fmt.Fprintf(w, "Successfully wrote row: %s\n", rowKey)
	return nil
}

Java

To learn how to install and use the client library for Bigtable, see Bigtable client libraries.

To authenticate to Bigtable, set up Application Default Credentials. For more information, see Set up authentication for client libraries.


import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.RowMutation;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class WriteAggregate {
  private static final String COUNT_COLUMN_FAMILY_NAME = "view_count";
  private static final long MICROS_PER_MILLI = 1000;

  public static void writeAggregate(String projectId, String instanceId, String tableId) {
    // String projectId = "my-project-id";
    // String instanceId = "my-instance-id";
    // String tableId = "page-view-counter";

    try (BigtableDataClient dataClient = BigtableDataClient.create(projectId, instanceId)) {

      String rowKey = "page#index.html";
      Instant viewTimestamp = Instant.parse("2024-03-13T12:41:34.123Z");

      // Bucket the views for an hour into a single count, giving us an hourly view count for a
      // given page.
      Instant hourlyBucket = viewTimestamp.truncatedTo(ChronoUnit.HOURS);
      long hourlyBucketMicros = hourlyBucket.toEpochMilli() * MICROS_PER_MILLI;

      RowMutation rowMutation =
          RowMutation.create(tableId, rowKey)
              .addToCell(COUNT_COLUMN_FAMILY_NAME, "views", hourlyBucketMicros, 1);

      dataClient.mutateRow(rowMutation);
      System.out.printf("Successfully wrote row %s", rowKey);

    } catch (Exception e) {
      System.out.println("Error during WriteAggregate: \n" + e.toString());
    }
  }
}

What's next