Keep only the most recent value

This page describes a strategy that lets you keep only the most recent value in a column in a Bigtable table.

All Cloud Bigtable client libraries let you use filters to read the most recent value, or cell, at a given row and column. In some cases, however, you might never need to read older versions of your data. You can use a version-based garbage collection policy that keeps only one cell in a column, but because garbage collection can take up to a week to occur, in practice your table might still contain older data that you never plan to read.

To keep only the most recent value, we recommend that you use a delete-then-write approach to limit columns in your table to only one cell.

Delete, then write

To retain only one value in a column, you can send a request that deletes the column and then recreates it with a new value and timestamp in one atomic action.

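The following Java pseudocode example shows how this would work. The order is important: the deletion must occur before the write, because mutations in a request are applied in order and a delete placed after the write would also remove the new cell.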

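// Mutations in one request are applied in order and atomically for the row.
RowMutation mutation = RowMutation.create(TABLE, ROW_KEY)
                // Delete every existing cell in the column, regardless of timestamp.
                .deleteCells(COLUMN_FAMILY, COLUMN_QUALIFIER, Range.TimestampRange.unbounded())
                // Write the single new cell.
                .setCell(COLUMN_FAMILY, COLUMN_QUALIFIER, TIMESTAMP, VALUE);
dataClient.mutateRow(mutation);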

Provide the following:

  • TABLE: the ID of the table
  • ROW_KEY: the row key of the row that contains the column
  • COLUMN_FAMILY: the column family that contains the column
  • COLUMN_QUALIFIER: the column qualifier to delete and recreate
  • TIMESTAMP: the new timestamp
  • VALUE: the new value for the column
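
To illustrate why the delete-then-write pattern leaves exactly one cell, the following is a minimal in-memory sketch. It is a toy model, not the Bigtable client API: the ColumnModel class and its methods are hypothetical names that only mimic how cells accumulate in a single column.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model of a single column: timestamp -> value.
// This is NOT the Bigtable client API; all names here are hypothetical.
class ColumnModel {
    // Cells sorted newest-first, so the first entry is the most recent cell.
    private final NavigableMap<Long, String> cells =
            new TreeMap<>(Collections.reverseOrder());

    // A plain write adds a cell; older cells stay until something removes them.
    void write(long timestamp, String value) {
        cells.put(timestamp, value);
    }

    // Delete-then-write: drop every existing cell, then add the new one.
    // The order matters: clearing after the put would remove the new cell too.
    void deleteThenWrite(long timestamp, String value) {
        cells.clear();
        cells.put(timestamp, value);
    }

    int cellCount() {
        return cells.size();
    }

    String latestValue() {
        return cells.isEmpty() ? null : cells.firstEntry().getValue();
    }

    public static void main(String[] args) {
        ColumnModel plain = new ColumnModel();
        plain.write(1_000L, "v1");
        plain.write(2_000L, "v2");
        // prints "2 cells, latest=v2"
        System.out.println(plain.cellCount() + " cells, latest=" + plain.latestValue());

        ColumnModel single = new ColumnModel();
        single.deleteThenWrite(1_000L, "v1");
        single.deleteThenWrite(2_000L, "v2");
        // prints "1 cell, latest=v2"
        System.out.println(single.cellCount() + " cell, latest=" + single.latestValue());
    }
}
```

In the model, plain writes keep every earlier version around, while delete-then-write never holds more than one cell, which is the behavior the single atomic request is meant to achieve in a real table.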

Timestamp of zero

Previously, we recommended a strategy of always sending writes with a timestamp of 0. You can still use that approach, but because valid timestamps are useful, we recommend the delete-then-write approach instead.

If you set the timestamp for a cell to 0, or to any value less than the current time in milliseconds, and also use an age-based garbage collection policy, the cell can appear older than the policy allows and might be deleted the next time garbage collection occurs.
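
The interaction with age-based garbage collection can be sketched with simple arithmetic. The following toy check is not the Bigtable API: the AgeBasedGcModel class, the isExpired method, and the 30-day maximum age are assumptions chosen only for illustration.

```java
import java.time.Duration;

// Toy model of an age-based garbage collection check; not the Bigtable API.
class AgeBasedGcModel {
    // A cell counts as expired once its age (now - cell timestamp) exceeds maxAge.
    static boolean isExpired(long cellTimestampMillis, long nowMillis, Duration maxAge) {
        return nowMillis - cellTimestampMillis > maxAge.toMillis();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        Duration maxAge = Duration.ofDays(30); // hypothetical policy

        // Timestamp 0 makes the cell look decades old: prints "true".
        System.out.println(isExpired(0L, now, maxAge));

        // A cell written with the current time is within the policy: prints "false".
        System.out.println(isExpired(now, now, maxAge));
    }
}
```

Under this model, a cell written with a timestamp of 0 is immediately older than any realistic maximum age, whereas a cell written with a valid current timestamp is retained until it actually ages out.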

What's next