Time to live (TTL)

Overview

Time to live (TTL) allows database administrators to set policies to periodically delete data from Cloud Spanner tables. Removing unneeded data:

  • Decreases storage and backup costs.
  • Reduces the number of rows the database has to scan for some queries, potentially increasing query performance.

TTL is ideal for regular clean-up activities. It runs continuously in the background, periodically deleting eligible data in batches. Data is typically deleted within 72 hours after its expiration date. TTL doesn't immediately invalidate data or hide it from queries when it becomes eligible for deletion.

Another background compaction process reclaims storage from deleted rows, typically within 7 days.

TTL is designed to minimize impact on other database activities. It parallelizes work more efficiently than end-user queries and includes retry logic to ensure end-to-end cleanup. Note that the preview doesn't yet expose progress and error metrics. These will be available in the generally available (GA) release.

How does TTL work?

A database owner sets TTL on a table by defining a row deletion policy in the database schema. Each table can have its own policy. Only one row deletion policy can be specified on a table.

A row deletion policy specifies a column name containing a TIMESTAMP datatype, and a number of days to add to the timestamp. For example, a policy might specify that a row should be deleted once today's date equals "TransactionDate + 30 days." See Defining a row deletion policy for the actual syntax.

A background system process checks daily for eligible rows. It parallelizes the actual deletes in batches that are executed close to where the data is stored internally. Each batch runs in its own transaction at a consistent timestamp. Thus the rows in a given batch, along with any indexes and interleaved children, are guaranteed to be deleted atomically. However, deletes across batches will happen in different transactions.

Since this is an asynchronous background process, there is a delay between eligibility and deletion. Typically, the delay is less than 72 hours. As a result, rows may remain in your table for up to 3 days after their TTL has expired; for example, a table with a row deletion policy that deletes rows older than four days may include rows up to seven days old as well as older, undeletable rows. Rows may be undeletable because their removal would violate a constraint or a batch exceeds the mutation limit. See TTL and Interleaved Tables below for more information.

Managing TTL

TTL row deletion policies are defined in a database schema using DDL.

Before you begin

During this preview period, before adding a row deletion policy to your table, we recommend the following steps.

  1. Enable Cloud Spanner Backup and Restore. This allows you to restore a database in case you accidentally delete your data with a row deletion policy.

  2. Extend the version retention period to 7 days to enable point-in-time recovery.

Permission and schema updates

To change your database's schema, you must have spanner.databases.updateDdl permission. See Schema updates for details.

Defining a row deletion policy

You define a row deletion policy on a table using a row deletion policy clause in a CREATE or ALTER TABLE statement. This clause contains an expression of a column and an interval.

ROW DELETION POLICY (OLDER_THAN(timestamp_column, INTERVAL num_days DAY));

Where:

  • timestamp_column must be an existing column with type TIMESTAMP. Columns with commit timestamps are valid.

  • The INTERVAL expression is similar to that in other timestamp and date functions, with DAY as the only supported unit. num_days must be a non-negative integer.

For example:

CREATE TABLE MyTable(
  Key INT64,
  CreatedAt TIMESTAMP,
) PRIMARY KEY (Key)
, ROW DELETION POLICY (OLDER_THAN(CreatedAt, INTERVAL 30 DAY));

Adding a row deletion policy

If this is the first time you are adding a row deletion policy to a table and you expect the first run to delete many rows, consider first cleaning up old data manually via PartitionDML. This gives you more control over the resource usage, instead of leaving it to the TTL background process. Row deletion policies run at a low priority, ideal for incremental clean-up. However, this will likely lengthen the time it takes to delete the initial set of rows in a busy database because Cloud Spanner's internal work scheduler will prioritize other work, such as user queries.

Using the ALTER TABLE statement, you can add a row deletion policy to an existing table. Each table can have at most one row deletion policy. Adding a row deletion policy fails with an error if there is already a row deletion policy on the table. See TTL on generated columns below to specify more sophisticated row deletion logic.

ALTER TABLE MyTable ADD ROW DELETION POLICY (OLDER_THAN(CreatedAt, INTERVAL 1 DAY));

Dropping a row deletion policy

You can drop an existing row deletion policy from a table. This returns an error if there is no existing row deletion policy on the table.

ALTER TABLE MyTable DROP ROW DELETION POLICY;

Replacing a row deletion policy

You can alter the column or the interval expression of an existing row deletion policy. The example below switches the column from CreatedAt to ModifiedAt and extends the interval from 1 DAY to 7 DAY. This returns an error if there is no existing row deletion policy on the table.

ALTER TABLE MyTable REPLACE ROW DELETION POLICY (OLDER_THAN(ModifiedAt, INTERVAL 7 DAY));

TTL on generated columns

Row deletion policies can use generated columns to express more sophisticated rules for a policy. For example, you can define a row deletion policy on the maximum timestamp of multiple columns, or map another value to a timestamp.

Suppose you have an e-commerce application that tracks sales orders in OrderTable. How would you set up a row deletion policy that stores cancelled orders for 30 days and other orders for 180 days?

Intuitively, this could be done with two row deletion policies. However, to reduce the opportunities to create conflicting policies, a table can only have one. Instead, you can create a generated column that uses an IF statement to express the two criteria. That is, a timestamp column that is 30 days from last modification if the order is cancelled, and 180 days otherwise. The row deletion policy is set to expire rows on the calculated expired timestamp column no older than now, or a 0 DAY interval.

CREATE TABLE OrderTable (
  OrderId INT64 NOT NULL,
  OrderStatus STRING(30) NOT NULL,
  LastModifiedDate TIMESTAMP NOT NULL,
  ExpiredDate TIMESTAMP AS (IF(OrderStatus = "Cancelled",
    TIMESTAMP_ADD(LastModifiedDate, INTERVAL 30 DAY),
    TIMESTAMP_ADD(LastModifiedDate, INTERVAL 180 DAY))) STORED,
) PRIMARY KEY(OrderId)
, ROW DELETION POLICY (OLDER_THAN(ExpiredDate, INTERVAL 0 DAY));

To see why this works, note that for a cancelled order, the ExpiredDate is just LastModificationDate + 30 DAY. By adding 30 DAY to both sides of the expression OLDER_THAN(ExpiredDate, INTERVAL 0 DAY), we get OLDER_THAN(LastModificationDate, INTERVAL 30 DAY). Similarly, for non-cancelled orders, the predicate becomes OLDER_THAN(LastModificationDate, INTERVAL 180 DAY).

TTL and interleaved tables

Interleaved tables are a performance optimization that associates related rows in a one-to-many child table with a row in a parent table. To add a row deletion policy on a parent table, all interleaved child tables must specify ON DELETE CASCADE, meaning the child rows will be deleted atomically with the parent row. This ensures referential integrity such that deletes on the parent table also delete the related child rows in the same transaction. Cloud Spanner TTL does not support ON DELETE NO ACTION.

Cloud Spanner has limits on the maximum transaction size. Cascading deletes on large parent-child hierarchies could exceed these limits and cause one or more TTL operations to fail. For failed operations, TTL will retry with smaller batches, down to a single parent row. However, large child hierarchies for even a single parent row may still exceed the mutation limit. In this scenario, you can attach a row deletion policy directly on the child tables, in addition to the one on the parent table. The policy on child tables should be configured such that child rows are deleted prior to parent rows. Consider this when all the following are true:

  • The child table has any global indexes associated with it; and
  • You expect a large number of (>100) child rows per parent row.

Dry-run a row deletion policy

You can estimate how many rows a row deletion policy is going to delete with a query. Suppose you want to set up a row deletion policy on AdCampaigns at column EndDate and expire rows older than 7 days.

CREATE TABLE AdCampaigns (
  CampaignId INT64 NOT NULL,
  …
  EndDate TIMESTAMP NOT NULL
) PRIMARY KEY(CampaignId)
, ROW DELETION POLICY (OLDER_THAN(EndDate, INTERVAL 7 DAY));

Before you set the policy, you can count how many rows would be deleted with a query.

SELECT COUNT(*) FROM AdCampaigns
WHERE EndDate < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY);

Introspection, troubleshooting, and monitoring

This limited preview version does not include metrics to monitor progress and errors. It is possible to extend the version retention period for up to 7 days and to query for data in the past with point-in-time recovery. This technique allows the database owner to check data that has been deleted by TTL. See Perform a stale read.

Frequently asked questions

How does TTL affect schema updates?

When you have a row deletion policy on a column, you cannot drop that column, unless the row deletion policy is dropped first.

How does TTL work with foreign key constraints?

When you have a table referenced by a foreign key, you cannot create a row deletion policy on the referenced table. This restriction also applies when a table has a referenced child table. In the example below, you cannot add a row deletion policy to the Customers table, because it is referenced by a foreign key in the Orders table and deleting customers could violate this constraint.

CREATE TABLE Customers (
  CustomerID INT64,
  CreatedAt TIMESTAMP
) PRIMARY KEY (CustomerID)

CREATE TABLE Orders (
  OrderID INT64,
  CustomerID INT64,
  CONSTRAINT FK_CustomerOrder FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID)
) PRIMARY KEY (OrderID)

Are deletion policies restored from a backup?

When you mount a backup, all the row deletion policies are automatically dropped. This is intentional such that Cloud Spanner does not begin to delete expired data as soon as the backup has been mounted. You will need to re-create the row deletion policies manually using ALTER TABLE, as shown above.

Does TTL impact the consistency of Cloud Spanner backups or exports via Dataflow?

No, TTL does not change the consistency model for backups and export jobs. Backups are a consistent snapshot of the data at a particular point in time. Similarly, Dataflow export jobs read the entire table at a fixed timestamp. TTL garbage collects eligible rows continuously and in the background. Each DELETE issued by TTL is assigned a timestamp in the same way as user transactions. The backup would not contain any DELETEs with a timestamp more recent than the backup time. The backup will contain rows that may be eligible for TTL deletion but for which TTL has not completed, yet.

Can data deleted by TTL be recovered with point-in-time recovery (PITR)?

If your data is within the PITR version retention period, you can use a stale query to read data at a timestamp before it was cleaned up by TTL. We suggest that you extend the version retention period to 7 days before you begin using TTL.