Time to live (TTL) allows database administrators to set policies to periodically delete data from Cloud Spanner tables. Removing unneeded data:
- Decreases storage and backup costs.
- Reduces the number of rows the database has to scan for some queries, potentially increasing query performance.
TTL is ideal for regular clean-up activities. It runs continuously in the background, periodically deleting eligible data in batches. Data is typically deleted within 72 hours after its expiration date. TTL doesn't immediately invalidate data or hide it from queries when it becomes eligible for deletion.
Another background compaction process reclaims storage from deleted rows, typically within 7 days.
TTL is designed to minimize impact on other database activities. It parallelizes work more efficiently than end-user queries and includes retry logic to ensure end-to-end cleanup. Note that the preview doesn't yet expose progress and error metrics. These will be available in the generally available (GA) release.
How does TTL work?
A database owner sets TTL on a table by defining a row deletion policy in the database schema. Each table can have at most one row deletion policy.
A row deletion policy specifies a column name containing a TIMESTAMP datatype, and a number of days to add to the timestamp. For example, a policy might specify that a row should be deleted once today's date equals "TransactionDate + 30 days." See Defining a row deletion policy for the actual syntax.
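The eligibility rule can be sketched in plain Python. This is an illustration of the OLDER_THAN semantics only, not Spanner's implementation; the function name is our own:

```python
from datetime import datetime, timedelta, timezone

def is_eligible_for_deletion(timestamp_column: datetime, num_days: int,
                             now: datetime) -> bool:
    """A row becomes eligible once now passes timestamp_column + num_days."""
    return now > timestamp_column + timedelta(days=num_days)

# A transaction dated 31 days ago is eligible under a 30-day policy;
# one dated 29 days ago is not.
now = datetime(2021, 5, 1, tzinfo=timezone.utc)
print(is_eligible_for_deletion(now - timedelta(days=31), 30, now))  # True
print(is_eligible_for_deletion(now - timedelta(days=29), 30, now))  # False
```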
A background system process checks daily for eligible rows. It parallelizes the actual deletes in batches that are executed close to where the data is stored internally. Each batch runs in its own transaction at a consistent timestamp. Thus the rows in a given batch, along with any indexes and interleaved children, are guaranteed to be deleted atomically. However, deletes across batches will happen in different transactions.
Since this is an asynchronous background process, there is a delay between eligibility and deletion. Typically, the delay is less than 72 hours. As a result, rows may remain in your table for up to 3 days after their TTL has expired; for example, a table with a row deletion policy that deletes rows older than four days may include rows up to seven days old as well as older, undeletable rows. Rows may be undeletable because their removal would violate a constraint or a batch exceeds the mutation limit. See TTL and Interleaved Tables below for more information.
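The arithmetic in the four-day example above can be captured in a tiny helper (our own illustrative function, not a Spanner API; the 3-day delay is the typical upper bound stated above):

```python
def max_expected_row_age_days(policy_days: int, delay_days: int = 3) -> int:
    """Worst-case age of a still-visible deletable row:
    the policy interval plus the up-to-72-hour TTL deletion delay."""
    return policy_days + delay_days

# A 4-day policy can leave rows up to roughly 7 days old.
print(max_expected_row_age_days(4))  # 7
```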
TTL row deletion policies are defined in a database schema using DDL.
Before you begin
During this preview period, before adding a row deletion policy to your table, we recommend the following steps.
Enable Cloud Spanner Backup and Restore. This allows you to restore a database in case you accidentally delete your data with a row deletion policy.
Permission and schema updates
To change your database's schema, you must have spanner.databases.updateDdl permission. See Schema updates for details.
Defining a row deletion policy
You define a row deletion policy on a table using a ROW DELETION POLICY clause in a CREATE TABLE or ALTER TABLE statement. The clause contains an expression over a column and an interval:
ROW DELETION POLICY (OLDER_THAN(timestamp_column, INTERVAL num_days DAY));
timestamp_column must be an existing column with type TIMESTAMP. Columns with commit timestamps are valid.
CREATE TABLE MyTable (
  Key INT64,
  CreatedAt TIMESTAMP,
) PRIMARY KEY (Key),
  ROW DELETION POLICY (OLDER_THAN(CreatedAt, INTERVAL 30 DAY));
Adding a row deletion policy
If this is the first time you are adding a row deletion policy to a table and you expect the first run to delete many rows, consider cleaning up the old data manually with Partitioned DML first. This gives you more control over resource usage instead of leaving the work to the TTL background process. Row deletion policies run at low priority, which is ideal for incremental clean-up but will likely lengthen the time it takes to delete the initial backlog in a busy database, because Cloud Spanner's internal work scheduler prioritizes other work, such as user queries.
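A minimal sketch of such a one-time cleanup, using the table and column names from this doc's examples (adapt them to your schema); the helper function is our own:

```python
def build_cleanup_dml(table: str, column: str, days: int) -> str:
    """Build a Partitioned DML statement that deletes rows the future
    row deletion policy would consider expired."""
    return (
        f"DELETE FROM {table} "
        f"WHERE {column} < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)"
    )

dml = build_cleanup_dml("MyTable", "CreatedAt", 30)
print(dml)
# With the google-cloud-spanner Python client, you would then run this via
# database.execute_partitioned_dml(dml) (requires a live database; not shown).
```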
Using an ALTER TABLE statement, you can add a row deletion policy to an existing table. Each table can have at most one row deletion policy; adding one fails with an error if the table already has a policy. See TTL on generated columns below to specify more sophisticated row deletion logic.
ALTER TABLE MyTable ADD ROW DELETION POLICY (OLDER_THAN(CreatedAt, INTERVAL 1 DAY));
Dropping a row deletion policy
You can drop an existing row deletion policy from a table. This returns an error if there is no existing row deletion policy on the table.
ALTER TABLE MyTable DROP ROW DELETION POLICY;
Replacing a row deletion policy
You can alter the column or the interval expression of an existing row deletion policy. The example below switches the column from CreatedAt to ModifiedAt and extends the interval from 1 DAY to 7 DAY. This returns an error if there is no existing row deletion policy on the table.
ALTER TABLE MyTable REPLACE ROW DELETION POLICY (OLDER_THAN(ModifiedAt, INTERVAL 7 DAY));
TTL on generated columns
Row deletion policies can use generated columns to express more sophisticated rules for a policy. For example, you can define a row deletion policy on the maximum timestamp of multiple columns, or map another value to a timestamp.
Suppose you have an e-commerce application that tracks sales orders in
OrderTable. How would you set up a row deletion policy that stores cancelled
orders for 30 days and other orders for 180 days?
Intuitively, this could be done with two row deletion policies. However, to reduce the opportunity to create conflicting policies, a table can have only one. Instead, you can create a generated column that uses an IF expression to combine the two criteria: a timestamp that is 30 days after the last modification if the order is cancelled, and 180 days after it otherwise. The row deletion policy then expires rows as soon as this calculated timestamp is older than now, that is, with a 0 DAY interval.
CREATE TABLE OrderTable (
  OrderId INT64 NOT NULL,
  OrderStatus STRING(30) NOT NULL,
  LastModifiedDate TIMESTAMP NOT NULL,
  ExpiredDate TIMESTAMP AS (
    IF(OrderStatus = "Cancelled",
       TIMESTAMP_ADD(LastModifiedDate, INTERVAL 30 DAY),
       TIMESTAMP_ADD(LastModifiedDate, INTERVAL 180 DAY))
  ) STORED,
) PRIMARY KEY(OrderId),
  ROW DELETION POLICY (OLDER_THAN(ExpiredDate, INTERVAL 0 DAY));
To see why this works, note that for a cancelled order, the ExpiredDate column equals LastModifiedDate + 30 DAY. By adding 30 DAY to both sides of OLDER_THAN(ExpiredDate, INTERVAL 0 DAY), we get OLDER_THAN(LastModifiedDate, INTERVAL 30 DAY). Similarly, for non-cancelled orders, the predicate becomes OLDER_THAN(LastModifiedDate, INTERVAL 180 DAY).
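The generated column's logic, and the equivalence argued above, can be checked with a small Python model. This is our own sketch; Spanner evaluates the SQL IF expression, not this code:

```python
from datetime import datetime, timedelta, timezone

def expired_date(order_status: str, last_modified: datetime) -> datetime:
    """Mirror of the ExpiredDate generated column: 30 days for cancelled
    orders, 180 days otherwise."""
    days = 30 if order_status == "Cancelled" else 180
    return last_modified + timedelta(days=days)

def older_than(ts: datetime, days: int, now: datetime) -> bool:
    """Model of the OLDER_THAN(ts, INTERVAL days DAY) predicate."""
    return now > ts + timedelta(days=days)

now = datetime(2021, 6, 1, tzinfo=timezone.utc)
modified = now - timedelta(days=31)
# For a cancelled order, OLDER_THAN(ExpiredDate, 0 DAY) agrees with
# OLDER_THAN(LastModifiedDate, 30 DAY):
assert older_than(expired_date("Cancelled", modified), 0, now) == \
       older_than(modified, 30, now)
# A 31-day-old non-cancelled order is not yet eligible under the 180-day rule:
assert not older_than(expired_date("Shipped", modified), 0, now)
```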
TTL and interleaved tables
Interleaved tables are a performance optimization that associates related rows in a one-to-many child table with a row in a parent table. To add a row deletion policy on a parent table, all interleaved child tables must specify ON DELETE CASCADE, meaning the child rows are deleted atomically with the parent row. This ensures referential integrity: deletes on the parent table also delete the related child rows in the same transaction. Cloud Spanner TTL does not support ON DELETE NO ACTION.
Cloud Spanner has limits on the maximum transaction size. Cascading deletes on large parent-child hierarchies could exceed these limits and cause one or more TTL operations to fail. For failed operations, TTL will retry with smaller batches, down to a single parent row. However, large child hierarchies for even a single parent row may still exceed the mutation limit. In this scenario, you can attach a row deletion policy directly on the child tables, in addition to the one on the parent table. The policy on child tables should be configured such that child rows are deleted prior to parent rows. Consider this when all the following are true:
- The child table has any global indexes associated with it; and
- You expect a large number (more than 100) of child rows per parent row.
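The retry behavior described above can be sketched as a recursive split. This is our own illustrative model, not Spanner's implementation; `MUTATION_LIMIT` and the per-parent costs are stand-in values:

```python
MUTATION_LIMIT = 20000  # stand-in constant; Spanner's real limit differs

def delete_in_batches(parent_rows, mutations_per_parent):
    """Try to delete a batch of parent rows (with cascading children).
    If the batch would exceed the mutation limit, split it, down to a
    single parent row. Returns (deleted, failed) parent row ids."""
    cost = sum(mutations_per_parent[r] for r in parent_rows)
    if cost <= MUTATION_LIMIT:
        return list(parent_rows), []
    if len(parent_rows) == 1:
        # Even one parent row exceeds the limit: this TTL operation fails;
        # consider a row deletion policy directly on the child tables.
        return [], list(parent_rows)
    mid = len(parent_rows) // 2
    d1, f1 = delete_in_batches(parent_rows[:mid], mutations_per_parent)
    d2, f2 = delete_in_batches(parent_rows[mid:], mutations_per_parent)
    return d1 + d2, f1 + f2

costs = {"p1": 15000, "p2": 15000, "p3": 30000}
deleted, failed = delete_in_batches(["p1", "p2", "p3"], costs)
print(deleted, failed)  # ['p1', 'p2'] ['p3']
```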
Dry-run a row deletion policy
You can estimate how many rows a row deletion policy is going to delete with a
query. Suppose you want to set up a row deletion policy on
EndDate and expire rows older than 7 days.
CREATE TABLE AdCampaigns (
  CampaignId INT64 NOT NULL,
  …
  EndDate TIMESTAMP NOT NULL
) PRIMARY KEY(CampaignId),
  ROW DELETION POLICY (OLDER_THAN(EndDate, INTERVAL 7 DAY));
Before you set the policy, you can count how many rows would be deleted with a query.
SELECT COUNT(*) FROM AdCampaigns WHERE EndDate < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY);
Introspection, troubleshooting, and monitoring
This limited preview version does not include metrics to monitor progress and errors. However, you can extend the version retention period to up to 7 days and query for past data with point-in-time recovery. This technique lets the database owner inspect data that TTL has already deleted. See Perform a stale read.
Frequently asked questions
How does TTL affect schema updates?
When you have a row deletion policy on a column, you cannot drop that column, unless the row deletion policy is dropped first.
How does TTL work with foreign key constraints?
When you have a table referenced by a foreign key, you cannot create a row
deletion policy on the referenced table. This restriction also applies when a
table has a referenced child table. In the example below, you cannot add a row
deletion policy to the
Customers table, because it is referenced by a foreign
key in the
Orders table and deleting customers could violate this constraint.
CREATE TABLE Customers (
  CustomerID INT64,
  CreatedAt TIMESTAMP
) PRIMARY KEY (CustomerID);

CREATE TABLE Orders (
  OrderID INT64,
  CustomerID INT64,
  CONSTRAINT FK_CustomerOrder FOREIGN KEY (CustomerID)
    REFERENCES Customers (CustomerID)
) PRIMARY KEY (OrderID);
Are deletion policies restored from a backup?
When you restore a database from a backup, all row deletion policies are automatically dropped. This is intentional, so that Cloud Spanner does not begin deleting expired data as soon as the database has been restored. You will need to re-create the row deletion policies manually using ALTER TABLE, as shown above.
Does TTL impact the consistency of Cloud Spanner backups or exports via Dataflow?
No, TTL does not change the consistency model for backups or export jobs. Backups are a consistent snapshot of the data at a particular point in time. Similarly, Dataflow export jobs read the entire table at a fixed timestamp. TTL garbage-collects eligible rows continuously and in the background, and each DELETE issued by TTL is assigned a timestamp in the same way as user transactions. A backup therefore does not contain any TTL deletes with a timestamp more recent than the backup time. It may still contain rows that are eligible for TTL deletion but not yet deleted.
Can data deleted by TTL be recovered with point-in-time recovery (PITR)?
If your data is within the PITR version retention period, you can use a stale query to read data at a timestamp before it was cleaned up by TTL. We suggest that you extend the version retention period to 7 days before you begin using TTL.