Time to live (TTL) allows database administrators to set policies to periodically delete data from Cloud Spanner tables. Removing unneeded data:
- Decreases storage and backup costs.
- Reduces the number of rows the database has to scan for some queries, potentially increasing query performance.
- Helps to adhere to regulations or industry guidelines that limit the retention time on certain types of data.
TTL is ideal for regular clean-up activities. It runs continuously in the background, periodically deleting eligible data in batches. Data is typically deleted within 72 hours after its expiration date. TTL doesn't immediately invalidate data or hide it from queries when it becomes eligible for deletion. TTL also doesn't check data while it is inserted, thus it doesn't block you from inserting a row with an expired timestamp.
Another background compaction process reclaims storage from deleted rows, typically within 7 days.
TTL is designed to minimize impact on other database activities. It parallelizes work more efficiently than end-user queries and includes retry logic to ensure end-to-end cleanup.
How does TTL work?
A database owner sets TTL on a table by defining a row deletion policy in the database schema. Each table can have its own policy. Only one row deletion policy can be specified per table.
A row deletion policy specifies a timestamp and an interval to determine when a row is eligible for deletion; for example, last update date plus 30 days. See Working with TTL for the actual syntax.
A background system process checks daily for eligible rows. It parallelizes the actual deletes in batches that are executed close to where the data is stored internally. Each batch runs in its own transaction at a consistent timestamp. Thus the rows in a given batch, along with any indexes and interleaved children, are guaranteed to be deleted atomically. However, deletes across batches will happen in different transactions.
Since this is an asynchronous background process, there is a delay between eligibility and deletion. Typically, the delay is less than 72 hours. As a result, rows may remain in your table for up to 3 days after their TTL has expired; for example, a table with a row deletion policy that deletes rows older than four days may include rows up to seven days old as well as older, undeletable rows.
Backups and TTL
Restoring a backup
When you restore a database from a backup, any row deletion policies that were configured on the source database are automatically dropped. This prevents Cloud Spanner from potentially deleting expired data as soon as the backup has been restored. You'll need to re-create row deletion policies manually.
A backup is a consistent snapshot of your data at a particular point in time. Similarly, Dataflow export jobs read the entire table at a fixed timestamp. TTL garbage collects eligible rows continuously and in the background. Each DELETE issued by TTL is assigned a timestamp in the same way as user transactions. A backup does not contain any DELETEs with a timestamp more recent than the backup time. The backup can contain rows that may be eligible for TTL deletion but for which TTL has not completed, yet.