You can recover a permanently failed stream without having to create a new stream. You can do this by specifying the position from which Datastream attempts to resume reading changes from the source.
Stream recovery overview
A running stream can encounter some unrecoverable errors and change its state to
FAILED_PERMANENTLY
. Such errors prevent the stream from continuing to run, and
might cause data loss.
You can recover a permanently failed stream by setting it to ignore the error and keep reading the ongoing events instead of recreating the stream and backfilling the historical data. To recover a permanently failed stream, you reset the replication to start reading from a different replication position. Each supported source type has its own definition of what a replication position is:
- For Oracle sources, a replication position is a redo log file in the database and the system change number (SCN) in this file.
- For MySQL sources, a replication position is the database binary log (binlog) file and the position in this file.
- For SQL Server sources, a replication position is the log sequence number (LSN) in the transaction logs or change tables.
- For PostgreSQL sources (including AlloyDB for PostgreSQL), a replication position is the log sequence number (LSN) in the replication slot. During recovery, the stream starts reading from the first LSN in the replication slot.
Recover a stream for a MySQL or an Oracle source
To recover a stream for a MySQL or Oracle source, you have the following options:
Retry from the current position (recommended): Select this option to attempt streaming from the current position, where the stream last failed. You need to fix the log file or recover it from backup first. This is the recommended option.
Skip the current position and stream from the next available position: If one or more log files are missing, select this option to skip these files and resume streaming from the first position in the subsequent available file. The changes from the missing log files are lost, but you can recover them by performing a backfill.
Skip the current position and stream from the most recent position: If one or more log files are missing, select this option to skip these files and resume streaming from the most recent position in the most up-to-date log file. The changes from the missing log files are lost, but you can recover them by performing a backfill.
Resume from your preferred streaming file and position: Select this option to resume the stream from a specific log file and log position. Some changes might be lost if the specified log position doesn't overlap with or immediately follow the lost log position. You can recover these changes by performing a backfill.
To recover a permanently failed stream for a MySQL or an Oracle source, perform the following steps:
Go to the Streams page in Google Cloud.
Click Recover in the row with the name of the stream that you want to recover.
The Choose a recovery strategy pane opens. Select an option. If you select Resume from your preferred streaming file and position, enter the following:
- For a MySQL source: the log filename in the File name field and the log position in the Position field. If you don't specify the position, the stream resumes from the first position in the indicated log file.
- For an Oracle source: the system change number (SCN) in the System Change Number (SCN) field. This field is mandatory.
Click Apply.
When the stream is recovered, a timestamp appears in the Recovered column in the Streams page.
Recover a stream for a PostgreSQL source
To recover a stream for a PostgreSQL source, you need to provide the replication slot name. The server uses this replication slot to send events to Datastream. The replication slot name can be the same as the slot used for the failed stream or different:
- If the new replication slot has a different name, then provide the new replication slot name to Datastream.
If you don't provide a replication slot name, Datastream uses the replication slot name specified in the source configuration.
For more information about replication slots, see Configure a source PostgreSQL database.
Any change events to the source that occurred between the log position loss and the first LSN in the new replication slot are lost. You can recover these changes by performing a backfill.
To recover a permanently failed stream for a PostgreSQL source, perform the following steps:
Go to the Streams page in Google Cloud.
Click Recover in the row with the name of the stream that you want to recover.
The Define a new replication slot pane opens.
In the Replication slot name field, provide the name of a new replication slot from which the stream will attempt to recover. If you recreated the replication slot using the same name, or you want to reuse the slot you specified when you configured your source, you can leave this field empty.
Click Apply.
When the stream is recovered, a timestamp appears in the Recovered column in the Streams page.
You can also recover permanently failed streams from the Stream details page. To do this, click Recover stream when viewing detailed information about your stream.
Recover a stream for a SQL Server source
To recover a stream for a SQL Server source, you have the following options:
Resume from the first available position: Select this option if the log was truncated or some records are missing from change tables, and you want to resume from the first event available. The missing events are lost, but you can recover them by performing a backfill.
Resume from your preferred log sequence number (LSN): Select this option to resume the stream from a specific LSN in the transaction logs or change tables. Some events might be lost if the specified LSN doesn't overlap with or immediately follow the last LSN that Datastream was able to retrieve. You can recover these events by performing a backfill.
The LSN in both the transaction logs and change tables contains 20 hexadecimal characters, but for transaction logs, it's separated by a delimiter. For example:
- LSN in transaction logs:
0000123C:0000BA78:0004
- LSN in change tables:
0000123C0000BA780004
- LSN in transaction logs:
To recover a permanently failed stream for a SQL Server source, perform the following steps:
Go to the Streams page in Google Cloud.
Click Recover in the row with the name of the stream that you want to recover.
The Choose a recovery strategy pane opens. Select an option.
Click Apply.
When the stream is recovered, a timestamp appears in the Recovered column in the Streams page.
Use stream recovery for a MySQL source in a manual failover scenario
You can perform a manual failover and use stream recovery to avoid recreating your streams from scratch during maintenance or primary instance failure. Generally, Datastream doesn't support failovers to replicas because they break the binlog continuity, but you can follow these steps to recover the stream and ensure your change data is captured:
- Stop all writes to the primary instance.
- Make sure that data freshness metric is set to 0. This means that Datastream captured all changes and there are no new events to read from the source. For more information, see Monitor a stream.
- Fail over to the new database instance.
- If needed, update the stream's connection profile to the new database instance (for example, you might need to change the database hostname or IP address). For more information, see Modify connection profiles.
- Recover the stream from a specific position on the failover instance to ensure CDC continuity.
What's next
- To learn more about stream states, see Stream lifecycle.
- To learn how to view information about your stream, see View a stream.
- To learn how to monitor a stream, see Monitor a stream.
- To learn how to manage backfill for a stream, see Manage backfill for the objects of a stream.
- To learn how to delete an existing stream, see Delete a stream.