Source SQL Server database

This page contains information about:

  • How Datastream handles data that's pulled from a source SQL Server database.
  • The versions of SQL Server databases that Datastream supports.
  • Known limitations of using a SQL Server database as a source.

Behavior

Datastream tracks data manipulation language (DML) changes using one of the following change data capture (CDC) methods:

Change tables

The change tables CDC method lets you retain logs for a shorter time, and thus save storage space, but it supports lower throughput than the transaction logs method. It also has fewer limitations than the transaction logs method. For example, it eliminates the risk of log truncation causing streams to fail permanently, and it supports replication of encrypted tables. For more information, see Known limitations.

When using this CDC method, changes to the source are tracked using dedicated change tables. Transaction logs are still used, but to a limited extent, and you don't need to retain them for longer periods. As DML events are applied to the source tables, the changes are replicated to the corresponding change tables. The change tables have the same structure as the source tables, but with extra columns to include the metadata for the changes. Only committed transactions are added to the change tables, along with the log sequence number (LSN) of the commit operation.
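As a sketch of how this is set up on the source (the database, schema, and table names here are placeholders), CDC is enabled first at the database level and then per table, which creates the corresponding change table:

```sql
-- Enable CDC at the database level (requires sysadmin).
USE MyDatabase;
EXEC sys.sp_cdc_enable_db;

-- Enable CDC for a single source table (requires db_owner).
-- This creates a change table such as cdc.dbo_Orders_CT that mirrors the
-- source columns and adds metadata columns, including __$start_lsn
-- (the LSN of the commit) and __$operation (the type of DML change).
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;
```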

Transaction logs

When using this CDC method, Datastream reads changes in the source directly from transaction logs. This method requires fewer resources and allows for faster data retrieval, but has more limitations.

To prevent data loss, it's important that the logs aren't truncated before Datastream reads them. On the other hand, if you keep the log files for too long, they take up storage space, which might eventually cause the database instance to enter read-only mode.

To make sure that the CDC reader has enough time to read the logs while allowing log truncation to free up storage space, you need to apply additional configuration steps, such as changing polling intervals and setting up a truncation safeguard. These steps provide an additional layer of protection to ensure that Datastream can read the data even if there's downtime on the Datastream side or a connectivity issue between the source database and Datastream.

For detailed instructions on how to apply these additional measures, see the Configure a source SQL Server database page and select your database type.
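As an illustration of the kind of configuration involved (the interval value is illustrative only; follow the configuration page for your database type), on a self-managed instance you might shorten the CDC capture job's polling interval and check what is currently blocking log truncation:

```sql
-- Shorten the capture job's polling interval to 5 seconds so that log
-- records are harvested, and can be truncated, sooner.
EXEC sys.sp_cdc_change_job
    @job_type = N'capture',
    @pollinginterval = 5;

-- Check whether log truncation is currently blocked, and why.
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'MyDatabase';
```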

Versions

Datastream supports the following versions and editions of SQL Server databases:

  • Self-managed (on-premises or cloud-hosted) using the following versions:
    • Enterprise: 2008 and later
    • Standard: 2016 SP1 and later
    • Developer: 2008 and later
  • Amazon RDS for SQL Server
  • Azure SQL Database (tier S3 and above)
  • Cloud SQL for SQL Server

Datastream doesn't support the following versions of SQL Server databases:

  • SQL Server Standard edition from version 2008 to 2014
  • SQL Server Express
  • SQL Server Web

Known limitations

Known limitations of using a SQL Server database as a source include:

  • Streams are limited to 10,000 tables.
  • A table that has more than 500 million rows can't be backfilled unless the following conditions are met:
    1. The table has a unique index.
    2. None of the index columns are nullable.
    3. All columns of the index are included in the stream.
  • Databases with delayed durability or accelerated database recovery (ADR) enabled aren't supported.
  • Streaming changes to system tables isn't supported.
  • Windows Active Directory (AD) authentication isn't supported.
  • The following data types aren't supported, and aren't replicated to the destination:
    • SQL_VARIANT
    • HIERARCHYID
    • GEOMETRY
    • GEOGRAPHY
  • Datastream replicates user-defined data types; however, it's the base data type from which you derive your user-defined type that is stored in the destination. For example, if you define a USERNAME data type based on the VARCHAR(50) data type, the data is stored in the destination as VARCHAR(50).
  • Datastream doesn't support CDC for large object columns (TEXT, NTEXT, XML, IMAGE) and maximum variable length columns (VARCHAR(MAX), VARBINARY(MAX), NVARCHAR(MAX)) in tables without a unique index.

    If large object columns aren't included in the stream, CDC is supported.

  • Datastream doesn't support replicating source schema changes when using the change tables CDC method. The following schema changes might cause data corruption or failure to process the events:

    • Adding columns: the new columns aren't replicated to the destination.
    • Dropping columns: data in these columns is replaced with NULL values.
    • Modifying data types: Datastream tries to insert the data into the destination and raises an error if the data is rejected.
    • Renaming columns: not supported for SQL Server when CDC is enabled.
  • Datastream doesn't support data masking. Data is replicated without masking.

  • Datastream doesn't support replicating changes applied to the database using a data-tier application (DACPAC) package.

  • Datastream doesn't replicate changes made using the WRITETEXT or UPDATETEXT statements.

  • Datastream doesn't support replicating computed columns unless the column is marked PERSISTED.

  • Datastream doesn't support the PAGE, COLUMNSTORE, or COLUMNSTORE ARCHIVE compression types.
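Several of the conditions above can be checked with catalog queries. The following is a sketch, assuming a hypothetical table named dbo.Orders in a database named MyDatabase:

```sql
-- Backfill precondition: a unique index whose columns are all
-- non-nullable (is_nullable = 0 for every listed column).
SELECT i.name AS index_name, c.name AS column_name, c.is_nullable
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.Orders') AND i.is_unique = 1;

-- Delayed durability and accelerated database recovery must be off.
SELECT name, delayed_durability_desc, is_accelerated_database_recovery_on
FROM sys.databases
WHERE name = N'MyDatabase';

-- Computed columns are replicated only when marked PERSISTED;
-- the column definition here is purely illustrative.
ALTER TABLE dbo.Orders
  ADD total_with_tax AS (total * 1.2) PERSISTED;

-- Look for unsupported PAGE or columnstore compression on the table.
SELECT OBJECT_NAME(object_id) AS table_name, data_compression_desc
FROM sys.partitions
WHERE object_id = OBJECT_ID(N'dbo.Orders');
```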

Additional limitations when using the transaction logs method

If you use the transaction logs CDC method, the following additional limitations apply:

  • Transparent Data Encryption (TDE) isn't supported.
  • Column-level encryption isn't supported. Data in these columns is replaced with NULL values.
  • Datastream doesn't support the ROLLBACK TO SAVEPOINT statement. Such rollback events are ignored and aren't replicated to the destination.
  • Datastream doesn't support CDC for rows greater than 8 KB in the following types of tables:
    • Tables without a unique index
    • Tables containing only a unique, non-clustered index with one or more variable length columns (VARCHAR, VARBINARY, NVARCHAR)
  • Datastream doesn't support CDC for large object columns (TEXT, NTEXT, XML, IMAGE) in the following types of tables:

    • Tables without a unique index
    • Tables containing only a unique, non-clustered index with one or more variable length columns (VARCHAR, VARBINARY, NVARCHAR)

    If large object columns aren't included in the stream, CDC is only supported for such tables if they have valid indexes.
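Two of these conditions can also be inspected from the catalog. A sketch, again assuming a hypothetical database MyDatabase and table dbo.Orders:

```sql
-- TDE isn't supported with the transaction logs method:
-- is_encrypted = 1 means the database is TDE-encrypted.
SELECT name, is_encrypted
FROM sys.databases
WHERE name = N'MyDatabase';

-- Maximum in-row record size per index, in bytes, to help spot tables
-- whose rows may exceed the 8 KB limit described above.
SELECT OBJECT_NAME(object_id) AS table_name, index_id, max_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
    DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'DETAILED');
```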

What's next