Create a stream

In this section, you learn how to create a stream. Datastream uses this stream to transfer data from a source (Oracle, MySQL, SQL Server, PostgreSQL, Salesforce, or MongoDB) into BigQuery or Cloud Storage.

Creating a stream includes:

Defining settings for the stream.
Selecting the connection profile that you created for your Oracle, MySQL, SQL Server, PostgreSQL, Salesforce or MongoDB source (the source connection profile), or creating a source connection profile if you haven't created one.
Configuring information about the source database for the stream by specifying the tables and schemas in the source database that Datastream:
- Can transfer into the destination.
- Is restricted from transferring into the destination.
Determining whether Datastream will backfill historical data, as well as stream ongoing changes into the destination, or stream only changes to the data. When enabling historical backfill, you can optionally specify schemas and tables in the source database that Datastream shouldn't backfill into the destination.
Selecting the connection profile that you created for BigQuery or Cloud Storage (the destination connection profile), or creating a destination connection profile if you haven't created one.
Configuring information about the destination for the stream. This information includes:
- For BigQuery:
  - The datasets into which Datastream will replicate schemas, tables, and data from a source database.
- For Cloud Storage:
  - The folder of the destination bucket into which Datastream will transfer schemas, tables, and data from a source database.
Validating the stream to ensure that it will run successfully. Validating a stream checks:
- Whether the source is configured properly to allow Datastream to stream data from it.
- Whether the stream can connect to both the source and the destination.
- The end-to-end configuration of the stream.

Before you begin

Make sure that you've set up your source database for replication. For information about steps required for each supported source type, see Configure sources.
Find out what steps might be required to set up your selected destination. For more information, see Configure destinations.

Define settings for the stream

Go to the Streams page in the Google Cloud Console.

Click CREATE STREAM.

Use the following table to populate the fields of the Define stream details section of the Create stream page:

Field	Description
Stream name	Enter the display name of the stream.
Stream ID	Datastream populates this field automatically based on the stream name that you enter. You can keep the ID that's auto-generated or change it.
Region	Select the region where the stream is stored. Streams, like all resources, are saved in a region. Region selection doesn't impact whether your stream can connect to the source database or the destination, but can impact availability if the region experiences downtime. We recommend that you keep all resources for the stream in the same region as your source data to optimize cost and performance.
Source type	Select the profile type that you specified when you created a connection profile for your Oracle, MySQL, SQL Server, PostgreSQL, Salesforce, or MongoDB source. Or, if you haven't created a connection profile for the source, you can create one now.
Destination type	Select the profile type that you specified when you created a connection profile for a BigQuery or Cloud Storage destination. Or, if you haven't created a connection profile for the destination, you can create one now.
Encryption	By default, your data is encrypted with a key that's managed by Google Cloud. If you want to manage your encryption, then you can use a customer-managed encryption key (CMEK): Select the Cloud KMS key checkbox. From the Key type drop-down menu, select Cloud KMS and then select your CMEK. If you don't see your key, then click ENTER KEY RESOURCE NAME to provide the resource name of the key that you want to use. For example, you can enter `projects/my-project-name/locations/my-location/keyRings/my-keyring/cryptoKeys/my-key` in the Key resource name field, and then click SAVE. As part of creating the stream, Datastream will verify that the CMEK exists, and that Datastream has permissions to use the key. For more information about creating a CMEK or giving Datastream permissions for the key, see Use customer-managed encryption keys (CMEK).
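
The key resource name in the Encryption field follows the path layout shown in the example above. As a minimal sketch, the following Python snippet checks that a string has that layout before you paste it into the Key resource name field. The function name `parse_key_resource_name` and the permissive character matching are assumptions made for illustration; the snippet doesn't verify that the key exists or that Datastream has permission to use it.

```python
import re

# Layout taken from the example in the Encryption field description.
# Each segment is matched loosely (any characters except "/"); the exact
# character rules for each segment are not modeled in this sketch.
_KEY_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)"
    r"/cryptoKeys/(?P<key>[^/]+)$"
)


def parse_key_resource_name(name: str) -> dict:
    """Split a key resource name into project, location, key ring, and key."""
    match = _KEY_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a key resource name: {name!r}")
    return match.groupdict()


parts = parse_key_resource_name(
    "projects/my-project-name/locations/my-location"
    "/keyRings/my-keyring/cryptoKeys/my-key"
)
print(parts["location"], parts["key"])
```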

Optionally, use labels to organize your Datastream resources.
1. To create a label, click ADD LABEL, and then enter the key-value pair for the label.
2. To remove the label, click the trashcan icon to the right of the row that contains the label.
Note: If your source is SQL Server, you can configure your stream to replicate columns added to the source schema after the stream is created by adding the enable_ddl_support_for_ct label. For more information, see How Datastream handles DDL changes to the source schema.
Optionally, add alerting policies for your stream. Alerting policies define when and how you want to be notified about stream failures.
1. To create an alerting policy, click Add alerting policy.
2. The Create alerting policy page appears in Cloud Monitoring. On this page, you define the alerting policy in case your stream fails.
For more information about alerting policies, see Managing metric-based alerting policies.
Review the required prerequisites that are generated automatically to reflect how the environment must be prepared for a stream. These prerequisites can include how to configure the source database and how to connect it to the destination. It's best to complete these prerequisites at this step, but you can complete them at any time before you test the stream or start it. For more information about these prerequisites, see Sources.
Click CONTINUE. The Define connection profile panel of the Create stream page appears for your source database type.

Note: If your source database type is Oracle, then the panel name is Define Oracle connection profile. If you select a different source database, then that database name appears in the panel name.

Specify information about the source connection profile

If you have created a source connection profile for Oracle, MySQL, SQL Server, PostgreSQL, Salesforce or MongoDB, then select it from the list of connection profiles.

If you haven't created a source connection profile, then create one by clicking CREATE CONNECTION PROFILE at the bottom of the drop-down list, and then perform the same steps as in Create connection profiles.
Click RUN TEST to verify that the source database and Datastream can communicate with each other.

If the test fails, then the issue associated with the connection profile appears. Refer to the Diagnose issues page for troubleshooting steps. Make the necessary changes to correct the issue, and then retest.
Click CONTINUE. The Configure stream source panel of the Create stream page appears.

Configure information about the source database for the stream

Configure a PostgreSQL source database

Define replication properties for your PostgreSQL source database. In the Replication properties section, specify the following properties:

In the Replication slot name field, enter the name of the slot you created specifically for this stream. The database server uses this slot to send events to Datastream.
In the Publication name field, enter the name of the publication you created in the database. A publication is a group of all tables that you want to replicate changes from using this stream.

In the Select objects to include section, use the Objects to include drop-down menu to specify the tables and schemas in the source database that Datastream can transfer into the destination.

If you want Datastream to transfer all tables and schemas, then select All tables from all schemas.
If you want Datastream to transfer only specific tables and schemas, select Specific schemas and tables, and then select the checkboxes for the schemas and tables that you want Datastream to pull.

If you want to provide a textual definition of the tables and schemas that you want Datastream to transfer, select Custom, and then, in the Object matching criteria field, enter the schemas and tables that you want Datastream to pull. If your database has large numbers of tables and schemas, we recommend using the Custom option because some tables and schemas might not be included in the list of objects to pull.
Enter schemas and tables as a comma-separated list, using the [schema].[table] format, where both the schema and the table can be a wildcard (asterisk). You can't use partial wildcards.
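
To make the Custom format concrete, the following Python sketch parses a comma-separated list of [schema].[table] entries and rejects partial wildcards. It's an illustration only, not how Datastream parses the field: the function name `parse_object_criteria` is hypothetical, and the sketch assumes that schema and table names contain no dots, commas, or quoting.

```python
def parse_object_criteria(criteria: str) -> list:
    """Parse 'schema.table, schema.*' into a list of (schema, table) pairs."""
    pairs = []
    for item in criteria.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma or doubled separators
        schema, sep, table = item.partition(".")
        if not sep or not schema or not table or "." in table:
            raise ValueError(f"expected [schema].[table], got {item!r}")
        for part in (schema, table):
            # A lone asterisk is a full wildcard. An asterisk mixed with
            # other characters is a partial wildcard, which is rejected.
            if "*" in part and part != "*":
                raise ValueError(f"partial wildcard not allowed: {item!r}")
        pairs.append((schema, table))
    return pairs


print(parse_object_criteria("sales.orders, hr.*, *.audit_log"))
# [('sales', 'orders'), ('hr', '*'), ('*', 'audit_log')]
```

With this sketch, an entry such as `sales.ord*` raises a `ValueError` because the asterisk is combined with other characters.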

Optionally, expand the Select objects to exclude node. In the Objects to exclude field, enter the tables and schemas in the source database that you want to restrict Datastream from pulling. The Objects to exclude list takes precedence over the Objects to include list. If an object meets the criteria in both the include and exclude lists, the object is excluded from the stream.
You define these schemas and tables with the same comma-separated logic that you use if you're specifying schemas and tables that you want Datastream to pull.
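
The precedence rule for the two lists can be sketched in a few lines of Python. The helper names `matches` and `is_streamed` are hypothetical, and the sketch assumes exact, case-sensitive name comparison with `*` as the only wildcard; Datastream's own matching behavior might differ.

```python
def matches(pattern, obj) -> bool:
    """True if a (schema, table) pattern covers a concrete (schema, table)."""
    return all(p == "*" or p == o for p, o in zip(pattern, obj))


def is_streamed(obj, include, exclude) -> bool:
    """An object is streamed only if it's included and not excluded."""
    included = any(matches(p, obj) for p in include)
    excluded = any(matches(p, obj) for p in exclude)
    # The exclude list takes precedence over the include list.
    return included and not excluded


include = [("sales", "*")]
exclude = [("sales", "tmp_import")]
print(is_streamed(("sales", "orders"), include, exclude))      # True
print(is_streamed(("sales", "tmp_import"), include, exclude))  # False
```
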
Optionally, expand the Choose backfill mode for historical data node, and then make one of the following selections:

Select the Automatic option to stream all existing data, in addition to changes to the data, from the source into the destination. In the Objects excluded from automatic backfill field, enter the tables and schemas in the source database that you want to restrict Datastream from backfilling into the destination.
Select the Manual option to stream only changes to the data into the destination.

Click CONTINUE. The Define connection profile panel of the Create stream page appears for your destination type.

Configure a MySQL source database

In the Select objects to include section, use the Objects to include drop-down menu to specify the tables and schemas in the source database that Datastream can transfer into the destination.

If you want Datastream to transfer all tables and schemas, then select All tables from all schemas.
If you want Datastream to transfer only specific tables and schemas, select Specific schemas and tables, and then select the checkboxes for the schemas and tables that you want Datastream to pull.

If you want to provide a textual definition of the tables and schemas that you want Datastream to transfer, select Custom, and then, in the Object matching criteria field, enter the schemas and tables that you want Datastream to pull. If your database has large numbers of tables and schemas, we recommend using the Custom option because some tables and schemas might not be included in the list of objects to pull.
Enter schemas and tables as a comma-separated list, using the [schema].[table] format, where both the schema and the table can be a wildcard (asterisk). You can't use partial wildcards.

Optionally, expand the Select objects to exclude node. In the Objects to exclude field, enter the tables and schemas in the source database that you want to restrict Datastream from pulling. The Objects to exclude list takes precedence over the Objects to include list. If an object meets the criteria in both the include and exclude lists, the object is excluded from the stream.
You define these schemas and tables with the same comma-separated logic that you use if you're specifying schemas and tables that you want Datastream to pull.
Specify the CDC method for your stream:

GTID (Global Transaction IDs)-based replication: select this method if you want Datastream to support failovers and provide seamless replication irrespective of the changes in your database cluster.
Binlog-based replication: select this method to read and replicate changes directly from the binary log files of a selected database instance. You can't switch a running stream to a different source instance or a replica using this CDC method.

Optionally, expand the Choose backfill mode for historical data node, and then make one of the following selections:

Select the Automatic option to stream all existing data, in addition to changes to the data, from the source into the destination. In the Objects excluded from automatic backfill field, enter the tables and schemas in the source database that you want to restrict Datastream from backfilling into the destination.
Select the Manual option to stream only changes to the data into the destination.

Click CONTINUE. The Define connection profile panel of the Create stream page appears for your destination type.

Configure an Oracle source database

In the Select objects to include section, use the Objects to include drop-down menu to specify the tables and schemas in the source database that Datastream can transfer into the destination.

If you want Datastream to transfer all tables and schemas, then select All tables from all schemas.
If you want Datastream to transfer only specific tables and schemas, select Specific schemas and tables, and then select the checkboxes for the schemas and tables that you want Datastream to pull.

If you want to provide a textual definition of the tables and schemas that you want Datastream to transfer, select Custom, and then, in the Object matching criteria field, enter the schemas and tables that you want Datastream to pull. If your database has large numbers of tables and schemas, we recommend using the Custom option because some tables and schemas might not be included in the list of objects to pull.
Enter schemas and tables as a comma-separated list, using the [schema].[table] format, where both the schema and the table can be a wildcard (asterisk). You can't use partial wildcards. Unless specified explicitly, the Oracle predefined schemas and tables aren't pulled into the destination.

Optionally, expand the Select objects to exclude node. In the Objects to exclude field, enter the tables and schemas in the source database that you want to restrict Datastream from pulling. The Objects to exclude list takes precedence over the Objects to include list. If an object meets the criteria in both the include and exclude lists, the object is excluded from the stream.
You define these schemas and tables with the same comma-separated logic that you use if you're specifying schemas and tables that you want Datastream to pull.
Specify the CDC method for your stream:

LogMiner: select this method to query archived redo logs using the LogMiner API. LogMiner supports most options available for Oracle, such as encryption and compression options.
Binary reader (Preview): select this method to extract changes directly from Oracle logs. This allows for faster replication of large Oracle databases and reduced latency and overhead. When you select this option, an additional drop-down is shown. Select the way you want to access your Oracle log files:

Automatic Storage Management (ASM): select this option if your database uses ASM. Make sure that you selected the Enable ASM access for binary reader checkbox and filled in the details of your ASM instance in your source connection profile.
Database directories: select this option if you don't use ASM for storing your log files. If you select database directories, you need to provide the redo log and archived log directory names.

Optionally, expand the Choose backfill mode for historical data node, and then make one of the following selections:

Select the Automatic option to stream all existing data, in addition to changes to the data, from the source into the destination. In the Objects excluded from automatic backfill field, enter the tables and schemas in the source database that you want to restrict Datastream from backfilling into the destination.
Select the Manual option to stream only changes to the data into the destination.

Click CONTINUE. The Define connection profile panel of the Create stream page appears for your destination type.

Configure a SQL Server source database

In the Select objects to include section, use the Objects to include drop-down menu to specify the tables and schemas in the source database that Datastream can transfer into the destination.

If you want Datastream to transfer all tables and schemas, then select All tables from all schemas.
If you want Datastream to transfer only specific tables and schemas, select Specific schemas and tables, and then select the checkboxes for the schemas and tables that you want Datastream to pull.

If you want to provide a textual definition of the tables and schemas that you want Datastream to transfer, select Custom, and then, in the Object matching criteria field, enter the schemas and tables that you want Datastream to pull. If your database has large numbers of tables and schemas, we recommend using the Custom option because some tables and schemas might not be included in the list of objects to pull.
Enter schemas and tables as a comma-separated list, using the [schema].[table] format, where both the schema and the table can be a wildcard (asterisk). You can't use partial wildcards.

Optionally, expand the Select objects to exclude node. In the Objects to exclude field, enter the tables and schemas in the source database that you want to restrict Datastream from pulling. The Objects to exclude list takes precedence over the Objects to include list. If an object meets the criteria in both the include and exclude lists, the object is excluded from the stream.
You define these schemas and tables with the same comma-separated logic that you use if you're specifying schemas and tables that you want Datastream to pull.
Specify the CDC method for your stream:

Transaction logs: select this method to process changes directly from the database logs. This method provides the best performance and is more efficient, but requires additional configuration steps.
Change tables: select this method to process changes from dedicated change tables. This method is easier to configure and has fewer limitations, but supports lower throughput and produces higher load on your database than the transaction logs method.

Optionally, expand the Choose backfill mode for historical data node, and then make one of the following selections:

Select the Automatic option to stream all existing data, in addition to changes to the data, from the source into the destination. In the Objects excluded from automatic backfill field, enter the tables and schemas in the source database that you want to restrict Datastream from backfilling into the destination.
Select the Manual option to stream only changes to the data into the destination.

Click CONTINUE. The Define connection profile panel of the Create stream page appears for your destination type.

Configure a source Salesforce org

In the Select objects to include section, use the Objects to include drop-down menu to specify the objects and fields in your org that you want Datastream to transfer into the destination.

If you want Datastream to transfer all objects, then select All objects.
If you want Datastream to transfer only specific objects, select Specific objects, and then select the checkboxes for the objects that you want Datastream to pull.

If you want to provide a textual definition of the objects that you want Datastream to transfer, select Custom, and then, in the Object matching criteria field, enter the objects that you want Datastream to pull. Enter the objects as a comma-separated list, using the [object].[field] format. If your org has a large number of objects, we recommend using the Custom option because some objects might not be included in the list of objects to pull.

Define the polling interval value in minutes. Datastream uses this value to check your Salesforce org for data changes. The higher the value, the lower the cost and load on your org. The lower the value, the more up-to-date the data in the destination.
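
As a rough illustration of this trade-off, the following Python sketch computes how many times per day an org is checked for a given polling interval. It assumes exactly one check per interval, and the function name `polls_per_day` is made up for this example; it doesn't model how many Salesforce API calls a single check uses.

```python
MINUTES_PER_DAY = 24 * 60


def polls_per_day(interval_minutes: int) -> float:
    """Number of checks per day, assuming one check per polling interval."""
    if interval_minutes <= 0:
        raise ValueError("polling interval must be a positive number of minutes")
    return MINUTES_PER_DAY / interval_minutes


for interval in (5, 15, 60):
    # A change made just after a check waits up to one full interval
    # before the next check can pick it up.
    print(f"{interval} min interval: {polls_per_day(interval):.0f} checks per day")
```

Under these assumptions, moving from a 5-minute to a 60-minute interval reduces the number of daily checks from 288 to 24, at the price of data that can be up to an hour behind.
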
Optionally, expand the Select objects to exclude node. In the Objects to exclude field, enter the objects and fields in the source org that you want to restrict Datastream from pulling. The Objects to exclude list takes precedence over the Objects to include list. If an object meets the criteria in both the include and exclude lists, the object is excluded from the stream.
You define these objects and fields with the same comma-separated logic that you use for including objects.
Optionally, expand the Choose backfill mode for historical data node, and then make one of the following selections:

Select the Automatic option to replicate all existing data, in addition to changes to the data, from the source into the destination. In the Objects excluded from automatic backfill field, enter the objects in the source org that you want to restrict Datastream from backfilling into the destination.
Select the Manual option to replicate only changes to the data into the destination.

Click CONTINUE. The Define connection profile panel of the Create stream page appears for your destination type.

Configure a source MongoDB database

In the Select objects to include section, use the Objects to include drop-down menu to specify the databases and collections that you want Datastream to transfer into the destination.

If you want Datastream to transfer all collections, then select All databases and collections.
If you want Datastream to transfer only specific collections, select Specific databases and collections, and then select the checkboxes for the collections that you want Datastream to pull.

If you want to provide a textual definition of the collections that you want Datastream to transfer, select Custom, and then, in the Object matching criteria field, enter the collections that you want Datastream to pull. Enter the collections as a comma-separated list, using the [collection].[field] format. If your database has a large number of collections, we recommend using the Custom option because some collections might not be included in the list of objects to pull.

Optionally, expand the Choose backfill mode for historical data node, and then make one of the following selections:

Select the Automatic option to replicate all existing data, in addition to changes to the data, from the source into the destination. In the Objects excluded from automatic backfill field, enter the collections in the source database that you want to restrict Datastream from backfilling into the destination.
Select the Manual option to replicate only changes to the data into the destination.

Click CONTINUE. The Define connection profile panel of the Create stream page appears for your destination type.

Select a destination connection profile

If you have created a destination connection profile, then select it from the list of connection profiles.

If you haven't created a destination connection profile, then create one by clicking CREATE CONNECTION PROFILE at the bottom of the drop-down list, and then perform the same steps as in Create connection profiles.

Note: For a BigLake Iceberg tables destination, use or create a BigQuery connection profile.
Click CONTINUE. The Configure stream destination panel of the Create stream page appears.

Configure information about the destination for the stream

This section describes the information that you need to provide to configure your BigQuery, BigLake Iceberg tables, or Cloud Storage destination.

BigQuery destination

In the Specify how Datastream should stream into a BigQuery dataset section, select one of the following options from the Schema grouping drop-down:
- Dataset for each schema: Datastream creates a BigQuery dataset for each source schema, based on the schema name.
  
  If you select this option, then for every schema in the source database, Datastream creates a dataset in the project that contains the stream.
  
  In addition, if you select the Dataset for each schema option, you need to provide the following:
  1. In the Location type section, specify where you want to store your data in BigQuery. Select Region to specify a single geographical location, or Multi-region to specify a larger area that contains two or more regions. If your priority is to optimize costs and performance, select Region, and keep your datasets alongside other Google Cloud resources that you manage. If you want to achieve higher quota limits, select Multi-region to let BigQuery use multiple regions within a group. When you select Region, the field defaults to the region that you selected for the stream.
  2. (Optional) In the Dataset prefix field, define a prefix for the dataset created by the stream. The string that you specify is added to the source schema name. For example, if the schema is called <mySchema>, and you define the dataset prefix as <myPrefix>, Datastream creates a dataset called <myPrefix><mySchema>.
  3. Expand the Show advanced encryption options section.
  4. Specify how you want your data to be encrypted in BigQuery by selecting a Google-managed or customer-managed encryption key.
    
    Note: The Dataset for each schema option isn't supported for Salesforce sources.
- Single dataset for all schemas: Datastream creates all the tables inside the existing BigQuery dataset that you specify. Each table that Datastream creates is named using a combination of the source schema name and table name, separated by an underscore (for example, <mySchemaName>_<myTableName>). The datasets available are grouped by region. You can also create a new dataset.
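
The naming rules for the two schema grouping options can be sketched as follows. The function names are hypothetical, and the sketch reproduces only the concatenation described above; it doesn't model any character restrictions that BigQuery applies to dataset or table names.

```python
def dataset_for_schema(schema: str, prefix: str = "") -> str:
    """Dataset for each schema: the optional prefix is added before the schema name."""
    return f"{prefix}{schema}"


def table_in_single_dataset(schema: str, table: str) -> str:
    """Single dataset for all schemas: schema and table joined by an underscore."""
    return f"{schema}_{table}"


print(dataset_for_schema("mySchema", prefix="myPrefix"))       # myPrefixmySchema
print(table_in_single_dataset("mySchemaName", "myTableName"))  # mySchemaName_myTableName
```

Because the prefix is concatenated without a separator in this sketch, a prefix that ends in an underscore (for example, `staging_`) keeps the resulting dataset names readable.
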
Specify the Stream write mode. Select your write mode to define how data is written and processed in BigQuery:
- Merge: select this option if you want your data to be synchronized with the source.
- Append-only: select this option if you want to keep all historical data for your change events.
Specify the limit of data staleness. This option is available only when you select the Merge mode in the preceding step. Select a limit to balance BigQuery query performance and cost against data freshness. BigQuery applies the changes in the background on an ongoing basis, or at query run time, according to the staleness limit. A lower staleness limit (fresher data) might increase BigQuery processing costs.
If you select 0 minutes, BigQuery always returns query results based on the freshest data, which results in the highest query cost.
Click CONTINUE. The Review stream details and create panel of the Create stream page appears.

BigLake Iceberg tables destination

In the Specify how Datastream should stream into a BigQuery dataset section, select how you want Datastream to create new schemas and tables from the Schema grouping drop-down:
- Dataset for each schema: Datastream creates a BigQuery dataset for each source schema, based on the schema name.
  
  If you select this option, then for every schema in the source database, Datastream creates a dataset in the project that contains the stream.
  
  In addition, if you select the Dataset for each schema option, you need to provide the following:
  1. In the Location type section, specify where you want to store your data in BigQuery. Select Region to specify a single geographical location, or Multi-region to specify a larger area that contains two or more regions. If your priority is to optimize costs and performance, select Region, and keep your datasets alongside other Google Cloud resources that you manage. If you want to achieve higher quota limits, select Multi-region to let BigQuery use multiple regions within a group. When you select Region, the field defaults to the region that you selected for the stream.
  2. (Optional) In the Dataset prefix field, define a prefix for the dataset created by the stream. The string that you specify is added to the source schema name. For example, if the schema is called <mySchema>, and you define the dataset prefix as <myPrefix>, Datastream creates a dataset called <myPrefix><mySchema>.
  3. Expand the Show advanced encryption options section.
  4. Specify how you want your data to be encrypted in BigQuery by selecting a Google-managed or customer-managed encryption key.
    
    Note: The Dataset for each schema option isn't supported for Salesforce sources.
- Single dataset for all schemas: Datastream creates all the tables inside the existing BigQuery dataset that you specify. Each table that Datastream creates is named using a combination of the source schema name and table name, separated by an underscore (for example, <mySchemaName>_<myTableName>). The datasets available are grouped by region. You can also create a new dataset.
In the Define Datastream connection to BigLake Iceberg tables section, provide the following information:
- From the Connection ID drop-down, select the identifier of the BigQuery connection that you created and that you want to use for your stream. For information about creating a Cloud resource connection in BigQuery, see Configure BigLake Iceberg tables in BigQuery.
  
  Note: To prevent your stream from failing, make sure that the connection ID has write permissions to the underlying Cloud Storage bucket.
- Enter or select the URI address of the bucket that you created in Cloud Storage. For information about creating a Cloud Storage bucket, see Create a bucket.
Review the File format and Table format. Datastream supports the Parquet file format and the Iceberg table format.
Review the Stream write mode. Datastream only supports the Append-only write mode when streaming to BigLake Iceberg tables.
Click CONTINUE. The Review stream details and create panel of the Create stream page appears.

Cloud Storage destination

Optionally, in the Stream path prefix field, enter the folder of the destination bucket into which Datastream will transfer schemas, tables, and data from a source database.

For example, if you want Datastream to transfer data from the source database into the /root/example folder in the destination bucket, then enter /root/example in the Stream path prefix field.
In the Output format field, choose the format of files written to Cloud Storage.
Datastream supports two output formats: Avro and JSON.
Optional. If you select the JSON format, then two checkboxes appear:
1. Include a Unified Types schema file in the file path: If you select this checkbox, then Datastream writes two files to Cloud Storage: a JSON data file and an Avro schema file. The schema file has the same name as the data file, with a .schema extension. If you don't select this checkbox, then Datastream writes only the JSON data file to Cloud Storage. By default, this checkbox isn't selected.
2. Enable gzip compression: If you select this checkbox, then Datastream uses the gzip utility to compress the files that Datastream writes to Cloud Storage. If you don't select this checkbox, then Datastream writes the files to Cloud Storage without compressing them. By default, this checkbox is selected.
Click CONTINUE. The Review stream details and create panel of the Create stream page appears.

Create the stream

Verify details about the stream as well as the source and destination connection profiles that the stream will use to transfer data from a source database into a destination.

Click RUN VALIDATION to validate the stream. By validating a stream, Datastream checks that the source is configured properly, validates that the stream can connect to both the source and the destination, and verifies the end-to-end configuration of the stream.

If your source database is Oracle, then Datastream performs the following checks:

Check	Description
Objects list validity	Datastream verifies that the exclude list doesn't subsume the include list.
Forward SSH tunnel connectivity	Datastream verifies that it can connect to a bastion host through a Forward SSH tunnel.
Connectivity to Oracle database	Datastream verifies that it can connect to the source Oracle database.
Oracle user permissions	Datastream verifies that the user that it uses to connect to the source database has all of the necessary permissions to retrieve schemas, tables, and data from the database so that Datastream can stream this information into the destination.
Logging mode configuration	Datastream verifies that the logging mode for the Oracle database is set to ARCHIVELOG.
Supplemental logging configuration	Datastream verifies that supplemental logging is turned on for the database tables that are being streamed from the source into the destination.
Oracle CDC configuration	Datastream verifies that the CDC method that you selected complies with your database configuration.
Archive log files configuration	Datastream verifies that archive logging is configured in the source, and that archive log files are present.
Cloud Storage permissions (for Cloud Storage destinations)	Datastream verifies that it has the necessary permissions to write to the destination bucket in Cloud Storage.
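
To illustrate the objects list validity check, the sketch below treats the include and exclude lists as collections of hypothetical `(schema, table)` pairs and flags the configuration when every included object is also excluded. This is only one reading of "subsume". Datastream's actual implementation isn't described here and may treat schema-level entries or wildcards differently.

```python
def exclude_subsumes_include(include, exclude):
    """Return True if every included (schema, table) pair is also excluded."""
    include_set = set(include)
    exclude_set = set(exclude)
    # An empty include list has nothing to subsume, so it is not flagged.
    return bool(include_set) and include_set <= exclude_set
```

For example, including `("HR", "EMPLOYEES")` while excluding both `("HR", "EMPLOYEES")` and `("HR", "JOBS")` would leave nothing to stream, so the function returns `True`.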

If your source database is MySQL, then Datastream performs the following checks:

Check	Description
Replication permissions	Datastream verifies that it has permissions to replicate a MySQL database.
Database version	Datastream verifies that the MySQL database version is supported. For information about supported versions, see Versions.
Binary log format configuration	Datastream verifies that the binary log format of the MySQL database is set to `ROW`.
Binary logging is enabled	Datastream verifies that the binary log files are configured correctly, and that there are log files present.
Binary log row image configuration	Datastream verifies that the binlog row image is set to `FULL`.
Binlog compression disabled	Datastream verifies that the binlog compression is disabled.
Log replica or slave updates enabled	Datastream verifies that the log replica or slave updates are enabled.
GTID mode configuration (`applies only to GTID-based CDC`)	Datastream verifies that the GTID mode is set to `ON` if GTID-based replication is used.
GtidSet doesn't contain tagged GTIDs (`applies only to GTID-based CDC`)	Datastream verifies that the GTID set doesn't contain any tagged GTIDs.
Cloud Storage permissions (for Cloud Storage destinations)	Datastream verifies that it has the necessary permissions to write to the destination bucket in Cloud Storage.

Warnings check	Description
Expire logs days or Expire logs seconds or Binlog retention hours configuration	Datastream verifies that `expire_logs_days`, `binlog_expire_logs_seconds`, and `binlog retention hours` are configured to values greater than `7`, `604800`, and `168`, respectively.
Max allowed packet configuration	Datastream verifies that `max_allowed_packet` is set to 1 GB.
Enforce GTID consistency or GTID strict mode (`applies only to GTID-based CDC`)	Datastream verifies that the GTID consistency or GTID strict mode is enabled.
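
Some of these settings can be checked before you run validation. The sketch below is an illustration, not Datastream's own logic. It takes a dictionary of MySQL system variables (for example, values collected from `SHOW VARIABLES`) and compares a few of them against the values listed in the tables above. The function name and the split into errors and warnings are assumptions made for this example.

```python
ONE_GIB = 1024 ** 3  # 1 GB expressed in bytes, as max_allowed_packet is reported.
SEVEN_DAYS_SECONDS = 604800


def check_mysql_variables(variables):
    """Return (errors, warnings) for a dict of MySQL variable names to values."""
    errors, warnings = [], []
    if str(variables.get("binlog_format", "")).upper() != "ROW":
        errors.append("binlog_format must be ROW")
    if str(variables.get("binlog_row_image", "")).upper() != "FULL":
        errors.append("binlog_row_image must be FULL")
    # A missing max_allowed_packet is treated as 0 and therefore produces a warning.
    if int(variables.get("max_allowed_packet", 0)) != ONE_GIB:
        warnings.append("max_allowed_packet should be 1 GB")
    # The retention check runs only when the variable is present.
    retention = variables.get("binlog_expire_logs_seconds")
    if retention is not None and int(retention) <= SEVEN_DAYS_SECONDS:
        warnings.append("binlog_expire_logs_seconds should be greater than 604800")
    return errors, warnings
```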

If your source database is PostgreSQL, then Datastream performs the following checks:

Check	Description
Connectivity to PostgreSQL database	Datastream verifies that it can connect to the source PostgreSQL database.
Logical decoding is enabled	Datastream verifies that the `wal_level` parameter of the PostgreSQL database is set to `logical`.
Replication slot configuration	Datastream verifies whether the PostgreSQL replication slot exists and is active, whether Datastream has the permissions required to access it, and whether it is configured correctly.
Publication configuration	Datastream verifies that the publication exists and that the tables to replicate for the stream are included in the publication.
Backfill permissions	Datastream verifies that tables required to backfill historical data exist and that it has the necessary permissions to read them.

If your source database is SQL Server, then Datastream performs the following checks:

Check	Description
SQL Server edition	Datastream verifies that the source SQL Server database edition is supported. For information about supported editions, see Versions.
SQL Server user permissions	Datastream verifies that the user has the permissions required to access the source SQL Server database.
SQL Server CDC is enabled	Datastream verifies whether CDC is enabled for the database and all tables included in the stream.
SQL Server database validations	Datastream checks whether the source database is configured properly, whether all replication requirements are met, and whether any columns use unsupported data types.
Cloud Storage or BigQuery permissions	Datastream verifies that the user has the necessary permissions to write to the destination.

If your source is Salesforce, then Datastream performs the following checks:

Check	Description
Credentials login	Datastream verifies that it can authenticate with your Salesforce org using the provided credentials.
Bulk API	Datastream verifies that it has the necessary permissions to access the Salesforce Bulk API 2.0. Bulk API 2.0 is required to read large amounts of data.

Warnings check	Description
API limits	Datastream estimates the expected API usage based on the number of objects and the polling interval that you selected. If the expected daily number of requests is higher than the limit that Salesforce enforces for your org, then Datastream issues a warning. You can still create and run the stream, even if this validation check doesn't pass.
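
The exact formula that Datastream uses for this estimate isn't given here. As a back-of-the-envelope sketch, the example below assumes that each selected object is polled once per polling interval and that each poll costs a fixed number of API requests. The function names and the `requests_per_poll` parameter are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def estimate_daily_requests(object_count, polling_interval_seconds, requests_per_poll=1):
    """Estimate API requests per day, assuming one poll per object per interval."""
    if polling_interval_seconds <= 0:
        raise ValueError("polling_interval_seconds must be positive")
    polls_per_day = SECONDS_PER_DAY / polling_interval_seconds
    return math.ceil(object_count * polls_per_day * requests_per_poll)


def exceeds_daily_limit(object_count, polling_interval_seconds, daily_limit):
    """Return True if the estimated usage is higher than the daily limit."""
    return estimate_daily_requests(object_count, polling_interval_seconds) > daily_limit
```

Under these assumptions, 10 objects polled every 300 seconds come to 10 × 288 = 2,880 requests per day.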

If your source database is MongoDB, then Datastream performs the following checks:

Check	Description
User permissions	Datastream verifies that the `readAnyDatabase` role is assigned to the user.
Connection configurations	Datastream uses the `hello` command to verify that the connection configurations are valid.
MongoDB version	Datastream verifies that the MongoDB version is 5.0 or later.
Cloud Storage or BigQuery permissions	Datastream verifies that the user has the necessary permissions to write to the destination.

After all validation checks pass, click CREATE & START to create the stream and start it immediately, or CREATE to create the stream without starting it immediately.

If you don't start the stream now, then you can start it from the Streams page by clicking START.

Note: Starting a stream can place additional load on the source database. To defer that load, create the stream without starting it, and then start the stream later when the source database can handle the additional load.
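
A stream that was created without being started can also be started later through the Datastream REST API by patching its `state` field to `RUNNING`. The sketch below only assembles the pieces of such a request. The resource IDs are placeholders, the helper name is hypothetical, and authentication and the HTTP call itself are omitted.

```python
import json
from urllib.parse import urlencode


def build_start_stream_request(project, location, stream_id):
    """Describe a PATCH request that sets an existing stream's state to RUNNING."""
    name = f"projects/{project}/locations/{location}/streams/{stream_id}"
    # updateMask limits the update to the state field only.
    query = urlencode({"updateMask": "state"})
    return {
        "method": "PATCH",
        "url": f"https://datastream.googleapis.com/v1/{name}?{query}",
        "body": json.dumps({"state": "RUNNING"}),
    }
```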

After creating a stream, you can view high-level and detailed information about it.

What's next

To learn more about streams, see Stream lifecycle.
To learn how to run a stream, see Run a stream.
To learn how to modify your streams, see Modify a stream.
To learn how to recover a stream, see Recover a stream.