Create a stream


Overview

In this section, you learn how to create a stream. Datastream uses this stream to transfer data from a source Oracle, MySQL or PostgreSQL (Preview) database into BigQuery (Preview) or Cloud Storage.

Creating a stream includes:

  • Defining settings for the stream.
  • Selecting the connection profile that you created for your source Oracle, MySQL or PostgreSQL database (the source connection profile), or creating a source connection profile if you haven't created one.
  • Configuring information about the source database for the stream by specifying the tables and schemas in the source database that Datastream:
    • Can transfer into the destination.
    • Is restricted from transferring into the destination.
  • Determining whether Datastream backfills historical data in addition to streaming ongoing changes into the destination, or streams only ongoing changes. When you enable historical backfill, you can optionally specify schemas and tables in the source database that Datastream shouldn't backfill into the destination.

  • Selecting the connection profile that you created for BigQuery or Cloud Storage (the destination connection profile), or creating a destination connection profile if you haven't created one.

  • Configuring information about the destination for the stream. This information includes:

    • For BigQuery:
      • The datasets into which Datastream will replicate schemas, tables, and data from a source database.
    • For Cloud Storage:
      • The folder of the destination bucket into which Datastream will transfer schemas, tables, and data from a source database.
  • Validating the stream to ensure that it will run successfully. Validating a stream checks:

    • Whether the source is configured properly to allow Datastream to stream data from it.
    • Whether the stream can connect to both the source and the destination.

    • The end-to-end configuration of the stream.

Define settings for the stream

  1. Go to the Streams page in the Google Cloud Console.

    Go to the Streams page

  2. Click CREATE STREAM.

  3. Use the following table to populate the fields of the Define stream details section of the Create stream page:

    • Stream name: Enter the display name of the stream.
    • Stream ID: Datastream populates this field automatically based on the stream name that you enter. You can keep the auto-generated ID or change it.
    • Region: Select the region where the stream is stored. Streams, like all resources, are saved in a region. Region selection doesn't impact whether your stream can connect to the source database or the destination, but can impact availability if the region experiences downtime.
    • Source type: Select the profile type that you specified when you created a connection profile for a source Oracle, MySQL or PostgreSQL database. Or, if you haven't created a connection profile for the source database, you can create one now.
    • Destination type: Select the profile type that you specified when you created a connection profile for a BigQuery or Cloud Storage destination. Or, if you haven't created a connection profile for the destination, you can create one now.
    • Encryption: By default, your data is encrypted with a key that's managed by Google Cloud. If you want to manage your encryption, then you can use a customer-managed encryption key (CMEK). To do so:

      1. Select the Use a CMEK check box.
      2. From the Select a CMEK drop-down menu, select your CMEK.

      If you don't see your key, then click ENTER KEY RESOURCE NAME to provide the resource name of the key that you want to use. For example, you can enter projects/my-project-name/locations/my-location/keyRings/my-keyring/cryptoKeys/my-key in the Key resource name field, and then click SAVE.

  4. Optionally, use labels to organize your Datastream resources.

    1. To create a label, click ADD LABEL, and then enter the key-value pair for the label.
    2. To remove a label, click the trash can icon to the right of the row that contains the label.
  5. Review the required prerequisites that are generated automatically to reflect how the environment must be prepared for a stream. These prerequisites can include how to configure the source database and how to connect it to the destination. It's best to complete these prerequisites at this step, but you can complete them at any time before you test the stream or start it. For more information about these prerequisites, see Configure your database.

  6. Click CONTINUE. The Define connection profile panel of the Create stream page appears for your source database type.
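If you use a CMEK, the key resource name follows a fixed format. As a minimal sketch (the project, location, key ring, and key names are placeholders), it can be assembled like this:

```python
def cmek_resource_name(project: str, location: str, key_ring: str, key: str) -> str:
    """Build the fully qualified resource name of a customer-managed key."""
    return (f"projects/{project}/locations/{location}"
            f"/keyRings/{key_ring}/cryptoKeys/{key}")

# Placeholder values matching the example in the Encryption field description:
print(cmek_resource_name("my-project-name", "my-location", "my-keyring", "my-key"))
# → projects/my-project-name/locations/my-location/keyRings/my-keyring/cryptoKeys/my-key
```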

Specify information about the source connection profile

  1. If you have created a source connection profile for Oracle, MySQL or PostgreSQL, then select it from the list of connection profiles.

    If you haven't created a source connection profile, then create one by clicking CREATE CONNECTION PROFILE at the bottom of the drop-down list, and then perform the same steps as in Create connection profiles.

  2. Click RUN TEST to verify that the source database and Datastream can communicate with each other.

    If the test fails, then the issue associated with the connection profile appears. Make the necessary changes to correct the issue, and then retest.

  3. Click CONTINUE. The Configure stream source panel of the Create stream page appears.

Configure information about the source database for the stream

Configuring PostgreSQL source database

If you selected PostgreSQL as your source database, you need to define its replication properties.

  1. In the Replication properties section, specify the following properties:
    1. In the Replication slot name field, enter the name of the slot you created specifically for this stream. The database server uses this slot to send events to Datastream.
    2. In the Publication name field, enter the name of the publication you created in the database. A publication is a group of all tables that you want to replicate changes from using this stream.
  2. Click Continue. Follow the steps described in the Configuring source databases section below.
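The replication slot and publication that you name in these fields must already exist in the source database. As a hedged sketch (the names datastream_publication and datastream_slot and the table list are placeholders; see Configure your database for the exact setup steps), they can be created with:

```sql
-- Placeholder names: use the values that you enter in the console.
-- The publication groups the tables whose changes this stream replicates.
CREATE PUBLICATION datastream_publication
    FOR TABLE schema1.table1, schema1.table2;

-- A logical replication slot using the built-in pgoutput plugin.
SELECT pg_create_logical_replication_slot('datastream_slot', 'pgoutput');
```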

Configuring source databases

  1. In the Select objects to include section, use the Objects to include drop-down menu to specify the tables and schemas in the source database that Datastream can transfer into the destination.

    1. If you want Datastream to transfer all tables and schemas, then select All tables from all schemas.
    2. If you want Datastream to transfer only specific tables and schemas, select Specific schemas and tables, and then select the check boxes for the schemas and tables that you want Datastream to pull.

    3. If you want to provide a textual definition of the tables and schemas that you want Datastream to transfer, select Custom manual definition, and then, in the Object matching criteria field, enter the schemas and tables that you want Datastream to pull.

  2. Optionally, expand the Select objects to exclude node. In the Objects to exclude field, enter the tables and schemas in the source database that you want to restrict Datastream from transferring into the destination.

  3. Optionally, expand the Choose backfill mode for historical data node, and then make one of the following selections:
    1. Select the Automatic option to stream all existing data, in addition to changes to the data, from the source into the destination.

      In the Objects excluded from automatic backfill field, enter the tables and schemas in the source database that you want to restrict Datastream from backfilling into the destination.

    2. Select the Manual option to stream only changes to the data into the destination.
  4. Click CONTINUE. The Define connection profile panel of the Create stream page appears for your destination type.
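Datastream's exact syntax for the Object matching criteria field is defined in its own reference documentation. Purely as an illustration of the idea of a textual object definition, a wildcard matcher over comma-separated schema.table patterns might look like:

```python
from fnmatch import fnmatch

def matches(criteria: str, schema: str, table: str) -> bool:
    """Illustrative only: comma-separated schema.table patterns with *
    wildcards. Datastream's real matching rules may differ."""
    qualified = f"{schema}.{table}"
    return any(fnmatch(qualified, pattern.strip())
               for pattern in criteria.split(","))

print(matches("hr.*, sales.orders", "hr", "employees"))    # → True
print(matches("hr.*, sales.orders", "finance", "ledger"))  # → False
```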

Select a destination connection profile

  1. If you have created a destination connection profile, then select it from the list of connection profiles.

    If you haven't created a destination connection profile, then create one by clicking CREATE CONNECTION PROFILE at the bottom of the drop-down list, and then perform the same steps as in Create connection profiles.

  2. Click CONTINUE. The Configure stream destination panel of the Create stream page appears.

Configure information about the destination for the stream

BigQuery destination

  1. Under Configure the connection from Datastream to BigQuery, specify how Datastream should stream into a BigQuery dataset. Select one of the following options:

    • Dataset for each schema: Datastream creates a BigQuery dataset for each source schema, based on the schema name.

      If you select this option, then for every schema in the source database, Datastream creates a dataset in the project that contains the stream.

      In addition, when selecting the Dataset for each schema option:

      1. In the Location field, enter the region where you want the new dataset to be created.
      2. (Optional) In the Prefix field, define a prefix for the dataset created by the stream. The string that you specify is added to the source schema name. For example, if the schema is called <mySchema>, and you define the dataset prefix as <myPrefix>, Datastream creates a dataset called <myPrefix><mySchema>.
      3. Specify how you want your data to be encrypted in BigQuery by selecting a Google-managed or customer-managed encryption key.
    • Single dataset for all schemas: Datastream creates all the tables inside the existing BigQuery dataset that you specify. Each table that Datastream creates is named using a combination of the source schema name and table name, separated by an underscore (for example, <mySchemaName>_<myTableName>).

  2. Specify the data staleness limit: select the maximum staleness that your data can have when queried. Lower staleness (fresher data) may increase the processing time and cost of your queries.

  3. Click CONTINUE. The Review stream details and create panel of the Create stream page appears.
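The two naming schemes above reduce to simple string composition. A short sketch, with placeholder schema, table, and prefix values:

```python
def dataset_per_schema(schema: str, prefix: str = "") -> str:
    """Dataset for each schema: an optional prefix prepended to the schema name."""
    return f"{prefix}{schema}"

def single_dataset_table(schema: str, table: str) -> str:
    """Single dataset for all schemas: schema and table joined by an underscore."""
    return f"{schema}_{table}"

print(dataset_per_schema("mySchema", prefix="myPrefix"))    # → myPrefixmySchema
print(single_dataset_table("mySchemaName", "myTableName"))  # → mySchemaName_myTableName
```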

Cloud Storage destination

  1. Optionally, in the Stream path prefix field, enter the folder of the destination bucket into which Datastream will transfer schemas, tables, and data from a source database.

    For example, if you want Datastream to transfer data from the source database into the /root/example folder in the destination bucket, then enter /root/example in the Stream path prefix field.

  2. In the Output format field, choose the format of files written to Cloud Storage.

  3. Optional. If you select the JSON format, then two check boxes appear:

    1. Include a Unified Types schema file in the file path: If you select this check box, then Datastream writes two files to Cloud Storage: a JSON data file and an Avro schema file. The schema file has the same name as the data file, with a .schema extension. If you don't select this check box, then Datastream writes only the JSON data file to Cloud Storage. By default, this check box isn't selected.
    2. Enable gzip compression: If you select this check box, then Datastream uses the gzip utility to compress the files that Datastream writes to Cloud Storage. If you don't select this check box, then Datastream writes the files to Cloud Storage without compressing them. By default, this check box is selected.
  4. Click CONTINUE. The Review stream details and create panel of the Create stream page appears.
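The stream path prefix and the schema file naming described above can be sketched as follows. This assumes the .schema extension is appended to the full data file name; the exact layout that Datastream produces under the prefix isn't covered here.

```python
def object_path(stream_path_prefix: str, file_name: str) -> str:
    """Join the stream path prefix and a file name into a bucket object path."""
    return f"{stream_path_prefix.rstrip('/')}/{file_name}"

def schema_file_name(data_file_name: str) -> str:
    """Companion Avro schema file: same name as the data file, plus .schema."""
    return f"{data_file_name}.schema"

print(object_path("/root/example", "data.jsonl"))  # → /root/example/data.jsonl
print(schema_file_name("data.jsonl"))              # → data.jsonl.schema
```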

Create the stream

  1. Verify details about the stream as well as the source and destination connection profiles that the stream will use to transfer data from a source database into a destination.
  2. Click RUN VALIDATION to validate the stream. By validating a stream, Datastream checks that the source is configured properly, validates that the stream can connect to both the source and the destination, and verifies the end-to-end configuration of the stream.

    1. If your source database is Oracle, then Datastream performs the following checks:

      • Objects list validity: Datastream verifies that the exclude list doesn't subsume the include list.
      • Forward SSH tunnel connectivity: Datastream verifies that it can connect to a bastion host through a Forward SSH tunnel.
      • Connectivity to Oracle database: Datastream verifies that it can connect to the source Oracle database.
      • Oracle user permissions: Datastream verifies that the user that it uses to connect to the source database has all of the necessary permissions to retrieve schemas, tables, and data from the database so that Datastream can stream this information into the destination.
      • Logging mode configuration: Datastream verifies that the logging mode for the Oracle database is set to ARCHIVELOG.
      • Supplemental logging configuration: Datastream verifies that supplemental logging is turned on for the database tables that are being streamed from the source into the destination.
      • Archive log files configuration: Datastream verifies that archive logging is configured in the source, and that archive log files are present.
      • Cloud Storage permissions (for Cloud Storage destinations): Datastream verifies that it has the necessary permissions to write to the destination bucket in Cloud Storage.
    2. If your source database is MySQL, then Datastream performs the following checks:

      • Objects list validity: Datastream verifies that the exclude list doesn't subsume the include list.
      • Forward SSH tunnel connectivity: Datastream verifies that it can connect to a bastion host through a Forward SSH tunnel.
      • Connectivity to MySQL database: Datastream verifies that it can connect to the source MySQL database.
      • Binary logging is enabled: Datastream verifies that the binary log files are configured correctly, and that there are log files present.
      • Binary log format configuration: Datastream verifies that the binary log format of the MySQL database is set to ROW.
      • Replication permissions: Datastream verifies that it has permissions to replicate a MySQL database.
      • Cloud Storage permissions (for Cloud Storage destinations): Datastream verifies that it has the necessary permissions to write to the destination bucket in Cloud Storage.
    3. If your source database is PostgreSQL (Preview), then Datastream performs the following checks:

      • Connectivity to PostgreSQL database: Datastream verifies that it can connect to the source PostgreSQL database.
      • Logical decoding is enabled: Datastream verifies that the wal_level parameter of the PostgreSQL database is set to logical.
      • Replication slot configuration: Datastream verifies whether the PostgreSQL replication slot exists and is active, whether Datastream has the permissions required to access it, and whether it is configured correctly.
      • Publication configuration: Datastream verifies that the publication exists and that the tables to replicate for the stream are included in the publication.
      • Backfill permissions: Datastream verifies that the tables required to backfill historical data exist and that it has the necessary permissions to read them.

    4. After all validation checks pass, click CREATE & START to create the stream and start it immediately, or CREATE to create the stream without starting it immediately.

      If you don't start the stream now, then you can start it from the Streams page by clicking START.
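Several of the source-side checks listed above can also be previewed manually before you run validation. The following statements are a hedged sketch: the user, slot, and publication names are placeholders, and the queries only approximate what Datastream verifies.

```sql
-- Oracle (run as a privileged user):
SELECT log_mode FROM v$database;                  -- expect ARCHIVELOG
SELECT supplemental_log_data_min FROM v$database; -- expect YES
SELECT COUNT(*) FROM v$archived_log;              -- expect a count above 0

-- MySQL:
SHOW VARIABLES LIKE 'log_bin';          -- expect ON
SHOW VARIABLES LIKE 'binlog_format';    -- expect ROW
SHOW BINARY LOGS;                       -- expect at least one log file
SHOW GRANTS FOR 'datastream_user'@'%';  -- placeholder user name

-- PostgreSQL:
SHOW wal_level;                         -- expect logical
SELECT slot_name, plugin, active
  FROM pg_replication_slots
 WHERE slot_name = 'datastream_slot';   -- placeholder slot name
SELECT *
  FROM pg_publication_tables
 WHERE pubname = 'datastream_publication'; -- placeholder publication name
```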

    After creating a stream, you can view high-level and detailed information about it.