Create a stream

Overview

In this section, you learn how to create a stream. Datastream uses this stream to transfer data from a source Oracle or MySQL database to a destination bucket in Cloud Storage.

Creating a stream includes:

  • Defining settings for the stream.
  • Selecting the connection profile that you created for your source Oracle or MySQL database (the source connection profile), or creating a source connection profile if you haven't created one.
  • Configuring information about the source database for the stream by specifying the tables and schemas in the source database that Datastream:
    • Can transfer into the destination.
    • Is restricted from transferring into the destination.
  • Determining whether Datastream backfills historical data and streams ongoing changes into the destination, or streams only ongoing changes. If you enable historical backfill, you can optionally specify schemas and tables in the source database that Datastream shouldn't backfill into the destination.
  • Selecting the connection profile that you created for Cloud Storage.
  • Configuring information about the destination bucket for the stream. This information includes:
    • The folder of the destination bucket into which Datastream will transfer schemas, tables, and data from a source database.
    • The size (in MB) of the files that contain data being transferred from the source database into a folder in the Cloud Storage destination bucket. By default, data retrieved from the source database is written to 50-MB files; data that exceeds this size is split across multiple 50-MB files.
    • The number of seconds that will elapse before Datastream closes an existing file in a folder of the Cloud Storage destination bucket and opens another file to contain data being transferred from the source database. By default, the file rotation interval is set to 60 seconds.
    • The format of files written to Cloud Storage.
  • Validating the stream to ensure that it will run successfully. Validating a stream checks:
    • Whether the source is configured properly to allow Datastream to stream data from it.
    • Whether the stream can connect to both the source and the destination.
    • The end-to-end configuration of the stream.
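
The file size and rotation interval described above act as two independent limits: whichever is reached first closes the current file. The following is a minimal sketch of such a rotation policy, assuming only the 50-MB and 60-second defaults stated on this page (this is an illustration, not Datastream's actual implementation):

```python
import time

MAX_FILE_BYTES = 50 * 1024 * 1024   # 50-MB default file size
MAX_FILE_AGE_SECONDS = 60           # 60-second default rotation interval

class RotationPolicy:
    """Hypothetical sketch of size- and time-based file rotation."""

    def __init__(self, max_bytes=MAX_FILE_BYTES, max_age=MAX_FILE_AGE_SECONDS):
        self.max_bytes = max_bytes
        self.max_age = max_age
        self.bytes_written = 0
        self.opened_at = time.monotonic()

    def should_rotate(self, incoming_bytes):
        """Close the current file if adding incoming_bytes would exceed the
        size limit, or if the file has been open longer than the interval."""
        too_big = self.bytes_written + incoming_bytes > self.max_bytes
        too_old = time.monotonic() - self.opened_at >= self.max_age
        return too_big or too_old

    def record(self, incoming_bytes):
        self.bytes_written += incoming_bytes

policy = RotationPolicy()
policy.record(49 * 1024 * 1024)               # 49 MB written so far
print(policy.should_rotate(2 * 1024 * 1024))  # True: 2 MB more would exceed 50 MB
```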

Define settings for the stream

  1. Go to the Streams page in the Google Cloud Console.


  2. Click CREATE STREAM.

  3. Use the following fields to populate the Define stream details section of the Create stream page:

    • Stream name: Enter the display name of the stream.
    • Stream ID: Datastream populates this field automatically based on the stream name that you enter. You can keep the auto-generated ID or change it.
    • Region: Select the region where the stream is stored. Streams, like all resources, are saved in a region. Region selection doesn't impact whether your stream can connect to the source database or the destination bucket in Cloud Storage, but it can impact availability if the region experiences downtime.
    • Source type: Select the profile type that you specified when you created a connection profile for a source database. Or, if you haven't created a connection profile for the source database, you can create one now.
    • Destination type: Select the profile type that you specified when you created a connection profile for a destination Cloud Storage service. Or, if you haven't created a connection profile for the destination, you can create one now.

  4. Optionally, use labels to organize your Datastream resources.

    1. To create a label, click ADD LABEL, and then enter the key-value pair for the label.
    2. To remove a label, click the trashcan icon to the right of the row that contains the label.
  5. Review the required prerequisites that are generated automatically to reflect how the environment must be prepared for a stream. These prerequisites can include how to configure the source database and how to connect it to the destination bucket in Cloud Storage. It's best to complete these prerequisites at this step, but you can complete them at any time before you test the stream or start it. For more information about these prerequisites, see Configure your source Oracle database.

  6. Click CONTINUE. The Define connection profile panel of the Create stream page appears for your source database type.
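
Datastream's exact rules for auto-generating the stream ID aren't documented on this page. As a rough, hypothetical illustration only, deriving a resource-style ID from a display name might look like this (lowercasing and hyphenating; the actual algorithm may differ):

```python
import re

def suggest_stream_id(display_name: str) -> str:
    """Hypothetical sketch: derive a lowercase, hyphenated ID from a
    display name. Not Datastream's actual auto-generation algorithm."""
    slug = display_name.strip().lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug).strip("-")
    return slug

print(suggest_stream_id("My Oracle Stream"))  # my-oracle-stream
```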

Specify information about the source connection profile

  1. If you have created a source connection profile for Oracle or MySQL, then select it from the list of connection profiles.

    If you haven't created a source connection profile, then create one by clicking CREATE CONNECTION PROFILE at the bottom of the drop-down list, and then perform the same steps as in Create connection profiles.

  2. Click RUN TEST to verify that the source database and Datastream can communicate with each other.

    If the test fails, then the issue associated with the connection profile appears. Make the necessary changes to correct the issue, and then retest.

  3. Click CONTINUE. The Configure stream source panel of the Create stream page appears.

Configure information about the source database for the stream

  1. In the Select objects to include section, use the Objects to include drop-down menu to specify the tables and schemas in the source database that Datastream can transfer into a folder in the destination bucket in Cloud Storage.

    1. If you want Datastream to transfer all tables and schemas, then select All tables from all schemas.
    2. If you want Datastream to transfer only specific tables and schemas, select Specific schemas and tables, and then select the check boxes for the schemas and tables that you want Datastream to pull.

    3. If you want to provide a textual definition of the tables and schemas that you want Datastream to transfer, select Custom manual definition, and then, in the Object matching criteria field, enter the schemas and tables that you want Datastream to pull.

  2. Optionally, expand the Select objects to exclude node. In the Objects to exclude field, enter the tables and schemas in the source database that you want to restrict Datastream from transferring into the folder in the Cloud Storage destination bucket.

  3. Optionally, expand the Modify historical data backfill node. Select the Backfill historical data check box to stream all existing data, in addition to changes to the data, from the source into the destination.

  4. Optionally, in the Objects excluded from backfill field, enter the tables and schemas in the source database that you want to restrict Datastream from backfilling into the folder in the Cloud Storage destination bucket.

  5. Click CONTINUE. The Define Cloud Storage connection profile panel of the Create stream page appears.
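
The include, exclude, and custom-definition selections above amount to lists of criteria matched against schema.table names, with exclusions restricting what the include list selects. The sketch below shows how such criteria could be evaluated; the comma-separated, fnmatch-style pattern syntax is an assumption for illustration, not Datastream's exact Object matching criteria grammar:

```python
from fnmatch import fnmatch

def parse_criteria(text):
    """Split a comma-separated criteria string into patterns (assumed syntax)."""
    return [pattern.strip() for pattern in text.split(",") if pattern.strip()]

def is_selected(schema, table, include, exclude):
    """An object is transferred if it matches an include pattern and
    no exclude pattern (exclusions restrict the include list)."""
    name = f"{schema}.{table}"
    included = any(fnmatch(name, p) for p in parse_criteria(include))
    excluded = any(fnmatch(name, p) for p in parse_criteria(exclude))
    return included and not excluded

print(is_selected("hr", "employees", "hr.*", "hr.salaries"))  # True
print(is_selected("hr", "salaries", "hr.*", "hr.salaries"))   # False
```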

Select a destination connection profile

  1. If you have created a destination connection profile, then select it from the list of connection profiles.

    If you haven't created a destination connection profile, then create one by clicking CREATE CONNECTION PROFILE at the bottom of the drop-down list, and then perform the same steps as in Create connection profiles.

  2. Click CONTINUE. The Configure stream destination panel of the Create stream page appears.

Configure information about the destination for the stream

  1. Optionally, in the Stream path prefix field, enter the folder of the destination bucket into which Datastream will transfer schemas, tables, and data from a source database.

    For example, if you want Datastream to transfer data from the source database into the /root/example folder in the destination bucket, then enter /root/example in the Stream path prefix field.

  2. In the Output format field, choose the format of files written to Cloud Storage.

  3. Optional. If you select the JSON format, then two check boxes appear:

    1. Include a Unified Types schema file in the file path: If you select this check box, then Datastream writes two files to Cloud Storage: a JSON data file and an Avro schema file. The schema file has the same name as the data file, with a .schema extension. If you don't select this check box, then Datastream writes only the JSON data file to Cloud Storage. By default, this check box isn't selected.
    2. Enable gzip compression: If you select this check box, then Datastream uses the gzip utility to compress the files that Datastream writes to Cloud Storage. If you don't select this check box, then Datastream writes the files to Cloud Storage without compressing them. By default, this check box is selected.
  4. Click CONTINUE. The Review stream details and create panel of the Create stream page appears.
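
When gzip compression is enabled, the files Datastream writes are standard gzip streams, so they can be read back with ordinary tooling. A small sketch using Python's gzip module (the record contents here are invented for illustration; they aren't Datastream's actual output schema):

```python
import gzip
import io
import json

# Hypothetical change records, written as gzip-compressed JSON lines.
records = [{"id": 1, "op": "INSERT"}, {"id": 2, "op": "UPDATE"}]

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    for record in records:
        gz.write((json.dumps(record) + "\n").encode("utf-8"))
compressed = buf.getvalue()

# Decompressing yields the original JSON lines.
lines = gzip.decompress(compressed).decode("utf-8").splitlines()
print(lines[0])  # {"id": 1, "op": "INSERT"}
```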

Create the stream

  1. Verify details about the stream as well as the source and destination connection profiles that the stream will use to transfer data from a source database to a destination bucket in Cloud Storage.

  2. Click RUN VALIDATION to validate the stream. By validating a stream, Datastream checks that the source is configured properly, validates that the stream can connect to both the source and the destination, and verifies the end-to-end configuration of the stream.

    1. If your source database is Oracle, then Datastream performs the following checks:

      • Objects list validity: Datastream verifies that the exclude list doesn't subsume the include list.
      • Forward SSH tunnel connectivity: Datastream verifies that it can connect to a bastion host through a Forward SSH tunnel.
      • Connectivity to Oracle database: Datastream verifies that it can connect to the source Oracle database.
      • Logging mode configuration: Datastream verifies that the logging mode for the Oracle database is set to ARCHIVELOG.
      • Supplemental logging configuration: Datastream verifies that supplemental logging is turned on for the database tables that are being streamed from the source into the destination.
    2. If your source database is MySQL, then Datastream performs the following checks:

      • Objects list validity: Datastream verifies that the exclude list doesn't subsume the include list.
      • Forward SSH tunnel connectivity: Datastream verifies that it can connect to a bastion host through a Forward SSH tunnel.
      • Connectivity to MySQL database: Datastream verifies that it can connect to the source MySQL database.
      • Binary log format configuration: Datastream verifies that the binary log format of the MySQL database is set to ROW.
      • Replication permissions: Datastream verifies that it has permissions to replicate a MySQL database.
  3. After all validation checks pass, click CREATE & START to create the stream and start it immediately, or CREATE to create the stream without starting it immediately.

    If you don't start the stream now, then you can start it from the Streams page by clicking START.
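
The Objects list validity check above rejects configurations in which the exclude list subsumes the include list, since that would leave nothing to stream. A hypothetical sketch of such a check (the fnmatch-style pattern syntax is illustrative, not Datastream's grammar):

```python
from fnmatch import fnmatch

def exclude_subsumes_include(objects, include, exclude):
    """Return True if every included object is also excluded, i.e. the
    exclude list subsumes the include list and nothing would be streamed.
    objects: concrete 'schema.table' names discovered in the source."""
    included = [o for o in objects if any(fnmatch(o, p) for p in include)]
    remaining = [o for o in included
                 if not any(fnmatch(o, p) for p in exclude)]
    return bool(included) and not remaining

objects = ["hr.employees", "hr.salaries"]
print(exclude_subsumes_include(objects, ["hr.*"], ["hr.*"]))         # True: nothing left to stream
print(exclude_subsumes_include(objects, ["hr.*"], ["hr.salaries"]))  # False
```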

After creating a stream, you can view high-level and detailed information about it.