For businesses with many isolated data sources, access to enterprise data across the organization, especially in a real-time manner, can be difficult. This results in a world of limited and slow data access, preventing the organization's ability to introspect.
Datastream provides near real-time access to change data from a variety of on-premises and cloud-based data sources to create access to organizational data. Datastream provides a simple setup experience and a unified consumption API which democratizes the organization's access to the freshest enterprise data available across the organization, powering integrated near real-time scenarios.
One such scenario is transferring data from a source database into a Cloud-based storage service or messaging queue and transforming this data into a form that's readable by other applications and services that communicate with this storage service or messaging queue.
In this tutorial, you learn how to use Datastream to transfer schemas, tables, and data from a source Oracle database into a folder in a Cloud Storage bucket. Cloud Storage is a web service for storing and accessing data on Google Cloud. The service combines the performance and scalability of Google's cloud with advanced security and sharing capabilities.
As part of transferring this information into a folder in the destination Cloud Storage bucket, Datastream translates this information into Avro. Avro is defined by a schema that's written in JavaScript Object Notation (JSON). Doing this translation lets you read data across different data sources in a uniform way.
Objectives
In this tutorial, you learn how to:- Set environment variables. You'll use these variables when you make requests to Datastream to create and manage both connection profiles and a stream.
- Create and manage connection profiles for a source database and a destination bucket in Cloud Storage. By creating these connection profiles, you're creating records that contain information about the source database and destination Cloud Storage bucket. The stream in Datastream uses the information in the connection profiles to transfer data from the source database into a folder in the destination bucket.
- Create and manage a stream. Datastream uses this stream to transfer data, schemas, and tables from the source database into a folder in the destination bucket.
- Verify that Datastream transfers the data and tables associated with a schema of the source Oracle database into a folder in the destination bucket, and translates this data into the Avro file format.
- Clean up the resources that you created on Datastream so they won't take up quota and you won't be billed for them in the future.
Costs
In this document, you use the following billable components of Google Cloud:
- Cloud Storage
To generate a cost estimate based on your projected usage,
use the pricing calculator.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
Make sure that billing is enabled for your Google Cloud project.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
Make sure that billing is enabled for your Google Cloud project.
- Enable the Datastream API.
- Make sure that you have the Datastream Admin role assigned to your user account.
- Make sure that you have a source database that Datastream can access. For this tutorial, an Oracle database is used as the source.
- Configure your source database to allow incoming connections from Datastream public IP addresses. See IP allowlists and regions to access the locations of all Datastream regions and their associated public IP addresses.
- Make sure that you have configured a destination Cloud Storage bucket that Datastream can access by using the IP allowlist, forward SSH tunnel, or VPC peering network connectivity method.
- Make sure that you have data, tables, and schemas in the source database that Datastream can transfer into a folder in the destination Cloud Storage bucket.
- Download and install Cloud Shell. This client application provides you with command-line access to your cloud resources (including Datastream).
- Install and configure the
jq
utility. This utility is a lightweight and flexible command-line JSON processor. You'll use this processor to display complexcURL
commands in easy-to-read text.
Setting environment variables
In this procedure, you'll set the following variables:
$PROJECT
: This variable is associated with your Google Cloud project. Any Google Cloud resources that you allocate and use must belong to a project.$TOKEN
: This variable is associated with an access token. The access token provides a session that Cloud Shell uses to perform tasks in Datastream using REST APIs.
Launch your Cloud Shell application.
After authenticating into your application with your Google Account, enter
gcloud auth login
.At the
Do you want to continue (Y/n)?
prompt, enterY
.Open a web browser and copy the URL into the browser.
Authenticate to the Google Cloud SDK with your Google Account. A code appears on the Sign in page. This code is your access token.
Copy the access token, paste it into the
Enter verification code:
parameter in your Cloud Shell application, and pressEnter
.At the prompt, enter
PROJECT="YOUR_PROJECT_NAME"
to set the$PROJECT
environment variable to your Google Cloud project.At the prompt, enter
gcloud config set project YOUR_PROJECT_NAME
to set the project that you'd like to work on to your Google Cloud project.Your command prompt updates to reflect your active project and respects this format:
USERNAME@cloudshell:~ (YOUR_PROJECT_NAME)$
At the prompt, enter
TOKEN=$(gcloud auth print-access-token)
to retrieve the access token and store it as a variable.At the prompt, enter the following commands to ensure that your
$PROJECT
and$TOKEN
variables are set correctly:echo $PROJECT
echo $TOKEN
Now that you set your variables, you can make requests to Datastream to create and manage both connection profiles and a stream.
Create and manage connection profiles
In this section, you create and manage connection profiles for a source Oracle database and a destination bucket in Cloud Storage.
When you create these connection profiles, you create records that contain information about the source database and destination Cloud Storage bucket. Datastream uses the information in the connection profiles to transfer data from the source database into a folder in the destination bucket.
Creating and managing connection profiles includes:
- Creating connection profiles for a source Oracle database and a destination bucket in Cloud Storage
- Retrieving information about a connection profile
- Modifying a connection profile
- Performing a discover API call on the source Oracle connection profile. This call lets you to look inside the database to see the objects associated with it. These objects include the schemas and tables that contain the data of the database. When you use Datastream to configure a stream, you may not want to pull all objects out of the database, but rather a subset of the objects (for example, only certain tables and schemas of the database). Use the discover API to help you find (or discover) the subset of database objects that you want to pull.
Create connection profiles
In this procedure, you create two connection profiles: one to a source Oracle database and another to a destination bucket in Cloud Storage.
- Create a connection profile to a source Oracle database. At the prompt, enter the following command:
ORACLE="{\"displayName\":\"DISPLAY_NAME\",\"oracle_profile\":{\"hostname\":\"HOSTNAME\",\"username\":\"USERNAME\",\"password\":\"PASSWORD\",\"database_service\":\"DATABASE_SERVICE\", \"port\":"PORT_NUMBER\"},\"no_connectivity\":{}}"
Use the following table to help you understand the parameter values for the source Oracle database:
Parameter value | Replace with |
---|---|
DISPLAY_NAME | The display name of the connection profile to the source database. |
HOSTNAME | The hostname of the source database server. |
USERNAME | The username of the account for the source database (for example, ROOT). |
PASSWORD | The password of the account for the source database. |
DATABASE_SERVICE | The service that ensures that the source database is protected and monitored. For Oracle databases, the database service is typically ORCL. |
PORT_NUMBER | The port number reserved for the source database. For an Oracle database, the port number is typically 1521. |
At the prompt, enter the
echo $ORACLE | jq
command to see the source connection profile you created in easy-to-read text.{ "displayName": "DISPLAY_NAME", "oracle_profile": { "hostname": "HOSTNAME", "username": "USERNAME", "password": "PASSWORD", "database_service": "DATABASE_SERVICE", "port": PORT_NUMBER }, "no_connectivity": {} }
Submit the Oracle connection profile so that it can be created. At the prompt, enter the following command:
curl -X POST -d $ORACLE -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/connectionProfiles? connection_profile_id=SOURCE_CONNECTION_PROFILE_ID Use the following table to help you understand the parameter values for this command:
Parameter value Replace with DATASTREAM_API_VERSION The current version of the Datastream API (for example, v1
).PROJECT_PATH The full path of your Google Cloud project (for example, projects/$PROJECT/locations/YOUR_PROJECT_LOCATION
).SOURCE_CONNECTION_PROFILE_ID The unique identifier reserved for this connection profile (for example, cp-1). Verify that you see the following lines of code:
{ "name": "PROJECT_PATH/operations/operation-SOURCE_CONNECTION_PROFILE_OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.datastream.DATASTREAM_API_VERSION.OperationMetadata", "createTime": "DATE_AND_TIME_STAMP", "target": "datastream.googleapis.com/DATASREAM_VERSION/PROJECT_PATH/connectionProfiles/
SOURCE_CONNECTION_PROFILE_ID", "verb": "create", "requestedCancellation": false, "apiVersion": "DATASTREAM_API_VERSION" }, "done": false } Create a connection profile to a destination bucket in Cloud Storage. At the prompt, enter the following command:
GOOGLECLOUDSTORAGE="{\"displayName\":\"DISPLAY_NAME\",\"gcs_profile\":{\"bucket_name\":\"BUCKET_NAME\",
\"root_path\":\"/FOLDER_PATH\"},\"no_connectivity\":{}}" Use the following table to help you understand the parameter values for the destination bucket:
Parameter value Replace with DISPLAY_NAME The display name of the connection profile to the destination bucket. BUCKET_NAME The name of the destination bucket. FOLDER_PATH The folder in the destination bucket into which Datastream will transfer data from the source database (for example, /root/path). At the prompt, enter the
echo $GOOGLECLOUDSTORAGE | jq
command to see the destination connection profile you created in easy-to-read text.{ "displayName": "DISPLAY_NAME", "gcs_profile": { "bucket_name": "BUCKET_NAME", "root_path": "/FOLDER_PATH" }, "no_connectivity": {} }
Submit the Cloud Storage connection profile so that it can be created. At the prompt, enter the following command:
curl -X POST -d $GOOGLECLOUDSTORAGE -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/connectionProfiles? connection_profile_id=DESTINATION_CONNECTION_PROFILE_ID Verify that you see the following lines of code:
{ "name": "PROJECT_PATH/operations/operation-DESTINATION_CONNECTION_PROFILE_OPERATION_ID", "metadata": { "@type": "@type": "type.googleapis.com/google.cloud.datastream.DATASTREAM_API_VERSION.
OperationMetadata", "createTime": "DATE_AND_TIME_STAMP", "target": "datastream.googleapis.com/DATASTREAM_VERSION/PROJECT_PATH/connectionProfiles/ DESTINATION_CONNECTION_PROFILE_ID", "verb": "create", "requestedCancellation": false, "apiVersion": "DATASTREAM_API_VERSION" }, "done": false } Confirm that both connection profiles are created. At the prompt, enter the following command:
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/connectionProfiles Verify that you receive two returned results for both the source and destination connection profiles.
{ "connectionProfiles": [ { "name": "PROJECT_PATH/connectionProfiles/DESTINATION_CONNECTION_PROFILE_ID", "createTime": "DATE_AND_TIME_STAMP", "updateTime": "DATE_AND_TIME_STAMP", "displayName": "DISPLAY_NAME", "gcsProfile": { "bucketName": "BUCKET_NAME", "rootPath": "FOLDER_PATH" }, "noConnectivity": {} }, { "name": "PROJECT_PATH/connectionProfiles/SOURCE_CONNECTION_PROFILE_ID", "createTime": "DATE_AND_TIME_STAMP", "updateTime": "DATE_AND_TIME_STAMP", "displayName": "DISPLAY_NAME", "oracleProfile": { "hostname": "HOSTNAME", "port": PORT_NUMBER, "username": "USERNAME", "databaseService": "DATABASE_SERVICE" }, "noConnectivity": {} } ] }
Manage connection profiles
In this procedure, you manage the connection profiles that you created for a source Oracle database and a destination bucket in Cloud Storage. This includes:
- Retrieving information about the destination Cloud Storage connection profile
- Modifying this connection profile. For this tutorial, you'll change the folder of the destination Cloud Storage bucket to /root/tutorial. Datastream transfers data from the source database into this folder.
- Performing a discover API call on the source Oracle connection profile
Retrieve information about the destination Cloud Storage connection profile. At the prompt, enter the following command:
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/connectionProfiles/ DESTINATION_CONNECTION_PROFILE_ID Verify that you see information about this connection profile.
{ "name": "PROJECT_PATH/connectionProfiles/DESTINATION_CONNECTION_PROFILE_ID", "createTime": "DATE_AND_TIME_STAMP", "updateTime": "DATE_AND_TIME_STAMP", "displayName": "DISPLAY_NAME", "gcsProfile": { "bucketName": "BUCKET_NAME", "rootPath": "FOLDER_PATH" }, "noConnectivity": {} }
Modify this connection profile. To do this, first, set an UPDATE variable. This variable contains the values of the connection profile that you want to change. For this tutorial, you'll change the folder of the destination bucket to /root/tutorial.
To set the variable, at the prompt, enter the following command:
UPDATE="{\"gcsProfile\":{\"rootPath\":\"/root/tutorial\"}}"
At the prompt, enter the following command:
curl -X PATCH -d $UPDATE -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/connectionProfiles/ DESTINATION_CONNECTION_PROFILE_ID?update_mask=gcsProfile.rootPath Verify that you see the following lines of code:
{ "name": "PROJECT_PATH/operations/operation-DESTINATION_CONNECTION_PROFILE_OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.datastream.DATASTREAM_API_VERSION.OperationMetadata", "createTime": "DATE_AND_TIME_STAMP", "target": "PROJECT_PATH/connectionProfiles/DESTINATION_CONNECTION_PROFILE_ID", "verb": "update", "requestedCancellation": false, "apiVersion": "DATASTREAM_API_VERSION" }, "done": false }
Confirm that the connection profile is modified. At the prompt, enter the following command:
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/connectionProfiles/ DESTINATION_CONNECTION_PROFILE_ID Verify that the folder of the destination bucket of the Cloud Storage connection profile is now /root/tutorial.
{ "name": "PROJECT_PATH/connectionProfiles/DESTINATION_CONNECTION_PROFILE_ID", "createTime": "DATE_AND_TIME_STAMP", "updateTime": "DATE_AND_TIME_STAMP", "displayName": "DISPLAY_NAME", "gcsProfile": { "bucketName": "BUCKET_NAME", "rootPath": "/root/tutorial" }, "noConnectivity": {} }
Use the Datastream discover API to discover the schemas and tables of the source Oracle database. Datastream provides access to this database through the source connection profile.
Discover the schemas of the Oracle database. At the prompt, enter the following command:
curl -X POST -d "{\"connection_profile_name\":\"projects/YOUR_PROJECT_NUMBER/locations/
YOUR_PROJECT_LOCATION/connectionProfiles/SOURCE_CONNECTION_PROFILE_ID\", \"oracle_rdbms\":{\"oracleSchemas\":[{\"schema\":\"ROOT\"}]}}" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/connectionProfiles:discover Verify that Datastream retrieves all schemas of your database.
Retrieve the tables of a schema in your database. For this tutorial, you'll use the discover API to retrieve the tables of the ROOT schema. However, you can discover the tables of any schema in your database.
At the prompt, enter the following command:
curl -X POST -d "{\"connection_profile_name\":\"projects/YOUR_PROJECT_NUMBER/locations/
YOUR_PROJECT_LOCATION/connectionProfiles/SOURCE_CONNECTION_PROFILE_ID\", \"oracle_rdbms\":{\"oracleSchemas\":[{\"schema\":\"ROOT\"}]}}" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/connectionProfiles:discover - Verify that Datastream retrieves all tables of the schema that you specified (for this tutorial, the ROOT schema).
Now that you created and managed connection profiles for a source Oracle database and a destination bucket in Cloud Storage, you're ready to create and manage a stream in Datastream.
Creating and managing a stream
In this section, you create and manage a stream. Datastream uses this stream to transfer data, schemas, and tables from the source database into a folder in the destination Cloud Storage bucket.
Creating and managing a stream includes:
- Validating a stream to ensure that the stream will run successfully, and that all validation checks pass. These checks include:
- Whether the source is configured properly to allow Datastream to stream data from it.
- Whether the stream can connect to both the source and the destination.
- The end-to-end configuration of the stream.
- Creating the stream with the following lists:
- An allow list. This list specifies the tables and schemas in the source database that Datastream can transfer into a folder in the destination bucket in Cloud Storage. For this tutorial, this is the /root/tutorial folder.
- A reject list. This list specifies the tables and schemas in the source database that Datastream is restricted from transferring into the folder in the Cloud Storage destination bucket.
- Retrieving information about the stream
- Modifying the stream
- Starting the stream so that Datastream can transfer data, schemas, and tables from the source database into a folder in the destination Cloud Storage bucket.
- Using the Fetch Errors API to detect any errors associated with the stream
- Pausing the stream. When a stream is paused, Datastream won't pull any new data from the source database into the destination bucket.
- Resuming the paused stream so that Datastream can continue to transfer data into the destination bucket.
Create a stream
In this procedure, you create a stream from the source Oracle database into a folder in the destination Cloud Storage bucket. The stream you create will include both an allow list and a reject list.
Set a SCHEMAS variable. This variable defines the schemas that contain the data and tables that you want Datastream to retrieve from the source database and transfer into the /root/tutorial folder of the Cloud Storage destination bucket. For this tutorial, you'll set the SCHEMAS variable to be associated with the ROOT schema.
At the prompt, enter the following command:
SCHEMAS="{\"oracleSchemas\":[{\"schema\":\"ROOT\"}]}"
At the prompt, enter the echo $SCHEMAS | jq command to see the ROOT schema that you defined for this variable in easy-to-read text.
Create a stream. At the prompt, enter the following command:
STREAM="{\"display_name\":\"DISPLAY_NAME\",\"source_config\":{\"source_connection_profile_name\":\"
PROJECT_PATH/connectionProfiles/SOURCE_CONNECTION_PROFILE_ID",\"oracle_source_config\": {\"allowlist\":$SCHEMAS,\"rejectlist\":{}}},\"destination_config\":{\"destination_connection_profile_name\" :\"PROJECT_PATH/connectionProfiles/DESTINATION_CONNECTION_PROFILE_ID\",\"gcs_destination_config\": {\"file_rotation_mb\":5,\"file_rotation_interval\":{\"seconds\":15},\"avro_file_format\":{}}, \"backfill_all\":{}}}" At the prompt, enter the
echo $STREAM | jq
command to see the stream you created in easy-to-read text.{ "display_name": "DISPLAY_NAME", "source_config": { "source_connection_profile_name": "PROJECT_PATH/connectionProfiles/SOURCE_CONNECTION_PROFILE_ID", "oracle_source_config": { "allowlist": { "oracleSchemas": [ { "schema": "ROOT" } ] }, "rejectlist": {} } }, "destination_config": { "destination_connection_profile_name": "PROJECT_PATH/connectionProfiles/
DESTINATION_CONNECTION_PROFILE_ID", "gcs_destination_config": { "file_rotation_mb": 5, "file_rotation_interval": { "seconds": 15 }, "avro_file_format": {} } }, "backfill_all": {} } Use this table to help you understand the following parameters of the stream:
Parameter Description allowlist The schemas, containing tables and data, that will be transferred from the source database into a folder of the Cloud Storage destination bucket. For this tutorial, all tables and data from the ROOT schema (and only this schema) will be transferred into the /root/tutorial folder of the destination bucket. rejectlist Any schemas, containing tables and data, that won't be transferred into a folder of the Cloud Storage destination bucket. For this tutorial, the {} value signifies that no tables and data from the source database will be prevented from being transferred into the destination bucket. file_rotation_mb The size (in MBytes) of files that contain data that's being transferred from the source database into a folder in the Cloud Storage destination bucket. For this tutorial, as data is being retrieved from the source database, it's written into 5-MB files. If any data exceeds this size, then the data will be segmented into multiple 5-MB files. file_rotation_interval The number of seconds that will elapse before Datastream closes an existing file in a folder of the Cloud Storage destination bucket and opens another file to contain data being transferred from the source database. For this tutorial, the file rotation interval is set to 15 seconds. avro_file_format The format of the files that Datastream will transfer from the source database into a folder of the Cloud Storage destination bucket. For this tutorial, Avro is the file format.
backfill_all This parameter is associated with historical backfill. By setting this parameter to an empty dictionary ({}), Datastream will backfill:
- Historical data, in addition to ongoing changes to the data, from the source database into the destination.
- Schemas and tables, from the source into the destination.
Validate the stream to ensure that the stream will run successfully, and that all validation checks pass. At the prompt, enter the following command:
curl -X POST -d $STREAM -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
"https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams?stream_id= STREAM_ID&validate_only=true" Verify that you see the
{}
line of code. This indicates that the stream has passed all validation checks and there are no errors associated with the stream.Submit the stream so that it can be created. At the prompt, enter the following command:
curl -X POST -d $STREAM -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams?stream_id=STREAM_ID Verify that you see the following lines of code:
{ "name": "PROJECT_PATH/operations/operation-STREAM_OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.datastream.DATASTREAM_API_VERSION.OperationMetadata", "createTime": "DATE_AND_TIME_STAMP", "target": "PROJECT_PATH/streams/STREAM_ID", "verb": "create", "requestedCancellation": false, "apiVersion": "DATASTREAM_API_VERSION" }, "done": false }
Confirm that the stream is created. At the prompt, enter the following command:
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams Verify that you receive a returned result for the stream that you created.
{ "streams": [ { "name": "PROJECT_PATH/streams/STREAM_ID", "createTime": "DATE_AND_TIME_STAMP", "updateTime": "DATE_AND_TIME_STAMP", "displayName": "DISPLAY_NAME", "sourceConfig": { "sourceConnectionProfileName": "projects/YOUR_PROJECT_NUMBER/locations/YOUR_PROJECT_LOCATION
/connectionProfiles/SOURCE_CONNECTION_PROFILE_ID", "oracleSourceConfig": { "allowlist": { "oracleSchemas": [ { "schema": "ROOT" } ] }, "rejectlist": {} } }, "destinationConfig": { "destinationConnectionProfileName": "projects/YOUR_PROJECT_NUMBER/locations/YOUR_PROJECT_LOCATION /connectionProfiles/DESTINATION_CONNECTION_PROFILE_ID", "gcsDestinationConfig": { "fileRotationMb": 5, "fileRotationInterval": "15s" "avroFileFormat": {} } }, "state": "CREATED", "backfillAll": {} } ] }
Manage the stream
In this procedure, you use the stream that you created to transfer data from a source Oracle database into a folder in a Cloud Storage destination bucket. This includes:
- Retrieving information about the stream
- Modifying the stream
- Starting the stream
- Using the Fetch Errors API to detect any errors associated with the stream
- Pausing and resuming the stream
Retrieve information about the stream. At the prompt, enter the following command:
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams/STREAM_ID Verify that you see information about this stream.
{ "name": "PROJECT_PATH/streams/STREAM_ID", "createTime": "DATE_AND_TIME_STAMP", "updateTime": "DATE_AND_TIME_STAMP", "displayName": "DISPLAY_NAME", "sourceConfig": { "sourceConnectionProfileName": "projects/YOUR_PROJECT_NUMBER/locations/YOUR_PROJECT_LOCATION
/connectionProfiles/SOURCE_CONNECTION_PROFILE_ID", "oracleSourceConfig": { "allowlist": { "oracleSchemas": [ { "schema": "ROOT" } ] }, "rejectlist": {} } }, "destinationConfig": { "destinationConnectionProfileName": "projects/YOUR_PROJECT_NUMBER/locations/YOUR_PROJECT_LOCATION /connectionProfiles/DESTINATION_CONNECTION_PROFILE_ID", "gcsDestinationConfig": { "fileRotationMb": 5, "fileRotationInterval": "15s" "avroFileFormat": {} } }, "state": "CREATED", "backfillAll": {} } Modify this stream. To do this, first, set an UPDATE variable. This variable contains the values of the stream that you want to change. For this tutorial, change the size (in MBytes) of files that contain data that's being transferred from the source database into a folder in the Cloud Storage destination bucket (from 5 MBytes to 100 MBytes). As data is being retrieved from the source database, it's now written into 100-MB files. If any data exceeds this size, then the data will be segmented into multiple 100-MB files.
To set the variable, at the prompt, enter the following command:
UPDATE="{\"destination_config\":{\"gcs_destination_config\":{\"file_rotation_mb\":100}}}"
At the prompt, enter the following command:
curl -X PATCH -d $UPDATE -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams/STREAM_ID/ ?update_mask=destination_config.gcs_destination_config.file_rotation_mb Verify that you see the following lines of code:
{ "name": "PROJECT_PATH/operations/operation-STREAM_OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.datastream.DATASTREAM_API_VERSION.OperationMetadata", "createTime": "DATE_AND_TIME_STAMP", "target": "PROJECT_PATH/streams/STREAM_ID", "verb": "update", "requestedCancellation": false, "apiVersion": "DATASTREAM_API_VERSION" }, "done": false }
Confirm that the stream is modified. At the prompt, enter the following command:
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams/STREAM_ID Verify that the value of the fileRotationMb parameter for the Cloud Storage connection profile is now
100
.{ "name": "PROJECT_PATH/streams/STREAM_ID", "createTime": "DATE_AND_TIME_STAMP", "updateTime": "DATE_AND_TIME_STAMP", "displayName": "DISPLAY_NAME", "sourceConfig": { "sourceConnectionProfileName": "projects/YOUR_PROJECT_NUMBER/locations/YOUR_PROJECT_LOCATION
/connectionProfiles/SOURCE_CONNECTION_PROFILE_ID", "oracleSourceConfig": { "allowlist": { "oracleSchemas": [ { "schema": "ROOT" } ] }, "rejectlist": {} } }, "destinationConfig": { "destinationConnectionProfileName": "projects/YOUR_PROJECT_NUMBER/locations/YOUR_PROJECT_LOCATION /connectionProfiles/DESTINATION_CONNECTION_PROFILE_ID", "gcsDestinationConfig": { "fileRotationMb": 100, "fileRotationInterval": "15s" "avroFileFormat": {} } }, "state": "CREATED", "backfillAll": {} } Start the stream. To do so:
Change the
UPDATE
variable. At the prompt, enter the following command:UPDATE="{\"state\":\"RUNNING\"}"
Then, enter the following command:
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams/STREAM_ID ?updateMask=state
Verify that you see the following lines of code.
{ "name": "PROJECT_PATH/operations/operation-STREAM_OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.datastream.DATASTREAM_API_VERSION.OperationMetadata", "createTime": "DATE_AND_TIME_STAMP", "target": "PROJECT_PATH/streams/STREAM_ID", "verb": "start", "requestedCancellation": false, "apiVersion": "DATASTREAM_API_VERSION" }, "done": false }
After a few minutes, retrieve information about the stream to confirm that it started:
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams/STREAM_ID Verify that the state of the stream has changed from
CREATED
toRUNNING
.{ "name": "PROJECT_PATH/streams/STREAM_ID", "createTime": "DATE_AND_TIME_STAMP", "updateTime": "DATE_AND_TIME_STAMP", "displayName": "DISPLAY_NAME", "sourceConfig": { "sourceConnectionProfileName": "projects/YOUR_PROJECT_NUMBER/locations/YOUR_PROJECT_LOCATION
/connectionProfiles/SOURCE_CONNECTION_PROFILE_ID", "oracleSourceConfig": { "allowlist": { "oracleSchemas": [ { "schema": "ROOT" } ] }, "rejectlist": {} } }, "destinationConfig": { "destinationConnectionProfileName": "projects/YOUR_PROJECT_NUMBER/locations/YOUR_PROJECT_LOCATION /connectionProfiles/DESTINATION_CONNECTION_PROFILE_ID", "gcsDestinationConfig": { "fileRotationMb": 100, "fileRotationInterval": "15s" "avroFileFormat": {} } }, "state": "RUNNING", "backfillAll": {} } Using the Fetch Errors API to retrieve any errors associated with the stream.
At the prompt, enter the following command:
curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams/ STREAM_ID:fetchErrors Verify that you see the following lines of code:
{ "name": "PROJECT_PATH/operations/operation-FETCH_ERRORS_OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.datastream.DATASTREAM_API_VERSION.OperationMetadata", "createTime": "DATE_AND_TIME_STAMP", "target": "PROJECT_PATH/streams/STREAM_ID", "verb": "fetchErrors", "requestedCancellation": false, "apiVersion": "DATASTREAM_API_VERSION" }, "done": false }
At the prompt, enter the following command:
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/operations/ operation-FETCH_ERRORS_OPERATION_ID Verify that you see the following lines of code:
{ "name": "PROJECT_PATH/operations/operation-FETCH_ERRORS_OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.datastream.DATASTREAM_API_VERSION
.OperationMetadata", "createTime": "DATE_AND_TIME_STAMP", "endTime": "DATE_AND_TIME_STAMP", "target": "PROJECT_PATH/streams/STREAM_ID", "verb": "fetchErrors", "requestedCancellation": false, "apiVersion": "DATASTREAM_API_VERSION" }, "done": true, "response": { "@type": "type.googleapis.com/google.cloud.datastream.DATASTREAM_API_VERSION .FetchErrorsResponse" } }
Pause the stream. To do so:
Change the
UPDATE
variable. At the prompt, enter the following command:UPDATE="{\"state\":\"PAUSED\"}"
Then, enter the following command:
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams/STREAM_ID ?updateMask=state
Verify that you see the following lines of code.
{ "name": "PROJECT_PATH/operations/operation-STREAM_OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.datastream.DATASTREAM_API_VERSION.OperationMetadata", "createTime": "DATE_AND_TIME_STAMP", "target": "PROJECT_PATH/streams/STREAM_ID", "verb": "start", "requestedCancellation": false, "apiVersion": "DATASTREAM_API_VERSION" }, "done": false }
Retrieve information about the stream to confirm that it's paused.
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams/STREAM_ID Verify that the state of the stream has changed from
RUNNING
toPAUSED
.{ "name": "PROJECT_PATH/streams/STREAM_ID", "createTime": "DATE_AND_TIME_STAMP", "updateTime": "DATE_AND_TIME_STAMP", "displayName": "DISPLAY_NAME", "sourceConfig": { "sourceConnectionProfileName": "projects/YOUR_PROJECT_NUMBER/locations/YOUR_PROJECT_LOCATION
/connectionProfiles/SOURCE_CONNECTION_PROFILE_ID", "oracleSourceConfig": { "allowlist": { "oracleSchemas": [ { "schema": "ROOT" } ] }, "rejectlist": {} } }, "destinationConfig": { "destinationConnectionProfileName": "projects/YOUR_PROJECT_NUMBER/locations/YOUR_PROJECT_LOCATION /connectionProfiles/DESTINATION_CONNECTION_PROFILE_ID", "gcsDestinationConfig": { "fileRotationMb": 100, "fileRotationInterval": "15s" "avroFileFormat": {} } }, "state": "PAUSED", "backfillAll": {} } Resume the paused stream. To do so:
Change the
UPDATE
variable. At the prompt, enter the following command:UPDATE="{\"state\":\"RUNNING\"}"
Then, enter the following command:
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams/STREAM_ID ?updateMask=state
Verify that you see the following lines of code.
{ "name": "PROJECT_PATH/operations/operation-STREAM_OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.datastream.DATASTREAM_API_VERSION.OperationMetadata", "createTime": "DATE_AND_TIME_STAMP", "target": "PROJECT_PATH/streams/STREAM_ID", "verb": "start", "requestedCancellation": false, "apiVersion": "DATASTREAM_API_VERSION" }, "done": false }
After a few seconds, retrieve information about the stream to confirm that it's running again.
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams/STREAM_ID Verify that the state of the stream has changed from
PAUSED
back toRUNNING
.{ "name": "PROJECT_PATH/streams/STREAM_ID", "createTime": "DATE_AND_TIME_STAMP", "updateTime": "DATE_AND_TIME_STAMP", "displayName": "DISPLAY_NAME", "sourceConfig": { "sourceConnectionProfileName": "projects/YOUR_PROJECT_NUMBER/locations/YOUR_PROJECT_LOCATION
/connectionProfiles/SOURCE_CONNECTION_PROFILE_ID", "oracleSourceConfig": { "allowlist": { "oracleSchemas": [ { "schema": "ROOT" } ] }, "rejectlist": {} } }, "destinationConfig": { "destinationConnectionProfileName": "projects/YOUR_PROJECT_NUMBER/locations/YOUR_PROJECT_LOCATION /connectionProfiles/DESTINATION_CONNECTION_PROFILE_ID", "gcsDestinationConfig": { "fileRotationMb": 100, "fileRotationInterval": "15s" "avroFileFormat": {} } }, "state": "RUNNING", "backfillAll": {} }
Now that you created and managed a stream, confirmed that there are no errors associated with the stream, and that the state of the stream is RUNNING
, you're ready to verify that it can transfer data from the source database into a folder in the Cloud Storage destination bucket.
Verifying the stream
In this procedure, you confirm that Datastream:
- Transfers the data from all tables associated with the
ROOT
schema of your source Oracle database into the/root/tutorial
folder in the Cloud Storage destination bucket. - Translates the data into the Avro file format.
Go to the Storage browser page in Cloud Storage.
Click the link that contains your bucket.
If the OBJECTS tab isn't active, then click it.
Click the root folder, and then click the tutorial folder.
Verify that you see folders that represent tables of the
ROOT
schema of your source Oracle database.Click one of the table folders and drill down until you see data that's associated with the table.
Click a file that represents the data and click DOWNLOAD.
Open this file in an Avro tool (for example, Avro Viewer) to ensure that the content is readable. This confirms that Datastream also translated the data into the Avro file format.
Clean up
After you've finished this tutorial, you can clean up the resources that you created on Datastream so they won't take up quota and you won't be billed for them in the future. The following sections describe how to delete or turn off these resources.
Delete the project
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete the Cloud Storage destination bucket
In the left-hand Navigation Drawer of Cloud Storage, click the Browser item.
Select the checkbox to the left of your bucket and click DELETE.
In the Delete bucket? window, enter the name of your bucket in the text field and click CONFIRM.
Delete the stream
Make sure that your Cloud Shell application is active.
At the prompt, enter the following command:
curl -X DELETE -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams/STREAM_ID Verify that you see the following lines of code:
{ "name": "PROJECT_PATH/operations/operation-STREAM_OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.datastream.DATASTREAM_API_VERSION.OperationMetadata", "createTime": "DATE_AND_TIME_STAMP", "target": "PROJECT_PATH/streams/STREAM_ID", "verb": "delete", "requestedCancellation": false, "apiVersion": "DATASTREAM_API_VERSION" }, "done": false }
Confirm that the stream is deleted. At the prompt, enter the following command:
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/streams Verify that a null
{}
value is returned. This signifies that there aren't any streams in Datastream anymore, and the stream that you created is deleted.
Delete the connection profiles
Delete the connection profile to the source Oracle database. At the prompt, enter the following command:
curl -X DELETE -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/connectionProfiles/ SOURCE_CONNECTION_PROFILE_ID Verify that you see the following lines of code:
{ "name": "PROJECT_PATH/operations/operation-SOURCE_CONNECTION_PROFILE_OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.datastream.DATASTREAM_API_VERSION.OperationMetadata", "createTime": "DATE_AND_TIME_STAMP", "target": "PROJECT_PATH/connectionProfiles/SOURCE_CONNECTION_PROFILE_ID", "verb": "delete", "requestedCancellation": false, "apiVersion": "DATASTREAM_API_VERSION" }, "done": false }
Delete the connection profile to the destination bucket in Cloud Storage. At the prompt, enter the following command:
curl -X DELETE -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/connectionProfiles/ DESTINATION_CONNECTION_PROFILE_ID Verify that you see the following lines of code:
{ "name": "PROJECT_PATH/operations/operation-DESTINATION_CONNECTION_PROFILE_OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.datastream.DATASTREAM_API_VERSION.OperationMetadata", "createTime": "DATE_AND_TIME_STAMP", "target": "PROJECT_PATH/connectionProfiles/DESTINATION_CONNECTION_PROFILE_ID", "verb": "delete", "requestedCancellation": false, "apiVersion": "DATASTREAM_API_VERSION" }, "done": false }
Confirm that both connection profiles are deleted. At the prompt, enter the following command:
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"
https://datastream.googleapis.com/DATASTREAM_API_VERSION/PROJECT_PATH/connectionProfiles Verify that a null
{}
value is returned. This signifies that there aren't any connection profiles in Datastream anymore, and the profiles that you created are deleted.
What's next
- Learn more about Datastream.
- Try out other Google Cloud features for yourself. Have a look at our tutorials.