The Java Database Connectivity (JDBC) to Pub/Sub template is a batch pipeline that ingests data from JDBC source and writes the resulting records to a pre-existing Pub/Sub topic as a JSON string.
Pipeline requirements
- The JDBC source must exist prior to running the pipeline.
- The Pub/Sub output topic must exist prior to running the pipeline.
Template parameters
Required parameters
- driverClassName: The JDBC driver class name. For example,
com.mysql.jdbc.Driver
. - connectionUrl: The JDBC connection URL string. You can pass in this value as a string that's encrypted with a Cloud KMS key and then Base64-encoded. For example: 'echo -n "jdbc:mysql://some-host:3306/sampledb" | gcloud kms encrypt --location=
- driverJars: Comma-separated Cloud Storage paths for JDBC drivers. For example,
gs://your-bucket/driver_jar1.jar,gs://your-bucket/driver_jar2.jar
. - query: The query to run on the source to extract the data. For example,
select * from sampledb.sample_table
. - outputTopic: The Pub/Sub topic to publish to. For example,
projects/<PROJECT_ID>/topics/<TOPIC_NAME>
.
Optional parameters
- username: The username to use for the JDBC connection. You can pass in this value as a string that's encrypted with a Cloud KMS key and then Base64-encoded. For example,
echo -n 'some_username' | glcloud kms encrypt --location=my_location --keyring=mykeyring --key=mykey --plaintext-file=- --ciphertext-file=- | base64
. - password: The password to use for the JDBC connection. You can pass in this value as a string that's encrypted with a Cloud KMS key and then Base64-encoded. For example,
echo -n 'some_password' | glcloud kms encrypt --location=my_location --keyring=mykeyring --key=mykey --plaintext-file=- --ciphertext-file=- | base64
. - connectionProperties: The properties string to use for the JDBC connection. The format of the string must be
[propertyName=property;]*
. For example,unicode=true;characterEncoding=UTF-8
. - KMSEncryptionKey: The Cloud KMS Encryption Key to use to decrypt the username, password, and connection string. If a Cloud KMS key is passed in, the username, password, and connection string must all be passed in encrypted and base64 encoded. For example,
projects/your-project/locations/global/keyRings/your-keyring/cryptoKeys/your-key
. - disabledAlgorithms: Comma separated algorithms to disable. If this value is set to
none
, no algorithm is disabled. Use this parameter with caution, because the algorithms disabled by default might have vulnerabilities or performance issues. For example,SSLv3, RC4
. - extraFilesToStage: Comma separated Cloud Storage paths or Secret Manager secrets for files to stage in the worker. These files are saved in the /extra_files directory in each worker. For example,
gs://<BUCKET_NAME>/file.txt,projects/<PROJECT_ID>/secrets/<SECRET_ID>/versions/<VERSION_ID>
.
Run the template
- Go to the Dataflow Create job from template page. Go to Create job from template
- In the Job name field, enter a unique job name.
- Optional: For Regional endpoint, select a value from the drop-down menu. The default
region is
us-central1
.For a list of regions where you can run a Dataflow job, see Dataflow locations.
- From the Dataflow template drop-down menu, select the JDBC to Pub/Sub template.
- In the provided parameter fields, enter your parameter values.
- Click Run job.
In your shell or terminal, run the template:
gcloud dataflow jobs runJOB_NAME \ --gcs-location gs://dataflow-templates-REGION_NAME /VERSION /flex/Jdbc_to_PubSub \ --regionREGION_NAME \ --parameters \ driverClassName=DRIVER_CLASS_NAME ,\ connectionURL=JDBC_CONNECTION_URL ,\ driverJars=DRIVER_PATHS ,\ username=CONNECTION_USERNAME ,\ password=CONNECTION_PASSWORD ,\ connectionProperties=CONNECTION_PROPERTIES ,\ query=SOURCE_SQL_QUERY ,\ outputTopic=OUTPUT_TOPIC ,\ KMSEncryptionKey=KMS_ENCRYPTION_KEY
Replace the following:
JOB_NAME
: a unique job name of your choiceVERSION
: the version of the template that you want to useYou can use the following values:
latest
to use the latest version of the template, which is available in the non-dated parent folder in the bucket— gs://dataflow-templates-REGION_NAME/latest/- the version name, like
2023-09-12-00_RC00
, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket— gs://dataflow-templates-REGION_NAME/
REGION_NAME
: the region where you want to deploy your Dataflow job—for example,us-central1
DRIVER_CLASS_NAME
: the driver class nameJDBC_CONNECTION_URL
: the JDBC connection URLDRIVER_PATHS
: the comma-separated Cloud Storage path(s) of the JDBC driver(s)CONNECTION_USERNAME
: the JDBC connection usernameCONNECTION_PASSWORD
: the JDBC connection passwordCONNECTION_PROPERTIES
: the JDBC connection properties, if necessarySOURCE_SQL_QUERY
: the SQL query to be run on the source databaseOUTPUT_TOPIC
: the Pub/Sub to publish toKMS_ENCRYPTION_KEY
: the Cloud KMS Encryption Key
To run the template using the REST API, send an HTTP POST request. For more information on the
API and its authorization scopes, see
projects.templates.launch
.
POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID /locations/LOCATION /flexTemplates:launch { "launchParameter": { "jobName": "JOB_NAME ", "containerSpecGcsPath": "gs://dataflow-templates-LOCATION /VERSION /flex/Jdbc_to_PubSub" "parameters": { "driverClassName": "DRIVER_CLASS_NAME ", "connectionURL": "JDBC_CONNECTION_URL ", "driverJars": "DRIVER_PATHS ", "username": "CONNECTION_USERNAME ", "password": "CONNECTION_PASSWORD ", "connectionProperties": "CONNECTION_PROPERTIES ", "query": "SOURCE_SQL_QUERY ", "outputTopic": "OUTPUT_TOPIC ", "KMSEncryptionKey":"KMS_ENCRYPTION_KEY " }, "environment": { "zone": "us-central1-f" } } }
Replace the following:
PROJECT_ID
: the Google Cloud project ID where you want to run the Dataflow jobJOB_NAME
: a unique job name of your choiceVERSION
: the version of the template that you want to useYou can use the following values:
latest
to use the latest version of the template, which is available in the non-dated parent folder in the bucket— gs://dataflow-templates-REGION_NAME/latest/- the version name, like
2023-09-12-00_RC00
, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket— gs://dataflow-templates-REGION_NAME/
LOCATION
: the region where you want to deploy your Dataflow job—for example,us-central1
DRIVER_CLASS_NAME
: the driver class nameJDBC_CONNECTION_URL
: the JDBC connection URLDRIVER_PATHS
: the comma-separated Cloud Storage path(s) of the JDBC driver(s)CONNECTION_USERNAME
: the JDBC connection usernameCONNECTION_PASSWORD
: the JDBC connection passwordCONNECTION_PROPERTIES
: the JDBC connection properties, if necessarySOURCE_SQL_QUERY
: the SQL query to be run on the source databaseOUTPUT_TOPIC
: the Pub/Sub to publish toKMS_ENCRYPTION_KEY
: the Cloud KMS Encryption Key
Template source code
What's next
- Learn about Dataflow templates.
- See the list of Google-provided templates.