Stream changes to Pub/Sub using optional Cloud Function trigger


This tutorial shows how to use the Bigtable change streams to Pub/Sub template, including how to set up a topic and configure the template. You can optionally create a Cloud Function, in the programming language of your choice, that is triggered by the event stream.

This tutorial is intended for technical users who are familiar with Bigtable, writing code, and event streaming services.

Objectives

This tutorial shows you how to do the following:

  • Create a Bigtable table with a change stream enabled.
  • Create a Pub/Sub topic with the Bigtable change stream schema.
  • Deploy a Bigtable change stream to a Pub/Sub pipeline on Dataflow using the template.
  • View the event stream in Pub/Sub directly or in the logs of a Cloud Function.

Costs

In this document, you use the following billable components of Google Cloud: Bigtable, Pub/Sub, Dataflow, Cloud Functions, and Cloud Storage.

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Dataflow, Cloud Bigtable, Cloud Bigtable Admin, Pub/Sub, Cloud Functions, and Cloud Storage APIs.

    Enable the APIs

  5. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

  6. Update and install the cbt CLI.
    gcloud components update
    gcloud components install cbt
    

Create a Pub/Sub topic

  1. In the Google Cloud console, go to the Pub/Sub Topics page.

    Go to Topics

  2. Click Create topic.

  3. Set the ID to bigtable-change-stream-topic.

  4. Select Use a schema.

  5. In the Select a Pub/Sub schema drop-down, click Create new schema. This opens a new tab where you define the schema.

    1. Set the schema ID to bigtable-change-stream-schema.
    2. Set the schema type to Avro.
    3. Paste the following as the schema definition. More information about the schema can be found on the template documentation page.
      {
          "name" : "ChangelogEntryMessage",
          "type" : "record",
          "namespace" : "com.google.cloud.teleport.bigtable",
          "fields" : [
            { "name" : "rowKey", "type" : "bytes"},
            {
              "name" : "modType",
              "type" : {
                "name": "ModType",
                "type": "enum",
                "symbols": ["SET_CELL", "DELETE_FAMILY", "DELETE_CELLS", "UNKNOWN"]}
            },
            { "name": "isGC", "type": "boolean" },
            { "name": "tieBreaker", "type": "int"},
            { "name": "columnFamily", "type": "string"},
            { "name": "commitTimestamp", "type" : "long"},
            { "name" : "sourceInstance", "type" : "string"},
            { "name" : "sourceCluster", "type" : "string"},
            { "name" : "sourceTable", "type" : "string"},
            { "name": "column", "type" : ["null", "bytes"]},
            { "name": "timestamp", "type" : ["null", "long"]},
            { "name": "timestampFrom", "type" : ["null", "long"]},
            { "name": "timestampTo", "type" : ["null", "long"]},
            { "name" : "value", "type" : ["null", "bytes"]}
        ]
      }
    
    4. Click Create to create the schema.
  6. Close the Create schema tab, refresh the schema list, and select your newly defined schema.

  7. Click Create to create the topic.
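
If you prefer to script this step, you can create the schema, topic, and subscription with the Google Cloud CLI instead of the console. The following is only a sketch: it assumes the Avro definition above is saved locally as schema.avsc (a hypothetical file name) and that binary message encoding is acceptable; check the template documentation for the encoding it expects.

    # Set the default project for the gcloud commands in this tutorial.
    gcloud config set project YOUR_PROJECT_ID

    # Create the schema from the Avro definition saved as schema.avsc.
    gcloud pubsub schemas create bigtable-change-stream-schema \
        --type=avro \
        --definition-file=schema.avsc

    # Create the topic with the schema attached.
    gcloud pubsub topics create bigtable-change-stream-topic \
        --schema=bigtable-change-stream-schema \
        --message-encoding=binary

    # The console can create a default subscription for you; with the CLI, create it
    # explicitly so that you can pull messages later in this tutorial.
    gcloud pubsub subscriptions create bigtable-change-stream-topic-sub \
        --topic=bigtable-change-stream-topic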

Optional: Create a Cloud Function

You might want to process the Pub/Sub stream with a Cloud Function.

  1. On the Details page for the bigtable-change-stream-topic topic, click Trigger Cloud Function.
  2. In the Function name field, enter the name bt-ps-tutorial-function.
  3. In the Source Code section, click the Runtime drop-down, and then select the runtime and programming language of your choice. A hello world function is generated that prints each change stream message as it arrives. See the documentation to learn more about writing Cloud Functions.
  4. Use the default values for all other fields.
  5. Click Deploy function.
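
A function can also be deployed from the command line. The following sketch is illustrative only: the runtime, region, source directory, and entry point are assumptions, not values used elsewhere in this tutorial.

    # Deploy a Pub/Sub-triggered function.
    # Runtime, region, source path, and entry point below are assumptions; adjust them to your function.
    gcloud functions deploy bt-ps-tutorial-function \
        --runtime=python312 \
        --region=us-central1 \
        --source=./function-source \
        --entry-point=handle_change \
        --trigger-topic=bigtable-change-stream-topic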

Create a table with a change stream enabled

  1. In the Google Cloud console, go to the Bigtable Instances page.

    Go to Instances

  2. Click the ID of the instance that you are using for this tutorial.

    If you don't have an instance available, create an instance with the default configurations in a region near you.

  3. In the left navigation pane, click Tables.

  4. Click Create a table.

  5. Name the table change-streams-pubsub-tutorial.

  6. Add a column family named cf.

  7. Select Enable change stream.

  8. Click Create.
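
The table can also be created from Cloud Shell. This sketch uses the cbt CLI to create the table and gcloud to enable the change stream; the 7-day retention period is an assumption, and the flag name is worth verifying against the current gcloud reference.

    # Create the table with a column family named cf.
    cbt -instance=BIGTABLE_INSTANCE_ID -project=YOUR_PROJECT_ID \
        createtable change-streams-pubsub-tutorial families=cf

    # Enable the change stream on the table (retention period is an assumption).
    gcloud bigtable instances tables update change-streams-pubsub-tutorial \
        --instance=BIGTABLE_INSTANCE_ID \
        --change-stream-retention-period=7d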

Initialize a data pipeline to capture the change stream

  1. On the Bigtable Tables page, find your table change-streams-pubsub-tutorial.
  2. In the Change stream column, click Connect.
  3. In the dialog, select Pub/Sub.
  4. Click Create Dataflow job.
  5. On the Dataflow Create job page, set the output Pub/Sub topic name to: bigtable-change-stream-topic.
  6. Set the Bigtable application profile ID to default.
  7. Click Run job.
  8. Wait until the job status is Starting or Running before proceeding. After the job is queued, it takes around 5 minutes to start running.
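
If you prefer to launch the pipeline from the command line, the same template can be run as a Dataflow Flex Template. The region, template path, and parameter names below follow the template documentation but are assumptions in this sketch; verify them against the template reference before running.

    gcloud dataflow flex-template run bigtable-change-stream-to-pubsub-job \
        --region=us-central1 \
        --template-file-gcs-location=gs://dataflow-templates-us-central1/latest/flex/Bigtable_Change_Streams_to_PubSub \
        --parameters=bigtableReadInstanceId=BIGTABLE_INSTANCE_ID,bigtableReadTableId=change-streams-pubsub-tutorial,bigtableChangeStreamAppProfile=default,pubSubTopic=bigtable-change-stream-topic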

Write some data to Bigtable

  1. In Cloud Shell, write a few rows to Bigtable so that the change log can write some data to the Pub/Sub stream. As long as you write the data after the job is created, the changes appear. You don't have to wait for the job status to change to Running.

    cbt -instance=BIGTABLE_INSTANCE_ID -project=YOUR_PROJECT_ID \
        set change-streams-pubsub-tutorial user123 cf:col1=abc
    cbt -instance=BIGTABLE_INSTANCE_ID -project=YOUR_PROJECT_ID \
        set change-streams-pubsub-tutorial user546 cf:col1=def
    cbt -instance=BIGTABLE_INSTANCE_ID -project=YOUR_PROJECT_ID \
        set change-streams-pubsub-tutorial user789 cf:col1=ghi
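
    To confirm that the writes landed in Bigtable before you check Pub/Sub, you can optionally read the rows back:

    cbt -instance=BIGTABLE_INSTANCE_ID -project=YOUR_PROJECT_ID \
        read change-streams-pubsub-tutorial count=3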
    

View the change logs in Pub/Sub

  1. In the Google Cloud console, go to the Pub/Sub Subscriptions page.

    Go to Subscriptions

  2. Click the automatically created subscription for your topic bigtable-change-stream-topic. It should be named bigtable-change-stream-topic-sub.

  3. Go to the Messages tab.

  4. Click Pull.

  5. Explore the list of messages and view the data that you wrote.

    Screenshot: change log messages in Pub/Sub
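
You can also pull the messages from Cloud Shell instead of the console. This assumes the bigtable-change-stream-topic-sub subscription described above exists.

    # Pull up to 10 messages; --auto-ack acknowledges them, so they won't appear again later.
    gcloud pubsub subscriptions pull bigtable-change-stream-topic-sub \
        --limit=10 \
        --auto-ack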

Optional: View the changes in the Cloud Functions logs

If you created a Cloud Functions function, you can view the changes in the logs.

  1. In the Google Cloud console, go to Cloud Functions.

    Go to Cloud Functions

  2. Click your function bt-ps-tutorial-function.

  3. Go to the Logs tab.

  4. Ensure that Severity is set to at least Info so you can see the logs.

  5. Explore the logs and view the data that you wrote.

The output looks similar to the following:

Pub/Sub message: {"rowKey":"user789","modType":"SET_CELL","isGC":false,"tieBreaker":0,"columnFamily":"cf","commitTimestamp":1695653833064548,"sourceInstance":"YOUR-INSTANCE","sourceCluster":"YOUR-INSTANCE-c1","sourceTable":"change-streams-pubsub-tutorial","column":{"bytes":"col1"},"timestamp":{"long":1695653832278000},"timestampFrom":null,"timestampTo":null,"value":{"bytes":"ghi"}}
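
The same logs can be read from Cloud Shell; the region flag is an assumption and should match where you deployed the function.

    gcloud functions logs read bt-ps-tutorial-function \
        --region=us-central1 \
        --limit=20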

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the Bigtable table

  1. In the Google Cloud console, go to the Bigtable Instances page.

    Go to Instances

  2. Click the ID of the instance that you are using for this tutorial.

  3. In the left navigation pane, click Tables.

  4. Find the change-streams-pubsub-tutorial table.

  5. Click Edit.

  6. Clear Enable change stream.

  7. Click Save.

  8. Open the overflow menu for the table.

  9. Click Delete and input the table name to confirm.
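
As a command-line alternative, you can disable the change stream and delete the table from Cloud Shell. The flag for clearing the retention period is taken from the gcloud reference and is worth verifying.

    # Disable the change stream before deleting the table.
    gcloud bigtable instances tables update change-streams-pubsub-tutorial \
        --instance=BIGTABLE_INSTANCE_ID \
        --clear-change-stream-retention-period

    # Delete the table.
    cbt -instance=BIGTABLE_INSTANCE_ID -project=YOUR_PROJECT_ID \
        deletetable change-streams-pubsub-tutorial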

Stop the change stream pipeline

  1. In the Google Cloud console, go to the Dataflow Jobs page.

    Go to Jobs

  2. Select your streaming job from the job list.

  3. In the navigation, click Stop.

  4. In the Stop job dialog, cancel your pipeline, and then click Stop job.
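
Alternatively, look up the job ID and cancel the job from Cloud Shell; the region is an assumption and should match where the job is running.

    # Find the job ID of the streaming job.
    gcloud dataflow jobs list --region=us-central1 --status=active

    # Cancel the job by its ID.
    gcloud dataflow jobs cancel JOB_ID --region=us-central1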

Delete the Pub/Sub topic and subscription

  1. In the Google Cloud console, go to the Pub/Sub Topics page.

    Go to Topics

  2. Select the bigtable-change-stream-topic topic.

  3. Click Delete and confirm.

  4. Click Subscriptions in the sidebar.

  5. Select the bigtable-change-stream-topic-sub subscription.

  6. Click Delete and confirm.
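
Or delete the subscription, the topic, and the schema you created earlier from Cloud Shell:

    gcloud pubsub subscriptions delete bigtable-change-stream-topic-sub
    gcloud pubsub topics delete bigtable-change-stream-topic
    gcloud pubsub schemas delete bigtable-change-stream-schema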

Delete the Cloud Function

  1. In the Google Cloud console, go to Cloud Functions.

    Go to Cloud Functions

  2. Select the bt-ps-tutorial-function function.

  3. Click Delete and confirm.
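
Or delete it from Cloud Shell; the region is an assumption and should match where the function was deployed.

    gcloud functions delete bt-ps-tutorial-function --region=us-central1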

What's next