Copying data to and from Pub/Sub Lite

This page shows how to use the Pub/Sub Copy Pipeline to copy messages between Pub/Sub Lite and other messaging systems.

The Pub/Sub Copy Pipeline is a Dataflow Flex Template that copies all data from a Pub/Sub, Pub/Sub Lite, or Apache Kafka topic to a Pub/Sub, Pub/Sub Lite, or Apache Kafka topic, or to a BigQuery table.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.

  4. Enable the Pub/Sub Lite API.

    Enable the API

  5. Create a service account:

    1. In the Cloud Console, go to the Create service account page.

      Go to Create service account
    2. Select a project.
    3. In the Service account name field, enter a name. The Cloud Console fills in the Service account ID field based on this name.

      In the Service account description field, enter a description. For example, Service account for quickstart.

    4. Click Create and continue.
    5. Click the Select a role field.

      Under Quick access, click Basic, then click Owner.

    6. Click Continue.
    7. Click Done to finish creating the service account.

      Do not close your browser window. You will use it in the next step.

  6. Create a service account key:

    1. In the Cloud Console, click the email address for the service account that you created.
    2. Click Keys.
    3. Click Add key, then click Create new key.
    4. Click Create. A JSON key file is downloaded to your computer.
    5. Click Close.
  7. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key. This variable only applies to your current shell session, so if you open a new session, set the variable again.


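The last setup step can be done in a shell as follows. This is a sketch; the key path is a hypothetical placeholder for wherever you saved the downloaded JSON key file.

```shell
# Hypothetical path -- point this at the JSON key file you downloaded.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/quickstart-sa-key.json"
echo "$GOOGLE_APPLICATION_CREDENTIALS"
```

Remember that the variable only lasts for the current shell session; add the line to your shell profile if you want it to persist.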
Copying data from Pub/Sub to Pub/Sub Lite

Create a Lite topic

Create a Lite topic using the following steps:

  1. In the Cloud Console, go to the Lite Topics page.

    Go to the Lite Topics page

  2. Click Create Lite topic.

  3. Select a region and a zone.

  4. In the Name section, enter your-lite-topic as the Lite topic ID. The Lite topic name includes the Lite topic ID, the zone, and the project number.

  5. Click Create.
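If you prefer the command line, a Lite topic can also be created with gcloud. The following is a sketch: the zone, partition count, and per-partition storage value are assumptions to adjust for your own throughput and retention needs.

```shell
# Sketch: create the Lite topic from the command line.
# ZONE, --partitions, and --per-partition-bytes are assumed values.
ZONE="us-central1-a"

create_lite_topic() {
  gcloud pubsub lite-topics create your-lite-topic \
      --location="$ZONE" \
      --partitions=1 \
      --per-partition-bytes=30GiB
}
```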

Create a Lite subscription

Create a Lite subscription using the following steps:

  1. In the Cloud Console, go to the Lite Subscriptions page.

    Go to the Lite Subscriptions page

  2. Click Create Lite subscription.

  3. In the Lite subscription ID field, enter your-lite-subscription.

  4. Select a Lite topic to receive messages from.

  5. In the Delivery requirement section, select Deliver messages after stored.

  6. Click Create.

The Lite subscription is in the same zone as the Lite topic.
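The Lite subscription can likewise be created with gcloud. This is a sketch under the assumption that the zone matches the Lite topic's zone.

```shell
# Sketch: create the Lite subscription from the command line.
# ZONE is an assumed value -- it must match the Lite topic's zone.
ZONE="us-central1-a"

create_lite_subscription() {
  gcloud pubsub lite-subscriptions create your-lite-subscription \
      --location="$ZONE" \
      --topic=your-lite-topic \
      --delivery-requirement=deliver-after-stored
}
```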

Run the pipeline

Run the following gcloud command to copy taxi ride data from a public Pub/Sub topic to your new Lite topic.

gcloud dataflow flex-template run "copy-taxirides-to-pubsub-lite-`date +%Y%m%d-%H%M%S`" \
    --template-file-gcs-location "gs://pubsub-streaming-sql-copier/template/copier.json" \
    --region "REGION" \
    --parameters sourceType=pubsub \
    --parameters sourceLocation="projects/pubsub-public-data/topics/taxirides-realtime" \
    --parameters sinkType=pubsublite \
    --parameters sinkLocation="projects/PROJECT_ID/locations/ZONE/topics/your-lite-topic"
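Before running the command, replace the REGION, PROJECT_ID, and ZONE placeholders with your own values. As a sketch with hypothetical values, the sink location resolves like this:

```shell
# Hypothetical values for the PROJECT_ID and ZONE placeholders.
PROJECT_ID="my-project"
ZONE="us-central1-a"

# Fully qualified Lite topic path used as the sinkLocation parameter.
SINK_LOCATION="projects/${PROJECT_ID}/locations/${ZONE}/topics/your-lite-topic"
echo "$SINK_LOCATION"
```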

Copying data from Pub/Sub Lite to Pub/Sub

In this section, you run a pipeline that copies the data from your newly populated Pub/Sub Lite subscription to a new Pub/Sub topic.

Create a Pub/Sub topic

  1. Go to the Pub/Sub topics page in the Cloud Console.

    Go to the Pub/Sub topics page

  2. Click Create a topic.

    Screenshot that shows the Create a topic dialog in the console

  3. In the Topic ID field, provide a unique topic name, for example, your-pubsub-topic.

  4. Click Save.

Run the pipeline

Run the following gcloud command to copy the data from your Pub/Sub Lite subscription (created in the section above) to the Pub/Sub topic you just created.

gcloud dataflow flex-template run "copy-taxirides-to-pubsub-`date +%Y%m%d-%H%M%S`" \
    --template-file-gcs-location "gs://pubsub-streaming-sql-copier/template/copier.json" \
    --region "REGION" \
    --parameters sourceType=pubsublite \
    --parameters sourceLocation="projects/PROJECT_ID/locations/ZONE/subscriptions/your-lite-subscription" \
    --parameters sinkType=pubsub \
    --parameters sinkLocation="projects/PROJECT_ID/topics/your-pubsub-topic"
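To check that messages are flowing, you can attach a temporary subscription to the new Pub/Sub topic and pull a few messages. This is a sketch; the subscription name taxirides-check is a hypothetical choice.

```shell
# Sketch: pull a few copied messages to verify the pipeline is working.
# "taxirides-check" is a hypothetical subscription name.
verify_copy() {
  gcloud pubsub subscriptions create taxirides-check --topic=your-pubsub-topic
  gcloud pubsub subscriptions pull taxirides-check --auto-ack --limit=5
}
```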

Other available data sources (optional)

The same pipeline can be used to copy to and from other data sources.

Copying data to/from Apache Kafka

Set the relevant sourceType or sinkType parameter to kafka, and set the corresponding sourceLocation or sinkLocation parameter to <host:port>/<topic name> (for example, 111.128.2.22:8000/my-topic), where host and port identify a broker to bootstrap from.
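As a sketch with hypothetical broker and topic values, the Kafka location string is assembled like this:

```shell
# Hypothetical broker address and topic name.
KAFKA_HOST="111.128.2.22"
KAFKA_PORT="8000"
KAFKA_TOPIC="my-topic"

# The location string is simply host:port/topic-name.
KAFKA_LOCATION="${KAFKA_HOST}:${KAFKA_PORT}/${KAFKA_TOPIC}"
echo "$KAFKA_LOCATION"
```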

The broker IP address must be reachable on the same network as the Dataflow pipeline. If the broker runs on Compute Engine, this is its internal IP address; if it runs elsewhere, you need to configure Cloud Interconnect to expose the broker IP addresses to virtual machines running within Google Cloud. If the VPC is not the default network, you also need to set the network parameter when running the template.

Copying data to BigQuery

The pipeline can also copy data to (but not from) a BigQuery table. To do this, set the sinkType parameter to bigquery, and set the sinkLocation parameter to your table identifier in bq command-line tool format. The BigQuery table used as a sink must have the following schema:

CREATE TABLE (
    message_key BYTES,
    event_timestamp TIMESTAMP,
    attributes ARRAY<STRUCT<key STRING, values ARRAY<BYTES>>>,
    payload BYTES,
)
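For example, a sinkLocation in bq format might look like the following; the project, dataset, and table names are hypothetical.

```shell
# Hypothetical table identifier in bq command-line tool format:
# PROJECT_ID:DATASET.TABLE
BQ_SINK_LOCATION="my-project:my_dataset.copied_messages"
echo "$BQ_SINK_LOCATION"
```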

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

  1. In the Cloud Console, go to the Lite Topics page.

    Go to the Lite Topics page

  2. Click your-lite-topic.

  3. In the Lite topic details page, click Delete.

  4. In the field that appears, enter delete to confirm that you want to delete the Lite topic.

  5. Click Delete.

  6. Repeat these steps for your Pub/Sub topic.

    Go to the Topics page
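The same cleanup can be done from the command line. This is a sketch; ZONE is an assumed value, so use the zone of your Lite topic.

```shell
# Sketch: delete both topics from the command line.
# ZONE is an assumed value -- use the zone of your Lite topic.
ZONE="us-central1-a"

cleanup() {
  gcloud pubsub lite-topics delete your-lite-topic --location="$ZONE"
  gcloud pubsub topics delete your-pubsub-topic
}
```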

What's next