Introducing dynamic topic destinations in Pub/Sub using Dataflow
Mike Nadal
Analytics Specialist, Cloud Customer Engineer
Sachin Agarwal
Group Product Manager, Google Cloud
Pub/Sub plays a key role in powering data streaming insights across a number of verticals. With the increasing adoption of streaming analytics, we’re seeing companies with more complex and nuanced use cases. One particular use case for using Pub/Sub is being able to ingest data from multiple various teams and data sources, sort that data into multiple topics based on particular message attributes, and do so while the data is in flight. Today, we’re introducing a new feature in Dataflow called dynamic destinations for Pub/Sub topics to help with this.
Today, when you want to publish messages to a single topic, you create an instance of the Beam to Pub/Sub sink. And when publishing to an additional topic, you repeat this pattern, or you can have multiple publishers publishing to a single topic. However, when you want to start publishing to tens or even hundreds of topics, needing to use that many publisher clients becomes unwieldy and expensive. Dynamic destinations allows you to use a single publisher client to dictate which messages go to which topics.
There are two methods for using dynamic destinations: provide a function to extract the destination from an attribute in the Pub/Sub message, or put the topic in a PubsubMessage class and write to it using the writeMessagesDynamic method. Please note that the Pub/Sub topic needs to already exist before using dynamic destinations. Below are two examples of assigning a message to a topic based on the country in the message:
Function Extract:
Example: Secured Topics
With dynamic destinations, you can also start to enforce data security further up the pipeline. If you write all your data to a single topic you can always filter the data when writing out to your final data sink, but that opens up the possibility for people to have access to data they aren’t supposed to. When you’re able to delineate data based on message attributes you can keep data for each team restricted to their dedicated topic and then control end-user access to that topic. As an example, a gaming company can ingest data from Marketing, Player Analytics, and IT teams into a single pipeline. Each team would tag their data with the appropriate attribute value, which in turn assigns the data to the appropriate topic, and can view the information that’s relevant to just their team. Security concerns and back charge capabilities become much easier to tackle with this new feature.
To get started with dynamic destinations, ensure that you’ve upgraded to Apache Beam 2.50 release or later and check out the documentation for dynamic destinations in the PubsubIO here.