
Data movement for the masses with Dataflow Templates

April 25, 2022
Mehran Nazir

Dataflow PM, Google Cloud

It’s 4 PM on a Friday afternoon, and your mind has already checked out for the weekend. Just as you are about to close your laptop, you see an e-mail come in from your engineering manager. You dread what lies ahead.

“Our data science team needs to analyze streaming data from our Kafka cluster. They need the data in BigQuery. Can you deliver this ask by Monday morning?”

Sounds simple enough.

You might be tempted to write an ETL script that pulls data from the Kafka cluster every 30 minutes. But that suddenly becomes complicated when you have to introduce logic for retries. What if data written to your BigQuery table is not in the right format? And what if users are looking to filter a subset of the inbound data or convert certain fields into a different format?
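To see how quickly the retry logic alone grows, here is a minimal sketch of what a homegrown script would need around every pull from Kafka and every write to BigQuery. The `with_retries` helper and its defaults are hypothetical, written only for illustration; they are not part of Dataflow or any Google Cloud library.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying failures with exponential backoff and jitter.

    Hypothetical helper: the name, defaults, and 10% jitter are assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Wait base * 2^(attempt - 1) seconds, plus up to 10% random jitter.
            delay = base_delay_s * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

Even this sketch leaves open which errors are safe to retry, and what happens to a batch that was only partially written before the failure.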

Then the other requirements pile up quickly. What about the non-functional requirements that are table stakes for any production data pipeline, such as monitoring and logging? Not to mention the operational challenge of scaling a homegrown ETL stack to the wider organization.

Not so simple a request anymore. Looks like your weekend is totally shot.

What if there were a cloud-native way to handle this data movement use case?

Enter Dataflow Templates.

Dataflow Templates allow you to set your data in motion in just a handful of clicks. Dataflow Templates provide a user interface where you select a source-sink combination from a dropdown menu, enter the values for required parameters, select optional settings, and deploy a pipeline. Once a pipeline is launched, it leverages the industry-leading, fully managed Dataflow service, which includes horizontal and vertical autoscaling, dynamic work rebalancing, and limitless backends like Shuffle and Streaming Engine.
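The same choices you make in the UI (a template, required parameters, optional settings) can also be written down as a launch request. The sketch below only builds the request body and sends nothing. Its field names follow our understanding of the Dataflow `flexTemplates.launch` REST method, while the `build_launch_request` helper, the template path, and the parameter names are hypothetical placeholders; check the documentation of the template you pick for its real parameters.

```python
from typing import Dict, Iterable, Mapping, Optional


def build_launch_request(
    job_name: str,
    template_path: str,
    parameters: Mapping[str, object],
    required: Iterable[str] = (),
    environment: Optional[Mapping[str, object]] = None,
) -> Dict[str, object]:
    """Assemble a launch request body for a template (hypothetical helper)."""
    # Mirror the UI: refuse to launch while a required parameter is missing.
    missing = sorted(name for name in required if name not in parameters)
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    launch: Dict[str, object] = {
        "jobName": job_name,
        "containerSpecGcsPath": template_path,
        # Template parameters are passed as strings, so convert every value.
        "parameters": {key: str(value) for key, value in parameters.items()},
    }
    if environment:
        launch["environment"] = dict(environment)  # optional settings
    return {"launchParameter": launch}


request = build_launch_request(
    job_name="kafka-to-bq-demo",
    # Placeholder path, not a real template location.
    template_path="gs://my-bucket/templates/kafka_to_bigquery.json",
    # Hypothetical parameter names, for illustration only.
    parameters={"inputTopic": "orders", "outputTable": "my-project:analytics.orders"},
    required=["inputTopic", "outputTable"],
    environment={"maxWorkers": 5},
)
```

Validating required parameters before anything is submitted catches a missing value locally instead of after a failed launch.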

Retry patterns? We’ve got code samples, not to mention support for snapshots, which protect you from data loss.

Need file format conversion? We’ve got a template for that.

Filter data using our built-in UDF support.

Monitoring & logging? Provided out of the box.

What about those pesky duplicates? We have that covered. 

No wonder studies have found that Dataflow boosts data engineering productivity by 55%.

Looks like your weekend might not be over after all.

The Dataflow team is excited to announce the general availability of 24 Google-Provided Dataflow templates, listed below:

If you are new to Dataflow, Dataflow Templates is absolutely the right place to begin your Dataflow journey.

If you have been using Dataflow for some time, you might note that Dataflow Templates have been around for as long as you can remember. It’s true that we introduced Dataflow Templates in 2017, and since then, thousands of customers have come to rely on Dataflow Templates to automate many of their data movements between different data stores. What’s new is that we now have the structure and personnel in place to provide technical support for these open-source contributions. We have made the requisite investments with dedicated staffing, and now when you use these Dataflow Templates, you can feel confident that your production workloads will be supported no differently than any other workload you run on Google Cloud.

What’s Next

Dataflow Templates might serve your immediate data processing needs, but as any data engineer knows, requirements evolve and customizations are necessary. Thankfully, Dataflow is well-positioned to serve those use cases too.
