Google Cloud Big Data and Machine Learning Blog

Innovation in data processing and machine learning technology

Scheduling and sampling arrive for Google Cloud Dataprep

Monday, November 6, 2017

By Eric Anderson, Product Manager

Google Cloud Dataprep, which entered public beta just a month ago, received its first public update on Thursday. The release includes a refreshed UI, job scheduling, and richer sampling options. Let’s take a look at each of them.

Flow scheduling

Throughout our early releases, the most common user request has been Flow scheduling. As of Thursday’s release, Flows can be scheduled at any frequency with minute granularity. When a scheduled Flow executes, its designated Datasets are published. You can even specify a publishing destination for scheduled runs that differs from the one used for manual (development) runs.

A fresh user interface

Cloud Dataprep is easy to explain because people understand its value almost immediately upon seeing it. In part, that’s because the pain of data preparation is almost universally known, but also because the visual experience of Dataprep is intuitive. That said, there is a world of functionality and expressiveness within Dataprep that may not have been immediately apparent, until today.

One of the many “helpers” to get you started with Cloud Dataprep

With this release, new users of Dataprep are greeted with a preloaded sample dataset, a step-by-step in-product walkthrough, and videos to guide the way. If you haven’t tried Dataprep yet, now’s a good time. If you have, you’ll notice a reorganized and updated visual interface:


The Step Builder is now vertically oriented, providing a natural top-down progression and greater information density.

Step suggestions are also vertically oriented, with previews generated as users hover over them. This allows users to see more suggestions at a glance and multiple previews at a time.

Powerful sampling

Finally, power users told us they wanted more expressive sampling options. Consider a dataset with many errors. A simple top-of-file sample is unlikely to include all of them, so some may go untreated and end up in your published datasets. In that case, you might use the new stratified sampling technique to ensure that every distinct value of a column is included in the sample.

New sampling techniques included in the latest release.
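To make the difference between a top-of-file sample and a stratified sample concrete, here is a minimal, generic sketch in Python. It is not Dataprep's implementation; the function names `top_of_file_sample` and `stratified_sample` and the sample rows are hypothetical, chosen only to illustrate how grouping rows by a column's distinct values guarantees that rare (often erroneous) values make it into the sample.

```python
import random
from collections import defaultdict


def top_of_file_sample(rows, n):
    """Take the first n rows, as a simple head-of-file sample does."""
    return rows[:n]


def stratified_sample(rows, key, per_stratum, seed=0):
    """Draw up to `per_stratum` random rows for every distinct value of row[key].

    Every distinct value in the column appears at least once in the result,
    even if it is rare or only occurs late in the file.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)  # group rows by the column's value
    sample = []
    for members in strata.values():
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical data: 1,000 well-formed rows followed by two malformed ones.
rows = [{"id": i, "state": "CA"} for i in range(1000)]
rows += [{"id": 1000, "state": "Calif."}, {"id": 1001, "state": ""}]

head = top_of_file_sample(rows, 100)
strat = stratified_sample(rows, "state", per_stratum=5)

print(sorted({r["state"] for r in head}))   # ['CA']
print(sorted({r["state"] for r in strat}))  # ['', 'CA', 'Calif.']
```

The head-of-file sample never sees the malformed `"Calif."` and empty values, while the stratified sample contains at least one row for each, so those errors can be spotted and fixed before publishing.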

To experience all of the above for yourself, head over to Cloud Dataprep right now and get started. Meanwhile, we are already hard at work on the next release and can’t wait to share what’s next.
