Google Cloud adds smart analytics frameworks for AI Platform Notebooks

May 28, 2020
Christopher Crosbie

Product Manager, Data Analytics

Mehran Nazir

Dataflow PM, Google Cloud

Google Cloud is announcing the beta release of smart analytics frameworks for AI Platform Notebooks. Smart analytics frameworks bring together the model training and deployment offered by AI Platform with the ingestion, preprocessing, and exploration capabilities of our smart analytics platform. With smart analytics frameworks for AI Platform Notebooks, you can run petabyte-scale SQL queries with BigQuery, generate personalized Spark environments with Dataproc Hub, and develop interactive Apache Beam pipelines to launch on Dataflow, all from the same managed notebooks service that powers Google Cloud AI Platform.

These new frameworks can help bridge the gap between cloud tools and bring a secure way to explore all kinds of data. Whether you’re sharing visualizations, presenting an analysis, or interacting with live code in more than 40 programming languages, the Jupyter notebook is the prevailing user interface for working with data. As data volumes grow and businesses aim to get more out of that data, there has been a rapid uptake in the types of data pipelines, data source availability, and plugins offered by these notebooks. While this proliferation of functionality has enabled data users to discover deep insights into the toughest business questions, the increased data analysis capabilities have been coupled with increased toil: Data engineering and data science teams spend too much time with library installations, piecing together integrations between different systems, and configuring infrastructure. At the same time, IT operators struggle to create enterprise standards and enforce data protections in these notebook environments.

Our new smart analytics frameworks for AI Platform Notebooks power Jupyter notebooks with our smart analytics suite of products, so data scientists and engineers can quickly tap into data without the integration burden that comes with unifying AI and data engineering systems. IT operators can also rest assured that notebook security is enforced through a single hub, whether the data workflow is pulling data from BigQuery, transforming data with Dataproc, or running an interactive Apache Beam pipeline. End-to-end support in AI Platform Notebooks allows the modern notebook interface to act as the trusted gateway to data in your organization.

How to use the new frameworks

To get started with a smart analytics framework, go to the AI Platform Notebooks page in the Google Cloud Console. Select New Instance, then from the Data Analytics menu choose either Apache Beam or Dataproc Hub. The Apache Beam option will launch a VM that is pre-configured with an interactive environment for prototyping Apache Beam pipelines on Beam's direct runner. The Dataproc Hub option will launch a VM running a customized JupyterHub instance that will spawn production-grade, isolated, autoscaling Apache Spark environments that can be pre-defined by administrators but personalized by each data user. All AI Platform Notebooks frameworks come pre-packaged with BigQuery libraries, making it easy to use BigQuery as your notebook's data source.
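As an illustration, a notebook cell can use the preinstalled BigQuery client library to query a table directly into a pandas DataFrame. This is a minimal sketch; the public dataset and query below are only examples to adapt to your own project and data.

```python
from google.cloud import bigquery

# The client picks up the notebook VM's default project and credentials.
client = bigquery.Client()

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""

# Run the query and pull the result into a pandas DataFrame for exploration.
df = client.query(query).to_dataframe()
df.head()
```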

Apache Beam is an open source framework that unifies batch and streaming pipelines so that developers don't need to manage two separate systems for their various data processing needs. The Apache Beam framework in AI Platform Notebooks lets you develop your pipelines interactively, using a workflow that simplifies the path from prototyping to production. Developers can inspect their data transformations and perform analytics on intermediate data, then launch the pipeline on Dataflow, a fully managed data processing service that distributes your workload across a fleet of virtual machines with little to no overhead. With the Apache Beam interactive framework, it is easier than ever for Python developers to get started with streaming analytics, and setting up your environment takes just a few clicks. We're excited to see what this innovative community will build once they start adopting Apache Beam in notebooks and launching Dataflow pipelines in production.
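As a rough sketch of that workflow, the cell below builds a small word-count pipeline on the interactive runner that ships with the preinstalled apache_beam package; the input data is purely illustrative.

```python
import apache_beam as beam
import apache_beam.runners.interactive.interactive_beam as ib
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner

# The interactive runner wraps Beam's direct runner and records intermediate
# PCollections so they can be inspected from the notebook.
p = beam.Pipeline(InteractiveRunner())

words = p | 'Create' >> beam.Create(['to', 'be', 'or', 'not', 'to', 'be'])
counts = (
    words
    | 'PairWithOne' >> beam.Map(lambda word: (word, 1))
    | 'CountPerWord' >> beam.CombinePerKey(sum)
)

# Materialize the intermediate result and render it inline in the notebook.
ib.show(counts)
```

Once the transforms look right, the same pipeline code can be pointed at Dataflow by constructing the pipeline with the Dataflow runner and the appropriate pipeline options.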

In the past, companies have hit roadblocks on their cloud journey because it has been difficult to move away from the monolithic architecture patterns ingrained in Hadoop and Spark deployments. Dataproc Hub makes it simple to modernize the inefficient multi-tenant clusters that were running on-premises. With this new approach to Spark notebooks, you can give data scientists an environment they can fully control and personalize, while staying within the security standards and data access policies of their company.

The smart analytics frameworks for AI Platform Notebooks are available now in public beta. There is no charge for using the notebooks themselves; you pay only for the cloud resources you use within the instance: BigQuery, Cloud Storage, Dataproc, or Compute Engine. Learn more and get started today.
