An annual roundup of Google Data Analytics innovations
Sudhir Hasbe
Sr. Director of Product Management, Google Cloud
October 23rd (this past Sunday) was my 5th Googleversary, and we just wrapped up an incredible Google Next 2022! It was great to see so many customers and colleagues in person this year in New York City. This blog is an attempt to share the progress we have made since last year's roundup (my 4th-anniversary blog post from Next 2021).
Bringing BigQuery to the heart of your Data Cloud
Since last year we have made significant progress across the whole portfolio. I want to start with BigQuery, which is at the heart of our customers' Data Cloud. We have enhanced BigQuery with key launches like multi-statement transactions, Search and operational log analytics, native JSON support, slot recommender, interactive SQL translation from dialects such as Teradata, Hive, and Spark, materialized view enhancements, and table snapshots. Additionally, we have launched various enhancements to the SQL language, accelerated customer cloud migrations with BigQuery migration services, and introduced scalable data transformation pipelines in BigQuery using SQL with the Dataform preview.
One of the most significant enhancements to BigQuery is support for unstructured data through object tables. Object tables enable you to take advantage of common security and governance across your data. You can now build data products that unify structured and unstructured data in BigQuery.
To support data openness, at Next ’22 we announced the general availability of BigLake, to help you break down data silos by unifying lakes and warehouses. BigLake innovations add support for Apache Iceberg, which is becoming the standard open source table format for data lakes. And soon, we’ll add support for formats including Delta Lake and Hudi.
To help customers bring analytics to their data irrespective of where it resides, we launched BigQuery Omni. Now we are adding new capabilities such as cross-cloud transfer and cross-cloud larger query results that will make it easier to combine and analyze data across cloud environments. We also launched on-demand pricing support for BigQuery Omni, which enables you to get started at a low cost.
To help customers break down data boundaries across organizations, we launched Analytics Hub. Analytics Hub is a data exchange platform that enables organizations to create private or public exchanges with their business partners. We have added Google data, which includes highly valuable datasets like Google Trends. With hundreds of partners sharing valuable commercial datasets, Analytics Hub helps customers reach data beyond their organizational walls. We also partnered with the Google Earth Engine team so that customers can use BigQuery to access, and get value from, the troves of satellite imagery data available within Earth Engine.
We’ve also invested to bring BigQuery together with operational databases to help customers build intelligent, data-driven applications. Innovations include federated queries for Spanner, Cloud SQL, and Bigtable, allowing customers to analyze data residing in operational databases in real time with BigQuery. At Next ’22, we announced Datastream for BigQuery, which provides easy replication of data from operational database sources such as AlloyDB, PostgreSQL, MySQL, and Oracle directly into BigQuery with a few simple clicks.
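As a rough illustration of what a federated query looks like, the sketch below builds a BigQuery statement that wraps a source-database query in the `EXTERNAL_QUERY` function, which is the route used with Cloud SQL and Spanner connections. The helper name, the connection ID, and the `orders` table are hypothetical and chosen only for illustration; the sketch only assembles the SQL text and does not submit it.

```python
def build_federated_query(connection_id: str, external_sql: str) -> str:
    """Wrap a source-database query in BigQuery's EXTERNAL_QUERY function.

    connection_id is assumed to have the form "project.location.connection".
    """
    # Escape backslashes and double quotes so the external SQL remains a
    # valid triple-quoted BigQuery string literal.
    escaped = external_sql.replace("\\", "\\\\").replace('"', '\\"')
    return f'SELECT * FROM EXTERNAL_QUERY("{connection_id}", """{escaped}""")'


# Hypothetical connection and table names, for illustration only.
federated_sql = build_federated_query(
    "my-project.us.orders-conn",
    "SELECT id, status FROM orders WHERE status = 'open'",
)
print(federated_sql)
```

The resulting string could then be run like any other BigQuery query, for example from the console or a client library.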
From Data to AI, with built-in intelligence for BigQuery and Vertex AI
We launched BigQuery Machine Learning in 2018 to make machine learning accessible to data analysts and data scientists across the globe. Now, customers create millions of models and run tens of millions of predictions every month using BigQuery ML. Vertex AI enables ML Ops from data model to deployment in production and running predictions in real time. Over the past year we have tightly integrated BigQuery and Vertex AI to simplify the ML experience.
Now you can create models in BigQuery using BigQuery ML that are instantly visible in the Vertex AI model registry. You can then deploy these models directly to Vertex AI endpoints for real-time serving, use Vertex AI Pipelines to monitor and train models, and view detailed explanations for your predictions through the BigQuery ML and Vertex AI integration.
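To make the registration step concrete, here is a minimal sketch that assembles a BigQuery ML `CREATE MODEL` statement using the `model_registry` and `vertex_ai_model_id` options. The helper name, the logistic regression model type, and every dataset, table, and column name are assumptions for illustration; the sketch only builds the SQL text.

```python
def build_create_model_sql(
    model: str, training_table: str, label_col: str, vertex_model_id: str
) -> str:
    """Build a CREATE MODEL statement that also registers the model in Vertex AI."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS(\n"
        "  model_type = 'LOGISTIC_REG',\n"  # assumed model type for this sketch
        f"  input_label_cols = ['{label_col}'],\n"
        "  model_registry = 'VERTEX_AI',\n"  # register in the Vertex AI model registry
        f"  vertex_ai_model_id = '{vertex_model_id}'\n"
        ") AS\n"
        f"SELECT * FROM `{training_table}`"
    )


# Hypothetical names, for illustration only.
create_model_sql = build_create_model_sql(
    model="my_dataset.churn_model",
    training_table="my_dataset.churn_training",
    label_col="churned",
    vertex_model_id="churn_model",
)
print(create_model_sql)
```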
Additionally, we announced an integration between Colab and BigQuery that allows users to explore results quickly with a data science notebook on Colab. Colab was developed by Google Research to allow users to execute arbitrary Python code, and it became a favorite tool for data scientists and machine learning researchers. The BigQuery integration enables seamless workflows for data scientists to run descriptive statistics, generate visualizations, create a predictive analysis, or share their results with others.
To learn more about innovations that bring data and AI closer together, check out my session at Next with June Yang, VP of Cloud AI and Industry Solutions.
Delivering the best of open source
We have always believed in making Google Cloud the best platform to run open source software. Cloud Dataproc enables you to run various OSS engines such as Spark, Flink, and Hive. We have made a lot of enhancements to Dataproc over the past year. One of the most significant was a Serverless Spark offering that enables you to get away from clusters and focus on just running Spark jobs. At Cloud Next 2022, we added built-in support for Apache Spark in BigQuery, which will allow data practitioners to create BigQuery stored procedures, unifying their work in Spark with their SQL pipelines. This also provides integrated BigQuery billing with access to a curated library of highly valuable internal and external assets.
Powering streaming analytics
Streaming analytics is a key area of differentiation for Google Cloud with products like Cloud Dataflow and Cloud Pub/Sub. This year, our goal was to push the boundaries of innovation in real-time processing through Dataflow Prime and to make it seamless to land real-time data arriving in Pub/Sub into BigQuery for advanced analytics. At the beginning of the year, we introduced over 25 new Dataflow Templates as Generally Available. At July’s Data Engineer Spotlight, we made Dataflow Prime, Dataflow ML, and Dataflow Go Generally Available. We also introduced a number of new observability features for Dataflow to give you more visibility and control over your Dataflow pipelines.
Earlier this year we introduced a new type of Pub/Sub subscription called a “BigQuery subscription” that writes directly from Cloud Pub/Sub to BigQuery. With this integration, you no longer need to pay for data ingestion into BigQuery; you pay only for the Pub/Sub you use.
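For readers curious about what such a subscription looks like in configuration terms, the sketch below builds a request body in the shape of the Pub/Sub REST `Subscription` resource, where a `bigqueryConfig` block names the destination table. The helper name and all project, topic, dataset, and table names are hypothetical, and the sketch only constructs the JSON; it does not call the API.

```python
import json


def bigquery_subscription_body(
    project: str, topic: str, dataset: str, table: str, write_metadata: bool = False
) -> dict:
    """Build a subscription body that writes messages straight to a BigQuery table."""
    return {
        # Fully qualified name of the topic the subscription reads from.
        "topic": f"projects/{project}/topics/{topic}",
        "bigqueryConfig": {
            # Destination table, assumed to use the project.dataset.table form.
            "table": f"{project}.{dataset}.{table}",
            # When true, message metadata is written to extra table columns.
            "writeMetadata": write_metadata,
        },
    }


# Hypothetical names, for illustration only.
subscription_body = bigquery_subscription_body(
    "my-project", "clickstream", "analytics", "raw_events", write_metadata=True
)
print(json.dumps(subscription_body, indent=2))
```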
Unified business intelligence
In February 2020 we closed the Looker acquisition, and since then we have been busy building Looker capabilities and integrating it into Google Cloud. Additionally, Data Studio has been our self-service BI offering for many years. It has the strongest tie-in with BigQuery, and many of our BigQuery customers use Data Studio. As announced at Next ’22, we are bringing all BI assets under the single umbrella of Looker. Data Studio will become Looker Studio and include a paid version that will provide enterprise support.
With tight integration between Looker and Google Workspace productivity tools, customers gain easy access, via spreadsheets and other documents, to consistent, trusted answers from curated data sources across their organization. Looker integration with Google Sheets is in preview now, and increased accessibility of BigQuery through Connected Sheets allows more people to analyze large amounts of data. You can read more details here.
Intelligent data management and governance
Lastly, a challenge that is top of mind for all data teams is data management and governance across distributed data systems. Our data cloud provides customers with an end-to-end data management and governance layer, with built-in intelligence to help enable trust in data and accelerate time to insights. Earlier this year we launched Dataplex as our Data Management and Governance service. Dataplex helps organizations centrally manage and govern distributed data. Furthermore, we unified Data Catalog with Dataplex to provide a streamlined experience for customers to centrally discover their data with business context and govern and manage that data with built-in data intelligence.
At Next we introduced data lineage capabilities with Dataplex so you can gain end-to-end lineage from ingestion of data to analysis to ML models. Advancements for automatic data quality in Dataplex ensure confidence in your data, which is critical for getting accurate predictions. Based on customer input we’ve also added enhanced data discovery for automatic cataloging to databases and Looker from a business glossary and added a Spark-powered data exploration workbench. And Dataplex is now fully integrated with BigLake so you can manage fine-grained access control at scale.
An open data ecosystem
Over the past 5 years, the Data Analytics team's goal has been to make Google Cloud the best place to run analytics. One of the key tenets of this was to ensure we have the most vibrant partner ecosystem. We have a rich ecosystem of hundreds of tech partner integrations and have 40+ partners who have been certified through the Cloud Ready-BigQuery initiative.
Additionally, more than 800 technology partners are building their applications on top of our Data Cloud. Data Sharing continues to be one of the top capabilities leveraged by these partners to easily share information at any scale with their enterprise customers.
We also announced new updates and integrations with Collibra, Elastic, MongoDB, Palantir, ServiceNow, Sisu Data, Reltio, Striim, and Qlik to help customers move data between the platforms of their choice and bring more of Google’s Data Cloud capabilities to partner platforms.
Finally, we established a Data Cloud Alliance together with 17 of our key partners who provide the most widely adopted and fastest-growing enterprise data platforms today across analytics, storage, databases, and business intelligence. Our mission is to collaborate to solve modern data challenges, providing an acceleration path to value. The first key areas of focus are data interoperability, data governance, and solving the skills gap through education.
Customer momentum across a variety of industries and use cases
We’re super excited for organizations to share their Data Cloud best practices at Next, including Walmart, Boeing, Twitter, Televisa Univision, L’Oreal, CNA Insurance, Wayfair, MLB, British Telecom, Telus, Mercado Libre, LiveRamp, and Home Depot. Check out all the Data Analytics sessions and resources from Next and get started on your Data Cloud journey today. We look forward to hearing your story at a future Google Cloud event.