BigQuery's AI-assisted data preparation is now in preview
Tim Bezold
Product Manager
Amit Virmani
Engineering Manager
In today's data-driven world, the ability to efficiently transform raw data into actionable insights is paramount. However, data preparation and cleaning is often a significant challenge. According to Gartner®¹, “Gartner clients now report that 90% or more of their time is spent preparing data (as high as 94% in complex industries) for advanced analytics, data science and data engineering.”
Reducing this time and efficiently transforming raw data into insights is crucial for staying competitive. Earlier this month, we introduced BigQuery data preparation, an AI-first solution that streamlines and simplifies the data preparation process as part of Gemini in BigQuery.
Now in preview, BigQuery data preparation provides a number of capabilities:
- AI-powered suggestions: BigQuery data preparation uses Gemini in BigQuery to analyze your data and schema and provide intelligent suggestions for cleaning, transforming, and enriching the data. This significantly reduces the time and effort required for manual data preparation tasks.
- Data cleansing and standardization: Easily identify and rectify inconsistencies, missing values, and formatting errors in your data.
- Visual data pipelines: The intuitive, low-code visual interface helps both technical and non-technical users easily design complex data pipelines, and leverage BigQuery's rich and extensible SQL capabilities.
- Data pipeline orchestration: Automate the execution and monitoring of your data pipelines. The SQL generated by BigQuery data preparation can become part of a Dataform data engineering pipeline that you can deploy and orchestrate with CI/CD, for a shared development experience.
BigQuery data preparation helps you ensure the accuracy and reliability of your data, leading to more informed business decisions. It automates data quality checks and integrates with other Google Cloud services such as Dataform and Cloud Storage, providing a unified and scalable environment for your data needs.
How does it work?
Getting started is easy. When you sample a BigQuery table in BigQuery data preparation, Gemini in BigQuery uses state-of-the-art foundation models to evaluate the data and schema and generate data preparation recommendations, such as filter and transformation suggestions. For example, it can identify valid date formats by country and determine which columns can act as join keys, which accelerates the data engineering process.
In one example (using synthetic data), the Birthdate column contains two different date formats and is of type STRING. BigQuery data preparation suggests: “Convert column Birthdate from type string to date with the following format(s): '%Y-%m-%d','%m/%d/%Y'”. After you apply the suggestion card, you can verify the transformed preview data in a column of type DATE.
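To make the effect of that suggestion concrete, here is a minimal Python sketch of multi-format date parsing. It is an illustration only and not the SQL that BigQuery data preparation generates. The two format strings come from the suggestion card quoted above; the function name `parse_birthdate` and the sample values are hypothetical.

```python
from datetime import date, datetime
from typing import Optional, Sequence

# Format strings taken from the suggestion card quoted above.
FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_birthdate(value: str, formats: Sequence[str] = FORMATS) -> Optional[date]:
    """Try each format in order; return the first successful parse, else None."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None


# Hypothetical sample values standing in for the STRING column.
samples = ["1990-04-23", "04/23/1990", "not a date"]
parsed = [parse_birthdate(s) for s in samples]
```

Values that match neither format come back as `None` in this sketch, so they can be reviewed separately rather than silently coerced.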
With BigQuery’s AI-assisted data preparation, you can:
- Significantly reduce time spent discovering data quality issues and cleaning data by leveraging Gemini-assisted suggestion cards
- Customize your own suggestion cards by providing an example in the data grid
- Increase operational efficiency by deploying data preparation with incremental data processing
What BigQuery customers are saying
Customers are already solving numerous challenges with BigQuery data preparation.
GAF is a major manufacturer of roofing materials in North America, and is adopting data preparation for creating data transformation pipelines on BigQuery.
“GAF is looking to modernize the ETL infrastructure and adopt a BigQuery native, low-code solution. BigQuery data preparation will help our skilled business users and the analytics team in the data preparation processes for the enablement of self-service analytics.” - Puja Panchagnula, Management Director - Enterprise Data Management & Analytics, GAF
mCloud Technologies helps businesses in sectors like energy, buildings, and manufacturing to optimize the performance, reliability, and sustainability of their assets.
“We receive data feeds from our partners. BigQuery data preparation allows our product managers to prepare and operate the file data feeds with little to no help from our data engineering team.” - Jim Christian, Chief Product and Technology Officer, mCloud Technologies
Public Value Technologies is a joint venture between two German public broadcasting organizations (ARD).
“Public Value Technologies receives data feeds from our media partners for our data mesh solution and AI applications. BigQuery data preparation allows our data analysts and scientists to rapidly integrate the data feeds that standardize and preprocess the data in a low code way.” - Korbinian Schwinger, Team Lead Data Engineer, Public Value Technologies
Getting started
With its powerful AI capabilities, intuitive interface, and tight integration with the Google Cloud ecosystem, BigQuery data preparation is set to revolutionize the way organizations manage and prepare their data. By automating tedious tasks, improving data quality, and empowering users, this innovative solution reduces the time you spend preparing data and improves your productivity.
To get started with BigQuery data preparation, explore the following resources:
1. Gartner, State of Metadata Management: Aggressively Pursue Metadata to Enable AI and Generative AI, by Mark Beyer, Guido De Simoni, 4 September 2024. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.