New BigQuery capabilities for data and AI governance
Lu Yang
Product Lead, BigQuery Data Governance
Chai Pydimukkala
Product Lead, BigQuery Data Governance, Security & Data Sharing
Across industries and disciplines, generative AI is transforming the way we work, from sparking new forms of creativity and revolutionizing customer experiences to unlocking hidden insights within complex data. At the same time, this revolution hinges on high-quality, well-governed, and accessible data.
Data may be the foundation for training and grounding AI models, but for decades, governance of that data has been an afterthought in the enterprise. With the rise of AI, governance is now front and center in enterprises’ data strategies, even as they struggle to discover, govern, and understand their distributed data assets. In fact, 66% of organizations report that at least half their data remains unused or undiscovered, while only 44% of data leaders fully trust the quality of their organization’s data. Poorly managed data leads to flawed AI and unreliable insights, hindering effective decision-making.
These are precisely the challenges that Dataplex is designed to address. Dataplex is the unified governance foundation for the entire BigQuery platform, providing automated data discovery, curation and management at scale. More importantly, Dataplex minimizes tedious, error-prone, manual governance processes, replacing them with governance that is pervasive, contextual and always-on. By deeply integrating with Google Cloud services, Dataplex creates a unified inventory of metadata across projects, regions and storage systems. This comprehensive view lets users perform global search over distributed data, enrich and organize that data, manage governance policies effectively, and maintain strong security, all while fostering data democratization. Dataplex also offers a variety of intelligent data management capabilities, including lineage tracking, data profiling and automated quality checks, to help users build trust in their data and maximize data-related ROI. As a result, Dataplex has been widely adopted since its launch in 2022, with over 95% of top Google Cloud data analytics customers using it for their data management and governance needs.
Cloud content management provider Box, Inc. uses Dataplex as its go-to tool for enhanced data governance, discovery and observability.
“Leveraging Dataplex, we embarked on a transformative journey to enhance our Data Platform by enhancing developer efficiency while tightening security policies across all regions. Dataplex serves as our central data catalog, providing data discovery, lineage tracking, and governance capabilities.” - Yeshvant Kumar Bhavnasi Venkat Satya, Senior Software Engineer and Asmita Kulkarni, Senior Product Manager, Box, Inc.
This year, we've supercharged Dataplex with powerful new features to help you navigate the complexities of data in the era of generative AI. Read on to learn more about Dataplex’s newest features and how they position you to take full advantage of generative AI, confident in the quality of your data assets.
1. Automated cataloging: Discover your data and AI assets in a unified way
Dataplex automatically harvests, ingests, and indexes metadata from across your data estate. In addition to data assets in BigQuery, Pub/Sub and Cloud Storage, we’ve recently extended Dataplex’s automated cataloging capability to the following sources:
- Vertex AI: Models, datasets, and features from Vertex AI are now cataloged in Dataplex in near real-time, providing a coherent view of your data and AI assets.
- Operational databases: Cloud SQL, Spanner, and Bigtable assets are now automatically cataloged, providing end-to-end visibility of your data landscape that spans the entire lifecycle.
- Looker: A preview of managed cataloging for Looker assets is coming soon, allowing you to discover and manage your BI assets alongside data and AI resources.
With this comprehensive inventory in place, you can easily search, organize, and enrich your data and AI assets, establishing the critical metadata foundation for effective data-to-AI governance.
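To make the idea of a unified inventory more concrete, the sketch below models metadata from several systems as one normalized record type and runs a simple keyword search across all of them. It is an illustration only: `CatalogEntry`, `search`, and the sample entries are hypothetical names invented for this example, not part of the Dataplex API, and real catalog search is considerably richer than substring matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical normalized metadata record (not a Dataplex type)."""
    name: str
    system: str                    # e.g. "bigquery", "vertex_ai", "cloud_sql"
    entry_type: str                # e.g. "table", "model"
    description: str = ""
    tags: frozenset = frozenset()


def search(entries, query):
    """Return entries whose name, description, or tags contain every query term."""
    terms = query.lower().split()
    results = []
    for entry in entries:
        # Flatten the searchable metadata of one entry into a single string.
        haystack = " ".join(
            [entry.name, entry.description, *sorted(entry.tags)]
        ).lower()
        if all(term in haystack for term in terms):
            results.append(entry)
    return results


# Made-up sample inventory spanning analytics, AI, and operational systems.
INVENTORY = [
    CatalogEntry("sales.orders", "bigquery", "table",
                 "Customer orders", frozenset({"pii"})),
    CatalogEntry("churn-model", "vertex_ai", "model",
                 "Predicts customer churn"),
    CatalogEntry("crm.customers", "cloud_sql", "table",
                 "Customer master data", frozenset({"pii"})),
]

if __name__ == "__main__":
    for hit in search(INVENTORY, "customer"):
        print(hit.system, hit.name)
```

Because every entry shares one shape regardless of where it lives, a single query can return a BigQuery table, a Vertex AI model, and a Cloud SQL table side by side, which is the property the unified inventory is meant to provide.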
2. Enhanced lineage tracking: Understand your data's end-to-end journey
Dataplex automatically captures the complete lineage of your data, allowing you to trace its origins, transformations, and destinations across your entire data landscape. This comprehensive view is now even more powerful with the following enhancements:
- Lineage for Vertex AI Pipelines: In addition to native integration with BigQuery, Dataproc and Composer, Dataplex is now integrated with Vertex AI Pipelines. This enables traceability of data from processing and analytics through to AI model training and deployment, which is essential for responsible AI governance and regulatory compliance.
- Column-level lineage for BigQuery: You can now dive deeper into your data with field-level lineage tracking in BigQuery. This granular view enables precise impact and root-cause analysis, facilitates the management of sensitive data, and helps ensure compliance with data privacy regulations.
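Impact and root-cause analysis over column-level lineage can be thought of as graph traversal: follow edges downstream to see what a column affects, or upstream to see where a value came from. The sketch below shows that idea in plain Python. The edge list, column names, and the functions `impact_of` and `root_causes_of` are all hypothetical and made up for illustration; this is not how Dataplex stores or exposes lineage.

```python
from collections import deque

# Hypothetical column-level lineage edges: (upstream column, downstream column).
EDGES = [
    ("raw.customers.email", "staging.customers.email_hash"),
    ("staging.customers.email_hash", "marts.churn_features.customer_key"),
    ("raw.orders.amount", "marts.churn_features.total_spend"),
    ("marts.churn_features.customer_key", "model.churn_v1.training_input"),
    ("marts.churn_features.total_spend", "model.churn_v1.training_input"),
]


def build_graph(edges, reverse=False):
    """Build an adjacency map; reverse=True flips edges to walk upstream."""
    graph = {}
    for upstream, downstream in edges:
        src, dst = (downstream, upstream) if reverse else (upstream, downstream)
        graph.setdefault(src, set()).add(dst)
    return graph


def reachable(graph, start):
    """Breadth-first traversal returning every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def impact_of(column, edges=EDGES):
    """Impact analysis: everything downstream of the given column."""
    return reachable(build_graph(edges), column)


def root_causes_of(column, edges=EDGES):
    """Root-cause analysis: everything upstream of the given column."""
    return reachable(build_graph(edges, reverse=True), column)


if __name__ == "__main__":
    print(sorted(impact_of("raw.customers.email")))
    print(sorted(root_causes_of("model.churn_v1.training_input")))
```

In this toy graph, tracing downstream from the raw email column reaches the model's training input, which is the kind of question you would ask when managing sensitive data that may flow into AI training.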
3. Intelligent search: Find what you need, faster
Finding the right data quickly is essential for any data-driven organization. Dataplex already provides global, governed catalog search, and now we're taking data discovery to the next level:
- Semantic search: With the upcoming semantic search capability, you can ask questions in natural language and Dataplex will interpret your intent to retrieve the most relevant results. This makes it much easier for everyone in your organization to find the data they need, regardless of their role or technical expertise.
- Full catalog search in BigQuery: We will also soon launch full catalog search in BigQuery, enabling users to search the entire catalog and discover data and AI resources directly within the familiar BigQuery interface.
4. AI-powered data insights: Jumpstart your analysis
Once relevant data is discovered, Dataplex can help you overcome the "cold start" problem with Data Insights. This feature automatically generates suggested questions and validated SQL queries for your data, jumpstarting your analysis and accelerating your time to insight. It helps users of all skill levels quickly uncover insights without writing a line of code, and it gives expert users a head start, since they can customize the generated queries for deeper analysis.
5. Governance rules: Enforce metadata-driven policies at scale
Unified metadata is the foundation of Dataplex. In addition to leveraging metadata for search and discovery, we are launching Dataplex governance rules in preview, allowing you to define and enforce governance policies based on metadata. You can use Dataplex's search capabilities to pinpoint the data assets or specific fields that need to be governed, and easily create governance rules based on your specific requirements and policies. Dataplex then automatically applies and enforces these rules across your distributed data environment, with built-in monitoring to ensure compliance.
This centralized approach simplifies governance management, reduces security risks, and provides a unified control plane for all your data. Our initial private preview focuses on fine-grained access control, allowing you to efficiently manage access policies across BigQuery and Cloud Storage at scale.
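As a rough mental model of metadata-driven access control, the sketch below defines a rule that keys off a metadata tag rather than a specific table or bucket, then evaluates it against assets from different systems. Everything here is an assumption made for illustration: `Asset`, `AccessRule`, `can_read`, the tag and group names, and the sample paths are hypothetical and do not reflect the actual governance rules interface, which was in private preview at the time of writing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical governed asset described only by its metadata."""
    name: str
    system: str          # e.g. "bigquery", "cloud_storage"
    tags: frozenset


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical rule: assets tagged `required_tag` are readable
    only by members of `allowed_groups`."""
    required_tag: str
    allowed_groups: frozenset

    def applies_to(self, asset):
        return self.required_tag in asset.tags


def can_read(user_groups, asset, rules):
    """Allow access only if the user satisfies every rule that applies."""
    for rule in rules:
        if rule.applies_to(asset) and not (rule.allowed_groups & set(user_groups)):
            return False
    # In this sketch, assets with no applicable rule are readable by default.
    return True


# One rule covers every asset tagged "pii", whichever system it lives in.
RULES = [AccessRule("pii", frozenset({"privacy-team"}))]
ASSETS = [
    Asset("sales.orders", "bigquery", frozenset({"pii"})),
    Asset("gs://logs/clickstream", "cloud_storage", frozenset()),
]

if __name__ == "__main__":
    for asset in ASSETS:
        print(asset.name, can_read({"analysts"}, asset, RULES))
```

The design point is that the rule is written once against metadata, so newly cataloged assets carrying the same tag would be covered without anyone editing per-resource policies. The default-allow behavior for untagged assets is a simplification of this sketch, not a statement about how the product behaves.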
With these new innovations, Dataplex empowers you to navigate the complexities of the data landscape and unlock the full potential of your data in the age of generative AI. Discover, govern, understand, and activate your data to drive innovation and transform your organization. Learn more about Dataplex and begin your data-driven journey today.