Dataplex Universal Catalog overview

Dataplex Universal Catalog is a unified, intelligent governance solution for data and AI assets in Google Cloud. Through Dataplex Universal Catalog, you can use AI to simplify data queries, quality assurance, and business insights.

Dataplex Universal Catalog is designed for governance at scale. For example, consider a global retail company that generates large amounts of sales, inventory, and customer data stored in Cloud Storage, Spanner, and Pub/Sub. With data distributed across systems, managing governance, ensuring quality, and maintaining compliance can be complex and time-consuming. Dataplex Universal Catalog simplifies this process by providing a central view where you can discover, profile, and validate organizational data assets, track their lineage, and control access to them.

Why use Dataplex Universal Catalog?

Dataplex Universal Catalog governs data through the following features:

Metadata cataloging. Retrieve metadata for Google Cloud resources (in BigQuery, Cloud SQL, Spanner, Vertex AI, Pub/Sub, Dataform, and Dataproc Metastore) and for third-party resources that you bring into Dataplex Universal Catalog, to get a snapshot of your data assets.
Data discovery. Scan for structured and unstructured data in Cloud Storage buckets to extract and catalog their metadata.
Data insights. Use AI to generate natural language questions about your data, to uncover patterns, assess data quality, and perform statistical analyses.
Data profiling. Identify common characteristics of the column data in your BigQuery tables, for example, typical data values, data distribution, and null counts, which can inform data classification and quality assurance.
Data quality. Define and measure the quality of the data in your BigQuery tables, by validating data against organizational policies and logging alerts if data doesn't meet quality criteria.
Business glossary. Manage business-related terminology and definitions across your organization, and attach terms to table columns to promote a consistent understanding of data usage.
Data lineage. Track how data moves through your systems: where it comes from, where it is passed to, and what transformations are applied to it.
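
To make the data profiling and data quality features above more concrete, the following is a minimal conceptual sketch in plain Python. It does not call Dataplex Universal Catalog or BigQuery; the sample rows, the function names (profile_column, check_null_ratio), and the max_null_ratio threshold are all assumptions made for illustration. It only shows the kind of statistics a profile summarizes (null counts, distinct values, typical values) and how a quality rule might be evaluated against them.

```python
from collections import Counter

# Hypothetical sample rows standing in for a table; not real data.
rows = [
    {"order_id": 1, "country": "US", "amount": 20.0},
    {"order_id": 2, "country": "DE", "amount": None},
    {"order_id": 3, "country": "US", "amount": 35.5},
    {"order_id": 4, "country": None, "amount": 12.0},
]


def profile_column(rows, column):
    """Summarize one column: null count, distinct count, most common values."""
    values = [row[column] for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "top_values": Counter(non_null).most_common(2),
    }


def check_null_ratio(rows, column, max_null_ratio):
    """Illustrative quality rule: the share of nulls must not exceed a threshold."""
    null_count = profile_column(rows, column)["null_count"]
    ratio = null_count / len(rows) if rows else 0.0
    return {"column": column, "null_ratio": ratio, "passed": ratio <= max_null_ratio}


country_profile = profile_column(rows, "country")
amount_check = check_null_ratio(rows, "amount", max_null_ratio=0.1)

print(country_profile)
print(amount_check)
```

In this sketch, the profile feeds the quality rule: a check that fails (here, too many null values in the amount column) is the kind of condition that would be logged as an alert.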

Dataplex Universal Catalog supports an end-to-end data lifecycle, from distributed discovery to business insights. Governance features are also available through BigQuery.
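
The data lineage feature described above can be pictured as a directed graph in which each link records a source, a target, and the transformation applied between them. The following sketch is a conceptual illustration in plain Python, not the lineage API; the asset names, the links list, and the upstream_sources function are hypothetical. It answers the question "where does this asset's data come from?" by walking the graph backward.

```python
from collections import defaultdict, deque

# Hypothetical lineage links: (source, target, transformation).
links = [
    ("raw_sales", "clean_sales", "deduplicate"),
    ("clean_sales", "daily_revenue", "aggregate by day"),
    ("raw_inventory", "stock_levels", "filter active items"),
    ("daily_revenue", "revenue_dashboard", "join with targets"),
]


def upstream_sources(links, asset):
    """Return every asset that directly or indirectly feeds into `asset`."""
    # Index the links by target so each asset maps to its direct sources.
    parents = defaultdict(list)
    for source, target, _transformation in links:
        parents[target].append(source)

    # Breadth-first walk backward through the graph.
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


dashboard_sources = upstream_sources(links, "revenue_dashboard")
print(sorted(dashboard_sources))
```

Indexing by source instead of target would give the reverse view: which downstream assets are affected when a given source changes.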

Use cases

You can use Dataplex Universal Catalog to do the following:

Discover and understand your data. Dataplex Universal Catalog gives you visibility into data resources across your organization and lets you find the resources relevant to a given data consumption need. It also provides context for each resource, which helps you judge whether the resource suits your data consumers' needs.
Enable data governance and data management. Dataplex Universal Catalog supplies metadata that can inform and power your data governance and data management capabilities.
Maintain an extensible and comprehensive repository for your metadata. Dataplex Universal Catalog stores and provides access to metadata that is automatically harvested from your Google Cloud resources. You can also integrate your own metadata from non-Google Cloud systems and enrich all metadata with additional business and technical annotations.

Get started

If this is your first time working with Dataplex Universal Catalog, consider following a quickstart:

Track data lineage for a BigQuery table

What's next

Learn about metadata management in Dataplex Universal Catalog.
Learn how to search for data assets.
Learn how to manage entries and ingest custom sources.
Learn how to import metadata into Dataplex Universal Catalog.
Learn about BigQuery governance.