Data Catalog can ingest and keep up-to-date metadata from several Google Cloud data sources as well as a number of popular on-premises ones.
With metadata ingested, Data Catalog does the following:
- Makes the existing metadata discoverable through search. For more information, see How to search.
- Allows the members of your organization to enrich your data with additional business metadata through tags. For more information, see Tags and tag templates.
Integration with Google Cloud sources is automatic. To integrate the custom on-premises sources that your organization uses, you can either:
- Set up and run the corresponding connectors contributed by the community, or
- Use the Data Catalog API to create custom entries.
Before you begin
If you're already using Data Catalog, you already have a project with the Data Catalog API enabled. For more information on the recommended way to use multiple projects with Data Catalog, see Using tag templates in multiple projects.
If this is your first time using Data Catalog, do the following:
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.
- Enable the Data Catalog API.
Integrate Google Cloud data sources
BigQuery and Pub/Sub
If your organization already uses BigQuery and Pub/Sub, then, depending on your permissions, you can search the metadata from those sources right away. If the corresponding entries don't appear in search results, review the IAM roles that you and your project's users might need in Identity and Access Management.
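A search is scoped to one or more projects and uses the Data Catalog query syntax. The sketch below builds the JSON body you might send to the `catalog:search` REST method; `my-project` and the query string are placeholders, and the field names follow the v1 REST API.

```python
import json

def build_search_request(project_ids, query):
    """Build the JSON body for a Data Catalog catalog:search request.

    Sent as: POST https://datacatalog.googleapis.com/v1/catalog:search
    """
    return {
        # Scope restricts the search to the listed projects.
        "scope": {"includeProjectIds": list(project_ids)},
        # Query uses Data Catalog search syntax, e.g. qualified predicates.
        "query": query,
    }

body = build_search_request(["my-project"], "type=table name:orders")
print(json.dumps(body, indent=2))
```

Authentication and the actual HTTP call (for example, via a client library) are omitted here to keep the request shape in focus.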
Dataproc Metastore (Preview)
To integrate with Dataproc Metastore, enable the sync to Data Catalog for new or existing services as described in Enabling Data Catalog sync.
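Enabling the sync amounts to setting one flag on the Metastore service resource. As a hedged sketch (field names follow the Dataproc Metastore v1 REST API; verify against Enabling Data Catalog sync), the patch body for an existing service might look like:

```python
import json

def metastore_sync_patch():
    """Request body that enables Data Catalog sync on a Dataproc Metastore
    service, for use with services.patch and an updateMask covering
    metadata_integration."""
    return {
        "metadataIntegration": {
            "dataCatalogConfig": {"enabled": True}
        }
    }

print(json.dumps(metastore_sync_patch(), indent=2))
```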
Cloud Data Loss Prevention (Cloud DLP)
Additionally, Data Catalog integrates with Cloud Data Loss Prevention, which lets you scan specific Google Cloud resources for sensitive data and send the results back to Data Catalog in the form of tags.
For more information, see Sending Cloud DLP scan results to Data Catalog.
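In the DLP API, sending scan results to Data Catalog is expressed as an action on an inspect job. The sketch below builds such a `dlpJobs.create` request body for a BigQuery table; the info type and table identifiers are illustrative, and the field names follow the DLP v2 REST API.

```python
import json

def dlp_inspect_job_with_catalog_export(project_id, dataset_id, table_id):
    """Build an inspect-job config whose findings are published to
    Data Catalog via the publishFindingsToCloudDataCatalog action."""
    return {
        "inspectJob": {
            # The resource to scan: one BigQuery table.
            "storageConfig": {
                "bigQueryOptions": {
                    "tableReference": {
                        "projectId": project_id,
                        "datasetId": dataset_id,
                        "tableId": table_id,
                    }
                }
            },
            # What to look for (illustrative single info type).
            "inspectConfig": {"infoTypes": [{"name": "EMAIL_ADDRESS"}]},
            # Send the findings summary back to Data Catalog as a tag.
            "actions": [{"publishFindingsToCloudDataCatalog": {}}],
        }
    }

job = dlp_inspect_job_with_catalog_export("my-project", "sales", "orders")
print(json.dumps(job, indent=2))
```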
Integrate on-premises data sources
To integrate on-premises data sources, you can use the corresponding Python connectors contributed by the community:
- Find your data source in the table below.
- Open its GitHub repository.
- Follow the setup instructions in the readme file.
| Category | Connector | Description | GitHub repository |
|---|---|---|---|
| RDBMS | mysql-connector | Sample code for MySQL data source. | google-datacatalog-mysql-connector |
| | postgresql-connector | Sample code for PostgreSQL data source. | google-datacatalog-postgresql-connector |
| | sqlserver-connector | Sample code for SQL Server data source. | google-datacatalog-sqlserver-connector |
| | redshift-connector | Sample code for Redshift data source. | google-datacatalog-redshift-connector |
| | oracle-connector | Sample code for Oracle data source. | google-datacatalog-oracle-connector |
| | teradata-connector | Sample code for Teradata data source. | google-datacatalog-teradata-connector |
| | vertica-connector | Sample code for Vertica data source. | google-datacatalog-vertica-connector |
| | greenplum-connector | Sample code for Greenplum data source. | google-datacatalog-greenplum-connector |
| | rdbmscsv-connector | Sample code for generic RDBMS CSV ingestion. | google-datacatalog-rdbmscsv-connector |
| | saphana-connector | Sample code for SAP HANA data source. | google-datacatalog-saphana-connector |
| BI | looker-connector | Sample code for Looker data source. | google-datacatalog-looker-connector |
| | qlik-connector | Sample code for Qlik Sense data source. | google-datacatalog-qlik-connector |
| | tableau-connector | Sample code for Tableau data source. | google-datacatalog-tableau-connector |
| Hive | hive-connector | Sample code for Hive data source. | google-datacatalog-hive-connector |
| | apache-atlas-connector | Sample code for Apache Atlas data source. | google-datacatalog-apache-atlas-connector |
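The connectors are distributed on PyPI and run as command-line tools. As an illustration of the overall shape of an invocation, the helper below assembles the arguments for the MySQL connector; the flag names are assumptions based on the connector's README, so verify them against the repository for your data source before use.

```python
def mysql_connector_command(project, location, host, user, password, database):
    """Assemble the CLI invocation for the community MySQL connector.

    Flag names are illustrative; check the
    google-datacatalog-mysql-connector README for the authoritative list.
    """
    return [
        "google-datacatalog-mysql-connector",
        f"--datacatalog-project-id={project}",
        f"--datacatalog-location-id={location}",
        f"--mysql-host={host}",
        f"--mysql-user={user}",
        f"--mysql-pass={password}",
        f"--mysql-database={database}",
    ]

# The resulting list can be passed to subprocess.run() after
# `pip install google-datacatalog-mysql-connector`.
cmd = mysql_connector_command(
    "my-project", "us-central1", "db.example.com", "dc_user", "secret", "sales"
)
```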
Integrate unsupported data sources
If you can't find a connector for your data source, you can still integrate it manually by creating entry groups and custom entries. To do that, you can either:
- Use one of the Data Catalog client libraries in one of the following languages: C#, Go, Java, Node.js, PHP, Python, or Ruby, or
- Use the Data Catalog API directly.
To integrate your sources, first, learn about Entries and entry groups, then follow the instructions in Create custom Data Catalog entries for your data sources.
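A custom entry is created inside an entry group and describes the external resource with user-specified fields. The sketch below builds the JSON body you might send to the `projects.locations.entryGroups.entries.create` REST method; the system name, linked resource, and schema column are illustrative placeholders, and the field names follow the v1 REST API.

```python
import json

def custom_entry_body(display_name, linked_resource):
    """Build the JSON body for creating a custom Data Catalog entry
    under an existing entry group."""
    return {
        "displayName": display_name,
        # Free-form identifiers for systems Data Catalog doesn't know about
        # (values here are illustrative).
        "userSpecifiedSystem": "on_prem_dwh",
        "userSpecifiedType": "table",
        # Pointer back to the source resource, in any URI scheme you choose.
        "linkedResource": linked_resource,
        # Optional schema describing the entry's columns.
        "schema": {
            "columns": [
                {"column": "id", "type": "INT64", "mode": "REQUIRED"},
            ]
        },
    }

entry = custom_entry_body("orders", "//dwh.example.com/databases/sales/orders")
print(json.dumps(entry, indent=2))
```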
What's next
- Learn more about Identity and Access Management.
- Learn How to search.
- Go through the Tagging tables quickstart.