Integrate your data sources with Data Catalog

Data Catalog can ingest metadata from several Google Cloud data sources, as well as from a number of popular on-premises sources, and keep that metadata up to date.

With metadata ingested, Data Catalog does the following:

  • Makes the existing metadata discoverable through search. For more information, see How to search.
  • Allows the members of your organization to enrich your data with additional business metadata through tags. For more information, see Tags and tag templates.
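As a rough illustration of the search capability above, the following sketch uses the google-cloud-datacatalog Python client library. The helper names (build_search_query, search_catalog) and the sample keyword are illustrative assumptions, not part of this page; see How to search for the authoritative query syntax.

```python
from typing import Iterator, Optional


def build_search_query(keyword: str,
                       system: Optional[str] = None,
                       entry_type: Optional[str] = None) -> str:
    """Compose a search string from a keyword and optional qualifiers."""
    parts = [keyword.strip()]
    if system:
        parts.append(f"system={system}")
    if entry_type:
        parts.append(f"type={entry_type}")
    # Drop an empty keyword so the result has no leading space.
    return " ".join(p for p in parts if p)


def search_catalog(project_id: str, query: str) -> Iterator[str]:
    """Yield the relative resource names of entries that match the query."""
    # Imported lazily so the query helper works without the client library.
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()
    scope = datacatalog_v1.types.SearchCatalogRequest.Scope()
    scope.include_project_ids.append(project_id)
    for result in client.search_catalog(scope=scope, query=query):
        yield result.relative_resource_name


if __name__ == "__main__":
    # "orders" is a placeholder keyword; running the search itself also
    # requires credentials and a project ID.
    print(build_search_query("orders", system="bigquery", entry_type="table"))
```

Only entries that the caller has permission to view are returned, so an empty result does not necessarily mean that the metadata is missing.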

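Integration with Google Cloud sources is automatic. To integrate with on-premises or custom sources that your organization uses, you can use the community-contributed connectors or create custom entries yourself, as described in the sections that follow.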

Before you begin

If you're already using Data Catalog, you should already have a project with the Data Catalog API enabled. For more information on the recommended way to use multiple projects with Data Catalog, see Using tag templates in multiple projects.

If this is your first time using Data Catalog, do the following:

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.

  4. Enable the Data Catalog API.

    Enable the API

Integrate Google Cloud data sources

BigQuery and Pub/Sub

If your organization already uses BigQuery and Pub/Sub, you can search for the metadata from those sources right away, depending on your permissions. If you can't see the corresponding entries in search results, check Identity and Access Management for the IAM roles that you and the users of your project might need.
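To check whether a specific BigQuery table or Pub/Sub topic is visible to you, you can look up its entry directly by resource name instead of searching. The sketch below assumes the google-cloud-datacatalog Python client library; the helper names are hypothetical and the project, dataset, table, and topic IDs are placeholders.

```python
def bigquery_table_resource(project_id: str, dataset_id: str, table_id: str) -> str:
    """Return the full resource name of a BigQuery table."""
    return (f"//bigquery.googleapis.com/projects/{project_id}"
            f"/datasets/{dataset_id}/tables/{table_id}")


def pubsub_topic_resource(project_id: str, topic_id: str) -> str:
    """Return the full resource name of a Pub/Sub topic."""
    return f"//pubsub.googleapis.com/projects/{project_id}/topics/{topic_id}"


def lookup_entry_name(linked_resource: str) -> str:
    """Return the Data Catalog entry name for a Google Cloud resource."""
    # Imported lazily so the name helpers work without the client library.
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()
    entry = client.lookup_entry(request={"linked_resource": linked_resource})
    return entry.name


if __name__ == "__main__":
    # Placeholder IDs; the lookup itself also requires credentials.
    print(bigquery_table_resource("my-project", "sales", "orders"))
    print(pubsub_topic_resource("my-project", "order-events"))
```

If the lookup fails with a permission error while the resource exists, that usually points to a missing IAM role rather than missing metadata.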

Dataproc Metastore (Preview)

To integrate with Dataproc Metastore, enable the sync to Data Catalog for new or existing services as described in Enabling Data Catalog sync.

Cloud Data Loss Prevention (Cloud DLP)

Additionally, Data Catalog integrates with Cloud Data Loss Prevention, which lets you scan specific Google Cloud resources for sensitive data and send the results back to Data Catalog in the form of tags.

For more information, see Sending Cloud DLP scan results to Data Catalog.
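As a hedged sketch of what such a scan can look like, the following code builds a Cloud DLP inspection job for a BigQuery table and adds the action that publishes findings to Data Catalog. It assumes the google-cloud-dlp Python client library; the helper names, the choice of a BigQuery table as the scanned resource, the infoType, and the `locations/global` parent are assumptions for illustration. Follow the linked guide for the supported configuration.

```python
from typing import Any, Dict, List


def build_inspect_job(project_id: str, dataset_id: str, table_id: str,
                      info_types: List[str]) -> Dict[str, Any]:
    """Build an inspect job config that sends findings to Data Catalog."""
    return {
        "inspect_config": {
            "info_types": [{"name": name} for name in info_types],
        },
        "storage_config": {
            "big_query_options": {
                "table_reference": {
                    "project_id": project_id,
                    "dataset_id": dataset_id,
                    "table_id": table_id,
                }
            }
        },
        # This action asks Cloud DLP to write the results back as tags.
        "actions": [{"publish_findings_to_cloud_data_catalog": {}}],
    }


def submit_inspect_job(project_id: str, inspect_job: Dict[str, Any]) -> str:
    """Create the DLP job and return its resource name."""
    # Imported lazily so the config helper works without the client library.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    job = client.create_dlp_job(
        request={
            "parent": f"projects/{project_id}/locations/global",  # assumed location
            "inspect_job": inspect_job,
        }
    )
    return job.name


if __name__ == "__main__":
    # Placeholder IDs and infoType; submitting the job also requires credentials.
    config = build_inspect_job("my-project", "sales", "customers", ["EMAIL_ADDRESS"])
    print(config["actions"])
```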

Integrate on-premises data sources

To integrate on-premises data sources, you can use the corresponding Python connectors contributed by the community:

  1. Find your data source in the table below.
  2. Open its GitHub repository.
  3. Follow the setup instructions in the readme file.

Category  Component               Description                                   Repository
RDBMS     mysql-connector         Sample code for MySQL data source.            google-datacatalog-mysql-connector
RDBMS     postgresql-connector    Sample code for PostgreSQL data source.       google-datacatalog-postgresql-connector
RDBMS     sqlserver-connector     Sample code for SQL Server data source.       google-datacatalog-sqlserver-connector
RDBMS     redshift-connector      Sample code for Redshift data source.         google-datacatalog-redshift-connector
RDBMS     oracle-connector        Sample code for Oracle data source.           google-datacatalog-oracle-connector
RDBMS     teradata-connector      Sample code for Teradata data source.         google-datacatalog-teradata-connector
RDBMS     vertica-connector       Sample code for Vertica data source.          google-datacatalog-vertica-connector
RDBMS     greenplum-connector     Sample code for Greenplum data source.        google-datacatalog-greenplum-connector
RDBMS     rdbmscsv-connector      Sample code for generic RDBMS CSV ingestion.  google-datacatalog-rdbmscsv-connector
RDBMS     saphana-connector       Sample code for SAP HANA data source.         google-datacatalog-saphana-connector
BI        looker-connector        Sample code for Looker data source.           google-datacatalog-looker-connector
BI        qlik-connector          Sample code for Qlik Sense data source.       google-datacatalog-qlik-connector
BI        tableau-connector       Sample code for Tableau data source.          google-datacatalog-tableau-connector
Hive      hive-connector          Sample code for Hive data source.             google-datacatalog-hive-connector
Hive      apache-atlas-connector  Sample code for Apache Atlas data source.     google-datacatalog-apache-atlas-connector

Integrate unsupported data sources

If you can't find a connector for your data source, you can still integrate it manually by creating entry groups and custom entries.

To integrate your sources, first learn about Entries and entry groups, then follow the instructions in Create custom Data Catalog entries for your data sources.
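The manual flow can be sketched roughly as follows, assuming the google-cloud-datacatalog Python client library. The CustomEntrySpec class, the create_custom_entry helper, and all sample values are hypothetical and exist only for illustration; the linked instructions remain the authoritative reference.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CustomEntrySpec:
    """Plain description of one custom entry (hypothetical helper type)."""
    entry_group_id: str
    entry_id: str
    display_name: str
    system: str            # name you choose for the source system
    entry_type: str        # name you choose for the asset type
    linked_resource: str   # where the asset lives in the source system
    columns: List[Tuple[str, str]] = field(default_factory=list)  # (name, type)


def create_custom_entry(project_id: str, location: str,
                        spec: CustomEntrySpec) -> str:
    """Create an entry group and one custom entry; return the entry name."""
    # Imported lazily so the spec can be built without the client library.
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()

    # Note: this call fails if the entry group already exists.
    entry_group = client.create_entry_group(
        parent=datacatalog_v1.DataCatalogClient.common_location_path(
            project_id, location),
        entry_group_id=spec.entry_group_id,
        entry_group=datacatalog_v1.types.EntryGroup(
            display_name=spec.entry_group_id),
    )

    entry = datacatalog_v1.types.Entry()
    entry.user_specified_system = spec.system
    entry.user_specified_type = spec.entry_type
    entry.display_name = spec.display_name
    entry.linked_resource = spec.linked_resource
    for name, column_type in spec.columns:
        entry.schema.columns.append(
            datacatalog_v1.types.ColumnSchema(column=name, type_=column_type))

    created = client.create_entry(
        parent=entry_group.name, entry_id=spec.entry_id, entry=entry)
    return created.name


if __name__ == "__main__":
    # All values below are placeholders for an imaginary on-premises system.
    spec = CustomEntrySpec(
        entry_group_id="onprem_entry_group",
        entry_id="onprem_orders",
        display_name="Orders (on-premises)",
        system="onprem_data_system",
        entry_type="onprem_data_asset",
        linked_resource="//my-onprem-server.example.com/dataAssets/orders",
        columns=[("order_id", "STRING"), ("amount", "DOUBLE")],
    )
    print(spec)
```

Keeping the description of the entry separate from the API call makes it easier to generate many entries from an inventory of the source system before writing anything to Data Catalog.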

What's next