Search for data assets with Data Catalog

This document explains how to use Data Catalog to search for data assets, such as:

  • BigQuery datasets, tables, views, and models
  • Pub/Sub data streams
  • Dataproc Metastore services, databases, and tables
  • Data Catalog tag templates, entry groups, and custom entries
  • Assets in enterprise data silos connected to Data Catalog

Search scope

Search results may be different for users with different permissions. Data Catalog search results are scoped according to the user's IAM role and permissions.

For example, if a user has BigQuery metadata read access to an object, that object appears in their Data Catalog search results. To search for a table, you need bigquery.tables.get permission for that table. To search for a dataset, you need bigquery.datasets.get permission for that dataset. The BigQuery Metadata Viewer role (roles/bigquery.metadataViewer) includes the minimum required metadata read permissions for a dataset, table, or view to appear in search results.

The same access logic applies to all currently supported systems such as Pub/Sub and Data Catalog itself.

Date-sharded tables

Data Catalog aggregates date-sharded tables into a single logical entry. This entry has the same schema as the table shard with the most recent date, and contains aggregate information about the total number of shards. The entry derives its access level from the dataset it belongs to. Data Catalog search shows these logical entries only if the user has access to the dataset that contains them. Individual date-sharded tables are not visible in Data Catalog search, even if they are present in Data Catalog and can be tagged.
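
To make the aggregation concrete, here is a minimal Python sketch of how shard names could be grouped into logical entries. It is an illustration only, not Data Catalog's actual implementation: the function name group_date_shards, the regular expression, and the assumption that shards are named with a trailing YYYYMMDD date are all hypothetical choices made for this example.

```python
import re
from collections import defaultdict

# Assumption: a shard is any table whose name ends in an 8-digit date (YYYYMMDD).
SHARD_PATTERN = re.compile(r"^(?P<prefix>.+?)(?P<date>\d{8})$")


def group_date_shards(table_names):
    """Group table names into logical entries keyed by shard prefix.

    Returns a dict mapping each prefix to the number of shards and the name
    of the most recent shard. Tables without a date suffix are ignored.
    """
    shards = defaultdict(list)
    for name in table_names:
        match = SHARD_PATTERN.match(name)
        if match:
            shards[match.group("prefix")].append(match.group("date"))

    return {
        prefix: {
            "shard_count": len(dates),
            # YYYYMMDD strings sort chronologically, so max() is the latest shard.
            "latest_shard": prefix + max(dates),
        }
        for prefix, dates in shards.items()
    }


tables = ["events_20240101", "events_20240102", "events_20240103", "users"]
logical_entries = group_date_shards(tables)
print(logical_entries)
```

In this sketch the three events_ shards collapse into one entry that records the shard count and points at the latest shard, which mirrors the behavior described above: search surfaces the logical entry, not the individual shards.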

How to search for data assets

Console

  1. To perform a search for data assets, open the Data Catalog home page in the Google Cloud Console and enter a search query.
  2. When you click SEARCH or make a selection from the Explore data assets and Search tips panels on the Data Catalog home page, the Search page opens. If you made a selection from the panels on the home page, it is carried over into the search box to qualify your search.
  3. You can also filter your search results by making selections from the Filters panel on the left.

Filters

Filters let you narrow down search results. All filters are grouped in sections:

  • Systems such as BigQuery, Pub/Sub, Dataproc Metastore, custom systems, and the Data Catalog itself.

  • Data types such as data streams, datasets, filesets, models, tables, views, services, databases, and custom types.

  • Projects lists all projects available to you.

  • Tags lists all tag templates available to you.

  • Datasets lists datasets from BigQuery.

The Tags section shows tag templates to filter by. Selecting a template filters for data assets that have tags using that template. If no entries have such tags, the filter excludes all search results, even if the original search query matches some entries.

All filter sections except Tags are refreshed when the search query changes. Filters are populated from a sample of the current search results. As a result, the full set of search results may include entries that match the current query but whose corresponding filters are not shown in the Filters panel.

You can manually add the following filters:

  • In Projects, a project filter by clicking the ADD PROJECT button, searching for a specific project, selecting it, and clicking OPEN.
  • In Tags, a tag template filter by clicking the Add more tags drop-down, searching for a specific template, selecting it, and clicking OK.

In addition, you can:

  • Check Include public datasets to search for data assets publicly available in Google Cloud in addition to the assets available to you.
  • Switch back to the old search experience by clicking the corresponding button in the top right corner. The old experience provides simpler filtering.

Search example

For example, let's search for the "trips" table that you set up in Quickstart for tagging tables:

  1. Enter "trips" in the search box and click SEARCH.
  2. Select BigQuery from the Systems section to exclude data assets with the same name that belong to other systems.
  3. Select your project ID from the Projects section to exclude data assets from other projects. If your project is not shown in the section, click ADD PROJECT and select it in the dialog window.
  4. Select the Demo Tag Template from the Tags section to see if a tag that uses this template is attached to the "trips" table. If this template is not shown in the section, click the Add more tags drop-down, find and select it, and click OK.

With all the selected filters, the search results should contain only one entry: the BigQuery "trips" table in your project with an attached tag that uses the "Demo Tag Template".

Additionally, you can do the following:

  1. Filter your search by adding a keyword:value to your search terms in the search box:

    Keyword         Description
    name:           Match data asset name
    column:         Match column name or nested column name
    description:    Match table description

  2. Perform a tag search by adding one of the following tag keyword prefixes to your search terms in the search box:

    Tag                                             Description
    tag:project-name.tag_template_name              Match tag name
    tag:project-name.tag_template_name.key          Match a tag key
    tag:project-name.tag_template_name.key:value    Match tag key:string value pair
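
The keyword and tag qualifiers above can also be assembled programmatically before being pasted into the search box or passed to the API. The following Python sketch is one possible way to do that. The helper build_search_query and the template ID demo_tag_template are hypothetical names introduced for this example; the sketch joins terms with spaces and does not quote or escape special characters, so check the search syntax reference for how terms are combined.

```python
def build_search_query(terms=(), name=None, column=None, description=None, tag=None):
    """Assemble a search string from free-text terms and keyword:value qualifiers.

    Only the qualifiers described in this document are covered. Values are
    inserted as-is; quoting and escaping are out of scope for this sketch.
    """
    parts = list(terms)
    qualifiers = (
        ("name", name),
        ("column", column),
        ("description", description),
        ("tag", tag),
    )
    for keyword, value in qualifiers:
        if value:
            parts.append(f"{keyword}:{value}")
    return " ".join(parts)


# Hypothetical project and tag template IDs, for illustration only.
query = build_search_query(name="trips", tag="my-project.demo_tag_template")
print(query)  # name:trips tag:my-project.demo_tag_template
```

The resulting string can be used as the query value in any of the client library samples in this document.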

Java

Before trying this sample, follow the Java setup instructions in the Data Catalog quickstart using client libraries. For more information, see the Data Catalog Java API reference documentation.

import com.google.cloud.datacatalog.v1.DataCatalogClient;
import com.google.cloud.datacatalog.v1.DataCatalogClient.SearchCatalogPagedResponse;
import com.google.cloud.datacatalog.v1.SearchCatalogRequest;
import com.google.cloud.datacatalog.v1.SearchCatalogRequest.Scope;
import com.google.cloud.datacatalog.v1.SearchCatalogResult;
import java.io.IOException;

// Sample to search catalog
public class SearchAssets {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "my-project-id";
    String query = "type=dataset";
    searchCatalog(projectId, query);
  }

  public static void searchCatalog(String projectId, String query) throws IOException {
    // Create a scope object setting search boundaries to the given organization.
    // Scope scope = Scope.newBuilder().addIncludeOrgIds(orgId).build();

    // Alternatively, search using project scopes.
    Scope scope = Scope.newBuilder().addIncludeProjectIds(projectId).build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DataCatalogClient dataCatalogClient = DataCatalogClient.create()) {
      // Search the catalog.
      SearchCatalogRequest searchCatalogRequest =
          SearchCatalogRequest.newBuilder().setScope(scope).setQuery(query).build();
      SearchCatalogPagedResponse response = dataCatalogClient.searchCatalog(searchCatalogRequest);

      System.out.println("Search results:");
      for (SearchCatalogResult result : response.iterateAll()) {
        System.out.println(result);
      }
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Data Catalog quickstart using client libraries. For more information, see the Data Catalog Node.js API reference documentation.

// Import the Google Cloud client library.
const {DataCatalogClient} = require('@google-cloud/datacatalog').v1;
const datacatalog = new DataCatalogClient();

async function searchAssets() {
  // Search data assets.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const projectId = 'my_project'; // Google Cloud Platform project

  // Set custom query.
  const query = 'type=dataset';

  // Create request.
  const scope = {
    includeProjectIds: [projectId],
    // Alternatively, search using Google Cloud Organization scopes.
    // includeOrgIds: [organizationId],
  };

  const request = {
    scope: scope,
    query: query,
  };

  const [result] = await datacatalog.searchCatalog(request);

  console.log(`Found ${result.length} datasets in project ${projectId}.`);
  console.log('Datasets:');
  result.forEach(dataset => {
    console.log(dataset.relativeResourceName);
  });
}
searchAssets();

Python

Before trying this sample, follow the Python setup instructions in the Data Catalog quickstart using client libraries. For more information, see the Data Catalog Python API reference documentation.

from google.cloud import datacatalog_v1

datacatalog = datacatalog_v1.DataCatalogClient()

# TODO: Set these values before running the sample.
project_id = "project_id"

# Set custom query.
search_string = "type=dataset"
scope = datacatalog_v1.types.SearchCatalogRequest.Scope()
scope.include_project_ids.append(project_id)

# Alternatively, search using organization scopes.
# scope.include_org_ids.append("my_organization_id")

search_results = datacatalog.search_catalog(scope=scope, query=search_string)

print("Results in project:")
for result in search_results:
    print(result)

REST & CMD LINE

If you do not have access to Cloud Client libraries for your language or want to test the API using REST requests, see the following examples and refer to the Data Catalog REST API documentation.

1. Search catalog.

Before using any of the request data, replace organization-id with your organization ID.

HTTP method and URL:

POST https://datacatalog.googleapis.com/v1/catalog:search

Request JSON body:

{
  "query":"trips",
  "scope":{
    "includeOrgIds":[
      "organization-id"
    ]
  }
}
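
As a rough sketch, the request above could be built in Python with only the standard library. The helper build_search_request is a hypothetical name, and the access token is an assumption: you need a valid OAuth 2.0 token for your account (for example, one printed by gcloud auth print-access-token). The example only constructs the request so that it can be inspected; it does not send it.

```python
import json
import urllib.request

SEARCH_URL = "https://datacatalog.googleapis.com/v1/catalog:search"


def build_search_request(query, org_id, access_token):
    """Build (but do not send) the POST request for the catalog:search method."""
    body = {"query": query, "scope": {"includeOrgIds": [org_id]}}
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: access_token is a valid OAuth 2.0 bearer token.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


request = build_search_request("trips", "organization-id", "ACCESS_TOKEN")
print(request.get_method(), request.full_url)
# To send it with a real token: urllib.request.urlopen(request)
```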

You should receive a JSON response similar to the following:

{
  "results":[
    {
      "searchResultType":"ENTRY",
      "searchResultSubtype":"entry.table",
      "relativeResourceName":"projects/project-id/locations/US/entryGroups/@bigquery/entries/entry1-id",
      "linkedResource":"//bigquery.googleapis.com/projects/project-id/datasets/demo_dataset/tables/taxi_trips"
    },
    {
      "searchResultType":"ENTRY",
      "searchResultSubtype":"entry.table",
      "relativeResourceName":"projects/project-id/locations/US/entryGroups/@bigquery/entries/entry2-id",
      "linkedResource":"//bigquery.googleapis.com/projects/project-id/datasets/demo_dataset/tables/tlc_yellow_trips_2018"
    }
  ]
}
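
A response of this shape can be processed with any JSON parser. The following Python sketch, using a hypothetical helper named linked_table_names, extracts the last path segment of each result's linkedResource field. It assumes the response has the structure shown above and skips results without a linkedResource.

```python
import json

# Sample payload copied from the response shown above.
response_text = """
{
  "results": [
    {
      "searchResultType": "ENTRY",
      "searchResultSubtype": "entry.table",
      "relativeResourceName": "projects/project-id/locations/US/entryGroups/@bigquery/entries/entry1-id",
      "linkedResource": "//bigquery.googleapis.com/projects/project-id/datasets/demo_dataset/tables/taxi_trips"
    },
    {
      "searchResultType": "ENTRY",
      "searchResultSubtype": "entry.table",
      "relativeResourceName": "projects/project-id/locations/US/entryGroups/@bigquery/entries/entry2-id",
      "linkedResource": "//bigquery.googleapis.com/projects/project-id/datasets/demo_dataset/tables/tlc_yellow_trips_2018"
    }
  ]
}
"""


def linked_table_names(response_json):
    """Return the last path segment of each result's linkedResource."""
    payload = json.loads(response_json)
    return [
        result["linkedResource"].rsplit("/", 1)[-1]
        for result in payload.get("results", [])
        if "linkedResource" in result
    ]


table_names = linked_table_names(response_text)
print(table_names)  # ['taxi_trips', 'tlc_yellow_trips_2018']
```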