How to search with Data Catalog

This document explains how you can use Data Catalog to perform a search of data assets, such as datasets, tables, views, and Cloud Pub/Sub topics in your Google Cloud projects.

Search scope

Search results may be different for users with different permissions. Data Catalog search results are scoped according to the user's IAM role and permissions. If a user has BigQuery metadata read access to an object, that object will appear in their Data Catalog search results.

For example, to search for a table you need bigquery.tables.get permission for that table; to search for a dataset, you need bigquery.datasets.get permission that dataset. The BigQuery Metadata Viewer role (roles/bigquery.metadataViewer) includes the minimum required metadata read permissions for a dataset, table, or view to appear in search results.

Date-sharded tables

Data Catalog aggregates date-sharded tables into a single logical entry. This entry has the same schema as the table shard with the most recent date, and contains aggregate information about the total number of shards. The entry derives its access level from the dataset it belongs to. Data Catalog search only shows these logical entries if the user has the access to the dataset that contains them. Individual date-shared tables will not be visible in Data Catalog search, even if they are present in Data Catalog and can be tagged.

How to Search for data assets

Console

  1. You can perform a search for data assets from the Data Catalog home page in the Google Cloud Console. You can also makes selections from the Data Assets and Search Tips panels to filter your search.

  2. When you click Search or make a selection from the Data Assets and Search Tips panels on the Data Catalog home page, the Search page opens. If you made a Data Asset or Search Tip selection on the home page, it will be carried over to the Type panel or search box expression in order to qualify your search.

  3. You can filter for particular data asset types by clicking the Data types box below the search bar and selecting the types to search for.

    Example: To list public taxi trips datasets, select the Dataset type, enter "trips" in the search box, then click Search. Make sure Include public datasets is checked to display these results:

  4. You can filter your search by adding a keyword:value to your search terms in the search box:

    KeywordDescription
    name: Match data asset name
    column: Match column name
    description: Match table description

  5. You can perform a tag search by adding one of the following tag keyword prefixes to your search terms in the search box:

    TagDescription
    tag:project-name.tag_template_name Match tag name
    tag.project-name.tag_template_name.key Match an tag key
    tag.project-name.tag_template_name.key:value Match tag key:string value pair

Python

"""This application demonstrates how to perform search operations with the
Cloud Data Catalog API.

For more information, see the README.md under /datacatalog and the
documentation at https://cloud.google.com/data-catalog/docs.
"""

import argparse


def search(organization_id, search_string):
    """Searches Data Catalog entries for a given organization."""
    from google.cloud import datacatalog_v1

    datacatalog = datacatalog_v1.DataCatalogClient()

    scope = datacatalog_v1.types.SearchCatalogRequest.Scope()
    scope.include_org_ids.append(organization_id)

    # Alternatively, search using project scopes.
    # scope.include_project_ids.append("my_project_id")

    return datacatalog.search_catalog(
        scope=scope,
        query=search_string)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )

    parser.add_argument('organization_id',
                        help='Your Google Cloud organization ID')
    parser.add_argument('query', help='Your custom query')

    args = parser.parse_args()

    search_results = None

    if args.query:
        search_results = search(args.organization_id, args.query)
    else:
        raise Exception('Please provide valid search input.')

    for result in search_results:
        print(result)

Java

/*
This application demonstrates how to perform search operations with the
Cloud Data Catalog API.

For more information, see the README.md under /datacatalog and the
documentation at https://cloud.google.com/data-catalog/docs.
*/

package com.example.datacatalog;

import com.google.cloud.datacatalog.v1.SearchCatalogRequest;
import com.google.cloud.datacatalog.v1.SearchCatalogRequest.Scope;
import com.google.cloud.datacatalog.v1.SearchCatalogResult;
import com.google.cloud.datacatalog.v1.DataCatalogClient;
import com.google.cloud.datacatalog.v1.DataCatalogClient.SearchCatalogPagedResponse;

public class SearchCatalog {

  public static void searchCatalog() {
    // TODO(developer): Replace these variables before running the sample.
    String orgId = "111111000000";
    String query = "type=dataset";
    searchCatalog(orgId, query);
  }

  /**
   * Search Data Catalog entries for a given organization.
   *
   * @param orgId The organization ID to which the search will be scoped, e.g. '111111000000'.
   * @param query The query, e.g. 'type:dataset'.
   */
  public static void searchCatalog(String orgId, String query) {
    // Create a scope object setting search boundaries to the given organization.
    Scope scope = Scope.newBuilder().addIncludeOrgIds(orgId).build();

    // Alternatively, search using project scopes.
    // Scope scope = Scope.newBuilder().addAllIncludeProjectIds("my-project").build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DataCatalogClient dataCatalogClient = DataCatalogClient.create()) {

      // Search the catalog.
      SearchCatalogRequest searchCatalogRequest =
          SearchCatalogRequest.newBuilder().setScope(scope).setQuery(query).build();
      SearchCatalogPagedResponse response = dataCatalogClient.searchCatalog(searchCatalogRequest);

      System.out.println("Search results:");
      for (SearchCatalogResult result : response.iterateAll()) {
        System.out.println(result);
      }

    } catch (Exception e) {
      System.out.print("Error during SearchCatalog:\n" + e.toString());
    }
  }
}

Node.js

// This application demonstrates how to perform search operations with the
// Cloud Data Catalog API.

async function search() {

  // -------------------------------
  // Import required modules.
  // -------------------------------
  const { DataCatalogClient } = require('@google-cloud/datacatalog').v1;
  const datacatalog = new DataCatalogClient();

  // -------------------------------
  // Set your Google Cloud Organization ID.
  // -------------------------------
  // TODO: Replace your organization ID in the next line.
  const organizationId = 111111000000;

  // -------------------------------
  // Set your custom query.
  // -------------------------------
  // TODO: If applicable, edit the query string in the next line.
  const query = 'type=dataset'

  // Create request.
  const scope = {
    includeOrgIds: [organizationId],
    // Alternatively, search using project scopes.
    // includeProjectIds: ['my-project'],
  };

  const request = {
    scope: scope,
    query: query,
  };

  const [result] = await datacatalog.searchCatalog(request);
  return result;
}

search().then(response => { console.log(response) });

REST & CMD LINE

If you do not have access to Cloud Client libraries for your language or want to test the API using REST requests, see the following examples and refer to the Data Catalog REST API documentation.

1. Search catalog.

Before using any of the request data below, make the following replacements:

HTTP method and URL:

POST https://datacatalog.googleapis.com/v1/catalog:search

Request JSON body:

{
  "query":"trips",
  "scope":{
    "includeOrgIds":[
      "organization-id"
    ]
  }
}

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

{
  "results":[
    {
      "searchResultType":"ENTRY",
      "searchResultSubtype":"entry.table",
"relativeResourceName":"projects/project-id/locations/US/entryGroups/@bigquery/entries/entry1-id",
      "linkedResource":"//bigquery.googleapis.com/projects/project-id/datasets/demo_dataset/tables/taxi_trips"
    },
    {
      "searchResultType":"ENTRY",
      "searchResultSubtype":"entry.table",
      "relativeResourceName":"projects/project-id/locations/US/entryGroups/@bigquery/entries/entry2-id",
      "linkedResource":"//bigquery.googleapis.com/projects/project-id/datasets/demo_dataset/tables/tlc_yellow_trips_2018"
    }
  ]
}