如何使用 Data Catalog 进行搜索

本文档介绍了如何使用 Data Catalog 查找 Google Cloud 项目中的数据集、表、视图和 Cloud Pub/Sub 主题等数据资源。

搜索范围

如果用户拥有不同的权限,搜索结果可能会有所不同。Data Catalog 搜索结果根据用户的 IAM 角色和权限进行限定。如果用户拥有某个对象的 BigQuery 元数据读取权,则该对象会显示在其 Data Catalog 搜索结果中。

例如,如需搜索表,您需要拥有该表的 bigquery.tables.get 权限;如需搜索数据集,您需要拥有该数据集的 bigquery.datasets.get 权限。BigQuery Metadata Viewer 角色 (roles/bigquery.metadataViewer) 包含要在搜索结果中显示的数据集、表或视图所需的最低元数据读取权限。

日期分片表

Data Catalog 将日期分片表聚合为单个逻辑条目。此条目的架构与具有最新日期的表分片相同,并包含分片总数的聚合信息。条目的访问权限级来自其所属的数据集。仅当用户有权访问包含这些逻辑条目的数据集时,Data Catalog 搜索才会显示这些逻辑条目。 Data Catalog 搜索中不会显示单个日期分片表,即使它们存在于 Data Catalog 中并且可以标记也是如此。

如何搜索数据资源

控制台

  1. 您可以在 Google Cloud Console 的 Data Catalog 首页中搜索数据资源。您还可以从“数据资源”和“搜索提示”面板中进行选择以过滤搜索。

  2. 当您点击搜索或从 Data Catalog 首页中的“数据资源”和“搜索提示”面板中进行选择时,将打开“搜索”页面。如果您在首页上进行了数据资源或搜索提示选择,则会被转到“类型”面板或搜索框表达式,以使您的搜索符合要求。

  3. 您可以通过点击搜索栏下方的数据类型框并选择要搜索的类型来过滤特定数据资产类型。

    示例:要列出公共出租车行程数据集,选择数据集类型,在搜索框中输入“行程”,然后点击搜索。确保选中包括公共数据集,以显示这些结果:

  4. 您可以在搜索框中为搜索字词添加 keyword:value 来过滤搜索:

    关键字说明
    name: 匹配数据资源名称
    column: 匹配列名称
    description: 匹配表的说明

  5. 您可以通过给搜索框中的搜索字词添加以下标记关键字前缀来执行标记搜索:

    标记说明
    tag:project-name.tag_template_name 匹配标记名称
    tag.project-name.tag_template_name.key 匹配标记键
    tag.project-name.tag_template_name.key:value 匹配标记 key:string value

Python

"""This application demonstrates how to perform search operations with the
Cloud Data Catalog API.

For more information, see the README.md under /datacatalog and the
documentation at https://cloud.google.com/data-catalog/docs.
"""

import argparse

def search(organization_id, search_string):
    """Searches Data Catalog entries for a given organization."""
    from google.cloud import datacatalog_v1

    datacatalog = datacatalog_v1.DataCatalogClient()

    scope = datacatalog_v1.types.SearchCatalogRequest.Scope()
    scope.include_org_ids.append(organization_id)

    # Alternatively, search using project scopes.
    # scope.include_project_ids.append("my_project_id")

    return datacatalog.search_catalog(
        scope=scope,
        query=search_string)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )

    parser.add_argument('organization_id',
                        help='Your Google Cloud organization ID')
    parser.add_argument('query', help='Your custom query')

    args = parser.parse_args()

    search_results = None

    if args.query:
        search_results = search(args.organization_id, args.query)
    else:
        raise Exception('Please provide valid search input.')

    for result in search_results:
        print(result)

Java

/*
This application demonstrates how to perform search operations with the
Cloud Data Catalog API.

For more information, see the README.md under /datacatalog and the
documentation at https://cloud.google.com/data-catalog/docs.
*/

package com.example.datacatalog;

import com.google.cloud.datacatalog.v1.SearchCatalogRequest;
import com.google.cloud.datacatalog.v1.SearchCatalogRequest.Scope;
import com.google.cloud.datacatalog.v1.SearchCatalogResult;
import com.google.cloud.datacatalog.v1.DataCatalogClient;
import com.google.cloud.datacatalog.v1.DataCatalogClient.SearchCatalogPagedResponse;

public class SearchCatalog {

  public static void searchCatalog() {
    // TODO(developer): Replace these variables before running the sample.
    String orgId = "111111000000";
    String query = "type=dataset";
    searchCatalog(orgId, query);
  }

  /**
   * Search Data Catalog entries for a given organization.
   *
   * @param orgId The organization ID to which the search will be scoped, e.g. '111111000000'.
   * @param query The query, e.g. 'type:dataset'.
   */
  public static void searchCatalog(String orgId, String query) {
    // Create a scope object setting search boundaries to the given organization.
    Scope scope = Scope.newBuilder().addIncludeOrgIds(orgId).build();

    // Alternatively, search using project scopes.
    // Scope scope = Scope.newBuilder().addAllIncludeProjectIds("my-project").build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DataCatalogClient dataCatalogClient = DataCatalogClient.create()) {

      // Search the catalog.
      SearchCatalogRequest searchCatalogRequest =
          SearchCatalogRequest.newBuilder().setScope(scope).setQuery(query).build();
      SearchCatalogPagedResponse response = dataCatalogClient.searchCatalog(searchCatalogRequest);

      System.out.println("Search results:");
      for (SearchCatalogResult result : response.iterateAll()) {
        System.out.println(result);
      }

    } catch (Exception e) {
      System.out.print("Error during SearchCatalog:\n" + e.toString());
    }
  }
}

Node.js

// This application demonstrates how to perform search operations with the
// Cloud Data Catalog API.

async function search() {

  // -------------------------------
  // Import required modules.
  // -------------------------------
  const { DataCatalogClient } = require('@google-cloud/datacatalog').v1;
  const datacatalog = new DataCatalogClient();

  // -------------------------------
  // Set your Google Cloud Organization ID.
  // -------------------------------
  // TODO: Replace your organization ID in the next line.
  const organizationId = 111111000000;

  // -------------------------------
  // Set your custom query.
  // -------------------------------
  // TODO: If applicable, edit the query string in the next line.
  const query = 'type=dataset'

  // Create request.
  const scope = {
    includeOrgIds: [organizationId],
    // Alternatively, search using project scopes.
    // includeProjectIds: ['my-project'],
  };

  const request = {
    scope: scope,
    query: query,
  };

  const [result] = await datacatalog.searchCatalog(request);
  return result;
}

search().then(response => { console.log(response) });

REST 和命令行

如果您无法使用针对您的语言的 Cloud 客户端库或者您想要使用 REST 请求来测试 API,请参阅以下示例并参阅 Data Catalog REST API 文档。

1.搜索目录

在使用下面的请求数据之前,请先进行以下替换:

HTTP 方法和网址:

POST https://datacatalog.googleapis.com/v1/catalog:search

请求 JSON 正文:

{
  "query":"trips",
  "scope":{
    "includeOrgIds":[
      "organization-id"
    ]
  }
}

如需发送您的请求,请展开以下选项之一:

您应该收到类似以下内容的 JSON 响应:

{
  "results":[
    {
      "searchResultType":"ENTRY",
      "searchResultSubtype":"entry.table",
"relativeResourceName":"projects/project-id/locations/US/entryGroups/@bigquery/entries/entry1-id",
      "linkedResource":"//bigquery.googleapis.com/projects/project-id/datasets/demo_dataset/tables/taxi_trips"
    },
    {
      "searchResultType":"ENTRY",
      "searchResultSubtype":"entry.table",
      "relativeResourceName":"projects/project-id/locations/US/entryGroups/@bigquery/entries/entry2-id",
      "linkedResource":"//bigquery.googleapis.com/projects/project-id/datasets/demo_dataset/tables/tlc_yellow_trips_2018"
    }
  ]
}