Search for data assets in Dataplex Catalog

Use Dataplex Catalog search to find data assets such as BigQuery datasets and Cloud SQL instances. For more information about the Google Cloud assets that are supported in Dataplex Catalog, see Supported Google Cloud sources.

Search scope

The search results in Dataplex Catalog respect the permissions that you have on the corresponding resources in the source systems.

For example, if you have BigQuery metadata read access to an object, that object appears in your Dataplex Catalog search results. If you have access to a BigQuery table but not to the dataset containing that table, the table still shows up as expected in the Dataplex Catalog search.

The search results include only those resources that belong to the same VPC Service Controls (VPC-SC) perimeter as the project under which the search is performed. When you use the Google Cloud console, this is the project that is selected in the console.

To broaden the scope of your search results beyond the resources within your project's VPC Service Controls perimeter, use VPC Service Controls ingress and egress rules. These rules facilitate private and efficient data exchange across your organization. You can configure ingress and egress rules using the Google Cloud console or through JSON or YAML files. Refer to the following YAML example and consult the VPC Service Controls documentation to tailor the rule to your specific requirements.

egressPolicies:
  - egressFrom:
      identityType: ANY_USER_ACCOUNT
    egressTo:
      # Specify which resources should be present in the search results. In this example,
      # BigQuery.
      operations:
      - methodSelectors:
        - method: '*'
        serviceName: bigquery.googleapis.com
      # Specify the IDs of the projects under which the search is performed.
      resources:
      - projects/SEARCH_PROJECT_ID
ingressPolicies:
  - ingressFrom:
      identityType: ANY_USER_ACCOUNT
      sources:
      - accessLevel: '*'
    ingressTo:
      # Specify which resources should be present in the search results. In this example,
      # BigQuery.
      operations:
      - methodSelectors:
        - method: '*'
        serviceName: bigquery.googleapis.com
      # Specify the IDs of the projects to expose in search results.
      resources:
      - projects/INGRESS_PROJECT_ID

For more information about Dataplex Catalog IAM roles, see Dataplex IAM roles.

Recall limitations in search

Dataplex Catalog search queries don't guarantee full recall. Results that match your query might not be returned, even in subsequent result pages. Additionally, the set of results that is returned can vary when you repeat the same search query.

Filters

Filters let you narrow down the search results. Filters are grouped into the following sections:

  • Systems: source systems such as BigQuery, Cloud SQL, and others. The Dataplex system contains custom entries.
  • Aspects (tags): all aspects available to you.
  • Project: all projects available to you.
  • Type aliases: aliases that describe resource types, such as databases, datasets, models, tables, views, services, and custom types.
  • Datasets: datasets that come from BigQuery.

You can combine filters from multiple sections. Multiple filters that are selected within a single section are evaluated using the OR logical operator, and the sections are then combined using AND. As a result, a matching asset satisfies at least one condition from every section in which you selected a filter.

For example, consider a filter combination in which the following search filters are selected: systems BigQuery, type aliases table and view, aspects My aspect type 1 and My aspect type 2, project my-test-project, and datasets test_bq_dataset.

Dataplex Catalog looks for the following assets in the project my-test-project:

  • BigQuery tables in test_bq_dataset with aspect My aspect type 1
  • BigQuery tables in test_bq_dataset with aspect My aspect type 2
  • BigQuery views in test_bq_dataset with aspect My aspect type 1
  • BigQuery views in test_bq_dataset with aspect My aspect type 2
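
The combination rule can be sketched in a few lines of Python. This is only an illustrative model of the logic described above, not how Dataplex Catalog implements it: the section keys (system, type_alias, and so on), the matches and asset helpers, and the sample asset names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: a mapping from filter section to a set of values.
# Used both for the filters you select and for an asset's attributes.
Selection = Dict[str, Set[str]]


def matches(attributes: Selection, selected: Selection) -> bool:
    """Return True if the asset satisfies every section that has a selection.

    Within a section, matching any selected value is enough (OR).
    Across sections, every section must match (AND).
    """
    return all(
        bool(attributes.get(section, set()) & values)
        for section, values in selected.items()
        if values  # a section with nothing selected does not restrict results
    )


# The filter combination from the example above.
selected: Selection = {
    "system": {"BigQuery"},
    "type_alias": {"table", "view"},
    "aspect": {"My aspect type 1", "My aspect type 2"},
    "project": {"my-test-project"},
    "dataset": {"test_bq_dataset"},
}


def asset(type_alias: str, aspects: Set[str]) -> Selection:
    """Build attributes for a made-up BigQuery asset in test_bq_dataset."""
    return {
        "system": {"BigQuery"},
        "type_alias": {type_alias},
        "aspect": aspects,
        "project": {"my-test-project"},
        "dataset": {"test_bq_dataset"},
    }


catalog: List[Tuple[str, Selection]] = [
    ("orders", asset("table", {"My aspect type 1"})),
    ("orders_view", asset("view", {"My aspect type 2"})),
    ("orders_model", asset("model", {"My aspect type 1"})),  # type alias not selected
    ("untagged", asset("table", set())),  # has none of the selected aspects
]

hits = [name for name, attributes in catalog if matches(attributes, selected)]
print(hits)
```

Only the table and the view that carry one of the selected aspects are kept. The model and the untagged table are dropped because each fails one section.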

Filter by aspect value

The Aspects filters let you query for assets that have aspects of a specific aspect type. You can use the Customize menu to further refine results and filter by specific aspect values. The available filter conditions depend on the data type of the aspect field. For example, for datetime and number fields, you can specify either an exact value or a range.

Filter visibility

Whether the Systems, Type aliases, Project, and Datasets filters are displayed depends on the current query in the search field.

Before you begin

Before you search for data assets, make sure that you have the required roles and that the Dataplex API is enabled.

Required roles

This section describes the roles and permissions required to search for data assets and to access the search results.

For more information about granting roles, see Manage access.

You might also be able to get the required permissions through custom roles or other predefined roles.

Required roles for searching entries

To search for entries, you need at least one of the Dataplex Catalog IAM roles on the project that is used for search. Permissions on search results are checked independently of the selected project.

Required roles for accessing search results

The search results in Dataplex Catalog are scoped according to your role. To search for an asset in Dataplex Catalog, you must have permissions to access the corresponding resource in the source system. For more information, see the Search scope section of this document.

For example, to search for BigQuery datasets, tables, views, and models, you need the corresponding permissions on those entries. For more information, see BigQuery permissions. The following list describes the minimum permissions required:

  • To search for a table, you need bigquery.tables.get permission for that table.
  • To search for a dataset, you need bigquery.datasets.get permission for that dataset.
  • To search for metadata for a dataset or a table, you need the BigQuery Metadata Viewer role (roles/bigquery.metadataViewer).

As another example, to search for Cloud SQL instances, databases, schemas, tables, and views, you need the corresponding permissions on those entries. For more information, see Cloud SQL roles and permissions.

To search for custom entries, you need the Dataplex Catalog Viewer role (roles/dataplex.catalogViewer).

Enable the API

Enable the Dataplex API.

Search for data assets

Console

To search for data assets, follow these steps:

  1. In the Google Cloud console, go to the Dataplex Search page.

    Go to Search

  2. For Choose search platform, select Dataplex Catalog as the search mode.

    Selecting Dataplex Catalog lets you search over the Dataplex Catalog metadata storage. Selecting Data Catalog lets you search over your Data Catalog repository, if you're an existing Data Catalog user.

  3. In the search field, enter your query, or use the Filters panel to refine the search parameters.

    You can manually add the following filters:

    • Add a project filter: in Project, click Add project. Search for a specific project, select the project, and then click Open.
    • Add an aspect type filter: in Aspects, click the Add more aspect types menu. Search for a specific aspect type, select it, and then click OK.
  4. Optional: In addition to the assets available to you, you can search for data assets that are publicly available in Google Cloud by selecting Include public datasets.

Use the following tips to construct a search query:

  • Enclose your search expression in quotes if it contains spaces. For example, "search terms".
  • You can precede a keyword with NOT to match the logical negation of the keyword:term filter. You can also use AND and OR Boolean operators to combine search expressions. The AND, OR, and NOT operators aren't case-sensitive.

    For example, NOT column:term lists all assets except those that have a column matching the specified term. For a list of keywords and other terms you can use in a Dataplex Catalog search expression, see Search syntax.
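
If you build queries programmatically, the tips above can be sketched as small string helpers. This is a minimal illustration that covers only the rules stated in this section. The helper names (quote_term, keyword, negate, combine) are hypothetical, and the sketch doesn't validate keywords against the full search syntax.

```python
def quote_term(term: str) -> str:
    """Wrap a search expression in double quotes if it contains whitespace."""
    return f'"{term}"' if any(ch.isspace() for ch in term) else term


def keyword(name: str, term: str) -> str:
    """Build a keyword:term filter, for example column:term."""
    return f"{name}:{term}"


def negate(expression: str) -> str:
    """Precede an expression with NOT to match its logical negation."""
    return f"NOT {expression}"


def combine(operator: str, *expressions: str) -> str:
    """Join expressions with AND or OR.

    The operators aren't case-sensitive, so the input is normalized
    to uppercase only for readability.
    """
    normalized = operator.upper()
    if normalized not in ("AND", "OR"):
        raise ValueError(f"unsupported operator: {operator!r}")
    return f" {normalized} ".join(expressions)


query = combine(
    "and",
    quote_term("search terms"),
    negate(keyword("column", "term")),
)
print(query)
```

The resulting string can be pasted into the search field or passed to the gcloud or REST search methods.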

gcloud

To search for data assets, use the gcloud dataplex entries search command.

REST

To search for data assets, use the searchEntries method.
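
The following Python sketch shows one way that a searchEntries request URL could be assembled and how result pages could be followed. It rests on several assumptions that you should check against the API reference: that the method is called with POST on a projects/PROJECT_ID/locations/global resource name, that it takes query, pageSize, and pageToken query parameters, and that the response carries results and nextPageToken fields. The fake_fetch function is a stand-in for an authenticated HTTP call and returns made-up data.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import parse_qs, urlencode, urlsplit

API_ROOT = "https://dataplex.googleapis.com/v1"


def search_entries_url(
    project_id: str,
    query: str,
    page_size: Optional[int] = None,
    page_token: Optional[str] = None,
    location: str = "global",  # assumption: search is attributed to "global"
) -> str:
    """Build the URL for a searchEntries call (sent with POST)."""
    name = f"projects/{project_id}/locations/{location}"
    params = {"query": query}
    if page_size is not None:
        params["pageSize"] = str(page_size)
    if page_token:
        params["pageToken"] = page_token
    return f"{API_ROOT}/{name}:searchEntries?{urlencode(params)}"


def iter_results(
    fetch_page: Callable[[str], dict], project_id: str, query: str
) -> Iterator[dict]:
    """Yield results from every page, following nextPageToken."""
    token: Optional[str] = None
    while True:
        page = fetch_page(search_entries_url(project_id, query, page_token=token))
        yield from page.get("results", [])
        token = page.get("nextPageToken")
        if not token:
            break


def fake_fetch(url: str) -> dict:
    """Stand-in for an authenticated POST; returns two made-up pages."""
    params = parse_qs(urlsplit(url).query)
    if "pageToken" in params:
        return {"results": [{"id": "b"}]}
    return {"results": [{"id": "a"}], "nextPageToken": "page-2"}


url = search_entries_url("my-test-project", "NOT column:term", page_size=10)
print(url)
print([r["id"] for r in iter_results(fake_fetch, "my-test-project", "orders")])
```

Because search doesn't guarantee full recall, reaching the last page doesn't mean that every matching entry was returned.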

View details of an entry

Console

Use Dataplex Catalog search to view the details of an entry.

  1. In the Google Cloud console, go to the Dataplex Search page.

    Go to Search

  2. Select Dataplex Catalog as the search mode.

  3. In the search box, enter the name of an entry.

  4. Click the entry.

    The entry details page opens. The page includes the following sections:

    • Entry details: includes information such as the entry type, system, platform, fully qualified name, creation time, last modification time, description, and stewards.
    • Overview: an overview of the entry, if available.
    • Aspects: the required and optional aspects defined for the entry. For more information, see Categories of aspects.

gcloud

To view the details of an entry, use the gcloud dataplex entries lookup command.

REST

To view the details of an entry, use the lookupEntry method.
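
As a rough illustration, a lookupEntry request URL could be assembled as follows. The sketch assumes that the method is called with GET on a projects/PROJECT_ID/locations/LOCATION resource name and that the entry to look up is passed in an entry query parameter. Verify both against the API reference. The entry name used here is a placeholder, not a real resource.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_ROOT = "https://dataplex.googleapis.com/v1"


def lookup_entry_url(project_id: str, location: str, entry_name: str) -> str:
    """Build the URL for a lookupEntry call (sent with GET).

    project_id and location identify the project that the request is
    attributed to; entry_name is the full resource name of the entry.
    """
    name = f"projects/{project_id}/locations/{location}"
    return f"{API_ROOT}/{name}:lookupEntry?{urlencode({'entry': entry_name})}"


# Placeholder entry name; replace every uppercase segment with real values.
entry_name = (
    "projects/my-test-project/locations/LOCATION"
    "/entryGroups/ENTRY_GROUP_ID/entries/ENTRY_ID"
)
url = lookup_entry_url("my-test-project", "LOCATION", entry_name)
print(url)
```

The urlencode call percent-encodes the slashes in the entry name, so the name survives as a single query parameter value.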

What's next