Manage data assets in a lake

An asset maps to data stored in either Cloud Storage or BigQuery. You can map data stored in separate Google Cloud projects as assets into a single zone within a lake. You can attach existing Cloud Storage buckets or BigQuery datasets to be managed from within the lake.

This page explains how to add Cloud Storage buckets and BigQuery datasets as assets in existing Dataplex zones.

Before you begin

  • If you haven't already, create a lake and a zone in that lake.

  • Most gcloud lakes commands require a location. You can specify the location by using the --location flag.

Access control

  • To remove assets, you must be granted IAM roles containing the dataplex.lakes.delete, dataplex.zones.delete, or dataplex.assets.delete IAM permissions. The Dataplex-specific roles roles/dataplex.admin and roles/dataplex.editor can be used to grant these permissions.

  • To add assets, you must be granted IAM roles containing the dataplex.lakes.create, dataplex.zones.create, or dataplex.assets.create IAM permissions. The roles/dataplex.admin and roles/dataplex.editor roles contain these permissions.

  • You can also give permission to users or groups by using the roles/owner and roles/editor legacy roles.

  • The Dataplex service must be authorized on resources being attached to the Dataplex lake. This is automatically and implicitly done for resources in the project where the lake is being created. For other projects, this must be done explicitly.

    • To add a Cloud Storage bucket from another project, the lake service account (which is on the Lake details page in the console) must be granted the dataplex.serviceAgent role in Cloud Storage.

    • To add a BigQuery dataset from another project, the lake service account must be granted the BigQuery Admin role for the dataset.

For more information, see Dataplex IAM and access control.
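
As a rough illustration of the cross-project grants described above, the following Python sketch builds the IAM policy binding that would be applied to the bucket or dataset. The function name and the `resource_kind` values are hypothetical, the service account email is whatever the Lake details page shows, and `roles/bigquery.admin` is assumed to be the role ID behind the BigQuery Admin role. The sketch only constructs the binding; applying it to the resource is not shown.

```python
def cross_project_binding(service_account_email: str, resource_kind: str) -> dict:
    """Build an IAM binding for the lake service account (illustrative helper).

    resource_kind is a hypothetical switch: "bucket" or "dataset".
    """
    # Roles named in the access control list above; the BigQuery role ID
    # is an assumption about how "BigQuery Admin" is spelled as a role ID.
    roles = {
        "bucket": "roles/dataplex.serviceAgent",
        "dataset": "roles/bigquery.admin",
    }
    if resource_kind not in roles:
        raise ValueError(f"unsupported resource kind: {resource_kind!r}")
    return {
        "role": roles[resource_kind],
        # IAM members that are service accounts carry the "serviceAccount:" prefix.
        "members": [f"serviceAccount:{service_account_email}"],
    }
```

Passing the email in as a parameter, rather than constructing it, keeps the sketch aligned with the instruction to copy the account from the Lake details page.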

VPC Service Controls considerations

Dataplex does not violate VPC Service Controls perimeters. Before adding an asset to the lake, make sure that the underlying bucket or dataset is in the same VPC Service Controls perimeter as the lake.

For more information, see VPC Service Controls with Dataplex.

Add an asset

You can add a Cloud Storage bucket or a BigQuery dataset as an asset by calling the Dataplex API method lakes.zones.assets.create, or by adding the bucket or dataset on the Data zone page in the Google Cloud console.

You can create multiple assets under a data zone concurrently. You can still use the data zone while the asset is being added.

The following instructions demonstrate how to add an asset using the Google Cloud console or the Dataplex API.

Grant roles

Role for Cloud Storage buckets

To attach a Cloud Storage bucket from another project to your lake, you must grant the Dataplex service account (service-CUSTOMER_PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com, retrieved from the lake details page in the console) the Dataplex service agent role (roles/dataplex.serviceAgent). This role gives the lake service the admin-level access to the bucket that it needs to set permissions on the bucket itself.

Role for BigQuery datasets

To attach a BigQuery dataset from another project to your lake, you must grant the Dataplex service account the BigQuery Admin role on the dataset, so that permissions can be set on the dataset.

Add a bucket or a dataset

Console

  1. In the Google Cloud console, open the Dataplex page:

    Open Dataplex in the Google Cloud console

  2. On the Dataplex page, click the name of the lake that you want to add a Cloud Storage bucket or BigQuery dataset to. The lake page for that lake opens.

  3. On the Zones tab, click the name of the data zone that you want to add the asset to. The Data zone page for that data zone opens.

  4. On the Assets tab, click + Add Assets. The Add assets page opens.

  5. Click Add Asset, and then select either Add bucket or Add dataset.

  6. Click Browse to find and select your Cloud Storage bucket or BigQuery dataset.

  7. Choose the rest of the parameter values. For more information about security settings, see Lake security.

  8. Click the Save button to add the asset.

  9. Verify that you have returned to the Data zone page, and that your new asset appears in the assets list.

REST

Follow the API instructions to add a bucket by using the APIs Explorer.

When the addition succeeds, the data zone automatically enters the active state. If the addition fails, the data zone is rolled back to its previous healthy state.
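
The lakes.zones.assets.create call can be sketched in Python as follows. This is a minimal illustration, not the official client: the function name and its parameters are hypothetical, and the endpoint root, the `assetId` query parameter, the `resourceSpec` field names, and the resource name formats (`projects/PROJECT/buckets/BUCKET` and `projects/PROJECT/datasets/DATASET`) are assumptions based on the public Dataplex v1 REST reference. The sketch only builds the request; sending it with an authenticated HTTP client is left out.

```python
import json
from urllib.parse import urlencode

# Assumed root of the Dataplex v1 REST API.
API_ROOT = "https://dataplex.googleapis.com/v1"


def build_create_asset_request(project, location, lake, zone,
                               asset_id, resource_name, resource_type):
    """Return (method, url, json_body) for an asset creation request."""
    # The two asset kinds this page covers; enum spellings are assumed.
    if resource_type not in ("STORAGE_BUCKET", "BIGQUERY_DATASET"):
        raise ValueError(f"unsupported resource type: {resource_type!r}")
    # The asset is created under its parent data zone.
    parent = (f"projects/{project}/locations/{location}"
              f"/lakes/{lake}/zones/{zone}")
    url = f"{API_ROOT}/{parent}/assets?{urlencode({'assetId': asset_id})}"
    # Minimal body: only the attached resource is described here.
    body = {"resourceSpec": {"name": resource_name, "type": resource_type}}
    return "POST", url, json.dumps(body)


if __name__ == "__main__":
    # All identifiers below are placeholders.
    method, url, body = build_create_asset_request(
        "my-project", "us-central1", "my-lake", "my-zone",
        "my-asset", "projects/my-project/buckets/my-bucket", "STORAGE_BUCKET")
    print(method, url)
    print(body)
```

Because assets can be created concurrently under one data zone, several such requests could be issued in parallel, each with its own asset ID.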

Remove an asset

You can remove a Cloud Storage bucket or BigQuery dataset asset by calling the Dataplex API method lakes.zones.assets.delete, or by clicking Delete Asset on the Data zone page in the Google Cloud console. An asset must be removed from its current data zone or lake before you attach it to a different one.

The following instructions demonstrate how to remove a Dataplex asset using the Google Cloud console or the Dataplex API.

Console

  1. In the Google Cloud console, open the Dataplex page:

    Open Dataplex in the Google Cloud console

  2. On the Dataplex page, click the name of the lake that you want to remove a Cloud Storage bucket or BigQuery dataset from. The lake page for that lake opens.

  3. On the Zones tab, click the name of the data zone that you want to remove the Cloud Storage bucket or BigQuery dataset from. The Data zone page for that data zone opens.

  4. On the Assets tab, select the asset by checking the box to the left of the asset name.

  5. Click Delete Asset to remove the asset.

  6. In the dialog, click Delete to confirm the removal.

REST

Follow the API instructions to remove a bucket by using the APIs Explorer.
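
The lakes.zones.assets.delete call can be sketched the same way. The function name is hypothetical, and the endpoint root and resource path layout are assumptions based on the public Dataplex v1 REST reference. The sketch only builds the request and does not send it or handle authentication.

```python
# Assumed root of the Dataplex v1 REST API.
API_ROOT = "https://dataplex.googleapis.com/v1"


def build_delete_asset_request(project, location, lake, zone, asset):
    """Return (method, url) for an asset deletion request."""
    # Full resource name of the asset inside its data zone.
    name = (f"projects/{project}/locations/{location}"
            f"/lakes/{lake}/zones/{zone}/assets/{asset}")
    return "DELETE", f"{API_ROOT}/{name}"


if __name__ == "__main__":
    # All identifiers below are placeholders.
    print(*build_delete_asset_request(
        "my-project", "us-central1", "my-lake", "my-zone", "my-asset"))
```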

What's next?