Manage data assets in a lake

This page explains how to add, upgrade, and remove Cloud Storage buckets and BigQuery datasets as assets in existing Dataplex zones.

Overview

An asset maps to data stored in either Cloud Storage or BigQuery. You can map data stored in separate Google Cloud projects as assets into a single zone within a lake. You can attach existing Cloud Storage buckets or BigQuery datasets to be managed from within the lake.

Before you begin

  • If you haven't already, create a lake and a zone in that lake.

  • Most gcloud lakes commands require a location. You can specify the location by using the --location flag.

Required roles

  • To remove assets, grant IAM roles that contain the dataplex.lakes.delete, dataplex.zones.delete, or dataplex.assets.delete permissions. The Dataplex-specific roles/dataplex.admin and roles/dataplex.editor roles contain these permissions.

  • To add assets, grant IAM roles that contain the dataplex.lakes.create, dataplex.zones.create, or dataplex.assets.create permissions. The roles/dataplex.admin and roles/dataplex.editor roles contain these permissions.

  • You can also give permission to users or groups by using the roles/owner and roles/editor legacy roles.

  • You must authorize the Dataplex service on resources being attached to the Dataplex lake. The authorization is automatically and implicitly granted for resources in the project in which the lake is created. For other projects, authorize the Dataplex service on resources explicitly.

For more information, see Dataplex IAM and access control.

Grant roles for Cloud Storage buckets

To attach a Cloud Storage bucket from another project to your lake, you must grant the Dataplex service account (service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com, retrieved from the lake details page in the console) the Dataplex service account role (roles/dataplex.serviceAgent) in the project that contains the bucket. This role provides the Dataplex service with the prerequisite administrator level role on the bucket so that permissions can be set on the bucket itself.
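As a rough illustration of the grant described above, the following sketch builds the service account email and an IAM policy binding in the standard role/members shape. The helper names and the project number are hypothetical; applying the binding to the bucket's project (for example, through the console or an IAM client) is left out.

```python
def dataplex_service_agent(project_number: str) -> str:
    """Return the Dataplex service account email for the lake's project number."""
    return f"service-{project_number}@gcp-sa-dataplex.iam.gserviceaccount.com"


def service_agent_binding(project_number: str) -> dict:
    """Build the IAM binding to add to the policy of the project that contains the bucket."""
    return {
        "role": "roles/dataplex.serviceAgent",
        "members": [f"serviceAccount:{dataplex_service_agent(project_number)}"],
    }


# Hypothetical project number, used only for illustration.
binding = service_agent_binding("123456789012")
print(binding["members"][0])
```

Note that PROJECT_NUMBER here is the number of the project in which the lake was created, not the project that holds the bucket.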

Grant roles for BigQuery datasets

To attach a BigQuery dataset from another project to your lake, you must grant the Dataplex service account the BigQuery Administrator role on the dataset.

VPC Service Controls considerations

Dataplex doesn't violate VPC Service Controls perimeters. Before adding an asset to the lake, make sure that the underlying bucket or dataset is in the same VPC Service Controls perimeter as the lake.

For more information, see VPC Service Controls with Dataplex.

Add an asset

If the Dataplex lake region doesn't overlap with the region of a Cloud Storage bucket, you can't add that bucket to a zone in your lake.

To learn more about the region location of a Cloud Storage asset and how Dataplex handles the location of a bucket when creating the publishing dataset, see Regional resources.

To add an asset, follow these steps:

Console

  1. In the Google Cloud console, go to the Dataplex page.

    Go to Dataplex

  2. On the Manage page, click the lake to which you want to add a Cloud Storage bucket or BigQuery dataset. The lake page opens.

  3. On the Zones tab, click the name of the data zone to which you want to add the asset. The Data zone page for that data zone opens.

  4. On the Assets tab, click + Add Assets. The Add assets page opens.

  5. Click Add an Asset.

  6. In the Type field, select either BigQuery dataset or Cloud Storage bucket.

  7. In the Display name field, enter a name for the new asset.

  8. In the ID field, enter a unique ID for the asset.

  9. Optional: Enter a Description.

  10. In the Dataset or Bucket field (based on the type of your asset), click Browse to find and select your Cloud Storage bucket or BigQuery dataset.

  11. Optional: If your asset type is Cloud Storage bucket and if you want Dataplex to manage the asset, then select the Upgrade to Managed checkbox. If you choose this option, you don't have to upgrade the asset separately. This option isn't available for BigQuery datasets.

  12. Click Continue.

  13. Choose the rest of the parameter values. For more information about security settings, see Lake security.

  14. Click Submit.

  15. Verify that you have returned to the data zone page, and that your new asset appears in the assets list.

REST

To add an asset, use the lakes.zones.assets.create method.

When the addition succeeds, the data zone automatically enters the active state. If the addition fails, the data zone is rolled back to its previous healthy state.
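The following is a minimal sketch of how a lakes.zones.assets.create request might be assembled. It only builds the HTTP method, URL, and JSON body; authentication and sending the request are omitted. The endpoint root, the assetId query parameter, and the field names (displayName, resourceSpec, readAccessMode) reflect the public Dataplex REST API as understood here and should be checked against the API reference. The function name and all resource names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed REST endpoint root; verify against the Dataplex API reference.
API_ROOT = "https://dataplex.googleapis.com/v1"


def build_create_asset_request(zone_name, asset_id, resource_name,
                               resource_type, display_name, managed=False):
    """Return (method, url, body) for adding an asset to a zone.

    zone_name: projects/P/locations/L/lakes/LAKE/zones/ZONE
    resource_name: projects/P/buckets/B or projects/P/datasets/D
    """
    if resource_type not in ("STORAGE_BUCKET", "BIGQUERY_DATASET"):
        raise ValueError(f"unsupported resource type: {resource_type}")
    # Upgrading to managed is only offered for Cloud Storage bucket assets.
    if managed and resource_type != "STORAGE_BUCKET":
        raise ValueError("managed mode applies only to Cloud Storage buckets")

    resource_spec = {"name": resource_name, "type": resource_type}
    if managed:
        resource_spec["readAccessMode"] = "MANAGED"

    url = f"{API_ROOT}/{zone_name}/assets?{urlencode({'assetId': asset_id})}"
    body = {"displayName": display_name, "resourceSpec": resource_spec}
    return "POST", url, body


# Hypothetical names, for illustration only.
method, url, body = build_create_asset_request(
    "projects/my-project/locations/us-central1/lakes/my-lake/zones/raw-zone",
    "sales-bucket",
    "projects/my-project/buckets/sales-data",
    "STORAGE_BUCKET",
    "Sales bucket",
    managed=True,
)
print(method, url)
print(json.dumps(body, indent=2))
```

Setting managed=True mirrors the Upgrade to Managed checkbox in the console steps, so the asset doesn't need a separate upgrade afterward.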

Upgrade a Cloud Storage bucket asset

When you add an asset of type Cloud Storage bucket, Dataplex automatically publishes BigQuery external tables for the tables hosted in the asset.

When you upgrade a Cloud Storage bucket asset, Dataplex removes the attached external tables and creates BigLake tables. BigLake tables support better fine-grained security, including row-level, column-level, and dynamic data masking.

To upgrade a Cloud Storage bucket asset, follow these steps:

Console

  1. In the Google Cloud console, go to the Dataplex page.

    Go to Dataplex

  2. On the Manage page, click the name of the lake. The lake page opens.

  3. On the Zones tab, click the name of the data zone. The data zone page opens.

  4. On the Assets tab, click the name of the asset that you want to upgrade.

  5. Click Upgrade to Managed.

REST

To upgrade a bucket asset, use the lakes.zones.assets.patch method.
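As a sketch of the patch call, the snippet below assembles a request that switches an asset's read access mode. It assumes that an upgrade corresponds to setting readAccessMode to MANAGED in ResourceSpec and that the change is scoped with an updateMask query parameter; both should be confirmed in the API reference. The function name and asset name are hypothetical, and authentication and sending are omitted.

```python
from urllib.parse import urlencode

# Assumed REST endpoint root; verify against the Dataplex API reference.
API_ROOT = "https://dataplex.googleapis.com/v1"


def build_access_mode_patch(asset_name: str, mode: str):
    """Return (method, url, body) that sets the asset's read access mode."""
    if mode not in ("DIRECT", "MANAGED"):
        raise ValueError(f"unknown read access mode: {mode}")
    # Restrict the update to the single field being changed (assumed mask path).
    query = urlencode({"updateMask": "resourceSpec.readAccessMode"})
    url = f"{API_ROOT}/{asset_name}?{query}"
    body = {"resourceSpec": {"readAccessMode": mode}}
    return "PATCH", url, body


# Hypothetical asset name, for illustration only.
asset = ("projects/my-project/locations/us-central1/lakes/my-lake"
         "/zones/raw-zone/assets/sales-bucket")
print(build_access_mode_patch(asset, "MANAGED"))
```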

Downgrade a Cloud Storage bucket asset

When you downgrade a Cloud Storage bucket asset, Dataplex removes the attached BigLake tables and creates external tables.

Console

  1. In the Google Cloud console, go to the Dataplex page.

    Go to Dataplex

  2. On the Manage page, click the name of the lake. The lake page opens.

  3. On the Zones tab, click the name of the data zone. The data zone page opens.

  4. On the Assets tab, click the name of the asset that you want to downgrade.

  5. Click Downgrade from Managed.

REST

To downgrade a bucket asset, use the lakes.zones.assets.patch method. Make sure that you set the readAccessMode field to DIRECT in ResourceSpec.
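To illustrate the readAccessMode requirement, this sketch inspects an asset resource (as a dictionary, in the shape a get call is assumed to return) and produces the patch body for a downgrade only when one is needed. The function name and sample asset are hypothetical, and the MANAGED and STORAGE_BUCKET values are assumptions about the API's enums.

```python
from typing import Optional


def downgrade_patch_body(asset: dict) -> Optional[dict]:
    """Return the patch body for a downgrade, or None if nothing needs to change."""
    spec = asset.get("resourceSpec", {})
    # This page describes downgrades for Cloud Storage bucket assets only.
    if spec.get("type") != "STORAGE_BUCKET":
        raise ValueError("only Cloud Storage bucket assets can be downgraded")
    # An asset that isn't managed has no BigLake tables to replace.
    if spec.get("readAccessMode") != "MANAGED":
        return None
    return {"resourceSpec": {"readAccessMode": "DIRECT"}}


# Hypothetical asset resource, for illustration only.
managed_asset = {
    "resourceSpec": {
        "name": "projects/my-project/buckets/sales-data",
        "type": "STORAGE_BUCKET",
        "readAccessMode": "MANAGED",
    }
}
print(downgrade_patch_body(managed_asset))
```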

Remove an asset

Remove the asset from the data zone or lake before attaching it to a different one.

To remove an asset, follow these steps:

Console

  1. In the Google Cloud console, go to the Dataplex page.

    Go to Dataplex

  2. On the Manage page, click the lake from which you want to remove a Cloud Storage bucket or BigQuery dataset. The lake page for that lake opens.

  3. On the Zones tab, click the name of the data zone you want to remove the Cloud Storage bucket or BigQuery dataset from. The Data zone page for that data zone opens.

  4. On the Assets tab, select the asset by checking the box to the left of the asset name.

  5. Click Delete Asset.

  6. On the confirmation dialog, click Delete.

REST

To remove an asset, use the lakes.zones.assets.delete method.
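A delete request targets the asset's full resource name. The sketch below composes that name and the corresponding URL; the endpoint root and path layout are assumptions based on the public Dataplex REST API, and the function name and identifiers are hypothetical. Authentication and sending are omitted.

```python
# Assumed REST endpoint root; verify against the Dataplex API reference.
API_ROOT = "https://dataplex.googleapis.com/v1"


def build_delete_asset_request(project, location, lake, zone, asset):
    """Return (method, url) for removing an asset from a zone."""
    name = (f"projects/{project}/locations/{location}"
            f"/lakes/{lake}/zones/{zone}/assets/{asset}")
    return "DELETE", f"{API_ROOT}/{name}"


# Hypothetical identifiers, for illustration only.
print(build_delete_asset_request(
    "my-project", "us-central1", "my-lake", "raw-zone", "sales-bucket"))
```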

What's next