Quickstart

Dataplex allows you to logically organize your data stored in Cloud Storage and BigQuery into lakes and zones, and automate data management and governance across that data to power analytics at scale.

This page shows you the basics of getting started with Dataplex in the Google Cloud console.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.

  4. Enable the Dataplex, Dataproc, Dataproc Metastore, Data Catalog, BigQuery, and Cloud Storage APIs.

    Enable the APIs

  5. Make sure you have been granted one of the predefined roles roles/dataplex.admin or roles/dataplex.editor so that you can create, manage, and delete Dataplex resources. Follow the steps in the IAM documentation for granting roles.
  6. If you have a Cloud Storage bucket to use, skip this step. Otherwise, create a Cloud Storage bucket:
    1. In the console, go to the Cloud Storage Browser page.

      Go to Browser

    2. Click Create bucket.
    3. On the Create a bucket page, enter your bucket information. To go to the next step, click Continue.
      • For Name your bucket, enter a unique bucket name. Don't include sensitive information in the bucket name, because the bucket namespace is global and publicly visible.
      • For Choose where to store your data, do the following:
        • Select a Location type option.
        • Select a Location option.
      • For Choose a default storage class for your data, select Standard.
      • For Choose how to control access to objects, select an Access control option.
      • For Advanced settings (optional), specify an encryption method, a retention policy, or bucket labels.
    4. Click Create.
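
The API-enablement step above can also be done from a terminal. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated. The function names and the project ID are illustrative and are not part of this quickstart. It builds a `gcloud services enable` command for the six APIs listed above.

```python
import subprocess

# Service names for the APIs listed in the steps above.
REQUIRED_SERVICES = [
    "dataplex.googleapis.com",
    "dataproc.googleapis.com",
    "metastore.googleapis.com",    # Dataproc Metastore
    "datacatalog.googleapis.com",
    "bigquery.googleapis.com",
    "storage.googleapis.com",
]


def enable_apis_command(project_id):
    """Build the gcloud argument list that enables every required service."""
    return [
        "gcloud", "services", "enable",
        *REQUIRED_SERVICES,
        f"--project={project_id}",
    ]


def enable_apis(project_id):
    """Run the command. Raises CalledProcessError if gcloud reports a failure."""
    subprocess.run(enable_apis_command(project_id), check=True)
```

Calling `enable_apis("my-project")` (a placeholder project ID) runs the command. `enable_apis_command` alone lets you inspect the arguments first.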

Create a lake

The following steps show you how to create a lake using the Google Cloud console.

  1. Go to Dataplex in the console.

    Go to Dataplex

  2. Navigate to the Manage view.

  3. Click Create.

  4. Enter a Display name.

  5. The lake ID is automatically generated for you. If you prefer, you can provide your own ID.

  6. Specify the Region in which to create the lake.

    For lakes created in a given region (for example, us-central1), both single-region (us-central1) data and multi-region (us multi-region) data can be attached depending on the zone settings.

  7. Click Create.
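
The console steps above have a programmatic counterpart. The following is a minimal sketch, assuming the `google-cloud-dataplex` Python client library is installed and application default credentials are configured. The function names and all argument values are illustrative. The lake ID that the console generates automatically must be supplied explicitly here.

```python
def lake_parent(project_id, region):
    """Return the parent resource under which a lake is created."""
    return f"projects/{project_id}/locations/{region}"


def create_lake(project_id, region, lake_id, display_name):
    """Create a lake and wait for the long-running operation to finish."""
    # Imported lazily so lake_parent() is usable without the library installed.
    from google.cloud import dataplex_v1

    client = dataplex_v1.DataplexServiceClient()
    operation = client.create_lake(
        parent=lake_parent(project_id, region),
        lake_id=lake_id,
        lake=dataplex_v1.Lake(display_name=display_name),
    )
    # result() blocks until the lake has been created or the call fails.
    return operation.result()
```

For example, `create_lake("my-project", "us-central1", "my-lake", "My lake")` would create a lake in us-central1. All four values are placeholders.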

Add a zone to your lake

After you create your lake, you can add zones to the lake. Zones are logical groupings of unstructured and structured data.

  1. In the Manage view, click the name of the lake you want to add a zone to.

  2. Click Add zone.

  3. Enter a Display name for your zone.

  4. Click the Type dropdown. Choose Raw Zone or Curated Zone. Learn more about supported zone types.

  5. Under Data locations, select either Regional or Multi-regional. This choice cannot be changed later. Single-region and multi-region data cannot be mixed in the same zone.

  6. Click Create.

It may take a few minutes for the zone to be created.
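
The zone type and data location choices above map to fields on the zone resource. The following is a minimal sketch, assuming the `google-cloud-dataplex` Python client library and an existing lake. The function names and defaults are illustrative, and discovery settings are left unset.

```python
ZONE_TYPES = ("RAW", "CURATED")
LOCATION_TYPES = ("SINGLE_REGION", "MULTI_REGION")


def lake_name(project_id, region, lake_id):
    """Return the full resource name of a lake."""
    return f"projects/{project_id}/locations/{region}/lakes/{lake_id}"


def check_zone_options(zone_type, location_type):
    """Reject values that do not correspond to a console choice."""
    if zone_type not in ZONE_TYPES:
        raise ValueError(f"zone_type must be one of {ZONE_TYPES}")
    if location_type not in LOCATION_TYPES:
        raise ValueError(f"location_type must be one of {LOCATION_TYPES}")


def create_zone(project_id, region, lake_id, zone_id, display_name,
                zone_type="RAW", location_type="SINGLE_REGION"):
    """Create a zone in an existing lake and wait for it to be ready."""
    check_zone_options(zone_type, location_type)
    # Imported lazily so the helpers above work without the library installed.
    from google.cloud import dataplex_v1

    zone = dataplex_v1.Zone(
        display_name=display_name,
        type_=dataplex_v1.Zone.Type[zone_type],
        resource_spec=dataplex_v1.Zone.ResourceSpec(
            # The location type cannot be changed after the zone is created.
            location_type=dataplex_v1.Zone.ResourceSpec.LocationType[location_type],
        ),
    )
    client = dataplex_v1.DataplexServiceClient()
    operation = client.create_zone(
        parent=lake_name(project_id, region, lake_id),
        zone_id=zone_id,
        zone=zone,
    )
    return operation.result()
```

Validating the two options before the API call catches typos early, which matters because the location type is fixed once the zone exists.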

Attach an asset

Data can be stored in Cloud Storage buckets or BigQuery datasets, and can be attached as assets to data zones within a Dataplex lake.

Follow these steps to attach the Cloud Storage bucket you created earlier as an asset.

  1. In the Manage view, click the name of the lake to which you want to attach a Cloud Storage bucket.

  2. On the Zones tab, click the zone to add the asset to.

  3. On the Assets tab, click Add Assets.

  4. Click Add an asset.

  5. Under Type, select Storage bucket.

  6. Under Display name, enter a name for the asset.

  7. In the Bucket field, click Browse. If you have a Cloud Storage bucket, find it and click Select. If you don't have a Cloud Storage bucket, click the button to create one, and then do the following:

    1. Enter a unique name for the bucket. Click Continue.

    2. Choose a Location type. Click Continue.

    3. Choose a default storage class for your data. Click Continue.

    4. Choose an access control level. Click Continue.

    5. Choose a data protection option or None. Click Continue.

    6. Click Create.

    7. Click Select.

  8. Click Done.

  9. Click Continue.

  10. Under Discovery settings, select Inherit to inherit the Discovery settings from the zone level.

  11. Click Continue.

  12. Under Add assets, click Submit.

Wait for the asset creation to finish.
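
Attaching the bucket can also be sketched with the `google-cloud-dataplex` Python client library, assuming the lake and zone already exist. The function names are illustrative. Two points are assumptions to verify against the API reference: the bucket resource name format used below, and that leaving the asset's discovery settings unset corresponds to the Inherit option in the console.

```python
def zone_name(project_id, region, lake_id, zone_id):
    """Return the full resource name of a zone."""
    return (f"projects/{project_id}/locations/{region}"
            f"/lakes/{lake_id}/zones/{zone_id}")


def bucket_resource_name(project, bucket):
    """Return the bucket reference used in the asset's resource spec.

    Assumption: the format is projects/<project>/buckets/<bucket>.
    """
    return f"projects/{project}/buckets/{bucket}"


def attach_bucket(project_id, region, lake_id, zone_id,
                  asset_id, display_name, bucket):
    """Attach an existing Cloud Storage bucket to a zone as an asset."""
    # Imported lazily so the helpers above work without the library installed.
    from google.cloud import dataplex_v1

    asset = dataplex_v1.Asset(
        display_name=display_name,
        resource_spec=dataplex_v1.Asset.ResourceSpec(
            name=bucket_resource_name(project_id, bucket),
            type_=dataplex_v1.Asset.ResourceSpec.Type.STORAGE_BUCKET,
        ),
        # discovery_spec is intentionally not set (see the assumption above).
    )
    client = dataplex_v1.DataplexServiceClient()
    operation = client.create_asset(
        parent=zone_name(project_id, region, lake_id, zone_id),
        asset_id=asset_id,
        asset=asset,
    )
    # result() blocks until asset creation finishes.
    return operation.result()
```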

Use your lake

After you create your lake, zones, and assets, you can:

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

  1. In the console, go to the Manage resources page.

    Go to Manage resources

  2. If the project that you plan to delete is attached to an organization, expand the Organization list in the Name column.
  3. In the project list, select the project that you want to delete, and then click Delete.
  4. In the dialog, type the project ID, and then click Shut down to delete the project.

Alternatively, you can delete the individual resources used in this tutorial. A lake cannot be deleted until all of its zones have been deleted, and a zone cannot be deleted until all of its assets have been deleted, so delete the resources in the following order:

Detach the storage bucket

The following steps show you how to detach the Dataplex asset you created.

  1. Go to Dataplex in the console.

    Go to Dataplex

  2. In the Manage view, click the name of the lake you created.

  3. On the Zones tab, click the name of the zone you created.

  4. On the Assets tab, select the asset to detach by checking the box to the left of the bucket name.

  5. Click Delete Asset.

  6. Click Delete to confirm the detachment.

Delete the zone

The following steps show you how to delete the Dataplex zone you created.

  1. Go to Dataplex in the console.

    Go to Dataplex

  2. In the Manage view, click the lake you created.

  3. On the Zones tab, select the zone to delete by checking the box to the left of the data zone name.

  4. Click Delete Zone.

  5. Click Delete to confirm the deletion.

Delete the lake

The following steps show you how to delete the Dataplex lake you created.

  1. Go to Dataplex in the console.

    Go to Dataplex

  2. In the Manage view, click the lake you created.

  3. At the top of the page, click Delete.

  4. Confirm deletion by typing "delete" in the text box.

  5. Click Delete Lake to confirm the deletion.

What's next