Quickstart: Create a lake
This quickstart shows you how to get started with Dataplex in the Google Cloud console by creating a lake, adding a zone, and attaching an asset.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the Dataplex, Dataproc, Dataproc Metastore, Data Catalog, BigQuery, and Cloud Storage APIs.
- Make sure that you have the following role or roles on the project: `roles/dataplex.admin`, `roles/dataplex.editor`
Check for the roles
- In the Google Cloud console, go to the IAM page.
- Select the project.
- In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
- For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
Grant the roles
- In the Google Cloud console, go to the IAM page.
- Select the project.
- Click Grant access.
- In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
- In the Select a role list, select a role.
- To grant additional roles, click Add another role and add each additional role.
- Click Save.
- Create a Cloud Storage bucket:
  - In the Google Cloud console, go to the Cloud Storage Buckets page.
  - Click Create bucket.
  - On the Create a bucket page, enter your bucket information. To go to the next step, click Continue.
    - For Name your bucket, enter a unique bucket name. Don't include sensitive information in the bucket name, because the bucket namespace is global and publicly visible.
    - For Choose where to store your data, do the following:
      - Select a Location type option.
      - Select a Location option.
    - For Choose a default storage class for your data, select the following: Standard.
    - For Choose how to control access to objects, select an Access control option.
    - For Advanced settings (optional), specify an encryption method, a retention policy, or bucket labels.
  - Click Create.
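
The prerequisites above can also be prepared from the command line. The following is a minimal sketch, not the documented procedure: it only assembles and prints `gcloud` commands for enabling the six APIs, granting one of the listed roles, and creating a Standard-class bucket. The project ID, email address, bucket name, and location are hypothetical placeholders, and the helper name `prerequisite_commands` is made up for this example.

```python
import shlex

# Hypothetical placeholder values: replace them with your own.
PROJECT_ID = "my-project"
USER_EMAIL = "user@example.com"
BUCKET_NAME = "my-dataplex-bucket"
BUCKET_LOCATION = "us-central1"

# Service names for the six APIs listed in the prerequisites.
SERVICES = [
    "dataplex.googleapis.com",
    "dataproc.googleapis.com",
    "metastore.googleapis.com",
    "datacatalog.googleapis.com",
    "bigquery.googleapis.com",
    "storage.googleapis.com",
]


def prerequisite_commands(project_id, user_email, bucket_name, bucket_location):
    """Return the prerequisite gcloud commands as argument lists."""
    enable_apis = [
        "gcloud", "services", "enable", *SERVICES,
        f"--project={project_id}",
    ]
    grant_role = [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=user:{user_email}",
        "--role=roles/dataplex.admin",
    ]
    create_bucket = [
        "gcloud", "storage", "buckets", "create", f"gs://{bucket_name}",
        f"--project={project_id}",
        f"--location={bucket_location}",
        "--default-storage-class=STANDARD",
    ]
    return [enable_apis, grant_role, create_bucket]


commands = prerequisite_commands(PROJECT_ID, USER_EMAIL, BUCKET_NAME, BUCKET_LOCATION)
for command in commands:
    # Print each command so that it can be reviewed before it is run.
    print(shlex.join(command))
```

The sketch prints the commands instead of running them, so nothing changes in your project until you copy a command into a shell where the Google Cloud CLI is installed and authenticated.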
Create a lake
A lake is a logical construct representing a data domain or business unit. For example, if you need to organize data based on group usage, you would create a lake for each department (for example, retail, sales, and finance).
The following steps show you how to create a lake using the Google Cloud console.
Go to Dataplex in the Google Cloud console.
Navigate to the Manage view.
Click Create.
Enter a Display name.
The lake ID is automatically generated for you.
Specify the Region in which to create the lake.
For lakes created in a given region (for example, us-central1), both single-region (us-central1) data and multi-region (us multi-region) data can be attached depending on the zone settings.
Click Create.
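
For readers who prefer the command line, the lake creation steps roughly correspond to a single `gcloud dataplex lakes create` call. The sketch below only builds and prints that command. It assumes you choose the lake ID yourself (the console generates it for you), and the project ID, lake ID, and display name shown are hypothetical.

```python
import shlex


def create_lake_command(project_id, location, lake_id, display_name):
    """Build the gcloud command that creates a lake in one region."""
    return [
        "gcloud", "dataplex", "lakes", "create", lake_id,
        f"--project={project_id}",
        f"--location={location}",
        f"--display-name={display_name}",
    ]


# Hypothetical values for illustration only.
lake_command = create_lake_command(
    project_id="my-project",
    location="us-central1",
    lake_id="sales-lake",
    display_name="Sales lake",
)
print(shlex.join(lake_command))
```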
Add a zone to your lake
After you create your lake, you can add zones to the lake. Zones are logical groupings within a lake that are useful for categorizing structured and unstructured data.
In the Manage view, click the name of the lake you want to add a zone to.
Click Add zone.
Enter a Display name for your zone.
Click the Type drop-down. Choose Raw Zone or Curated Zone. Learn more about the types of zones.
Under Data locations, select either Regional or Multi-regional. This choice can't be changed later. Single-region and multi-region data can't be mixed in the same zone.
Click Create.
It may take a few minutes for the zone to be created.
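
The two choices made in these steps, the zone type and the data location type, map onto the `--type` and `--resource-location-type` flags of `gcloud dataplex zones create`. The following sketch shows that mapping under stated assumptions: the dictionaries translating console labels to flag values, the helper name, and all IDs are illustrative and not taken from this page.

```python
import shlex

# Console option -> value passed to the corresponding gcloud flag.
ZONE_TYPES = {"Raw Zone": "RAW", "Curated Zone": "CURATED"}
LOCATION_TYPES = {"Regional": "SINGLE_REGION", "Multi-regional": "MULTI_REGION"}


def create_zone_command(location, lake_id, zone_id, display_name,
                        zone_type, data_locations):
    """Build the gcloud command that adds a zone to an existing lake."""
    if zone_type not in ZONE_TYPES:
        raise ValueError(f"zone_type must be one of {sorted(ZONE_TYPES)}")
    if data_locations not in LOCATION_TYPES:
        raise ValueError(f"data_locations must be one of {sorted(LOCATION_TYPES)}")
    return [
        "gcloud", "dataplex", "zones", "create", zone_id,
        f"--location={location}",
        f"--lake={lake_id}",
        f"--display-name={display_name}",
        f"--type={ZONE_TYPES[zone_type]}",
        # The data location type can't be changed after the zone is created.
        f"--resource-location-type={LOCATION_TYPES[data_locations]}",
    ]


# Hypothetical IDs for illustration only.
zone_command = create_zone_command(
    "us-central1", "sales-lake", "raw-zone", "Raw zone", "Raw Zone", "Regional"
)
print(shlex.join(zone_command))
```

Validating both options before building the command reflects the note above: because the data location type is fixed at creation time, catching a wrong value early is cheaper than recreating the zone.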
Attach an asset
Data can be stored in Cloud Storage buckets or BigQuery datasets, and can be attached as assets to data zones within a Dataplex lake.
To attach your Cloud Storage bucket as an asset, follow these steps:
In the Manage view, click the name of the lake to which you want to attach a Cloud Storage bucket.
On the Zones tab, click the zone to add the asset to.
On the Assets tab, click Add Assets.
Click Add an asset.
Under Type, select Storage bucket.
Under Display name, enter a name for the asset.
In the Bucket field, click Browse. If you have a Cloud Storage bucket, find it and click Select. If you don't have a Cloud Storage bucket, click the button to create one, and then do the following.
Enter a unique name for the bucket. Click Continue.
Choose a Location type. Click Continue.
Choose a default storage class for your data. Click Continue.
Choose an access control level. Click Continue.
Choose a data protection option or None. Click Continue.
Click Create.
Click Select.
Click Done.
Click Continue.
Under Discovery settings, select Inherit to inherit the Discovery settings from the zone level.
Click Continue.
Under Add assets, click Submit.
Wait for the asset creation to finish.
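
Attaching a bucket can likewise be expressed as one `gcloud dataplex assets create` call. This is a minimal sketch that builds and prints the command. It assumes the bucket is referenced in the form `projects/PROJECT_NUMBER/buckets/BUCKET_NAME`, and the project number, bucket name, and resource IDs are hypothetical. Discovery flags are left out, so check the gcloud reference for how that compares with the Inherit option selected in the console.

```python
import shlex


def attach_bucket_command(location, lake_id, zone_id, asset_id,
                          display_name, project_number, bucket_name):
    """Build the gcloud command that attaches a bucket to a zone as an asset."""
    resource_name = f"projects/{project_number}/buckets/{bucket_name}"
    return [
        "gcloud", "dataplex", "assets", "create", asset_id,
        f"--location={location}",
        f"--lake={lake_id}",
        f"--zone={zone_id}",
        f"--display-name={display_name}",
        "--resource-type=STORAGE_BUCKET",
        f"--resource-name={resource_name}",
    ]


# Hypothetical values for illustration only.
asset_command = attach_bucket_command(
    location="us-central1",
    lake_id="sales-lake",
    zone_id="raw-zone",
    asset_id="raw-bucket-asset",
    display_name="Raw bucket",
    project_number="123456789012",
    bucket_name="my-dataplex-bucket",
)
print(shlex.join(asset_command))
```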
To use your lake, see the What's next section. Otherwise, delete the resources you created by following the steps in the Clean up section.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
- In the Google Cloud console, go to the Manage resources page.
- If the project that you plan to delete is attached to an organization, expand the Organization list in the Name column.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Alternatively, you can delete only the resources used in this tutorial. A lake can't be deleted until all of its zones are deleted. Similarly, a zone can't be deleted until all of its assets are deleted.
Detach the storage bucket
To detach the Dataplex asset you created, follow these steps:
Go to Dataplex in the Google Cloud console.
In the Manage view, click the name of the lake you created.
On the Zones tab, click the name of the zone you created.
On the Assets tab, select the asset to detach by checking the box to the left of the bucket name.
Click Delete Asset.
Click Delete to confirm the detachment.
Delete the zone
To delete the Dataplex zone you created, follow these steps:
Go to Dataplex in the Google Cloud console.
In the Manage view, click the lake you created.
On the Zones tab, select the zone to delete by checking the box to the left of the data zone name.
Click Delete Zone.
Click Delete to confirm the deletion.
Delete the lake
The following steps show you how to delete the Dataplex lake you created.
Go to Dataplex in the Google Cloud console.
In the Manage view, click the lake you created.
At the top of the page, click Delete.
Confirm deletion by typing "delete" in the field.
Click Delete Lake to confirm the deletion.
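
The cleanup order matters: the asset goes first, then the zone, then the lake. As a sketch of the same sequence on the command line, the helper below returns the three `gcloud dataplex` delete commands in that order. The helper name and all resource IDs are hypothetical, and the commands are only printed, not executed.

```python
import shlex


def cleanup_commands(location, lake_id, zone_id, asset_id):
    """Return delete commands ordered so that children go before parents."""
    scope = [f"--location={location}", f"--lake={lake_id}"]
    return [
        # 1. Detach the bucket by deleting the asset.
        ["gcloud", "dataplex", "assets", "delete", asset_id, *scope, f"--zone={zone_id}"],
        # 2. Delete the zone once it has no assets.
        ["gcloud", "dataplex", "zones", "delete", zone_id, *scope],
        # 3. Delete the lake once it has no zones.
        ["gcloud", "dataplex", "lakes", "delete", lake_id, f"--location={location}"],
    ]


# Hypothetical IDs for illustration only.
steps = cleanup_commands("us-central1", "sales-lake", "raw-zone", "raw-bucket-asset")
for step in steps:
    print(shlex.join(step))
```

Running the commands out of order is expected to fail for the reason given in the Clean up section: a zone with assets or a lake with zones can't be deleted.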