This document describes how to secure and manage access to Dataplex lakes.
The Dataplex security model lets you manage user permissions for the following tasks:
- Administering a lake (creating and attaching assets, zones, and additional lakes)
- Accessing data connected to a lake through the mapping asset (for example, Google Cloud resources, such as Cloud Storage buckets and BigQuery datasets)
- Accessing metadata about the data connected to a lake
A lake administrator controls access to Dataplex resources, such as lakes, zones, and assets, by granting basic and predefined roles.
Basic roles
Role | Description |
---|---|
Dataplex Viewer (roles/dataplex.viewer) | Ability to view (but not edit) the lake and its configured zones and assets. |
Dataplex Editor (roles/dataplex.editor) | Ability to edit the lake. Can create and configure lakes, zones, assets, and tasks. |
Dataplex Administrator (roles/dataplex.administrator) | Ability to fully administer a lake. |
Dataplex Developer (roles/dataplex.developer) | Ability to run data analytics workloads on a lake. * |
* To run a Spark job, create Dataproc clusters and submit Dataproc jobs in the project to which you want the compute attributed.
Predefined roles
Google Cloud manages the predefined roles that provide granular access for Dataplex.
Metadata roles
Metadata roles grant the ability to view or update metadata, such as table schemas.
Role | Description |
---|---|
Dataplex Metadata Writer (roles/dataplex.metadataWriter) | Ability to update the metadata of a certain resource. |
Dataplex Metadata Reader (roles/dataplex.metadataReader) | Ability to read the metadata (for example, to query a table). |
Data roles
Granting data roles to a principal gives them the ability to read or write data in the underlying resources pointed to by the assets of the lake.
Dataplex maps its roles to the data roles for each underlying storage resource (such as Cloud Storage and BigQuery).
Dataplex translates and propagates Dataplex data roles to the underlying storage resource, setting the correct roles for each storage resource. You can grant a single Dataplex data role at one level of the lake hierarchy (for example, on a lake), and Dataplex maintains the specified access to data on all resources connected to that lake (for example, the Cloud Storage buckets and BigQuery datasets referred to by assets in the underlying zones).
For example, granting a principal the dataplex.dataWriter role for a lake gives the principal write access to all data within the lake, its underlying zones, and their assets. Data access roles granted at a lower level (for example, a zone) are inherited by the assets beneath that level in the lake hierarchy.
Role | Description |
---|---|
Dataplex Data Reader (roles/dataplex.dataReader) | Ability to read data from storage attached to assets, including storage buckets and BigQuery datasets (and their contents). * |
Dataplex Data Writer (roles/dataplex.dataWriter) | Ability to write to the underlying resources pointed to by the asset. * |
Dataplex Data Owner (roles/dataplex.dataOwner) | Grants the Owner role to the underlying resources, including the ability to manage child resources. For example, as the Data Owner of a BigQuery dataset, you can manage the underlying tables. |
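The inheritance of data roles down the lake hierarchy can be illustrated with a small model. This is a minimal sketch, not Dataplex code: the `Resource` class, the resource names, and the principals are all hypothetical, and the sketch assumes only what is stated above, namely that a role granted on a lake or zone also applies to everything beneath it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    """A node in a hypothetical lake > zone > asset hierarchy."""

    name: str
    parent: Optional["Resource"] = None
    # Maps a principal to the roles granted directly on this node.
    grants: dict = field(default_factory=dict)

    def grant(self, principal: str, role: str) -> None:
        """Record a role granted directly on this node."""
        self.grants.setdefault(principal, set()).add(role)

    def effective_roles(self, principal: str) -> set:
        """Return roles granted on this node or on any of its ancestors."""
        roles = set()
        node = self
        while node is not None:
            roles |= node.grants.get(principal, set())
            node = node.parent
        return roles


# Hypothetical hierarchy: one lake, two zones, one asset per zone.
lake = Resource("sales-lake")
raw_zone = Resource("raw-zone", parent=lake)
curated_zone = Resource("curated-zone", parent=lake)
raw_asset = Resource("raw-bucket", parent=raw_zone)
curated_asset = Resource("curated-dataset", parent=curated_zone)

alice = "user:alice@example.com"
bob = "user:bob@example.com"
lake.grant(alice, "roles/dataplex.dataWriter")     # applies to the whole lake
raw_zone.grant(bob, "roles/dataplex.dataReader")   # applies to raw-zone only
```

In this model, `raw_asset.effective_roles(alice)` includes the writer role granted on the lake, while `curated_asset.effective_roles(bob)` is empty because the reader role was granted only on the other zone.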
Secure your lake
You can secure and manage access to your lake and the data attached to it. In the Google Cloud console, use one of the following views:
- The Dataplex Manage view on the Permissions tab
- The Dataplex Secure view
Using the Manage view
The Permissions tab lets you manage all the permissions on a lake resource, and presents an unfiltered view of all the permissions, including those inherited.
To secure your lake, follow these steps:
1. In the Google Cloud console, go to Dataplex.
2. Navigate to the Manage view.
3. Click the name of the lake that you created.
4. Click the Permissions tab.
5. Click the View by Roles tab.
6. Click Add to add a new role. Add the Dataplex Data Reader, Data Writer, and Data Owner roles.
7. Verify that the Dataplex Data Reader, Data Writer, and Data Owner roles appear.
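The effect of the steps above on the lake's IAM policy can be sketched in code. The example below only manipulates an in-memory policy in the standard IAM layout (a list of bindings, each with a `role` and its `members`); it does not call any Google Cloud API. The `add_binding` and `roles_in` helpers and the group address are hypothetical names introduced for illustration.

```python
import copy

# The three data roles added in the console steps.
DATA_ROLES = [
    "roles/dataplex.dataReader",
    "roles/dataplex.dataWriter",
    "roles/dataplex.dataOwner",
]


def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of `policy` in which `member` holds `role`.

    Adding a binding that already exists leaves the policy unchanged.
    """
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


def roles_in(policy: dict) -> list:
    """List the roles that appear in the policy, sorted by name."""
    return sorted(binding["role"] for binding in policy.get("bindings", []))


# Hypothetical starting point: a lake policy with no bindings yet.
policy = {"bindings": []}
for role in DATA_ROLES:
    policy = add_binding(policy, role, "group:analysts@example.com")

# Counterpart of the final verification step: all three roles are present.
print(roles_in(policy))
```

Applying such a policy to a real lake would go through the console, as described above, or through the IAM methods of the Dataplex API, which this sketch does not show.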
Using the Secure view
The Dataplex Secure view in the Google Cloud console provides the following:
- A filterable view of only the Dataplex roles that are centered on a specific resource
- A separation of data roles from lake resource roles
Policy management
After you specify your security policy, Dataplex propagates the permissions to the IAM policies of the managed resources.
The security policy configured at the lake level is propagated to all the resources managed within that lake. Dataplex reports propagation status and gives visibility into these large-scale propagations on the Dataplex Manage > Permissions tab. Dataplex also continuously monitors the managed resources for any changes to their IAM policies made outside of Dataplex.
Users that already have permissions on a resource continue to have them after a resource gets attached to a Dataplex lake. Similarly, non-Dataplex role bindings that are created or updated after attaching the resource to Dataplex stay the same.
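The two behaviors described above, adding propagated bindings without disturbing existing ones and noticing changes made outside Dataplex, can be sketched as set operations. This is an illustration under stated assumptions rather than Dataplex's implementation: policies are simplified to a mapping from role to a set of members, `merge_bindings` and `missing_bindings` are hypothetical helpers, and `roles/example.propagatedReader` is a placeholder for whichever role Dataplex sets on the underlying resource.

```python
def merge_bindings(existing: dict, managed: dict) -> dict:
    """Add managed bindings without removing anything already present."""
    merged = {role: set(members) for role, members in existing.items()}
    for role, members in managed.items():
        merged.setdefault(role, set()).update(members)
    return merged


def missing_bindings(actual: dict, managed: dict) -> dict:
    """Return managed bindings that are absent from the actual policy."""
    missing = {}
    for role, members in managed.items():
        gap = members - actual.get(role, set())
        if gap:
            missing[role] = gap
    return missing


# A binding that existed on the bucket before it was attached to a lake.
bucket_policy = {"roles/storage.objectViewer": {"user:carol@example.com"}}

# Placeholder for the bindings propagated from the lake's security policy.
managed = {"roles/example.propagatedReader": {"user:alice@example.com"}}

merged = merge_bindings(bucket_policy, managed)

# Simulate an out-of-band change that removes the propagated binding.
drifted = {
    role: members
    for role, members in merged.items()
    if role != "roles/example.propagatedReader"
}
```

Here `merged` still contains the original binding for `user:carol@example.com`, and `missing_bindings(drifted, managed)` reports the propagated binding that was removed out of band. How Dataplex responds to such a change is not specified in this document.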
Set column-level, row-level, and table-level policies
Cloud Storage bucket assets have associated BigQuery external tables attached to them.
You can upgrade a Cloud Storage bucket asset, which means that Dataplex removes the attached external tables and attaches BigLake tables instead.
Unlike external tables, BigLake tables give you fine-grained access control, including row-level controls, column-level controls, and column data masking.
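As an illustration of a row-level control, the sketch below builds a BigQuery `CREATE ROW ACCESS POLICY` statement as text. It assumes the table has already been upgraded to a BigLake table; the `row_access_policy_ddl` helper, the policy name, the table path, the grantee, and the `region` column are all hypothetical. The sketch only produces the statement and does not run it, and it does not validate or escape the filter expression.

```python
def row_access_policy_ddl(policy_name, table, grantees, filter_expr):
    """Build a BigQuery CREATE ROW ACCESS POLICY statement as a string.

    `table` is a fully qualified table path, `grantees` is a list of IAM
    principals, and `filter_expr` is a SQL boolean expression that selects
    the rows the grantees are allowed to see.
    """
    grantee_list = ", ".join(f"'{grantee}'" for grantee in grantees)
    return (
        f"CREATE ROW ACCESS POLICY {policy_name}\n"
        f"ON `{table}`\n"
        f"GRANT TO ({grantee_list})\n"
        f"FILTER USING ({filter_expr})"
    )


# Hypothetical example: US analysts may read only rows where region is 'US'.
ddl = row_access_policy_ddl(
    policy_name="us_only",
    table="my-project.sales_zone.orders",
    grantees=["group:us-analysts@example.com"],
    filter_expr="region = 'US'",
)
print(ddl)
```

Column-level controls and data masking are configured differently (through policy tags) and are not covered by this sketch.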
Metadata security
Metadata primarily refers to schema information associated with user data present in resources managed by a lake.
Dataplex Discovery examines the data in managed resources and extracts tabular schema information. These tables are published to BigQuery, Dataproc Metastore, and Data Catalog systems.
BigQuery
Each discovered table has an associated table registered in BigQuery. Each zone has an associated BigQuery dataset, under which Dataplex registers the external tables for all tables discovered in that zone.
The discovered Cloud Storage-hosted tables are registered under the dataset created for the zone.
Dataproc Metastore
Databases and tables are made available in the Dataproc Metastore associated with the Dataplex lake instance. Each data zone has an associated database, and each asset can have one or more associated tables.
You secure the data in a Dataproc Metastore service by configuring your VPC Service Controls (VPC-SC) network. Because you provide the Dataproc Metastore instance to Dataplex during lake creation, it is already a user-managed resource.
Data Catalog
Each discovered table has an associated entry in Data Catalog, to enable search and discovery.
Data Catalog requires IAM policy names during entry creation. Therefore, Dataplex provides the IAM policy name of the Dataplex asset resource that the entry should be associated with. As a result, the permissions on the Dataplex entry are driven by the permissions on the asset resource.
Grant the Dataplex Metadata Reader role (roles/dataplex.metadataReader) and the Dataplex Metadata Writer role (roles/dataplex.metadataWriter) on the asset resource.
What's next?
- Learn more about Dataplex IAM.
- Learn more about Dataplex IAM roles.
- Learn more about Dataplex IAM permissions.