The Dataplex security model allows you to manage who has access to perform the following tasks:
- Administering a lake (creating and attaching assets, zones, and additional lakes)
- Accessing data connected to a lake through the mapping asset (Google Cloud resources such as Cloud Storage buckets and BigQuery datasets)
- Accessing metadata about the data connected to a lake
An administrator for a lake controls access to Dataplex resources (lake, zone, and assets) by granting the following basic and predefined roles.
|Ability to view (but not edit) the lake and configure relevant assets, zones, or lakes.|
|Ability to edit the lake. Can create and configure lakes, zones, assets, and tasks.|
|Ability to fully administer a lake.|
|Ability to run data analytics workloads on a lake. *|
To run a Spark job, create Dataproc clusters and submit Dataproc jobs in the project to which you want the compute attributed.
Google Cloud manages the following roles, which provide granular access for Dataplex.
Metadata roles have the ability to view metadata, such as table schemas.
|Dataplex Metadata Writer|Ability to update the metadata of a certain resource.|
|Dataplex Metadata Reader|Ability to read the metadata (for example, to query a table).|
Granting data roles to a principal gives them the ability to read or write data in the underlying resources pointed to by the assets of the lake.
Dataplex maps its roles to the data roles for each underlying storage resource (Cloud Storage, BigQuery).
Dataplex translates and propagates Dataplex data roles to the underlying storage resource, setting the correct roles for each storage resource. The benefit is that you can grant a single Dataplex data role at one level of the lake hierarchy (for example, on a lake), and Dataplex maintains the specified access to data on all resources connected to that lake (for example, the Cloud Storage buckets and BigQuery datasets that are referred to by assets in the underlying zones).
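The translation step can be pictured as a lookup from a Dataplex data role to one storage-level role per resource type. The following is a minimal sketch under stated assumptions: the `ROLE_MAP` table and the `translate` helper are hypothetical, and the specific storage-level roles chosen here are illustrative placeholders rather than the mapping Dataplex actually applies.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: illustrative mapping only. The storage-level roles listed here
# are placeholders for the sketch, not the documented Dataplex mapping.
ROLE_MAP: Dict[str, Dict[str, str]] = {
    "roles/dataplex.dataReader": {
        "gcs": "roles/storage.objectViewer",
        "bigquery": "roles/bigquery.dataViewer",
    },
    "roles/dataplex.dataWriter": {
        "gcs": "roles/storage.objectAdmin",
        "bigquery": "roles/bigquery.dataEditor",
    },
    "roles/dataplex.dataOwner": {
        "gcs": "roles/storage.admin",
        "bigquery": "roles/bigquery.dataOwner",
    },
}


def translate(dataplex_role: str, resources: List[Tuple[str, str]]) -> Dict[str, str]:
    """Return the storage-level role to set on each (kind, name) resource."""
    per_kind = ROLE_MAP[dataplex_role]
    return {name: per_kind[kind] for kind, name in resources}


# Hypothetical resources attached to a lake through its assets.
attached = [("gcs", "my-bucket"), ("bigquery", "my_dataset")]
granted = translate("roles/dataplex.dataReader", attached)
```

In this sketch, one grant of the Data Reader role yields one storage-level role per attached resource, which mirrors the idea of granting once at the lake and letting the grant fan out.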
For example, granting a principal the dataplex.dataWriter role for a lake gives the principal write access to all data within the lake and its underlying zones and assets. Data access roles granted at a lower level (such as a zone) are inherited down the lake hierarchy by the underlying assets.
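The inheritance described above can be illustrated with a small, self-contained model. This is a sketch only: the `Resource` class and the `effective_roles` helper are hypothetical names rather than part of any Dataplex API, and the lake, zone, asset, and principal names are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Resource:
    """A node in a lake hierarchy (lake -> zone -> asset). Hypothetical model."""
    name: str
    parent: Optional["Resource"] = None
    # Roles granted directly on this resource, keyed by principal.
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, principal: str, role: str) -> None:
        self.grants.setdefault(principal, set()).add(role)


def effective_roles(resource: Resource, principal: str) -> Set[str]:
    """Union of roles granted on the resource and on every ancestor."""
    roles: Set[str] = set()
    node: Optional[Resource] = resource
    while node is not None:
        roles |= node.grants.get(principal, set())
        node = node.parent
    return roles


lake = Resource("my-lake")
zone = Resource("raw-zone", parent=lake)
asset = Resource("bucket-asset", parent=zone)
other_zone = Resource("curated-zone", parent=lake)

user = "user:analyst@example.com"
lake.grant(user, "roles/dataplex.dataWriter")  # applies to the whole lake
zone.grant(user, "roles/dataplex.dataOwner")   # applies to raw-zone and below
```

Here the asset under `raw-zone` ends up with both roles, while `curated-zone` only sees the role granted on the lake.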
|Dataplex Data Reader|Ability to read data from storage attached to assets, including storage buckets and BigQuery datasets (and their contents). *|
|Dataplex Data Writer|Ability to write to the underlying resources pointed to by the asset. *|
|Dataplex Data Owner|Grants the Owner role to the underlying resources, including the ability to manage child resources. For example, as the Data Owner of a BigQuery dataset, you can manage the underlying tables.|
Secure your lake
You can secure and manage access to your lake and the data attached to it. In the Google Cloud console, use either of the following views:
- The Dataplex Manage view, under the Permissions tab, or
- The Dataplex Secure view
Using the Manage view
The Permissions tab allows you to manage all the permissions on a lake resource, and presents an unfiltered view of all the permissions, including those inherited.
To secure your lake, follow these steps:
Go to Dataplex in the Google Cloud console.
Navigate to the Manage view.
Click the name of the lake you created.
Click the Permissions tab.
Click the Roles tab.
Click Add to add a new role. Add the Dataplex Data Reader, Data Writer, and Data Owner roles.
Verify that the Dataplex Data Reader, Data Writer, and Data Owner roles appear.
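Adding roles in the console amounts to adding role bindings to the lake's IAM policy. The sketch below shows that read-modify-write pattern on a plain policy dictionary in the standard IAM shape (`bindings`, each with a `role` and `members`). The `add_binding` helper and the principal are hypothetical, and the code only edits an in-memory dictionary; it does not call any Google Cloud API.

```python
import copy
from typing import Any, Dict

Policy = Dict[str, Any]


def add_binding(policy: Policy, role: str, member: str) -> Policy:
    """Return a copy of `policy` with `member` added to `role` (idempotent)."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    # No binding exists for this role yet, so create one.
    bindings.append({"role": role, "members": [member]})
    return updated


policy: Policy = {"bindings": []}
member = "user:analyst@example.com"  # placeholder principal
for role in (
    "roles/dataplex.dataReader",
    "roles/dataplex.dataWriter",
    "roles/dataplex.dataOwner",
):
    policy = add_binding(policy, role, member)

# Equivalent of the final "verify" step: list the roles now present.
granted_roles = sorted(b["role"] for b in policy["bindings"])
```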
Using the Secure view
The Dataplex Secure view in the Google Cloud console provides the following:
- A simple, filterable view of only the Dataplex roles that are centered on a specific resource.
- Separation of data roles from lake resource roles.
After you specify your security policy, Dataplex propagates the permissions to the IAM policies of the managed resources.
The security policy configured at the lake level is propagated to all the resources managed within that lake. Dataplex provides propagation status and visibility into these large-scale propagations on the Dataplex Manage > Permissions tab. Dataplex also continuously monitors the managed resources for any changes made to their IAM policies outside of Dataplex.
Users that already have permissions on a resource continue to have them after a resource gets attached to a Dataplex lake. Similarly, non-Dataplex role bindings that are created or updated after attaching the resource to Dataplex stay the same.
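One way to reason about monitoring and about the coexistence of bindings is to compare the bindings Dataplex is expected to have propagated with the policy actually found on a resource. The following sketch is an assumption about how such a check could look, not a description of Dataplex internals: `binding_pairs` and `missing_pairs` are hypothetical helpers. Extra bindings that exist on the resource are deliberately ignored, matching the statement that non-Dataplex bindings stay the same.

```python
from typing import Any, Dict, List, Set, Tuple

Policy = Dict[str, Any]


def binding_pairs(policy: Policy) -> Set[Tuple[str, str]]:
    """Flatten a policy into a set of (role, member) pairs."""
    return {
        (binding["role"], member)
        for binding in policy.get("bindings", [])
        for member in binding.get("members", [])
    }


def missing_pairs(expected: Policy, actual: Policy) -> List[Tuple[str, str]]:
    """Expected (role, member) pairs that are absent from the actual policy."""
    return sorted(binding_pairs(expected) - binding_pairs(actual))


# Binding that propagation is expected to maintain (placeholder values).
expected = {
    "bindings": [
        {"role": "roles/storage.objectViewer", "members": ["user:analyst@example.com"]}
    ]
}

# Policy found on the bucket: a pre-existing, non-Dataplex binding only.
actual = {
    "bindings": [
        {"role": "roles/storage.admin", "members": ["user:owner@example.com"]}
    ]
}

drift = missing_pairs(expected, actual)
```

In this example `drift` reports the one propagated binding that is missing, and the bucket owner's pre-existing binding is neither flagged nor removed.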
Set column-level, row-level, and table-level policies
Cloud Storage bucket assets have associated BigQuery external tables attached to them.
Metadata primarily refers to schema information associated with user data present in resources managed by a lake.
Dataplex Discovery examines the data in managed resources and extracts tabular schema information. These tables are published to BigQuery, Dataproc Metastore, and Data Catalog systems.
Each discovered table has an associated table registered in BigQuery. For each zone, there is an associated BigQuery dataset under which all the external tables associated with tables discovered in that data zone are registered.
The discovered Cloud Storage-hosted tables are registered under the dataset created for the zone.
Databases and tables are made available in the Dataproc metastore associated with the Dataplex lake instance. Each data zone has an associated database, and each asset can have one or more associated tables.
The data in a Dataproc Metastore service is secured by configuring your VPC-SC network. The Dataproc Metastore instance is provided to Dataplex during lake creation, which already makes it a user-managed resource.
Each discovered table has an associated entry in Data Catalog, to enable search and discovery.
Since Data Catalog requires IAM policy names during entry creation, Dataplex provides the IAM policy name of the Dataplex asset resource that the entry should be associated with. As a result, the permissions on the Dataplex entry are driven by the permissions on the asset resource. You'll have to grant the Dataplex Metadata Reader role and the Dataplex Metadata Writer role on the asset resource.