Secure your lake

The Dataplex security model allows you to manage who has access to perform the following tasks:

  • Administering a lake (creating and attaching assets, zones, and additional lakes)
  • Accessing data connected to a lake through the mapping asset (Google Cloud resources such as Cloud Storage buckets and BigQuery datasets)
  • Accessing metadata about the data connected to a lake

An administrator for a lake controls access to Dataplex resources (lake, zone, and assets) by granting the following basic and predefined roles.

Basic roles

Role | Description
Dataplex Viewer (roles/dataplex.viewer) | Ability to view (but not edit) the lake and its configured zones and assets.
Dataplex Editor (roles/dataplex.editor) | Ability to edit the lake. Can create and configure lakes, zones, assets, and tasks.
Dataplex Administrator (roles/dataplex.admin) | Ability to fully administer a lake.
Dataplex Developer (roles/dataplex.developer) | Ability to run data analytics workloads on a lake. *
* To query a BigQuery table, you need the permission to run a BigQuery job. Set this permission in the project you want attributed or charged for the compute spend of the job. For more information, see BigQuery predefined roles and permissions.
To run a Spark job, create Dataproc clusters and submit Dataproc jobs in the project to which you want the compute attributed.

Predefined roles

Google Cloud manages the following roles, which provide granular access for Dataplex.

Metadata roles

Metadata roles grant the ability to view or update metadata, such as table schemas.

Role | Description
Dataplex Metadata Writer (roles/dataplex.metadataWriter) | Ability to update the metadata of a certain resource.
Dataplex Metadata Reader (roles/dataplex.metadataReader) | Ability to read the metadata (for example, to query a table).

Data roles

Granting data roles to a principal gives them the ability to read or write data in the underlying resources pointed to by the assets of the lake.

Dataplex maps its roles to the data roles for each underlying storage resource (Cloud Storage, BigQuery).

Dataplex translates and propagates Dataplex data roles to the underlying storage resource, setting the correct roles for each storage resource. The benefit is that you can grant a single Dataplex data role at one level of the lake hierarchy (for example, on a lake), and Dataplex maintains the specified access to data on all resources connected to that lake (for example, the Cloud Storage buckets and BigQuery datasets that the assets in the underlying zones refer to).

For example, granting a principal the dataplex.dataWriter role for a lake gives the principal write access to all data within the lake and its underlying zones and assets. Data access roles granted at a lower level (for example, a zone) are inherited down the lake hierarchy by the underlying assets.

Role | Description
Dataplex Data Reader (roles/dataplex.dataReader) | Ability to read data from storage attached to assets, including storage buckets and BigQuery datasets (and their contents). *
Dataplex Data Writer (roles/dataplex.dataWriter) | Ability to write to the underlying resources pointed to by the asset. *
Dataplex Data Owner (roles/dataplex.dataOwner) | Grants the Owner role to the underlying resources, including the ability to manage child resources. For example, as the Data Owner of a BigQuery dataset, you can manage the underlying tables.
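
The inheritance of data roles through the lake hierarchy can be illustrated with a small model. This is only a sketch of the behavior described above, not the Dataplex API: the Resource class, the resource names, and the principals are all hypothetical, and the model ignores the translation to Cloud Storage and BigQuery roles.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Resource:
    """A node in the lake hierarchy: a lake, a zone, or an asset (hypothetical model)."""

    name: str
    parent: Resource | None = None
    # Roles granted directly on this resource, keyed by principal.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, principal: str, role: str) -> None:
        """Record a role granted directly on this resource."""
        self.grants.setdefault(principal, set()).add(role)

    def effective_roles(self, principal: str) -> set[str]:
        """Return roles granted here plus roles inherited from every ancestor."""
        roles = set(self.grants.get(principal, set()))
        if self.parent is not None:
            roles |= self.parent.effective_roles(principal)
        return roles


# Hypothetical hierarchy: one lake, two zones, one asset in each zone.
lake = Resource("sales-lake")
raw_zone = Resource("raw-zone", parent=lake)
curated_zone = Resource("curated-zone", parent=lake)
gcs_asset = Resource("gcs-data", parent=raw_zone)
bq_asset = Resource("bq-data", parent=curated_zone)

# A role granted on the lake reaches every asset in the lake.
lake.grant("user:writer@example.com", "roles/dataplex.dataWriter")
# A role granted on a zone reaches only the assets in that zone.
raw_zone.grant("user:reader@example.com", "roles/dataplex.dataReader")

print(sorted(gcs_asset.effective_roles("user:writer@example.com")))
print(sorted(gcs_asset.effective_roles("user:reader@example.com")))
print(sorted(bq_asset.effective_roles("user:reader@example.com")))
```

In this model, the writer's lake-level grant applies to both assets, while the reader's zone-level grant applies to the asset in the raw zone only.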

Secure your lake

You can secure and manage access to your lake and the data attached to it. In the Google Cloud console, use either of the following views:

  • The Dataplex Manage view, under the Permissions tab, or
  • The Dataplex Secure view

Using the Manage view

The Permissions tab allows you to manage all the permissions on a lake resource, and presents an unfiltered view of all the permissions, including those inherited.

To secure your lake, follow these steps:

  1. Go to Dataplex in the Google Cloud console.

  2. Navigate to the Manage view.

  3. Click the name of the lake you created.

  4. Click the Permissions tab.

  5. Click the View by Roles tab.

  6. Click Add to add a new role. Add the Dataplex Data Reader, Data Writer, and Data Owner roles.

  7. Verify that the Dataplex Data Reader, Data Writer, and Data Owner roles appear.
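
If you prefer to script these grants instead of using the console, the following sketch assembles one gcloud command per data role. It assumes that the gcloud dataplex lakes add-iam-policy-binding command is available in your gcloud installation with the flags shown (check its --help output before relying on them). The lake name, location, and principal are placeholders, and the sketch only prints the commands without running them.

```python
import shlex

# The three Dataplex data roles added in step 6.
DATA_ROLES = (
    "roles/dataplex.dataReader",
    "roles/dataplex.dataWriter",
    "roles/dataplex.dataOwner",
)


def grant_commands(lake, location, member, roles=DATA_ROLES):
    """Build one gcloud argument list per role; nothing is executed here."""
    return [
        [
            "gcloud", "dataplex", "lakes", "add-iam-policy-binding", lake,
            f"--location={location}",
            f"--member={member}",
            f"--role={role}",
        ]
        for role in roles
    ]


# Placeholder values: replace with your own lake, region, and principal.
commands = grant_commands("my-lake", "us-central1", "user:analyst@example.com")
for argv in commands:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Keeping each command as an argument list means the same values can later be passed to subprocess.run without shell quoting issues.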

Using the Secure view

The Dataplex Secure view in the Google Cloud console provides the following:

  • A simple, filterable view of only the Dataplex roles that are centered on a specific resource.
  • A separation of data roles from lake resource roles.

Figure 1: In this example of a lake, both principals have data permissions on the asset called Cloud Storage data (GCS data). These permissions aren't inherited from higher lake resources.


Figure 2: This example shows:
  1. A service account that inherits the Dataplex Administrator role from the project.
  2. Principals (email addresses) that inherit Dataplex Editor and Viewer roles from the project. These are the roles that apply to all resources.
  3. A principal (email address) that inherits the Dataplex Administrator role from the project.

Policy management

After you specify your security policy, Dataplex propagates the permissions to the IAM policies of the managed resources.

The security policy configured at the lake level is propagated to all the resources managed within that lake. Dataplex reports propagation status and provides visibility into these large-scale propagations on the Dataplex Manage > Permissions tab. Dataplex also continuously monitors the managed resources for any changes made to their IAM policies outside of Dataplex.

Users who already have permissions on a resource keep them after the resource is attached to a Dataplex lake. Similarly, non-Dataplex role bindings that are created or updated after the resource is attached to Dataplex stay the same.
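
The additive behavior described above can be sketched on a policy in the standard IAM JSON shape (a list of bindings, each with a role and its members). This is an illustration only, not how Dataplex is implemented: the function name is hypothetical, the roles/storage.objectViewer pair is a placeholder for whatever role Dataplex actually sets on the bucket, and conditional bindings are ignored for simplicity.

```python
import copy


def apply_propagated_bindings(policy, propagated):
    """Return a copy of `policy` with the (role, member) pairs added.

    Existing bindings are never removed or rewritten, which mirrors the
    statement that non-Dataplex role bindings stay the same.
    Simplification: bindings with IAM conditions are not handled.
    """
    result = copy.deepcopy(policy)
    bindings = result.setdefault("bindings", [])
    by_role = {binding["role"]: binding for binding in bindings}
    for role, member in propagated:
        binding = by_role.get(role)
        if binding is None:
            binding = {"role": role, "members": []}
            bindings.append(binding)
            by_role[role] = binding
        if member not in binding["members"]:
            binding["members"].append(member)
    return result


# A bucket policy that existed before the bucket was attached to a lake.
existing_policy = {
    "bindings": [
        {"role": "roles/storage.objectAdmin", "members": ["user:etl@example.com"]},
    ]
}

# Placeholder for a binding propagated from a lake-level data role.
propagated = [("roles/storage.objectViewer", "user:analyst@example.com")]

merged_policy = apply_propagated_bindings(existing_policy, propagated)
print(merged_policy)
```

The pre-existing binding for the ETL user is still present in the merged policy, and the input policy is left unmodified.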

Set column-level, row-level, and table-level policies

Cloud Storage bucket assets have associated BigQuery external tables attached to them.

You can upgrade a Cloud Storage bucket asset, which means that Dataplex removes the attached external tables and attaches BigLake tables instead.

BigLake tables give you finer-grained access control than external tables, including row-level controls, column-level controls, and column data masking.

Metadata security

Metadata primarily refers to schema information associated with user data present in resources managed by a lake.

Dataplex Discovery examines the data in managed resources and extracts tabular schema information. These tables are published to BigQuery, Dataproc Metastore, and Data Catalog systems.

BigQuery

Each discovered table has an associated table registered in BigQuery. For each zone, there is an associated BigQuery dataset under which all the external tables associated with tables discovered in that data zone are registered.

The discovered Cloud Storage-hosted tables are registered under the dataset created for the zone.

Dataproc Metastore

Databases and tables are made available in the Dataproc Metastore associated with the Dataplex lake instance. Each data zone has an associated database, and each asset can have one or more associated tables.

The data in a Dataproc Metastore service is secured by configuring your VPC-SC network. The Dataproc Metastore instance is provided to Dataplex during lake creation, which already makes it a user-managed resource.

Data Catalog

Each discovered table has an associated entry in Data Catalog, to enable search and discovery.

Because Data Catalog requires IAM policy names during entry creation, Dataplex provides the IAM policy name of the Dataplex asset resource that the entry should be associated with. As a result, the permissions on the entry that Dataplex creates are driven by the permissions on the asset resource. Grant the Dataplex Metadata Reader role (roles/dataplex.metadataReader) and the Dataplex Metadata Writer role (roles/dataplex.metadataWriter) on the asset resource.
