Add a zone

This document describes what Dataplex zones are and how to add them to your Dataplex lake.

Overview

Dataplex zones are named entities within a Dataplex lake. They are logical groupings of unstructured, semi-structured, and structured data, consisting of multiple assets, such as Cloud Storage buckets, BigQuery datasets, and BigQuery tables.

A lake can include one or more zones. While a zone can only be part of one lake, it might contain assets that point to resources that are part of projects outside of its parent project.

When you add a zone, you choose its configuration, including its type. Dataplex supports two zone types: raw and curated.

Raw zones

Raw zones store structured data, semi-structured data such as CSV files and JSON files, and unstructured data in any format from external sources. Raw zones are useful for staging raw data before performing any transformations. Data can be stored in Cloud Storage buckets or BigQuery datasets.

Raw zones support bucket-level or dataset-level granularity for read and write permissions. There are no restrictions on the type of data that can be stored in raw zones.

Curated zones

Curated zones store structured data. Data can be stored in Cloud Storage buckets or BigQuery datasets.

Supported formats for Cloud Storage buckets include Parquet, Avro, and ORC. Curated zones are useful for staging data that requires processing before being used for analysis, or for serving data that is ready for analysis.

For BigQuery tables, you must have a well-defined schema and Hive-style partitions. When you provide a schema for a given table in a curated zone, the data should conform to the schema defined for the table without schema drift. This means that the data should be compatible with the schema defined for the table, and new partitions shouldn't have a schema that conflicts with the table schema.
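
As a rough illustration of the no-conflict rule described above, the following Python sketch compares the schema of a new partition with the table schema. The dictionary representation (column name mapped to a type string) and the function name find_schema_conflicts are assumptions made for this example and are not part of Dataplex. In this sketch, a conflict is a column that appears in both schemas with different types.

```python
def find_schema_conflicts(table_schema, partition_schema):
    """Return columns whose type in the partition differs from the table.

    Both arguments are plain dicts mapping column name -> type string.
    This representation is hypothetical and only illustrates the idea.
    """
    conflicts = {}
    for column, partition_type in partition_schema.items():
        table_type = table_schema.get(column)
        # A column that is missing from the table is not treated as a
        # conflict in this simplified model.
        if table_type is not None and table_type != partition_type:
            conflicts[column] = (table_type, partition_type)
    return conflicts


# Sample schemas, invented for this example.
table = {"order_id": "INT64", "amount": "FLOAT64", "created": "TIMESTAMP"}
new_partition = {"order_id": "STRING", "amount": "FLOAT64"}
print(find_schema_conflicts(table, new_partition))
```

An empty result means that, under this simplified definition, the partition is compatible with the table schema.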

Curated zones support Cloud Storage bucket-level or BigQuery dataset-level granularity for read and write permissions.

Before you begin

Before you can add zones to a lake, you must have a lake. If you haven't already, create a lake.

Most gcloud dataplex lakes commands require a location. You can specify the location with the --location flag.
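
To confirm that the lake exists before you add a zone, you can describe it with the gcloud CLI. The Python sketch below only assembles the command line for gcloud dataplex lakes describe and prints it. The helper name and the lake ID and location values are placeholders chosen for this example, so check the flags against the gcloud reference for your installed version.

```python
import shlex


def lake_describe_command(lake_id, location, project=None):
    """Build the argv for `gcloud dataplex lakes describe`."""
    argv = [
        "gcloud", "dataplex", "lakes", "describe", lake_id,
        f"--location={location}",
    ]
    # The project is optional; gcloud falls back to the configured default.
    if project:
        argv.append(f"--project={project}")
    return argv


# Placeholder values; replace them with your own lake ID and location.
command = lake_describe_command("my-lake", "us-central1")
print(shlex.join(command))
```

To run the command from Python instead of printing it, you could pass the list to subprocess.run(command, check=True).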

Required roles

To get the permission that you need to add a zone, ask your administrator to grant you the Dataplex Administrator (roles/dataplex.admin) IAM role on the project. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the dataplex.zones.create permission, which is required to add a zone.

You might also be able to get this permission with custom roles or other predefined roles.

Add a zone

A lake can contain multiple zones, but you add them one at a time. You can continue to use your lake while a zone is being created.

To add a zone to an existing lake, follow these steps:

Console

In the Google Cloud console, go to Dataplex.

Navigate to the Manage view.
In the Manage view, click the name of the lake you'd like to add a zone to.
In the Zones tab, click Add zone.
Enter a Display name for your zone.

Note: The zone ID is automatically generated for you. You can also provide your own ID. Choose a meaningful ID, because it's used in creating dataset and database names.
Click the Type menu. Choose Raw Zone or Curated Zone. Learn more about supported zone types.
Optional: Enter a description.
Under Data locations, select either Regional or Multi-regional. You can't change this setting later, and you can't mix single-region and multi-region data in the same zone.
Optional: Enable metadata discovery, which lets Dataplex automatically scan and extract metadata from the data in your zone:
1. Click Discovery settings.
2. Make sure Enable metadata discovery is selected.
3. Optional: Under Include patterns, list the files to include in the discovery scans.
4. Optional: Under Exclude patterns, list the files to exclude from the discovery scans. If you enter both include and exclude patterns, exclude patterns are applied first.
5. Click the Repeats menu and select a frequency. If you select Custom, in the Schedule field, enter a job schedule. Otherwise, the Schedule value is automatically filled for you.
6. Click the Timezone menu and select a timezone.
Click Create.
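
To make the ordering of include and exclude patterns in the discovery settings concrete, here is a small Python sketch that filters object paths and applies exclude patterns first. It uses fnmatchcase from the standard library as a stand-in for glob matching. The exact pattern semantics in Dataplex might differ, and the function name and sample paths are invented for this example.

```python
from fnmatch import fnmatchcase


def select_for_discovery(paths, include_patterns=(), exclude_patterns=()):
    """Return the paths that a scan would consider in this simplified model."""
    selected = []
    for path in paths:
        # Exclude patterns are applied first: an excluded path is dropped
        # even if it also matches an include pattern.
        if any(fnmatchcase(path, pattern) for pattern in exclude_patterns):
            continue
        # With no include patterns, every remaining path is kept.
        if include_patterns and not any(
            fnmatchcase(path, pattern) for pattern in include_patterns
        ):
            continue
        selected.append(path)
    return selected


# Sample object paths, invented for this example.
paths = ["sales/2024/data.parquet", "sales/tmp/data.parquet", "logs/app.json"]
print(select_for_discovery(paths, ["sales/*"], ["*/tmp/*"]))
```

In this model, sales/tmp/data.parquet is dropped even though it matches the include pattern, because the exclude pattern is checked first.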

REST

To add a zone, use the lakes.zones.create method.
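
The following Python sketch shows roughly what a lakes.zones.create request looks like. It builds the request URL and JSON body without sending anything. The project, location, lake, and zone IDs are placeholders, the helper name is made up for this example, and only a few fields of the zone resource are shown. Check the REST reference for the full schema before you rely on it.

```python
import json
from urllib.parse import quote, urlencode

API_ROOT = "https://dataplex.googleapis.com/v1"


def build_zone_create_request(project, location, lake, zone_id, zone_type,
                              location_type, display_name=None,
                              discovery_schedule=None):
    """Return (url, body) for a lakes.zones.create call without sending it."""
    if zone_type not in ("RAW", "CURATED"):
        raise ValueError("zone_type must be RAW or CURATED")
    if location_type not in ("SINGLE_REGION", "MULTI_REGION"):
        raise ValueError("location_type must be SINGLE_REGION or MULTI_REGION")

    parent = f"projects/{project}/locations/{location}/lakes/{lake}"
    # The zone ID is passed as a query parameter, not in the body.
    url = f"{API_ROOT}/{quote(parent)}/zones?{urlencode({'zoneId': zone_id})}"

    body = {
        "type": zone_type,
        "resourceSpec": {"locationType": location_type},
    }
    if display_name:
        body["displayName"] = display_name
    if discovery_schedule:
        # Enable metadata discovery on a cron schedule.
        body["discoverySpec"] = {"enabled": True, "schedule": discovery_schedule}
    return url, body


# Placeholder values; replace them with your own IDs.
url, body = build_zone_create_request(
    "my-project", "us-central1", "my-lake", "raw-zone",
    zone_type="RAW", location_type="SINGLE_REGION",
    display_name="Raw zone", discovery_schedule="0 * * * *",
)
print("POST", url)
print(json.dumps(body, indent=2))
```

To send the request, you would POST the body as JSON with an OAuth 2.0 bearer token in the Authorization header, for example a token obtained from gcloud auth print-access-token.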

It might take a few minutes for the zone to be created.

When zone creation succeeds, the zone automatically enters the active state. If creation fails, the lake is rolled back to its previous state.
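
Because creation can take a few minutes, a script typically waits for the zone to become active before it continues. The sketch below is one simplified way to do that in Python. It assumes a caller-supplied function, here called get_zone_state, that returns the zone's current state string, for example by calling lakes.zones.get. That function, the timeout values, and the fake state sequence in the demonstration are all assumptions for illustration. Polling the long-running operation that the create call returns is an alternative.

```python
import time


def wait_until_active(get_zone_state, timeout_s=600, interval_s=10,
                      sleep=time.sleep):
    """Poll until the zone reports ACTIVE, or raise TimeoutError.

    get_zone_state is a hypothetical callable that returns the zone's
    current state string, such as "CREATING" or "ACTIVE".
    """
    waited = 0
    while True:
        state = get_zone_state()
        if state == "ACTIVE":
            return state
        if waited >= timeout_s:
            raise TimeoutError(f"zone still in state {state!r} after {waited}s")
        sleep(interval_s)
        waited += interval_s


# Demonstration with a fake state sequence and no real waiting.
fake_states = iter(["CREATING", "CREATING", "ACTIVE"])
print(wait_until_active(lambda: next(fake_states), sleep=lambda seconds: None))
```

The sleep parameter is injectable so that the loop can be exercised without waiting.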

After you create your zone, you can map data stored in Cloud Storage buckets and BigQuery datasets as assets to your zone. For more information, see Add an asset.

What's next

Learn how to manage buckets.
Learn how to create a lake.
Learn more about Cloud Audit Logs.