Add a zone

This page introduces zones and explains how to add zones to your Dataplex lake.

Dataplex zone concepts

Data zones are named entities within a Dataplex lake. They are logical groupings of unstructured, semi-structured, and structured data, consisting of multiple assets, such as Cloud Storage buckets, BigQuery datasets, and BigQuery tables.

A lake can include one or more zones. While a zone can only be part of one lake, it may contain assets that point to resources that are part of projects outside of its parent project.

When you configure a zone in Dataplex, you select its type. There are two types of zones: raw and curated.

Raw zones

Raw zones store structured data, semi-structured data such as CSV files and JSON files, and unstructured data in any format from external sources. This is useful for staging raw data before performing any transformations. Data can be stored in Cloud Storage buckets or BigQuery datasets.

Raw zones support bucket-level or dataset-level granularity for read and write permissions. For more information, see IAM and access control.

There are no restrictions on the type of data that can be stored in raw zones.

Curated zones

Curated zones store structured data. Data can be stored in Cloud Storage buckets or BigQuery datasets.

Supported formats for Cloud Storage buckets include Parquet, Avro, and ORC. Curated zones are useful for staging data that requires processing before it's used for analysis, or for serving data that is ready for analysis.

For BigQuery tables, you must have a well-defined schema and Hive-style partitions. When you provide a schema for a given table in a curated zone, the data should conform to the schema defined for the table without schema drift.

This means that the data should be compatible with the schema defined for the table, and new partitions shouldn't have a schema that conflicts with the table schema.
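
The compatibility requirement above can be illustrated with a minimal sketch. It assumes a hypothetical representation in which a schema is a plain mapping from column name to type name, and it treats only a changed type on a shared column as a conflict. The rules that Dataplex actually applies might differ, so read this as an illustration of schema drift rather than as the Dataplex check.

```python
def find_schema_conflicts(table_schema, partition_schema):
    """Return descriptions of columns whose type differs from the table schema.

    Both arguments map column name -> type name. This representation is
    hypothetical and is used only for illustration.
    """
    conflicts = []
    for column, partition_type in partition_schema.items():
        table_type = table_schema.get(column)
        # Assumption: only a changed type on a shared column counts as a conflict.
        if table_type is not None and table_type != partition_type:
            conflicts.append(
                f"{column}: table has {table_type}, partition has {partition_type}"
            )
    return conflicts


# Hypothetical table schema and two hypothetical partition schemas.
table = {"order_id": "INT64", "amount": "FLOAT64", "created": "TIMESTAMP"}
compatible_partition = {"order_id": "INT64", "amount": "FLOAT64"}
drifted_partition = {"order_id": "STRING", "amount": "FLOAT64"}

print(find_schema_conflicts(table, compatible_partition))
print(find_schema_conflicts(table, drifted_partition))
```

The first call reports no conflicts. The second reports order_id, because its type changed from INT64 to STRING.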

Curated zones support Cloud Storage bucket-level or BigQuery dataset-level granularity for read and write permissions. For more information, see Access control with IAM.

Before you begin

Before you can add zones to a lake, you must have a lake. If you haven't already, create a lake.

Most gcloud lake commands require a location. You can specify the location by setting the --location parameter.

Access control

  • To add a zone, you must be granted an IAM role that contains the dataplex.zones.create IAM permission. The Dataplex-specific role roles/dataplex.admin includes this permission.

For more information, see Dataplex Access control with IAM.

Add a zone

You can create a new zone in an existing lake by calling the Dataplex API method lakes.zones.create or by using the Google Cloud console.

A lake can contain multiple zones, but you add them one at a time. You can continue to use your lake while a zone is being created.

Console

  1. In the Google Cloud console, go to the Dataplex page.

  2. Navigate to the Manage view.

  3. In the Manage view, click the name of the lake you'd like to add a zone to.

  4. In the Zones tab, click Add zone.

  5. Enter a Display name for your zone.

  6. Click the Type drop-down. Choose Raw Zone or Curated Zone. Learn more about supported zone types.

  7. Optional: Enter a description.

  8. Under Data locations, select either Regional or Multi-regional. You cannot change this selection after the zone is created. A zone cannot mix single-region and multi-region data.

  9. Optional: Enable metadata discovery, which allows Dataplex to automatically scan and extract metadata from the data in your zone:

    1. Click Discovery settings.

    2. Make sure Enable metadata discovery is selected.

    3. Optional: Under Include patterns, list the files to include in the discovery scans.

    4. Optional: Under Exclude patterns, list the files to exclude from the discovery scans. If you enter both include and exclude patterns, exclude patterns are applied first.

    5. Click the Repeats drop-down and select a frequency.

    6. Click the Timezone drop-down and select a timezone.

    7. If under Repeats you selected Custom, under Schedule, enter a job schedule. Otherwise, the Schedule value is automatically filled for you.

  10. Click Create.

It may take a few minutes for the zone to be created.
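
The interaction between include and exclude patterns in the discovery settings can be sketched as follows. This is a minimal illustration, not the Dataplex implementation: it uses Python's fnmatch glob matching as a stand-in for the pattern syntax that Dataplex accepts, and it assumes that an empty include list means "include everything". The function name and the file paths are hypothetical.

```python
from fnmatch import fnmatchcase


def select_for_discovery(paths, include_patterns=(), exclude_patterns=()):
    """Return the paths that a scan would consider under these patterns."""
    selected = []
    for path in paths:
        # Exclude patterns are applied first, so they override include patterns.
        if any(fnmatchcase(path, pattern) for pattern in exclude_patterns):
            continue
        # Assumption: with no include patterns, every remaining path is kept.
        if include_patterns and not any(
            fnmatchcase(path, pattern) for pattern in include_patterns
        ):
            continue
        selected.append(path)
    return selected


# Hypothetical object paths in a bucket attached to the zone.
paths = [
    "sales/2024/data.parquet",
    "sales/tmp/data.parquet",
    "logs/app.json",
]

print(select_for_discovery(paths, ["sales/*"], ["*/tmp/*"]))
```

In this example, sales/tmp/data.parquet matches both an include and an exclude pattern and is skipped, because the exclude check runs first.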

REST

Follow the API instructions to add a zone by using the APIs Explorer.
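
As a rough sketch of what a lakes.zones.create call looks like, the following Python builds the request URL and JSON body without sending anything. The field names (type, resourceSpec.locationType, displayName) and the zoneId query parameter reflect the Dataplex v1 REST reference as understood here, so verify them against the API reference before use. The project, location, lake, and zone names are hypothetical, and authentication is omitted.

```python
import json
from urllib.parse import urlencode

ZONE_TYPES = ("RAW", "CURATED")
LOCATION_TYPES = ("SINGLE_REGION", "MULTI_REGION")


def build_zone_create_request(project, location, lake, zone_id,
                              zone_type, location_type, display_name=""):
    """Return (url, json_body) for a zone creation request. Nothing is sent."""
    if zone_type not in ZONE_TYPES:
        raise ValueError(f"zone_type must be one of {ZONE_TYPES}")
    if location_type not in LOCATION_TYPES:
        raise ValueError(f"location_type must be one of {LOCATION_TYPES}")

    # The zone is created under its parent lake.
    parent = f"projects/{project}/locations/{location}/lakes/{lake}"
    query = urlencode({"zoneId": zone_id})
    url = f"https://dataplex.googleapis.com/v1/{parent}/zones?{query}"

    body = {
        "type": zone_type,
        "resourceSpec": {"locationType": location_type},
    }
    if display_name:
        body["displayName"] = display_name
    return url, json.dumps(body)


# Hypothetical identifiers, for illustration only.
url, body = build_zone_create_request(
    "my-project", "us-central1", "my-lake", "raw-zone",
    zone_type="RAW", location_type="SINGLE_REGION",
    display_name="Raw zone",
)
print(url)
print(body)
```

To issue the request, send the body as an authenticated POST to the URL, for example with an OAuth 2.0 access token in the Authorization header.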

When zone creation succeeds, the zone automatically enters the active state. If zone creation fails, the lake is rolled back to its previous state.

After you create your zone, you can map data stored in Cloud Storage buckets and BigQuery datasets as assets in your zone.

What's next?