Create a lake

This guide shows you how to create a Dataplex lake by using the Google Cloud console, the gcloud CLI, or the lakes.create API method.

You can create your lake in any of the regions that support Dataplex.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Dataplex, Dataproc, Dataproc Metastore, Data Catalog, BigQuery, and Cloud Storage APIs. You can also enable them with the gcloud CLI, as shown below.

    Enable the APIs
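
If you prefer the command line, the same APIs can be enabled with gcloud. This is a minimal sketch; the service names below are the standard identifiers for these products, and PROJECT_ID is a placeholder for your project:

    # Enable the APIs required by this guide in the given project.
    gcloud services enable \
      dataplex.googleapis.com \
      dataproc.googleapis.com \
      metastore.googleapis.com \
      datacatalog.googleapis.com \
      bigquery.googleapis.com \
      storage.googleapis.com \
      --project PROJECT_ID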


Access control

  1. Make sure that you're granted the predefined role roles/dataplex.admin or roles/dataplex.editor so that you can create and manage your lake. Follow the steps in the IAM documentation for granting roles, or use the gcloud CLI as shown in the sketch after this list.

  2. To attach a Cloud Storage bucket from another project to your lake, grant the Dataplex service account an administrator role on the bucket by running the following command:

    gcloud alpha dataplex lakes authorize \
    --project PROJECT_ID_OF_LAKE \
    --storage-bucket-resource BUCKET_NAME
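
As a sketch of step 1, assuming you grant the role at the project level to a single user (PROJECT_ID and USER_EMAIL are placeholders), you can use the gcloud CLI instead of the console:

    # Grant the Dataplex admin role on the project to a user.
    gcloud projects add-iam-policy-binding PROJECT_ID \
      --member="user:USER_EMAIL" \
      --role="roles/dataplex.admin"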
    

Create a metastore

You can access Dataplex metadata using Hive Metastore in Spark queries by associating a Dataproc Metastore service instance with your Dataplex lake. You need to have a gRPC-enabled Dataproc Metastore (version 3.1.2 or higher) associated with the Dataplex lake.

  1. Create a Dataproc Metastore service. A minimal gcloud sketch is shown after this list.

  2. Configure the Dataproc Metastore service instance to expose a gRPC endpoint (instead of the default Thrift Metastore endpoint). Run the following update API request:

    curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://metastore.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/services/SERVICE_ID?updateMask=hiveMetastoreConfig.endpointProtocol" \
    -d '{"hiveMetastoreConfig": {"endpointProtocol": "GRPC"}}'
    
  3. View the gRPC endpoint. Run the following command:

    gcloud metastore services describe SERVICE_ID \
      --project PROJECT_ID \
      --location LOCATION \
      --format "value(endpointUri)"
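
For step 1, a minimal gcloud sketch for creating the service is shown below. It assumes the Developer tier and Hive Metastore version 3.1.2; the available flags can vary by gcloud release, so check gcloud metastore services create --help before running it:

    # Create a Dataproc Metastore service that the lake can use.
    gcloud metastore services create SERVICE_ID \
      --project PROJECT_ID \
      --location LOCATION \
      --tier developer \
      --hive-metastore-version 3.1.2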
    

Create a Dataplex lake

The following steps show you how to create a Dataplex lake.

Console

  1. Go to Dataplex in the Google Cloud console.

    Go to Dataplex

  2. Navigate to the Manage view.

  3. Click Create.

  4. Enter a Display name.

  5. The lake ID is automatically generated for you. If you prefer, you can provide your own ID. See Resource naming convention.

  6. Optional: Enter a Description.

  7. Specify the Region in which to create the lake.

    For a lake created in a given region (for example, us-central1), you can attach both single-region data (us-central1) and multi-region data (the us multi-region), depending on the zone settings.

  8. Optional: Add labels to your lake.

  9. Optional: In the Metastore section, click the Metastore service drop-down, and select the service you created in the Before you begin section.

  10. Click Create.

gcloud

Use the following gcloud alpha dataplex lakes create command to create a lake:

gcloud alpha dataplex lakes create LAKE \
 --location=LOCATION \
 --labels=k1=v1,k2=v2,k3=v3 \
 --metastore-service=METASTORE_SERVICE

Replace the following:

  • LAKE: The name of the new lake.
  • LOCATION: The Google Cloud region in which to create the lake.
  • k1=v1,k2=v2,k3=v3: The labels used (if any).
  • METASTORE_SERVICE: The Dataproc Metastore service, if one was created.
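
For example, to create a lake named my-lake in us-central1 with a single label (the names here are hypothetical, and --metastore-service is assumed to take the full resource name of the Dataproc Metastore service):

gcloud alpha dataplex lakes create my-lake \
 --location=us-central1 \
 --labels=env=dev \
 --metastore-service=projects/PROJECT_ID/locations/us-central1/services/SERVICE_ID

If your gcloud release includes it, gcloud alpha dataplex lakes describe my-lake --location=us-central1 shows the lake's status after creation.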

REST

Follow the API instructions to create a lake by using the APIs Explorer.
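
As a rough sketch of the underlying lakes.create request (this assumes the v1 REST surface; check the API reference for the exact version and fields your project should use):

curl -X POST \
 -H "Authorization: Bearer $(gcloud auth print-access-token)" \
 -H "Content-Type: application/json" \
 "https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/lakes?lakeId=LAKE_ID" \
 -d '{
       "displayName": "My lake",
       "metastore": {
         "service": "projects/PROJECT_ID/locations/LOCATION/services/SERVICE_ID"
       }
     }'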

What's next?