This document describes how to use Secret Manager as a credential store with Dataproc Serverless to safely store and access sensitive data processed by serverless workloads.
Overview
Secret Manager can safeguard your sensitive data, such as API keys, passwords, and certificates. You can use it to manage, access, and audit your secrets across Google Cloud.
When you run a Dataproc Serverless batch workload, you can configure it to use a Secret Manager secret by using the Dataproc Secret Manager Credential Provider.
Availability
This feature is available for Dataproc Serverless for Spark runtime versions 1.2.29+, 2.2.29+, or later major runtime versions.
Terminology
The following table describes the terms used in this document.
Term | Description |
---|---|
Secret | A Secret Manager secret is a global project object that contains a collection of metadata and secret versions. You can store, manage, and access secrets as binary blobs or text strings. |
Credential | In Hadoop and other Dataproc workloads, a credential consists of a credential name (ID) and credential value (password). A credential ID and value map to a Secret Manager secret ID and secret value (secret version). |
Usage
You can configure supported Hadoop and other OSS components to work with Secret Manager by setting the following properties when you submit a Dataproc Serverless workload:
- Provider path (required): The provider path property, hadoop.security.credential.provider.path, is a comma-separated list of one or more credential provider URIs that is traversed to resolve a credential.

  --properties=hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID

  - The scheme in the provider path indicates the credential provider type. Hadoop schemes include jceks://, user://, and localjceks://. Use the gsm:// scheme to search for credentials in Secret Manager.

- Substitute dot operator (optional): Secret Manager doesn't support the dot (.) operator in secret names, but OSS component credential keys can contain this operator. When this property is set to true, you can replace dots (.) with hyphens (-) in credential names. For example, when this property is set to true, you can specify the credential name a.b.c as a-b-c when passing it to Secret Manager.

  --properties=hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true

- Secret version (optional): Secrets in Secret Manager can have multiple versions (values). Use this property to access a specific secret version. By default, Secret Manager accesses the LATEST version, which resolves to the latest value of the secret at runtime. A best practice is to define this property for stable access in production environments. For information on creating a secret, see Create and access a secret using Secret Manager.

  --properties=hadoop.security.credstore.google-secret-manager.secret-version=1
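To make the dot substitution concrete, the following is a minimal shell sketch that derives the secret ID to look for in Secret Manager from a Hadoop-style credential name. It assumes the substitution is a plain one-for-one replacement of each dot with a hyphen, as the a.b.c example suggests. The function name credential_to_secret_id and the second sample credential name are hypothetical and are not part of the credential provider.

```shell
# Illustrative helper (hypothetical name): map a credential name to the
# secret ID that would be used when substitute-dot-operator=true.
# Assumption: every "." becomes "-" and nothing else changes.
credential_to_secret_id() {
  printf '%s\n' "$1" | tr '.' '-'
}

# Example names: the first comes from this document, the second is made up.
credential_to_secret_id "a.b.c"
credential_to_secret_id "my.service.password"
```

A helper like this can be useful when naming secrets, so that the secret ID created in Secret Manager matches what the workload requests.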
Run a batch workload with Secret Manager Credential Provider
To submit a batch workload that uses the Secret Manager Credential Provider, run the following command locally or in Cloud Shell.
gcloud dataproc batches submit spark \
    --region=REGION \
    --jars=JARS \
    --class=MAIN_CLASS \
    --properties="spark.hive.hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID,spark.hive.hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true" \
    ...other flags as needed...
Replace the following:
- REGION: the Compute Engine region where your workload runs
- JARS: the path to the workload JAR file
- MAIN_CLASS: the main class of the JAR
- PROJECT_ID: your project ID, listed in the Project info section of the Google Cloud console dashboard
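The --properties value in the submit command is a single comma-separated string, which is easy to mistype. The sketch below assembles it from shell variables and prints the resulting command for review instead of running it. The variable values (my-project, us-central1, the bucket path, and com.example.Main) and the function name build_properties are placeholders chosen for illustration. Only the two properties shown in the submit command are included.

```shell
# Hypothetical placeholder values; replace them with your own.
PROJECT_ID="my-project"
REGION="us-central1"
JARS="gs://my-bucket/my-app.jar"
MAIN_CLASS="com.example.Main"

# Join the two Secret Manager properties from the submit command into one
# comma-separated --properties value for the given project ID.
build_properties() {
  prefix="spark.hive.hadoop.security"
  printf '%s,%s\n' \
    "${prefix}.credential.provider.path=gsm://projects/$1" \
    "${prefix}.credstore.google-secret-manager.secret-id.substitute-dot-operator=true"
}

PROPERTIES="$(build_properties "$PROJECT_ID")"

# Dry run: print the command rather than executing it.
echo gcloud dataproc batches submit spark \
  --region="$REGION" \
  --jars="$JARS" \
  --class="$MAIN_CLASS" \
  --properties="\"$PROPERTIES\""
```

Printing the command first gives you a chance to check the project ID and property names before you submit the workload.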