Secret Manager Credential Provider

This document describes how to use Secret Manager as a credential store with Dataproc Serverless to safely store and access sensitive data processed by serverless workloads.

Overview

Secret Manager can safeguard your sensitive data, such as API keys, passwords, and certificates. You can use it to manage, access, and audit your secrets across Google Cloud.

When you run a Dataproc Serverless batch workload, you can configure it to use a Secret Manager secret by using the Dataproc Secret Manager Credential Provider.

Availability

This feature is available for Dataproc Serverless for Spark runtime versions 1.2.29+, 2.2.29+, or later major runtime versions.

Terminology

The following table describes the terms used in this document.

Term         Description
Secret       A Secret Manager secret is a global project object that contains a collection of metadata and secret versions. You can store, manage, and access secrets as binary blobs or text strings.
Credential   In Hadoop and other Dataproc workloads, a credential consists of a credential name (ID) and credential value (password). A credential ID and value map to a Secret Manager secret ID and secret value (secret version).
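
To make the credential-to-secret mapping concrete, the following is a minimal sketch of storing a credential value as a secret version with the gcloud CLI. The function name create_credential_secret, the GCLOUD override variable, and the example project and secret IDs are assumptions made for illustration; they are not part of Dataproc or Secret Manager. Setting GCLOUD=echo previews the commands without running them.

```shell
# Hypothetical override hook: set GCLOUD=echo to print the commands instead of running them.
GCLOUD="${GCLOUD:-gcloud}"

# Hypothetical helper: create a secret whose ID is the credential name,
# then add the credential value (read from stdin) as a new secret version.
#   $1 = project ID, $2 = secret ID (credential name)
create_credential_secret() {
  "$GCLOUD" secrets create "$2" --project="$1" --replication-policy="automatic" &&
  "$GCLOUD" secrets versions add "$2" --project="$1" --data-file=-
}

# Example usage (assumed values; uncomment to run against a real project):
# printf '%s' "$CREDENTIAL_VALUE" | create_credential_secret "my-project" "my-credential"
```

The value is piped through stdin rather than passed as an argument so that it does not appear in the shell history or process list.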

Usage

You can configure supported Hadoop and other OSS components to work with Secret Manager by setting the following properties when you submit a Dataproc Serverless workload:

  • Provider path (required): The provider path property, hadoop.security.credential.provider.path, is a comma-separated list of one or more credential provider URIs that is traversed to resolve a credential.

    --properties=hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID
    
    • The scheme in the provider path indicates the credential provider type. Hadoop schemes include jceks://, user://, and localjceks://. Use the gsm:// scheme to search for credentials in Secret Manager.
  • Substitute dot operator (optional): Secret Manager doesn't support the dot (.) operator in secret names, but OSS component credential keys can contain this operator. When this property is set to true, you can replace dots (.) with hyphens (-) in credential names. For example, you can specify the credential name a.b.c as a-b-c when passing it to Secret Manager.

    --properties=hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true
    
  • Secret version (optional): Secrets in Secret Manager can have multiple versions (values). Use this property to access a specific secret version. By default, Secret Manager accesses the LATEST version, which resolves to the latest value of the secret at runtime. As a best practice, set this property to a fixed version for stable access in production environments. For information on creating a secret, see Create and access a secret using Secret Manager.

    --properties=hadoop.security.credstore.google-secret-manager.secret-version=1
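
The properties above can be combined into a single comma-separated --properties value. The following is a minimal shell sketch of that, together with the dot-to-hyphen naming described for the substitute dot operator property. The helper names to_secret_id and build_gsm_properties and the example values are assumptions made for illustration; only the property names come from this document.

```shell
# Hypothetical helper: mirror the dot-to-hyphen substitution, so you can see
# which secret ID a credential name such as a.b.c corresponds to.
to_secret_id() {
  printf '%s\n' "$1" | tr '.' '-'
}

# Hypothetical helper: join the three documented properties into one value
# suitable for a single --properties flag.
#   $1 = project ID, $2 = secret version
build_gsm_properties() {
  printf '%s,%s,%s\n' \
    "hadoop.security.credential.provider.path=gsm://projects/$1" \
    "hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true" \
    "hadoop.security.credstore.google-secret-manager.secret-version=$2"
}

to_secret_id "a.b.c"                   # prints a-b-c
build_gsm_properties "my-project" "1"  # prints the combined property string
```

Note that the batch submission example in this document prefixes these property names with spark.hive., so adjust the names to match the component you are configuring.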
    

Run a batch workload with Secret Manager Credential Provider

To submit a batch workload that uses the Secret Manager Credential Provider, run the following command locally or in Cloud Shell.

gcloud dataproc batches submit spark \
    --region=REGION \
    --jars=JARS \
    --class=MAIN_CLASS \
    --properties="spark.hive.hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID,spark.hive.hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true" \
    ...other flags as needed...

Replace the following: