Overview
The Secret Manager can safeguard your sensitive data, such as your API keys, passwords, and certificates. You can use it to manage, access, and audit your secrets across Google Cloud.
You can configure a Dataproc cluster or job to use a Secret Manager secret by using the Secret Manager Credential Provider.
Availability
This feature is available for use with Dataproc clusters created with image versions 2.0.97+, 2.1.41+, 2.2.6+, or later major Dataproc image versions.
Terminology
The following table describes terms used in this document.
Term | Description |
---|---|
Secret | A Secret Manager secret is a global project object that contains a collection of metadata and secret versions. You can store, manage, and access secrets as binary blobs or text strings. |
Credential | In Hadoop and other Dataproc-hosted applications, a credential consists of a credential name (ID) and credential value (password). A credential ID and value map to a Secret Manager secret ID and secret value (secret version). |
Usage
You can configure Hadoop and other OSS components to work with the Secret Manager by setting the following properties when you create a Dataproc cluster or submit a job.
Provider path (required): The provider path property, hadoop.security.credential.provider.path, is a comma-separated list of one or more credential provider URIs that is traversed to resolve a credential.
--properties=hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID
- The scheme in the provider path indicates the credential provider type. Hadoop schemes include jceks://, user://, and localjceks://. Use the gsm:// scheme to search for credentials in Secret Manager.
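To make the role of the scheme concrete, the following shell sketch splits a comma-separated provider path and prints the scheme of each URI. It only illustrates the URI format and does not reflect how Hadoop resolves providers internally. The helper name list_provider_schemes and the second example URI are assumptions made for illustration.

```shell
# Print the scheme of each URI in a comma-separated credential provider path.
# list_provider_schemes is a hypothetical helper, not part of Hadoop or Dataproc.
list_provider_schemes() {
  local path="$1" uri
  local IFS=','
  for uri in $path; do
    # Everything before "://" is the scheme (for example, gsm or jceks).
    printf '%s\n' "${uri%%://*}"
  done
}

# Prints "gsm" and then "jceks", one per line.
list_provider_schemes "gsm://projects/PROJECT_ID,jceks://file/tmp/test.jceks"
```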
Substitute dot operator (optional): Secret Manager doesn't support the dot (.) operator in secret names, but OSS component credential keys can contain this operator. When this property is set to true, you can replace dots (.) with hyphens (-) in credential names. For example, you can specify the credential name a.b.c as a-b-c when passing it to Secret Manager.
--properties=hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true
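The name mapping described above can be sketched in shell as follows. This is only an illustration of the dot-to-hyphen convention, not the provider's own code, and to_secret_id is a hypothetical helper name.

```shell
# Convert an OSS credential name into a Secret Manager-compatible secret ID
# by replacing every dot with a hyphen.
# to_secret_id is a hypothetical helper used only for illustration.
to_secret_id() {
  printf '%s\n' "$1" | tr '.' '-'
}

# Prints: javax-jdo-option-ConnectionPassword
to_secret_id "javax.jdo.option.ConnectionPassword"
```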
Secret version (optional): Secrets in Secret Manager can have multiple versions (values). Use this property to access a specific secret version. By default, Secret Manager accesses the LATEST version, which resolves to the latest value of the secret at runtime. A best practice is to define this property for stable access in production environments. For information on creating a secret, see Create and access a secret using Secret Manager and Hadoop credential commands.
--properties=hadoop.security.credstore.google-secret-manager.secret-version=1
Create a Dataproc cluster with Secret Manager Credential Provider
- Run the following command locally or in Cloud Shell to create a Dataproc cluster with the required properties.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --properties="hadoop:hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID,hadoop:hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true" \
    ...other flags as needed...
Notes:
- CLUSTER_NAME: The name of the new cluster.
- REGION: A Compute Engine region where your workload will run.
- PROJECT_ID: Your project ID is listed in the Project info section of the Google Cloud console dashboard.
Submit a Dataproc job with Secret Manager Credential Provider
Run the following command locally or in Cloud Shell to submit a Dataproc job with the required properties.
gcloud dataproc jobs submit JOB_TYPE \
    --cluster=CLUSTER_NAME \
    --region=REGION \
    --properties="hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID,hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true" \
    ...other flags as needed... \
    -- job-args
Notes:
- JOB_TYPE: The type of job to submit, such as hadoop, spark, or hive.
- CLUSTER_NAME: The name of the cluster that will run the job.
- REGION: A Compute Engine region where your workload will run.
- PROJECT_ID: Your project ID is listed in the Project info section of the Google Cloud console dashboard.
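The cluster and job commands in this document pass the same properties, but the cluster command adds a hadoop: prefix to each one. The following shell sketch assembles the comma-separated value for --properties in either form. The helper name build_properties and the sample project ID my-project are assumptions, and applying the hadoop: prefix to the secret-version property is an extrapolation from the cluster command rather than something this document states.

```shell
# Build the comma-separated value for the --properties flag.
# build_properties is a hypothetical helper; it prints a string and does not call gcloud.
#   $1: project ID
#   $2: secret version (for example, 1)
#   $3: optional prefix, such as "hadoop:" for cluster creation
build_properties() {
  local project_id="$1" secret_version="$2" prefix="${3:-}"
  local store="hadoop.security.credstore.google-secret-manager"
  printf '%s,%s,%s\n' \
    "${prefix}hadoop.security.credential.provider.path=gsm://projects/${project_id}" \
    "${prefix}${store}.secret-id.substitute-dot-operator=true" \
    "${prefix}${store}.secret-version=${secret_version}"
}

# Job form (no prefix), using an assumed project ID.
build_properties my-project 1
# Cluster form (hadoop: prefix on each property).
build_properties my-project 1 "hadoop:"
```

The printed string can then be passed as --properties="$(build_properties PROJECT_ID 1)" in a job submission.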
Use Secret Manager with Hive Metastore
The Hive Metastore property, javax.jdo.option.ConnectionPassword, contains the password used to authenticate access to the metastore database. The password is saved in plain text in hive-site.xml, which poses a security risk. A production best practice is to store the password in Secret Manager, then update the hive-site.xml config file to allow the Hive metastore service to read the password from Secret Manager.
The following examples show you how to use Secret Manager in different Hive Metastore scenarios.
Create a cluster with a local metastore
- Run the following command locally or in Cloud Shell to create a Dataproc cluster.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    ...other flags as needed...
Notes:
- CLUSTER_NAME: The name of the new cluster.
- REGION: A Compute Engine region where your workload will run.
Create a secret using the Secret Manager or the hadoop credential command.
- Alternative 1: Use the Secret Manager to create a secret.
  - Secret name: /projects/PROJECT_ID/secrets/javax-jdo-option-ConnectionPassword/versions/1
  - Secret value: METASTORE_PASSWORD.
- Alternative 2: Use the hadoop credential command to create a secret.
sudo hadoop credential create javax-jdo-option-ConnectionPassword -provider gsm://projects/PROJECT_ID -v METASTORE_PASSWORD
  - METASTORE_PASSWORD: Since the Secret Manager does not support the dot (.) operator, substitute any dots (.) in the password with hyphens (-).
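For Alternative 1, one possible way to create the secret from a shell is the gcloud secrets create command, sketched below. The wrapper name create_metastore_secret is hypothetical, and the automatic replication policy is an assumption that you may need to change for your project. The function is only defined here, not run.

```shell
# Create the metastore password secret with the gcloud CLI.
# create_metastore_secret is a hypothetical wrapper; the replication policy
# shown is an assumption, not a requirement stated in this document.
#   $1: project ID
#   $2: metastore password
create_metastore_secret() {
  local project_id="$1" password="$2"
  # --data-file=- makes gcloud read the secret value from stdin.
  printf '%s' "$password" | gcloud secrets create javax-jdo-option-ConnectionPassword \
    --project="$project_id" \
    --replication-policy="automatic" \
    --data-file=-
}

# Usage (requires the gcloud CLI and Secret Manager permissions):
# create_metastore_secret PROJECT_ID METASTORE_PASSWORD
```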
Verify that the secret exists.
sudo hadoop credential list -provider gsm://projects/PROJECT_ID
Use a text editor to remove the javax.jdo.option.ConnectionPassword property from the hive-site.xml file, and then add the hadoop.security.credential.provider.path and hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator properties to the file.
Example properties:
hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID
hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true
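Because hive-site.xml uses the Hadoop XML configuration format, the two example properties would typically be written as property elements inside the file's configuration element. The following shell sketch prints such a fragment for copying into the file. The helper name print_hive_site_properties is hypothetical, and the sketch does not edit hive-site.xml itself.

```shell
# Print the two Secret Manager properties as hive-site.xml <property> elements.
# print_hive_site_properties is a hypothetical helper; it writes to stdout only.
#   $1: project ID
print_hive_site_properties() {
  cat <<EOF
<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>gsm://projects/$1</value>
</property>
<property>
  <name>hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator</name>
  <value>true</value>
</property>
EOF
}

# Replace PROJECT_ID with your project ID before using the output.
print_hive_site_properties PROJECT_ID
```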
Restart Hive Metastore.
sudo systemctl restart hive-metastore
Create a cluster with an external Hive Metastore
Run the following command locally or in Cloud Shell to create a Dataproc cluster with the following cluster properties. Use this cluster as an external Hive metastore for Hive jobs that run from other Dataproc clusters for Spark Hive workloads.
gcloud dataproc clusters create METASTORE_CLUSTER_NAME \
    --region=REGION \
    --properties=core:fs.defaultFS=gs://METASTORE_CLUSTER_PROXY_BUCKET,dataproc:dataproc.components.deactivate="hdfs hive-server2 hive-metastore" \
    ...other flags as needed...
Create a secret using the Secret Manager or the hadoop credential command.
- Alternative 1: Use the Secret Manager to create a secret.
  - Secret name: /projects/PROJECT_ID/secrets/javax-jdo-option-ConnectionPassword/versions/1
  - Secret value: METASTORE_PASSWORD.
- Alternative 2: Use the hadoop credential command to create a secret.
sudo hadoop credential create javax-jdo-option-ConnectionPassword -provider gsm://projects/PROJECT_ID -v METASTORE_PASSWORD
  - METASTORE_PASSWORD: Since the Secret Manager does not support the dot (.) operator, substitute dots (.) in the password with hyphens (-).
Verify that the secret exists.
sudo hadoop credential list -provider gsm://projects/PROJECT_ID
Create a cluster to run Hive jobs that connect to an external metastore
- Run the following command locally or in Cloud Shell to create a Dataproc cluster with the following cluster properties. Use this cluster to run Hive jobs that connect to the external metastore on another Dataproc cluster.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --properties="hive:javax.jdo.option.ConnectionURL=jdbc:mysql://METASTORE_CLUSTER_NAME-m/metastore,hive:hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID,hive:hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true" \
    ...other flags as needed...
Hadoop credential commands
You can use SSH to connect to the Dataproc master node and use the hadoop credential command to create, list, and manage secrets.
hadoop credential commands use the following format: hadoop credential SUBCOMMAND OPTIONS.
In the following examples, the -provider flag is added to specify the provider type and location (the provider store). The gsm:// scheme specifies the Secret Manager.
Create a secret with the specified secret ID. The command doesn't create the secret if the specified secret ID already exists. This behavior is consistent with the Hadoop CredentialProvider API.
hadoop credential create SECRET_ID -provider gsm://projects/PROJECT_ID -v VALUE
List secrets stored in a project.
hadoop credential list -provider gsm://projects/PROJECT_ID
Check if a secret exists in a project with a specified value.
hadoop credential check SECRET_ID -provider gsm://projects/PROJECT_ID -v VALUE
Check for a specific secret version in a config file.
hadoop credential conf CONFIG_FILE check SECRET_ID -provider gsm://projects/PROJECT_ID -v VALUE
- CONFIG_FILE: The XML file that sets hadoop.security.credstore.google-secret-manager.secret-version.
Delete all versions of a secret in a project.
hadoop credential delete SECRET_ID -provider gsm://projects/PROJECT_ID
See the Hadoop Commands Guide for more information.
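As an illustration of how the create command and the dot-to-hyphen naming convention fit together, the following shell sketch wraps hadoop credential create so that a dotted credential name is converted before it is stored. The wrapper name store_credential is hypothetical, and the sketch assumes it runs on a cluster node where the hadoop command and the gsm:// provider are available. The function is only defined here, not run.

```shell
# Store a credential in Secret Manager under a hyphenated secret ID.
# store_credential is a hypothetical wrapper around the documented
# "hadoop credential create" command.
#   $1: project ID
#   $2: credential name, which may contain dots
#   $3: credential value
store_credential() {
  local project_id="$1" name="$2" value="$3" secret_id
  # Secret Manager secret names can't contain dots, so map them to hyphens.
  secret_id="$(printf '%s' "$name" | tr '.' '-')"
  hadoop credential create "$secret_id" \
    -provider "gsm://projects/${project_id}" \
    -v "$value"
}

# Usage on a cluster node:
# store_credential PROJECT_ID javax.jdo.option.ConnectionPassword METASTORE_PASSWORD
```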
For more information
- Explore the Hive Documentation.