Secret Manager Credential Provider

Overview

Secret Manager safeguards sensitive data such as API keys, passwords, and certificates. You can use it to manage, access, and audit your secrets across Google Cloud.

You can configure a Dataproc cluster or job to use a Secret Manager secret by using the Secret Manager Credential Provider.

Availability

This feature is available for use with Dataproc clusters created with image versions 2.0.97+, 2.1.41+, 2.2.6+, or later major Dataproc image versions.
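To check an image version string against these minimums, you could use a small shell function such as the following. It is a sketch, not a Dataproc tool: the function name supports_gsm_provider is hypothetical, it assumes version strings of the form MAJOR.MINOR.PATCH-OS (for example, 2.1.41-debian11), and it relies on GNU sort -V to treat any MAJOR.MINOR later than 2.2 as supported.

```shell
# Hypothetical helper: succeeds if a Dataproc image version string meets the
# minimums listed above (2.0.97+, 2.1.41+, 2.2.6+, or any MAJOR.MINOR after 2.2).
supports_gsm_provider() {
  local v mm patch
  v=${1%%-*}        # drop the OS suffix: "2.1.41-debian11" -> "2.1.41"
  mm=${v%.*}        # MAJOR.MINOR: "2.1"
  patch=${v##*.}    # PATCH: "41"
  # For other MAJOR.MINOR values, pass only if they sort after 2.2 (GNU sort -V).
  case $mm in
    2.0) [ "$patch" -ge 97 ] ;;
    2.1) [ "$patch" -ge 41 ] ;;
    2.2) [ "$patch" -ge 6 ] ;;
    *) [ "$(printf '%s\n%s\n' 2.2 "$mm" | sort -V | tail -n 1)" = "$mm" ] ;;
  esac
}
```

For example, supports_gsm_provider 2.1.41-debian11 succeeds, while supports_gsm_provider 2.1.40-debian11 fails.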

Terminology

The following terms are used in this document.

  • Secret: A Secret Manager secret is a global project object that contains a collection of metadata and secret versions. You can store, manage, and access secrets as binary blobs or text strings.
  • Credential: In Hadoop and other Dataproc-hosted applications, a credential consists of a credential name (ID) and a credential value (password). A credential ID and value map to a Secret Manager secret ID and secret value (secret version).

Usage

You can configure Hadoop and other OSS components to work with Secret Manager by setting the following properties when you create a Dataproc cluster or submit a job.

  • Provider path (required): The provider path property, hadoop.security.credential.provider.path, is a comma-separated list of one or more credential provider URIs that is traversed to resolve a credential.

    --properties=hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID
    
    • The scheme in the provider path indicates the credential provider type. Hadoop schemes include jceks://, user://, and localjceks://. Use the gsm:// scheme to search for credentials in Secret Manager.
  • Substitute dot operator (optional): Secret Manager doesn't support the dot(.) operator in secret names, but OSS component credential keys can contain it. When this property is set to true, you can replace dot(.)s with hyphen(-)s in credential names. For example, when this property is set to true, you can specify the credential name a.b.c as a-b-c when passing it to Secret Manager.

    --properties=hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true
    
  • Secret version (optional): Secrets in Secret Manager can have multiple versions (values). Use this property to access a specific secret version. By default, Secret Manager accesses the LATEST version, which resolves to the latest value of the secret at runtime. A best practice is to set this property to a fixed version for stable access in production environments. For information on creating a secret, see Create and access a secret using Secret Manager and Hadoop credential commands.

    --properties=hadoop.security.credstore.google-secret-manager.secret-version=1
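The dot-to-hyphen mapping that the substitute dot operator property enables can be sketched in shell as follows. Both function names are hypothetical, and the validity check assumes Secret Manager's secret ID rules (only letters, digits, hyphens, and underscores, up to 255 characters).

```shell
# Hypothetical helper: maps a Hadoop credential name to the secret ID it
# corresponds to when substitute-dot-operator=true (every "." becomes "-").
to_secret_id() {
  printf '%s\n' "$1" | tr '.' '-'
}

# Hypothetical helper: succeeds if the ID is non-empty, at most 255 characters,
# and contains only letters, digits, "_" and "-".
is_valid_secret_id() {
  case $1 in
    ''|*[!A-Za-z0-9_-]*) return 1 ;;
  esac
  [ "${#1}" -le 255 ]
}
```

For example, to_secret_id javax.jdo.option.ConnectionPassword prints javax-jdo-option-ConnectionPassword, which passes is_valid_secret_id, while the dotted name does not.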
    

Create a Dataproc cluster with Secret Manager Credential Provider

  1. Run the following command locally or in Cloud Shell to create a Dataproc cluster with the required properties.
    gcloud dataproc clusters create CLUSTER_NAME \
        --region=REGION \
        --properties="hadoop:hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID,hadoop:hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true" \
        ...other flags as needed...
    

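Notes:

  • CLUSTER_NAME: The name of the new cluster.
  • REGION: A Compute Engine region where your workload will run.
  • PROJECT_ID: Your Google Cloud project ID.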
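If you build the --properties value in a script, a helper like the following can add the hadoop: prefix used in the cluster create command above to each key=value pair and join the pairs with commas. The function prefix_props is hypothetical; it only builds the string and does not call gcloud.

```shell
# Hypothetical helper: prefixes each key=value argument with "PREFIX:" and joins
# the results with commas, matching the cluster --properties format above.
prefix_props() {
  local prefix=$1 out="" kv
  shift
  for kv in "$@"; do
    out="${out:+$out,}${prefix}:${kv}"
  done
  printf '%s\n' "$out"
}
```

For example, pass --properties="$(prefix_props hadoop hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID)" to gcloud dataproc clusters create.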

Submit a Dataproc job with Secret Manager Credential Provider

  1. Run the following command locally or in Cloud Shell to submit a Dataproc job with the required properties.

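    gcloud dataproc jobs submit JOB_TYPE \
        --cluster=CLUSTER_NAME \
        --region=REGION \
        --properties="hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID,hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true" \
        ...other flags as needed... \
        -- job-args
    

Notes:

  • JOB_TYPE: The job type, such as hadoop, spark, or hive.
  • CLUSTER_NAME: The name of the cluster that runs the job.
  • REGION: The Compute Engine region where the cluster is located.
  • PROJECT_ID: Your Google Cloud project ID.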
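Because hadoop.security.credential.provider.path is itself a comma-separated list, a provider path with more than one URI collides with the commas that separate entries in --properties. gcloud lets you switch the list delimiter with a ^DELIM^ prefix (see gcloud topic escaping). The following sketch builds such a value; the helper name is hypothetical, and the jceks://file/tmp/example.jceks URI in the usage note is an illustrative placeholder.

```shell
# Hypothetical helper: joins key=value arguments with DELIM and adds gcloud's
# "^DELIM^" prefix, so commas inside values are not treated as separators.
join_props_with_delim() {
  local delim=$1 out="" kv
  shift
  for kv in "$@"; do
    # Refuse values that contain the chosen delimiter; they would be split.
    case $kv in
      *"$delim"*)
        echo "value contains the delimiter '$delim': $kv" >&2
        return 1 ;;
    esac
    out="${out:+$out$delim}$kv"
  done
  printf '^%s^%s\n' "$delim" "$out"
}
```

For example, --properties="$(join_props_with_delim '#' 'hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID,jceks://file/tmp/example.jceks' 'hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true')" keeps both provider URIs in a single property.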

Use Secret Manager with Hive Metastore

The Hive Metastore property, javax.jdo.option.ConnectionPassword, contains the password used to authenticate access to the metastore database. The password is saved in plain text in hive-site.xml, which is a security risk. A production best practice is to store the password in Secret Manager, and then update the hive-site.xml config file to allow the Hive Metastore service to read the password from Secret Manager.
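Before and after the migration, you can check whether a Hive configuration file still defines the password property. The following is a minimal sketch: the function name is hypothetical, and it only matches a name element written on a single line, which is how Hadoop-style XML configuration files are usually laid out.

```shell
# Hypothetical helper: succeeds if the given XML configuration file still
# declares javax.jdo.option.ConnectionPassword (whose value is plain text).
has_plaintext_metastore_password() {
  grep -q '<name>[[:space:]]*javax\.jdo\.option\.ConnectionPassword[[:space:]]*</name>' "$1"
}
```

On a Dataproc node, the file is assumed to be /etc/hive/conf/hive-site.xml (confirm the path on your cluster), so has_plaintext_metastore_password /etc/hive/conf/hive-site.xml && echo "password still in hive-site.xml" flags a file that still needs updating.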

The following examples show you how to use Secret Manager in different Hive Metastore scenarios.

Create a cluster with a local metastore

  1. Run the following command locally or in Cloud Shell to create a Dataproc cluster.
    gcloud dataproc clusters create CLUSTER_NAME \
        --region=REGION \
        ...other flags as needed...
    

Notes:

  • CLUSTER_NAME: The name of the new cluster.
  • REGION: A Compute Engine region where your workload will run.
  2. Create a secret using Secret Manager or the hadoop credential command.

    • Alternative 1: Use Secret Manager to create a secret.

      • Secret name: projects/PROJECT_ID/secrets/javax-jdo-option-ConnectionPassword/versions/1
      • Secret value: METASTORE_PASSWORD.
    • Alternative 2: Use the hadoop credential command to create a secret.

      sudo hadoop credential create javax-jdo-option-ConnectionPassword -provider gsm://projects/PROJECT_ID -v METASTORE_PASSWORD
      

      • METASTORE_PASSWORD: The metastore database password. Secret Manager doesn't support the dot(.) operator in secret IDs, so the credential name javax.jdo.option.ConnectionPassword is stored under the secret ID javax-jdo-option-ConnectionPassword, with dot(.)s replaced by hyphen(-)s. The password itself is stored unchanged.
  3. Verify that the secret exists.

    sudo hadoop credential list -provider gsm://projects/PROJECT_ID
    

  4. Use a text editor to remove the javax.jdo.option.ConnectionPassword property from the hive-site.xml file, and then add the hadoop.security.credential.provider.path and hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator properties to the file.

    Example properties:

    hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID
    hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true
    

  5. Restart Hive Metastore.

    sudo systemctl restart hive-metastore
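In the hive-site.xml step above, the properties are shown as key=value pairs; in hive-site.xml itself, each one is written as a property element inside the configuration element. The hypothetical helper below prints those elements so you can paste them into the file. It does not escape XML special characters, which the values used here don't contain.

```shell
# Hypothetical helper: prints a Hadoop-style <property> element for each
# key=value argument (key = text before the first "=", value = the rest).
to_xml_properties() {
  local kv
  for kv in "$@"; do
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
      "${kv%%=*}" "${kv#*=}"
  done
}
```

For example, to_xml_properties hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true prints two property elements.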
    

Create a cluster with an external Hive Metastore

  1. Run the following command locally or in Cloud Shell to create a Dataproc cluster with the following cluster properties. Use this cluster as an external Hive metastore for Hive jobs and Spark Hive workloads that run on other Dataproc clusters.

    gcloud dataproc clusters create METASTORE_CLUSTER_NAME \
        --region=REGION \
        --properties=core:fs.defaultFS=gs://METASTORE_CLUSTER_PROXY_BUCKET,dataproc:dataproc.components.deactivate="hdfs hive-server2 hive-metastore" \
        ...other flags as needed...
    
  2. Create a secret using Secret Manager or the hadoop credential command.

    • Alternative 1: Use Secret Manager to create a secret.
      • Secret name: projects/PROJECT_ID/secrets/javax-jdo-option-ConnectionPassword/versions/1
      • Secret value: METASTORE_PASSWORD.
    • Alternative 2: Use the hadoop credential command to create a secret.

      sudo hadoop credential create javax-jdo-option-ConnectionPassword -provider gsm://projects/PROJECT_ID -v METASTORE_PASSWORD
      
      • METASTORE_PASSWORD: The metastore database password. Secret Manager doesn't support the dot(.) operator in secret IDs, so the credential name javax.jdo.option.ConnectionPassword is stored under the secret ID javax-jdo-option-ConnectionPassword, with dot(.)s replaced by hyphen(-)s. The password itself is stored unchanged.
  3. Verify that the secret exists.

    sudo hadoop credential list -provider gsm://projects/PROJECT_ID
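For Alternative 1 in step 2, one way to create the secret from the command line is gcloud secrets create, reading the value from stdin so the password isn't passed to gcloud as a command-line argument. This is a sketch: the function name is hypothetical, automatic replication is an assumed choice, and the GCLOUD variable exists only so the command can be swapped out for a dry run.

```shell
# Hypothetical helper: creates the javax-jdo-option-ConnectionPassword secret
# (its first version holds the password) in the given project.
# printf '%s' writes the password without a trailing newline, so the stored
# value is exactly the password. GCLOUD defaults to "gcloud".
create_metastore_secret() {
  local project_id=$1 password=$2
  printf '%s' "$password" | "${GCLOUD:-gcloud}" secrets create \
    javax-jdo-option-ConnectionPassword \
    --project="$project_id" \
    --replication-policy=automatic \
    --data-file=-
}
```

For example, create_metastore_secret PROJECT_ID "$METASTORE_PASSWORD" creates the secret that the hadoop credential list command above should then report.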
      

Create a cluster to run Hive jobs that connect to an external metastore

  1. Run the following command locally or in Cloud Shell to create a Dataproc cluster with the following cluster properties. Use this cluster to run Hive jobs that connect to the external metastore on another Dataproc cluster.
    gcloud dataproc clusters create CLUSTER_NAME \
        --region=REGION \
        --properties="hive:javax.jdo.option.ConnectionURL=jdbc:mysql://METASTORE_CLUSTER_NAME-m/metastore,hive:hadoop.security.credential.provider.path=gsm://projects/PROJECT_ID,hive:hadoop.security.credstore.google-secret-manager.secret-id.substitute-dot-operator=true" \
        ...other flags as needed...
    

Hadoop credential commands

You can use SSH to connect to the Dataproc master node, and then use the hadoop credential command to create, list, and manage secrets.

hadoop credential commands use the following format: hadoop credential SUBCOMMAND OPTIONS. In the following examples, the -provider flag is added to specify the provider type and location (the provider store). The gsm:// scheme specifies the Secret Manager.

  • Create a secret with the specified secret ID. The command doesn't create the secret if the specified secret ID exists. This behavior is consistent with the Hadoop CredentialProvider API.

    hadoop credential create SECRET_ID -provider gsm://projects/PROJECT_ID -v VALUE
    

  • List secrets stored in a project.

    hadoop credential list -provider gsm://projects/PROJECT_ID
    

  • Check if a secret exists in a project with a specified value.

    hadoop credential check SECRET_ID -provider gsm://projects/PROJECT_ID -v VALUE
    

  • Check a secret, using the secret version specified in a configuration file.

    hadoop credential -conf CONFIG_FILE check SECRET_ID -provider gsm://projects/PROJECT_ID -v VALUE
    
  • CONFIG_FILE: The XML file that sets hadoop.security.credstore.google-secret-manager.secret-version.

  • Delete all versions of a secret in a project.

    hadoop credential delete SECRET_ID -provider gsm://projects/PROJECT_ID
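The CONFIG_FILE used by the check command is an ordinary Hadoop XML configuration file. A minimal sketch of a helper that writes one, pinning the secret version, follows; the function name and the output path in the usage note are hypothetical.

```shell
# Hypothetical helper: writes a minimal Hadoop configuration file that sets
# hadoop.security.credstore.google-secret-manager.secret-version.
# Arguments: output path, secret version (for example, 1).
write_secret_version_conf() {
  local path=$1 version=$2
  cat > "$path" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.security.credstore.google-secret-manager.secret-version</name>
    <value>${version}</value>
  </property>
</configuration>
EOF
}
```

For example, write_secret_version_conf /tmp/secret-version.xml 1 writes a file that you can pass as CONFIG_FILE.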
    

See the Hadoop Commands Guide for more information.

For more information