Cloud Dataproc Security Configuration

When you create a Cloud Dataproc cluster, you can enable Hadoop Secure Mode via Kerberos to provide multi-tenancy through user authentication, isolation, and encryption inside the cluster.

User Authentication and Other Google Cloud Platform Services. Per-user authentication via Kerberos applies only within the cluster. Interactions with other Google Cloud Platform services, such as Cloud Storage, continue to be authenticated as the cluster's service account.

Enabling Hadoop Secure Mode via Kerberos

Enabling Kerberos and Hadoop Secure Mode for a cluster installs the MIT distribution of Kerberos and configures Apache Hadoop YARN, HDFS, Hive, Spark, and related components to use it for authentication.

Enabling Kerberos creates an on-cluster Key Distribution Center (KDC) that contains service principals and a root principal. The root principal is the account with administrator permissions to the on-cluster KDC. The KDC can also contain standard user principals, or it can be connected via cross-realm trust to another KDC that contains the user principals.

You must provide a password for the Kerberos root principal. To provide the password securely, encrypt it with a Key Management Service (KMS) key and store it in a Cloud Storage bucket that the cluster service account can access. The cluster service account must be granted the cloudkms.cryptoKeyDecrypter IAM role.
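For example, a minimal sketch of preparing the encrypted password (the key ring, key, bucket, and service account names are placeholders):

# Encrypt the root principal password with a KMS key
# (key ring and key names are illustrative).
gcloud kms encrypt \
    --location=global \
    --keyring=mykeyring \
    --key=my-key \
    --plaintext-file=root_password.txt \
    --ciphertext-file=root_password.encrypted

# Upload the encrypted password to a bucket the cluster service
# account can read.
gsutil cp root_password.encrypted gs://bucket/

# Grant the cluster service account permission to decrypt with the key.
gcloud projects add-iam-policy-binding myproject \
    --member=serviceAccount:SERVICE-ACCOUNT-EMAIL \
    --role=roles/cloudkms.cryptoKeyDecrypter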

Create a Kerberos cluster

You can use the gcloud command, the Cloud Dataproc API, or the Google Cloud Platform Console to enable Kerberos on clusters that use Cloud Dataproc image version 1.3 and later. See Supported Cloud Dataproc versions for the Kerberos version included in each Cloud Dataproc image release.

gcloud command

To create a Kerberos Cloud Dataproc cluster (image version 1.3 and later), use the gcloud dataproc clusters create command.

gcloud dataproc clusters create cluster-name \
    --kerberos-root-principal-password-uri="Cloud Storage URI of KMS-encrypted password for Kerberos root principal" \
    --kerberos-kms-key="The URI of the KMS key used to decrypt the root password" \
    --image-version=1.3

Use a YAML (or JSON) config file. Instead of passing kerberos-* flags to the gcloud command as shown above, you can place Kerberos settings in a YAML (or JSON) config file, then reference the config file to create the Kerberos cluster.

  1. Create a config file (see SSL Certificates, Additional Kerberos Settings, and Cross-realm trust for additional config settings that can be included in the file):
    root_principal_password_uri: gs://bucket/password.encrypted
    kms_key_uri: projects/myproject/locations/global/keyRings/mykeyring/cryptoKeys/my-key
    
  2. Use the following gcloud command to create the Kerberos cluster:
    gcloud dataproc clusters create cluster-name \
        --kerberos-config-file=local path to the config file \
        --image-version=1.3
    

Security Considerations. Cloud Dataproc discards the decrypted form of the password after adding the root principal to the KDC. For security purposes, after creating the cluster you can delete the password file and the key used to decrypt the secret, and remove the service account from the cloudkms.cryptoKeyDecrypter role.

REST API

Kerberos clusters can be created through the ClusterConfig.SecurityConfig.KerberosConfig field as part of a clusters.create request.
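As a sketch, a clusters.create request that enables Kerberos might look like the following (project, region, bucket, and key names are placeholders; depending on your release, the securityConfig field may only be available on a beta API version):

# Illustrative clusters.create request enabling Kerberos;
# project, region, and resource names are placeholders.
curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dataproc.googleapis.com/v1/projects/myproject/regions/global/clusters" \
    -d '{
      "clusterName": "cluster-name",
      "config": {
        "softwareConfig": {"imageVersion": "1.3"},
        "securityConfig": {
          "kerberosConfig": {
            "enableKerberos": true,
            "rootPrincipalPasswordUri": "gs://bucket/root_password.encrypted",
            "kmsKeyUri": "projects/myproject/locations/global/keyRings/mykeyring/cryptoKeys/my-key"
          }
        }
      }
    }'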

Console

When creating a cluster with image version 1.3+, open the Advanced options panel, and in the Advanced Security section select Enable Kerberos and Hadoop Secure Mode, then complete the security options (discussed in the following sections).

OS Login

On-cluster KDC management can be performed with the kadmin command using the root Kerberos user principal or using sudo kadmin.local. Enable OS Login to control who can run superuser commands.
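For example, a sketch of adding a user principal from the cluster master and turning on OS Login project-wide (the principal name is illustrative):

# On the cluster master, add a user principal to the on-cluster KDC.
sudo kadmin.local -q "addprinc user1"

# Enable OS Login at the project level so IAM roles control who
# can SSH to cluster nodes and run superuser commands.
gcloud compute project-info add-metadata \
    --metadata enable-oslogin=TRUE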

SSL Certificates

As part of enabling Hadoop Secure Mode, Cloud Dataproc creates a self-signed certificate to enable cluster SSL encryption. As an alternative, you can provide your own certificate for cluster SSL encryption by adding the following settings to the configuration file when you create a Kerberos cluster:

  • ssl:keystore_password_uri: Location in Cloud Storage of the KMS-encrypted file containing the password to the keystore file.
  • ssl:key_password_uri: Location in Cloud Storage of the KMS-encrypted file containing the password to the key in the keystore file.
  • ssl:keystore_uri: Location in Cloud Storage of the keystore file containing the wildcard certificate and the private key used by cluster nodes.
  • ssl:truststore_password_uri: Location in Cloud Storage of the KMS-encrypted file that contains the password to the truststore file.
  • ssl:truststore_uri: Location in Cloud Storage of the truststore file containing trusted certificates.

Sample config file:

root_principal_password_uri: gs://bucket/root_password.encrypted
kms_key_uri: projects/myproject/locations/global/keyRings/mykeyring/cryptoKeys/my-key
ssl:
  key_password_uri: gs://bucket/key_password.encrypted
  keystore_password_uri: gs://bucket/keystore_password.encrypted
  keystore_uri: gs://bucket/keystore.jks
  truststore_password_uri: gs://bucket/truststore_password.encrypted
  truststore_uri: gs://bucket/truststore.jks
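If you need to produce your own keystore, the following is a sketch using the JDK keytool with a self-signed wildcard certificate (domain, passwords, and file names are placeholders; in production you would import a CA-signed certificate instead):

# Generate a keystore holding a self-signed wildcard certificate
# (illustrative only).
keytool -genkeypair \
    -keystore keystore.jks \
    -alias cluster-cert \
    -keyalg RSA \
    -dname "CN=*.example.com" \
    -storepass changeme \
    -keypass changeme

# Encrypt the keystore password with the KMS key, then upload both
# files to Cloud Storage.
echo -n changeme > keystore_password.txt
gcloud kms encrypt \
    --location=global --keyring=mykeyring --key=my-key \
    --plaintext-file=keystore_password.txt \
    --ciphertext-file=keystore_password.encrypted
gsutil cp keystore.jks keystore_password.encrypted gs://bucket/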

Additional Kerberos Settings

To specify the master key of the KDC database, create a Kerberos cluster with the following property added in the Kerberos configuration file:

  • kdc_db_key_uri: Location in Cloud Storage of the KMS-encrypted file containing the KDC database master key.

If this property is not set, Cloud Dataproc will generate the master key.

To specify the maximum lifetime (in hours) of the ticket-granting ticket, create a Kerberos cluster with the following property added in the Kerberos configuration file:

  • tgt_lifetime_hours: Maximum lifetime of the ticket-granting ticket, in hours.

If this property is not set, Cloud Dataproc sets the ticket-granting ticket's lifetime to 10 hours.
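For example, a config file that sets both of these properties might look like the following (bucket and key names are placeholders carried over from the earlier samples):

root_principal_password_uri: gs://bucket/root_password.encrypted
kms_key_uri: projects/myproject/locations/global/keyRings/mykeyring/cryptoKeys/my-key
kdc_db_key_uri: gs://bucket/kdc_db_key.encrypted
tgt_lifetime_hours: 2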

Cross-realm trust

The KDC on the cluster initially contains only the root administrator principal and service principals. You can add user principals manually or establish a cross-realm trust with an external KDC or Active Directory server that holds user principals. Cloud VPN or Cloud Interconnect is recommended for connecting to an on-premises KDC or Active Directory server.

To create a Kerberos cluster that supports cross-realm trust, add the settings listed below to the Kerberos configuration file when you create the cluster. The shared password should be encrypted with KMS and stored in a Cloud Storage bucket that the cluster service account can access.

  • cross_realm_trust:admin_server: hostname/address of the remote admin server.
  • cross_realm_trust:kdc: hostname/address of the remote KDC.
  • cross_realm_trust:realm: name of the remote realm to be trusted.
  • cross_realm_trust:shared_password_uri: Location in Cloud Storage of the KMS-encrypted shared password.

Sample config file:

root_principal_password_uri: gs://bucket/root_password.encrypted
kms_key_uri: projects/myproject/locations/global/keyRings/mykeyring/cryptoKeys/my-key
cross_realm_trust:
  admin_server: admin.remote.realm
  kdc: kdc.remote.realm
  realm: REMOTE.REALM
  shared_password_uri: gs://bucket/shared_password.encrypted

To enable cross-realm trust to a remote KDC:

  1. Add the following to the /etc/krb5.conf file on the remote KDC:

    [realms]
    DATAPROC.REALM = {
      kdc = MASTER-NAME-OR-ADDRESS
      admin_server = MASTER-NAME-OR-ADDRESS
    }
    

  2. Create the trust principal:

    kadmin -q "addprinc krbtgt/DATAPROC.REALM@REMOTE.REALM"
    

  3. When prompted, enter the password for the principal. The password must match the contents of the encrypted shared password file.

To enable cross-realm trust with Active Directory, run the following commands in PowerShell as Administrator:

  1. Create a KDC definition in Active Directory.

    ksetup /addkdc DATAPROC.REALM DATAPROC-CLUSTER-MASTER-NAME-OR-ADDRESS
    

  2. Create trust in Active Directory.

    netdom trust DATAPROC.REALM /Domain AD.REALM /add /realm /passwordt:TRUST-PASSWORD
    
    The password should match the contents of the encrypted shared password file.

dataproc user

A Kerberos Cloud Dataproc cluster is multi-tenant only within the cluster itself. When it reads from or writes to other Google Cloud Platform services, the cluster acts as the cluster service account. When you submit jobs to a Kerberos cluster, they run as a single dataproc user.

Default and Custom Cluster Properties

Hadoop Secure Mode is configured with properties in config files. Cloud Dataproc sets default values for these properties.

You can override the default properties when you create the cluster with the gcloud dataproc clusters create --properties flag or by calling the clusters.create API and setting SoftwareConfig properties (see cluster properties examples).
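As an illustration, an override at cluster creation might look like this (the config file name, property, and value are illustrative, not a recommendation):

# Override a default Hadoop property at cluster creation;
# "core:" targets core-site.xml, and the property shown is illustrative.
gcloud dataproc clusters create cluster-name \
    --kerberos-config-file=kerberos.yaml \
    --image-version=1.3 \
    --properties="core:hadoop.rpc.protection=privacy"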

High-Availability Mode

In High Availability (HA) mode, a Kerberos cluster has three KDCs: one on each master. The KDC running on the "first" master ($CLUSTER_NAME-m-0) is the Master KDC and also serves as the Admin Server. The Master KDC's database is synced to the two slave KDCs at 5-minute intervals through a cron job, and all three KDCs serve read traffic.

Kerberos does not natively support real-time replication or automatic failover. If the Master KDC goes down, perform a manual failover:

  1. On all KDC machines, in /etc/krb5.conf, change admin_server to the new Master's FQDN (Fully Qualified Domain Name). Remove the old Master from the KDC list.
  2. On the new Master KDC, set up a cron job to propagate the database.
  3. On the new Master KDC, restart the admin_server process (krb5-admin-server).
  4. On all KDC machines, restart the KDC process (krb5-kdc).
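The propagation in step 2 typically relies on MIT Kerberos's dump-and-propagate tools. A minimal sketch, assuming a standard kprop/kpropd setup and default Debian paths (the hostnames, paths, and wrapper script name are illustrative; the job Cloud Dataproc installs may differ):

# Dump the KDC database and push it to a replica KDC
# (hostname and dump file path are illustrative).
kdb5_util dump /var/lib/krb5kdc/slave_datatrans
kprop -f /var/lib/krb5kdc/slave_datatrans cluster-name-m-1

# Example cron entry to run the propagation every 5 minutes;
# /usr/local/sbin/propagate-kdc.sh is a hypothetical wrapper script.
# */5 * * * * root /usr/local/sbin/propagate-kdc.sh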

For More Information

See the MIT Kerberos Documentation.
