Dataproc optional Ranger component

You can install additional components like Ranger when you create a Dataproc cluster using the Optional components feature. This page describes the Ranger component.

The Apache Ranger component is an open source framework to manage permissions and auditing for the Hadoop ecosystem. The Ranger admin server and web UI are available on port 6080 on the cluster's first master node.


Install the component

Install the component when you create a Dataproc cluster. Components can be added to clusters created with Dataproc version 1.3 and later. The Ranger component requires the installation of the Solr component as shown below.

See Supported Dataproc versions for the component version included in each Dataproc image release.

Installation steps:

  1. Set up your Ranger admin password:

    1. Grant the Cloud KMS CryptoKey Encrypter/Decrypter role to the cluster service account. By default, the cluster service account is the Compute Engine default service account, which has the following form:
      project-number-compute@developer.gserviceaccount.com
      
      You can specify a different cluster service account when you create the cluster.
      1. Example: Grant the Cloud KMS CryptoKey Encrypter/Decrypter role to the Compute Engine default service account:
        gcloud projects add-iam-policy-binding project-id \
            --member=serviceAccount:project-number-compute@developer.gserviceaccount.com \
            --role=roles/cloudkms.cryptoKeyEncrypterDecrypter
        
    2. Encrypt your Ranger admin user's password using a Key Management Service (KMS) key. For clusters that use an image version earlier than 2.2, the password must be at least 8 characters long and contain at least one alphabetic character and one numeric character. For clusters that use image version 2.2 or later, the password must be at least 8 characters long and contain at least one uppercase letter, one lowercase letter, and one numeric character.
      1. Example:
        1. Create the key ring:
          gcloud kms keyrings create my-keyring --location=global
          
        2. Create the key:
          gcloud kms keys create my-key \
              --location=global \
              --keyring=my-keyring \
              --purpose=encryption
          
        3. Encrypt your Ranger admin user password:
          echo 'my-ranger-admin-password' | \
            gcloud kms encrypt \
              --location=global \
              --keyring=my-keyring \
              --key=my-key \
              --plaintext-file=- \
              --ciphertext-file=admin-password.encrypted
          
    3. Upload the encrypted password to a Cloud Storage bucket in your project.
      1. Example:
        gcloud storage cp admin-password.encrypted gs://my-bucket
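The password rules listed above can be checked locally before you encrypt and upload the password. The following POSIX shell function is a minimal sketch: the name check_ranger_password is hypothetical (it is not a gcloud or Dataproc command), and it assumes you pass the image version as MAJOR.MINOR, optionally followed by a suffix such as -debian12.

```shell
# Hypothetical helper (not part of gcloud or Dataproc): checks a candidate
# Ranger admin password against the documented rules. Returns 0 if it qualifies.
# Usage: check_ranger_password PASSWORD IMAGE_VERSION   (for example 2.1 or 2.2-debian12)
check_ranger_password() {
  pw=$1
  version=$2

  # Parse MAJOR and MINOR from "MAJOR.MINOR[...]"; a missing minor counts as 0.
  major=${version%%.*}
  case $version in
    *.*) rest=${version#*.} ;;
    *)   rest=0 ;;
  esac
  minor=${rest%%[!0-9]*}
  minor=${minor:-0}

  # All image versions: at least 8 characters, including at least one digit.
  [ "${#pw}" -ge 8 ] || return 1
  case $pw in *[[:digit:]]*) ;; *) return 1 ;; esac

  if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 2 ]; }; then
    # Image version 2.2 and later: at least one uppercase and one lowercase letter.
    case $pw in *[[:upper:]]*) ;; *) return 1 ;; esac
    case $pw in *[[:lower:]]*) ;; *) return 1 ;; esac
  else
    # Image versions earlier than 2.2: at least one alphabetic character.
    case $pw in *[[:alpha:]]*) ;; *) return 1 ;; esac
  fi
  return 0
}
```

For example, check_ranger_password 'Abcdefg1' 2.2 succeeds, while check_ranger_password 'abcdefg1' 2.2 fails because the password has no uppercase letter (the same password passes for 2.1).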
        
  2. Create your cluster:

    1. When installing the Ranger component, the Solr component must also be installed, as shown below.
      1. The Ranger component relies on the Solr component to store and query its audit logs. By default, Solr uses HDFS for storage, and this HDFS data is deleted when the cluster is deleted. To store Solr data, including the Ranger audit logs, on Cloud Storage instead, set the dataproc:solr.gcs.path=gs://<bucket> cluster property when you create your cluster. Cloud Storage data persists after the cluster is deleted.
    2. Pass the KMS key URI and the Cloud Storage URI of the encrypted password to the cluster creation command by setting the dataproc:ranger.kms.key.uri and dataproc:ranger.admin.password.uri cluster properties.
    3. Optionally, you can pass the Ranger database admin user's password as the Cloud Storage URI of an encrypted password file by setting the dataproc:ranger.db.admin.password.uri cluster property.
    4. By default, the Ranger component uses the MySQL database instance running on the cluster's first master node. In the MySQL instance, enable the log_bin_trust_function_creators flag by setting the variable to ON. This flag controls whether MySQL trusts the creators of stored functions when binary logging is enabled. After the cluster is created and Ranger is configured, you can reset log_bin_trust_function_creators to OFF.
    5. To persist the Ranger database after cluster deletion, use a Cloud SQL instance as the external MySQL database.

      1. Set the dataproc:ranger.cloud-sql.instance.connection.name cluster property to the connection name of the Cloud SQL instance (project-id:region:instance-name).
      2. Set the dataproc:ranger.cloud-sql.root.password.uri cluster property to the Cloud Storage URI of the KMS-key encrypted root password of the Cloud SQL instance.
      3. Set the dataproc:ranger.cloud-sql.use-private-ip cluster property to indicate whether the connection to the Cloud SQL instance is over private IP.

      The Ranger component uses Cloud SQL Proxy to connect to the Cloud SQL instance. To use the proxy:

      1. Set the sqlservice.admin API scope when you create the cluster (see Authorizing requests with OAuth 2.0). If you use the gcloud dataproc clusters create command, add the --scopes=default,sql-admin flag.
      2. Enable the SQL Admin API in your project.
      3. Make sure the cluster service account has the Cloud SQL Editor role.
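The Cloud SQL properties and the proxy scope above can be combined into one cluster creation command. The sketch below wraps it in a hypothetical function, create_ranger_cluster_cloudsql, with an assumed DRY_RUN switch that prints the command instead of running it. Passing true or false for the private IP property is an assumption, and every argument value is a placeholder.

```shell
# Hypothetical helper: runs "$@", or only prints it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: creates a Solr + Ranger cluster whose Ranger database is
# an external Cloud SQL instance reached through the Cloud SQL Proxy.
# Usage: create_ranger_cluster_cloudsql CLUSTER REGION KMS_KEY_URI ADMIN_PW_URI \
#            SQL_CONNECTION_NAME SQL_ROOT_PW_URI USE_PRIVATE_IP
create_ranger_cluster_cloudsql() {
  props="dataproc:ranger.kms.key.uri=$3"
  props="$props,dataproc:ranger.admin.password.uri=$4"
  props="$props,dataproc:ranger.cloud-sql.instance.connection.name=$5"
  props="$props,dataproc:ranger.cloud-sql.root.password.uri=$6"
  props="$props,dataproc:ranger.cloud-sql.use-private-ip=$7"

  # The sql-admin scope is required for the Cloud SQL Proxy (see the steps above).
  run gcloud dataproc clusters create "$1" \
    --region="$2" \
    --optional-components=SOLR,RANGER \
    --enable-component-gateway \
    --scopes=default,sql-admin \
    --properties="$props"
}
```

With DRY_RUN=1 set, the function only prints the gcloud command for review; with DRY_RUN unset, it runs it. The function does not enable the SQL Admin API or grant the Cloud SQL Editor role; those steps still apply.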

      gcloud command

      To create a Dataproc cluster that includes the Ranger component, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag.

      gcloud dataproc clusters create cluster-name \
          --optional-components=SOLR,RANGER \
          --region=region \
          --enable-component-gateway \
          --properties="dataproc:ranger.kms.key.uri=projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key,dataproc:ranger.admin.password.uri=gs://my-bucket/admin-password.encrypted" \
          ... other flags
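Optional properties such as dataproc:solr.gcs.path and dataproc:ranger.db.admin.password.uri go into the same comma-separated --properties value, so it can help to assemble that value in one place. The following is a minimal sketch; the function name ranger_properties is hypothetical, and it assumes none of the values contain commas.

```shell
# Hypothetical helper (not part of gcloud): builds the --properties value for a
# Ranger cluster. Empty optional arguments are skipped.
# Usage: ranger_properties KMS_KEY_URI ADMIN_PW_URI [SOLR_GCS_PATH] [DB_ADMIN_PW_URI]
ranger_properties() {
  # Required: KMS key and the Cloud Storage URI of the encrypted admin password.
  props="dataproc:ranger.kms.key.uri=$1,dataproc:ranger.admin.password.uri=$2"

  # Optional: keep Solr data, including Ranger audit logs, in Cloud Storage.
  if [ -n "${3:-}" ]; then
    props="$props,dataproc:solr.gcs.path=$3"
  fi

  # Optional: encrypted Ranger database admin password in Cloud Storage.
  if [ -n "${4:-}" ]; then
    props="$props,dataproc:ranger.db.admin.password.uri=$4"
  fi

  printf '%s\n' "$props"
}

# Example (project-id, region, and bucket names are placeholders):
# gcloud dataproc clusters create cluster-name \
#     --optional-components=SOLR,RANGER \
#     --region=region \
#     --enable-component-gateway \
#     --properties="$(ranger_properties \
#         projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key \
#         gs://my-bucket/admin-password.encrypted \
#         gs://my-bucket)"
```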
      

      REST API

      Specify the Ranger and Solr components in the SoftwareConfig.optionalComponents field as part of a Dataproc API clusters.create request. You must also set the following cluster properties in the SoftwareConfig.properties field:

      1. dataproc:ranger.kms.key.uri: "projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key"
      2. dataproc:ranger.admin.password.uri: "gs://my-bucket/admin-password.encrypted"
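As a sketch of the REST request, the hypothetical function below prints a minimal clusters.create request body. Values are inserted without JSON escaping (so they must not contain double quotes or backslashes), and setting endpointConfig.enableHttpPortAccess mirrors the --enable-component-gateway flag in the gcloud example. The project ID, region, and other values in the example call are placeholders.

```shell
# Hypothetical helper: prints a minimal clusters.create request body that
# enables the Solr and Ranger optional components and the Ranger properties.
# Usage: ranger_cluster_request PROJECT_ID CLUSTER_NAME KMS_KEY_URI ADMIN_PW_URI
ranger_cluster_request() {
  cat <<EOF
{
  "projectId": "$1",
  "clusterName": "$2",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["SOLR", "RANGER"],
      "properties": {
        "dataproc:ranger.kms.key.uri": "$3",
        "dataproc:ranger.admin.password.uri": "$4"
      }
    },
    "endpointConfig": {
      "enableHttpPortAccess": true
    }
  }
}
EOF
}

# Example call:
# ranger_cluster_request project-id my-cluster \
#     projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key \
#     gs://my-bucket/admin-password.encrypted > request.json
# curl -X POST \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     -H "Content-Type: application/json" \
#     -d @request.json \
#     "https://dataproc.googleapis.com/v1/projects/project-id/regions/region/clusters"
```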

      Console

      1. Enable the component and component gateway.
        • In the Google Cloud console, open the Dataproc Create a cluster page. The Set up cluster panel is selected.
        • In the Components section:

On the cluster's Cluster details page in the Google Cloud console, click the Web interfaces tab. Under Component gateway, click Ranger to open the Ranger web interface. Log in with the Ranger admin username (for example, "admin") and password.

Ranger Admin logs

Ranger admin logs are available in Logging as ranger-admin-root logs.
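To read these logs from the command line, you can pass a filter to gcloud logging read. The sketch below builds such a filter for one cluster; the helper name is hypothetical, and the exact logName (projects/PROJECT_ID/logs/ranger-admin-root) and the cloud_dataproc_cluster resource labels are assumptions to verify against the entries you see in Logs Explorer.

```shell
# Hypothetical helper: builds a Cloud Logging filter for one cluster's Ranger
# admin logs. The logName pattern is an assumption based on the
# "ranger-admin-root" log name.
# Usage: ranger_admin_log_filter PROJECT_ID CLUSTER_NAME
ranger_admin_log_filter() {
  printf '%s\n' "resource.type=\"cloud_dataproc_cluster\" AND resource.labels.cluster_name=\"$2\" AND logName=\"projects/$1/logs/ranger-admin-root\""
}

# Example (project-id and cluster-name are placeholders):
# gcloud logging read "$(ranger_admin_log_filter project-id cluster-name)" --limit=20
```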