Dataproc Ranger Component

You can install additional components when you create a Dataproc cluster using the Optional Components feature. This page describes the Ranger component.

The Apache Ranger component is an open source framework for managing permissions and auditing across the Hadoop ecosystem. The Ranger admin server and Web UI are available on port 6080 on the cluster's first master node.

Install the component

Install the component when you create a Dataproc cluster. Components can be added to clusters created with Dataproc version 1.3 and later. The Ranger component requires the installation of the Solr component as shown below.

See Supported Cloud Dataproc versions for the component version included in each Dataproc image release.

Installation steps:

  1. Set up your Ranger admin password:

    1. Grant the Cloud KMS CryptoKey Decrypter role to the cluster service account (see Granting roles to a service account for specific resources); the cluster only needs to decrypt the password. This account is of the form:
      project-number-compute@developer.gserviceaccount.com
      
      1. Example:
        gcloud projects add-iam-policy-binding project-id \
            --member serviceAccount:project-number-compute@developer.gserviceaccount.com \
            --role roles/cloudkms.cryptoKeyDecrypter
        
    2. Encrypt your Ranger admin user's password using a Cloud Key Management Service (KMS) key. The password must be at least 8 characters long and include at least one letter and one number.
      1. Example:
        1. Create the key ring:
          gcloud kms keyrings create my-keyring --location global
          
        2. Create the key:
          gcloud kms keys create my-key \  
              --location global \  
              --keyring my-keyring \  
              --purpose encryption
          
        3. Encrypt your Ranger admin user password:
          echo "my-ranger-admin-password" | \  
            gcloud kms encrypt \  
              --location=global \  
              --keyring=my-keyring \  
              --key=my-key \  
              --plaintext-file=- \  
              --ciphertext-file=admin-password.encrypted
          
    3. Upload the encrypted password to a Cloud Storage bucket in your project.
      1. Example:
        gsutil cp admin-password.encrypted gs://my-bucket
        
  2. Create your cluster:

    1. When installing the Ranger component, the Solr component must also be installed, as shown below.
      1. The Ranger component relies on the Solr component to store and query its audit logs. By default, Solr stores this data in HDFS, which is deleted when the cluster is deleted. To store Solr data, including the Ranger audit logs, on Cloud Storage instead, set the dataproc:solr.gcs.path=gs://<bucket> cluster property when you create your cluster; Cloud Storage data persists after the cluster is deleted.
    2. Pass the KMS key and password Cloud Storage URIs to the cluster creation command by setting the dataproc:ranger.kms.key.uri and dataproc:ranger.admin.password.uri cluster properties.
    3. Optionally, you can pass in the Ranger database's admin user password through an encrypted Cloud Storage file URI by setting the dataproc:ranger.db.admin.password.uri cluster property.
    4. By default, the Ranger component uses the MySQL database instance running on the cluster's first master node. To persist the Ranger database after cluster deletion, use a Cloud SQL instance as the external MySQL database.

      1. Set the dataproc:ranger.cloud-sql.instance.connection.name cluster property to the Cloud SQL instance.
      2. Set the dataproc:ranger.cloud-sql.root.password.uri cluster property to the Cloud Storage URI of the KMS-key encrypted root password of the Cloud SQL instance.
      3. Set the dataproc:ranger.cloud-sql.use-private-ip cluster property to indicate whether the connection to the Cloud SQL instance is over private IP.

      The Ranger component uses Cloud SQL Proxy to connect to the Cloud SQL instance. To use the proxy:

      1. Set the sqlservice.admin API scope when you create the cluster (if using the gcloud dataproc clusters create command, add the --scopes=default,sql-admin parameter).
      2. Enable the SQL Admin API in your project.
      3. Give the Cloud SQL Editor role to the Compute Engine default service account in your project.
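
      The Cloud SQL settings above can be combined into a single cluster creation command. The following sketch is illustrative only; the project, region, key ring, key, bucket, and Cloud SQL instance names are placeholders:

      gcloud beta dataproc clusters create cluster-name \
          --optional-components=SOLR,RANGER \
          --region=region \
          --enable-component-gateway \
          --scopes=default,sql-admin \
          --properties="dataproc:ranger.kms.key.uri=projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key,dataproc:ranger.admin.password.uri=gs://my-bucket/admin-password.encrypted,dataproc:ranger.cloud-sql.instance.connection.name=project-id:region:my-instance,dataproc:ranger.cloud-sql.root.password.uri=gs://my-bucket/root-password.encrypted"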

      gcloud command

      To create a Dataproc cluster that includes the Ranger component, use the gcloud beta dataproc clusters create cluster-name command with the --optional-components flag.

      gcloud beta dataproc clusters create cluster-name \  
          --optional-components=SOLR,RANGER \  
          --region=region \  
          --enable-component-gateway \  
          --properties="dataproc:ranger.kms.key.uri=projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key,dataproc:ranger.admin.password.uri=gs://my-bucket/admin-password.encrypted" \  
          ... other flags
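
      To also persist the Ranger audit logs in Cloud Storage (see the Solr note in step 2), the same command can set the dataproc:solr.gcs.path cluster property. In this sketch, the bucket and other resource names are placeholders:

      gcloud beta dataproc clusters create cluster-name \
          --optional-components=SOLR,RANGER \
          --region=region \
          --enable-component-gateway \
          --properties="dataproc:solr.gcs.path=gs://my-bucket,dataproc:ranger.kms.key.uri=projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key,dataproc:ranger.admin.password.uri=gs://my-bucket/admin-password.encrypted"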
      

      REST API

      Specify the Ranger and Solr components in the SoftwareConfig.Component field as part of a Dataproc API clusters.create request. You must also set the following cluster properties in the SoftwareConfig.properties field:

      1. dataproc:ranger.kms.key.uri: "projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key"
      2. dataproc:ranger.admin.password.uri: "gs://my-bucket/admin-password.encrypted"
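
      For illustration, the softwareConfig portion of such a request body might look like the following sketch (the project, key ring, key, and bucket names are placeholders):

      "softwareConfig": {
        "optionalComponents": ["SOLR", "RANGER"],
        "properties": {
          "dataproc:ranger.kms.key.uri": "projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key",
          "dataproc:ranger.admin.password.uri": "gs://my-bucket/admin-password.encrypted"
        }
      }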

      Console

      Installing the Ranger component from the Cloud Console is currently not supported.

Ranger Admin logs

Ranger admin logs are available in Logging as ranger-admin-root logs.
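
For example, the logs might be queried with the gcloud logging read command; the filter below is an assumption and may need adjustment for your project:

  gcloud logging read \
      'resource.type="cloud_dataproc_cluster" AND log_name:"ranger-admin-root"' \
      --project=project-id \
      --limit=10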