You can install additional components when you create a Dataproc cluster using the Optional components feature. This page describes the Ranger component.
The Apache Ranger
component is an open source framework for managing permissions and auditing in the
Hadoop ecosystem. The Ranger admin server and Web UI are available on port
6080 on the cluster's first master node.
Install the component
Install the component when you create a Dataproc cluster. Components can be added to clusters created with Dataproc version 1.3 and later. The Ranger component requires the installation of the Solr component as shown below.
See Supported Dataproc versions for the component version included in each Dataproc image release.
Set up your Ranger admin password:
- Grant the
Cloud KMS CryptoKey Encrypter/Decrypter role
to the cluster service account. By default, the cluster service
account is set as the Compute Engine default service account, which has the following form:
project-number-compute@developer.gserviceaccount.com
You can specify a different cluster service account when you create the cluster, below.
Grant the Cloud KMS CryptoKey Encrypter/Decrypter role
to the Compute Engine default service account:
gcloud projects add-iam-policy-binding project-id \
    --member=serviceAccount:project-number-compute@developer.gserviceaccount.com \
    --role=roles/cloudkms.cryptoKeyDecrypter
- Encrypt your Ranger admin user's password using a
Key Management Service (KMS) key. Your
password must consist of at least 8 characters with a minimum of one
alphabetic and one numeric character.
- Create the key ring:
gcloud kms keyrings create my-keyring --location=global
- Create the key:
gcloud kms keys create my-key \
    --location=global \
    --keyring=my-keyring \
    --purpose=encryption
- Encrypt your Ranger admin user password:
echo "my-ranger-admin-password" | \
    gcloud kms encrypt \
    --location=global \
    --keyring=my-keyring \
    --key=my-key \
    --plaintext-file=- \
    --ciphertext-file=admin-password.encrypted
- Upload the encrypted password to a
Cloud Storage bucket in your project.
gsutil cp admin-password.encrypted gs://my-bucket
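The password rule stated above (at least 8 characters, with at least one alphabetic and one numeric character) can be checked locally before you encrypt the password. The following is a minimal sketch, not part of Dataproc or gcloud; the function name `is_valid_ranger_password` is hypothetical, and it checks only the rule as written on this page.

```shell
# Hypothetical helper: returns 0 if the password satisfies the rule stated
# on this page (>= 8 characters, at least one letter and one digit).
is_valid_ranger_password() {
  pw=$1
  # Length check: at least 8 characters.
  [ "${#pw}" -ge 8 ] || return 1
  # At least one alphabetic character.
  case $pw in
    *[A-Za-z]*) ;;
    *) return 1 ;;
  esac
  # At least one numeric character.
  case $pw in
    *[0-9]*) ;;
    *) return 1 ;;
  esac
  return 0
}
```

For example, you could run `is_valid_ranger_password "$PASSWORD" || echo "password does not meet the Ranger rule"` before piping the password to `gcloud kms encrypt`, where `PASSWORD` is a variable you define yourself.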
Create your cluster:
- When installing the Ranger component, the
Solr component must also be
installed, as shown below.
- The Ranger component relies on the Solr component to store and query
its audit logs, which by default uses HDFS as storage. This HDFS
data is deleted when the cluster is deleted. To configure
the Solr component to store data, including the Ranger audit logs,
on Cloud Storage, use the
dataproc:solr.gcs.path=gs://<bucket> cluster property when you create your cluster. Cloud Storage data persists after the cluster is deleted.
- Pass the KMS key and password Cloud Storage URIs to the
cluster creation command by setting the
dataproc:ranger.kms.key.uri and dataproc:ranger.admin.password.uri cluster properties.
- Optionally, you can pass in the Ranger database's admin user password
through an encrypted Cloud Storage file URI
by setting the
- By default, the Ranger component uses the MySQL database instance running
on the cluster's first master node. In the MySQL instance, enable the
log_bin_trust_function_creators flag by setting the variable to
ON. This flag controls whether stored function creators can be trusted. After successful cluster creation and Ranger configuration, you can reset the log_bin_trust_function_creators flag to OFF.
To persist the Ranger database after cluster deletion, use a Cloud SQL instance as the external MySQL database.
- Set the
dataproc:ranger.cloud-sql.instance.connection.name cluster property to the Cloud SQL instance.
- Set the
dataproc:ranger.cloud-sql.root.password.uri cluster property to the Cloud Storage URI of the KMS-key encrypted root password of the Cloud SQL instance.
- Set the
dataproc:ranger.cloud-sql.use-private-ip cluster property to indicate whether the connection to the Cloud SQL instance is over private IP.
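The three Cloud SQL properties listed above are passed together as a comma-separated list, in the same way as the other cluster properties. The following is a minimal sketch of assembling that list; the helper name `ranger_cloud_sql_properties` is hypothetical, and the sketch assumes that the private IP property takes the values `true` or `false`, which this page does not state explicitly.

```shell
# Hypothetical helper: prints a comma-separated list of the three Cloud SQL
# cluster properties named on this page.
#   $1: Cloud SQL instance connection name
#   $2: Cloud Storage URI of the KMS-encrypted root password
#   $3: "true" or "false" (assumed value format; defaults to "false" here)
ranger_cloud_sql_properties() {
  connection_name=$1
  root_password_uri=$2
  use_private_ip=${3:-false}
  case $use_private_ip in
    true|false) ;;
    *)
      echo "use_private_ip must be true or false" >&2
      return 1
      ;;
  esac
  printf '%s,%s,%s\n' \
    "dataproc:ranger.cloud-sql.instance.connection.name=${connection_name}" \
    "dataproc:ranger.cloud-sql.root.password.uri=${root_password_uri}" \
    "dataproc:ranger.cloud-sql.use-private-ip=${use_private_ip}"
}
```

The printed string can be appended, after a comma, to the value you pass as cluster properties when you create the cluster.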
The Ranger component uses Cloud SQL Proxy to connect to the Cloud SQL instance. To use the proxy:
- Set the
sqlservice.admin API scope when you create the cluster (see Authorizing requests with OAuth 2.0). If using the
gcloud dataproc clusters create command, add the
- Enable the SQL Admin API in your project.
- Make sure the cluster service account has the Cloud SQL Editor role.
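The last two proxy prerequisites above (enabling the SQL Admin API and granting the Cloud SQL Editor role) can be done from the command line. The following is a dry-run sketch that only prints the commands for review instead of running them; the helper name `print_cloud_sql_proxy_setup` is hypothetical, and the sketch assumes that `sqladmin.googleapis.com` is the service name of the SQL Admin API and that `roles/cloudsql.editor` is the ID of the Cloud SQL Editor role.

```shell
# Hypothetical dry-run helper: prints, but does not execute, the gcloud
# commands for the two prerequisites.
#   $1: project ID
#   $2: email address of the cluster service account
print_cloud_sql_proxy_setup() {
  project_id=$1
  service_account=$2
  # Enable the SQL Admin API in the project (assumed service name).
  printf 'gcloud services enable sqladmin.googleapis.com --project=%s\n' \
    "$project_id"
  # Grant the Cloud SQL Editor role to the cluster service account.
  printf 'gcloud projects add-iam-policy-binding %s --member=serviceAccount:%s --role=roles/cloudsql.editor\n' \
    "$project_id" "$service_account"
}
```

Review the printed commands and run them yourself once the project ID and service account are correct for your environment.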
To create a Dataproc cluster that includes the Ranger component, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag.
gcloud dataproc clusters create cluster-name \
    --optional-components=SOLR,RANGER \
    --region=region \
    --enable-component-gateway \
    --properties="dataproc:ranger.kms.key.uri=projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key,dataproc:ranger.admin.password.uri=gs://my-bucket/admin-password.encrypted" \
    ... other flags
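Because the properties value in the command above is long and easy to mistype, it can help to assemble it from its parts. The following is a minimal sketch; the helper name `ranger_properties` and its argument order are hypothetical, and it assumes the key ring is in the global location, as in the key creation steps on this page.

```shell
# Hypothetical helper: prints the two Ranger cluster properties as a
# comma-separated list.
#   $1: project ID
#   $2: key ring name
#   $3: key name
#   $4: Cloud Storage URI of the encrypted admin password file
ranger_properties() {
  project_id=$1
  keyring=$2
  key=$3
  password_uri=$4
  # KMS key resource name, assuming the key ring is in the global location.
  key_uri="projects/${project_id}/locations/global/keyRings/${keyring}/cryptoKeys/${key}"
  printf '%s,%s\n' \
    "dataproc:ranger.kms.key.uri=${key_uri}" \
    "dataproc:ranger.admin.password.uri=${password_uri}"
}
```

You could then pass `--properties="$(ranger_properties project-id my-keyring my-key gs://my-bucket/admin-password.encrypted)"` to the cluster creation command.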
Specify the Ranger and Solr components in the SoftwareConfig.Component field as part of a Dataproc API clusters.create request. You must also set the following cluster properties in the SoftwareConfig.Component.properties field:
- Enable the component and component gateway.
- In the Cloud console, open the Dataproc Create a cluster page. The Set up cluster panel is selected.
- In the Components section:
- Under Optional components, select Ranger, Solr, and other optional components to install on your cluster.
- Under Component Gateway, select Enable component gateway (see Viewing and Accessing Component Gateway URLs).
Click the Web interfaces tab. Under Component gateway, click Ranger to open the Ranger web interface. Log in with the Ranger admin username (for example, "admin") and password.
Ranger Admin logs
Ranger admin logs are available in