You can install additional components like Ranger when you create a Dataproc cluster using the Optional components feature. This page describes the Ranger component.
The Apache Ranger component is an open source framework to manage permissions and auditing for the Hadoop ecosystem. The Ranger admin server and Web UI are available on port 6080 on the cluster's first master node.
Install the component
Install the component when you create a Dataproc cluster. Components can be added to clusters created with Dataproc version 1.3 and later. The Ranger component requires the installation of the Solr component as shown below.
See Supported Dataproc versions for the component version included in each Dataproc image release.
Installation steps:
Set up your Ranger admin password:
- Grant the Cloud KMS CryptoKey Encrypter/Decrypter role to the cluster service account. By default, the cluster service account is the Compute Engine default service account, which has the following form:
    project-number-compute@developer.gserviceaccount.com
  You can specify a different cluster service account when you create the cluster.
  - Example: Grant the Cloud KMS CryptoKey Encrypter/Decrypter role to the Compute Engine default service account:
    gcloud projects add-iam-policy-binding project-id \
        --member=serviceAccount:project-number-compute@developer.gserviceaccount.com \
        --role=roles/cloudkms.cryptoKeyDecrypter
- Encrypt your Ranger admin user's password using a
Key Management Service (KMS) key.
For pre-2.2 image version clusters, the password must consist of at least
8 characters, with at least one alphabetic and one numeric character. For
2.2 and later image version clusters, the password must consist of at least
8 characters, with at least one uppercase letter, one lowercase letter,
and one numeric character.
  - Example:
    - Create the key ring:
      gcloud kms keyrings create my-keyring --location=global
    - Create the key:
      gcloud kms keys create my-key \
          --location=global \
          --keyring=my-keyring \
          --purpose=encryption
    - Encrypt your Ranger admin user password:
      echo 'my-ranger-admin-password' | \
        gcloud kms encrypt \
          --location=global \
          --keyring=my-keyring \
          --key=my-key \
          --plaintext-file=- \
          --ciphertext-file=admin-password.encrypted
- Upload the encrypted password to a
Cloud Storage bucket in your project.
  - Example:
    gcloud storage cp admin-password.encrypted gs://my-bucket
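The password rules above can be checked locally before you encrypt the password. The following is a minimal sketch, assuming the rules for 2.2 and later image versions; `check_ranger_password` is a hypothetical helper written for illustration, not part of gcloud or Dataproc.

```shell
# Hypothetical helper (not part of gcloud or Dataproc): returns 0 when the
# candidate password meets the rules stated for 2.2 and later image versions.
check_ranger_password() {
  # At least 8 characters.
  [ "${#1}" -ge 8 ] || return 1
  # At least one uppercase letter, one lowercase letter, and one digit.
  case "$1" in *[[:upper:]]*) ;; *) return 1 ;; esac
  case "$1" in *[[:lower:]]*) ;; *) return 1 ;; esac
  case "$1" in *[[:digit:]]*) ;; *) return 1 ;; esac
  return 0
}

# Example candidate (a placeholder value, not a recommended password).
if check_ranger_password 'Example-Passw0rd'; then
  echo "password meets the 2.2+ rules"
else
  echo "password does not meet the 2.2+ rules"
fi
```

For pre-2.2 image versions the rules are looser (one alphabetic and one numeric character), so a password that passes this check also satisfies them.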
Create your cluster:
- When installing the Ranger component, the
Solr component must also be
installed, as shown below.
- The Ranger component relies on the Solr component to store and query
its audit logs, which by default uses HDFS as storage. This HDFS
data is deleted when the cluster is deleted. To configure
the Solr component to store data, including the Ranger audit logs,
on Cloud Storage, use the
dataproc:solr.gcs.path=gs://<bucket>
cluster property when you create your cluster. Cloud Storage data persists after the cluster is deleted.
- Pass the KMS key and password Cloud Storage URIs to the cluster creation command by setting the dataproc:ranger.kms.key.uri and dataproc:ranger.admin.password.uri cluster properties.
- Optionally, you can pass in the Ranger database's admin user password through an encrypted Cloud Storage file URI by setting the dataproc:ranger.db.admin.password.uri cluster property.
- By default, the Ranger component uses the MySQL database instance running on the cluster's first master node. In the MySQL instance, enable the log_bin_trust_function_creators flag by setting the variable to ON. This flag controls whether stored function creators can be trusted. After successful cluster creation and Ranger configuration, you can reset log_bin_trust_function_creators to OFF. To persist the Ranger database after cluster deletion, use a Cloud SQL instance as the external MySQL database.
  - Set the dataproc:ranger.cloud-sql.instance.connection.name cluster property to the Cloud SQL instance connection name.
  - Set the dataproc:ranger.cloud-sql.root.password.uri cluster property to the Cloud Storage URI of the KMS-key encrypted root password of the Cloud SQL instance.
  - Set the dataproc:ranger.cloud-sql.use-private-ip cluster property to indicate whether the connection to the Cloud SQL instance is over private IP.
The Ranger component uses Cloud SQL Proxy to connect to the Cloud SQL instance. To use the proxy:
  - Set the sqlservice.admin API scope when you create the cluster (see Authorizing requests with OAuth 2.0). If using the gcloud dataproc clusters create command, add the --scopes=default,sql-admin parameter.
  - Enable the SQL Admin API in your project.
  - Make sure the cluster service account has the Cloud SQL Editor role.
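The Cloud SQL settings above can be collected into the flags that you add to the cluster creation command. This is a sketch only: the instance connection name, the bucket path, and the `false` value for the private IP property are placeholder assumptions, and the script prints the flags rather than creating a cluster.

```shell
# Placeholder values (assumptions for illustration); replace with your own.
INSTANCE_CONNECTION_NAME="project-id:region:my-sql-instance"
ROOT_PASSWORD_URI="gs://my-bucket/cloud-sql-root-password.encrypted"
USE_PRIVATE_IP="false"   # assumed boolean-style value

# Join the three Cloud SQL cluster properties into one comma-separated value.
CLOUD_SQL_PROPS="dataproc:ranger.cloud-sql.instance.connection.name=${INSTANCE_CONNECTION_NAME}"
CLOUD_SQL_PROPS="${CLOUD_SQL_PROPS},dataproc:ranger.cloud-sql.root.password.uri=${ROOT_PASSWORD_URI}"
CLOUD_SQL_PROPS="${CLOUD_SQL_PROPS},dataproc:ranger.cloud-sql.use-private-ip=${USE_PRIVATE_IP}"

# Print the extra pieces for the cluster creation command (dry run).
echo "--scopes=default,sql-admin"
echo "properties to append: ${CLOUD_SQL_PROPS}"
```

The printed properties are meant to be appended, comma-separated, to the same --properties value that carries dataproc:ranger.kms.key.uri and dataproc:ranger.admin.password.uri.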
gcloud command
To create a Dataproc cluster that includes the Ranger component, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag.
    gcloud dataproc clusters create cluster-name \
        --optional-components=SOLR,RANGER \
        --region=region \
        --enable-component-gateway \
        --properties="dataproc:ranger.kms.key.uri=projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key,dataproc:ranger.admin.password.uri=gs://my-bucket/admin-password.encrypted" \
        ... other flags
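To keep the Ranger audit logs after the cluster is deleted, the dataproc:solr.gcs.path property described earlier can be added to the same --properties value. The sketch below assembles the value from shell variables and prints the resulting command as a dry run; reusing gs://my-bucket for the Solr data is an assumption made for illustration.

```shell
# Values taken from the examples on this page; replace with your own.
KMS_KEY_URI="projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key"
ADMIN_PASSWORD_URI="gs://my-bucket/admin-password.encrypted"
SOLR_GCS_PATH="gs://my-bucket"   # assumption: reuse the password bucket

# Build the comma-separated cluster properties value.
PROPERTIES="dataproc:ranger.kms.key.uri=${KMS_KEY_URI}"
PROPERTIES="${PROPERTIES},dataproc:ranger.admin.password.uri=${ADMIN_PASSWORD_URI}"
PROPERTIES="${PROPERTIES},dataproc:solr.gcs.path=${SOLR_GCS_PATH}"

# Dry run: print the command instead of creating a cluster.
echo gcloud dataproc clusters create cluster-name \
  --optional-components=SOLR,RANGER \
  --region=region \
  --enable-component-gateway \
  "--properties=${PROPERTIES}"
```

Remove the leading `echo` to run the command once the placeholder values are replaced.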
REST API
Specify the Ranger and Solr components in the SoftwareConfig.Component field as part of a Dataproc API clusters.create request. You must also set the following cluster properties in the SoftwareConfig.properties field:
dataproc:ranger.kms.key.uri: "projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key"
dataproc:ranger.admin.password.uri: "gs://my-bucket/admin-password.encrypted"
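As an illustration, a clusters.create request body with these settings might look like the following. This is a sketch, not a complete request: it shows only the softwareConfig portion plus the cluster name, and the placeholder values come from the examples on this page.

```shell
# Build a minimal clusters.create request body (illustrative, not complete).
REQUEST_BODY=$(cat <<'EOF'
{
  "clusterName": "cluster-name",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["SOLR", "RANGER"],
      "properties": {
        "dataproc:ranger.kms.key.uri": "projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key",
        "dataproc:ranger.admin.password.uri": "gs://my-bucket/admin-password.encrypted"
      }
    }
  }
}
EOF
)

# Print the body so it can be reviewed or saved to a file.
printf '%s\n' "$REQUEST_BODY"
```

The body would be sent as an authenticated POST to the projects.regions.clusters.create endpoint for your project and region.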
Console
- Enable the component and component gateway.
- In the Google Cloud console, open the Dataproc Create a cluster page. The Set up cluster panel is selected.
- In the Components section:
- Under Optional components, select Ranger, Solr, and other optional components to install on your cluster.
- Under Component Gateway, select Enable component gateway (see Viewing and Accessing Component Gateway URLs).
Click the Web interfaces tab. Under Component gateway, click Ranger to open the Ranger web interface. Log in with the Ranger admin username (for example, "admin") and password.
Ranger Admin logs
Ranger admin logs are available in Logging as ranger-admin-root logs.
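One possible way to read these logs from the command line is with gcloud logging read. The sketch below only builds and prints a candidate command; it assumes the log ID matches the ranger-admin-root name given above and that the entries carry the cloud_dataproc_cluster resource type, so adjust the filter if your entries look different.

```shell
# Placeholder cluster name; replace with your own.
CLUSTER_NAME="cluster-name"

# Assumed filter: Dataproc cluster resource, the Ranger admin log ID,
# and the cluster name label.
LOG_FILTER='resource.type="cloud_dataproc_cluster" AND log_id("ranger-admin-root")'
LOG_FILTER="${LOG_FILTER} AND resource.labels.cluster_name=\"${CLUSTER_NAME}\""

# Dry run: print the command instead of querying Logging.
printf 'gcloud logging read %s --limit=20\n' "'${LOG_FILTER}'"
```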