This tutorial shows how to deploy and configure Datos IO RecoverX to protect your MongoDB database cluster. This tutorial assumes that your MongoDB database is already deployed and fully operational.
For typical deployments, RecoverX is located in the same project as the MongoDB cluster that needs to be protected. The following diagram shows a representative deployment.
Deploying RecoverX in a different project requires SSH connectivity from the RecoverX node to all of the compute instances on which MongoDB is deployed. RecoverX connects to the MongoDB nodes through SSH and uses standard MongoDB APIs to extract the data and the oplogs. RecoverX deploys a lightweight agent on each of the selected MongoDB nodes to facilitate interfacing with the MongoDB cluster. Data is streamed in parallel to Cloud Storage and then processed offline in batch mode. After the data is copied, RecoverX processes it into a cluster-consistent golden copy of the database that contains no duplicate replica data.
Objectives
- Create a Compute Engine instance for MongoDB.
- Create a Cloud Storage bucket to store your backup data.
- Create a Compute Engine instance for RecoverX.
- Configure the MongoDB database.
- Deploy RecoverX and connect from a remote location to configure backup policies.
Costs
This tutorial uses billable components of Google Cloud, including:
- Compute Engine
- Cloud Storage
Use the Pricing Calculator to generate a cost estimate based on your projected usage of Google Cloud.
In addition, RecoverX is licensed directly through Datos IO, based on the physical size of the database that needs to be protected. For pricing-related questions, contact rdio-sales@rubrik.com.
Before you begin
Select or create a Google Cloud project.
Enable billing for your project.
When you finish this tutorial, you can avoid continued billing by deleting the resources you created. See Cleaning up for more detail.
Provisioning your infrastructure for the tutorial
To complete this tutorial, create Compute Engine instances for the MongoDB nodes in cluster 1, a Cloud Storage bucket to store the backup data, and a Compute Engine instance for RecoverX in cluster 2.
Create a Compute Engine instance for the MongoDB nodes
Create a Compute Engine instance. Use n1-standard-4 or the machine type that best meets your requirements.
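The tutorial does not include a command for this step. The following minimal sketch assumes you use the gcloud CLI instead of the Cloud Console. The function name create_mongodb_node and the instance names and zones in the examples are hypothetical. Image, disk, and network flags are omitted, so gcloud applies its defaults. By default the function only prints the command (RUN=echo). Set RUN to an empty value to execute it.

```shell
#!/usr/bin/env bash
# Dry run by default: RUN=echo prints each command. Use RUN= (empty) to execute.
RUN="${RUN-echo}"

# create_mongodb_node NAME ZONE [MACHINE_TYPE]
# Hypothetical helper: creates one Compute Engine instance for a MongoDB node.
create_mongodb_node() {
  local name="$1" zone="$2" machine_type="${3:-n1-standard-4}"
  # $RUN is intentionally unquoted so that an empty value disappears.
  $RUN gcloud compute instances create "$name" \
    --zone="$zone" \
    --machine-type="$machine_type"
}
```

For example, `create_mongodb_node mongodb-node-1 us-central1-a` prints the command for one node. Call the function once per MongoDB node.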
Create and configure a Cloud Storage bucket
Create a Cloud Storage bucket to store your backup data.
Create the bucket with the same service account that the Compute Engine instances for the MongoDB nodes use.
Grant that service account read and write access to the bucket with the role of Editor. You can do this through either an Identity and Access Management (IAM) role binding or an Access Control List (ACL).
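As a gcloud-based alternative to the Console steps above, here is a hedged sketch. The helper name, bucket name, location, and service account email are hypothetical. The sketch grants roles/storage.objectAdmin, which gives read and write access to objects in the bucket. That role is a narrower stand-in for the Editor role named above, so substitute the role your organization requires.

```shell
#!/usr/bin/env bash
# Dry run by default: RUN=echo prints each command. Use RUN= (empty) to execute.
RUN="${RUN-echo}"

# create_backup_bucket BUCKET LOCATION SERVICE_ACCOUNT_EMAIL
# Hypothetical helper: creates the backup bucket and grants the service account access.
create_backup_bucket() {
  local bucket="$1" location="$2" sa="$3"
  # Create the bucket that receives the RecoverX backup data.
  $RUN gcloud storage buckets create "gs://$bucket" --location="$location"
  # Assumption: roles/storage.objectAdmin stands in for the Editor role.
  $RUN gcloud storage buckets add-iam-policy-binding "gs://$bucket" \
    --member="serviceAccount:$sa" \
    --role=roles/storage.objectAdmin
}
```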
Create a Compute Instance for RecoverX
Datos IO RecoverX is scale-out software that is deployed on a single node. The minimum node type is n1-standard-8. The recommended node type for most deployments is n1-standard-16.
Create one Compute Engine instance of type n1-standard-8 using the service account. For the boot disk, select CentOS 7 or RHEL 7.x.
In the Firewall section, select Allow HTTPS traffic.
For the RecoverX Compute Engine instance, create an SSD disk (blank disk) with a capacity of at least 256 GB.
Select the appropriate network.
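Format the file system and mount the volume.
gcloud compute ssh [RECOVERX_NODE_NAME] \
    --command='sudo mkfs -t ext4 [VOLUME_NAME]; sudo mkdir -p /home; sudo mount [VOLUME_NAME] /home'
where:
[RECOVERX_NODE_NAME]
is the name of your RecoverX node.
[VOLUME_NAME]
is the device path of your volume, for example /dev/sdb (check with lsblk).
Mounting the volume on /home hides the existing contents of /home, so do this before you create any user home directories on the node.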
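If you prefer the gcloud CLI to the Console for creating the RecoverX instance, the following sketch covers the settings listed above: an n1-standard-8 machine, a CentOS 7 boot image, the https-server network tag (which is what the Console's Allow HTTPS traffic option applies), and a blank 256 GB SSD persistent disk. The helper name and the disk name derived from the instance name are hypothetical. CentOS 7 public images may no longer be available, in which case substitute an available RHEL 7.x image.

```shell
#!/usr/bin/env bash
# Dry run by default: RUN=echo prints each command. Use RUN= (empty) to execute.
RUN="${RUN-echo}"

# create_recoverx_node NAME ZONE SERVICE_ACCOUNT_EMAIL
# Hypothetical helper for the RecoverX instance and its blank SSD data disk.
create_recoverx_node() {
  local name="$1" zone="$2" sa="$3"
  # Assumption: the centos-7 family in centos-cloud may be retired; swap in RHEL 7.x if needed.
  $RUN gcloud compute instances create "$name" \
    --zone="$zone" \
    --machine-type=n1-standard-8 \
    --service-account="$sa" \
    --scopes=cloud-platform \
    --image-family=centos-7 \
    --image-project=centos-cloud \
    --tags=https-server \
    --create-disk=name="$name-data",size=256GB,type=pd-ssd
}
```

After the instance starts, the extra disk appears as a block device (check with lsblk). That device is the volume you format and mount in the previous step.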
Configuring the MongoDB cluster
To let Datos IO RecoverX protect the MongoDB database cluster and extract data from it, create a Datos IO user account on the cluster and configure authentication for it. The account must meet the following minimum requirements:
- The home directory of datos_db_user must have at least 100 GB of local storage.
- The default shell for datos_db_user must be bash.
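Create a Datos IO user account, such as datos_db_user, on each MongoDB node. To grant the appropriate permissions, make the MongoDB group the user's primary group so that datos_db_user has the same group ID (GID) as the MongoDB user. Also set bash as the default shell.
gcloud compute ssh [MongoDB_NODE_INSTANCE_NAME] \
    --command='sudo useradd -g mongodb -s /bin/bash -m datos_db_user'
where:
[MongoDB_NODE_INSTANCE_NAME]
is the name of your MongoDB node instance.
Replace mongodb with the group that your MongoDB service runs as. For example, MongoDB packages for RHEL-based systems typically use the mongod group.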
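Because the account must exist on every MongoDB node, you might loop over the node names. The following is a minimal sketch. The helper name and node names are hypothetical, and the group argument must match your MongoDB service group.

```shell
#!/usr/bin/env bash
# Dry run by default: RUN=echo prints each command. Use RUN= (empty) to execute.
RUN="${RUN-echo}"

# create_datos_user_on_nodes GROUP NODE...
# Hypothetical helper: creates datos_db_user with GROUP as its primary group
# and bash as its shell on each listed MongoDB node.
create_datos_user_on_nodes() {
  local group="$1"; shift
  local node
  for node in "$@"; do
    $RUN gcloud compute ssh "$node" \
      --command="sudo useradd -g $group -s /bin/bash -m datos_db_user"
  done
}
```

For example: `create_datos_user_on_nodes mongodb mongodb-node-1 mongodb-node-2`.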
Configure authentication for the user you created by using one of the following methods:
- Username and password
- Username and SSH key with passphrase
- Username and SSH access key
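With either SSH key method, the public key of the key pair (generated, for example, with ssh-keygen) has to end up in datos_db_user's ~/.ssh/authorized_keys on every MongoDB node. Here is a minimal sketch of that step. The helper name install_pubkey is hypothetical. Run it as datos_db_user, or as root followed by chown, so that sshd accepts the file ownership.

```shell
#!/usr/bin/env bash
# install_pubkey HOME_DIR PUBKEY_LINE
# Hypothetical helper: appends PUBKEY_LINE to HOME_DIR/.ssh/authorized_keys
# once, with the permissions sshd expects (700 on .ssh, 600 on the file).
install_pubkey() {
  local home="$1" key="$2"
  local dir="$home/.ssh" file="$home/.ssh/authorized_keys"
  mkdir -p "$dir"
  chmod 700 "$dir"
  touch "$file"
  chmod 600 "$file"
  # grep -qxF matches the whole line literally, so the key is added only once.
  grep -qxF -- "$key" "$file" || printf '%s\n' "$key" >> "$file"
}
```

For example: `install_pubkey /home/datos_db_user "$(cat datos_key.pub)"`.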
Give datos_db_user write permission to its /home/datos_db_user home directory on all MongoDB nodes.
gcloud compute ssh [MongoDB_NODE_INSTANCE_NAME] \
    --command='sudo chmod -R u+w /home/datos_db_user'
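Give datos_db_user read and execute permissions on the MongoDB data directory and its parent directory on all MongoDB nodes. The command grants these permissions to the MongoDB group, which datos_db_user belongs to. Because chmod -R is recursive, running it on the parent directory also covers the data directory.
gcloud compute ssh [MongoDB_NODE_INSTANCE_NAME] \
    --command='sudo chmod -R g+rx /var/lib/mongodb'
Replace /var/lib/mongodb with the parent of the storage.dbPath directory that your installation uses.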
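To confirm that the permissions took effect, you could check that every path under the directory has both group read and group execute set. The following sketch uses the hypothetical helper find_missing_group_rx. It prints the first path that fails the check.

```shell
#!/usr/bin/env bash
# find_missing_group_rx DIR
# Hypothetical helper: prints the first path under DIR (DIR included) that
# lacks group read or group execute; returns 1 if one exists, 0 otherwise.
find_missing_group_rx() {
  local first
  # -perm -050: both group-read (040) and group-execute (010) bits are set.
  first="$(find "$1" ! -perm -050 | head -n 1)"
  if [ -n "$first" ]; then
    printf '%s\n' "$first"
    return 1
  fi
  return 0
}
```

Run it on each MongoDB node, for example as `sudo bash -c "$(declare -f find_missing_group_rx); find_missing_group_rx /var/lib/mongodb"`.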
Suppress the SSH login banner for datos_db_user by creating an empty .hushlogin file in the datos_db_user home directory.
gcloud compute ssh [MongoDB_NODE_INSTANCE_NAME] \
    --command='sudo -u datos_db_user touch /home/datos_db_user/.hushlogin'
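For each node of the MongoDB cluster, edit the /etc/ssh/sshd_config file and set the sshd parameter MaxSessions to 500 and MaxStartups to 500:1:500.
MaxSessions 500
MaxStartups 500:1:500
Restart the SSH service so that the running daemon picks up the new values. On systemd-based distributions, for example:
sudo systemctl restart sshd
Verify that sshd reads the new MaxSessions and MaxStartups values.
sudo /usr/sbin/sshd -T | grep -i maxs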
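If you script this edit on many nodes, you need to replace any existing or commented-out line rather than append a duplicate. Here is one way to do that with the hypothetical helper set_sshd_option. It assumes GNU sed and no Match block that overrides these settings.

```shell
#!/usr/bin/env bash
# set_sshd_option FILE KEY VALUE
# Hypothetical helper: replaces an existing (possibly commented-out) "KEY ..."
# line in FILE with "KEY VALUE", or appends "KEY VALUE" if none exists.
set_sshd_option() {
  local file="$1" key="$2" value="$3"
  if grep -qiE "^[#[:space:]]*${key}[[:space:]]" "$file"; then
    # GNU sed: -i edits in place, the I flag makes the match case-insensitive.
    sed -i -E "s/^[#[:space:]]*${key}[[:space:]].*/${key} ${value}/I" "$file"
  else
    printf '%s %s\n' "$key" "$value" >> "$file"
  fi
}
```

Run it as root for both settings. Check the file with `sudo /usr/sbin/sshd -t` before you restart the service, so that a syntax error does not lock you out.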
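To resolve mongos node hostnames from the RecoverX server, connect to the config server and get the hostnames of all mongos instances in the MongoDB cluster. You use these hostnames when you add firewall rules.
mongo --host [CONFIG_IP] --port [CONFIG_PORT] --quiet \
    --eval 'db.getSiblingDB("config").mongos.find().forEach(function (doc) { print(doc._id); })'
where:
[CONFIG_IP]
is the IP address of the config server (the primary of the config server replica set).
[CONFIG_PORT]
is the port of the config server.
Each _id value in the config.mongos collection has the form hostname:port.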
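Suppose you have the mongos _id values (hostname:port strings) one per line. The following sketch extracts the distinct hosts and ports that the firewall rules need. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# mongos_hosts: reads "host:port" lines on stdin and prints each distinct host once.
mongos_hosts() {
  # Strip the trailing ":port", drop empty lines, and de-duplicate.
  sed -E 's/:[0-9]+$//' | grep -v '^$' | sort -u
}

# mongos_ports: reads "host:port" lines on stdin and prints each distinct port once.
mongos_ports() {
  grep -oE ':[0-9]+$' | tr -d ':' | sort -un
}
```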
Setting up communications to the Datos IO RecoverX cluster
In this section, you set up network firewall rules and configure the communication between the MongoDB hosts and the RecoverX cluster.
Set up network ports
Follow the instructions in firewall rules to open the network ports that allow communication between RecoverX, the nodes running MongoDB, and the Cloud Storage bucket. Open the ports in the following table to the compute instance running RecoverX.
Network | Protocols:Ports | Purpose |
---|---|---|
From external sources to RecoverX | TCP:9090 (user defined) | Access Datos IO UI/API |
Open the ports in the following table between every node in the MongoDB cluster and the RecoverX compute instance, so that the MongoDB services and RecoverX can communicate.
Network | Protocols:Ports | Purpose |
---|---|---|
From MongoDB to MongoDB nodes | TCP: (user defined) | From all data nodes Mongod to Mongod |
From RecoverX to MongoDB nodes | TCP:27110 (or user defined) | All Config server ports |
From RecoverX to MongoDB nodes | TCP:27017 (or user defined) | All Mongos ports used |
From RecoverX to MongoDB nodes | TCP: (user defined) | All Mongod data ports used |
From MongoDB nodes to RecoverX | TCP:5672 | Messaging to Application Listener (RabbitMQ) |
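As a hedged sketch, the default ports from both tables could be opened with gcloud firewall rules like the following. The rule names and the network tags recoverx and mongodb are hypothetical. The instances need those tags for the rules to apply, and source tags only match traffic from instances in the same VPC network. The user-defined mongod data ports and the mongod-to-mongod ports are not included. Append them to the last rule's --allow list.

```shell
#!/usr/bin/env bash
# Dry run by default: RUN=echo prints each command. Use RUN= (empty) to execute.
RUN="${RUN-echo}"

# open_recoverx_ports NETWORK ADMIN_CIDR
# Hypothetical helper; tags "recoverx" and "mongodb" are assumptions.
open_recoverx_ports() {
  local network="$1" admin_cidr="$2"
  # External sources -> RecoverX UI/API on TCP 9090.
  $RUN gcloud compute firewall-rules create recoverx-ui \
    --network="$network" --direction=INGRESS \
    --allow=tcp:9090 --source-ranges="$admin_cidr" --target-tags=recoverx
  # MongoDB nodes -> RecoverX messaging (RabbitMQ) on TCP 5672.
  $RUN gcloud compute firewall-rules create recoverx-rabbitmq \
    --network="$network" --direction=INGRESS \
    --allow=tcp:5672 --source-tags=mongodb --target-tags=recoverx
  # RecoverX -> config server (27110) and mongos (27017) default ports.
  $RUN gcloud compute firewall-rules create recoverx-to-mongodb \
    --network="$network" --direction=INGRESS \
    --allow=tcp:27110,tcp:27017 --source-tags=recoverx --target-tags=mongodb
}
```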
Create a Datos IO user on the RecoverX node
In this section, you create a Datos IO user, such as datos_db_user, on the RecoverX node with the following characteristics:
- A home directory on the non-root volume that you created earlier.
- SSH access to all the nodes in the MongoDB cluster. It's recommended that you provide key-based, passwordless SSH access from the RecoverX node to each node in the MongoDB cluster.
- The same group ID (GID) as the datos_db_user that you created on the MongoDB compute instances. For example, if datos_db_user on the MongoDB instances has GID 1001, datos_db_user on the RecoverX node must also have GID 1001.
Add the Datos IO user to the RecoverX node.
Retrieve the GID of datos_db_user on a MongoDB compute instance.
id -g datos_db_user
On the RecoverX node, create a new group using the GID you retrieved.
sudo groupadd -g [GID] MongoDB
where:
[GID]
is the group ID that you retrieved in the first step.
Add the user.
sudo useradd -g MongoDB -m -d /home/datos_db_user datos_db_user
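To confirm that the GIDs match, collect one "node GID" pair per line from each MongoDB instance and from the RecoverX node, and compare them. The helper name check_same_gid is hypothetical.

```shell
#!/usr/bin/env bash
# check_same_gid: reads "NODE GID" pairs on stdin; succeeds only when every
# node reports the same GID, otherwise prints each mismatch and returns 1.
check_same_gid() {
  awk '
    NF < 2 { next }                  # skip blank or malformed lines
    ref == "" { ref = $2 }           # the first valid line sets the reference GID
    $2 != ref {
      printf "GID mismatch: %s has %s, expected %s\n", $1, $2, ref
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

To gather the input, you could use a loop such as `for n in mongodb-node-1 mongodb-node-2; do echo "$n $(gcloud compute ssh "$n" --command='id -g datos_db_user')"; done` and then add the RecoverX node's own `id -g datos_db_user` result.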
Provide sudo privileges
On the RecoverX node, datos_db_user needs sudo privileges for the following commands:
- /bin/cp
- /sbin/chkconfig
To add this privilege:
Log in as the root user.
Edit the configuration file for sudo access by using the visudo command.
sudo visudo
Append the following line to the file and save the file.
datos_db_user ALL=NOPASSWD: /sbin/chkconfig, /bin/cp
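Instead of editing /etc/sudoers directly, you could put the same rule in a drop-in file. This sketch assumes /etc/sudoers contains the #includedir /etc/sudoers.d line, as CentOS 7 and RHEL 7 do by default. The helper name is hypothetical. sudo skips files in that directory whose names contain a dot or end in a tilde, so the temporary file below is ignored until it is renamed into place.

```shell
#!/usr/bin/env bash
# write_sudoers_dropin FILE
# Hypothetical helper: writes the datos_db_user rule to FILE with mode 0440.
write_sudoers_dropin() {
  local file="$1" tmp
  # Create the temp file in the target directory so the final mv is atomic.
  tmp="$(mktemp "$(dirname "$file")/.sudoers.XXXXXX")"
  printf '%s\n' 'datos_db_user ALL=NOPASSWD: /sbin/chkconfig, /bin/cp' > "$tmp"
  chmod 0440 "$tmp"
  mv -f "$tmp" "$file"
}
```

For example, run `sudo bash -c "$(declare -f write_sudoers_dropin); write_sudoers_dropin /etc/sudoers.d/datos_db_user"` and then validate the file with `sudo visudo -cf /etc/sudoers.d/datos_db_user`.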
Configure RecoverX nodes
Make the following changes in the limits.conf file of all RecoverX nodes.
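Edit the nproc and nofile parameters in /etc/security/limits.conf to match the following entries. Each entry starts with the domain it applies to, here datos_db_user.
datos_db_user hard nproc unlimited
datos_db_user soft nproc unlimited
datos_db_user hard nofile 64000
datos_db_user soft nofile 64000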
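Edit the nproc parameter in /etc/security/limits.d/90-nproc.conf to match the following entries. On CentOS 7 and RHEL 7, this file is usually named /etc/security/limits.d/20-nproc.conf.
datos_db_user hard nproc unlimited
datos_db_user soft nproc unlimited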
Limits apply only to new login sessions, so start a new session as datos_db_user and verify the changes.
ulimit -a
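To script these limits edits without creating duplicate entries, you might use a helper like the following. The name add_limit is hypothetical. It keeps exactly one entry per domain, type, and item.

```shell
#!/usr/bin/env bash
# add_limit FILE DOMAIN TYPE ITEM VALUE
# Hypothetical helper: ensures FILE has exactly one "DOMAIN TYPE ITEM" entry, set to VALUE.
add_limit() {
  local file="$1" domain="$2" type="$3" item="$4" value="$5"
  local tmp
  tmp="$(mktemp)"
  # Drop any existing entry for the same domain/type/item, then append the new one.
  awk -v d="$domain" -v t="$type" -v i="$item" \
    '!($1 == d && $2 == t && $3 == i)' "$file" > "$tmp"
  printf '%s %s %s %s\n' "$domain" "$type" "$item" "$value" >> "$tmp"
  # Write back through the existing file so its owner and mode are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

As root, for example: `for t in hard soft; do add_limit /etc/security/limits.conf datos_db_user "$t" nproc unlimited; add_limit /etc/security/limits.conf datos_db_user "$t" nofile 64000; done`.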
Verify that the /tmp directory has at least 2 GB of free space on each RecoverX node.
df -h /tmp
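For an automated check, compare the available kilobytes that df reports with the 2 GB minimum (2 × 1024 × 1024 = 2097152 KB). The helper name has_free_kb is hypothetical.

```shell
#!/usr/bin/env bash
# has_free_kb PATH MIN_KB
# Hypothetical helper: succeeds if the file system holding PATH has at least MIN_KB KB available.
has_free_kb() {
  local avail
  # -P: POSIX output (one line per file system); -k: 1024-byte blocks.
  avail="$(df -Pk "$1" | awk 'NR == 2 { print $4 }')"
  [ "$avail" -ge "$2" ]
}
```

For example: `has_free_kb /tmp 2097152 || echo "/tmp has less than 2 GB free"`.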
Make sure that the short name and FQDN of the RecoverX node are included in the /etc/hosts file. For example, a RecoverX node with the hostname datosserver.dom.local and an IP address of 192.168.2.4 would have a hosts file similar to the following, with the FQDN listed first as the canonical name.
cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.2.4 datosserver.dom.local datosserver
::1 localhost6.localdomain6 localhost6
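Here is a small check, with the hypothetical helper name hosts_has_names. It confirms that a hosts-format file lists every given name for an IP address.

```shell
#!/usr/bin/env bash
# hosts_has_names FILE IP NAME...
# Hypothetical helper: succeeds if FILE has entries for IP that list every NAME.
hosts_has_names() {
  local file="$1" ip="$2"; shift 2
  local line name
  # Strip comments, keep the names on lines for this IP, and join them with spaces.
  line="$(sed 's/#.*//' "$file" | awk -v ip="$ip" '$1 == ip { $1 = ""; print }' | tr '\n' ' ')"
  for name in "$@"; do
    case " $line " in
      *" $name "*) ;;
      *) echo "missing $name for $ip" >&2; return 1 ;;
    esac
  done
}
```

On the RecoverX node, for example: `hosts_has_names /etc/hosts 192.168.2.4 datosserver datosserver.dom.local`.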
Installing RecoverX
When you install RecoverX, you use the datos_db_user account that you created earlier.
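Copy the RecoverX compressed tar file to the RecoverX compute instance, and move it into the datos_db_user home directory.
gcloud compute scp datos_[VERSION].tar.gz [RECOVERX_NODE_NAME]:~
gcloud compute ssh [RECOVERX_NODE_NAME] \
    --command='sudo mv datos_[VERSION].tar.gz /home/datos_db_user/; sudo chown datos_db_user /home/datos_db_user/datos_[VERSION].tar.gz'
where:
[VERSION]
is your version of the tar file.
[RECOVERX_NODE_NAME]
is the name of your RecoverX node.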
As datos_db_user, uncompress the tar file in /home/datos_db_user on the target node. The archive contains a top-level directory named datos_[VERSION].
tar -zxf datos_[VERSION].tar.gz
Switch to the uncompressed Datos IO directory.
cd datos_[VERSION]
Install RecoverX in the target installation directory.
./install_datos --ip-address [IP_ADDRESS] \
    --target-dir /home/datos_db_user/datosinstall
where:
[IP_ADDRESS]
is the internal IP address of the RecoverX node.
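To look up the internal IP address for the --ip-address flag from your workstation, you could query the instance with gcloud. The following is a minimal sketch. The helper name, instance name, and zone are hypothetical. On the node itself, `hostname -I` lists its addresses as well.

```shell
#!/usr/bin/env bash
# Dry run by default: RUN=echo prints the command. Use RUN= (empty) to execute.
RUN="${RUN-echo}"

# recoverx_internal_ip NAME ZONE
# Hypothetical helper: prints the primary internal IP of the instance.
recoverx_internal_ip() {
  $RUN gcloud compute instances describe "$1" --zone="$2" \
    --format='get(networkInterfaces[0].networkIP)'
}
```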
Registering RecoverX
After installation, you register RecoverX to run as a service.
On the RecoverX node, run the following commands as a system administrator with root privileges.
sudo cp datos-server /etc/init.d/
sudo chmod 755 /etc/init.d/datos-server
sudo chkconfig --add datos-server
sudo touch /var/lock/subsys/datos-server
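On the RecoverX server, change to the datos_db_user home directory and run the following commands as a system administrator with root privileges. DATOS_INSTALL is the RecoverX installation directory, /home/datos_db_user/datosinstall in this tutorial.
DATOS_INSTALL=/home/datos_db_user/datosinstall
sudo chown root $DATOS_INSTALL/lib/fuse/bin/fusermount
sudo chmod u+s $DATOS_INSTALL/lib/fuse/bin/fusermount
sudo modprobe fuse
sudo mount -t fusectl fusectl /sys/fs/fuse/connections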
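To verify the result of these commands, check that fusermount has the setuid bit and that a fusectl file system is mounted. The following sketch uses two hypothetical helpers. The mount check reads /proc/mounts by default, whose third field is the file system type.

```shell
#!/usr/bin/env bash
# check_setuid FILE: hypothetical helper; succeeds if FILE has the setuid bit set.
check_setuid() {
  [ -u "$1" ]
}

# is_mounted_type TYPE [MOUNTS_FILE]
# Hypothetical helper: succeeds if a file system of TYPE appears in MOUNTS_FILE.
is_mounted_type() {
  awk -v t="$1" '$3 == t { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

For example: `check_setuid "$DATOS_INSTALL/lib/fuse/bin/fusermount" && is_mounted_type fusectl && echo ok`. To confirm that the owner is root, run `stat -c %U "$DATOS_INSTALL/lib/fuse/bin/fusermount"`.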
Accessing RecoverX
To log in to the RecoverX console, you connect to the public IP address of the RecoverX node.
Connect to the console using a browser.
https://[IP_ADDRESS]:9090/#/dashboard
where:
[IP_ADDRESS]
is the IP address of the node where RecoverX is deployed.
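To build the console URL from the instance's external IP address, you could combine a gcloud lookup with the URL pattern above. The helper names, instance name, and zone in this sketch are hypothetical.

```shell
#!/usr/bin/env bash
# Dry run by default: RUN=echo prints the gcloud command. Use RUN= (empty) to execute.
RUN="${RUN-echo}"

# recoverx_external_ip NAME ZONE: hypothetical helper; prints the instance's external IP.
recoverx_external_ip() {
  $RUN gcloud compute instances describe "$1" --zone="$2" \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
}

# recoverx_console_url IP [PORT]: prints the dashboard URL (port defaults to 9090).
recoverx_console_url() {
  printf 'https://%s:%s/#/dashboard\n' "$1" "${2:-9090}"
}
```

For example: `recoverx_console_url "$(RUN= recoverx_external_ip recoverx-1 us-central1-a)"`.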
When you log in for the first time, use the following credentials.
- Username: admin
- Password: admin
To change the default password, click Settings, and select Change Password.
Configuring RecoverX
To add a data source to the Datos IO environment, you add the configured MongoDB database cluster and the provisioned Cloud Storage bucket. You can choose to use Cassandra or MongoDB as the type of data source.
In Datos IO, click Configuration > Data sources.
In the Data sources panel, click the Plus icon, and then configure the following data sources.
- Data Source Type: Select Cassandra or MongoDB.
- Cluster Name: Type a unique name for the data source.
Configuration IP:
- For Cassandra, use the private IP address of any node.
- For MongoDB, use the IP addresses of all config servers for sharded clusters, or the IP address of the primary node for non-sharded clusters.
Configuration Port:
- For Cassandra, use the CQLSH port.
- For MongoDB, use the port on which the node listens.
Source Authentication: Select the authentication method used by your source machines, and then provide the username, password, or key.
To enable driver authentication, click Cassandra/MongoDB Driver Authentication.
To ignore non-local secondary nodes in the MongoDB source replica sets, click Ignore Secondaries and enter the node IP address, hostnames, and port.
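Before you save the data source, you can check from the RecoverX node that the configuration IP and port you entered accept TCP connections. The following sketch uses bash's /dev/tcp redirection and the coreutils timeout command. The helper name port_open is hypothetical.

```shell
#!/usr/bin/env bash
# port_open HOST PORT [SECONDS]
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT opens within
# SECONDS (default 3). Requires bash, because /dev/tcp is a bash feature.
port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  # Inside bash -c, $0 is the host and $1 is the port.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example: `port_open [CONFIG_IP] 27110 || echo "config server port not reachable"`.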
Cleaning up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
The easiest way to eliminate billing is to delete the project you created for the tutorial.
- In the Cloud Console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete your Compute Engine instance
- In the Cloud Console, go to the VM instances page.
- Click the checkbox for the instance you want to delete.
- Click Delete to delete the instance.
Delete your Cloud Storage bucket
- In the Cloud Console, go to the Cloud Storage Browser page.
- Click the checkbox for the bucket you want to delete.
- To delete the bucket, click Delete.
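The same cleanup can be done from the command line. The following is a hedged sketch. The helper name, instance names, zone, and bucket name are hypothetical. gcloud asks for confirmation before it deletes instances. Deleting the bucket recursively also removes all backup data in it. To delete the whole project instead, use `gcloud projects delete [PROJECT_ID]`.

```shell
#!/usr/bin/env bash
# Dry run by default: RUN=echo prints each command. Use RUN= (empty) to execute.
RUN="${RUN-echo}"

# cleanup_resources ZONE BUCKET INSTANCE...
# Hypothetical helper: deletes the listed instances and the backup bucket.
cleanup_resources() {
  local zone="$1" bucket="$2"; shift 2
  # Deletes the MongoDB and RecoverX instances (gcloud prompts for confirmation).
  $RUN gcloud compute instances delete "$@" --zone="$zone"
  # Deletes the bucket together with every object in it.
  $RUN gcloud storage rm --recursive "gs://$bucket"
}
```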
What's next
- Deploying MongoDB on Compute Engine.
- Try out other Google Cloud features for yourself. Have a look at our tutorials.