This tutorial shows how to deploy and configure Datos IO RecoverX to protect your MongoDB database cluster. This tutorial assumes that your MongoDB database is already deployed and fully operational.
For typical deployments, RecoverX is located in the same project as the MongoDB cluster that needs to be protected. The following diagram shows a representative deployment.
Deploying RecoverX in a different project requires SSH connections from the RecoverX node to all of the compute instances on which MongoDB is deployed. RecoverX connects to the MongoDB nodes through SSH connections and uses standard MongoDB APIs to extract the data and the oplogs. RecoverX deploys a lightweight agent on each of the selected MongoDB nodes to facilitate interfacing with the MongoDB cluster. Data is streamed in parallel to Cloud Storage and processed in batch mode offline. After the data is copied, RecoverX processes the data to create a golden copy of the database that is cluster consistent and has no replicas.
- Create a Compute Engine instance for MongoDB.
- Create a Cloud Storage bucket to store your backup data.
- Create a Compute Engine instance for RecoverX.
- Configure the MongoDB database.
- Deploy RecoverX and connect from a remote location to configure backup policies.
This tutorial uses billable components of Cloud Platform, including:
- Compute Engine
- Cloud Storage
Use the Pricing Calculator to generate a cost estimate based on your projected usage of GCP.
Before you begin
Select or create a GCP project.
Enable billing for your project.
When you finish this tutorial, you can avoid continued billing by deleting the resources you created. See Cleaning up for more detail.
Provisioning your infrastructure for the tutorial
To complete this tutorial, create Compute Engine instances for the MongoDB nodes in cluster 1, a Cloud Storage bucket to store the backup data, and a Compute Engine instance for RecoverX in cluster 2.
Create a Compute Engine instance for the MongoDB nodes
Create a Compute Engine instance. Use n1-standard-4 or the machine type that best meets your requirements.
Create and configure a Cloud Storage bucket
Create a Cloud Storage bucket to store your backup data.
Create a Cloud Storage bucket using the same service account that you used to create the Compute Engine instances for MongoDB nodes.
Configure a Cloud Identity and Access Management (Cloud IAM) role or Access Control List (ACL) for the service account to allow read and write access to the Cloud Storage bucket with the role of Editor.
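As a rough sketch, the bucket steps above could be scripted with gsutil. The bucket name, location, and service account are hypothetical placeholders, and objectAdmin is only an assumed stand-in for the read and write access described above, so choose the role that matches your own policy. The script prints the commands instead of running them unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders; replace with your own values.
BUCKET="gs://my-recoverx-backups"
LOCATION="us-central1"
SERVICE_ACCOUNT="my-sa@my-project.iam.gserviceaccount.com"

# Print each command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_backup_bucket() {
  # Create the bucket in the chosen location.
  run gsutil mb -l "$LOCATION" "$BUCKET"
  # Assumption: objectAdmin gives the service account object read and write access.
  run gsutil iam ch "serviceAccount:${SERVICE_ACCOUNT}:objectAdmin" "$BUCKET"
}

create_backup_bucket
```

Review the printed commands first, then rerun with DRY_RUN=0 once the values match your project.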
Create a Compute Instance for RecoverX
Datos IO RecoverX is scale-out software that is deployed in a single node. The minimum node type is n1-standard-8. The recommended node type for most
Create one Compute Engine instance of type n1-standard-8 using the service account.
For the boot disk, select CentOS 7 or RHEL 7.x.
In the Firewall section, select Allow HTTPS traffic.
For the RecoverX Compute Engine instance, create an SSD disk (blank disk) with a capacity of at least 256 GB.
Select the appropriate network.
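The console steps above could also be expressed as a single gcloud command. This is a sketch under stated assumptions: the instance name, zone, and service account are hypothetical, and the centos-7 image family may no longer be offered in every project. The command is printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders; replace with your own values.
INSTANCE="recoverx-node"
ZONE="us-central1-a"
SERVICE_ACCOUNT="my-sa@my-project.iam.gserviceaccount.com"

# Print the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_recoverx_instance() {
  # https-server is the tag the console applies for "Allow HTTPS traffic".
  # --create-disk adds the blank 256 GB SSD data disk.
  # Assumption: the centos-7 image family is still available to you.
  run gcloud compute instances create "$INSTANCE" \
    --zone "$ZONE" \
    --machine-type n1-standard-8 \
    --image-family centos-7 --image-project centos-cloud \
    --service-account "$SERVICE_ACCOUNT" \
    --tags https-server \
    --create-disk size=256GB,type=pd-ssd
}

create_recoverx_instance
```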
Format the file system and mount the volume.
gcloud compute ssh [RECOVERX_NODE_NAME] \
    --command 'sudo mkfs -t ext4 [VOLUME_NAME]; sudo mkdir -p /home; sudo mount [VOLUME_NAME] /home'
[RECOVERX_NODE_NAME] is the name of your RecoverX node.
[VOLUME_NAME] is the name of your volume.
Configuring the MongoDB cluster
To protect the MongoDB database cluster using Datos IO RecoverX and extract data from the source cluster, create a Datos IO user account. Create the account and configure authentication for the user with the following minimum requirements:
- The home directory of datos_db_user must have at least 100 GB local storage.
- The default shell for datos_db_user must be bash.
Create a Datos IO user account, such as datos_db_user, on each MongoDB node with the same group ID (GID) as the MongoDB user. Add datos_db_user to the MongoDB group.
gcloud compute ssh [MongoDB_NODE_INSTANCE_NAME] \
    --command 'sudo useradd -g MongoDB -m datos_db_user'
[MongoDB_NODE_INSTANCE_NAME] is the name of your MongoDB node instance.
Configure authentication for the user you created by using one of the following methods:
- Username and password
- Username and SSH key with passphrase
- Username and SSH access key
Grant datos_db_user write permission to the /home/datos_db_user home directory on all MongoDB nodes.
gcloud compute ssh [MongoDB_NODE_INSTANCE_NAME] \
    --command 'sudo chmod -R u+w /home/datos_db_user'
Grant datos_db_user read and execute permissions to the MongoDB data directory and its parent directory on all MongoDB nodes.
gcloud compute ssh [MongoDB_NODE_INSTANCE_NAME] \
    --command 'sudo chmod -R g+rx /var/lib/MongoDB; sudo chmod -R g+rx /var/lib/MongoDB/data'
Disable the SSH banner for datos_db_user. Create a .hushlogin file in the datos_db_user home directory to hide the banner.
gcloud compute ssh [MongoDB_NODE_INSTANCE_NAME] --command 'touch ~/.hushlogin'
For each node of the MongoDB cluster, edit the file /etc/ssh/sshd_config and set the sshd parameters.
Verify that the new values of these parameters, such as MaxStartups, were picked up by the sshd service.
/usr/sbin/sshd -T | grep -i maxs
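The verification can be made stricter than reading the grep output by eye. The helper below is a hypothetical sketch, not part of RecoverX: it reads `sshd -T` style output on standard input and succeeds only if the named parameter has the value you expect. The expected value is left as a variable because it depends on what you configured.

```shell
#!/usr/bin/env bash
# Succeed only if the given sshd parameter has the expected value.
# Reads "sshd -T" style output (one "key value" pair per line) on stdin.
# Arguments: parameter name (any case), expected value.
sshd_param_is() {
  param=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  awk -v p="$param" -v e="$2" '
    tolower($1) == p { found = 1; ok = ($2 == e) }
    END { if (found && ok) exit 0; exit 1 }
  '
}

# Usage, with EXPECTED set to the value you configured:
#   sudo /usr/sbin/sshd -T | sshd_param_is MaxStartups "$EXPECTED" && echo "MaxStartups applied"
```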
To resolve Mongos node hostnames from the RecoverX server, connect to the config server and get hostnames of all Mongos instances in the MongoDB cluster. These hostnames are used to add firewall rules.
mongo --host [CONFIG_IP] --port [CONFIG_PORT]
Then, in the mongo shell, query the mongos collection of the config database:
use config
db.mongos.find()
[CONFIG_IP] is the IP address of the primary MongoDB node.
[CONFIG_PORT] is the port of the primary MongoDB node.
Setting up communications to the Datos IO RecoverX cluster
In this section, you set up network firewall rules and configure the communication between the MongoDB hosts and the RecoverX cluster.
Set up network ports
Follow the instructions in firewall rules to open the network ports that allow communication between RecoverX, the nodes running MongoDB, and the Cloud Storage bucket. Open the port in the following table to the compute instance running RecoverX.
|From external sources to RecoverX||TCP:9090 (user defined)||Access Datos IO UI/API|
Open the ports in the following table between the nodes in the MongoDB cluster and the RecoverX compute instance. Opening the ports is necessary to let specific MongoDB services communicate.
|From MongoDB to MongoDB nodes||TCP: (user defined)||From all data nodes Mongod to Mongod|
|From RecoverX to MongoDB nodes||TCP:27110 (or user defined)||All Config server ports|
|From RecoverX to MongoDB nodes||TCP:27017 (or user defined)||All Mongos ports used|
|From RecoverX to MongoDB nodes||TCP: (user defined)||All Mongod data ports used|
|From MongoDB nodes to RecoverX||TCP:5672||Messaging to Application Listener (RabbitMQ)|
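As one possible way to apply the rows above, the sketch below prints a `gcloud compute firewall-rules create` command for each fixed port in the tables. The rule names and the network tags recoverx and mongodb are hypothetical and assume that you tagged your instances accordingly. User-defined ports and the external rule for TCP:9090 (which needs a source range instead of a source tag) are left for you to add.

```shell
#!/usr/bin/env bash
# Hypothetical network tags; replace with the tags on your instances.
RECOVERX_TAG="recoverx"
MONGODB_TAG="mongodb"

# Print one firewall-rule command.
# Arguments: rule name, source tag, target tag, TCP port.
firewall_cmd() {
  echo "gcloud compute firewall-rules create $1 --allow tcp:$4 --source-tags $2 --target-tags $3"
}

# Default ports from the tables above; change them if yours are user defined.
firewall_cmd recoverx-to-config "$RECOVERX_TAG" "$MONGODB_TAG" 27110
firewall_cmd recoverx-to-mongos "$RECOVERX_TAG" "$MONGODB_TAG" 27017
firewall_cmd mongodb-to-recoverx-mq "$MONGODB_TAG" "$RECOVERX_TAG" 5672
```

Printing the commands keeps the script safe to run for review; copy the lines you want into a terminal to create the rules.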
Create a Datos IO user on the RecoverX node
In this section, you create a Datos IO user, such as datos_db_user, on the RecoverX node with the following characteristics:
- The home directory on the non-root volume that was previously created.
- SSH access to all the nodes in the MongoDB cluster. It's recommended that you provide key-based, passwordless SSH access to each RecoverX node in the cluster.
- The same group ID (GID) as the datos_db_user that was created on the MongoDB compute instances. For example, if the
Add the Datos IO user to the RecoverX node.
Retrieve the GID on the MongoDB compute instance.
On the RecoverX node, create a new group using the GID you retrieved.
sudo groupadd -g [GID] MongoDB
[GID] is the group ID that you retrieved in the first step.
Add the user.
sudo useradd -g MongoDB -m datos_db_user -d /home/datos_db_user
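The GID lookup in the first step is not spelled out above. One way to do it, assuming the group is defined in the local /etc/group file rather than in a directory service, is the small helper below. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the numeric GID of a group and fail if the group is not found.
# Reads /etc/group by default; pass a second argument to read another
# file in the same format.
# Note: groups served by LDAP or similar services are not visible here.
gid_of_group() {
  awk -F: -v g="$1" '$1 == g { print $3; found = 1 } END { exit !found }' "${2:-/etc/group}"
}

# On a MongoDB node:     gid_of_group MongoDB
# On the RecoverX node:  sudo groupadd -g [GID] MongoDB
```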
Provide sudo privileges
On the RecoverX node, datos_db_user needs sudo privileges for the commands named in the sudoers entry in this section.
To add this privilege:
Log in as the root user.
Edit the configuration file for sudo access, using the
Append the following line to the file and save the file.
datos_db_user ALL=NOPASSWD: /sbin/chkconfig, /bin/cp
Configure RecoverX nodes
Make the following changes in the limits.conf files of all RecoverX nodes.
Edit /etc/security/limits.conf to match the following entries. In the file, each entry is prefixed with the domain (a user name, group, or wildcard) that the limit applies to.
hard nproc unlimited
soft nproc unlimited
hard nofile 64000
soft nofile 64000
Edit /etc/security/limits.d/90-nproc.conf to match the following entries.
hard nproc unlimited
soft nproc unlimited
Verify the changes.
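A simple way to verify the limits, sketched here as an assumption rather than the vendor's procedure, is to compare the output of the bash `ulimit` builtin with the expected values in a fresh login session of the affected user. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Compare an observed limit with the expected one and report the result.
# Arguments: label, actual value, expected value.
check_limit() {
  if [ "$2" = "$3" ]; then
    echo "OK   $1 = $2"
  else
    echo "FAIL $1 = $2 (expected $3)"
    return 1
  fi
}

# Run in a new login session so that the updated limits are loaded.
# "|| true" keeps the report going after a failed check.
check_limit "hard nofile" "$(ulimit -Hn)" 64000     || true
check_limit "soft nofile" "$(ulimit -Sn)" 64000     || true
check_limit "hard nproc"  "$(ulimit -Hu)" unlimited || true
check_limit "soft nproc"  "$(ulimit -Su)" unlimited || true
```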
Verify that the /tmp directory has at least 2 GB empty space on each RecoverX node.
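The free-space check can be scripted with `df`. The sketch below uses the POSIX output format (`-P`) so that the available space is always the fourth column. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if the file system that holds a directory has enough free space.
# Arguments: directory, minimum available space in KB.
has_free_kb() {
  avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge "$2" ]
}

# 2 GB = 2 * 1024 * 1024 KB
if has_free_kb /tmp 2097152; then
  echo "/tmp has at least 2 GB free"
else
  echo "/tmp has less than 2 GB free"
fi
```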
Make sure that the short name and FQDN of the RecoverX node are included in the /etc/hosts file. For example, a RecoverX node with the hostname datosserver.dom.local and an IP address of 192.168.2.4 would have a hosts file listing similar to the following.
cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.2.4 datosserver datosserver.dom.local
::1 localhost6.localdomain6 localhost6
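To check the hosts file without reading it by eye, a small helper like the following could be used. It is a hypothetical sketch that looks for an exact IP address and name pair and ignores comment lines.

```shell
#!/usr/bin/env bash
# Succeed if the hosts file maps the IP address to the given name.
# Arguments: IP address, host name, optional hosts file (default /etc/hosts).
hosts_has() {
  awk -v ip="$1" -v name="$2" '
    /^[ \t]*#/ { next }    # skip comment lines
    $1 == ip {
      for (i = 2; i <= NF; i++) if ($i == name) found = 1
    }
    END { exit !found }
  ' "${3:-/etc/hosts}"
}

# Usage with the example values above:
#   hosts_has 192.168.2.4 datosserver && hosts_has 192.168.2.4 datosserver.dom.local
```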
When you install RecoverX, you use the datos_db_user name that you created.
Copy the RecoverX compressed tar file to the compute instances.
gcloud compute scp datos_[VERSION].tar.gz [RECOVERX_NODE_NAME]:~
gcloud compute ssh [RECOVERX_NODE_NAME] \
    --command 'sudo mv datos_[VERSION].tar.gz /home/datos_db_user; sudo chown datos_db_user /home/datos_db_user/datos_[VERSION].tar.gz'
[RECOVERX_NODE_NAME] is the name of your RecoverX node.
[VERSION] is your version of the tar file.
Uncompress the tar file on the target node. A top-level directory called
tar -zxf datos_[VERSION].tar.gz
Switch to the uncompressed Datos IO directory.
Install RecoverX in the target installation directory.
./install_datos --ip-address [IP_ADDRESS] \
    --target-dir /home/datos_db_user/datosinstall
[IP_ADDRESS] is the internal IP address of the RecoverX instance.
After installation, you register RecoverX to run as a service.
On the RecoverX node, run the following commands as a system admin with root privileges.
sudo cp datos-server /etc/init.d/
sudo chmod 755 /etc/init.d/datos-server
sudo chkconfig --add datos-server
sudo touch /var/lock/subsys/datos-server
On the RecoverX server, change to the datos user directory and run the following commands as a system admin with root privileges:
sudo chown root $DATOS_INSTALL/lib/fuse/bin/fusermount
sudo chmod u+s $DATOS_INSTALL/lib/fuse/bin/fusermount
sudo modprobe fuse
sudo mount -t fusectl fusectl /sys/fs/fuse/connections
To log in to the RecoverX console, you connect to the public IP address of the RecoverX node.
Connect to the console using a browser.
[IP_ADDRESS] is the IP address of the node where RecoverX is deployed.
When you log in for the first time, use the following credentials.
To change the default password, click Settings, and select Change Password.
To add a data source to the Datos IO environment, you add the configured MongoDB database cluster and the provisioned Cloud Storage bucket. You can choose to use Cassandra or MongoDB as the type of data source.
In Datos IO, click Configuration > Data sources.
In the Data sources panel, click the Plus icon, and then configure the following data sources.
- Data Source Type: Select Cassandra or MongoDB.
- Cluster Name: Type a unique name for the data source.
- For Cassandra, use the private IP address of any node.
- For MongoDB, use the IP address of all configuration servers for sharded clusters and the primary node for non-sharded clusters.
- For Cassandra, use the CQLSH port.
- For MongoDB, use the port on which the node listens.
Source Authentication: Select the authentication method used by your source machines, and then provide the username, password, or key.
To enable driver authentication, click Cassandra/MongoDB Driver Authentication.
To ignore non-local secondary nodes in the MongoDB source replica sets, click Ignore Secondaries and enter the node IP address, hostnames, and port.
To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:
Delete the project
The easiest way to eliminate billing is to delete the project you created for the tutorial.
- In the Cloud Console, go to the Manage resources page.
- In the project list, select the project that you want to delete and then click Delete.
- In the dialog, type the project ID and then click Shut down to delete the project.
Delete your Compute Engine instance
- In the Cloud Console, go to the VM Instances page.
- Click the checkbox for the instance you want to delete.
- Click Delete to delete the instance.
Delete your Cloud Storage bucket
- In the Cloud Console, go to the Cloud Storage Browser page.
- Click the checkbox for the bucket you want to delete.
- To delete the bucket, click Delete.