Deploying Datos IO RecoverX to Protect MongoDB Databases

By: Tarun Thakur, Rubrik Datos IO

This tutorial shows how to deploy and configure Datos IO RecoverX to protect your MongoDB database cluster. This tutorial assumes that your MongoDB database is already deployed and fully operational.

For typical deployments, RecoverX is located in the same project as the MongoDB cluster that needs to be protected. The following diagram shows a representative deployment.

Reference architecture for Datos IO RecoverX for MongoDB

Deploying RecoverX in a different project requires SSH connections from the RecoverX node to all of the compute instances on which MongoDB is deployed. RecoverX connects to the MongoDB nodes through SSH connections and uses standard MongoDB APIs to extract the data and the oplogs. RecoverX deploys a lightweight agent on each of the selected MongoDB nodes to facilitate interfacing with the MongoDB cluster. Data is streamed in parallel to Cloud Storage and processed in batch mode offline. After the data is copied, RecoverX processes the data to create a golden copy of the database that is cluster consistent and has no replicas.

Objectives


  • Create a Compute Engine instance for MongoDB.
  • Create a Cloud Storage bucket to store your backup data.
  • Create a Compute Engine instance for RecoverX.
  • Configure the MongoDB database.
  • Deploy RecoverX and connect from a remote location to configure backup policies.

Costs


This tutorial uses billable components of Google Cloud, including:

  • Compute Engine
  • Cloud Storage

Use the Pricing Calculator to generate a cost estimate based on your projected usage of Google Cloud.

In addition, RecoverX is licensed directly through Datos IO, based on the physical size of the database that needs to be protected. For pricing-related questions, contact

Before you begin

  1. Select or create a Google Cloud project.


  2. Enable billing for your project.


When you finish this tutorial, you can avoid continued billing by deleting the resources you created. See Cleaning up for more detail.

Provisioning your infrastructure for the tutorial

To complete this tutorial, create Compute Engine instances for the MongoDB nodes in cluster 1, a Cloud Storage bucket to store the backup data, and a Compute Engine instance for RecoverX in cluster 2.

Create a Compute Engine instance for the MongoDB nodes

  1. Create a Compute Engine instance. Use n1-standard-4 or the machine type that best meets your requirements.

  2. Deploy MongoDB on the Compute Engine instance.
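
The first step above can also be done from the command line. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated. The instance name mongodb-node-1 and the zone us-central1-a are hypothetical placeholders, not values from this tutorial. By default the helper only prints the command so that you can review it before running it.

```shell
# Sketch: create a Compute Engine instance for a MongoDB node.
# DRY_RUN=1 (the default) prints the command; DRY_RUN=0 executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = instance name (hypothetical), $2 = zone (hypothetical),
# $3 = machine type (defaults to n1-standard-4, as suggested above)
create_mongodb_instance() {
  run gcloud compute instances create "$1" \
    --zone="$2" \
    --machine-type="${3:-n1-standard-4}"
}

# Example:
#   create_mongodb_instance mongodb-node-1 us-central1-a
#   DRY_RUN=0 create_mongodb_instance mongodb-node-1 us-central1-a
```

Repeat the call once per node if your MongoDB cluster has more than one instance.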

Create and configure a Cloud Storage bucket

Create a Cloud Storage bucket to store your backup data.

  1. Create a Cloud Storage bucket using the same service account that you used to create the Compute Engine instances for MongoDB nodes.

  2. Configure an Identity and Access Management (IAM) role, such as Editor, or an access control list (ACL) that gives the service account read and write access to the Cloud Storage bucket.
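
The two bucket steps above could be scripted roughly as follows. This is a sketch, assuming the gsutil tool is installed. The bucket name and service account email are hypothetical. The sketch grants the bucket-level objectAdmin role, which is an assumption of this example and is narrower than the project-level Editor role mentioned above.

```shell
# Sketch: create the backup bucket and grant a service account access to it.
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 executes them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = bucket name (hypothetical), $2 = service account email (hypothetical)
create_backup_bucket() {
  run gsutil mb "gs://$1"
  # objectAdmin lets the account read, write, and delete objects in the
  # bucket (assumed role for this sketch; adjust to your own policy).
  run gsutil iam ch "serviceAccount:$2:objectAdmin" "gs://$1"
}

# Example:
#   create_backup_bucket recoverx-backups sa@example.iam.gserviceaccount.com
```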

Create a Compute Engine instance for RecoverX

Datos IO RecoverX is scale-out software that is deployed in a single node. The minimum node type is n1-standard-8. The recommended node type for most deployments is n1-standard-16.

  1. Create one Compute Engine instance of type n1-standard-8 using the service account.

  2. For the boot disk, select CentOS 7 or RHEL 7.x.

  3. In the Firewall section, select Allow HTTPS traffic.

  4. For the RecoverX Compute Engine instance, create an SSD disk (blank disk) with a capacity of at least 256 GB.

  5. Select the appropriate network.

  6. Format the file system and mount the volume.

    gcloud compute ssh [RECOVERX_NODE_NAME] \
        --command='sudo mkfs -t ext4 [VOLUME_NAME] && sudo mkdir -p /home && sudo mount [VOLUME_NAME] /home'


    • [RECOVERX_NODE_NAME] is the name of your RecoverX node.
    • [VOLUME_NAME] is the name of your volume.
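
Steps 1 through 5 above can also be expressed with the gcloud CLI. The following sketch makes several assumptions: the instance name, zone, and service account are hypothetical, the centos-7 image family in the centos-cloud project may no longer be available in your environment, and the https-server network tag is assumed to correspond to the Allow HTTPS traffic option. The helper prints the commands by default.

```shell
# Sketch: create the RecoverX instance and attach a blank 256 GB SSD disk.
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 executes them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = instance name, $2 = zone, $3 = service account email (all hypothetical)
create_recoverx_node() {
  run gcloud compute instances create "$1" \
    --zone="$2" \
    --machine-type=n1-standard-8 \
    --image-family=centos-7 --image-project=centos-cloud \
    --service-account="$3" \
    --tags=https-server
  # Blank SSD persistent disk that later holds the /home file system.
  run gcloud compute disks create "$1-data" \
    --zone="$2" --type=pd-ssd --size=256GB
  run gcloud compute instances attach-disk "$1" \
    --zone="$2" --disk="$1-data"
}

# Example:
#   create_recoverx_node recoverx-1 us-central1-a sa@example.iam.gserviceaccount.com
```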

Configuring the MongoDB cluster

To protect the MongoDB database cluster using Datos IO RecoverX and extract data from the source cluster, create a Datos IO user account. Create the account and configure authentication for the user with the following minimum requirements:

  • The home directory of datos_db_user must have at least 100 GB of local storage.
  • The default shell for datos_db_user must be bash.

Create a Datos IO user account, such as datos_db_user, on each MongoDB node with the same group ID (GID) as the MongoDB user.

  1. Add datos_db_user to the MongoDB group.

    gcloud compute ssh [MongoDB_NODE_INSTANCE_NAME] \
        --command='sudo useradd -g MongoDB -m datos_db_user'


    • [MongoDB_NODE_INSTANCE_NAME] is the name of the MongoDB node instance on which you create the user.
  2. Configure authentication for the user you created by using one of the following methods:

    • Username and password
    • Username and SSH key with passphrase
    • Username and SSH access key
  3. Give datos_db_user write permission to the /home/datos_db_user home directory on all MongoDB nodes.

    gcloud compute ssh [MongoDB_NODE_INSTANCE_NAME] \
        --command='sudo chmod -R u+w /home/datos_db_user'
  4. Give datos_db_user read and execute permissions to the MongoDB data directory and its parent directory on all MongoDB nodes.

    gcloud compute ssh [MongoDB_NODE_INSTANCE_NAME] \
        --command='sudo chmod -R g+rx /var/lib/MongoDB; sudo chmod -R g+rx /var/lib/MongoDB/data'
  5. Suppress the SSH login banner for datos_db_user by creating an empty .hushlogin file in that user's home directory.

    gcloud compute ssh [MongoDB_NODE_INSTANCE_NAME] \
        --command='sudo -u datos_db_user touch /home/datos_db_user/.hushlogin'
  6. For each node of the MongoDB cluster, edit the file /etc/ssh/sshd_config, set the sshd parameter MaxSessions to 500 and MaxStartups to 500:1:500, and then restart the SSHD service so that the changes take effect.

  7. Verify that the SSHD service picked up the MaxSessions and MaxStartups values.

    sudo /usr/sbin/sshd -T | grep -i maxs
  8. To resolve Mongos node hostnames from the RecoverX server, connect to the config server and get hostnames of all Mongos instances in the MongoDB cluster. These hostnames are used to add firewall rules.

    mongo --host [CONFIG_IP] --port [CONFIG_PORT]


    • [CONFIG_IP] is the IP address of the primary MongoDB node
    • [CONFIG_PORT] is the port of the primary MongoDB node
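
Steps 6 and 8 above do not show exact commands, so the following is one possible sketch rather than the official procedure. It assumes a Linux node with sed, grep, and mktemp, and the legacy mongo shell used elsewhere in this tutorial. The function names are hypothetical. The query reads the config.mongos collection, where each document's _id is the host:port of a registered mongos.

```shell
# Sketch for steps 6 and 8. File paths and hosts are passed as arguments.

# Set "KEY VALUE" in an sshd_config-style file ($1). An existing line for
# KEY, commented out or not, is rewritten; otherwise the setting is added at
# the top of the file so that it stays outside any Match block.
set_sshd_option() {
  file="$1"; key="$2"; value="$3"
  if grep -Eq "^[#[:space:]]*${key}[[:space:]]" "$file"; then
    sed -i.bak -E "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
  else
    tmp="$(mktemp)"
    { printf '%s %s\n' "$key" "$value"; cat "$file"; } > "$tmp"
    cat "$tmp" > "$file"   # overwrite in place to keep owner and mode
    rm -f "$tmp"
  fi
}

# Print the host:port identifier of every mongos registered in the cluster.
# $1 = config server IP, $2 = config server port.
list_mongos_hosts() {
  mongo --host "$1" --port "$2" --quiet --eval \
    'db.getSiblingDB("config").mongos.find().forEach(function (m) { print(m._id); })'
}

# Example, run as root on each MongoDB node:
#   set_sshd_option /etc/ssh/sshd_config MaxSessions 500
#   set_sshd_option /etc/ssh/sshd_config MaxStartups 500:1:500
#   systemctl restart sshd
```

The sed call keeps a backup of the edited file with a .bak suffix, which gives you a way back if sshd rejects the new configuration.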

Setting up communications to the Datos IO RecoverX cluster

In this section, you set up network firewall rules and configure the communication between the MongoDB hosts and the RecoverX cluster.

Set up network ports

Follow the instructions in firewall rules to open the network ports that enable communication between RecoverX, the nodes running MongoDB, and the Cloud Storage bucket. Open the ports in the following table to the compute instance running RecoverX.

Network                           | Protocols:Ports         | Purpose
From external sources to RecoverX | TCP:9090 (user defined) | Access Datos IO UI/API

Open the ports in the following table between every node in the MongoDB cluster and the RecoverX compute instance. These ports let RecoverX and the MongoDB services communicate.

Network                        | Protocols:Ports              | Purpose
From MongoDB to MongoDB nodes  | TCP: (user defined)          | From all data nodes Mongod to Mongod
From RecoverX to MongoDB nodes | TCP:27110 (or user defined)  | All Config server ports
From RecoverX to MongoDB nodes | TCP:27017 (or user defined)  | All Mongos ports used
From RecoverX to MongoDB nodes | TCP: (user defined)          | All Mongod data ports used
From MongoDB nodes to RecoverX | TCP:5672                     | Messaging to Application Listener (RabbitMQ)
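
The rules in the two tables above could be created with the gcloud CLI along the following lines. This is a sketch under several assumptions: the rule names, network name, and network tags are hypothetical, the instances are assumed to carry those tags, and only the default ports from the tables are listed. Add your own mongod data ports to the last rule.

```shell
# Sketch: firewall rules for RecoverX and MongoDB communication.
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 executes them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = VPC network, $2 = network tag on the RecoverX instance,
# $3 = network tag on the MongoDB instances,
# $4 = source IP range allowed to reach the RecoverX UI/API
create_recoverx_firewall_rules() {
  # External access to the Datos IO UI/API.
  run gcloud compute firewall-rules create recoverx-ui \
    --network="$1" --allow=tcp:9090 \
    --source-ranges="$4" --target-tags="$2"
  # Messaging from the MongoDB nodes to the RecoverX listener.
  run gcloud compute firewall-rules create mongodb-to-recoverx \
    --network="$1" --allow=tcp:5672 \
    --source-tags="$3" --target-tags="$2"
  # RecoverX to mongos (27017) and config server (27110) ports.
  run gcloud compute firewall-rules create recoverx-to-mongodb \
    --network="$1" --allow=tcp:27017,tcp:27110 \
    --source-tags="$2" --target-tags="$3"
}

# Example:
#   create_recoverx_firewall_rules default recoverx mongodb 203.0.113.0/24
```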

Create a Datos IO user on the RecoverX node

In this section, you create a Datos IO user, such as datos_db_user, on the RecoverX node with the following characteristics:

  • The home directory on the non-root volume that was previously created.
  • SSH access to all the nodes in the MongoDB cluster. It's recommended that you provide key-based, passwordless SSH access to each RecoverX node in the cluster.
  • The same group ID (GID) as the datos_db_user that was created on the MongoDB compute instances. For example, if datos_db_user on the MongoDB nodes has GID 1001, datos_db_user on the RecoverX node must also have GID 1001.

Add the Datos IO user to the RecoverX node.

  1. Retrieve the GID on the MongoDB compute instance.

    id datos_db_user
  2. On the RecoverX node, create a new group using the GID you retrieved.

    sudo groupadd -g [GID] MongoDB


    • [GID] is the group ID that you retrieved in the first step.
  3. Add the user.

    sudo useradd -g MongoDB -m -d /home/datos_db_user datos_db_user
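
The three steps above hinge on copying the numeric GID from one machine to another. The following sketch shows one way to extract it from the output of the id command and to print the matching commands for the RecoverX node. The function names are hypothetical. Running id -g datos_db_user on the MongoDB node returns the same number directly.

```shell
# Extract the numeric primary GID from the output of "id USER", for example
# "uid=1001(datos_db_user) gid=1002(mongodb) groups=1002(mongodb)".
gid_from_id_output() {
  printf '%s\n' "$1" | sed -n 's/.*gid=\([0-9][0-9]*\).*/\1/p'
}

# Print the commands to run on the RecoverX node for a given "id" output.
recoverx_user_commands() {
  gid="$(gid_from_id_output "$1")"
  printf '%s\n' "sudo groupadd -g $gid MongoDB"
  printf '%s\n' "sudo useradd -g MongoDB -m -d /home/datos_db_user datos_db_user"
}

# Example:
#   recoverx_user_commands "$(id datos_db_user)"   # run where the user exists
```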

Provide sudo privileges

On the RecoverX node, datos_db_user needs sudo privileges for the chkconfig and cp commands.

To add this privilege:

  1. Log in as the root user.

  2. Edit the configuration file for sudo access, using the visudo command.

    sudo visudo
  3. Append the following line to the file and save the file.

    datos_db_user ALL=NOPASSWD: /sbin/chkconfig, /bin/cp

Configure RecoverX nodes

Make the following changes on all RecoverX nodes.

  1. Edit the nproc and nofile parameters in /etc/security/limits.conf to match the following entries. Each entry begins with the user that the limit applies to; this example assumes datos_db_user.

    datos_db_user hard nproc unlimited
    datos_db_user soft nproc unlimited
    datos_db_user hard nofile 64000
    datos_db_user soft nofile 64000
  2. Edit the nproc parameter in /etc/security/limits.d/90-nproc.conf to match the following entries.

    datos_db_user hard nproc unlimited
    datos_db_user soft nproc unlimited
  3. Verify the changes.

    ulimit -a
  4. Verify that the /tmp directory has at least 2 GB of free space on each RecoverX node.

    df /tmp
  5. Make sure that the short name and FQDN of the RecoverX node are included in the /etc/hosts file. For example, a RecoverX node with the hostname datosserver.dom.local and an IP address of [IP_ADDRESS] would have a hosts file listing similar to the following.

    cat /etc/hosts
    127.0.0.1 localhost.localdomain localhost
    [IP_ADDRESS] datosserver datosserver.dom.local
    ::1 localhost6.localdomain6 localhost6
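
The checks in steps 3 through 5 can be collected into a small preflight script. The following is a sketch with hypothetical function names, assuming a POSIX shell with df and awk. It only reports problems and changes nothing on the node.

```shell
# Free space, in KB, on the file system that holds directory $1.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if hosts file $1 lists both the short name $2 and the FQDN $3
# on non-comment lines.
hosts_has_names() {
  awk -v short="$2" -v fqdn="$3" '
    /^[ \t]*#/ { next }            # skip comment lines
    {
      for (i = 2; i <= NF; i++) {  # field 1 is the IP address
        if ($i == short) has_short = 1
        if ($i == fqdn)  has_fqdn = 1
      }
    }
    END { exit (has_short && has_fqdn) ? 0 : 1 }
  ' "$1"
}

# Report, without changing anything, which requirements are not met.
preflight() {
  [ "$(free_kb /tmp)" -ge 2097152 ] ||
    echo "WARN: /tmp has less than 2 GB of free space"
  [ "$(ulimit -n)" = "unlimited" ] || [ "$(ulimit -n)" -ge 64000 ] ||
    echo "WARN: open file limit (nofile) is below 64000"
  hosts_has_names /etc/hosts "$(hostname -s)" "$(hostname -f)" ||
    echo "WARN: /etc/hosts is missing the short name or FQDN of this node"
}

# Example, run as datos_db_user on the RecoverX node:
#   preflight
```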

Installing RecoverX

When you install RecoverX, you use the datos_db_user name that you created earlier.

  1. Copy the RecoverX compressed tar file to the compute instances.

    gcloud compute scp datos_[VERSION].tar.gz :~
    gcloud compute ssh 'sudo mv datos_[VERSION].tar.gz \
        /home/datos_db_user; sudo chown datos_db_user \


    • [VERSION] is your version of the tar file.
  2. Uncompress the tar file on the target node. A top-level directory called datos_[VERSION] is included.

    tar -zxf datos_[VERSION].tar.gz
  3. Switch to the uncompressed Datos IO directory.

    cd datos_[VERSION]
  4. Install RecoverX in the target installation directory.

    ./install_datos --ip-address [IP_ADDRESS] \
        --target-dir /home/datos_db_user/datosinstall


    • [IP_ADDRESS] is the internal IP address of the RecoverX instance.

Registering RecoverX

After installation, you register RecoverX to run as a service.

  1. On the RecoverX node, run the following commands as a system admin with root privileges.

    sudo cp datos-server /etc/init.d/
    sudo chmod 755 /etc/init.d/datos-server
    sudo chkconfig --add datos-server
    sudo touch /var/lock/subsys/datos-server
  2. On the RecoverX server, change to the datos user directory and run the following commands as a system admin with root privileges:

    sudo chown root $DATOS_INSTALL/lib/fuse/bin/fusermount
    sudo chmod u+s $DATOS_INSTALL/lib/fuse/bin/fusermount
    sudo modprobe fuse
    sudo mount -t fusectl fusectl /sys/fs/fuse/connections

Accessing RecoverX

To log in to the RecoverX console, you connect to the public IP address of the RecoverX node.

  1. Connect to the console using a browser.



    • [IP_ADDRESS] is the IP address of the node where RecoverX is deployed.
  2. When you log in for the first time, use the following credentials.

    • Username: admin
    • Password: admin
  3. To change the default password, click Settings, and select Change Password.

Configuring RecoverX

To add a data source to the Datos IO environment, you add the configured MongoDB database cluster and the provisioned Cloud Storage bucket. You can choose to use Cassandra or MongoDB as the type of data source.

  1. In Datos IO, click Configuration > Data sources.

  2. In the Data sources panel, click the Plus icon, and then configure the following settings.

    • Data Source Type: Select Cassandra or MongoDB.
    • Cluster Name: Type a unique name for the data source.
    • Configuration IP:

      • For Cassandra, use the private IP address of any node.
      • For MongoDB, use the IP addresses of all configuration servers for sharded clusters and the IP address of the primary node for non-sharded clusters.
    • Configuration Port:

      • For Cassandra, use the CQLSH port.
      • For MongoDB, use the port on which the node listens.
    • Source Authentication: Select the authentication method used by your source machines, and then provide the username, password, or key.

    • To enable driver authentication, click Cassandra/MongoDB Driver Authentication.

    • To ignore non-local secondary nodes in the MongoDB source replica sets, click Ignore Secondaries and enter the node IP address, hostnames, and port.

Cleaning up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

The easiest way to eliminate billing is to delete the project you created for the tutorial.

  1. In the Cloud Console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete your Compute Engine instance

  1. In the Cloud Console, go to the VM instances page.

    Go to VM instances

  2. Click the checkbox for the instance you want to delete.
  3. Click Delete to delete the instance.

Delete your Cloud Storage bucket

  1. In the Cloud Console, go to the Cloud Storage Browser page.

    Go to the Cloud Storage Browser page

  2. Click the checkbox for the bucket you want to delete.
  3. To delete the bucket, click Delete.
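
If you prefer the command line to the Cloud Console, the individual resources can be removed roughly as follows. This sketch assumes the gcloud and gsutil tools are installed. The resource names are hypothetical. Both deletions are irreversible, so the helper prints the commands by default.

```shell
# Sketch: delete the tutorial's instances and backup bucket.
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 executes them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = instance name, $2 = zone, $3 = bucket name (all hypothetical)
cleanup_resources() {
  run gcloud compute instances delete "$1" --zone="$2" --quiet
  # Removes every object in the bucket and then the bucket itself.
  run gsutil rm -r "gs://$3"
}

# Example:
#   cleanup_resources recoverx-1 us-central1-a recoverx-backups
# To delete the whole project instead:
#   gcloud projects delete PROJECT_ID
```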

What's next