Capturing data from HDFS

The HDFS capture method uses NFS shares on the Transfer Appliance to copy data from HDFS. You export directories on the Transfer Appliance as NFS shares, mount those shares on the nodes of your Hadoop cluster, and then copy data from HDFS to the Transfer Appliance.

Data copied onto the exported NFS share is encrypted and saved in a staging area on the Transfer Appliance.


You need to know the following information to export an NFS share:

In addition, make sure that:

Creating NFS shares

To create NFS shares:

  1. Open the Transfer Appliance Web User Interface.

  2. Click the Menu icon and select Data Capture.

  3. Select Export NFS Share from the Data Capture pane.

    The NFS Share window appears and displays any currently available NFS shares.

  4. Click Create Share to create a new NFS share.

  5. Enter the NFS share name.

  6. For the NFS client name(s) or IP Address(es), enter a comma-separated list of the client names or IP addresses on which you want to mount and access the NFS share.

  7. If the share should be accessible to all clients, select the Accessible to all clients checkbox.

  8. Click Create to create the new NFS share on Transfer Appliance.

  9. If you want to access the NFS share from additional NFS clients, click the NFS Share radio button and edit the client name or IP address.

Mounting NFS shares

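To capture data from HDFS, mount each NFS share on the Transfer Appliance on every Hadoop worker node and on the edge node of the Hadoop cluster. DistCp's map tasks run on the worker nodes and write to the file:// destination on whichever node they run, so each share must be mounted at the same path on every worker node. Mounting the NFS shares on the edge node allows you to view the files on the Transfer Appliance from the edge node.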

Run the following command on each worker node and the edge node to mount the NFS share on each of the nodes:

sudo mount -t nfs [TRANSFER_APPLIANCE_ADDRESS]:/nfs/[SHARE_NAME] -o rw,nolock,hard,intr,noatime,nocto [SHARE_MOUNT_PATH]

For example, where share1 is the share name and /mnt/nfs1/ is the mount path:

sudo mount -t nfs [TRANSFER_APPLIANCE_ADDRESS]:/nfs/share1 -o rw,nolock,hard,intr,noatime,nocto /mnt/nfs1/
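On a large cluster, running the mount command by hand on every node is repetitive. The following is a minimal sketch of looping over the nodes with SSH, assuming passwordless SSH and passwordless sudo on each node. The appliance address 192.0.2.10, the host names, the mount_cmd function and the DRY_RUN variable are hypothetical; by default the script only prints the commands, and DRY_RUN=0 runs them.

```shell
#!/usr/bin/env bash
# Sketch: mount one Transfer Appliance NFS share on the edge node and every
# worker node over SSH. All default values below are hypothetical placeholders.
# Assumes passwordless SSH and passwordless sudo on each node.
set -eo pipefail

APPLIANCE="${APPLIANCE:-192.0.2.10}"           # hypothetical appliance address
SHARE="${SHARE:-share1}"                        # share name created in the Web UI
MOUNT_PATH="${MOUNT_PATH:-/mnt/nfs1}"           # must be the same path on every node
NODES="${NODES:-edge-node worker-1 worker-2}"   # hypothetical host names
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print only; 0 = actually run
OPTS="rw,nolock,hard,intr,noatime,nocto"        # options from the documented mount command

mount_cmd() {
  # Create the mount point if it is missing, then mount the share on it.
  printf 'sudo mkdir -p %s && sudo mount -t nfs %s:/nfs/%s -o %s %s' \
    "$MOUNT_PATH" "$APPLIANCE" "$SHARE" "$OPTS" "$MOUNT_PATH"
}

# Split the space-separated host list into an array.
read -r -a node_list <<< "$NODES"
for node in "${node_list[@]}"; do
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "ssh $node '$(mount_cmd)'"
  else
    ssh "$node" "$(mount_cmd)"
  fi
done
```

For example, `APPLIANCE=10.1.2.3 NODES="edge-node worker-1" DRY_RUN=0 ./mount_shares.sh` (hypothetical script name and values) would mount share1 on two nodes. Printing by default makes it easy to review the exact commands before they touch the cluster.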

Transferring data from HDFS

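To transfer data, run the following DistCp command from the edge node:

hadoop distcp -m 100 hdfs_directory_1 file:///mnt/nfs_appliance_1

Here, hdfs_directory_1 is the HDFS directory to copy, /mnt/nfs_appliance_1 is the path where an NFS share is mounted, and -m 100 sets the maximum number of simultaneous map tasks for the copy.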

You can run one such command per HDFS directory and NFS share, either in parallel or serially, depending on the resources available in your cluster.
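Running one DistCp job per share, serially or in parallel, could be scripted roughly as follows. This is a sketch rather than an official tool: the SRCS and DSTS lists, the copy_one function, and the MODE and DRY_RUN variables are hypothetical, and each share is assumed to be already mounted at the same path on every worker node.

```shell
#!/usr/bin/env bash
# Sketch: run one DistCp job per HDFS directory / NFS mount path pair,
# either one after another (serial) or all at once (parallel).
# The directory lists, MODE and DRY_RUN are hypothetical.
set -eo pipefail

MODE="${MODE:-serial}"    # "serial" or "parallel"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = run them
MAPS="${MAPS:-100}"       # value for distcp -m (maximum simultaneous maps)

# Hypothetical HDFS source directories and the share mount path for each.
SRCS=(hdfs_directory_1 hdfs_directory_2)
DSTS=(/mnt/nfs_appliance_1 /mnt/nfs_appliance_2)

copy_one() {
  # Build the DistCp command as an array so paths are passed unchanged.
  local cmd=(hadoop distcp -m "$MAPS" "$1" "file://$2")
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

pids=()
for i in "${!SRCS[@]}"; do
  if [[ "$MODE" == "parallel" ]]; then
    copy_one "${SRCS[$i]}" "${DSTS[$i]}" &   # start in the background
    pids+=("$!")
  else
    copy_one "${SRCS[$i]}" "${DSTS[$i]}"     # wait for each job in turn
  fi
done

# In parallel mode, wait for every background job and report any failure.
failed=0
for pid in "${pids[@]}"; do
  wait "$pid" || failed=1
done
if (( failed )); then echo "At least one DistCp job failed" >&2; fi
```

Serial mode keeps the load on the cluster lower; parallel mode can finish sooner when the cluster has spare map capacity. In parallel mode, the total number of simultaneous maps can approach the number of jobs multiplied by MAPS.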

Processing the NFS shares

Processing an NFS share compresses and deduplicates the encrypted data held in the NFS share's staging area.

Once you start processing, you cannot copy data to the NFS share. When processing is complete, the NFS share is deleted from the Transfer Appliance.
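Because a share cannot accept more data once processing starts, you may want to confirm that a copy is complete before you process it. One rough check, sketched below under stated assumptions, compares the file count reported by `hdfs dfs -count` with the number of regular files under the mount path. The script name, its two arguments and the helper function names are hypothetical, and the counts only match if the share holds nothing but that one copy.

```shell
#!/usr/bin/env bash
# Sketch: compare the number of files in an HDFS source directory with the
# number of files visible on the mounted NFS share.
# Hypothetical usage: ./check_copy.sh hdfs_directory_1 /mnt/nfs_appliance_1
set -eo pipefail

hdfs_file_count() {
  # "hdfs dfs -count" prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
  hdfs dfs -count "$1" | awk '{print $2}'
}

local_file_count() {
  # Count regular files under the mount path (symbolic links are not counted).
  find "$1" -type f | wc -l | tr -d ' '
}

# Only run the comparison when both paths are given on the command line.
if [[ $# -eq 2 ]]; then
  src_count="$(hdfs_file_count "$1")"
  dst_count="$(local_file_count "$2")"
  if [[ "$src_count" == "$dst_count" ]]; then
    echo "OK: $src_count files in both locations"
  else
    echo "MISMATCH: HDFS has $src_count files, share has $dst_count" >&2
  fi
fi
```

A matching count does not prove the contents are identical; it only catches copies that stopped early or skipped files.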

To process an NFS share:

  1. Open the Transfer Appliance Web User Interface.

  2. Go to the Job Monitor window. There you can track how the data size of each NFS export changes, because the data is compressed as it is processed.

  3. Click the Settings icon.

  4. Select Process. The Process an NFS Share window appears.

  5. If you want to exclude symbolic links, select the Exclude symlinks checkbox.

  6. Click OK.

Next steps

If you have data on NFS you'd like to capture, see Exporting an NFS share.

To monitor data capture jobs, see Monitoring data capture jobs.

If you are done capturing data, see Preparing and shipping a Transfer Appliance.
