Capturing data from HDFS

The HDFS capture method uses NFS shares on the Transfer Appliance. You export directories on the Transfer Appliance as NFS shares, mount those shares on the nodes of your Hadoop cluster, and then copy data from HDFS to the Transfer Appliance.

Data copied onto the exported NFS share is encrypted and saved in a staging area on the Transfer Appliance.


You need to know the following information to export an NFS share:

In addition, make sure that:

  • You have identified the directories of HDFS you will transfer.

NOTE: The maximum usable capacity of Transfer Appliance when using the NFS share capture method is:

  • TA100: 74TB
  • TA480: 400TB

    When using the TA480 with the NFS share capture method, you must create a minimum of four shares, each 100TB or smaller. You can create more shares, as long as the total size of all shares is 400TB or less. We recommend that you arrange your data in chunks that fit the share sizes you create. For example, if you create four 100TB shares, arrange your data in chunks of 100TB or smaller.
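To arrange HDFS directories into chunks that fit the shares you create, you can group them by size before copying. The following is a minimal shell sketch, not part of the Transfer Appliance tooling, that applies the first-fit decreasing heuristic: it sorts directories largest first and places each one into the first share that still has room. It assumes input lines of the form `SIZE_IN_BYTES PATH`; the function name `pack_dirs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assign HDFS directories to NFS shares of a fixed
# capacity using first-fit decreasing. Reads "SIZE_IN_BYTES PATH" lines on
# stdin and prints "SHARE_NUMBER PATH" for each directory. A directory that
# is larger than one share is reported as "TOO_LARGE PATH" and must be split
# into smaller directories before copying.
pack_dirs() {
  local capacity="$1"
  # Largest directories first, so big items are placed while shares are empty.
  sort -k1,1nr | awk -v cap="$capacity" '
    BEGIN { cap = cap + 0 }             # force numeric comparison
    {
      size = $1 + 0
      path = $0
      sub(/^[0-9]+[ \t]+/, "", path)    # keep the rest of the line as the path
      if (size > cap) { print "TOO_LARGE", path; next }
      # First fit: use the first share that still has room.
      for (i = 1; i <= n; i++) {
        if (used[i] + size <= cap) { used[i] += size; print i, path; next }
      }
      # No existing share has room, so start a new one.
      n++
      used[n] = size
      print n, path
    }'
}
```

For example, `hdfs dfs -du /data | awk '{print $1, $NF}' | pack_dirs 100000000000000` would propose a share number for each directory directly under the hypothetical path /data (the first column of `hdfs dfs -du` is the size and the last is the path). Treating 100TB as 10^14 bytes is an assumption; leave some headroom. First-fit decreasing is a simple heuristic and does not always find the minimum number of shares.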

Creating NFS shares

To create NFS shares:

  1. Open the Transfer Appliance Web User Interface.

  2. Click Menu.

  3. Select Export NFS Share.

    The NFS Share window appears and displays any currently available NFS shares.

  4. Click Create Share to create a new NFS share.

  5. Enter the NFS share name.

  6. For NFS client name(s) or IP Address(es), enter a comma-separated list of the client names or IP addresses on which you want to mount and access the NFS share.

  7. If the share should be accessible to all clients, select the Accessible to all clients checkbox.

  8. Click Create to create the new NFS share on Transfer Appliance.

  9. If you want to access the NFS share from additional NFS clients, click the NFS Share radio button and edit the client name or IP address.

Mounting NFS shares

To capture data from HDFS, you mount each of the NFS shares on the Transfer Appliance on every Hadoop worker node and on the edge node of the Hadoop cluster. The worker nodes need the mounts because DistCp runs its copy tasks on them. Mounting the NFS shares on the edge node allows you to view the files on the Transfer Appliance from the edge node.

Run the following command on each worker node and the edge node to mount the NFS share on each of the nodes:

sudo mount -t nfs [TRANSFER_APPLIANCE_ADDRESS]:/nfs/[SHARE_NAME] -o rw,nolock,hard,intr,noatime,nocto [SHARE_MOUNT_PATH]

For example, where share1 is the share name and /mnt/nfs1/ is the mount path:

sudo mount -t nfs [TRANSFER_APPLIANCE_ADDRESS]:/nfs/share1 -o rw,nolock,hard,intr,noatime,nocto /mnt/nfs1/
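Because the same command has to run on every worker node and on the edge node, you may prefer to script it. The following is a minimal sketch that runs the mount command over SSH on a list of nodes and then checks each result with `mountpoint`. It assumes passwordless SSH and passwordless sudo on every node and mount paths without spaces; the function names `mount_cmd` and `mount_on_nodes` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for mounting one Transfer Appliance NFS share on
# several Hadoop nodes. Assumes passwordless SSH and passwordless sudo on
# every node, and mount paths without spaces.

# Print the remote command for one node: create the mount path, then mount
# the share with the options used in this guide.
mount_cmd() {
  local appliance="$1" share="$2" mount_path="$3"
  printf 'sudo mkdir -p %s && sudo mount -t nfs %s:/nfs/%s -o rw,nolock,hard,intr,noatime,nocto %s' \
    "$mount_path" "$appliance" "$share" "$mount_path"
}

# Usage: mount_on_nodes APPLIANCE SHARE MOUNT_PATH NODE [NODE ...]
# Mounts the share on each node, then checks with mountpoint(1) that the
# path really is a mount point. Returns non-zero if any node fails.
mount_on_nodes() {
  local appliance="$1" share="$2" mount_path="$3" node failed=0
  shift 3
  for node in "$@"; do
    if ssh -o BatchMode=yes "$node" \
        "$(mount_cmd "$appliance" "$share" "$mount_path") && mountpoint -q $mount_path"; then
      echo "mounted on $node"
    else
      echo "mount FAILED on $node" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, with the appliance address in APPLIANCE_ADDR and hypothetical host names: `mount_on_nodes "$APPLIANCE_ADDR" share1 /mnt/nfs1/ worker-01 worker-02 edge-01`. Repeat for each share, giving each one its own mount path.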

Transferring data from HDFS

To transfer data, run the following command from the edge node:

hadoop distcp -m 100 hdfs_directory_1 file:///mnt/nfs_appliance_1

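In this command, -m 100 sets the maximum number of simultaneous copies (map tasks) for the job. To copy several HDFS directories, you can run one command per directory in parallel or serially, depending on the resources in your cluster.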
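Choosing between parallel and serial runs can be scripted. The following is a minimal sketch that starts one DistCp job per HDFS directory and destination pair, either one after another or all at once, and reports failure if any job fails. It assumes each destination path is an NFS mount that exists on every worker node; the function names `run_distcp` and `run_transfers` and the `DISTCP_MAPS` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: copy HDFS directories to NFS mount paths with one
# DistCp job per (source, destination) pair, serially or in parallel.

# Copy one HDFS directory to a local NFS mount path. -m caps the number of
# simultaneous copies (map tasks); DISTCP_MAPS is a hypothetical override
# that defaults to the 100 used in this guide.
run_distcp() {
  local src="$1" mount_path="$2"
  hadoop distcp -m "${DISTCP_MAPS:-100}" "$src" "file://$mount_path"
}

# Usage: run_transfers serial|parallel SRC1 MOUNT1 [SRC2 MOUNT2 ...]
# Returns non-zero if any DistCp job fails.
run_transfers() {
  local mode="$1"
  shift
  case "$mode" in
    serial|parallel) ;;
    *) echo "mode must be serial or parallel" >&2; return 2 ;;
  esac
  if [ $(( $# % 2 )) -ne 0 ]; then
    echo "arguments must be SRC MOUNT pairs" >&2
    return 2
  fi

  local pids=() status=0 pid
  while [ "$#" -gt 0 ]; do
    if [ "$mode" = parallel ]; then
      run_distcp "$1" "$2" &              # start the job in the background
      pids+=("$!")
    else
      run_distcp "$1" "$2" || status=1   # finish each job before the next
    fi
    shift 2
  done

  # In parallel mode, wait for every background job and record any failure.
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

For example, `run_transfers parallel /data/dir1 /mnt/nfs1 /data/dir2 /mnt/nfs2` (hypothetical paths) starts both copies at once. Parallel mode can finish sooner when the cluster and network have spare capacity; serial mode limits the load to one job's map tasks at a time.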

Processing the NFS shares

Processing an NFS share compresses and deduplicates the previously encrypted data stored in the NFS share's staging area.

Once you start processing, you cannot copy data to the NFS share. When processing is complete, the NFS share is deleted from the Transfer Appliance.
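Because you cannot copy more data to a share after processing starts, it can be worth checking that each copy is complete first. The following is a minimal sketch that compares the file count and total bytes of an HDFS directory, as reported by `hdfs dfs -count`, with the regular files found under its NFS mount path. It assumes one HDFS source directory per mount path and GNU find on the edge node; the function names are hypothetical, and matching totals are a coarse check, not a content checksum.

```shell
#!/usr/bin/env bash
# Hypothetical pre-processing check: compare the number of files and total
# bytes under an HDFS directory with what landed on the NFS mount path.
# Requires the hdfs CLI for the source side and GNU find for the local side.

# Print "FILE_COUNT TOTAL_BYTES" for regular files under a local directory.
local_stats() {
  find "$1" -type f -printf '%s\n' |
    awk '{ n++; bytes += $1 } END { printf "%.0f %.0f\n", n, bytes }'
}

# Print "FILE_COUNT TOTAL_BYTES" for an HDFS path. `hdfs dfs -count` prints
# DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
hdfs_stats() {
  hdfs dfs -count "$1" | awk '{ print $2, $3 }'
}

# Succeed only if both sides report the same file count and byte total.
verify_copy() {
  local src="$1" mount_path="$2" want got
  want="$(hdfs_stats "$src")"
  got="$(local_stats "$mount_path")"
  if [ "$want" = "$got" ]; then
    echo "OK: $src matches $mount_path ($got)"
  else
    echo "MISMATCH: hdfs=($want) nfs=($got)" >&2
    return 1
  fi
}
```

For example, `verify_copy hdfs_directory_1 /mnt/nfs_appliance_1` could be run from the edge node before you start processing that share.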

To process an NFS share:

  1. Open the Transfer Appliance Web User Interface.

  2. Go to the Job Status window to monitor changes to the data size of NFS shares. The data is compressed as it is processed.

  3. Click List.

  4. Select Process.

    The Process an NFS Share window appears.

  5. If you want to exclude symbolic links, click the Exclude symlinks checkbox.

  6. Click OK.

Next steps

If you have data on NFS you'd like to capture, see Exporting an NFS share.

To monitor data capture jobs, see Monitoring data capture jobs.

If you are done capturing data, see Preparing and shipping a Transfer Appliance.