Capturing data from HDFS

The HDFS capture method uses NFS shares that you export from the Transfer Appliance. You mount the shares on the nodes of your Hadoop cluster, and then copy data from HDFS to the mounted shares.

Data copied onto the exported NFS share is encrypted and saved in a staging area on the Transfer Appliance.

Prerequisites

You need to know the following information to export an NFS share:

In addition, make sure that:

NOTE: The reserved capacity of the Transfer Appliance when using the NFS share capture method is:

  • TA100 — 8.5TB
  • TA480 — 18TB

    When using the TA480 with the NFS share capture method, you must create a minimum of four shares of 101TB each. You can create more shares, as long as the total size of all shares is 405TB or less. We recommend that you arrange your data in chunks that fit on the NFS shares you create. For example, if you create four 101TB shares, arrange your data into chunks of 101TB or smaller.

  • You have identified the HDFS directories that you will transfer.
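
The chunking recommendation above can be checked with a little arithmetic before any data is copied. The following is a minimal sketch, not part of the Transfer Appliance tooling: the function names and the /data/logs path are hypothetical, and it assumes that 1TB means 10^12 bytes, which may not match how the appliance reports capacity.

```shell
#!/usr/bin/env bash
# Sketch: decide whether an HDFS directory fits on a single 101TB NFS share.
# Assumption: 1TB = 10^12 bytes. Adjust if the appliance uses binary units.
SHARE_LIMIT_BYTES=$((101 * 1000 * 1000 * 1000 * 1000))

# fits_in_share SIZE_BYTES
# Exit status 0 if the size fits on one share, 1 otherwise.
fits_in_share() {
  [ "$1" -le "$SHARE_LIMIT_BYTES" ]
}

# report_dir PATH SIZE_BYTES
# Prints one line saying whether the directory fits or must be split.
report_dir() {
  if fits_in_share "$2"; then
    echo "$1 fits"
  else
    echo "$1 too large, split it"
  fi
}

# On a real cluster the size could come from HDFS (first column of the output),
# for example with a hypothetical directory:
#   size=$(hdfs dfs -du -s /data/logs | awk '{print $1}')
#   report_dir /data/logs "$size"
```

Directories reported as too large would need to be split into subdirectories and copied to separate shares.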

Creating NFS shares

To create NFS shares:

  1. Open the Transfer Appliance Web User Interface.

  2. Click Menu.

  3. Select Export NFS share.

    The Export NFS share window appears and displays the default NFS shares. The TA100 has a single NFS share. The TA480 has four NFS shares.

    The space available for NFS shares depends on how much data was previously captured on the Transfer Appliance using other capture methods.

  4. Click More next to a share, and click Export NFS share. The Export NFS share window is displayed.

  5. Enter the NFS share name.

  6. For the NFS client name(s) or IP Address(es), enter the comma-separated client names or IP addresses of the hosts on which you want to mount and access the NFS share.

  7. If the share should be accessible to all clients, select the Accessible to all clients checkbox.

  8. Click Export to export the NFS share on the Transfer Appliance.

  9. If you want to access the NFS share from additional NFS clients, click More next to the share, and edit the client name or IP address.

Mounting NFS shares

To capture data from HDFS, you mount each of the NFS shares on the Transfer Appliance on each Hadoop worker node and on the edge node of the Hadoop cluster. Mounting the NFS shares on the edge node allows you to view the files on the Transfer Appliance from the edge node.

Run the following command on each worker node and on the edge node to mount an NFS share:

sudo mount -t nfs [TRANSFER_APPLIANCE_ADDRESS]:/nfs/[SHARE_NAME] -o rw,nolock,hard,intr,noatime,nocto [SHARE_MOUNT_PATH]

For example, where share1 is the share name and /mnt/nfs1/ is the mount path:

sudo mount -t nfs 192.168.5.10:/nfs/share1 -o rw,nolock,hard,intr,noatime,nocto /mnt/nfs1/
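
When there are several shares, it can help to generate the mount commands and review them before running anything. The following is a minimal sketch that only prints commands. It assumes shares named share1, share2, and so on, mounted at /mnt/nfs1, /mnt/nfs2, and so on. Those names and the helper functions are illustrative, not something the Transfer Appliance requires.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the commands that mount every share on one node.
# Assumptions: shares are named share1..shareN and mount at /mnt/nfs1../mnt/nfsN.
MOUNT_OPTS="rw,nolock,hard,intr,noatime,nocto"

# mount_cmd APPLIANCE_ADDRESS SHARE_NAME MOUNT_PATH
# Prints a single mount command using the options from this section.
mount_cmd() {
  echo "sudo mount -t nfs $1:/nfs/$2 -o $MOUNT_OPTS $3"
}

# print_all_mounts APPLIANCE_ADDRESS SHARE_COUNT
# Prints a mkdir and a mount command for each share.
print_all_mounts() {
  local i
  for i in $(seq 1 "$2"); do
    # The mount point must exist before the share can be mounted on it.
    echo "sudo mkdir -p /mnt/nfs$i"
    mount_cmd "$1" "share$i" "/mnt/nfs$i"
  done
}

print_all_mounts 192.168.5.10 4
```

The printed commands would then be run on each worker node and on the edge node, using the same mount paths on every node.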

Transferring data from HDFS

To transfer data, run the following command from the edge node:

hadoop distcp -m 100 hdfs_directory_1 file:///mnt/nfs_appliance_1

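DistCp runs as a MapReduce job, so the copy tasks run on the worker nodes, which is why each worker node needs the share mounted. The -m option sets the maximum number of simultaneous copies (map tasks). You can run several such commands, one per directory and share, in parallel or serially, depending on the resources in your cluster.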
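
With several directories and shares, one way to organize the copies is to pair each HDFS directory with a mounted share and generate one command per pair. The following is a minimal sketch that only prints the commands. The directory names, mount paths, and helper functions are hypothetical, and it assumes DistCp is invoked as `hadoop distcp`.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) one DistCp command per HDFS directory and share.
# Assumptions: the paths below are placeholders; DistCp is started with
# "hadoop distcp" and each mount path exists on every worker node.
MAX_MAPS=100

# distcp_cmd HDFS_DIR MOUNT_PATH
# Prints one copy command that targets the local mount path via file://.
distcp_cmd() {
  echo "hadoop distcp -m $MAX_MAPS $1 file://$2"
}

# plan_copies "HDFS_DIR=MOUNT_PATH" ...
# Splits each pair at the first "=" and prints the matching command.
plan_copies() {
  local pair
  for pair in "$@"; do
    distcp_cmd "${pair%%=*}" "${pair#*=}"
  done
}

plan_copies "/data/dir1=/mnt/nfs1" "/data/dir2=/mnt/nfs2"
```

Running the printed commands one after another copies serially. Starting them in separate sessions on the edge node runs them in parallel, which is only worthwhile if the cluster has the resources to spare.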

Next steps

If you have data on NFS you'd like to capture, see Exporting an NFS share.

To monitor data capture jobs, see Monitoring data capture jobs.

If you are done capturing data, see Preparing and shipping a Transfer Appliance.