Exporting Data as Sequence Files

This page explains how to export a table from HBase or Cloud Bigtable as a series of Hadoop sequence files.

If you're migrating from HBase, you can export your table from HBase, then import the table into Cloud Bigtable.

If you're backing up or moving a Cloud Bigtable table, you can export your table from Cloud Bigtable, then import the table back into Cloud Bigtable.

Exporting a table from HBase

Identifying the table's column families

When you export a table, you should record a list of column families that the table uses. You will need this information when you import the table into Cloud Bigtable.

To get a list of column families in your table:

  1. Log in to your HBase server.
  2. Start the HBase shell:

    hbase shell
    
  3. Use the describe command to get information about the table you plan to export:

    describe '[TABLE_NAME]'
    

    The describe command prints detailed information about the table's column families.
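
    For example, for a hypothetical table with two column families named cf1 and cf2, the output includes one entry per column family, similar to the following (most attributes are trimmed here for brevity):

    {NAME => 'cf1', VERSIONS => '1', ...}
    {NAME => 'cf2', VERSIONS => '1', ...}

    Record the NAME value for each column family.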

Exporting sequence files

The HBase server provides a utility that exports a table as a series of Hadoop sequence files. See the HBase documentation for instructions on using this utility.
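
The utility is the Export MapReduce job included with HBase. As a sketch, a typical invocation on the HBase server looks like the following, where [OUTPUT_DIR] is the directory that receives the sequence files; the exact options available depend on your HBase version:

hbase org.apache.hadoop.hbase.mapreduce.Export [TABLE_NAME] [OUTPUT_DIR]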

Copying sequence files to Cloud Storage

Use the gsutil tool to copy the exported sequence files to a Cloud Storage bucket, replacing values in brackets with the appropriate values:

gsutil cp [SEQUENCE_FILES] gs://[BUCKET_PATH]

See the gsutil documentation for details about the gsutil cp command.
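
For example, assuming the export wrote its sequence files to a local directory named my-table-export (a name used here only for illustration), you can copy the whole directory recursively, with -m to parallelize the transfer:

gsutil -m cp -r my-table-export gs://my-export-bucket/my-table-export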

Exporting a table from Cloud Bigtable

Before you export a Cloud Bigtable table, you need to create a Hadoop cluster using Cloud Dataproc. You can then use the Cloud Dataproc cluster to export the table to a Cloud Storage bucket.
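
If you don't already have a Cloud Dataproc cluster, you can create one with the gcloud tool. A minimal sketch, replacing [DATAPROC_CLUSTER_NAME] with the appropriate value:

gcloud dataproc clusters create [DATAPROC_CLUSTER_NAME]

See the Cloud Dataproc documentation for options such as the cluster's zone, machine types, and number of workers.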

Identifying the table's column families

When you export a table, you should record a list of column families that the table uses. You will need this information when you import the table.

To get a list of column families in your table:

  1. In the Google Cloud Platform Console, click the Cloud Shell icon in the upper right corner.
  2. When Cloud Shell is ready to use, download and unzip the quickstart files:

    curl -f -O https://storage.googleapis.com/cloud-bigtable/quickstart/GoogleCloudBigtable-Quickstart-0.9.4.zip
    unzip GoogleCloudBigtable-Quickstart-0.9.4.zip

  3. Change to the quickstart directory, then start the HBase shell:

    ./quickstart.sh

  4. Use the describe command to get information about the table you plan to export:

    describe '[TABLE_NAME]'

    The describe command prints detailed information about the table's column families.

Creating a Cloud Storage bucket

You can store your exported table in an existing Cloud Storage bucket or in a new bucket. To create a new bucket, use the gsutil tool, replacing [BUCKET_NAME] with the appropriate value:

gsutil mb gs://[BUCKET_NAME]

See the gsutil documentation for details about the gsutil mb command.
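
For example, to create a bucket named my-export-bucket (bucket names must be globally unique) in a specific location, you can pass the -l option:

gsutil mb -l us-central1 gs://my-export-bucket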

Exporting sequence files

To export the table as a series of sequence files:

  1. Clone the GitHub repository GoogleCloudPlatform/cloud-bigtable-examples, which provides dependencies for exporting the table:

    git clone https://github.com/GoogleCloudPlatform/cloud-bigtable-examples.git
    
  2. In the directory where you cloned the GitHub repository, change to the java/dataproc-wordcount directory.

  3. Run the following command to build the project, replacing values in brackets with the appropriate values:

    mvn clean package -Dbigtable.projectID=[PROJECT_ID] \
        -Dbigtable.instanceID=[BIGTABLE_INSTANCE_ID]
    
  4. Run the following command to export the table, replacing values in brackets with the appropriate values. Make sure that [CLOUD_STORAGE_EXPORT_PATH] is a Cloud Storage path that does not yet exist:

    gcloud dataproc jobs submit hadoop --cluster [DATAPROC_CLUSTER_NAME] \
        --class com.google.cloud.bigtable.mapreduce.Driver \
        --jar target/wordcount-mapreduce-0-SNAPSHOT-jar-with-dependencies.jar \
        export-table [TABLE_NAME] gs://[CLOUD_STORAGE_EXPORT_PATH]
    

    For example:

    gcloud dataproc jobs submit hadoop --cluster dp \
        --class com.google.cloud.bigtable.mapreduce.Driver \
        --jar target/wordcount-mapreduce-0-SNAPSHOT-jar-with-dependencies.jar \
        export-table my-table gs://my-export-bucket/my-table
    

    The export job saves your table to the Cloud Storage bucket as a set of Hadoop sequence files.

    When the job is complete, it prints status information to the console, including the message Job [JOB_ID] finished successfully.
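
To confirm that the export succeeded, you can list the contents of the export path. The file names vary, but you should see one or more sequence files:

gsutil ls gs://[CLOUD_STORAGE_EXPORT_PATH]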

What's next

Learn how to import sequence files into Cloud Bigtable.
