Importing Data from Sequence Files

This page explains how to import a series of Hadoop sequence files into Cloud Bigtable. You must create the Hadoop sequence files by exporting a table from HBase or Cloud Bigtable.

Before you start

Before you import a table into Cloud Bigtable, you need to complete the following tasks:

  1. Export your table from HBase or Cloud Bigtable as a set of Hadoop sequence files, and store the files in a Cloud Storage bucket.
  2. Create a Cloud Dataproc cluster, which you will use to run the import job.

The import process is the same regardless of whether you exported your table from HBase or Cloud Bigtable.

Creating a new Cloud Bigtable table

Before importing the table, you need to create a new, empty table with the same column families as the exported table.

To create the new table:

  1. In the Google Cloud Platform Console, click the Cloud Shell icon in the upper right corner.
  2. When Cloud Shell is ready to use, download and unzip the quickstart files:
    curl -f -O https://storage.googleapis.com/cloud-bigtable/quickstart/GoogleCloudBigtable-Quickstart-0.9.4.zip
    unzip GoogleCloudBigtable-Quickstart-0.9.4.zip
  3. Change to the quickstart directory, then start the HBase shell:

    ./quickstart.sh
  4. Use the create command to create the table and its column families:
    create '[TABLE_NAME]', '[FAMILY_NAME_1]', '[FAMILY_NAME_2]', ... '[FAMILY_NAME_N]'

    For example, to create a table called my-new-table with the column families cf1 and cf2:

    create 'my-new-table', 'cf1', 'cf2'
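If you create tables from a script rather than by typing into the HBase shell, the `create` statement can be assembled from a table name and a list of column families. A minimal sketch (the `build_create_cmd` helper is hypothetical, not part of the quickstart):

```shell
#!/bin/sh
# Hypothetical helper: assemble the HBase shell `create` statement for a
# table and any number of column families.
build_create_cmd() {
  table="$1"; shift
  cmd="create '${table}'"
  for family in "$@"; do
    cmd="${cmd}, '${family}'"
  done
  printf '%s\n' "$cmd"
}

build_create_cmd my-new-table cf1 cf2
# prints: create 'my-new-table', 'cf1', 'cf2'
```

The resulting statement could then be fed to a non-interactive HBase shell session, if your environment supports one.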

Importing the table

To import the table:

  1. Clone the GitHub repository GoogleCloudPlatform/cloud-bigtable-examples, which provides dependencies for importing the table:

    git clone https://github.com/GoogleCloudPlatform/cloud-bigtable-examples.git
    
  2. In the directory where you cloned the GitHub repository, change to the directory java/dataproc-wordcount.

  3. Run the following command to build the project, replacing values in brackets with the appropriate values:

    mvn clean package -Dbigtable.projectID=[PROJECT_ID] \
        -Dbigtable.instanceID=[BIGTABLE_INSTANCE_ID]
    
  4. Run the following command to import the table, replacing values in brackets with the appropriate values:

    gcloud dataproc jobs submit hadoop --cluster [DATAPROC_CLUSTER_NAME] \
        --class com.google.cloud.bigtable.mapreduce.Driver \
        --jar target/wordcount-mapreduce-0-SNAPSHOT-jar-with-dependencies.jar \
        import-table [NEW_TABLE_NAME] gs://[CLOUD_STORAGE_SOURCE_PATH]
    

    For example:

    gcloud dataproc jobs submit hadoop --cluster dp \
        --class com.google.cloud.bigtable.mapreduce.Driver \
        --jar target/wordcount-mapreduce-0-SNAPSHOT-jar-with-dependencies.jar \
        import-table my-new-table gs://my-storage-bucket/my-table
    

    When the job is complete, it prints status information to the console, including the message Job [JOB_ID] finished successfully.
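If you submit the job from a script, you may want to capture the job ID from that success message so you can inspect the job later, for example with `gcloud dataproc jobs describe`. A minimal sketch, assuming the message format shown above:

```shell
#!/bin/sh
# Hypothetical helper: pull the job ID out of the
# "Job [JOB_ID] finished successfully" status line.
extract_job_id() {
  sed -n 's/^Job \[\(.*\)\] finished successfully.*$/\1/p'
}

echo 'Job [6a3f1b2c] finished successfully' | extract_job_id
# prints: 6a3f1b2c
```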

Checking the results of the import process

You can verify that the table was imported by using the HBase shell to count the number of rows in the table.

  1. In the Google Cloud Platform Console, click the Cloud Shell icon in the upper right corner.
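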
  2. When Cloud Shell is ready to use, download and unzip the quickstart files:
    curl -f -O https://storage.googleapis.com/cloud-bigtable/quickstart/GoogleCloudBigtable-Quickstart-0.9.4.zip
    unzip GoogleCloudBigtable-Quickstart-0.9.4.zip
  3. Change to the quickstart directory, then start the HBase shell:

    ./quickstart.sh
  4. Use the count command to count the number of rows in your table:
    count '[TABLE_NAME]'

    The command prints status information about the counting process, including the current row key, followed by the total number of rows in the table. For example:

    Current count: 1000, row: BATTERY#20150301124501001
    Current count: 2000, row: BATTERY#20150310175302853
    Current count: 3000, row: BATTERY#20150321114500814
    Current count: 4000, row: CPU#20150301073001573
    Current count: 5000, row: CPU#20150317133000035
    5912 row(s) in 2.5160 seconds

    Verify that the total number of rows is consistent with the number of rows in the exported table.
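If you verify many tables, this comparison can be scripted by parsing the row total from the last line of the `count` output. A minimal sketch; the expected value and the output format being parsed are illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical sketch: parse the row total from the final line of the
# `count` output (e.g. "5912 row(s) in 2.5160 seconds") and compare it
# to the row count recorded when the table was exported.
expected_rows=5912   # assumed count from the exported table

actual_rows="$(printf '5912 row(s) in 2.5160 seconds\n' \
  | sed -n 's/^\([0-9][0-9]*\) row(s).*$/\1/p')"

if [ "$actual_rows" -eq "$expected_rows" ]; then
  echo "import verified: ${actual_rows} rows"
else
  echo "row count mismatch: expected ${expected_rows}, got ${actual_rows}" >&2
fi
```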

What's next

Learn how to export sequence files from HBase or Cloud Bigtable.
