This page explains how to import a series of Hadoop sequence files into Cloud Bigtable. You must create the Hadoop sequence files by exporting a table from HBase or Cloud Bigtable.
If you need to import CSV data, see Import a CSV File into a Cloud Bigtable Table.
Before you begin
Before you import a table into Cloud Bigtable, you need to complete the following tasks:
Export the table from HBase or Cloud Bigtable. The import process is the same regardless of which one you exported the table from.
Check how much storage the original table uses, and make sure your Cloud Bigtable cluster has enough nodes for that amount of storage.
For details about how many nodes you need, see Storage utilization per node.
Creating a new Cloud Bigtable table
To import your data, you must create a new, empty table with the same column families as the exported table.
To create the new table:
Install the cbt tool:
gcloud components update
gcloud components install cbt
Use the createtable command to create the table:
cbt -instance [INSTANCE_ID] createtable [TABLE_NAME]
Use the createfamily command as many times as necessary to create all of the column families:
cbt -instance [INSTANCE_ID] createfamily [TABLE_NAME] [FAMILY_NAME]
For example, if your table is called my-new-table, and you want to add the column families cf1 and cf2:
cbt -instance my-instance createfamily my-new-table cf1
cbt -instance my-instance createfamily my-new-table cf2
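When the exported table has many column families, the repeated createfamily calls above can be generated in a loop. The following is a minimal dry-run sketch, not part of the cbt tool: the helper name print_createfamily_commands and the variable names are hypothetical, and the values are the example names used on this page. It only prints the commands so you can review them first.

```shell
# Dry-run sketch: print one createfamily command per column family.
INSTANCE_ID="my-instance"   # example instance ID from this page
TABLE_NAME="my-new-table"   # example table name from this page
FAMILIES="cf1 cf2"          # column families of the exported table

# Hypothetical helper (not part of cbt): emits the commands as text only.
print_createfamily_commands() {
  for family in $FAMILIES; do
    echo "cbt -instance $INSTANCE_ID createfamily $TABLE_NAME $family"
  done
}

print_createfamily_commands
```

After checking the printed commands, you could run them by piping the output to sh.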
Importing the table
Cloud Bigtable provides a utility that uses Cloud Dataflow to import a table from a series of Hadoop sequence files.
To import the table:
Download the import/export JAR file, which includes all of the required dependencies:
curl -f -O http://repo1.maven.org/maven2/com/google/cloud/bigtable/bigtable-beam-import/1.10.0/bigtable-beam-import-1.10.0-shaded.jar
Run the following command to import the table, replacing values in brackets with the appropriate values. For [TEMP_PATH], use a Cloud Storage path that does not yet exist, or the same path you used when you exported the table:
java -jar bigtable-beam-import-1.10.0-shaded.jar import \
    --runner=dataflow \
    --project=[PROJECT_ID] \
    --bigtableInstanceId=[INSTANCE_ID] \
    --bigtableTableId=[TABLE_ID] \
    --sourcePattern='gs://[BUCKET_NAME]/[EXPORT_PATH]/part-*' \
    --tempLocation=gs://[BUCKET_NAME]/[TEMP_PATH] \
    --maxNumWorkers=[3x_NUMBER_OF_NODES] \
    --zone=[DATAFLOW_JOB_ZONE]
For example, if the clusters in your Cloud Bigtable instance have 3 nodes:
java -jar bigtable-beam-import-1.10.0-shaded.jar import \
    --runner=dataflow \
    --project=my-project \
    --bigtableInstanceId=my-instance \
    --bigtableTableId=my-new-table \
    --sourcePattern='gs://my-export-bucket/my-table/part-*' \
    --tempLocation=gs://my-export-bucket/jar-temp \
    --maxNumWorkers=9 \
    --zone=us-east1-c
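The --maxNumWorkers value in the example is three times the number of nodes in the cluster, as the [3x_NUMBER_OF_NODES] placeholder indicates. A small sketch of that arithmetic follows; the variable names NUM_NODES and MAX_NUM_WORKERS are illustrative and are not read by the import tool.

```shell
# Number of nodes in the Cloud Bigtable cluster (3 in the example above).
NUM_NODES=3

# The import command expects three Dataflow workers per node.
MAX_NUM_WORKERS=$((NUM_NODES * 3))

# Print the flag so it can be pasted into the import command.
echo "--maxNumWorkers=${MAX_NUM_WORKERS}"
```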
The import job loads the Hadoop sequence files into your Cloud Bigtable table. You can use the Google Cloud Platform Console to monitor the import job while it runs.
When the job is complete, it prints the message "Job finished with status DONE" to the console.
Checking the results of the import process
You can verify that the table was imported by using the cbt tool to count the number of rows in the table:
cbt -instance [INSTANCE_ID] count [TABLE_NAME]
The command prints the total number of rows in the table. Verify that the total number of rows is consistent with the number of rows in the exported table.
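If you recorded the row count of the exported table, the comparison can be scripted. This is a minimal sketch under stated assumptions: row_counts_match is a hypothetical helper, not a cbt feature, and it takes two plain integers that you supply yourself, because the exact output format of the count command is not assumed here. The value 1000 is purely illustrative.

```shell
# Hypothetical helper: succeeds only when the two row counts are equal.
row_counts_match() {
  expected_rows="$1"   # row count of the exported (source) table
  imported_rows="$2"   # row count reported for the newly imported table
  if [ "$expected_rows" -eq "$imported_rows" ]; then
    echo "OK: both tables have $imported_rows rows"
  else
    echo "MISMATCH: expected $expected_rows rows, found $imported_rows" >&2
    return 1
  fi
}

# Illustrative values only; substitute the counts for your own tables.
row_counts_match 1000 1000
```

A nonzero exit status from the helper signals that the counts differ and the import should be investigated.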
Learn how to export sequence files from HBase or Cloud Bigtable.