Differences between HBase and Cloud Bigtable
One way to access Cloud Bigtable is to use a customized version of the Apache HBase client for Java. In general, the customized client exposes the same API as a standard installation of HBase. This page describes the differences between the Bigtable HBase client for Java and a standard HBase installation. Many of these differences are related to management tasks that Bigtable handles automatically.
When you create a column family, you cannot configure the block size or compression method, either with the HBase shell or through the HBase API. Bigtable manages the block size and compression for you.
In addition, if you use the HBase shell to get information about a table, the HBase shell will always report that each column family does not use compression. In reality, Bigtable uses proprietary compression methods for all of your data.
Bigtable requires that column family names follow the regular
expression [_a-zA-Z0-9][-_.a-zA-Z0-9]*. If you are importing
data into Bigtable from HBase, you might need to first change the
family names to follow this pattern.
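As a rough illustration of the naming rule above, the following sketch checks candidate family names against the pattern before an import. The class name `ColumnFamilyNames` and the `sanitize` renaming rule (replace each disallowed character with an underscore) are assumptions made for this example. They are not part of the Bigtable HBase client.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any Bigtable or HBase library.
final class ColumnFamilyNames {
    // The pattern quoted in the text: first character excludes '-' and '.'.
    private static final Pattern VALID =
            Pattern.compile("[_a-zA-Z0-9][-_.a-zA-Z0-9]*");

    // Returns true only if the whole name matches the pattern.
    static boolean isValid(String name) {
        return VALID.matcher(name).matches();
    }

    // One possible renaming rule (an assumption): replace every disallowed
    // character with '_', and fix a leading '-' or '.' the same way.
    static String sanitize(String name) {
        if (name.isEmpty()) {
            return "_";
        }
        String cleaned = name.replaceAll("[^\\-_.a-zA-Z0-9]", "_");
        char first = cleaned.charAt(0);
        if (first == '-' || first == '.') {
            cleaned = "_" + cleaned.substring(1);
        }
        return cleaned;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"cf1", "cf:1", "-meta"}) {
            System.out.println(name + " valid=" + isValid(name)
                    + " sanitized=" + sanitize(name));
        }
    }
}
```

Two different source names can map to the same sanitized name (for example, `cf:1` and `cf_1`), so a real migration would also need to check for collisions.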
Rows and cells
- You cannot define an ACL for an individual row.
- You cannot set the visibility of individual cells.
- Tags are not supported. You cannot use the class
org.apache.hadoop.hbase.Tag to add metadata to individual cells.
Mutations and deletions
- Append operations in Bigtable are fully atomic for both readers and writers. Readers will never be able to read a partially applied Append operation.
- Deleting a specific version of a specific column based on its timestamp is
supported, but deleting all values with a specific timestamp in a given column
family or row is not supported. The following methods in the class
org.apache.hadoop.hbase.client.Delete are not supported:
  - new Delete(byte[] row, long timestamp)
  - addColumn(byte[] family, byte[] qualifier)
  - addFamily(byte[] family, long timestamp)
  - addFamilyVersion(byte[] family, long timestamp)
- In HBase, deletes mask puts, but Bigtable does not mask puts after deletes when put requests are sent after deletion requests. This means that in Bigtable, a write request sent to a cell is not affected by a previously sent delete request to the same cell.
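The difference in delete behavior can be illustrated with a toy model of a single cell. This is only a sketch of the semantics described above. Neither class reflects how HBase or Bigtable is implemented, and all names are hypothetical. The tombstone model assumes that a delete hides every version at or below the delete timestamp.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: illustrates the ordering semantics, not real storage code.
final class DeleteMaskingSketch {

    // HBase-style: a delete leaves a marker that keeps hiding versions
    // at or below its timestamp, even versions written afterwards.
    static final class TombstoneCell {
        private final TreeMap<Long, String> versions = new TreeMap<>();
        private long tombstone = Long.MIN_VALUE;

        void put(long ts, String value) { versions.put(ts, value); }

        void delete(long ts) { tombstone = Math.max(tombstone, ts); }

        // Returns the newest version unless the marker hides it.
        String read() {
            Map.Entry<Long, String> latest = versions.lastEntry();
            return (latest != null && latest.getKey() > tombstone)
                    ? latest.getValue() : null;
        }
    }

    // Bigtable-style: a delete removes the versions that exist when it
    // arrives and has no effect on writes sent later.
    static final class ImmediateDeleteCell {
        private final TreeMap<Long, String> versions = new TreeMap<>();

        void put(long ts, String value) { versions.put(ts, value); }

        void delete(long ts) { versions.headMap(ts, true).clear(); }

        String read() {
            Map.Entry<Long, String> latest = versions.lastEntry();
            return latest == null ? null : latest.getValue();
        }
    }

    public static void main(String[] args) {
        TombstoneCell masked = new TombstoneCell();
        masked.put(10, "old");
        masked.delete(20);
        masked.put(15, "new");   // sent after the delete, timestamp below it

        ImmediateDeleteCell unmasked = new ImmediateDeleteCell();
        unmasked.put(10, "old");
        unmasked.delete(20);
        unmasked.put(15, "new");

        System.out.println("tombstone model reads: " + masked.read());
        System.out.println("immediate model reads: " + unmasked.read());
    }
}
```

In the tombstone model the later put stays hidden, while in the immediate model it is readable. Applications that rely on a delete continuing to mask later writes would need to account for this when moving to Bigtable.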
Gets and scans
- Reverse scans are available in Preview. For details, see Reverse scans.
- Querying versions of column families within a timestamp range is not
supported. You cannot call the following methods:
  - org.apache.hadoop.hbase.client.Query#setColumnFamilyTimeRange(byte[] cf, long minStamp, long maxStamp)
  - org.apache.hadoop.hbase.client.Get#setColumnFamilyTimeRange(byte[] cf, long minStamp, long maxStamp)
  - org.apache.hadoop.hbase.client.Scan#setColumnFamilyTimeRange(byte[] cf, long minStamp, long maxStamp)
- Limiting the number of values per row per column family is not supported. You
cannot call the method
- Setting the maximum number of cells to return for each call to
next() is not supported. Calls to the method
org.apache.hadoop.hbase.client.Scan#setBatch(int batch) are ignored.
- Setting the number of rows for caching is not supported. Calls to the method
org.apache.hadoop.hbase.client.Scan#setCaching(int caching) are ignored.
Coprocessors are not supported. You cannot create classes that implement the
The following table shows which filters are currently supported. All of these
filters are in the package
|Supported||Supported, with limitations||Not supported|
In addition, the following differences affect Bigtable filters:
- In filters that use the regular expression comparator
(org.apache.hadoop.hbase.filter.RegexStringComparator), regular expressions use RE2 syntax, not Java syntax.
- Custom filters are not supported. You cannot create classes that inherit from
- There is a size limit of 20 KB on filter expressions. As a workaround to reduce the size of a filter expression, use a supplementary column that stores the hash value of the filter criteria.
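One way to read the hash workaround above: at write time, compute a digest of the values you expect to filter on and store it in an extra column, so that a read can match one short value instead of carrying a long expression. The sketch below shows only the hashing step. The class name, the choice of SHA-256, and the length-prefixed encoding are assumptions for illustration. The source does not prescribe a hash function or a column layout.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Bigtable HBase client.
final class FilterCriteriaHash {

    // Returns a 64-character lowercase hex SHA-256 digest of the values.
    // Each value is length-prefixed so ("ab","c") and ("a","bc") differ.
    static String hashCriteria(List<String> values) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String value : values) {
                byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
                md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                md.update(bytes);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be provided by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same criteria always yield the same digest, so the value
        // written at ingest time can be matched exactly at read time.
        String digest = hashCriteria(Arrays.asList("region=emea", "tier=gold"));
        System.out.println(digest.length() + " hex characters: " + digest);
    }
}
```

A hash supports only exact matches on the full set of criteria, so this approach fits equality-style filters and not range or partial matches.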
Bigtable stores timestamps in microseconds, while HBase stores timestamps in milliseconds. This distinction has implications when you use the HBase client library for Bigtable and you have data with reversed timestamps.
The client library converts between microseconds and milliseconds, but because the largest HBase timestamp that Bigtable can store is Long.MAX_VALUE/1000, any value larger than that is converted to Long.MAX_VALUE/1000. As a result, large reversed timestamp values might not convert correctly.
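The effect on reversed timestamps can be seen with a small model of the conversion. This sketch implements only the clamping rule stated above. It is not the client library's code, the class and method names are hypothetical, and negative timestamps are ignored.

```java
// Toy model of the described millisecond-to-microsecond conversion.
final class TimestampClampSketch {
    // Largest HBase (millisecond) timestamp that fits once multiplied by 1000.
    static final long MAX_HBASE_MILLIS = Long.MAX_VALUE / 1000;

    // Clamp to the maximum, then convert milliseconds to microseconds.
    static long toBigtableMicros(long hbaseMillis) {
        long clamped = Math.min(hbaseMillis, MAX_HBASE_MILLIS);
        return clamped * 1000;
    }

    // Convert microseconds back to milliseconds.
    static long toHBaseMillis(long bigtableMicros) {
        return bigtableMicros / 1000;
    }

    public static void main(String[] args) {
        long t1 = 1_600_000_000_000L;          // two distinct event times (ms)
        long t2 = 1_700_000_000_000L;
        long reversed1 = Long.MAX_VALUE - t1;  // common reversed-timestamp idiom
        long reversed2 = Long.MAX_VALUE - t2;

        // Ordinary timestamps survive the round trip.
        System.out.println(toHBaseMillis(toBigtableMicros(t1)) == t1);

        // Both reversed values exceed the maximum, so they collapse together.
        System.out.println(toBigtableMicros(reversed1) == toBigtableMicros(reversed2));
    }
}
```

Because both reversed values exceed Long.MAX_VALUE/1000, they are stored as the same timestamp in this model and their ordering is lost.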
This section describes methods in the interface
org.apache.hadoop.hbase.client.Admin that are not available
on Bigtable, or that behave differently on Bigtable
than on HBase. These lists are not exhaustive, and they might not reflect the
most recently added HBase API methods.
Most of these methods are unnecessary on Bigtable, because management tasks are handled automatically. A few methods are not available because they relate to features that Bigtable does not support.
General maintenance tasks
Bigtable handles most maintenance tasks automatically. As a result, the following methods are not available:
abort(String why, Throwable e)
setBalancerRunning(boolean on, boolean synchronous)
Bigtable does not allow you to specify locality groups for column families. As a result, you cannot call HBase methods that return a locality group.
Bigtable does not use namespaces. You can use row key prefixes to simulate namespaces. The following methods are not available:
Bigtable uses tablets, which are similar to regions. Bigtable manages your tablets automatically. As a result, the following methods are not available:
closeRegion(byte[] regionname, String serverName)
closeRegion(ServerName sn, HRegionInfo hri)
closeRegion(String regionname, String serverName)
closeRegionWithEncodedRegionName(String encodedRegionName, String serverName)
compactRegion(byte[] regionName, byte[] columnFamily)
compactRegionServer(ServerName sn, boolean major)
majorCompactRegion(byte[] regionName, byte[] columnFamily)
mergeRegions(byte[] encodedNameOfRegionA, byte[] encodedNameOfRegionB, boolean forcible)
move(byte[] encodedRegionName, byte[] destServerName)
splitRegion(byte[] regionName, byte[] splitPoint)
unassign(byte[] regionName, boolean force)
The following methods are not available.
restoreSnapshot(byte[] snapshotName, boolean takeFailSafeSnapshot)
restoreSnapshot(String snapshotName, boolean takeFailSafeSnapshot)
Tasks such as table compaction are handled automatically. As a result, the following methods are not available:
compact(TableName tableName, byte[] columnFamily)
majorCompact(TableName tableName, byte[] columnFamily)
modifyTable(TableName tableName, HTableDescriptor htd)
split(TableName tableName, byte[] splitPoint)
Bigtable does not support coprocessors. As a result, the following methods are not available:
Bigtable does not support distributed procedures. As a result, the following methods are not available:
execProcedure(String signature, String instance, Map<String, String> props)
execProcedureWithRet(String signature, String instance, Map<String, String> props)
isProcedureFinished(String signature, String instance, Map<String, String> props)