Differences between HBase and Bigtable
One way to access Bigtable is to use a customized version of the Apache HBase client for Java. In general, the customized client exposes the same API as a standard installation of HBase. This page describes the differences between the Cloud Bigtable HBase client for Java and a standard HBase installation. Many of these differences are related to management tasks that Bigtable handles automatically.
Column families
When you create a column family, you cannot configure the block size or compression method, either with the HBase shell or through the HBase API. Bigtable manages the block size and compression for you.
In addition, if you use the HBase shell to get information about a table, the HBase shell will always report that each column family does not use compression. In reality, Bigtable uses proprietary compression methods for all of your data.
Bigtable requires that column family names follow the regular expression `[_a-zA-Z0-9][-_.a-zA-Z0-9]*`. If you are importing data into Bigtable from HBase, you might need to first change the family names to follow this pattern.
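The pattern above can be checked with standard Java regular expressions before an import. The following sketch uses a hypothetical helper method (it is not part of the Bigtable or HBase client):

```java
import java.util.regex.Pattern;

// Hypothetical helper: checks whether a column family name matches the
// pattern Bigtable requires ([_a-zA-Z0-9][-_.a-zA-Z0-9]*).
public class FamilyNameCheck {
    private static final Pattern VALID_FAMILY =
            Pattern.compile("[_a-zA-Z0-9][-_.a-zA-Z0-9]*");

    public static boolean isValidFamilyName(String name) {
        return VALID_FAMILY.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidFamilyName("cf1"));         // true
        System.out.println(isValidFamilyName("stats.daily")); // true: '.' allowed after the first character
        System.out.println(isValidFamilyName("-cf"));         // false: '-' not allowed as the first character
    }
}
```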
Rows and cells
- You cannot define an ACL for an individual row.
- You cannot set the visibility of individual cells.
- Tags are not supported. You cannot use the class `org.apache.hadoop.hbase.Tag` to add metadata to individual cells.
Mutations and deletions
- `Append` operations in Bigtable are fully atomic for both readers and writers. Readers will never be able to read a partially applied `Append` operation.
- Deleting a specific version of a specific column based on its timestamp is supported, but deleting all values with a specific timestamp in a given column family or row is not supported. The following methods in the class `org.apache.hadoop.hbase.client.Delete` are not supported:
  - `new Delete(byte[] row, long timestamp)`
  - `addColumn(byte[] family, byte[] qualifier)`
  - `addFamily(byte[] family, long timestamp)`
  - `addFamilyVersion(byte[] family, long timestamp)`
- In HBase, deletes mask puts, but Bigtable does not mask puts after deletes when put requests are sent after deletion requests. This means that in Bigtable, a write request sent to a cell is not affected by a previously sent delete request to the same cell.
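The last difference can be illustrated with a toy in-memory model of a single cell. This is an illustration of the semantics only, not how either system is implemented:

```java
import java.util.TreeMap;

// Toy model of one cell (a single row/family/qualifier) with timestamped
// versions, contrasting the delete-masking behavior described above.
public class DeleteMasking {
    // HBase-style: a delete leaves a marker; any put with a timestamp at or
    // below the delete marker stays masked, even if written afterward.
    static class HBaseCell {
        TreeMap<Long, String> versions = new TreeMap<>();
        long deleteMarker = Long.MIN_VALUE;
        void put(long ts, String v) { versions.put(ts, v); }
        void delete(long ts) { deleteMarker = Math.max(deleteMarker, ts); }
        String get(long ts) {
            String v = versions.get(ts);
            return (v != null && ts > deleteMarker) ? v : null;
        }
    }

    // Bigtable-style: a delete removes the versions that exist when the
    // request is processed; later puts are unaffected.
    static class BigtableCell {
        TreeMap<Long, String> versions = new TreeMap<>();
        void put(long ts, String v) { versions.put(ts, v); }
        void delete(long ts) { versions.headMap(ts, true).clear(); }
        String get(long ts) { return versions.get(ts); }
    }

    // Delete up to timestamp 100, *then* write at timestamp 50.
    public static String hbaseVisible() {
        HBaseCell c = new HBaseCell();
        c.delete(100); c.put(50, "late-put");
        return c.get(50); // null: the delete marker masks the later put
    }

    public static String bigtableVisible() {
        BigtableCell c = new BigtableCell();
        c.delete(100); c.put(50, "late-put");
        return c.get(50); // "late-put": the later put is unaffected
    }

    public static void main(String[] args) {
        System.out.println(hbaseVisible());
        System.out.println(bigtableVisible());
    }
}
```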
Gets and scans
- Reverse scans let you read a range of rows backwards. For details, see Reverse scans.
- Unlike HBase, when you send a read request, Bigtable doesn't automatically filter out expired data that is marked for deletion in an upcoming garbage collection cycle. To avoid reading expired data, use a filter in the read request. For more information, see the Garbage collection overview.
- Querying versions of column families within a timestamp range is not supported. You cannot call the following methods:
  - `org.apache.hadoop.hbase.client.Query#setColumnFamilyTimeRange(byte[] cf, long minStamp, long maxStamp)`
  - `org.apache.hadoop.hbase.client.Get#setColumnFamilyTimeRange(byte[] cf, long minStamp, long maxStamp)`
  - `org.apache.hadoop.hbase.client.Scan#setColumnFamilyTimeRange(byte[] cf, long minStamp, long maxStamp)`
- Limiting the number of values per row per column family is not supported. You cannot call the method `org.apache.hadoop.hbase.client.Scan#setMaxResultsPerColumnFamily(int limit)`.
- Setting the maximum number of cells to return for each call to `next()` is not supported. Calls to the method `org.apache.hadoop.hbase.client.Scan#setBatch(int batch)` are ignored.
- Setting the number of rows for caching is not supported. Calls to the method `org.apache.hadoop.hbase.client.Scan#setCaching(int caching)` are ignored.
Coprocessors
Coprocessors are not supported. You cannot create classes that implement the interface `org.apache.hadoop.hbase.coprocessor`.
Filters
The following table shows which filters are supported. All of these filters are in the package `org.apache.hadoop.hbase.filter`.
| Supported | Supported, with limitations | Not supported |
|---|---|---|
| `ColumnPrefixFilter` | `ColumnCountGetFilter` ¹ | `DependentColumnFilter` |
| `FamilyFilter` | `ColumnPaginationFilter` ¹ | `FirstKeyValueMatchingQualifiersFilter` |
| `FilterList` | `ColumnRangeFilter` ¹ | `InclusiveStopFilter` |
| `FuzzyRowFilter` | `FirstKeyOnlyFilter` ¹ | `ParseFilter` |
| `MultipleColumnPrefixFilter` | `KeyOnlyFilter` ² | `SkipFilter` |
| `MultiRowRangeFilter` | `PageFilter` ⁵ | `WhileMatchFilter` |
| `PrefixFilter` ⁶ | `QualifierFilter` ³ | |
| `RandomRowFilter` | `RowFilter` ¹ ⁴ | |
| `TimestampsFilter` | `SingleColumnValueExcludeFilter` ¹ ⁴ ⁷ | |
| | `SingleColumnValueFilter` ⁴ ⁷ | |
| | `ValueFilter` ⁴ | |
In addition, the following differences affect Bigtable filters:
- In filters that use the regular expression comparator (`org.apache.hadoop.hbase.filter.RegexStringComparator`), regular expressions use RE2 syntax, not Java syntax.
- Custom filters are not supported. You cannot create classes that inherit from `org.apache.hadoop.hbase.filter.Filter`.
- There is a size limit of 20 KB on filter expressions. As a workaround to reduce the size of a filter expression, use a supplementary column that stores the hash value of the filter criteria.
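The RE2-versus-Java distinction matters mainly for constructs RE2 deliberately omits, such as backreferences and lookarounds. The sketch below demonstrates only the Java side with `java.util.regex`; the same patterns would be rejected when compiled as RE2 (for example, by `RegexStringComparator` in RE2 mode), which is an assumption you should verify against your client version:

```java
import java.util.regex.Pattern;

// Java regex syntax supports backreferences and lookarounds; RE2 does not.
// This demo only shows the Java behavior; under RE2, compiling these first
// two patterns would fail.
public class RegexSyntaxDemo {
    public static void main(String[] args) {
        // Backreference: matches a doubled letter. Valid in Java, rejected by RE2.
        System.out.println(Pattern.matches("(a)\\1", "aa")); // true
        // Lookahead: valid in Java, rejected by RE2.
        System.out.println(Pattern.matches("(?=a).*", "abc")); // true
        // Plain patterns like this behave the same in both syntaxes.
        System.out.println(Pattern.matches("row-[0-9]+", "row-42")); // true
    }
}
```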
Timestamps
Bigtable stores timestamps in microseconds, while HBase stores timestamps in milliseconds. This distinction has implications when you use the HBase client library for Bigtable and you have data with reversed timestamps.
The client library converts between microseconds and milliseconds, but because the largest HBase timestamp that Bigtable can store is `Long.MAX_VALUE/1000`, any value larger than that is converted to `Long.MAX_VALUE/1000`. As a result, large reversed timestamp values might not convert correctly.
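The arithmetic can be sketched with a hypothetical conversion helper (this is not the client library's actual code, only an illustration of the cap described above):

```java
// Sketch of the millisecond-to-microsecond conversion described above.
// Bigtable stores microseconds, so the largest HBase millisecond timestamp
// it can represent is Long.MAX_VALUE / 1000.
public class TimestampConversion {
    static final long MAX_HBASE_MILLIS = Long.MAX_VALUE / 1000;

    // Hypothetical helper: convert an HBase millisecond timestamp to
    // Bigtable microseconds, capping values at the representable maximum.
    public static long toBigtableMicros(long hbaseMillis) {
        return Math.min(hbaseMillis, MAX_HBASE_MILLIS) * 1000;
    }

    public static void main(String[] args) {
        long now = 1_700_000_000_000L;        // an ordinary millisecond timestamp
        long reversed = Long.MAX_VALUE - now; // a common "reversed timestamp" scheme
        System.out.println(toBigtableMicros(now));       // converts exactly
        System.out.println(reversed > MAX_HBASE_MILLIS); // true: would be capped, losing ordering
    }
}
```

Because every reversed timestamp above the cap collapses to the same stored value, the relative ordering of such rows is lost after conversion.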
Administration
This section describes methods in the interface `org.apache.hadoop.hbase.client.Admin` that are not available on Bigtable, or that behave differently on Bigtable than on HBase. These lists are not exhaustive, and they might not reflect the most recently added HBase API methods.
Most of these methods are unnecessary on Bigtable, because management tasks are handled automatically. A few methods are not available because they relate to features that Bigtable does not support.
General maintenance tasks
Bigtable handles most maintenance tasks automatically. As a result, the following methods are not available:
abort(String why, Throwable e)
balancer()
enableCatalogJanitor(boolean enable)
getMasterInfoPort()
getOperationTimeout()
isCatalogJanitorEnabled()
rollWALWriter(ServerName serverName)
runCatalogScan()
setBalancerRunning(boolean on, boolean synchronous)
shutdown()
stopMaster()
updateConfiguration()
updateConfiguration(ServerName serverName)
Locality groups
Bigtable does not allow you to specify locality groups for column families. As a result, you cannot call HBase methods that return a locality group.
Namespaces
Bigtable does not use namespaces. You can use row key prefixes to simulate namespaces. The following methods are not available:
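A row key prefix can stand in for a namespace. The helper and separator below are arbitrary choices for illustration, not a Bigtable convention:

```java
// Hypothetical helper showing the row-key-prefix approach to simulating
// namespaces. The '#' separator is an arbitrary choice; pick a character
// that cannot appear in your namespace names.
public class NamespacedRowKey {
    public static String rowKey(String namespace, String key) {
        return namespace + "#" + key;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("analytics", "user-42")); // analytics#user-42
    }
}
```

A prefix scan over `analytics#` then plays the role of listing a namespace's rows.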
createNamespace(NamespaceDescriptor descriptor)
deleteNamespace(String name)
getNamespaceDescriptor(String name)
listNamespaceDescriptors()
listTableDescriptorsByNamespace(String name)
listTableNamesByNamespace(String name)
modifyNamespace(NamespaceDescriptor descriptor)
Region management
Bigtable uses tablets, which are similar to regions. Bigtable manages your tablets automatically. As a result, the following methods are not available:
assign(byte[] regionName)
closeRegion(byte[] regionname, String serverName)
closeRegion(ServerName sn, HRegionInfo hri)
closeRegion(String regionname, String serverName)
closeRegionWithEncodedRegionName(String encodedRegionName, String serverName)
compactRegion(byte[] regionName)
compactRegion(byte[] regionName, byte[] columnFamily)
compactRegionServer(ServerName sn, boolean major)
flushRegion(byte[] regionName)
getAlterStatus(byte[] tableName)
getAlterStatus(TableName tableName)
getCompactionStateForRegion(byte[] regionName)
getOnlineRegions(ServerName sn)
majorCompactRegion(byte[] regionName)
majorCompactRegion(byte[] regionName, byte[] columnFamily)
mergeRegions(byte[] encodedNameOfRegionA, byte[] encodedNameOfRegionB, boolean forcible)
move(byte[] encodedRegionName, byte[] destServerName)
offline(byte[] regionName)
splitRegion(byte[] regionName)
splitRegion(byte[] regionName, byte[] splitPoint)
stopRegionServer(String hostnamePort)
unassign(byte[] regionName, boolean force)
Snapshots
The following methods are not available:
deleteSnapshots(Pattern pattern)
deleteSnapshots(String regex)
isSnapshotFinished(HBaseProtos.SnapshotDescription snapshot)
restoreSnapshot(byte[] snapshotName)
restoreSnapshot(String snapshotName)
restoreSnapshot(byte[] snapshotName, boolean takeFailSafeSnapshot)
restoreSnapshot(String snapshotName, boolean takeFailSafeSnapshot)
snapshot(HBaseProtos.SnapshotDescription snapshot)
Table management
Tasks such as table compaction are handled automatically. As a result, the following methods are not available:
compact(TableName tableName)
compact(TableName tableName, byte[] columnFamily)
flush(TableName tableName)
getCompactionState(TableName tableName)
majorCompact(TableName tableName)
majorCompact(TableName tableName, byte[] columnFamily)
modifyTable(TableName tableName, HTableDescriptor htd)
split(TableName tableName)
split(TableName tableName, byte[] splitPoint)
Coprocessors
Bigtable does not support coprocessors. As a result, the following methods are not available:
coprocessorService()
coprocessorService(ServerName serverName)
getMasterCoprocessors()
Distributed procedures
Bigtable does not support distributed procedures. As a result, the following methods are not available:
execProcedure(String signature, String instance, Map<String, String> props)
execProcedureWithRet(String signature, String instance, Map<String, String> props)
isProcedureFinished(String signature, String instance, Map<String, String> props)