Transactions with ACID semantics is supported by Apache Hive metastores in Dataproc Metastore. For more information, see Hive Transactions. These transactions are enabled by default on Hive 3.
Configurations
You must set server and client side configurations in order to enable transaction support.
Server side configurations
The following server side configurations are set by default during the creation of the service by Dataproc Metastore. You can choose to override these by entering Key and Value overrides under Metastore config overrides.
metastore.compactor.initiator.on
— Whether to run the initiator and cleaner threads on the Dataproc Metastore service.Set to
true
to enable the initiator.metastore.compactor.worker.threads
— The number of compactor worker threads to run on the Dataproc Metastore.Set to a positive number to enable the compactor. Setting this to a higher number may affect the performance of the service, especially if you're on Developer tier. If this number needs to be tweaked, we recommend using a lower value, such as 8.
hive.metastore.event.db.notification.api.auth
— Whether the Dataproc Metastore service should authorize against database notification related APIs.Set to
false
. If set totrue
, then only the superusers in proxy settings have permission. See Metastore notification API security for more information on superuser proxy privilege.
Client side configurations
Client side configurations are set in the Hive client as described in Validate transactions.
hive.support.concurrency
— Set totrue
to support insert, update, and delete transactions.hive.exec.dynamic.partition.mode
— In strict mode, you must specify at least one static partition in case all partitions are accidentally overwritten. In nonstrict mode, all partitions are allowed to be dynamic.Set to
nonstrict
to support insert, update, and delete transactions.hive.txn.manager
— Set toorg.apache.hadoop.hive.ql.lockmgr.DbTxnManager
.
Validate transactions
You can validate Hive transactions using a Dataproc cluster that uses a Dataproc Metastore service on Hive 3.
You must create the Dataproc cluster in the same project as the
Dataproc Metastore service and with Hive 3. The Dataproc
2.0 images, 2.0-ubuntu18 and 2.0-debian10, support Hive 3 and transactions. You
can use the flag --image-version
to set the 2.0 image. For example:
gcloud dataproc clusters create DATAPROC_CLUSTER_ID \
--dataproc-metastore=projects/PROJECT_ID/locations/LOCATION/services/SERVICE \
--region=REGION \
--image-version 2.0-debian10
The following instructions demonstrate how to validate transactions in your Dataproc Metastore service that is used by a Dataproc cluster.
SSH into the Dataproc cluster. You can do this from either a browser or from the command line.
Run the command
hive
to open the Hive client:$> hive
Set up the client side configurations to enable Hive ACID support for transactions in the hive client session:
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; SET hive.support.concurrency=true; SET hive.exec.dynamic.partition.mode=nonstrict;
Create a transactional table to insert and update into. The following is an example.
Create a transaction table:
create table student (id int, name string, age int) STORED AS ORC TBLPROPERTIES ('transactional' = 'true');
Check if the table is transactional:
describe formatted <tableName>;
A list of the table properties are printed. A transactional table has
transactional=true
in its table parameters.Insert data into the table:
INSERT INTO student VALUES (1, 'Alice', 10), (2, 'Bob', 10), (3, 'Charlie', 10);
- Observe the delta folder created under the
student
directory in the warehouse directory of the service. Multiple delta folders are created if you run multiple insert or update statements.
- Observe the delta folder created under the
View which compactions are running and their statuses. Hive metastore runs a thread called initiator every five minutes to check for tables which are due for compaction and requests compaction for those tables.
show compactions;
To start a manual compaction (either minor or major):
ALTER TABLE student COMPACT 'minor'; ALTER TABLE student COMPACT 'major';