Use Apache Hive with Dataproc Metastore

This page shows you an example of using Apache Hive with a Dataproc Metastore service. In this example, you launch a Hive session on a Dataproc cluster and run some sample commands to create a database and table.

Before you begin

Connect to Apache Hive

To start using Hive, SSH into the Dataproc cluster that's associated with your Dataproc Metastore service. After you SSH into the cluster, you can run Hive commands to manage your metadata.

To SSH into the cluster:

  1. In the Google Cloud console, go to the VM Instances page.
  2. In the list of virtual machine instances, click SSH in the row of your Dataproc cluster's master node VM, for example cluster-1-m.

A browser window opens with an SSH session in your home directory on the node, showing output similar to the following:

Connected, host fingerprint: ssh-rsa ...
Linux cluster-1-m 3.16.0-0.bpo.4-amd64 ...
...
example-cluster@cluster-1-m:~$
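If you work from a terminal rather than the console, you can open the same SSH session with `gcloud compute ssh`. The following is a minimal sketch, assuming a standard (single-master) cluster whose master VM is the cluster name followed by `-m`, like the `cluster-1-m` host above; high-availability clusters name their master VMs differently. The helper names `master_node_name` and `ssh_to_master`, and the example cluster name and zone, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: master_node_name and ssh_to_master are hypothetical helpers,
# not part of the gcloud CLI.

# Print the master node VM name of a standard (single-master) Dataproc
# cluster: the cluster name followed by "-m".
# Assumption: this naming does not apply to high-availability clusters.
master_node_name() {
  local cluster_name="$1"
  printf '%s-m\n' "$cluster_name"
}

# Open an interactive SSH session on the master node.
# Requires the gcloud CLI, authenticated and configured for the cluster's project.
ssh_to_master() {
  local cluster_name="$1"
  local zone="$2"
  gcloud compute ssh "$(master_node_name "$cluster_name")" --zone="$zone"
}
```

For example, `ssh_to_master cluster-1 us-central1-a` would connect to `cluster-1-m`, assuming the cluster runs in the `us-central1-a` zone.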

To start Hive and create a database and table, run the following commands in the SSH session:

  1. Start Hive.

    hive
    
  2. Create a database called myDatabase.

    create database myDatabase;
    
  3. List the databases to confirm that myDatabase was created.

    show databases;
    
  4. Use the database you created.

    use myDatabase;
    
  5. Create a table called myTable.

    create table myTable (id int, name string);
    
  6. List the tables under myDatabase.

    show tables;
    
  7. Describe the columns of the table you created.

    desc myTable;
    

Running these commands produces output similar to the following:

$ hive

hive> create database myDatabase;
OK
hive> show databases;
OK
default
mydatabase
hive> use myDatabase;
OK
hive> create table myTable (id int, name string);
OK
hive> show tables;
OK
mytable
hive> desc myTable;
OK
id                      int
name                    string
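You can also run the same statements non-interactively, which is convenient if you want to repeat the setup. The following is a minimal sketch, assuming it runs on the cluster's master node where the `hive` command is on the PATH: it writes the statements to a script file and runs them with `hive -f`. The `SCRIPT_FILE` variable and the `/tmp/mydatabase_setup.hql` path are hypothetical, and `if not exists` is added so that rerunning the script doesn't fail on a database or table that already exists.

```shell
#!/usr/bin/env bash
# Sketch: run the sample Hive statements from a script file instead of typing
# them at the hive> prompt. SCRIPT_FILE is a hypothetical path; override as needed.
set -euo pipefail

SCRIPT_FILE="${SCRIPT_FILE:-/tmp/mydatabase_setup.hql}"

# Write the HiveQL statements. "if not exists" lets the script be rerun safely.
cat > "$SCRIPT_FILE" <<'EOF'
create database if not exists myDatabase;
show databases;
use myDatabase;
create table if not exists myTable (id int, name string);
show tables;
desc myTable;
EOF

# Run the script with the Hive CLI when it is available (as on a Dataproc
# master node); otherwise only report where the script was written.
if command -v hive >/dev/null 2>&1; then
  hive -f "$SCRIPT_FILE"
else
  echo "hive not found on PATH; statements written to $SCRIPT_FILE" >&2
fi
```

Because the statements are plain HiveQL, the same file can be kept alongside other setup scripts and rerun whenever the metadata needs to be recreated.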

What's next