Auxiliary versions

Dataproc Metastore auxiliary versions are Hive metastore services that are attached to a primary Dataproc Metastore service. An auxiliary version must be earlier than the primary Dataproc Metastore version. Auxiliary versions provide Hive metastore version compatibility with data processing engines that use different versions of the Hive metastore client library.

When to use auxiliary versions

Auxiliary versions allow you to expose additional endpoints that use the Hive metastore wire protocol for Hive versions that are earlier than that of the metadata database schema in your Dataproc Metastore services. This gives you the ability to share metadata across data processing engines that use different versions of the Hive metastore client library.

How auxiliary versions work

You can specify additional Hive metastore versions for a Dataproc Metastore service. Each additional version is exposed through its own endpoint, and all endpoints share a common metadata database.

The metadata database schema version matches the Dataproc Metastore service's primary Hive metastore version. All auxiliary versions must be earlier than the primary version to avoid forward compatibility issues with the metadata schema.

Only one auxiliary version is supported per Dataproc Metastore service.

You can specify auxiliary versions when you create or update Dataproc Metastore services. When updating a service, you can add or delete the auxiliary version.
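The two rules above (only one auxiliary version per service, and it must be earlier than the primary version) can be checked locally before you call gcloud or the API. The following is a minimal sketch, not part of any Google tool: `check_aux_versions` is a hypothetical helper, it assumes GNU `sort -V` for version ordering, and `2.3.6` and `3.1.2` are only example version strings.

```shell
# Hypothetical pre-flight check (not a gcloud feature).
# $1: comma-separated auxiliary version list, $2: primary Hive metastore version.
# Succeeds only if the list holds exactly one version that is strictly earlier
# than the primary version. Version ordering relies on GNU `sort -V`.
check_aux_versions() {
  local list="$1" primary="$2"
  # Reject an empty list or more than one entry.
  case "$list" in
    ''|*,*)
      echo "error: exactly one auxiliary version is supported" >&2
      return 1 ;;
  esac
  # Equal versions are not "earlier".
  if [ "$list" = "$primary" ]; then
    echo "error: $list is not earlier than $primary" >&2
    return 1
  fi
  # In version order, the lower of the two must be the auxiliary version.
  if [ "$(printf '%s\n%s\n' "$list" "$primary" | sort -V | head -n 1)" != "$list" ]; then
    echo "error: $list is not earlier than $primary" >&2
    return 1
  fi
  return 0
}
```

For example, `check_aux_versions 2.3.6 3.1.2` succeeds, while `check_aux_versions 3.1.2 2.3.6` prints an error and returns a non-zero status.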

Create an auxiliary version with your Dataproc Metastore service

The following instructions demonstrate how to create an auxiliary version with your Dataproc Metastore service.

Console

  1. In the Cloud console, open the Dataproc Metastore page:

    Open Dataproc Metastore in the Cloud console

  2. At the top of the Dataproc Metastore page, click the Create button. The Create service page opens.

  3. Configure your service as desired.

  4. Under Auxiliary version config, enable auxiliary versions.

  5. Click Add Auxiliary Version.

    1. Enter a name for your auxiliary version.

    2. Select the Hive metastore version for your auxiliary version.

    3. Optional: To apply Hive configuration overrides to the auxiliary version, click + Add Overrides.

    4. Click Done.

  6. Click Submit.

gcloud

  1. Run one of the following gcloud beta metastore services create commands
    to create a service with an auxiliary version:

    gcloud beta metastore services create SERVICE \
        --location=LOCATION \
        --auxiliary-versions=AUXILIARY_VERSIONS
    

    or

    gcloud beta metastore services create SERVICE \
        --location=LOCATION \
        --auxiliary-versions-from-file=AUXILIARY_VERSIONS_FROM_FILE
    

    Replace the following:

    • SERVICE: a name for your new service
    • LOCATION: a Google Cloud region
    • AUXILIARY_VERSIONS: a comma-separated list of auxiliary Hive metastore versions to deploy; because only one auxiliary version is supported per service, specify a single version
    • AUXILIARY_VERSIONS_FROM_FILE: a path to a YAML file containing the auxiliary versions configuration; for more information and an example, see the SDK documentation
  2. Verify that the creation was successful.
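If you script service creation, it can help to review the exact command before running it. The following minimal sketch wraps the first create command above; `create_service_with_aux` and the `DRY_RUN` variable are hypothetical names introduced here, and by default the function only prints the command.

```shell
# Hypothetical wrapper around `gcloud beta metastore services create`.
# $1: service name, $2: region, $3: auxiliary version (for example 2.3.6).
# DRY_RUN defaults to 1, which prints the command instead of running it;
# set DRY_RUN=0 to execute it.
create_service_with_aux() {
  local service="$1" location="$2" aux_versions="$3"
  # Rebuild the positional parameters as the full command line.
  set -- gcloud beta metastore services create "$service" \
    "--location=$location" \
    "--auxiliary-versions=$aux_versions"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

After a real run (`DRY_RUN=0`), you can check the result with `gcloud beta metastore services describe SERVICE --location=LOCATION`.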

curl

To add an auxiliary version when you create a Dataproc Metastore service, call the create method:

   ```
     curl -X POST -s -i \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -d '{"network":"projects/PROJECT_ID/global/networks/default", "port": 9083, "hive_metastore_config": {"auxiliary_versions": {"aux-version1": {"version": "AUX_VERSION"} } } }' \
     -H "Content-Type:application/json" \
     https://metastore.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/services?service_id=SERVICE_ID
   ```
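Typing nested JSON by hand inside single quotes is error-prone. The following minimal sketch builds the same request body from shell variables; `build_create_body` is a hypothetical helper, and it inserts values verbatim without JSON escaping, so it assumes plain identifiers and version strings.

```shell
# Hypothetical helper: prints the create request body used above.
# $1: project ID, $2: auxiliary version name, $3: Hive metastore version.
# Values are inserted verbatim (no JSON escaping).
build_create_body() {
  local project_id="$1" aux_name="$2" aux_version="$3"
  printf '{"network":"projects/%s/global/networks/default","port":9083,"hive_metastore_config":{"auxiliary_versions":{"%s":{"version":"%s"}}}}\n' \
    "$project_id" "$aux_name" "$aux_version"
}
```

You can then pass the output to curl, for example `-d "$(build_create_body PROJECT_ID aux-version1 AUX_VERSION)"`, in place of the literal JSON.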

Update an auxiliary version with your Dataproc Metastore service

The following instructions demonstrate how to update the auxiliary version of an existing Dataproc Metastore service.

Console

  1. In the Cloud console, open the Dataproc Metastore page:

    Open Dataproc Metastore in the Cloud console

  2. On the Dataproc Metastore page, click the service name of the service you'd like to update. The Service detail page for that service opens.

  3. Under the Configuration tab, click the Edit button. The Edit service page opens.

  4. Under Auxiliary version config, enable or disable auxiliary versions.

  5. To delete your auxiliary version, click Delete.

  6. To add a new auxiliary version, click Add Auxiliary Version.

  7. Optional: To apply Hive configuration overrides to the auxiliary version, click + Add Overrides.

  8. Click Submit.

gcloud

  1. Run one of the following gcloud beta metastore services update commands to update the auxiliary version of a service:

    gcloud beta metastore services update SERVICE \
       --location=LOCATION \
       --add-auxiliary-versions=AUXILIARY_VERSIONS
    

    or

    gcloud beta metastore services update SERVICE \
       --location=LOCATION \
       --update-auxiliary-versions-from-file=AUXILIARY_VERSIONS_FROM_FILE
    

    Replace the following:

    • SERVICE: the name of your service
    • LOCATION: a Google Cloud region
    • AUXILIARY_VERSIONS: a comma-separated list of auxiliary Hive metastore versions to deploy; because only one auxiliary version is supported per service, specify a single version
    • AUXILIARY_VERSIONS_FROM_FILE: a path to a YAML file containing the auxiliary versions configuration; for more information and an example, see the SDK documentation
  2. Verify that the update was successful.
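The two update forms above differ only in one flag. The following minimal sketch picks the flag automatically: if the third argument names an existing file it is passed to `--update-auxiliary-versions-from-file`, otherwise it is treated as a version for `--add-auxiliary-versions`. `update_service_aux` and `DRY_RUN` are hypothetical names, and by default the command is only printed.

```shell
# Hypothetical wrapper around `gcloud beta metastore services update`.
# $1: service name, $2: region, $3: a version (for example 2.3.6) or a path
# to a YAML auxiliary versions file.
# DRY_RUN defaults to 1, which prints the command instead of running it.
update_service_aux() {
  local service="$1" location="$2" spec="$3"
  # Choose the flag based on whether $spec is an existing file.
  if [ -f "$spec" ]; then
    set -- "--update-auxiliary-versions-from-file=$spec"
  else
    set -- "--add-auxiliary-versions=$spec"
  fi
  # Prepend the command and common flags to the chosen flag.
  set -- gcloud beta metastore services update "$service" \
    "--location=$location" "$@"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```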

curl

To change the auxiliary version of an existing Dataproc Metastore service, call the patch method:

 ```
   curl -X PATCH -s -i \
   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
   -d '{"hive_metastore_config": {"auxiliary_versions": {"aux-version1": {"version": "AUX_VERSION"} } } }' \
   -H "Content-Type:application/json" \
   https://metastore.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/services/SERVICE_ID?update_mask=hive_metastore_config.auxiliary_versions
 ```
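The patch request needs both a body and a URL whose `update_mask` limits the change to the auxiliary versions field. The following minimal sketch generates both from shell variables; `build_patch_url` and `build_patch_body` are hypothetical helpers, and values are inserted verbatim without escaping, so it assumes plain identifiers and version strings.

```shell
# Hypothetical helper: prints the PATCH URL used above.
# $1: project ID, $2: region, $3: service ID.
build_patch_url() {
  local project_id="$1" location="$2" service_id="$3"
  printf 'https://metastore.googleapis.com/v1beta/projects/%s/locations/%s/services/%s?update_mask=hive_metastore_config.auxiliary_versions\n' \
    "$project_id" "$location" "$service_id"
}

# Hypothetical helper: prints the PATCH request body.
# $1: auxiliary version name, $2: Hive metastore version.
build_patch_body() {
  local aux_name="$1" aux_version="$2"
  printf '{"hive_metastore_config":{"auxiliary_versions":{"%s":{"version":"%s"}}}}\n' \
    "$aux_name" "$aux_version"
}
```

For example, pass `-d "$(build_patch_body aux-version1 AUX_VERSION)"` and `"$(build_patch_url PROJECT_ID LOCATION SERVICE_ID)"` to the curl command instead of the literal values.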

Shared behaviors and properties

The following table lists the various behaviors and properties of a Dataproc Metastore service, and whether they are shared across the service or replicated per Hive metastore version. In general, aspects related to static configuration, the metadata database, and the metadata itself are shared. On the other hand, behaviors that affect the request path for Hive metastore metadata requests are replicated per Hive metastore version.

Behavior Per-Service Per-Version
Endpoint
Hive config overrides
Kerberos config
Endpoint protocol (Thrift/gRPC)
Thrift port
Artifacts Cloud Storage bucket
Tier
Maintenance window
Release channel
Encryption config
Database type
Data Catalog sync toggle
Request count metric

Attach a Dataproc cluster

To use an auxiliary version as the Hive metastore for a Dataproc cluster, attach the cluster using the auxiliary version's endpoint URI and warehouse directory.

For more information on how to attach a Dataproc cluster, see Attach a Dataproc cluster using the ENDPOINT_URI and WAREHOUSE_DIR.
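As a rough illustration, the following minimal sketch assumes the cluster is attached by setting the `hive:hive.metastore.uris` and `hive:hive.metastore.warehouse.dir` cluster properties at creation time; check the linked page for the exact procedure and any additional flags it requires. `create_cluster_with_aux` and `DRY_RUN` are hypothetical names, and by default the command is only printed.

```shell
# Hypothetical wrapper around `gcloud dataproc clusters create`.
# Assumption: the cluster reaches the auxiliary version through Hive
# properties set at creation time.
# $1: cluster name, $2: region, $3: auxiliary version endpoint URI,
# $4: warehouse directory.
# DRY_RUN defaults to 1, which prints the command instead of running it.
create_cluster_with_aux() {
  local cluster="$1" region="$2" endpoint_uri="$3" warehouse_dir="$4"
  set -- gcloud dataproc clusters create "$cluster" \
    "--region=$region" \
    "--properties=hive:hive.metastore.uris=$endpoint_uri,hive:hive.metastore.warehouse.dir=$warehouse_dir"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Use the auxiliary version's own endpoint URI here, not the primary service's endpoint, because each version is exposed through a separate endpoint.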

Auxiliary versions caveats

Auxiliary versions have the following caveats:

  • The auxiliary version must be an earlier version than the primary version.

  • Only one auxiliary version is supported per Dataproc Metastore service.

  • Auxiliary versions are not supported for Private Service Connect configuration.

  • The Spanner database type is not supported with auxiliary versions.

  • Import, export, backup, and restore don't apply to auxiliary versions.

  • Depending on the Hive versions, some Hive metastore methods in the auxiliary version may not work with the primary version.

  • The auxiliary version's logs are separate from the primary version's logs. You can use Cloud Logging to debug Hive metastore issues.

What's next