Auxiliary versions

This page explains how to use the auxiliary versions feature with Dataproc Metastore.

Auxiliary versions let you connect two different versions of a Hive metastore to a single Dataproc Metastore service. This configuration allows you to support multiple data processing engines that need to run on different Hive metastore versions.

For example, using auxiliary versions, you can connect multiple Dataproc clusters to the same Dataproc Metastore service. In this configuration, one cluster could run Dataproc version 2.0 while the other runs Dataproc version 1.5. The Dataproc 2.0 cluster could connect to an endpoint that exposes Hive version 3.1.2, while the Dataproc 1.5 cluster connects to an endpoint that exposes Hive version 2.3.6.

How auxiliary versions work

When you enable auxiliary versions, Dataproc Metastore exposes a separate endpoint for each Hive metastore version. However, both endpoints continue to share the same metadata database.

Note that this feature doesn't provide you with a way to use different sets of metadata with a single Dataproc Metastore service. Instead, it offers a way for you to extend and enhance compatibility between your services.
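
As a rough illustration of the routing this enables, the following sketch picks a metastore endpoint for a cluster based on its Dataproc image version. It covers only the two pairings from the example above (Dataproc 2.0 with Hive 3.1.2, Dataproc 1.5 with Hive 2.3.6). The function name and both endpoint URIs are hypothetical placeholders, not values that Dataproc Metastore returns.

```shell
# Hypothetical endpoint URIs; substitute the ones reported for your service.
PRIMARY_ENDPOINT="thrift://primary.example.internal:9083"      # exposes Hive 3.1.2
AUXILIARY_ENDPOINT="thrift://auxiliary.example.internal:9083"  # exposes Hive 2.3.6

# Print the endpoint a cluster should use, given its Dataproc image version.
# Both endpoints serve the same metadata database; only the Hive version differs.
endpoint_for_image() {
  case "$1" in
    2.0*) printf '%s\n' "$PRIMARY_ENDPOINT" ;;
    1.5*) printf '%s\n' "$AUXILIARY_ENDPOINT" ;;
    *)    echo "no endpoint mapped for image version: $1" >&2; return 1 ;;
  esac
}
```

For example, `endpoint_for_image 1.5` prints the auxiliary endpoint, and any unmapped image version returns a nonzero status.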

Considerations

  • You can only create one auxiliary version for each Dataproc Metastore service.

  • The auxiliary version must be configured to use a lower Hive metastore version than the primary version.

  • The auxiliary version doesn't support metadata-related features (import, export, backup, and restore). You can use these features only with the primary version, because both versions share the same backend metadata.

  • Some Hive methods might not be compatible between the auxiliary version and the primary version. This compatibility depends on what versions of Hive you're using for your primary and auxiliary versions and the methods that are compatible between the Hive versions.

  • The auxiliary version maintains a separate log file from the primary version. To debug Hive metastore issues, you can use Cloud Logging.
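
The first two considerations can be checked locally before you send a request. The following is a minimal sketch, assuming versions are plain dotted numbers such as 3.1.2. The helper names `version_lt` and `check_auxiliary_versions` are hypothetical and aren't part of any Google Cloud tool.

```shell
# Return success (0) when dotted version $1 is strictly lower than $2.
version_lt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      # Missing components count as 0, so 2.3 compares like 2.3.0.
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi < yi) exit 0
      if (xi > yi) exit 1
    }
    exit 1  # equal versions are not "lower"
  }'
}

# Check a proposed configuration.
# Arguments: primary version, followed by the auxiliary versions to deploy.
check_auxiliary_versions() {
  primary="$1"; shift
  if [ "$#" -ne 1 ]; then
    echo "expected exactly one auxiliary version, got $#" >&2; return 1
  fi
  if ! version_lt "$1" "$primary"; then
    echo "auxiliary version $1 must be lower than primary $primary" >&2; return 1
  fi
}
```

For example, `check_auxiliary_versions 3.1.2 2.3.6` succeeds, while swapping the two versions or passing two auxiliary versions fails with a message on standard error.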

Shared properties between versions

When you create an auxiliary version, some properties are shared and remain common between the auxiliary version and the primary version. Other properties aren't shared and are separate between both versions.

The following table lists these differences.

Properties Common Separate
Endpoint
Hive config overrides*
Kerberos config
Endpoint protocol (Thrift/gRPC)
Thrift port
Artifacts Cloud Storage bucket
Tier
Maintenance window
Release channel
Encryption config
Database type
Data Catalog sync toggle
Request count metric

* The Hive config overrides remain separate between the auxiliary version and the primary version. However, the auxiliary version references a merged list of the overrides (primary+auxiliary). In this case, the auxiliary config takes precedence over the primary config.
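
The merge rule in the footnote can be sketched as follows. This only models the precedence described above (auxiliary over primary) on lists of `key=value` lines. It isn't how the service implements the merge, and `merge_overrides` is a hypothetical helper name.

```shell
# Merge two newline-separated lists of key=value overrides.
# Later entries win, so pass the primary list first and the auxiliary list second.
merge_overrides() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    index($0, "=") > 0 {
      key = substr($0, 1, index($0, "=") - 1)
      value = substr($0, index($0, "=") + 1)
      if (!(key in merged)) order[++n] = key   # remember first-seen order
      merged[key] = value                      # a later value replaces an earlier one
    }
    END { for (i = 1; i <= n; i++) print order[i] "=" merged[order[i]] }
  '
}
```

If the primary list sets `hive.metastore.client.socket.timeout=600s` and the auxiliary list sets the same key to `300s`, the merged list that the auxiliary version sees contains `300s`, while keys set only on the primary version carry over unchanged.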

Before you begin

Required Roles

To get the permission that you need to create a Dataproc Metastore that uses auxiliary versions, ask your administrator to grant you the following IAM roles on your project, based on the principle of least privilege:

  • For full control of Dataproc Metastore resources: roles/metastore.editor
  • For full access to all Dataproc Metastore resources, including IAM policy administration: roles/metastore.admin

For more information about granting roles, see Manage access.

These predefined roles contain the metastore.services.create permission, which is required to create a Dataproc Metastore that uses auxiliary versions. You might also be able to get this permission with custom roles or other predefined roles.

For more information about specific Dataproc Metastore roles and permissions, see Manage Dataproc access with IAM.
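
An administrator could grant one of these roles with `gcloud projects add-iam-policy-binding`. The following sketch wraps that command. The function name is hypothetical, the project ID and email address are values you supply, and it assumes the principal is a user account rather than a group or service account.

```shell
# Grant a Dataproc Metastore role on a project to a user account.
# Arguments: PROJECT_ID, user email, role (defaults to roles/metastore.editor).
grant_metastore_role() {
  gcloud projects add-iam-policy-binding "$1" \
    --member="user:$2" \
    --role="${3:-roles/metastore.editor}"
}
```

For example, `grant_metastore_role PROJECT_ID USER_EMAIL roles/metastore.admin` grants the admin role instead of the default editor role.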

Create an auxiliary version for a new service

The following example shows an abbreviated version of the steps to enable auxiliary versions. For complete step-by-step instructions, see Create a Dataproc Metastore.

Console

  1. In the Google Cloud console, open the Dataproc Metastore page:

    Open Dataproc Metastore

  2. At the top of the Dataproc Metastore page, click the Create button.

    The Create service page opens.

  3. Under Auxiliary version config, enable auxiliary versions.

  4. Click Add Auxiliary Version.

    1. Enter a name for your auxiliary version.

    2. Select a version for your auxiliary version.

    3. Optional: To apply a mapping to the auxiliary version, click + Add Overrides.

    4. Click Done.

  5. Choose the remaining configurations for your service, as needed.

  6. Click Submit.

gcloud CLI

  1. To create a Dataproc Metastore service with an auxiliary version, run one of the following gcloud beta metastore services create commands:

    gcloud beta metastore services create SERVICE \
        --location=LOCATION \
        --auxiliary-versions=AUXILIARY_VERSIONS, ...
    

    or

    gcloud beta metastore services create SERVICE \
        --location=LOCATION \
        --auxiliary-versions-from-file=AUXILIARY_VERSIONS_FROM_FILE
    

    Replace the following:

    • SERVICE: the name of your new Dataproc Metastore service.
    • LOCATION: the region you want to create your Dataproc Metastore service in.
    • AUXILIARY_VERSIONS: a comma-separated list of the Hive metastore versions to deploy as auxiliary versions. Currently, only one auxiliary version is supported. Use the following format: "2.3.6".
    • AUXILIARY_VERSIONS_FROM_FILE: a path to a YAML file containing the auxiliary versions configuration. For more information and an example, see the SDK documentation.
  2. Verify that the creation was successful.
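
For the file-based variant, the following sketch writes a one-entry YAML file that you could pass to --auxiliary-versions-from-file. The layout (a list of entries with `name` and `version` keys) is an assumption about the expected structure, so confirm it against the SDK documentation before relying on it. The function name and the example values are hypothetical.

```shell
# Write a one-entry auxiliary versions file.
# Arguments: output path, auxiliary version name, Hive metastore version.
# Assumed layout: a YAML list whose entries have "name" and "version" keys.
write_aux_versions_file() {
  printf '%s\n' "- name: $2" "  version: $3" > "$1"
}
```

For example, `write_aux_versions_file aux.yaml aux-version1 2.3.6` creates `aux.yaml`, which you can then reference as AUXILIARY_VERSIONS_FROM_FILE.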

curl

To create a Dataproc Metastore service with an auxiliary version, use the create method.

curl -X POST -s -i \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -d '{"network":"projects/PROJECT_ID/global/networks/default", "port": 9083, "hive_metastore_config": {"auxiliary_versions": {"aux-version1": {"version": "AUX_VERSION"} } } }' \
     -H "Content-Type:application/json" \
     "https://metastore.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/services?service_id=SERVICE_ID"

Replace the following:

  • SERVICE_ID: the name of your new Dataproc Metastore service.
  • PROJECT_ID: the ID of the Google Cloud project that you're creating the Dataproc Metastore service in.
  • LOCATION: the region in which your Dataproc Metastore resides.
  • AUX_VERSION: the Hive metastore version to deploy as the auxiliary version, for example 2.3.6. Currently, only one auxiliary version is supported.
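
If you script this request, it can help to build the body and URL from variables instead of editing the JSON by hand. The following sketch reproduces the body and URL from the curl command above. The two helper names are hypothetical, and the helpers assume that their arguments contain no characters that need JSON or URL escaping.

```shell
# Print the JSON body for a create request with one auxiliary version.
# Arguments: PROJECT_ID, auxiliary version name, AUX_VERSION.
# Assumes the arguments need no JSON escaping.
build_create_body() {
  printf '{"network":"projects/%s/global/networks/default","port":9083,"hive_metastore_config":{"auxiliary_versions":{"%s":{"version":"%s"}}}}\n' "$1" "$2" "$3"
}

# Print the create URL.
# Arguments: PROJECT_ID, LOCATION, SERVICE_ID.
build_create_url() {
  printf 'https://metastore.googleapis.com/v1beta/projects/%s/locations/%s/services?service_id=%s\n' "$1" "$2" "$3"
}
```

You can then pass `"$(build_create_body ...)"` to the `-d` option of curl and `"$(build_create_url ...)"` as the final argument, keeping both in double quotes.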

Update an auxiliary version for an existing service

The following instructions show you how to update an existing Dataproc Metastore service with auxiliary versions.

When running an update operation, you can complete the following tasks:

  • Add a new auxiliary version.
  • Delete an existing auxiliary version.
  • Add or modify overrides of an existing auxiliary version.

Console

  1. In the Google Cloud console, open the Dataproc Metastore page:

    Open Dataproc Metastore

  2. On the Dataproc Metastore page, click the service name of the service you want to update.

    The Service detail page opens.

  3. On the Configuration tab, click Edit.

    The Edit service page opens.

  4. In the Auxiliary version config section, click the toggle to enable or disable auxiliary versions.

    You can complete the following tasks:

    1. To delete an existing auxiliary version, click Delete.

    2. To add a new auxiliary version, click Add Auxiliary Version.

    3. To apply an override mapping to an auxiliary version, click + Add Overrides.

  5. Click Submit.

gcloud CLI

  1. To update a Dataproc Metastore service with an auxiliary version, run one of the following gcloud beta metastore services update commands:

    gcloud beta metastore services update SERVICE \
       --location=LOCATION \
       --add-auxiliary-versions=AUXILIARY_VERSIONS, ...
    

    or

    gcloud beta metastore services update SERVICE \
       --location=LOCATION \
       --update-auxiliary-versions-from-file=AUXILIARY_VERSIONS_FROM_FILE
    

    Replace the following:

    • SERVICE: the name of your Dataproc Metastore service.
    • LOCATION: the region in which your Dataproc Metastore resides.
    • AUXILIARY_VERSIONS: a comma-separated list of auxiliary Hive metastore versions to deploy.
    • AUXILIARY_VERSIONS_FROM_FILE: a path to a YAML file containing the auxiliary versions configuration; for more information and an example, see the SDK documentation.
  2. Verify that the update was successful.
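
One way to verify the result is to check the service state after the operation finishes. The following sketch assumes that a reported state of ACTIVE indicates success, which this page doesn't spell out. The function name is hypothetical.

```shell
# Succeed only when the service reports the ACTIVE state.
# Arguments: SERVICE, LOCATION.
service_is_active() {
  state="$(gcloud beta metastore services describe "$1" \
    --location="$2" --format='value(state)')" || return 1
  [ "$state" = "ACTIVE" ]
}
```

For example, `service_is_active SERVICE LOCATION && echo "update applied"` prints the message only when the service is active.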

curl

To update a Dataproc Metastore service with an auxiliary version, use the patch method.

curl -X PATCH -s -i \
   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
   -d '{"hive_metastore_config": {"auxiliary_versions": {"aux-version1": {"version": "AUX_VERSION"} } } }' \
   -H "Content-Type:application/json" \
   "https://metastore.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/services/SERVICE_ID?update_mask=hive_metastore_config.auxiliary_versions"

Replace the following:

  • SERVICE_ID: the name of your Dataproc Metastore service.
  • PROJECT_ID: the ID of the Google Cloud project that contains your Dataproc Metastore service.
  • LOCATION: the region in which your Dataproc Metastore resides.
  • AUX_VERSION: the Hive metastore version to deploy as the auxiliary version, for example 2.3.6.
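
To add or modify an override on the auxiliary version in the same request, the body can carry an overrides map next to the version. The following sketch assumes the field is named `config_overrides`, mirroring the snake_case style of the request above. Confirm the field name against the API reference before using it. The helper name is hypothetical, and it assumes that its arguments need no JSON escaping.

```shell
# Print a PATCH body that sets one auxiliary version with a single config override.
# Arguments: auxiliary version name, AUX_VERSION, override key, override value.
# Assumption: the overrides field is named "config_overrides".
build_patch_body() {
  printf '{"hive_metastore_config":{"auxiliary_versions":{"%s":{"version":"%s","config_overrides":{"%s":"%s"}}}}}\n' "$1" "$2" "$3" "$4"
}
```

Pass the output to the `-d` option of the curl command above in place of the inline JSON, and keep the same update_mask.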

What's next