Dataproc optional Trino component

You can install additional components like Trino when you create a Dataproc cluster using the Optional components feature. This page describes how to install the Trino component on a Dataproc cluster.

Trino is an open source distributed SQL query engine. By default, the Trino server and Web UI are available on port 8060 (or port 7778 if Kerberos is enabled) on the cluster's first master node.

By default, Trino on Dataproc is configured to work with Hive, BigQuery, Memory, TPCH and TPCDS connectors.

After creating a cluster with the Trino component, you can run queries against it.
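As a minimal sketch of one way to run a query from a local terminal, the following assumes the `gcloud dataproc jobs submit trino` command and its `--execute` flag. The cluster name, region, and query are placeholders, and the script only prints the command rather than running it.

```shell
# Placeholders (assumptions): replace with your own cluster name and region.
CLUSTER_NAME="cluster-name"
REGION="region"

# Illustrative query: list the catalogs configured on the cluster.
QUERY="SHOW CATALOGS;"

# Build the job submission command as a string so it can be reviewed first.
SUBMIT_CMD="gcloud dataproc jobs submit trino --cluster=${CLUSTER_NAME} --region=${REGION} --execute=\"${QUERY}\""

# Dry run: print the command. Copy it into a terminal to submit the job.
echo "${SUBMIT_CMD}"
```

Printing the command first makes it easy to check the quoting around the SQL statement before anything is sent to the cluster.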

Install the component

Install the component when you create a Dataproc cluster.

See Supported Dataproc versions for the component version included in each Dataproc image release.

gcloud command

To create a Dataproc cluster that includes the Trino component, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag.

gcloud dataproc clusters create cluster-name \
    --optional-components=TRINO \
    --region=region \
    --enable-component-gateway \
    ... other flags

Configuring properties

Add the --properties flag to the gcloud dataproc clusters create command to set trino, trino-jvm and trino-catalog config properties.

  • Application properties: Use cluster properties with the trino: prefix to configure Trino application properties—for example, --properties="trino:join-distribution-type=AUTOMATIC".
  • JVM configuration properties: Use cluster properties with the trino-jvm: prefix to configure JVM properties for Trino coordinator and worker Java processes—for example, --properties="trino-jvm:XX:+HeapDumpOnOutOfMemoryError".
  • Creating new catalogs and adding catalog properties: Use trino-catalog:catalog-name.property-name to configure Trino catalogs.

    Example: The following `--properties` flag can be used with the `gcloud dataproc clusters create` command to create a Trino cluster with a "prodhive" Hive catalog. A prodhive.properties file will be created under /usr/lib/trino/etc/catalog/ to enable the prodhive catalog.

    --properties="trino-catalog:prodhive.connector.name=hive,trino-catalog:prodhive.hive.metastore.uri=localhost:9000"
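The three prefixes above can be combined in one comma-separated `--properties` value. The following is a sketch only, reusing the example property values from this page. The cluster name and region are placeholders, and the script prints the resulting command instead of creating a cluster.

```shell
# Placeholders (assumptions): replace with your own cluster name and region.
CLUSTER_NAME="cluster-name"
REGION="region"

# Application property (trino: prefix).
PROPERTIES="trino:join-distribution-type=AUTOMATIC"
# JVM property (trino-jvm: prefix).
PROPERTIES="${PROPERTIES},trino-jvm:XX:+HeapDumpOnOutOfMemoryError"
# Catalog properties (trino-catalog: prefix) for the "prodhive" catalog.
PROPERTIES="${PROPERTIES},trino-catalog:prodhive.connector.name=hive"
PROPERTIES="${PROPERTIES},trino-catalog:prodhive.hive.metastore.uri=localhost:9000"

# Dry run: print the create command with the assembled --properties value.
echo "gcloud dataproc clusters create ${CLUSTER_NAME} \\"
echo "    --optional-components=TRINO \\"
echo "    --region=${REGION} \\"
echo "    --enable-component-gateway \\"
echo "    --properties=\"${PROPERTIES}\""
```

Building the value one entry at a time keeps each property on its own line, which is easier to review than a single long string.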

REST API

The Trino component can be specified through the Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
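As a rough sketch, a clusters.create request body that enables Trino and the component gateway might look like the following. The project ID, region, and cluster name are placeholders, the body is deliberately minimal (a real request will usually carry more cluster configuration), and the script only prints the request rather than sending it.

```shell
# Placeholders (assumptions): replace with your own values.
PROJECT_ID="project-id"
REGION="region"
CLUSTER_NAME="cluster-name"

# Minimal request body: TRINO in softwareConfig.optionalComponents, and
# endpointConfig.enableHttpPortAccess to turn on the component gateway.
REQUEST_BODY=$(cat <<EOF
{
  "projectId": "${PROJECT_ID}",
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["TRINO"]
    },
    "endpointConfig": {
      "enableHttpPortAccess": true
    }
  }
}
EOF
)

# clusters.create endpoint for the chosen project and region.
URL="https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"

# Dry run: show what would be sent. To send it, POST the body to the URL
# with an OAuth 2.0 bearer token (for example, one obtained from
# `gcloud auth print-access-token`).
echo "POST ${URL}"
echo "${REQUEST_BODY}"
```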

Console

    1. Enable the component and component gateway.
      • In the Google Cloud console, open the Dataproc Create a cluster page. The Set up cluster panel is selected.
      • In the Components section: