You can install additional components, such as Presto, when you create a Dataproc cluster by using the Optional components feature. This page describes how to install the Presto component on a Dataproc cluster.
Presto (Trino) is an open source distributed SQL query engine. By default, the Presto server and Web UI are available on port 8060 (or port 7778 if Kerberos is enabled) on the cluster's first master node.
By default, Presto on Dataproc is configured to work with the Hive, BigQuery, Memory, TPCH, and TPCDS connectors.
After creating a cluster with the Presto component, you can run queries:
- from a local terminal with the `gcloud dataproc jobs submit presto` command
- from a terminal window on the cluster's first master node using the `presto` CLI (Command Line Interface); see Use Trino with Dataproc
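As a rough illustration of the first option, the sketch below wraps a `gcloud dataproc jobs submit presto` call in a small function that defaults to a dry run, so it prints the command rather than submitting a job. The cluster name, region, and query are placeholder values that do not come from this page, and the `--execute` flag is assumed to be supported by your installed gcloud version; check `gcloud dataproc jobs submit presto --help` before relying on it.

```shell
# Placeholder values (assumptions): replace with your own cluster and region.
CLUSTER_NAME="example-cluster"
REGION="us-central1"
QUERY="SHOW CATALOGS"

# DRY_RUN defaults to "echo", so the command is printed instead of executed.
# Set DRY_RUN="" in the environment to actually submit the job.
DRY_RUN="${DRY_RUN-echo}"

submit_presto_query() {
  # ${DRY_RUN} is intentionally unquoted so an empty value expands to nothing.
  ${DRY_RUN} gcloud dataproc jobs submit presto \
    --cluster="${CLUSTER_NAME}" \
    --region="${REGION}" \
    --execute="${QUERY}"
}

submit_presto_query
```

Keeping the dry run as the default makes it easy to review the exact flags before a job is sent to the cluster.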
Install the component
Install the component when you create a Dataproc cluster. Components can be added to clusters created with Dataproc version 1.3 and later.
See Supported Dataproc versions for the component version included in each Dataproc image release.
gcloud command
To create a Dataproc cluster that includes the Presto component, use the `gcloud dataproc clusters create cluster-name` command with the `--optional-components` flag.

    gcloud dataproc clusters create cluster-name \
        --optional-components=PRESTO \
        --region=region \
        --enable-component-gateway \
        ... other flags
Configuring properties
Add the `--properties` flag to the `gcloud dataproc clusters create` command to set presto, presto-jvm, and presto-catalog configuration properties.
- Application properties: Use cluster properties with the `presto:` prefix to configure Presto application properties, for example, `--properties="presto:join-distribution-type=AUTOMATIC"`.
- JVM configuration properties: Use cluster properties with the `presto-jvm:` prefix to configure JVM properties for the Presto coordinator and worker Java processes, for example, `--properties="presto-jvm:XX:+HeapDumpOnOutOfMemoryError"`.
- Creating new catalogs and adding catalog properties: Use `presto-catalog:catalog-name.property-name` to configure Presto catalogs.
  Example: The following `properties` flag can be used with the `gcloud dataproc clusters create` command to create a Presto cluster with a "prodhive" Hive catalog. A `prodhive.properties` file is created under `/usr/lib/presto/etc/catalog/` to enable the prodhive catalog.

      --properties="presto-catalog:prodhive.connector.name=hive-hadoop2,presto-catalog:prodhive.hive.metastore.uri=thrift://localhost:9083"
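Several prefixed properties can be passed in one `--properties` value, separated by commas. The sketch below is one way to keep such a list readable: it stores one property per line, joins the lines with `paste`, and prints the resulting create command as a dry run. The cluster name and region are placeholders, and the property values are the ones used as examples in the list above; this illustrates the flag format and is not a recommended configuration.

```shell
# Placeholder values (assumptions): replace with your own cluster and region.
CLUSTER_NAME="example-cluster"
REGION="us-central1"

# One property per line; the values are taken from the examples in this section.
PROPS="presto:join-distribution-type=AUTOMATIC
presto-catalog:prodhive.connector.name=hive-hadoop2
presto-catalog:prodhive.hive.metastore.uri=thrift://localhost:9083"

# Join the lines with commas to form a single --properties value.
PROPERTIES=$(printf '%s\n' "$PROPS" | paste -sd, -)

# DRY_RUN defaults to "echo", so the command is printed instead of executed.
# Set DRY_RUN="" in the environment to actually create the cluster.
DRY_RUN="${DRY_RUN-echo}"

${DRY_RUN} gcloud dataproc clusters create "${CLUSTER_NAME}" \
  --optional-components=PRESTO \
  --region="${REGION}" \
  --enable-component-gateway \
  --properties="${PROPERTIES}"
```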
REST API
The Presto component can be specified through the Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
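A minimal sketch of such a request is shown below. It assumes the v1 `projects.regions.clusters.create` endpoint, with the component listed under `config.softwareConfig.optionalComponents` and the component gateway enabled through `config.endpointConfig.enableHttpPortAccess`. The project ID, region, and cluster name are placeholders, and a real request normally needs further cluster configuration. The script only prints the URL and body; the `curl` call is left commented out.

```shell
# Placeholder values (assumptions): replace with your own project, region, and cluster.
PROJECT_ID="example-project"
REGION="us-central1"
CLUSTER_NAME="example-cluster"

# Request body: PRESTO as an optional component, component gateway enabled.
REQUEST_BODY=$(cat <<EOF
{
  "projectId": "${PROJECT_ID}",
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["PRESTO"]
    },
    "endpointConfig": {
      "enableHttpPortAccess": true
    }
  }
}
EOF
)

URL="https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"

# Dry run: show what would be sent.
printf 'POST %s\n%s\n' "$URL" "$REQUEST_BODY"

# To send the request (requires an authenticated gcloud installation):
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "$REQUEST_BODY" "$URL"
```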
Console
- Enable the component and component gateway.
  - In the Google Cloud console, open the Dataproc Create a cluster page. The Set up cluster panel is selected.
  - In the Components section:
    - Under Optional components, select Presto and other optional components to install on your cluster.
    - Under Component Gateway, select Enable component gateway (see Viewing and Accessing Component Gateway URLs).