Some of the core open source components included with Google Cloud Dataproc clusters, such as Apache Hadoop and Apache Spark, provide web interfaces. These web interfaces can be used to manage and monitor different cluster resources and facilities, such as the YARN resource manager, the Hadoop Distributed File System (HDFS), MapReduce, and Spark.
The interfaces listed below are available on a Cloud Dataproc cluster master node (replace master-host-name with the name of your master node).
The YARN ResourceManager lists links to the web interfaces of all currently running and completed MapReduce and Spark applications under the "Tracking UI" column.
Connecting to the web interfaces
To connect to the web interfaces, we recommend you use an SSH tunnel to create a secure connection to your master node. The SSH tunnel supports traffic proxying using the SOCKS protocol. This means that you can send network requests through your SSH tunnel in any browser that supports the SOCKS protocol. This method allows you to transfer all of your browser data over SSH, eliminating the need to open firewall ports to access the web interfaces.
Connecting to the web interfaces with SSH and SOCKS is a two-step process:
Create an SSH tunnel. Use an SSH client or utility to create the SSH tunnel. Use the SSH tunnel to securely transfer web traffic data from your computer's web browser to the Cloud Dataproc cluster.
Use a SOCKS proxy to connect with your browser. Configure your browser to use the SOCKS proxy. The SOCKS proxy routes data intended for the Cloud Dataproc cluster through the SSH tunnel.
Directions for performing each step are provided below.
Step 1 - Create an SSH tunnel
We recommend using the gcloud compute ssh command in the Google Cloud SDK to create the SSH tunnel, passing it arguments that enable local dynamic port forwarding.
Run the following command to set up an SSH tunnel to the Hadoop master instance on port 1080 of your local machine. Replace master-host-name with the name of the master node in your Cloud Dataproc cluster and master-host-zone with the zone of your Cloud Dataproc cluster.
gcloud compute ssh --zone=<master-host-zone> \
  --ssh-flag="-D 1080" --ssh-flag="-N" --ssh-flag="-n" <master-host-name>
The --ssh-flag flag allows you to add extra parameters to your SSH connection. The --ssh-flag values, above, have the following meanings:
-D 1080 specifies dynamic application-level port forwarding.
-N instructs gcloud not to open a remote shell.
-n instructs gcloud not to read from stdin.
By using this command, you create an SSH tunnel that operates independently from other SSH shell sessions you may be running. Doing this keeps tunnel-related errors out of your shell output, and helps prevent you from inadvertently closing your SSH tunnel.
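As a rough sketch, the tunnel command above can be wrapped in a small shell function so that the master node name, zone, and local port are passed as arguments. The function name open_tunnel, the DRY_RUN variable, and the example host and zone values are assumptions made for illustration; they are not part of the Cloud SDK.

```shell
# Hypothetical helper: opens the SOCKS tunnel described above.
# Arguments: master node name, zone, and (optionally) the local port.
# When DRY_RUN is set to a non-empty value, the command is printed
# instead of being executed.
open_tunnel() {
  local host="$1" zone="$2" port="${3:-1080}"
  ${DRY_RUN:+echo} gcloud compute ssh --zone="$zone" \
    --ssh-flag="-D $port" --ssh-flag="-N" --ssh-flag="-n" "$host"
}

# Example usage (placeholder host and zone); the trailing & keeps the
# tunnel running in the background of the current shell:
#   open_tunnel my-cluster-m us-central1-a &
```

Setting DRY_RUN=1 before the call is a convenient way to inspect the exact gcloud invocation before opening a real connection.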
Step 2 - Connect with your web browser
Your SSH tunnel supports traffic proxying using the SOCKS protocol. You must configure your browser to use the proxy when connecting to your cluster.
The application (executable) location of your browser on your machine/device depends on its operating system. The following are standard Google Chrome application locations for popular operating systems:
|Operating System||Google Chrome Executable Path|
|Mac OS X||
You can specify proxy server information when you start your browser. Here's an example that uses the Google Chrome browser:
<Google Chrome executable path> \
  --proxy-server="socks5://localhost:1080" \
  --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
  --user-data-dir=/tmp/
This command uses the following Google Chrome flags:
--proxy-server="socks5://localhost:1080" tells Chrome to send all http:// and https:// URL requests through the SOCKS proxy server localhost:1080, using version 5 of the SOCKS protocol. Hostnames for these URLs are resolved by the proxy server, not locally by Chrome.
--host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" prevents Chrome from sending any DNS requests over the network.
--user-data-dir=/tmp/ forces Chrome to open a new window that is not tied to an existing Chrome session. Without this flag, Chrome may open a new window attached to an existing Chrome session, ignoring your --proxy-server setting. The value set for --user-data-dir can be any nonexistent path.
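The Chrome invocation above can be sketched as a reusable shell function. The function name launch_proxied_chrome, the DRY_RUN variable, the default profile directory, and the Linux-style executable path in the usage comment are all assumptions for illustration; substitute the Chrome path for your own operating system.

```shell
# Hypothetical helper: starts Chrome with the SOCKS proxy flags described above.
# Arguments: Chrome executable path, local SOCKS port (default 1080), and a
# profile directory for --user-data-dir (default is an assumed path).
# When DRY_RUN is set to a non-empty value, the command is printed
# instead of being executed.
launch_proxied_chrome() {
  local chrome="$1" port="${2:-1080}" profile="${3:-/tmp/dataproc-proxy-profile}"
  ${DRY_RUN:+echo} "$chrome" \
    --proxy-server="socks5://localhost:$port" \
    --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
    --user-data-dir="$profile"
}

# Example usage (the executable path is an assumption for a Linux machine):
#   launch_proxied_chrome /usr/bin/google-chrome
```

Keeping the port as a parameter makes it easier to match whatever local port was used when the SSH tunnel was created.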
Once your browser is configured to use the proxy, you can navigate to one of the web interface URLs—see Available interfaces—on your Cloud Dataproc cluster.
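If a page fails to load, it can help to check the tunnel from the command line before troubleshooting the browser. The following is a minimal sketch that sends a request through the SOCKS proxy with curl, letting the proxy resolve the hostname as Chrome does. The function name check_interface, the DRY_RUN variable, the example host name, and the use of port 8088 (the usual YARN ResourceManager web port) are assumptions; use the host and port of the interface you want to reach.

```shell
# Hypothetical helper: requests a cluster web interface through the SSH tunnel.
# Arguments: master node name, interface port, local SOCKS port (default 1080).
# Returns curl's exit status (0 when the page could be fetched).
# When DRY_RUN is set to a non-empty value, the command is printed
# instead of being executed.
check_interface() {
  local host="$1" port="$2" socks_port="${3:-1080}"
  ${DRY_RUN:+echo} curl --fail --silent --show-error --output /dev/null \
    --socks5-hostname "localhost:$socks_port" "http://$host:$port/"
}

# Example usage (placeholder host; 8088 is assumed to be the YARN port):
#   check_interface my-cluster-m 8088 && echo "tunnel is working"
```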