Cluster web interfaces

Some of the core open source components included with Google Cloud Dataproc clusters, such as Apache Hadoop and Apache Spark, provide web interfaces. These web interfaces can be used to manage and monitor different cluster resources and facilities, such as the YARN resource manager, the Hadoop Distributed File System (HDFS), MapReduce, and Spark.

Available interfaces

The interfaces listed below are available on a Cloud Dataproc cluster master node (replace master-host-name with the name of your master node).

Web UI                 Port    URL
YARN ResourceManager   8088    http://master-host-name:8088
HDFS NameNode          9870    http://master-host-name:9870

The YARN ResourceManager links to the web interfaces of all currently running and completed MapReduce and Spark applications in its "Tracking UI" column.

Connecting to the web interfaces

To connect to the web interfaces, we recommend you use an SSH tunnel to create a secure connection to your master node. The SSH tunnel supports traffic proxying using the SOCKS protocol. This means that you can send network requests through your SSH tunnel in any browser that supports the SOCKS protocol. This method allows you to transfer all of your browser data over SSH, eliminating the need to open firewall ports to access the web interfaces.

Connecting to the web interfaces with SSH and SOCKS is a two-step process:

  1. Create an SSH tunnel. Use an SSH client or utility to create the SSH tunnel. Use the SSH tunnel to securely transfer web traffic data from your computer's web browser to the Cloud Dataproc cluster.

  2. Use a SOCKS proxy to connect with your browser. Configure your browser to use the SOCKS proxy. The SOCKS proxy routes data intended for the Cloud Dataproc cluster through the SSH tunnel.

Directions for performing each step are provided below.

Step 1 - Create an SSH tunnel

We recommend using the gcloud compute ssh command in the Google Cloud SDK to create the SSH tunnel, passing arguments that enable dynamic port forwarding so that the tunnel can act as a SOCKS proxy.

Run the following command to set up an SSH tunnel from port 1080 on your local machine to the master node. Replace master-host-name with the name of the master node in your Cloud Dataproc cluster and master-host-zone with the zone of your Cloud Dataproc cluster.

gcloud compute ssh --zone=<master-host-zone> \
  --ssh-flag="-D 1080" --ssh-flag="-N" --ssh-flag="-n" <master-host-name>

The --ssh-flag flag allows you to add extra parameters to your SSH connection. The --ssh-flag values, above, have the following meanings:

  • -D 1080 specifies dynamic application-level port forwarding on local port 1080.
  • -N instructs ssh not to execute a remote command, so no remote shell is opened.
  • -n instructs ssh not to read from stdin.

By using this command, you create an SSH tunnel that operates independently from other SSH shell sessions you may be running. Doing this keeps tunnel-related errors out of your shell output, and helps prevent you from inadvertently closing your SSH tunnel.
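Before configuring a browser, one way to check that the tunnel is carrying traffic is to request a web interface through the SOCKS proxy from the command line. The following is a minimal sketch, assuming curl is installed on your local machine and the tunnel from Step 1 is running. The function name check_ui and the 10-second timeout are illustrative choices, not part of gcloud or Cloud Dataproc.

```shell
# Hypothetical helper: print the HTTP status code that a cluster web interface
# returns when it is requested through the local SOCKS proxy.
#   $1: master node name (the value you use for master-host-name)
#   $2: web interface port (8088 for YARN ResourceManager, 9870 for HDFS NameNode)
#   $3: local SOCKS port given to -D when the tunnel was created (default: 1080)
check_ui() {
  host="$1"
  ui_port="$2"
  socks_port="${3:-1080}"
  # --socks5-hostname asks the proxy to resolve the host name, so the master
  # node name does not need to be resolvable from your local machine.
  curl --silent --output /dev/null --write-out '%{http_code}\n' \
    --max-time 10 \
    --socks5-hostname "localhost:${socks_port}" \
    "http://${host}:${ui_port}/"
}
```

For example, check_ui master-host-name 8088 prints a status code for the YARN ResourceManager. A success or redirect code suggests the tunnel and proxy are working, while 000 usually means curl could not get a response through the proxy.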

Step 2 - Connect with your web browser

Your SSH tunnel supports traffic proxying using the SOCKS protocol. You must configure your browser to use the proxy when connecting to your cluster.

The location of your browser's executable depends on your operating system. The following are the standard Google Chrome locations for popular operating systems:

Operating System   Google Chrome Executable Path
Mac OS X           /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
Linux              /usr/bin/google-chrome
Windows            C:\Program Files (x86)\Google\Chrome\Application\chrome.exe
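If you script the browser launch, the lookup above can be expressed as a small function. This is a sketch under stated assumptions: chrome_path_for is a hypothetical name, it only knows the Mac OS X and Linux locations listed above, and it assumes Chrome is installed in its standard location. Windows is left out because the value reported by uname varies between shell environments there.

```shell
# Hypothetical helper: print the standard Chrome executable path for a kernel
# name as reported by `uname -s`. Returns a non-zero status for anything else.
chrome_path_for() {
  case "$1" in
    Darwin)
      # Printed without backslash escapes; quote the result when you use it.
      printf '%s\n' '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
      ;;
    Linux)
      printf '%s\n' '/usr/bin/google-chrome'
      ;;
    *)
      return 1
      ;;
  esac
}
```

A typical call is chrome_bin="$(chrome_path_for "$(uname -s)")", followed by a check that the command succeeded before the path is used.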

You can specify proxy server information when you start your browser. Here's an example that uses the Google Chrome browser:

<Google Chrome executable path> \
  --proxy-server="socks5://localhost:1080" \
  --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
  --user-data-dir=/tmp/

This command uses the following Google Chrome flags:

  • --proxy-server="socks5://localhost:1080" tells Chrome to send all http:// and https:// URL requests through the SOCKS proxy server localhost:1080, using version 5 of the SOCKS protocol. Hostnames for these URLs are resolved by the proxy server, not locally by Chrome.
  • --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" prevents Chrome from sending any DNS requests over the network.
  • --user-data-dir=/tmp/ forces Chrome to open a new window that is not tied to an existing Chrome session. Without this flag, Chrome may open a new window attached to an existing Chrome session, ignoring your --proxy-server setting. The value set for --user-data-dir can be any nonexistent path.
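The Chrome command and flags above can be wrapped in a small function so that the SOCKS port and profile directory are set in one place. This is a minimal sketch: launch_proxied_chrome is a hypothetical name, and the profile directory naming scheme is an assumption chosen to give a path that is unlikely to exist already.

```shell
# Hypothetical helper: start Chrome with the proxy flags described above.
#   $1: path to the Chrome executable for your operating system
#   $2: local SOCKS port used by the SSH tunnel (default: 1080)
launch_proxied_chrome() {
  chrome_bin="$1"
  socks_port="${2:-1080}"
  # Assumption: a directory name that includes this shell's process ID is not
  # in use, so Chrome starts a session separate from any existing one.
  profile_dir="${TMPDIR:-/tmp}/chrome-socks-profile-$$"
  "$chrome_bin" \
    --proxy-server="socks5://localhost:${socks_port}" \
    --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
    --user-data-dir="$profile_dir"
}
```

On Linux, for example, launch_proxied_chrome /usr/bin/google-chrome opens a Chrome window that sends its traffic through the tunnel on port 1080.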

Once your browser is configured to use the proxy, you can navigate to any of your cluster's web interface URLs listed under Available interfaces.
