Cluster web interfaces

Some of the core open source components included with Google Cloud Dataproc clusters, such as Apache Hadoop and Apache Spark, provide web interfaces. These web interfaces can be used to manage and monitor different cluster resources and facilities, such as the YARN resource manager, the Hadoop Distributed File System (HDFS), MapReduce, and Spark.

Available interfaces

The interfaces listed below are available on a Cloud Dataproc cluster master node (replace master-host-name with the name of your master node).

Web UI                 Port    URL
YARN ResourceManager   8088    http://master-host-name:8088
HDFS NameNode          9870*   http://master-host-name:9870

* In earlier Cloud Dataproc releases (pre-1.2), the HDFS NameNode web UI port was 50070.

The YARN ResourceManager links to the web interfaces of all currently running and completed MapReduce and Spark applications in the "Tracking UI" column.
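
As a small convenience, the table above can be captured in a shell helper that prints the URL for a given interface. This is only a sketch: the function name dataproc_ui_url and the short keys yarn and hdfs are illustrative assumptions, not part of any Cloud Dataproc tooling.

```shell
# Hypothetical helper: print the web UI URL for a Dataproc master node.
# Ports come from the table above; the function name and the
# "yarn"/"hdfs" keys are assumptions made for this example.
dataproc_ui_url() {
  master="$1"   # name of the master node, for example cluster-name-m
  ui="$2"       # "yarn" or "hdfs"
  case "$ui" in
    yarn) port=8088 ;;
    hdfs) port=9870 ;;   # 50070 on earlier releases, per the note above
    *) echo "unknown interface: $ui" >&2; return 1 ;;
  esac
  printf 'http://%s:%s\n' "$master" "$port"
}
```

For example, dataproc_ui_url cluster-name-m yarn prints http://cluster-name-m:8088.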

Connecting to the web interfaces

To connect to the web interfaces, the best practice is to use an SSH tunnel to create a secure connection to your master node. The SSH tunnel supports traffic proxying using the SOCKS protocol. This means that you can send network requests through your SSH tunnel in any browser that supports the SOCKS protocol. This method allows you to transfer all of your browser data over SSH, eliminating the need to open firewall ports to access the web interfaces.

Connecting to the web interfaces with SSH and SOCKS is a two-step process:

  1. Create an SSH tunnel. Use an SSH client or utility to create the SSH tunnel. Use the SSH tunnel to securely transfer web traffic data from your computer's web browser to the Cloud Dataproc cluster.

  2. Use a SOCKS proxy to connect with your browser. Configure your browser to use the SOCKS proxy. The SOCKS proxy routes data intended for the Cloud Dataproc cluster through the SSH tunnel.

Directions for performing each step are provided below.

Step 1 - Create an SSH tunnel

We recommend using the Google Cloud SDK gcloud compute ssh command to create an SSH tunnel with arguments that enable dynamic port forwarding.

Run the following command to set up an SSH tunnel to the Hadoop master instance on port 1080 of your local machine. Port 1080 is an arbitrary but typical choice, since it is likely to be unused on your local machine. Replace master-host-name with the name of the master node in your Cloud Dataproc cluster and master-host-zone with the zone of your Cloud Dataproc cluster.

gcloud compute ssh --zone=master-host-zone master-host-name -- \
  -D 1080 -N -n

The -- separator allows you to add SSH arguments to the gcloud compute ssh command, as follows:

  • -D specifies dynamic application-level port forwarding. Port 1080 is shown in the example, but another available port on your local machine can be used.
  • -N instructs ssh not to open a remote shell.
  • -n instructs ssh not to read from stdin.

This command creates an SSH tunnel that operates independently from other SSH shell sessions, keeps tunnel-related errors out of the shell output, and helps prevent inadvertent closures of the tunnel.
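
If you open tunnels often, it can help to wrap the command above in a small function so that the zone, master name, and port become parameters. The following is a minimal sketch: the function name open_tunnel and the DRY_RUN variable are assumptions for illustration, and only the gcloud arguments already shown above are used.

```shell
# Hypothetical wrapper around the tunnel command shown above.
# Usage: open_tunnel ZONE MASTER [PORT]
# Set DRY_RUN=1 to print the command instead of running it.
open_tunnel() {
  zone="$1"
  master="$2"
  port="${3:-1080}"   # default to the port used in this guide
  # Reject anything that is not a plain port number.
  case "$port" in
    ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "gcloud compute ssh --zone=$zone $master -- -D $port -N -n"
    return 0
  fi
  # Runs in the foreground until interrupted (for example with Ctrl+C).
  gcloud compute ssh --zone="$zone" "$master" -- -D "$port" -N -n
}
```

Running the function with DRY_RUN=1 first is a cheap way to confirm the zone and master name before opening a real connection.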

Step 2 - Connect with your web browser

Your SSH tunnel supports traffic proxying using the SOCKS protocol. You must configure your browser to use the proxy when connecting to your cluster.

The application (executable) location of your browser on your machine/device depends on its operating system. The following are standard Google Chrome application locations for popular operating systems:

Operating System   Google Chrome Executable Path
Mac OS X           /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
Linux              /usr/bin/google-chrome
Windows            C:\Program Files (x86)\Google\Chrome\Application\chrome.exe

To configure your browser, start a new browser session with proxy server parameters. Here's an example that uses the Google Chrome browser:

Google Chrome executable path \
  --proxy-server="socks5://localhost:1080" \
  --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
  --user-data-dir=/tmp/master-host-name

This command uses the following Google Chrome flags:

  • --proxy-server="socks5://localhost:1080" tells Chrome to send all http:// and https:// URL requests through the SOCKS proxy server localhost:1080, using version 5 of the SOCKS protocol. Hostnames for these URLs are resolved by the proxy server, not locally by Chrome.
  • --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" prevents Chrome from sending any DNS requests over the network.
  • --user-data-dir=/tmp/master-host-name forces Chrome to open a new window that is not tied to an existing Chrome session. Without this flag, Chrome may open a new window attached to an existing Chrome session, ignoring your --proxy-server setting. The value set for --user-data-dir can be any nonexistent path.

Once your browser is configured to use the proxy, you can navigate to one of the web interface URLs on your Cloud Dataproc cluster (see Available interfaces).
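
The browser step can also be scripted. The sketch below picks the Chrome path for Mac OS X or Linux from the table above, based on the output of uname -s, and launches Chrome with the same three flags. The function names are assumptions made for this example, and Windows is left out because a POSIX shell is not assumed there.

```shell
# Hypothetical helper: choose the Chrome executable path from the table above.
# Accepts an OS name (as printed by `uname -s`) as an optional argument.
chrome_path() {
  os="${1:-$(uname -s)}"
  case "$os" in
    Darwin) echo "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" ;;
    Linux)  echo "/usr/bin/google-chrome" ;;
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
}

# Hypothetical launcher: start Chrome through the SOCKS proxy.
# Usage: launch_chrome_via_proxy MASTER [PORT]
launch_chrome_via_proxy() {
  master="$1"
  port="${2:-1080}"   # must match the port given to -D in Step 1
  chrome="$(chrome_path)" || return 1
  "$chrome" \
    --proxy-server="socks5://localhost:$port" \
    --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
    --user-data-dir="/tmp/$master"
}
```

Quoting "$chrome" lets the Mac OS X path be stored without the backslash escapes that are needed when the path is typed directly at a prompt.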

FAQ and debugging tips

What if I don't see the UI in my browser?

If you don't see the UIs in your browser, the two most common reasons are:

  1. You have a network connectivity issue, possibly due to a firewall. Run the following command to see if you can SSH to the master instance. If you can't, it signals a connectivity issue.
    gcloud compute ssh cluster-name-m
    
  2. Another proxy is interfering with the SOCKS proxy. To check the proxy, run the following curl command (available on Linux and Mac OS X):
    curl -Is --socks5-hostname localhost:1080 http://cluster-name-m:8088
    
    If you see an HTTP response, the proxy is working, so it's possible that the SOCKS proxy is being interrupted by another proxy or browser extension.
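
The second check can be wrapped in a function that interprets the curl exit status for you. This is a rough sketch, not an official diagnostic: the function name check_socks_proxy and the CURL_BIN override are assumptions, and the messages only restate the guidance above.

```shell
# Hypothetical diagnostic for the proxy check described above.
# CURL_BIN can be overridden (an assumption of this sketch) so the
# logic can be exercised without a live tunnel.
# Usage: check_socks_proxy MASTER [PORT]
check_socks_proxy() {
  master="$1"
  port="${2:-1080}"
  # curl exits with 0 when it receives any HTTP response through the proxy.
  if "${CURL_BIN:-curl}" -Is --socks5-hostname "localhost:$port" \
       "http://$master:8088" >/dev/null; then
    echo "proxy OK: check for another proxy or a browser extension"
  else
    echo "no response: check that the SSH tunnel is running on port $port"
    return 1
  fi
}
```

For example, check_socks_proxy cluster-name-m tests the YARN ResourceManager port through a tunnel on local port 1080.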

Why should I use a SOCKS proxy instead of local port forwarding?

Instead of the SOCKS proxy, it's possible to access web application UIs running on your master instance with SSH local port forwarding, which forwards the master's port to a local port. For example, the following command lets you access localhost:8088 to reach your cluster without SOCKS:

gcloud compute ssh cluster-name-m -- -L 8088:cluster-name-m:8088 -N -n

You can also use Google Cloud Shell to implement local port forwarding, then use the Cloud Shell Web Preview feature to access the web interface. For an example, see "Open the Cloud Datalab notebook" in Install and run a Cloud Datalab notebook in a Cloud Dataproc cluster.

Using a SOCKS proxy is recommended over local port forwarding since the proxy:

  • allows you to access all web application ports without having to set up a port forward tunnel for each UI port
  • allows the Spark and Hadoop web UIs to correctly resolve DNS hosts
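
To illustrate the first point: local port forwarding needs one -L argument per UI port, while the SOCKS approach needs a single -D argument no matter how many interfaces you open. The sketch below only prints the local-forwarding command for a list of ports. The function name is an assumption, and mapping each local port to the same port number on the master is a choice made for this example.

```shell
# Hypothetical helper: print a local-forwarding command for several UI ports.
# Usage: local_forward_cmd MASTER PORT [PORT...]
local_forward_cmd() {
  master="$1"
  shift
  flags=""
  for port in "$@"; do
    # One -L flag per UI port: local port -> same port on the master.
    flags="$flags -L $port:$master:$port"
  done
  echo "gcloud compute ssh $master --$flags -N -n"
}
```

Calling local_forward_cmd cluster-name-m 8088 9870 prints a command with two -L flags, and every further UI port adds another.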
