Cluster web interfaces

Some of the core open source components included with Google Cloud Dataproc clusters, such as Apache Hadoop and Apache Spark, provide Web interfaces. These interfaces can be used to manage and monitor cluster resources and facilities, such as the YARN resource manager, the Hadoop Distributed File System (HDFS), MapReduce, and Spark. Other components or applications that you install on your cluster may also provide Web interfaces (see, for example, Install and run a Jupyter notebook on a Cloud Dataproc cluster).

Available interfaces

The following interfaces are available on a Cloud Dataproc cluster master node (replace master-host-name with the name of your master node).

Web UI                 Port    URL
YARN ResourceManager   8088    http://master-host-name:8088
HDFS NameNode          9870*   http://master-host-name:9870

* In earlier Cloud Dataproc releases (pre-1.2), the HDFS Namenode Web UI port was 50070.

The YARN ResourceManager links to the Web interfaces of all currently running and completed MapReduce and Spark applications under its "Tracking UI" column.

Connecting to Web interfaces

You can connect to Web interfaces running on a Cloud Dataproc cluster using your project's Cloud Shell or the Cloud SDK gcloud command-line tool:

  • Cloud Shell: The Cloud Shell in the Google Cloud Platform Console has the Cloud SDK commands and utilities pre-installed, and it provides a Web preview feature that allows you to quickly connect through an SSH tunnel to a Web interface port on a cluster. However, a connection to the cluster from Cloud Shell uses local port forwarding, which opens a connection to only one port on a cluster Web interface—multiple commands are needed to connect to multiple ports. Also, Cloud Shell sessions automatically terminate after a period of inactivity (30 minutes).

  • gcloud command-line tool: The gcloud compute ssh command with dynamic port forwarding allows you to establish an SSH tunnel and run a SOCKS proxy server on top of the tunnel. After issuing this command, you must configure your local browser to use the SOCKS proxy. This connection method allows you to connect to multiple ports on a cluster Web interface. See Can I use local port forwarding instead of a SOCKS proxy? for more information.

Set commonly used command variables

To make copying and running command-line examples on your local machine or in Cloud Shell easier, set gcloud dataproc command variables. Additional variables may need to be set for some of the command examples shown on this page.

Linux/macOS/Cloud Shell

export PROJECT=project;export HOSTNAME=hostname;export ZONE=zone

Windows

set PROJECT=project && set HOSTNAME=hostname && set ZONE=zone
  • Set PROJECT to your Google Cloud Platform project ID
  • Set HOSTNAME to the name of the master node in your Cloud Dataproc cluster (the master name ends with a -m suffix)
  • Set ZONE to the zone of the VMs in your Cloud Dataproc cluster (for example, "us-central1-b")
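For example, on Linux/macOS the variables might be set as follows (the values below are hypothetical placeholders; substitute your own project ID, master node name, and zone):

```shell
# Hypothetical placeholder values -- replace with your own.
export PROJECT=my-project-id
export HOSTNAME=example-cluster-m   # Dataproc master node names end with -m
export ZONE=us-central1-b
echo "project=${PROJECT} host=${HOSTNAME} zone=${ZONE}"
```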

Create an SSH tunnel

gcloud Command

Run the following gcloud command on your local machine to set up an SSH tunnel from an open port on your local machine to the master instance of your cluster, and run a local SOCKS proxy server listening on the port.

Before running the command, on your local machine:

  1. Set commonly used command variables
  2. Set a PORT variable to an open port on your local machine. Port 1080 is an arbitrary but typical choice since it is likely to be open.
    PORT=number
    

Linux/macOS

gcloud compute ssh ${HOSTNAME} \
    --project=${PROJECT} --zone=${ZONE}  -- \
    -D ${PORT} -N

Windows

gcloud compute ssh %HOSTNAME% ^
    --project=%PROJECT% --zone=%ZONE%  -- ^
    -D %PORT% -N

The -- separator allows you to add SSH arguments to the gcloud compute ssh command, as follows:

  • -D specifies dynamic application-level port forwarding.
  • -N instructs gcloud not to open a remote shell.

This gcloud command creates an SSH tunnel that operates independently from other SSH shell sessions, keeps tunnel-related errors out of the shell output, and helps prevent inadvertent closures of the tunnel.

If the ssh command fails with the error message bind: Cannot assign requested address, a likely cause is that the requested port is in use. Try running the command with a different PORT variable value.
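One way to avoid this error is to test whether a candidate port is free before assigning it to PORT. The following sketch uses bash's /dev/tcp feature (an assumption about your local shell, not part of the gcloud tooling):

```shell
# Sketch: succeeds if nothing is listening on the given localhost port.
# A failed connect through bash's /dev/tcp device means the port is free.
port_is_free() {
  ! (exec 3<>"/dev/tcp/localhost/$1") 2>/dev/null
}

PORT=1080
port_is_free "${PORT}" || PORT=1081   # fall back to another candidate
echo "using PORT=${PORT}"
```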

The above command runs in the foreground and must continue running to keep the tunnel active. The command terminates automatically when you delete the cluster.

Cloud Shell

  1. Open Google Cloud Platform Cloud Shell.
  2. Run the following gcloud command in Cloud Shell to set up an SSH tunnel from a Cloud Shell preview port to a Web interface port on the master node of your cluster. Before running the command, in Cloud Shell:

    1. Set commonly used command variables
    2. Set a PORT1 variable to a Cloud Shell port in the port range 8080 - 8084, and set a PORT2 variable to the Web interface port on the master node of your Cloud Dataproc cluster.
      PORT1=number
      PORT2=number
      
    gcloud compute ssh ${HOSTNAME} \
        --project=${PROJECT} --zone=${ZONE}  -- \
        -4 -N -L ${PORT1}:${HOSTNAME}:${PORT2}
    

    The -- separator allows you to add SSH arguments to the gcloud compute ssh command, as follows:

    • -4 instructs ssh to only use IPv4.
    • -N instructs gcloud not to open a remote shell.
    • -L ${PORT1}:${HOSTNAME}:${PORT2} specifies local port forwarding from the specified Cloud Shell PORT1 to cluster HOSTNAME:PORT2.

    This gcloud command creates an SSH tunnel that operates independently from other SSH shell sessions, keeps tunnel-related errors out of the shell output, and helps prevent inadvertent closures of the tunnel.

Configure your browser

gcloud Command

Your SSH tunnel supports traffic proxying using the SOCKS protocol. To configure your browser to use the proxy, start a new browser session with proxy server parameters. Here's an example that uses the Google Chrome browser. HOSTNAME is the name of the cluster's master node (see Set commonly used command variables).

Linux

/usr/bin/google-chrome \
    --proxy-server="socks5://localhost:${PORT}" \
    --user-data-dir=/tmp/${HOSTNAME}

macOS

"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
    --proxy-server="socks5://localhost:${PORT}" \
    --user-data-dir=/tmp/${HOSTNAME}

Windows

"%ProgramFiles(x86)%\Google\Chrome\Application\chrome.exe" ^
    --proxy-server="socks5://localhost:%PORT%" ^
    --user-data-dir="%Temp%\%HOSTNAME%"

This command uses the following Google Chrome browser flags:

  • --proxy-server="socks5://localhost:${PORT}" tells Chrome to send all http:// and https:// URL requests through the SOCKS proxy server localhost:${PORT}, using version 5 of the SOCKS protocol. ${PORT} is the port variable you set in Create an SSH tunnel. Hostnames for URLs are resolved by the proxy server, not locally by Chrome.
  • --user-data-dir=/tmp/${HOSTNAME} forces Chrome to open a new window that is not tied to an existing Chrome session. Without this flag, Chrome may open a new window attached to an existing Chrome session, ignoring your --proxy-server setting. The value set for --user-data-dir can be any non-existent path.

Cloud Shell

You do not need to configure your local browser when using Cloud Shell. After creating an SSH tunnel, use Cloud Shell Web preview to connect to the cluster interface.

Connect to the cluster interface

gcloud Command

Once your local browser is configured to use the proxy, you can navigate to the Web interface URL on your Cloud Dataproc cluster (see Available interfaces). The browser URL has the following format: http://cluster-name-m:port, where port is the cluster Web interface port.
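As an illustration, with a hypothetical master node named example-cluster-m, the URLs for the two interfaces listed in Available interfaces can be generated as follows (in practice, use the HOSTNAME variable set in Set commonly used command variables):

```shell
# Hypothetical master node name for illustration only.
HOSTNAME=example-cluster-m
for port in 8088 9870; do   # YARN ResourceManager, HDFS NameNode
  echo "http://${HOSTNAME}:${port}"
done
```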

Cloud Shell

Click the Cloud Shell Web Preview button, and then select either:

  • "Preview on port 8080", or
  • "Change port" and insert the port number in the dialog
according to the Cloud Shell PORT1 number (port 8080 - 8084) you passed to the gcloud compute ssh command in Create an SSH tunnel.

A browser window opens that connects to the Web interface port on the cluster master node. For example, a connection through Cloud Shell port 8084 can reach a Cloud Datalab notebook interface running on the cluster master node.

FAQ and debugging tips

What if I don't see the UI in my browser?

If you don't see the UIs in your browser, the two most common reasons are:

  1. You have a network connectivity issue, possibly due to a firewall. Run the following command (after setting local variables) to see if you can SSH to the master instance. If you can't, it signals a connectivity issue.

    Linux/macOS

    gcloud compute ssh ${HOSTNAME} \
        --project=${PROJECT}
    

    Windows

    gcloud compute ssh %HOSTNAME% ^
        --project=%PROJECT%
    

  2. Another proxy is interfering with the SOCKS proxy. To check the proxy, run the following curl command:

    Linux/macOS

    curl -Is --socks5-hostname localhost:1080 http://cluster-name-m:8088
    

    Windows

    curl.exe -Is --socks5-hostname localhost:1080 http://cluster-name-m:8088
    
    If you see an HTTP response, the proxy is working, so it's possible that the SOCKS proxy is being interrupted by another proxy or browser extension.
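This check can also be scripted: curl's exit status distinguishes a working tunnel from a missing one. The host and port below are the same hypothetical examples used in the curl command above:

```shell
# Sketch: exit status 0 means the SOCKS proxy answered and returned headers;
# nonzero usually means the tunnel on localhost:1080 is not running.
if curl -Is --socks5-hostname localhost:1080 "http://cluster-name-m:8088" >/dev/null 2>&1; then
  echo "proxy and Web UI reachable"
else
  echo "no response: check that the SSH tunnel is still running" >&2
fi
```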

Can I use local port forwarding instead of a SOCKS proxy?

Instead of using the SOCKS proxy, you can access Web application UIs running on your master instance with SSH local port forwarding, which forwards a master node port to a local port. For example, the following command lets you browse to localhost:1080 to reach port 8088 on the master node without SOCKS (see Set commonly used command variables):

Linux/macOS

gcloud compute ssh ${HOSTNAME} \
    --project=${PROJECT} -- \
    -L 1080:${HOSTNAME}:8088 -N -n

Windows

gcloud compute ssh %HOSTNAME% ^
    --project=%PROJECT% -- ^
    -L 1080:%HOSTNAME%:8088 -N -n

Using a SOCKS proxy may be preferable to using local port forwarding since the proxy:

  • allows you to access all Web application ports without having to set up a port forward tunnel for each UI port
  • allows the Spark and Hadoop Web UIs to correctly resolve DNS hosts