Some of the core open source components included with Google Cloud Dataproc clusters, such as Apache Hadoop and Apache Spark, provide Web interfaces. These interfaces can be used to manage and monitor cluster resources and facilities, such as the YARN resource manager, the Hadoop Distributed File System (HDFS), MapReduce, and Spark. Other components or applications that you install on your cluster may also provide Web interfaces (see, for example, Install and run a Jupyter notebook on a Cloud Dataproc cluster).
The following interfaces are available on a Cloud Dataproc cluster master node (replace master-host-name with the name of your master node).
* In earlier Cloud Dataproc releases (pre-1.2), the HDFS Namenode Web UI port was 50070.
The YARN ResourceManager provides links to the Web interfaces of all currently running and completed MapReduce and Spark applications under the "Tracking UI" column.
Connecting to Web interfaces
Using Cloud Shell from Google Cloud Platform Console is easiest since Cloud Shell has the GCP SDK commands and utilities pre-installed, and it provides a Web preview feature to quickly connect to a Web interface port on a cluster without having to configure a local browser. However, an SSH connection to the cluster from Cloud Shell uses local port forwarding, which opens a connection to only one port on a cluster Web interface (multiple commands are needed to connect to multiple ports). Also, Cloud Shell sessions automatically terminate after a period of inactivity (30 minutes).
Using the gcloud command-line tool with dynamic port forwarding to establish an SSH connection, then configuring your local browser to use a SOCKS proxy, allows you to connect to multiple ports on a cluster Web interface. See Why should I use a SOCKS proxy instead of local port forwarding? for more information.
Create an SSH tunnel
Run the following gcloud command on your local machine to set up an SSH tunnel from port 1080 of your local machine to the master instance of your cluster. Port 1080 is an arbitrary but typical choice, since it is likely to be open on your local machine. Replace master-host-name with the name of the master node in your Cloud Dataproc cluster, project-id with your Google Cloud Platform project ID, and master-host-zone with the zone of your Cloud Dataproc cluster.
Linux/Mac OS X
gcloud compute ssh master-host-name \
    --project=project-id --zone=master-host-zone -- \
    -D 1080 -N
Windows
gcloud compute ssh master-host-name ^
    --project=project-id --zone=master-host-zone -- ^
    -D 1080 -N
The -- separator allows you to add SSH arguments to the gcloud compute ssh command, as follows:
- -D specifies dynamic application-level port forwarding. Port 1080 is shown in the example, but you can specify a different available port on your local machine.
- -N instructs gcloud not to open a remote shell.
This gcloud command creates an SSH tunnel that operates independently from other SSH shell sessions, keeps tunnel-related errors out of the shell output, and helps prevent inadvertent closures of the tunnel.
If the ssh command fails with the error message bind: Cannot assign requested address, a likely cause is that the requested port is in use. Try running the command with a different -D port to create the SSH tunnel. If successful, use this alternate port number in the --proxy-server argument of the browser command in Configure your browser.
The above command runs in the foreground, and must continue running to keep the tunnel active. The command should terminate automatically if and when you delete the cluster.
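The tunnel command above can be wrapped in a small helper so that the local port is easy to change when the default one is already in use. The following is a minimal sketch, not part of the Cloud SDK: the function name tunnel_cmd and the sample values are hypothetical, and the function only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Hypothetical helper: print the gcloud command that opens a SOCKS tunnel.
# Arguments: master host name, project ID, zone, local port (defaults to 1080).
tunnel_cmd() {
    host=$1
    project=$2
    zone=$3
    port=${4:-1080}
    printf 'gcloud compute ssh %s --project=%s --zone=%s -- -D %s -N\n' \
        "$host" "$project" "$zone" "$port"
}

# Placeholder values; substitute your own. Port 1081 stands in for an
# alternate port chosen because 1080 was already in use.
tunnel_cmd master-host-name project-id master-host-zone 1081
```

If the printed command looks right, you could paste it into your terminal, or run it directly with sh -c "$(tunnel_cmd ...)". Remember to use the same port in the browser's --proxy-server argument.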
- Open Google Cloud Platform Cloud Shell.
Run the gcloud compute ssh command shown below to set up an SSH tunnel from one of the available Cloud Shell Web preview ports (8080 through 8084) to a Web interface port on the master node of your cluster:
- Replace master-host-name in two places with the name of the master node in your Cloud Dataproc cluster (master node names end with a "-m" suffix).
- Replace project-id with your Google Cloud Platform project ID.
- Replace master-host-zone with the zone of your Cloud Dataproc cluster.
- Replace port1 with the Cloud Shell port you will use (8080 - 8084), and port2 with the Web interface port on the cluster master node.
gcloud compute ssh master-host-name \
    --project=project-id --zone master-host-zone -- \
    -4 -N -L port1:master-host-name:port2
The -- separator allows you to add SSH arguments to the gcloud compute ssh command, as follows:
- -4 instructs ssh to only use IPv4.
- -N instructs gcloud not to open a remote shell.
- -L port1:master-host-name:port2 specifies local port forwarding from the specified Cloud Shell port1 to cluster master-host-name:port2.
This gcloud command creates an SSH tunnel that operates independently from other SSH shell sessions, keeps tunnel-related errors out of the shell output, and helps prevent inadvertent closures of the tunnel.
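Because only ports 8080 through 8084 work with Cloud Shell Web preview, it can help to check port1 before opening the tunnel. The sketch below is an illustration only: forward_cmd is a hypothetical function name, the sample values are placeholders, and the function prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print the local-port-forwarding command for Cloud Shell.
# port1 must be a Cloud Shell Web preview port (8080-8084);
# port2 is the Web interface port on the cluster master node.
forward_cmd() {
    host=$1
    project=$2
    zone=$3
    port1=$4
    port2=$5
    case $port1 in
        8080|8081|8082|8083|8084) ;;
        *)
            echo "port1 must be between 8080 and 8084" >&2
            return 1
            ;;
    esac
    printf 'gcloud compute ssh %s --project=%s --zone %s -- -4 -N -L %s:%s:%s\n' \
        "$host" "$project" "$zone" "$port1" "$host" "$port2"
}

# Placeholder values; 8088 is the cluster port used in this page's curl example.
forward_cmd master-host-name project-id master-host-zone 8080 8088
```

Calling the function with a port outside the preview range prints an error to standard error and returns a non-zero status, so the mistake surfaces before any SSH connection is attempted.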
Configure your browser
Your SSH tunnel supports traffic proxying using the SOCKS protocol. To configure your browser to use the proxy, start a new browser session with proxy server parameters. Here's an example that uses the Google Chrome browser:
Linux
/usr/bin/google-chrome \
    --proxy-server="socks5://localhost:1080" \
    --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
    --user-data-dir=/tmp/master-host-name
Mac OS X
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
    --proxy-server="socks5://localhost:1080" \
    --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
    --user-data-dir=/tmp/master-host-name
Windows
"%ProgramFiles(x86)%\Google\Chrome\Application\chrome.exe" ^
    --proxy-server="socks5://localhost:1080" ^
    --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" ^
    --user-data-dir="%Temp%\master-host-name"
This command uses the following Google Chrome browser flags:
- --proxy-server="socks5://localhost:1080" tells Chrome to send all http:// and https:// URL requests through the SOCKS proxy server localhost:1080, using version 5 of the SOCKS protocol (change the localhost port value if you are using a different port). Hostnames for URLs are resolved by the proxy server, not locally by Chrome.
- --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" prevents Chrome from sending DNS requests over the network.
- --user-data-dir=/tmp/master-host-name forces Chrome to open a new window that is not tied to an existing Chrome session. Without this flag, Chrome may open a new window attached to an existing Chrome session, ignoring your --proxy-server setting. The value set for --user-data-dir can be any non-existent path.
Connect to the cluster interface
Once your local browser is configured to use the proxy, you can navigate to the Web interface URL on your Cloud Dataproc cluster (see the available interfaces listed earlier on this page). The browser URL has the following format and content: http://cluster-name-m:port (cluster interface port).
Click the Cloud Shell Web Preview button, and then select either:
- "Preview on port 8080", or
- "Change port" and insert the port number in the dialog
according to the Cloud Shell port number (8080 - 8084) that you passed to the gcloud compute ssh command in Create an SSH tunnel.
A browser window opens that connects to the Web interface port on the cluster master node. The following screenshot shows a browser window that connects through Cloud Shell port 8084 to a Cloud Datalab notebook interface running on the cluster master node.
FAQ And Debugging Tips
What if I don't see the UI in my browser?
If you don't see the UIs in your browser, the two most common reasons are:
You have a network connectivity issue, possibly due to a firewall. Run the following command to see if you can SSH to the master instance. If you can't, it signals a connectivity issue.
Linux/Mac OS X
gcloud compute ssh cluster-name-m \
    --project=project-id
Windows
gcloud compute ssh cluster-name-m ^
    --project=project-id
Another proxy is interfering with the SOCKS proxy. To check the proxy, run the following curl command (available on Linux and Mac OS X):
Linux/Mac OS X
curl -Is --socks5-hostname localhost:1080 http://cluster-name-m:8088
Windows
curl.exe -Is --socks5-hostname localhost:1080 http://cluster-name-m:8088
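The curl check prints the response headers, and an HTTP status line in that output indicates that a request made it through the proxy. As a rough sketch of how the result could be read in a script, the hypothetical http_status function below extracts the status code from header text on standard input. It is demonstrated with canned headers so that it runs without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin (for example the
# output of `curl -Is ...`) and print the status code from the first line.
http_status() {
    # Strip carriage returns, keep the status line (e.g. "HTTP/1.1 200 OK"),
    # and print its second field, which is the numeric status code.
    tr -d '\r' | head -n 1 | awk '{ print $2 }'
}

# Demonstration with canned headers rather than a live cluster.
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n' | http_status
```

With a live tunnel, the same function could be fed by the curl command shown above, for example by appending | http_status to it. Empty output would suggest that no response came back through the proxy.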
Why should I use a SOCKS proxy instead of local port forwarding?
Instead of the SOCKS proxy, it's possible to access Web application UIs running on your master instance with SSH local port forwarding, which forwards the master's port to a local port. For example, the following command lets you use localhost:1080 to reach cluster-name-m:8088 without SOCKS:
Linux/Mac OS X
gcloud compute ssh cluster-name-m \
    --project=project-id -- \
    -L 1080:cluster-name-m:8088 -N -n
Windows
gcloud compute ssh cluster-name-m ^
    --project=project-id -- ^
    -L 1080:cluster-name-m:8088 -N -n
Using a SOCKS proxy is recommended over local port forwarding since the proxy:
- allows you to access all Web application ports without having to set up a port forward tunnel for each UI port
- allows the Spark and Hadoop Web UIs to correctly resolve DNS hosts
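To make the first point concrete, the sketch below prints one local-port-forwarding command per UI port, in contrast to the single -D tunnel used with the SOCKS proxy. The function name forward_cmds is hypothetical, and using the same number for the local and remote port is an assumption made here for simplicity.

```shell
#!/bin/sh
# Hypothetical helper: print one local-port-forwarding command per UI port.
# Arguments: master host name, project ID, then any number of UI ports.
# Assumption: each local port is set equal to the remote port.
forward_cmds() {
    host=$1
    project=$2
    shift 2
    for port in "$@"; do
        printf 'gcloud compute ssh %s --project=%s -- -L %s:%s:%s -N -n\n' \
            "$host" "$project" "$port" "$host" "$port"
    done
}

# Two UI ports need two tunnels. The port numbers are sample values taken
# from ports mentioned elsewhere on this page.
forward_cmds cluster-name-m project-id 8088 50070
```

Each printed command has to keep running for its port to stay reachable, which is the per-port overhead that the SOCKS approach avoids.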