Although Looker application monitoring may not seem like it is strictly required, it is very important to set up on customer-hosted instances. In the rare instance that something goes wrong with your server, it is often much more difficult or impossible for Looker to help you fix the issue unless you can provide appropriate monitoring information from the time of the incident.
Application monitoring
URL
There are two simple ways to validate that your Looker instance is running.
Append
/alive
to your Looker instance's URL like this:https://instance_name.looker.com/alive
If your instance is able to respond to a web request you'll receive a 200 OK HTTP status code.
Append
/availability
to your Looker instance's URL like this:https://instance_name.looker.com/availability
This URL performs a more complete check of several underlying subsystems and will also respond with a 200 OK HTTP status code if all is well.
JMX
The Java virtual machine that runs Looker may be monitored via JMX.
Many monitoring applications such as Zabbix and Nagios support JMX. See your monitoring application's documentation for more information.
Edit the Looker startup script
To enable JMX monitoring, you will need to edit your Looker startup script. By default it is named:
/home/looker/looker/looker
Look for the Java startup parameters:
java \
-XX:+UseG1GC -XX:MaxGCPauseMillis=2000 \
-Xms$JAVAMEM -Xmx$JAVAMEM \
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
-Xloggc:/tmp/gc.log ${JAVAARGS} \
-jar looker.jar start ${LOOKERARGS}
Starting in Looker 6.18, the Looker JAR file has been split into two separate JAR files: the Looker core JAR file and a Looker dependencies JAR file. Upon starting, the core JAR file will automatically start the dependencies JAR file. Both JAR files must be in the same directory so that the core JAR file can successfully find and start the dependencies JAR file.
By default, the --no-daemonise
startup option is not set. If you have not set the --no-daemonise
option, add a section following the line starting with -Xms$JAVAMEM
:
-Dcom.sun.akuma.jvmarg.com.sun.management.jmxremote \
-Dcom.sun.akuma.jvmarg.com.sun.management.jmxremote.port=9910 \
-Dcom.sun.akuma.jvmarg.com.sun.management.jmxremote.ssl=false \
-Dcom.sun.akuma.jvmarg.com.sun.management.jmxremote.local.only=false \
-Dcom.sun.akuma.jvmarg.com.sun.management.jmxremote.authenticate=true \
-Dcom.sun.akuma.jvmarg.com.sun.management.jmxremote.access.file=${HOME}/.lookerjmx/jmxremote.access \
-Dcom.sun.akuma.jvmarg.com.sun.management.jmxremote.password.file=${HOME}/.lookerjmx/jmxremote.password \
If you have set the --no-daemonise
startup option, add a section following the line starting with -Xms$JAVAMEM
:
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=9910 \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.local.only=false \
-Dcom.sun.management.jmxremote.authenticate=true \
-Dcom.sun.management.jmxremote.access.file=${HOME}/.lookerjmx/jmxremote.access \
-Dcom.sun.management.jmxremote.password.file=${HOME}/.lookerjmx/jmxremote.password \
Create the .lookerjmx
directory
Next, create the .lookerjmx
directory under your Looker user's home directory, and set permissions:
sudo su - looker
mkdir ~/.lookerjmx
chmod 700 ~/.lookerjmx
cd ~/.lookerjmx
Create the JMX files
Using your favorite text editor create a file in the new directory named jmxremote.access
with the following contents (you may customize for your environment):
monitorRole readonly
controlRole readwrite \
create javax.management.monitor.*,javax.management.timer.* \
unregister
Next create a file named jmxremote.password
in the same directory with the following
contents, using your own secure passwords:
monitorRole some_password_here
controlRole some_password_here
Setting permissions
Make it such that Java (and therefore Looker) will not start if the file permissions allow anyone except the Looker user to read the password file.
chmod 400 jmxremote.*
Restart Looker
Looker needs to be restarted to enable JMX. Make sure you run this *as the Looker user and not root*:
cd ~/looker
./looker restart
Your Looker instance is now configured for remote JMX monitoring on port 9910, using the password you supplied. You may need to modify your firewall settings or network ACLs to allow your monitoring server to get network access on this port.
Host monitoring
For every host running the Looker application, we recommend that you collect, graph and alert on at least the following performance metrics:
- CPU Utilization: load and percent CPU utilized
- Memory Utilization: total used and swap used
- Disk Usage
Alerting thresholds
To establish good alerting thresholds, first establish a baseline. Collect performance data with your Looker instance running under a normal load. Take a look at the performance graphs and observe the peaks. The length of time you will need to establish the baselines depends on your business and your Looker usage patterns. Some companies may use Looker in a stable, repeatable pattern every week during business hours. Others may use Looker more heavily at specific times (such as the end of each month).
In general, alerts should only be sent for events that are actionable. Sending alerts when there is nothing which needs to be done masks the importance of critical alerts.
The following thresholds may be used as a starting point for alerting. When the following values are exceeded for 15 minutes or more, manual intervention may be required.
Metric | Warning | Critical | Comments |
---|---|---|---|
CPU Load | 2 | 4 | Load should generally be 1 or less for a single-core system. Sustained high load leads to poor performance. |
CPU % Used | 80 | 90 | High CPU usage leads to poor performance. |
Memory % Used | 60 | 70 | High memory usage can indicate too much memory is allocated to Java. |
Disk % Used | 80 | 90 | Ensure the disk isn't full. |
Additional notes:
- Systems with more than one core can handle high CPU loads without reduced performance. The rule of thumb is that sustained load should not be greater than the number of processor cores.
- The percent of total CPU time in use before a system experiences performance degradation scales with the number of CPU cores in the system. In other words, a single-core system may have poor performance when the CPU is 80% utilized, whereas a sixteen-core host may still be usable at 95% utilization.
- High sustained CPU utilization can be rectified by updating the host hardware, or upgrading to a larger instance. Sometimes large numbers of scheduled Looks or long-query derived tables can be reduced or made more efficient to improve performance.
Next steps
After you have set up monitoring, you're ready to set up Looker backups.