This page is part of a multi-part series that discusses hosting Looker, deployment methods, and best practices for the components involved. This page explores common practices for specific components of a Looker architecture and describes how they are configured within a deployment.
This series consists of three parts:
- Customer-hosted infrastructure overview
- Customer-hosted infrastructure architecture patterns
- Step-by-step guidance for customer-hosted infrastructure components (this page)
Looker has a number of dependencies for hosting the server, serving ad hoc and scheduled workloads, tracking iterative model development, and so on. These dependencies are referred to as components on this page, and each component is discussed in detail in the following sections:
- Host configuration
- JVM configuration
- Looker startup options
- Internal (backend) database
- Git service
- Network
Host configuration
Operating system and distribution
Looker runs well on the most common Linux distributions: RedHat, SUSE, and Debian/Ubuntu. Derivatives of these distributions that are designed and optimized to run in a particular environment are generally fine. For example, the Google Cloud or AWS distributions of Linux are compatible with Looker. Debian/Ubuntu is the variety of Linux most used in the Looker community, and these versions are the most familiar to Looker support. It is easiest to use Debian/Ubuntu, or an operating system derived from Debian/Ubuntu for a specific cloud provider.
Looker runs on the Java Virtual Machine (JVM). When choosing a distribution, consider whether its versions of OpenJDK 8 are current. Looker may be able to run on older distributions of Linux, but the Java version and libraries that Looker requires for specific features must be up to date. If the JVM does not contain all of the recommended Looker libraries and versions, Looker will not function normally. Looker currently requires Java HotSpot 1.8 update 161+ or OpenJDK 8 181+.
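To confirm that a host meets this requirement, you can check the installed JVM version from a shell (the exact output format varies by distribution):
java -version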
CPU and memory
4x16 nodes (4 CPUs and 16 GB of RAM) are sufficient for a development system or a basic-testing Looker instance used by a small team. However, this is usually insufficient for production use. In our experience, 16x64 nodes (16 CPUs and 64 GB of RAM) offer a good balance of price and performance. More than 64 GB of RAM can impact performance, because garbage collection events are single-threaded and halt all other threads while they run.
Disk storage
100 GB of disk space is usually more than sufficient for a production system.
Cluster considerations
Looker runs on a JVM, and Java can have difficulty managing more than 64 GB of memory. As a general rule, if more capacity is needed, you should add additional 16x64 nodes to the cluster rather than increasing the node size beyond 16x64. A clustered architecture may also be preferred for increased availability.
In a cluster, the Looker nodes must share certain parts of the file system. The shared data includes the following:
- LookML models
- The developers' LookML models
- Git server connectivity
Since the file system is shared and hosts a number of Git repositories, handling concurrent access and file locking is critical. The file system must be POSIX-compliant. Network File System (NFS) is known to work and is readily available with Linux. You should spin up an additional Linux instance and configure NFS to share the drive. The default NFS is potentially a single point of failure, so consider a failover or high-availability setup.
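As a rough sketch of such a setup, the export on the dedicated NFS instance and the mount on each Looker node might look like the following; the host name, subnet, and paths are placeholder assumptions, not Looker defaults:
# /etc/exports on the dedicated NFS instance
/srv/looker-share 10.0.0.0/24(rw,sync,no_subtree_check)
# on each Looker node, mount the share at a common path
sudo mount -t nfs nfs.example.com:/srv/looker-share /mnt/lookerfiles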
Looker's metadata must also be centralized, so its internal database must be migrated to MySQL. This can be a MySQL service or a dedicated MySQL deployment. The Internal (backend) database section on this page goes into more detail.
JVM configuration
Looker's JVM settings are defined in the Looker startup script. After any updates, Looker must be restarted for the changes to manifest. The default configurations are as follows:
java \
  -XX:+UseG1GC -XX:MaxGCPauseMillis=2000 \
  -Xms$JAVAMEM -Xmx$JAVAMEM \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/tmp/gc.log ${JAVAARGS} \
  -jar looker.jar start ${LOOKERARGS}
Resources
The resource settings are defined in the Looker startup script.
JAVAMEM="2300m"
METAMEM="800m"
Memory allocation for the JVM must account for the overhead of the operating system Looker runs on. As a general rule, up to 60% of total memory can be allocated to the JVM, but there are caveats depending on machine size. For machines with a minimum of 8 GB of total memory, we recommend an allocation of 4-5 GB to Java and 800 MB to Meta. For larger machines, a smaller proportion of memory can be allocated to the operating system. For example, for machines with 60 GB of total memory, we recommend an allocation of 36 GB to Java and 1 GB to Meta. It is important to note that Java's memory allocation should typically scale with the machine's total memory, but Meta should suffice with 1 GB.
Since Looker shares system resources with other processes such as Chromium for rendering, the Java memory size should be chosen in the context of the anticipated rendering load and the machine size. If rendering load is expected to be low, Java can be allocated a larger share of total memory. For example, on a machine with 60 GB of total memory, Java could safely be allocated 46 GB, which is higher than the general recommendation of 60%. The exact resource allocations appropriate for each deployment vary, so use 60% as a baseline and adjust per usage.
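For instance, the startup-script values for a 60 GB machine following the general recommendation above might look like the following (a sketch only; the right numbers depend on your rendering load and usage):
JAVAMEM="36g"
METAMEM="1g"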
Garbage collection
Looker prefers to use the most modern garbage collector available for its version of Java. By default, the garbage collection timeout is 2 seconds, but you can change this by editing the following option in the startup configuration:
-XX:MaxGCPauseMillis=2000
On a larger machine with many cores, the garbage collection timeout can be shortened.
Logs
By default, Looker's garbage collection logs are stored in /tmp/gc.log. You can change this by editing the following option in the startup configuration:
-Xloggc:/tmp/gc.log
JMX
Managing Looker may require monitoring to help refine resource allocation. We recommend using JMX to monitor the JVM's memory usage.
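As an illustration, remote JMX can be enabled through the standard JVM flags in the JAVAARGS variable in lookerstart.cfg; the port number and credential file paths below are placeholder assumptions, not Looker defaults:
JAVAARGS="-Dcom.sun.management.jmxremote=true
 -Dcom.sun.management.jmxremote.port=9910
 -Dcom.sun.management.jmxremote.ssl=false
 -Dcom.sun.management.jmxremote.authenticate=true
 -Dcom.sun.management.jmxremote.password.file=/path/to/jmxremote.password
 -Dcom.sun.management.jmxremote.access.file=/path/to/jmxremote.access"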
Looker startup options
The startup options are stored in a file named lookerstart.cfg. This file is sourced in the shell script that starts Looker.
Thread pools
As a multithreaded application, Looker has a number of thread pools. These thread pools range from the core web server to specialized subservices such as scheduling, rendering, and external database connections. Depending on your business workflows, these pools may need to be modified from the default configuration. In particular, there are special considerations for the cluster topologies mentioned on the Customer-hosted infrastructure architecture patterns best practices page.
High-throughput scheduling options
For all non-scheduler nodes, add --scheduler-threads=0 to the LOOKERARGS environment variable in lookerstart.cfg. Without scheduler threads, no scheduled jobs will run on these nodes.
For all dedicated scheduler nodes, add --scheduler-threads=<n> to the LOOKERARGS environment variable in lookerstart.cfg. Looker starts 10 scheduler threads by default, but this can be increased to <n>. With <n> scheduler threads, each node will be capable of executing <n> concurrent scheduled jobs. It is generally recommended to keep <n> less than twice the number of CPUs. The largest recommended host is one with 16 CPUs and 64 GB of memory, so the upper bound for scheduler threads should be less than 32.
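For example, a cluster with a dedicated scheduler node on a 16-CPU host might use lookerstart.cfg entries like the following (a sketch; choose <n> based on your workload):
# on the dedicated scheduler node (<n> = 24, below twice the CPU count)
LOOKERARGS="--scheduler-threads=24"
# on every other node
LOOKERARGS="--scheduler-threads=0"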
High-throughput rendering options
For all non-render nodes, add --concurrent-render-jobs=0 to the LOOKERARGS environment variable in lookerstart.cfg. Without render threads, no render jobs will run on these nodes.
For all dedicated render nodes, add --concurrent-render-jobs=<n> to the LOOKERARGS environment variable in lookerstart.cfg. Looker starts with two render threads by default, but this can be increased to <n>. With <n> render threads, each node will be capable of executing <n> concurrent render jobs.
Each render job can utilize a significant amount of memory. Budget about 2 GB per render job. For example, if the core Looker process (Java) is allocated 60% of the total memory and 20% of the remaining memory is reserved for the operating system, that leaves the last 20% for render jobs. On a 64 GB machine, that leaves 12 GB, which is enough for 6 concurrent render jobs. If a node is dedicated to rendering and is not included in the load balancer pool that is handling interactive jobs, the core Looker process memory can be reduced to allow for more render jobs. On a 64 GB machine, one might allocate approximately 30% (20 GB) to the Looker core process. Reserving 20% for general OS use, that leaves 50% (32 GB) for rendering, which is enough for 16 concurrent render jobs.
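Putting these numbers together, a dedicated render node on a 64 GB machine might be configured as follows (a sketch under the sizing assumptions above):
# lookerstart.cfg on a dedicated render node: ~20 GB for the Looker core process,
# leaving roughly 32 GB for 16 render jobs at about 2 GB each
JAVAMEM="20g"
LOOKERARGS="--concurrent-render-jobs=16"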
Internal (backend) database
The Looker server maintains information about its own configuration, database connections, users, groups, roles, folders, user-defined Looks and dashboards, and various other data in an internal database.
For a standalone Looker instance of moderate size, this data is stored within an in-memory HyperSQL database embedded in the Looker process itself. The data for this database is stored in the file <looker install directory>/.db/looker.script. Although convenient and lightweight, this database experiences performance issues with heavy usage. Therefore, we recommend starting with a remote MySQL database. If this isn't feasible, we recommend migration to a remote MySQL database once the ~/looker/.db/looker.script file reaches 600 MB. Clusters must use a MySQL database.
The Looker server makes many small reads and writes to the MySQL database. Every time a user runs a Look or an Explore, Looker will check the database to verify that the user is still logged in, the user has privileges to access the data, the user has privileges to run the Look or Explore, etc. Looker will also write data to the MySQL database, including the actual SQL that was run, the time the request started and ended, etc. A single interaction between a user and the Looker application could result in 15 or 20 small reads and writes to the MySQL database.
MySQL
The MySQL server should be version 8.0.x, and must be configured to use utf8mb4 encoding. The InnoDB storage engine must be used. The setup instructions for MySQL, as well as instructions for how to migrate data from an existing HyperSQL database to MySQL, are available on the Migrating the Looker backend database to MySQL documentation page.
When configuring Looker to use MySQL, a YAML file must be created containing the connection information. Name the YAML file looker-db.yml and add the setting -d looker-db.yml in the LOOKERARGS section of the lookerstart.cfg file.
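A minimal sketch of what looker-db.yml might contain; the host, credentials, and database name below are placeholders for your own MySQL instance:
dialect: mysql
host: mysql.example.com
port: 3306
database: looker
username: looker
password: your_password_here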
MariaDB
MySQL is dual-licensed, available both as open source and as a commercial product. Oracle continues to enhance MySQL, and MySQL has also been forked as MariaDB. The MariaDB equivalents of the MySQL versions are known to work with Looker, but they aren't developed for or tested by Looker's engineering teams; therefore, functionality is not supported or guaranteed.
Cloud versions
If you host Looker in your cloud infrastructure, it is logical to host the MySQL database in the same cloud infrastructure. The three major cloud vendors (Amazon AWS, Microsoft Azure, and Google Cloud) all offer managed MySQL services. These providers handle much of the maintenance and configuration for the MySQL database and offer services to help manage backups, provide rapid recovery, and so on. These products are known to work well with Looker.
System Activity queries
The MySQL database is used to store information about how users are using Looker. Any Looker user who has permission to view the System Activity model has access to a number of prebuilt Looker dashboards to analyze this data. Users can also access Explores of Looker metadata to build additional analyses. The MySQL database is primarily used for small, fast, "operational" queries. The large, slow, "analytic" queries generated by the System Activity model can compete with these operational queries and slow Looker down.
In these cases, the MySQL database can be replicated to another database. Both self-managed and certain cloud-managed systems provide simple configuration of replication to other databases. Configuring replication is outside the scope of this document.
In order to use the replica for the System Activity queries, you will create a copy of the looker-db.yml file, for example named looker-usage-db.yml, modify it to point to the replica, and add the setting --internal-analytics-connection-file looker-usage-db.yml to the LOOKERARGS section of the lookerstart.cfg file.
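After both files are in place, the relevant lookerstart.cfg entry might look like the following (a sketch combining the two flags described above):
LOOKERARGS="-d looker-db.yml --internal-analytics-connection-file looker-usage-db.yml"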
The System Activity queries can run against a MySQL instance or a Google BigQuery database. They are not tested against other databases.
MySQL performance configuration
In addition to the settings required to migrate the Looker backend database to MySQL, highly active clusters may benefit from additional tuning and configuration. These settings can be made in the /etc/my.cnf file, or through the Cloud Console for cloud-managed instances.
The my.cnf configuration file is divided into several sections. The setting changes discussed below are made in the [mysqld] section.
Set the InnoDB buffer pool size
The InnoDB buffer pool size is the total RAM that is used to store the state of the InnoDB data files in memory. If the server is dedicated to running MySQL, innodb_buffer_pool_size should be set to 50%-70% of total system memory.
If the total size of the database is small, it is allowable to set the InnoDB buffer pool to the size of the database rather than 50% or more of memory.
For this example, a server has 64 GB of memory; therefore, the InnoDB buffer pool should be between 32 GB and 45 GB. Bigger is typically better.
[mysqld]
...
innodb_buffer_pool_size=45G
Set the InnoDB buffer pool instances
When multiple threads attempt to search a large buffer pool, they can contend. To prevent this, the buffer pool is divided into smaller units that can be accessed by different threads without conflict. By default, the buffer pool is divided into 8 instances, which creates the potential for an 8-thread bottleneck. Increasing the number of buffer pool instances reduces the chance of a bottleneck. innodb_buffer_pool_instances should be set so that each buffer pool instance gets at least 1 GB of memory.
[mysqld]
...
innodb_buffer_pool_instances=32
Optimize the InnoDB log file
When a transaction is committed, the database has the option to update the data in the actual file, or it can save details about the transaction in the log. If the database crashes before the data files have been updated, the log file can be "replayed" to apply the changes. Writing to the log file is a simple append operation. It is efficient to append to the log at commit time, then batch up multiple changes to the data files and write them in a single IO operation. When the log file is filled, the database has to pause processing new transactions and write all the changed data back to disk.
As a general rule of thumb, the InnoDB log file should be large enough to contain 1 hour of transactions.
There are typically two InnoDB log files. They should be about 25% of your InnoDB buffer pool. For an example database with a 32 GB buffer pool, the InnoDB log files should total 8 GB, or 4 GB per file.
[mysqld]
...
innodb_log_file_size=4G
Configure InnoDB IO capacity
MySQL will throttle the speed at which writes are recorded to the disk so as not to overwhelm the server. The default values are conservative for most servers. For best performance, use the sysbench utility to measure the random write speed to the data disk, then use that value to configure the IO capacity so that MySQL will write data more quickly.
On a cloud-hosted system, the cloud vendor should be able to report the performance of the disks used for data storage. For a self-hosted MySQL server, measure the speed of random writes to the data disk in IO operations per second (IOPS). The Linux utility sysbench is one way to measure this. Use that value for innodb_io_capacity_max, and a value one-half to three-quarters of that for innodb_io_capacity. The example below shows the values we would use if we measured 800 IOPS.
[mysqld]
...
innodb_io_capacity=500
innodb_io_capacity_max=800
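For reference, one way to measure random write IOPS with sysbench is a fileio benchmark along these lines; the file size and duration are arbitrary placeholders, and the commands should be run from a directory on the data disk:
# create the test files
sysbench fileio --file-total-size=4G prepare
# run a 60-second random-write test and note the reported writes per second
sysbench fileio --file-total-size=4G --file-test-mode=rndwr --time=60 run
# remove the test files
sysbench fileio --file-total-size=4G cleanup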
Configure InnoDB threads
MySQL will open at least one thread for each client being served. If many clients are connected simultaneously, that can lead to a huge number of active threads. This can cause the system to spend more time swapping threads than processing.
Benchmarking should be done to determine the ideal number of threads. To test, set the number of threads between the number of CPUs (or CPU cores) on the system and 4x the number of CPUs. For a 16-core system, this value is likely between 16 and 64.
[mysqld]
...
innodb_thread_concurrency=32
Transaction durability
Setting innodb_flush_log_at_trx_commit to 1 forces MySQL to flush the log to disk for every transaction. If the server crashes, the transaction won't be lost, but database performance will be impacted. Setting this value to 0 or 2 can improve performance, but it comes at the risk of losing a couple of seconds' worth of transactions.
[mysqld]
...
innodb_flush_log_at_trx_commit=1
Set the flush method
The operating system normally buffers writes to the disk. Since MySQL and the OS are both buffering, there is a performance penalty. Setting the flush method to O_DIRECT removes one layer of buffering, which can improve performance.
[mysqld]
...
innodb_flush_method=O_DIRECT
Enable one file per table
By default, MySQL will use a single data file for all data. The innodb_file_per_table setting will create a separate file for each table, which improves performance and data management.
[mysqld]
...
innodb_file_per_table=ON
Disable stats on metadata
This setting disables the collection of stats on internal metadata tables, improving read performance.
[mysqld]
...
innodb_stats_on_metadata=OFF
Disable the query cache
The query cache is deprecated, so setting query_cache_size and query_cache_type to 0 disables it. (In MySQL 8.0, the query cache has been removed entirely, so these settings apply only to older versions.)
[mysqld]
...
query_cache_size=0
query_cache_type=0
Enlarge the join buffer
The join_buffer_size setting controls the memory that is used to perform joins in memory. Increasing it can improve certain operations.
[mysqld]
...
join_buffer_size=512K
Enlarge the temporary table and max heap sizes
The tmp_table_size and max_heap_table_size settings control how large an in-memory temporary table can grow before it is forced to disk.
[mysqld]
...
tmp_table_size=32M
max_heap_table_size=32M
Adjust the table open cache
The table_open_cache setting determines the size of the cache that holds the file descriptors for open tables. The table_open_cache_instances setting breaks the cache into a number of smaller parts. There is a potential for thread contention in the table_open_cache, so dividing it into smaller parts helps increase concurrency.
[mysqld]
...
table_open_cache=2048
table_open_cache_instances=16
Git service
Looker is designed to work with a Git service to provide version management of the LookML files. Major Git hosting services are supported, including GitHub, GitLab, Bitbucket, etc. Git service providers offer additional value-added features, such as a GUI to view code changes and support for workflows like pull requests and change approvals. If required, Git can be run on a plain Linux server.
If a Git hosting service is not appropriate for your deployment because of security rules, many of these service providers offer versions that can be run in your own environment. GitLab, in particular, is commonly self-hosted and can be used as an open source product with no license cost or as a supported licensed product. GitHub Enterprise is available as a self-hosted service and is a supported commercial product.
The following sections list nuances for the most common service providers.
GitHub/GitHub Enterprise
The Setting up and testing a Git connection documentation page uses GitHub as an example.
GitLab/gitlab.com
Refer to the Using GitLab for version control in Looker Looker Community post for detailed setup steps for GitLab. If your repo is contained within subgroups, these can be added to the repo URL using either the HTTPS or SSH format:
https://gitlab.com/accountname/subgroup/reponame
git@gitlab.com:accountname/subgroup/reponame.git
Additionally, there are three different ways you can store Looker-generated SSH keys in GitLab: as a user SSH key, as a repository deploy key, and as a global shared deploy key. A more in-depth explanation can be found in the GitLab documentation.
Google Cloud Source
Refer to the Using Cloud Source Repositories for version control in Looker Community Post for steps to set up Git with Cloud Source Repositories.
Bitbucket Cloud
Refer to the Using Bitbucket for version control in Looker Community Post for steps for setting up Git with Bitbucket Cloud. As of August 2021, Bitbucket Cloud does not support secrets on deploy webhooks.
Bitbucket Server
To use pull requests with Bitbucket Server, you may need to complete the following steps:
- When you open a pull request, Looker will automatically use the default port number (7999) in the URL. If you are using a custom port number, you will need to replace the port number in the URL manually.
- You will need to hit the project's deploy webhook to sync the production branch in Looker with the repo's master branch.
Phabricator Diffusion
Refer to the Setting up Phabricator and Looker for version control Community Post for steps on setting up Git with Phabricator.
Network
Inbound connections
Looker web application
By default, Looker listens for HTTPS requests on port 9999. Looker uses a self-signed certificate with a CN of self-signed.looker.com. The Looker server can alternately be configured to do the following:
- Accept HTTP connections from an SSL-termination load balancer/proxy, with the --ssl-provided-externally-by=<s> startup flag. The value should either be set to the IP address of the proxy, or to a host name that can be locally resolved to the IP address of the proxy. Looker will accept HTTP connections only from this IP address.
- Use a customer-supplied SSL certificate, with the --ssl-keystore=<s> startup flag.
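For example, a node behind an SSL-terminating proxy at a hypothetical address of 10.0.0.10 might use the following lookerstart.cfg entry:
LOOKERARGS="--ssl-provided-externally-by=10.0.0.10"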
Looker API
The Looker API listens on port 19999. If the installation requires access to the API, then the load balancer should have the requisite forwarding rules. The same SSL considerations apply as with the main web application. We recommend using a distinct port from the web application.
Load balancers
A load balancer is often used to accept an HTTPS request at port 443 using the customer's certificate, then forward the request to the Looker server node at port 9999 using the self-signed certificate or HTTP. If load balancers are using Looker's self-signed certificate, they must be configured to accept that certificate.
Idle connections and timeouts
When a user starts a large request in Looker, that could result in a query that is expensive to run on the database. If the user abandons that request in any way (by shutting the lid of their laptop, disconnecting from the network, killing the tab in the browser, and so on), Looker wants to know so it can terminate that database query.
To handle this situation, when the client web application makes a request to run a database query, the browser will open a socket connection via a long-lived HTTP request to the Looker server. This connection will sit open and idle. This socket will get disconnected if the client web application is killed or disconnected in any way. The server will see that disconnect and cancel any related database queries.
Load balancers often notice these open idle connections and kill them. In order to run Looker effectively, the load balancer must be configured to allow this connection to remain open for as long as the longest query a user might run. A timeout of at least 60 minutes is suggested.
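As an illustration, if the load balancer were an NGINX reverse proxy (an assumption; your load balancer and its directives may differ), the timeout might be raised like this:
# nginx proxy sketch: keep idle Looker connections open for up to 60 minutes
location / {
    proxy_pass https://looker_nodes;
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
}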
Outbound connections
Looker servers can have unrestricted outbound access to all resources, including the public internet. This simplifies many tasks, such as installing Chromium, which requires access to the package repositories for the Linux distribution.
The following are outbound connections that Looker may need to make.
Internal database connection
By default, MySQL listens for connections on port 3306. The Looker nodes must be able to initiate connections to MySQL on this port. Depending on how the repository is hosted, you may need to traverse a firewall.
External services
Looker's telemetry and license servers are available via HTTPS on the public internet. Traffic from a Looker node to ping.looker.com:443 and license.looker.com:443 may need to be added to an allowlist.
Data warehouse connections
Cloud-hosted databases may require a connection via the public internet. For example, if you are using BigQuery, then accounts.google.com:443 and www.googleapis.com:443 may need to be added to an allowlist. If the database is outside of your own infrastructure, consult with your database host for network details.
SMTP services
By default, Looker sends outgoing mail via SendGrid. That may require adding smtp.sendgrid.net:587 to an allowlist. The SMTP settings can be changed in the configuration to use a different mail handler as well.
Action hubs, action servers, and webhooks
Many scheduler destinations, in particular webhooks and the ones that are enabled in the Looker Admin panel, involve sending data via HTTPS requests.
- For webhooks, these destinations are specified at runtime by users, and may be contrary to the goal of firewalling outbound connections.
- For an action hub, these requests are sent to actions.looker.com. Details can be found in our Looker Action Hub configuration documentation.
- For other action servers, these requests are sent to the domains specified in the action server's configuration by administrators in the Looker Admin panel.
Proxy server
If the public internet cannot be reached directly, Looker can be configured to use a proxy server for HTTP(S) requests by adding a line like the following to lookerstart.cfg:
JAVAARGS="-Dhttp.proxyHost=myproxy.example.com
-Dhttp.proxyPort=8080
-Dhttp.nonProxyHosts=127.0.0.1|localhost
-Dhttps.proxyHost=myproxy.example.com
-Dhttps.proxyPort=8080"
Note that internode communications happen over HTTPS, so if you use a proxy server and your instance is clustered, you will usually want to add the IPs/host names for all the nodes in the cluster to the -Dhttp.nonProxyHosts argument.
Internode communications
Internal host identifier
Within a cluster, each node must be able to communicate with the other nodes. To allow this, the host name or IP address of each node is specified in the startup configuration. When the node starts up, this value will be written into the MySQL repository. Other members of the cluster can then refer to those values to communicate with this node. To specify the host name or IP address in the startup configuration, add -H node1.looker.example.com to the LOOKERARGS environment variable in lookerstart.cfg.
Since the host name must be unique per node, the lookerstart.cfg file needs to be unique on each instance. As an alternative to hardcoding the host name or IP address, the command hostname -I or hostname --fqdn can be used to find these at runtime. To implement this, add -H $(hostname -I) or -H $(hostname --fqdn) to the LOOKERARGS environment variable in lookerstart.cfg.
Internal ports
In addition to ports 9999 and 19999, which are used for the web and API servers, respectively, the cluster nodes communicate with each other through a message broker service, which uses ports 1551 and 61616. Ports 9999 and 19999 must be open to end-user traffic (or to the load balancer), while ports 1551 and 61616 need to be open only between cluster nodes.
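As a sketch of the resulting firewall policy, iptables rules on each node might restrict the message broker ports to the cluster subnet (the 10.0.0.0/24 range is a placeholder assumption):
# allow broker traffic from other cluster nodes only
iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 1551 -j ACCEPT
iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 61616 -j ACCEPT
iptables -A INPUT -p tcp --dport 1551 -j DROP
iptables -A INPUT -p tcp --dport 61616 -j DROP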