Customer-hosted architecture solutions: a tour of the components

This page is part of a multi-part series covering best practices for Looker hosting, deployment methodologies, and the related components. This page explores common recommendations for specific components of a Looker architecture and explains how to configure them within a deployment.

This series consists of three parts.

Looker has a number of dependencies, such as hosting the server, serving ad hoc and scheduled workloads, and tracking iterative model development. On this page, these dependencies are referred to as components, and each component is discussed in detail in the following sections.

Host configuration

OS and distribution

Looker runs well on the most common Linux versions: RedHat, SUSE, and Debian/Ubuntu. Derivatives of these distributions that are designed and optimized to run in a particular environment are generally fine; for example, the Google Cloud or AWS distributions of Linux are compatible with Looker. Debian/Ubuntu is the most-used variety of Linux in the Looker community, and it is the version most familiar to Looker support. It is easiest to use Debian/Ubuntu, or a cloud provider's operating system that is derived from Debian/Ubuntu.

Looker runs in the Java Virtual Machine (JVM). When choosing a distribution, verify that it carries a current version of OpenJDK 8. Older versions of Linux can run Looker, but the Java version and the libraries required for specific features must be current. If the JVM doesn't include all of the recommended Looker libraries and versions, Looker won't function normally. Looker currently requires Java HotSpot 1.8 update 161 or later, or OpenJDK 8 update 181 or later.
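To confirm what a candidate host provides, you can check the Java version from a shell; the sample output below is illustrative and its format varies by distribution:

java -version
# openjdk version "1.8.0_181" (or later) indicates a suitable OpenJDK 8 build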

CPU and memory

A 4x16 node (4 CPUs and 16 GB of RAM) is sufficient for a development system or a basic test Looker instance used by a small team. However, this is typically not sufficient for production use. In our experience, 16x64 nodes (16 CPUs, 64 GB of RAM) offer a good balance between price and performance. More than 64 GB of RAM can impact performance, because garbage-collection events are single-threaded and stop all other threads from executing while they run.

Disk storage

Typically, 100 GB of disk space is sufficient for a production system.

Cluster considerations

Looker runs in a Java JVM, and Java can have trouble managing more than 64 GB of memory. As a rule, when more capacity is needed, add 16x64 nodes to a cluster rather than increasing the node size beyond 16x64. You may also prefer a clustered architecture for increased availability.

In a cluster, the Looker nodes need to share certain parts of the file system. The shared data includes the following:

  • LookML models
  • Developer LookML models
  • Git server connections

Because the file system is shared and hosts a number of Git repositories, handling concurrent access and file locking is critical. The file system must be POSIX-compliant. Network File System (NFS) is known to work and is readily available with Linux. You should spin up an additional Linux instance and configure NFS to share the disk. The default NFS is a potential single point of failure, so consider a failover or high-availability setup.
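As a rough sketch only (the export path, the subnet, the mount point, and the host name nfs-server.example.com are assumptions, not Looker requirements), sharing a directory over NFS could look like the following:

# On the NFS server: export a directory for the shared Looker files
sudo mkdir -p /export/looker
echo "/export/looker 10.0.0.0/24(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -ra

# On each Looker node: mount the share at the path Looker is configured to use
sudo mount -t nfs nfs-server.example.com:/export/looker /mnt/looker-share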

Looker's metadata must also be centralized, so its internal database must be migrated to MySQL. This can be either a MySQL service or a dedicated MySQL deployment. The Internal (backend) database section on this page goes into more detail.

JVM configuration

Looker's JVM settings are defined within the Looker startup script. After any updates, Looker must be restarted for the changes to take effect. The default configuration is as follows:

java \
  -XX:+UseG1GC -XX:MaxGCPauseMillis=2000 \
  -Xms$JAVAMEM -Xmx$JAVAMEM \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/tmp/gc.log ${JAVAARGS} \
  -jar looker.jar start ${LOOKERARGS}

Resources

Resource settings are defined in Looker's startup script:

JAVAMEM="2300m"
METAMEM="800m"

Memory allocation for the JVM should account for the overhead of the operating system that Looker is running on. In general, the JVM can be allocated up to 60% of total memory, but there are caveats depending on machine size. For machines with a minimum of 8 GB of total memory, we recommend allocating 4-5 GB to Java and 800 MB to Meta. For larger machines, a smaller proportion of memory can be allocated to the operating system. For example, for machines with 60 GB of total memory, we recommend allocating 36 GB to Java and 1 GB to Meta. Java's memory allocation should typically scale with the machine's total memory, but 1 GB should suffice for Meta.
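Applied to the startup-script variables shown above, the 60 GB example would look like the following (the values simply restate the recommendation and are not a one-size-fits-all setting):

JAVAMEM="36000m"
METAMEM="1000m"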

Additionally, Looker shares system resources with other processes, such as Chromium for rendering, so the amount of memory allocated to Java should be chosen with the anticipated rendering load and the machine size in mind. If the rendering load is expected to be low, Java can be allocated a larger share of the total memory. For example, on a machine with 60 GB of total memory, Java could safely be allocated 46 GB, higher than the general 60% recommendation. The exact resource allocations appropriate for each deployment vary, so use 60% as a baseline and adjust based on usage.

Garbage collection

Looker prefers to use the most modern garbage collector available for its version of Java. By default, the garbage-collection timeout is 2 seconds, but it can be changed by editing the following option in the startup configuration:

-XX:MaxGCPauseMillis=2000

On larger machines with multiple cores, the GC timeout can be shortened.

Logs

By default, Looker's GC log is stored in /tmp/gc.log. This can be changed by editing the following option in the startup configuration:

-Xloggc:/tmp/gc.log

JMX

Managing Looker may require monitoring to help refine resource provisioning. We recommend using JMX to monitor the JVM's memory usage.
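The flags that enable JMX are standard JVM options, so a minimal sketch is to pass them through JAVAARGS in lookerstart.cfg. The port and the security settings here are illustrative assumptions, not recommendations; enable authentication and SSL on any untrusted network:

JAVAARGS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9910 \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=false"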

Looker startup options

Startup options are stored in a file named lookerstart.cfg. This file is sourced by the shell script that starts Looker.

Thread pools

As a multithreaded application, Looker has a number of thread pools. These thread pools range from the core web server to specialized subservices such as scheduling, rendering, and external database connections. Depending on your business workflows, these pools may need to be modified from their default configuration. In particular, there are special considerations for the cluster topologies mentioned on the Customer-hosted infrastructure architecture patterns and practices page.

High schedule throughput options

For all nodes that are not schedulers, add --scheduler-threads=0 to the LOOKERARGS environment variable in lookerstart.cfg. Without scheduler threads, no scheduled jobs will run on these nodes.

For all dedicated scheduler nodes, add --scheduler-threads=<n> to the LOOKERARGS environment variable in lookerstart.cfg. Looker starts 10 scheduler threads by default, but this can be increased to <n>. With <n> scheduler threads, each node can execute <n> concurrent schedule jobs. It is generally recommended to keep <n> below 2x the number of CPUs. The largest recommended host is one with 16 CPUs and 64 GB of memory, so the upper bound on scheduler threads should be below 32.
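For example, a dedicated scheduler node on a 16-CPU host might use the following in lookerstart.cfg; the value 24 is illustrative and stays under the 2x-CPU guideline:

# Dedicated scheduler node
LOOKERARGS="--scheduler-threads=24"

# All other (non-scheduler) nodes
# LOOKERARGS="--scheduler-threads=0"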

High render throughput options

For all non-render nodes, add --concurrent-render-jobs=0 to the LOOKERARGS environment variable in lookerstart.cfg. Without render threads, no render jobs will run on these nodes.

For all dedicated render nodes, add --concurrent-render-jobs=<n> to the LOOKERARGS environment variable in lookerstart.cfg. Looker starts with two render threads by default, but this can be increased to <n>. With <n> render threads, each node will be capable of executing <n> concurrent render jobs.

Each render job can utilize a significant amount of memory. Budget about 2 GB per render job. For example, if the core Looker process (Java) is allocated 60% of the total memory and 20% of the remaining memory is reserved for the operating system, that leaves the last 20% for render jobs. On a 64 GB machine, that leaves 12 GB, which is enough for 6 concurrent render jobs. If a node is dedicated to rendering and is not included in the load balancer pool that is handling interactive jobs, the core Looker process memory can be reduced to allow for more render jobs. On a 64 GB machine, one might allocate approximately 30% (20 GB) to the Looker core process. Reserving 20% for general OS use, that leaves 50% (32 GB) for rendering, which is enough for 16 concurrent render jobs.
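Following that arithmetic, a 64 GB node dedicated to rendering might be configured as follows; the numbers simply restate the example above and are illustrative:

# lookerstart.cfg on a dedicated render node (64 GB machine)
JAVAMEM="20000m"                           # ~30% of total memory for the Looker core process
LOOKERARGS="--concurrent-render-jobs=16"   # ~32 GB remaining at ~2 GB per render job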

Internal (backend) database

The Looker server maintains information about its own configuration, database connections, users, groups, and roles, folders, user-defined Looks and dashboards, and various other data in an internal database.

For a standalone Looker instance of moderate size, this data is stored within an in-memory HyperSQL database embedded in the Looker process itself. The data for this database is stored in the file <looker install directory>/.db/looker.script. Although convenient and lightweight, this database experiences performance issues with heavy usage. Therefore, we recommend starting with a remote MySQL database. If this isn't feasible, we recommend migration to a remote MySQL database once the ~/looker/.db/looker.script file reaches 600 MB. Clusters must use a MySQL database.
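One simple way to watch that threshold is to check the script file's size on disk; the path below assumes the default install location mentioned above:

du -h ~/looker/.db/looker.script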

The Looker server makes many small reads and writes to the MySQL database. Every time a user runs a Look or an Explore, Looker will check the database to verify that the user is still logged in, the user has privileges to access the data, the user has privileges to run the Look or Explore, etc. Looker will also write data to the MySQL database, including the actual SQL that was run, the time the request started and ended, etc. A single interaction between a user and the Looker application could result in 15 or 20 small reads and writes to the MySQL database.

MySQL

The MySQL server should be version 5.7.x, and must be configured to use utf8mb4 encoding. The InnoDB storage engine must be used. The setup instructions for MySQL, as well as instructions for how to migrate data from an existing HyperSQL database to MySQL, are available on the Migrating the Looker backend database to MySQL documentation page.

When configuring Looker to use MySQL, a YAML file must be created containing the connection information. Name the YAML file looker-db.yml and add the setting -d looker-db.yml in the LOOKERARGS section of the lookerstart.cfg file.
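The full set of keys is documented on the migration page referenced above. As a hedged sketch, a looker-db.yml file typically looks something like this; the host, credentials, and database name are placeholders:

dialect: mysql
host: mysql.example.com
port: 3306
username: looker
password: secret_password
database: looker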

MariaDB

MySQL is dual-licensed, available both as open source and as a commercial product. After Oracle acquired MySQL and continued to enhance it, MySQL was forked as MariaDB. The MariaDB-equivalent versions of MySQL are known to work with Looker, but they aren't developed for or tested by Looker's engineering teams; therefore, functionality isn't supported or guaranteed.

Cloud versions

If you host Looker in your cloud infrastructure, it is logical to host the MySQL database in the same cloud infrastructure. The three major cloud vendors — Amazon AWS, Microsoft Azure, and Google Cloud — all offer hosted versions of MySQL 5.7.x. The cloud providers manage much of the maintenance and configuration for the MySQL database and offer services to help manage backups, provide rapid recovery, etc. These products are known to work well with Looker.

System Activity queries

The MySQL database is used to store information about how users are using Looker. Any Looker user who has permission to view the System Activity model has access to a number of prebuilt Looker dashboards to analyze this data. Users can also access Explores of Looker metadata to build additional analysis. The MySQL database is primarily used for small, fast, "operational" queries. The large, slow, "analytic" queries generated by the System Activity model can compete with these operational queries and slow Looker down.

In these cases, the MySQL database can be replicated to another database. Both self-managed and certain cloud-managed systems provide simple configuration of replication to other databases. Configuring replication is outside the scope of this document.

In order to use the replica for the System Activity queries, you will create a copy of the looker-db.yml file, for example named looker-usage-db.yml, modify it to point to the replica, and add the setting --internal-analytics-connection-file looker-usage-db.yml to the LOOKERARGS section of the lookerstart.cfg file.
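Combined with the -d flag from the MySQL section above, the resulting LOOKERARGS entry might look like this:

LOOKERARGS="-d looker-db.yml --internal-analytics-connection-file looker-usage-db.yml"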

The System Activity queries can run against a MySQL instance or a Google BigQuery database. They are not tested against other databases.

MySQL performance configuration

In addition to the settings required to migrate the Looker backend database to MySQL, highly active clusters may benefit from additional tuning and configuration. These settings can be made to the /etc/my.cnf file, or through the Cloud Console for cloud-managed instances.

The my.cnf configuration file is divided into several sections. The setting changes discussed below are made in the [mysqld] section.

Set the InnoDB buffer pool size

The InnoDB buffer pool size is the total RAM that is used to store the state of the InnoDB data files in memory. If the server is dedicated to running MySQL, the innodb_buffer_pool_size should be set to 50%-70% of total system memory.

If the total size of the database is small, it is allowable to set the InnoDB buffer pool to the size of the database rather than 50% or more of memory.

For this example, a server has 64 GB of memory; therefore, the InnoDB buffer pool should be between 32 GB and 45 GB. Bigger is typically better.

[mysqld]
...
innodb_buffer_pool_size=45G

Set the InnoDB buffer pool instances

When multiple threads attempt to search a large buffer pool, they could contend. To prevent this, the buffer pool is divided into smaller units that can be accessed by different threads without conflict. By default, the buffer pool is divided into 8 instances. This creates the potential for an 8 thread bottleneck. Increasing the number of buffer pool instances reduces the chance of a bottleneck. The innodb_buffer_pool_instances should be set so that each buffer pool gets at least 1 GB of memory.

[mysqld]
...
innodb_buffer_pool_instances=32

Optimize the InnoDB log file

When a transaction is committed, the database has the option to update the data in the actual file, or it can save details about the transaction in the log. If the database crashes before the data files have been updated, the log file can be "replayed" to apply the changes. Writing to the log file is a simple append operation. It is efficient to append to the log at commit time, then batch up multiple changes to the data files and write them in a single IO operation. When the log file is filled, the database has to pause processing new transactions and write all the changed data back to disk.

As a general rule of thumb, the InnoDB log file should be large enough to contain 1 hour of transactions.

There are typically two InnoDB log files. They should be about 25% of your InnoDB buffer pool. For an example database with a 32 GB buffer pool, the InnoDB log files should total 8 GB, or 4 GB per file.

[mysqld]
...
innodb_log_file_size=4G

Configure InnoDB IO capacity

MySQL throttles the speed at which writes are recorded to disk so as not to overwhelm the server. The default values are conservative for most servers. For best performance, use the sysbench utility to measure the random write speed of the data disk, then use that value to configure the IO capacity so that MySQL writes data more quickly.
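With sysbench 1.0 or later, a random-write measurement could be sketched like this; the file size and duration are arbitrary illustrative choices:

# Create test files on the data disk, run a random-write workload, then clean up
sysbench fileio --file-total-size=8G prepare
sysbench fileio --file-total-size=8G --file-test-mode=rndwr --time=60 run
sysbench fileio --file-total-size=8G cleanup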

On a cloud-hosted system, the cloud vendor should be able to report the performance of the disks used for data storage. For a self-hosted MySQL server, measure the speed of random writes to the data disk in IO operations per second (IOPS); the Linux utility sysbench is one way to do this. Use that value for innodb_io_capacity_max, and a value one-half to three-quarters of it for innodb_io_capacity. The example below shows the values we would use if we had measured 800 IOPS.

[mysqld]
...
innodb_io_capacity=500
innodb_io_capacity_max=800

Configure InnoDB threads

MySQL will open at least one thread for each client being served. If many clients are connected simultaneously, that can lead to a huge number of threads being processed. This can cause the system to spend more time swapping than processing.

Benchmarking should be done to determine the ideal number of threads. To test, set the number of threads between the number of CPUs (or CPU cores) on the system and 4x the number of CPUs. For a 16-core system, this value is likely between 16 and 64.

[mysqld]
...
innodb_thread_concurrency=32

Transaction durability

Setting innodb_flush_log_at_trx_commit to 1 forces MySQL to write to disk for every transaction. If the server crashes, no transactions will be lost, but database performance will be impacted. Setting the value to 0 or 2 can improve performance, but at the risk of losing a couple of seconds' worth of transactions.

[mysqld]
...
innodb_flush_log_at_trx_commit=1

Set the flush method

The operating system normally buffers writes to the disk. Because MySQL and the OS are both buffering, there is a performance penalty. Setting the flush method to O_DIRECT removes one layer of buffering, which can improve performance.

[mysqld]
...
innodb_flush_method=O_DIRECT

Enable one file per table

By default, MySQL will use a single data file for all data. The innodb_file_per_table setting will create a separate file for each table, which improves performance and data management.

[mysqld]
...
innodb_file_per_table=ON

Disable stats on metadata

This setting disables the collection of stats on internal metadata tables, improving read performance.

[mysqld]
...
innodb_stats_on_metadata=OFF

Disable the query cache

The query cache is deprecated, so setting the query_cache_size and query_cache_type to 0 disables it.

[mysqld]
...
query_cache_size=0
query_cache_type=0

Enlarge the join buffer

The join buffer is used to perform joins in memory. Increasing join_buffer_size can improve certain operations.

[mysqld]
...
join_buffer_size=512KB

Enlarge the temporary table and max heap sizes

The tmp_table_size and max_heap_table_size settings control how large in-memory temporary tables can grow before they are forced to disk; the values below are reasonable defaults.

[mysqld]
...
tmp_table_size=32MB
max_heap_table_size=32MB

Adjust the table open cache

The table_open_cache setting determines the size of the cache that holds the file descriptors for open tables. The table_open_cache_instances setting breaks the cache into a number of smaller parts. There is a potential for thread contention in the table_open_cache, so dividing it into smaller parts helps increase concurrency.

[mysqld]
...
table_open_cache=2048
table_open_cache_instances=16

Git service

Looker is designed to work with a Git service to provide version management of the LookML files. Major Git hosting services are supported, including GitHub, GitLab, Bitbucket, and others. Git service providers offer additional value, such as a GUI to view code changes and support for workflows like pull requests and change approvals. If required, Git can be run on a plain Linux server.

If a Git hosting service is not appropriate for your deployment because of security rules, many of these service providers offer versions that can be run in your own environment. GitLab, in particular, is commonly self-hosted and can be used as an open source product with no license cost or as a supported licensed product. GitHub Enterprise is available as a self-hosted service and is a supported commercial product.

The following sections list nuances for the most common service providers.

GitHub/GitHub Enterprise

The Setting up and testing a Git connection documentation page uses GitHub as an example.

GitLab/gitlab.com

Refer to the Using GitLab for version control in Looker Community post for detailed setup steps for GitLab. If your repo is contained within subgroups, these can be added to the repo URL using either the HTTPS or the SSH format:

https://gitlab.com/accountname/subgroup/reponame
git@gitlab.com:accountname/subgroup/reponame.git

Additionally, there are three different ways you can store Looker-generated SSH keys in GitLab: as a user SSH key, as a repository deploy key, and as a global shared deploy key. A more in-depth explanation can be found in the GitLab documentation.

Google Cloud Source

Refer to the Using Cloud Source Repositories for version control in Looker Community post for steps to set up Git with Cloud Source Repositories.

Bitbucket Cloud

Refer to the Using Bitbucket for version control in Looker Community post for steps to set up Git with Bitbucket Cloud. As of August 2021, Bitbucket Cloud does not support secrets on deploy webhooks.

Bitbucket Server

To use pull requests with Bitbucket Server, you may need to complete the following steps:

  1. When you open a pull request, Looker will automatically use the default port number (7999) in the URL. If you are using a custom port number, you will need to replace the port number in the URL manually.
  2. You will need to hit the project's deploy webhook to sync the production branch in Looker with the repo's master branch.

Phabricator Diffusion

Refer to the Setting up Phabricator and Looker for version control Community Post for steps on setting up Git with Phabricator.

Network

Inbound connections

Looker web application

By default, Looker listens for HTTPS requests on port 9999. Looker uses a self-signed certificate with a CN of self-signed.looker.com. The Looker server can alternately be configured to do the following:

  1. Accept HTTP connections from an SSL-termination load balancer/proxy, with the --ssl-provided-externally-by=<s> startup flag. The value should either be set to the IP address of the proxy, or to a host name that can be locally resolved to the IP address of the proxy. Looker will accept HTTP connections only from this IP address.
  2. Use a customer supplied SSL certificate, with the --ssl-keystore=<s> startup flag.
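A minimal sketch of each option in lookerstart.cfg; the proxy host name and keystore path are placeholders:

# Option 1: SSL terminated at a load balancer or proxy
LOOKERARGS="--ssl-provided-externally-by=lb.example.com"

# Option 2: customer-supplied certificate in a Java keystore
# LOOKERARGS="--ssl-keystore=/home/looker/looker/ssl/looker.jks"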

Looker API

The Looker API listens on port 19999. If the installation requires access to the API, then the load balancer should have the requisite forwarding rules. The same SSL considerations apply as with the main web application. We recommend using a distinct port from the web application.

Load balancers

A load balancer is often used to accept an HTTPS request at port 443 using the customer's certificate, then forward the request to the Looker server node at port 9999 using the self-signed certificate or HTTP. If load balancers are using Looker's self-signed certificate, they must be configured to accept that certificate.

Idle connections and timeouts

When a user starts a large request in Looker, it can result in a query that is expensive to run on the database. If the user abandons that request in any way (by shutting their laptop lid, disconnecting from the network, killing the browser tab, and so on), Looker wants to know so that it can terminate that database query.

To handle this situation, when the client web application makes a request to run a database query, the browser will open a socket connection via a long-lived HTTP request to the Looker server. This connection will sit open and idle. This socket will get disconnected if the client web application is killed or disconnected in any way. The server will see that disconnect and cancel any related database queries.

Load balancers often notice these open idle connections and kill them. In order to run Looker effectively, the load balancer must be configured to allow this connection to remain open for as long as the longest query a user might run. A timeout of at least 60 minutes is suggested.
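How the timeout is set depends on the load balancer. As one hedged example, with an nginx reverse proxy in front of Looker (the upstream address is a placeholder), the idle timeout for proxied connections could be raised like this:

# nginx: allow long-lived idle connections to Looker
location / {
    proxy_pass https://looker.internal.example.com:9999;
    proxy_read_timeout 3600s;   # at least 60 minutes, per the guidance above
    proxy_send_timeout 3600s;
}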

Outbound connections

Ideally, Looker servers have unrestricted outbound access to all resources, including the public internet. This simplifies many tasks, such as installing Chromium, which requires access to the package repositories for the Linux distribution.

The following are outbound connections that Looker may need to make.

Internal database connection

By default, MySQL listens for connections on port 3306. The Looker nodes must be able to initiate connections to MySQL on this port. Depending on how the database is hosted, you may need to traverse a firewall.

External services

Looker's telemetry and license servers are available via HTTPS on the public internet. Traffic from a Looker node to ping.looker.com:443 and license.looker.com:443 may need to be added to an allowlist.

Data warehouse connections

Cloud-hosted databases may require a connection via the public internet. For example, if you are using BigQuery, then accounts.google.com:443 and www.googleapis.com:443 may need to be added to an allowlist. If the database is outside of your own infrastructure, consult with your database host for network details.

SMTP services

By default, Looker sends outgoing mail via SendGrid. That may require adding smtp.sendgrid.net:587 to an allowlist. The SMTP settings can be changed in the configuration to use a different mail handler as well.

Action hubs, action servers, and webhooks

Many scheduler destinations, in particular webhooks and the ones that are enabled in the Looker Admin panel, involve sending data via HTTPS requests.

  • For webhooks, these destinations are specified at runtime by users, and may be contrary to the goal of firewalling outbound connections.
  • For an action hub, these requests are sent to actions.looker.com. Details can be found in our Looker Action Hub configuration documentation.
  • For other action servers, these requests are sent to the domains specified in the action server's configuration by administrators in the Looker Admin panel.

Proxy server

If the public internet cannot be reached directly, Looker can be configured to use a proxy server for HTTP(S) requests by adding a line like the following to lookerstart.cfg:

JAVAARGS="-Dhttp.proxyHost=myproxy.example.com -Dhttp.proxyPort=8080 
-Dhttp.nonProxyHosts=127.0.0.1|localhost -Dhttps.proxyHost=myproxy.example.com
-Dhttps.proxyPort=8080"

Note that internode communications happen over HTTPS, so if you use a proxy server and your instance is clustered, you will usually want to add the IPs/host names for all the nodes in the cluster to the -Dhttp.nonProxyHosts argument.

Internode communications

Internal host identifier

Within a cluster, each node must be able to communicate with the other nodes. To allow this, the host name or IP address of each node is specified in the startup configuration. When the node starts up, this value will be written into the MySQL repository. Other members of the cluster can then refer to those values to communicate with this node. To specify the host name or IP address in the startup configuration, add -H node1.looker.example.com to the LOOKERARGS environment variable in lookerstart.cfg.

Since the host name must be unique per node, the lookerstart.cfg file needs to be unique on each instance. As an alternative to hardcoding the host name or IP address, the command hostname -I or hostname --fqdn can be used to find these at runtime. To implement this, add -H $(hostname -I) or -H $(hostname --fqdn) to the LOOKERARGS environment variable in lookerstart.cfg.
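For instance, the per-node lookerstart.cfg entry might combine the host flag with the backend-database flag from earlier sections:

# lookerstart.cfg on each cluster node
LOOKERARGS="-d looker-db.yml -H $(hostname --fqdn)"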

Internal ports

In addition to ports 9999 and 19999, which are used for the web and API servers respectively, the cluster nodes communicate with each other through a message broker service, which uses ports 1551 and 61616. Ports 9999 and 19999 must be open to end-user traffic, while ports 1551 and 61616 only need to be open between cluster nodes.