Google Cloud offers configurable health checks for Google Cloud load balancer backends, Cloud Service Mesh backends, and application-based autohealing for managed instance groups. This document covers key health checking concepts.
Unless otherwise noted, Google Cloud health checks are implemented by dedicated software tasks that connect to backends according to parameters specified in a health check resource. Each connection attempt is called a probe. Google Cloud records the success or failure of each probe.
Based on a configurable number of sequential successful or failed probes, an overall health state is computed for each backend. Backends that respond successfully for the configured number of times are considered healthy. Backends that fail to respond successfully for a separately configurable number of times are unhealthy.
The overall health state of each backend determines eligibility to receive new requests or connections. You can configure the criteria that define a successful probe. This is discussed in detail in the section How health checks work.
Health checks implemented by dedicated software tasks use special routes that aren't defined in your Virtual Private Cloud (VPC) network. For more information, see Paths for health checks.
Health check categories, protocols, and ports
Health checks have a category and a protocol. The two categories are health checks and legacy health checks and their supported protocols are as follows:
Health checks
Legacy health checks:
The protocol and port determine how health check probes are done. For example, a health check can use the HTTP protocol on TCP port 80, or it can use the TCP protocol for a named port in an instance group.
You cannot convert a legacy health check to a health check, and you cannot convert a health check to a legacy health check.
Select a health check
Health checks must be compatible with the type of load balancer (or Cloud Service Mesh) and the backend types. The factors to consider when you select a health check are as follows:
- Category: health check or legacy health check. Only target pool-based external passthrough Network Load Balancers require legacy health checks. For all other products, you'll use regular health checks.
- Protocol: protocol that Google Cloud uses to probe the backends. It's best to use a health check (or legacy health check) whose protocol matches the protocol used by the load balancer's backend service or target pool. However, the health check protocols and load balancer protocols do not need to be the same.
- Port specification: ports that Google Cloud uses with the protocol.
You must specify a port for your health check. Health checks have two port specification methods: --port and --use-serving-port. For legacy health checks, there is one method: --port. For more information about health check port requirements per load balancer, see Port specification flags.
The next section describes valid health check selections for each type of load balancer and backend.
Load balancer guide
This table shows the supported health check category and scope for each load balancer type.
Load balancer | Health check category and scope |
---|---|
Global external Application Load Balancer; Classic Application Load Balancer *; Global external proxy Network Load Balancer; Classic proxy Network Load Balancer; Cross-region internal Application Load Balancer; Cross-region internal proxy Network Load Balancer | Health check (global) |
Regional external Application Load Balancer; Regional internal Application Load Balancer; Regional internal proxy Network Load Balancer; Regional external proxy Network Load Balancer | Health check (regional) |
External passthrough Network Load Balancer | Backend service-based load balancer: Health check (regional); Target pool-based load balancer: Legacy health check |
Internal passthrough Network Load Balancer | Health check (global or regional) |
Load balancer mode | Legacy health checks supported |
---|---|
Global external Application Load Balancer Classic Application Load Balancer |
Yes, if both of the following are true:
|
Regional external Application Load Balancer | No |
Additional usage notes
For VM instance group backends, health checks are performed only on VM instances that are started. Stopped VM instances don't receive health check packets.
For internal passthrough Network Load Balancers, you can only use TCP or UDP for the backend service's protocol. If you serve HTTP traffic from VMs behind an internal passthrough Network Load Balancer, it makes sense to employ a health check using the HTTP protocol.
A target pool-based external passthrough Network Load Balancer must use a legacy HTTP health check. It cannot use a legacy HTTPS health check or any non-legacy health check. If you use a target pool-based external passthrough Network Load Balancer to balance TCP traffic, you need to run an HTTP service on the VMs being load balanced so that they can respond to health check probes.
For almost all other load balancer types, you must use regular, non-legacy health checks where the protocol matches the load balancer's backend service protocol.
For backend services that use the gRPC protocol, use only gRPC or TCP health checks. Don't use HTTP(S) or HTTP/2 health checks.
Certain Envoy-based load balancers that use hybrid NEG backends don't support gRPC health checks. For more information, see the Hybrid NEGs overview.
Health checking with Cloud Service Mesh
Note the following differences in behavior when you're using health checks with Cloud Service Mesh.
With Cloud Service Mesh, health checking behavior for network endpoints of the type INTERNET_FQDN_PORT and NON_GCP_PRIVATE_IP_PORT differs from health checking behavior for other types of network endpoints. Instead of using the dedicated software tasks, Cloud Service Mesh programs Envoy proxies to perform health checks for internet NEGs (INTERNET_FQDN_PORT endpoints) and hybrid NEGs (NON_GCP_PRIVATE_IP_PORT endpoints).
Envoy supports the following protocols for health checking:
- HTTP
- HTTPS
- HTTP/2
- TCP
When Cloud Service Mesh is integrated with Service Directory and you bind a Service Directory service to a Cloud Service Mesh backend service, you cannot set a health check on the backend service.
How health checks work
The following sections describe how health checks work.
Probes
When you create a health check or a legacy health check, you specify the following flags or accept their default values. Each health check or legacy health check that you create is implemented by multiple probers. These flags control how frequently each prober evaluates instances in instance groups or endpoints in zonal NEGs.
A health check's settings cannot be configured on a per-backend basis. Health checks are associated with an entire backend service. For a target pool-based external passthrough Network Load Balancer, a legacy HTTP health check is associated with the entire target pool. Thus, the parameters for the probe are the same for all backends referenced by a given backend service or target pool.
Configuration flag | Purpose | Default value |
---|---|---|
Check interval (check-interval) | The check interval is the amount of time from the start of one probe issued by one prober to the start of the next probe issued by the same prober. Units are seconds. | 5s (5 seconds) |
Timeout (timeout) | The timeout is the amount of time that Google Cloud waits for a response to a probe. Its value must be less than or equal to the check interval. Units are seconds. | 5s (5 seconds) |
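The relationship between the two flags can be sketched as a small validation step. This is an illustrative Python sketch only: the `ProbeSettings` class and its field names are hypothetical and aren't part of any Google Cloud API, and the positivity check is an assumption rather than a documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeSettings:
    """Hypothetical container for the two timing flags described above."""

    check_interval_s: int = 5  # default check interval from the table
    timeout_s: int = 5         # default timeout from the table

    def __post_init__(self) -> None:
        # Assumption: both values are positive numbers of seconds.
        if self.check_interval_s <= 0 or self.timeout_s <= 0:
            raise ValueError("check interval and timeout must be positive")
        # Rule stated in the table: timeout <= check interval.
        if self.timeout_s > self.check_interval_s:
            raise ValueError(
                "timeout must be less than or equal to the check interval"
            )
```

For example, `ProbeSettings(check_interval_s=30, timeout_s=5)` is accepted, while `ProbeSettings(check_interval_s=5, timeout_s=10)` raises `ValueError`.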
Probe IP ranges and firewall rules
For health checks to work, you must create ingress allow
firewall rules so
that traffic from Google Cloud probers can connect to your backends. For
instructions, see Create required firewall
rules.
The following table shows the source IP ranges to allow for each load balancer:
Product | Health check probe source IP ranges |
---|---|
|
For IPv6 traffic to the backends:
|
For IPv6 traffic to the backends:
|
|
|
|
External passthrough Network Load Balancer 3 |
For IPv4 traffic to the backends:
For IPv6 traffic to the backends:
|
Internal passthrough Network Load Balancer |
For IPv4 traffic to the backends:
For IPv6 traffic to the backends:
|
Cloud Service Mesh with internet NEG backends and hybrid NEG backends | IP addresses of the VMs running the Envoy software For a sample configuration, see the Cloud Service Mesh documentation |
1 Allowlisting Google's health check probe ranges isn't required for hybrid NEGs. However, if you're using a combination of hybrid and zonal NEGs in a single backend service, you need to allowlist the Google health check probe ranges for the zonal NEGs.
2 For regional internet NEGs, health checks are optional. Traffic from load balancers using regional internet NEGs originates from the proxy-only subnet and is then NAT-translated (by using Cloud NAT) to either manual or auto-allocated NAT IP addresses. This traffic includes both health check probes and user requests from the load balancer to the backends. For details, see Regional NEGs: Use Cloud NAT to egress.
3 Target pool-based external passthrough Network Load Balancers support only IPv4 traffic and might proxy health checks through the metadata server. In this case, health check packet sources match the IP address of the metadata server: 169.254.169.254. You don't have to create firewall rules to permit traffic from the metadata server. Packets from the metadata server are always allowed.
Importance of firewall rules
Google Cloud requires that you create the necessary ingress allow
firewall rules to permit traffic from probers to your backends. As a best
practice, limit these rules to just the protocols and ports that
match those used by your health checks. For the source IP ranges, make sure to
use the documented probe IP ranges listed in the preceding section.
If you don't have ingress allow
firewall rules that permit the health check,
the implied deny
rule blocks
inbound traffic. When probers can't contact your backends, the
load balancer considers your backends to be unhealthy.
Security considerations for probe IP ranges
Consider the following information when planning health checks and the necessary firewall rules:
- The probe IP ranges belong to Google. Google Cloud uses special routes outside of your VPC network but within Google's production network to facilitate communication from probers.
- Google uses the probe IP ranges to send health check probes for external Application Load Balancers and external proxy Network Load Balancers. If a packet is received from the internet and the packet's source IP address is within a probe IP range, Google drops the packet. This includes the external IP address of a Compute Engine instance or a Google Kubernetes Engine (GKE) node.
- The probe IP ranges are a complete set of possible IP addresses used by Google Cloud probers. If you use tcpdump or a similar tool, you might not observe traffic from all IP addresses in all probe IP ranges. As a best practice, create ingress firewall rules that allow all of the probe IP ranges as sources. Google Cloud can implement new probers automatically without notification.
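When you inspect captured traffic, it can help to check whether an observed source address falls inside the ranges that your firewall rules allow. The following Python sketch shows one way to do that with the standard `ipaddress` module. The function name is hypothetical, and the ranges in `EXAMPLE_PROBE_RANGES` are placeholder documentation blocks, not Google's probe ranges; substitute the documented probe IP ranges for your load balancer.

```python
import ipaddress
from typing import Iterable

# Placeholder ranges reserved for documentation (RFC 5737 and RFC 3849).
# These are NOT Google's probe IP ranges; replace them with the documented
# ranges for your load balancer type.
EXAMPLE_PROBE_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def is_from_probe_range(source_ip: str, probe_ranges: Iterable[str]) -> bool:
    """Return True if source_ip is inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(source_ip)
    for cidr in probe_ranges:
        network = ipaddress.ip_network(cidr)
        # Compare only addresses and networks of the same IP version.
        if address.version == network.version and address in network:
            return True
    return False
```

Because not every address in a range is necessarily in use at a given time, a check like this is better suited to confirming that observed traffic is expected than to narrowing the firewall rule itself.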
Multiple probes and frequency
Google Cloud sends health check probes from multiple redundant systems called probers. Probers use specific source IP ranges. Google Cloud does not rely on just one prober to implement a health check—multiple probers simultaneously evaluate the instances in instance group backends or the endpoints in zonal NEG backends. If one prober fails, Google Cloud continues to track backend health states.
The interval and timeout settings that you configure for a health check are applied to each prober. For a given backend, software access logs and tcpdump show more frequent probes than your configured settings.
This is expected behavior, and you cannot configure the number of probers that Google Cloud uses for health checks. However, you can estimate the probe frequency per backend service by considering the following factors:
- Base frequency per backend service. Each health check has an associated check frequency, inversely proportional to the configured check interval: 1⁄(check interval). When you associate a health check with a backend service, you establish a base frequency used by each prober for backends on that backend service.
- Probe scale factor. The backend service's base frequency is multiplied by the number of simultaneous probers that Google Cloud uses. This number can vary, but is generally between 5 and 10.
- Multiple forwarding rules for internal passthrough Network Load Balancers. If you have configured multiple internal forwarding rules (each having a different IP address) pointing to the same regional internal backend service, Google Cloud uses multiple probers to check each IP address. The probe frequency per backend service is multiplied by the number of configured forwarding rules.
- Multiple forwarding rules for external passthrough Network Load Balancers. If you have configured multiple forwarding rules that point to the same backend service or target pool, Google Cloud uses multiple probers to check each IP address. The probe frequency per backend VM is multiplied by the number of configured forwarding rules.
- Multiple target proxies for external Application Load Balancers. If you have multiple target proxies that direct traffic to the same URL map, Google Cloud uses multiple probers to check the IP address associated with each target proxy. The probe frequency per backend service is multiplied by the number of configured target proxies.
- Multiple target proxies for external proxy Network Load Balancers and regional internal proxy Network Load Balancers. If you have configured multiple target proxies that direct traffic to the same backend service, Google Cloud uses multiple probers to check the IP address associated with each target proxy. The probe frequency per backend service is multiplied by the number of configured target proxies.
- Sum over backend services. If a backend is used by multiple backend services, the backend instances are contacted as frequently as the sum of frequencies for each backend service's health check.
With zonal NEG backends, it's more difficult to determine the exact number of health check probes. For example, the same endpoint can be in multiple zonal NEGs. Those zonal NEGs don't necessarily have the same set of endpoints, and different endpoints can point to the same backend.
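For instance group backends, the factors above combine into a rough estimate. The following Python sketch is illustrative only: the function and parameter names are hypothetical, and the result is an approximation because the number of probers varies and isn't configurable.

```python
from typing import Iterable


def estimate_probes_per_second(
    check_interval_s: float,
    prober_count: int,
    endpoint_multiplier: int = 1,
) -> float:
    """Estimate probes per second that one backend sees for one backend service.

    check_interval_s: configured check interval, in seconds.
    prober_count: assumed number of simultaneous probers (generally 5 to 10).
    endpoint_multiplier: number of forwarding rules or target proxies that
        apply to the backend service (1 if there is only one).
    """
    base_frequency = 1.0 / check_interval_s
    return base_frequency * prober_count * endpoint_multiplier


def total_probes_per_second(per_service_rates: Iterable[float]) -> float:
    """Sum the estimates when a backend belongs to several backend services."""
    return sum(per_service_rates)
```

With the default 5-second interval and one forwarding rule, 5 to 10 probers give an estimate of about 1 to 2 probes per second per backend. Two forwarding rules on the same backend service double that estimate.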
Destination for probe packets
The following table shows the network interface and destination IP addresses to which health check probers send packets, depending on the type of load balancer.
For external passthrough Network Load Balancers and internal passthrough Network Load Balancers, the application must bind to the load balancer's IP address (or to any IP address, 0.0.0.0).
Load balancer | Destination network interface | Destination IP address |
---|---|---|
External passthrough Network Load Balancer | Primary network interface (nic0) | The IP address of the external forwarding rule. If multiple forwarding rules point to the same backend service (for target pool-based external passthrough Network Load Balancers, the same target pool), Google Cloud sends probes to each forwarding rule's IP address. This can result in an increase in the number of probes. |
Internal passthrough Network Load Balancer | For both instance group backends and zonal NEG backends with GCE_VM_IP endpoints, the network interface used depends on how the backend service is configured. For details, see Backend services and network interfaces. | The IP address of the internal forwarding rule. If multiple forwarding rules point to the same backend service, Google Cloud sends probes to each forwarding rule's IP address. This can result in an increase in the number of probes. |
Success criteria for HTTP, HTTPS, and HTTP/2
HTTP, HTTPS, and HTTP/2 health checks always require an HTTP 200 (OK) response code to be received before the health check timeout. All other HTTP response codes, including redirect response codes like 301 and 302, are considered unhealthy.
In addition to requiring an HTTP 200 (OK) response code, you can:
- Configure each health check prober to send HTTP requests to a specific request path instead of the default request path, /.
- Configure each health check prober to check for the presence of an expected response string in the HTTP response body. The expected response string must consist only of single-byte, printable ASCII characters, located within the first 1,024 bytes of the HTTP response body.
The following table lists valid combinations of request path and response flags that are available for HTTP, HTTPS, and HTTP/2 health checks.
Configuration flags | Prober behavior | Success criteria |
---|---|---|
Neither --request-path nor --response specified | The prober uses / as the request path. | HTTP 200 (OK) response code only. |
Both --request-path and --response specified | The prober uses the configured request path. | HTTP 200 (OK) response code and up to the first 1,024 ASCII characters of the HTTP response body must match the expected response string. |
Only --response specified | The prober uses / as the request path. | HTTP 200 (OK) response code and up to the first 1,024 ASCII characters of the HTTP response body must match the expected response string. |
Only --request-path specified | The prober uses the configured request path. | HTTP 200 (OK) response code only. |
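The success criteria in the preceding table can be expressed as a short predicate. The following Python sketch is illustrative and isn't how Google Cloud probers are implemented: the function name is hypothetical, and it assumes that matching means the expected string is present somewhere in the first 1,024 bytes of the body.

```python
from typing import Optional

MAX_BODY_BYTES = 1024  # only the start of the response body is examined


def http_probe_succeeds(
    status_code: int,
    body: bytes,
    expected_response: Optional[str] = None,
) -> bool:
    """Evaluate one HTTP, HTTPS, or HTTP/2 probe result.

    expected_response corresponds to the --response flag; None means the
    flag wasn't specified.
    """
    # Any status other than 200, including redirects, is a failed probe.
    if status_code != 200:
        return False
    if expected_response is None:
        return True
    # Assumption: a match means the string appears within the first
    # 1,024 bytes of the response body.
    return expected_response.encode("ascii") in body[:MAX_BODY_BYTES]
```

Under these assumptions, a `301` response fails regardless of its body, and a `200` response whose expected string first appears after byte 1,024 also fails.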
Success criteria for SSL and TCP
TCP and SSL health checks have the following base success criteria:
- For TCP health checks, a health check prober must successfully open a TCP connection to the backend before the health check timeout.
- For SSL health checks, a health check prober must successfully open a TCP connection to the backend and complete the TLS/SSL handshake before the health check timeout.
For TCP health checks, the TCP connection must be closed in one of the following ways:
- By the health check prober sending either a FIN or RST (reset) packet, or
- By the backend sending a FIN packet. If a backend sends a TCP RST packet, the probe might be considered unsuccessful if the health check prober has already sent a FIN packet.
The following table lists valid combinations of request and response flags that are available for TCP and SSL health checks. Both request and response flags must consist only of single-byte, printable ASCII characters, each string being no more than 1,024 characters long.
Configuration flags | Prober behavior | Success criteria |
---|---|---|
Neither --request nor --response specified | The prober doesn't send any request string. | Base success criteria only. |
Both --request and --response specified | The prober sends the configured request string. | Base success criteria and the response string received by the prober must exactly match the expected response string. |
Only --response specified | The prober doesn't send any request string. | Base success criteria and the response string received by the prober must exactly match the expected response string. |
Only --request specified | The prober sends the configured request string. | Base success criteria only (the response, if any, is not checked). |
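To test how a backend would respond to a TCP probe, you can approximate the behavior in the preceding table with a small client. The following Python sketch is an approximation under stated assumptions, not Google's prober: the function name and parameters are hypothetical, the timeout applies to each socket operation rather than to the probe as a whole, and the comparison reads only as many bytes as the expected response string contains.

```python
import socket
from typing import Optional


def tcp_probe(
    host: str,
    port: int,
    timeout_s: float = 5.0,
    request: Optional[str] = None,
    expected_response: Optional[str] = None,
) -> bool:
    """Return True if a single TCP probe against host:port succeeds."""
    try:
        # Base success criterion: the TCP connection opens in time.
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            if request is not None:
                sock.sendall(request.encode("ascii"))
            if expected_response is None:
                # No response string configured: base criteria only.
                return True
            expected = expected_response.encode("ascii")
            received = b""
            # Read until enough bytes arrive or the backend closes.
            while len(received) < len(expected):
                chunk = sock.recv(len(expected) - len(received))
                if not chunk:
                    break
                received += chunk
            return received == expected
    except OSError:
        # Refused connections, resets, and timeouts all count as failures.
        return False
```

Leaving the `with` block closes the socket from the client side, which corresponds to the case where the prober ends the connection.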
Success criteria for gRPC
If you are using gRPC health checks, make sure that the gRPC service sends the RPC response with the status OK and the status field set to SERVING or NOT_SERVING accordingly.
Note the following:
- gRPC health checks are used only with gRPC applications and Cloud Service Mesh.
- gRPC health checks don't support TLS.
Success criteria for legacy health checks
If the response received by the legacy health check probe is HTTP 200 OK, the probe is considered successful. All other HTTP response codes, including a redirect (301, 302), are considered unhealthy.
Health state
Google Cloud uses the following configuration flags to determine the overall health state of each backend to which traffic is load balanced.
Configuration flag | Purpose | Default value |
---|---|---|
Healthy threshold (healthy-threshold) | The healthy threshold specifies the number of sequential successful probe results for a backend to be considered healthy. | A threshold of 2 probes. |
Unhealthy threshold (unhealthy-threshold) | The unhealthy threshold specifies the number of sequential failed probe results for a backend to be considered unhealthy. | A threshold of 2 probes. |
Google Cloud considers backends to be healthy after this healthy threshold has been met. Healthy backends are eligible to receive new connections.
Google Cloud considers backends to be unhealthy when the unhealthy threshold has been met. Unhealthy backends are not eligible to receive new connections; however, existing connections are not immediately terminated. Instead, the connection remains open until a timeout occurs or until traffic is dropped.
Existing connections might fail to return responses, depending on the cause for failing the probe. An unhealthy backend can become healthy if it is able to meet the healthy threshold again.
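The way the two thresholds drive state transitions can be sketched as a small state machine. This Python sketch is illustrative only: the `BackendHealth` class is hypothetical, it models the view of a single prober, and it assumes that a backend starts out unhealthy until the healthy threshold is first met.

```python
class BackendHealth:
    """Tracks one backend's health state from sequential probe results."""

    def __init__(
        self,
        healthy_threshold: int = 2,    # default from the table above
        unhealthy_threshold: int = 2,  # default from the table above
    ) -> None:
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = False  # assumption: unhealthy until proven healthy
        self._successes = 0   # current run of consecutive successes
        self._failures = 0    # current run of consecutive failures

    def record(self, probe_succeeded: bool) -> bool:
        """Record one probe result and return the resulting health state."""
        if probe_succeeded:
            self._successes += 1
            self._failures = 0
            if self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self._failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

With the default thresholds, a single failed probe between successes doesn't change the state of a healthy backend. Two failures in a row are required to mark the backend unhealthy, and two successes in a row are required to mark it healthy again.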
The specific behavior when all backends are unhealthy differs depending on the type of load balancer that you're using:
Load balancer | Behavior when all backends are unhealthy |
---|---|
Classic Application Load Balancer | Returns an HTTP `502` status code to clients when all backends are unhealthy. |
Global external Application Load Balancer; Cross-region internal Application Load Balancer; Regional external Application Load Balancer; Regional internal Application Load Balancer | Returns an HTTP `503` status code to clients when all backends are unhealthy. |
Proxy Network Load Balancers | Terminate client connections when all backends are unhealthy. |
Internal passthrough Network Load Balancer; Backend service-based external passthrough Network Load Balancers | Distribute traffic to all backend VMs as a last resort when all backends are unhealthy and failover is not configured. For more information about this behavior, see Traffic distribution for internal passthrough Network Load Balancers and Traffic distribution for backend service-based external passthrough Network Load Balancers. |
Target pool-based external passthrough Network Load Balancers | Distribute traffic to all backend VMs as a last resort when all backends are unhealthy. |
Additional notes
The following sections include some more notes about using health checks on Google Cloud.
Certificates and health checks
Google Cloud health check probers don't perform certificate validation, even for protocols that require that your backends use certificates (SSL, HTTPS, and HTTP/2)—for example:
- You can use self-signed certificates or certificates signed by any certificate authority (CA).
- Certificates that have expired or that are not yet valid are acceptable.
- Neither the CN nor the subjectAlternativeName attributes need to match a Host header or DNS PTR record.
Headers
Health checks that use any protocol, but not legacy health checks, allow you to set a proxy header by using the --proxy-header flag.
Health checks that use HTTP, HTTPS, or HTTP/2 protocols and legacy health checks allow you to specify an HTTP Host header by using the --host flag.
If you're using any custom request headers, note that the load balancer adds these headers only to the client requests, not to the health check probes. If your backend requires a specific header for authorization that is missing from the health check packet, the health check might fail.
Example health check
Suppose you set up a health check with the following settings:
- Interval: 30 seconds
- Timeout: 5 seconds
- Protocol: HTTP
- Unhealthy threshold: 2 (default)
- Healthy threshold: 2 (default)
With these settings, the health check behaves as follows:
- Multiple redundant systems are simultaneously configured with the health check parameters. Interval and timeout settings are applied to each system. For more information, see Multiple probes and frequency.
Each health check prober does the following:
- Initiates an HTTP connection from one of the source IP addresses to the backend instance every 30 seconds.
- Waits up to five seconds for an HTTP 200 (OK) status code (the success criteria for HTTP, HTTPS, and HTTP/2 protocols).
A backend is considered unhealthy when at least one health check probe system does the following:
- Does not receive an HTTP 200 (OK) response code for two consecutive probes. For example, the connection might be refused, or there might be a connection or socket timeout.
- Receives two consecutive responses that don't match the protocol-specific success criteria.
A backend is considered healthy when at least one health check probe system receives two consecutive responses that match the protocol-specific success criteria.
In this example, each prober initiates a connection every 30 seconds. Thirty seconds elapse between a prober's connection attempts regardless of whether a probe times out. In other words, the timeout must always be less than or equal to the interval, and the timeout never increases the interval.
In this example, each prober's timing looks like the following, in seconds:
- t=0: Start probe A.
- t=5: Stop probe A.
- t=30: Start probe B.
- t=35: Stop probe B.
- t=60: Start probe C.
- t=65: Stop probe C.
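The timeline above follows directly from the interval and timeout. As a minimal Python sketch (the function name is hypothetical), the start time and the latest possible stop time of each probe from one prober can be computed as follows:

```python
from typing import List, Tuple


def probe_timeline(
    check_interval_s: int,
    timeout_s: int,
    probe_count: int,
) -> List[Tuple[int, int]]:
    """Return (start, latest_stop) times in seconds for one prober's probes."""
    if timeout_s > check_interval_s:
        raise ValueError("timeout must be less than or equal to the interval")
    return [
        # Each probe starts one interval after the previous probe started
        # and ends no later than timeout_s seconds after its own start.
        (i * check_interval_s, i * check_interval_s + timeout_s)
        for i in range(probe_count)
    ]


print(probe_timeline(30, 5, 3))  # [(0, 5), (30, 35), (60, 65)]
```

The start times depend only on the interval, which is why a probe that times out doesn't delay the next one.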
What's next
- To create, modify, and use health checks, see Use health checks.
- To troubleshoot health checks, enable health check logging.