This section describes how to send one or more metrics from a Pod or workload to your load balancer. These metrics come from the service or application that you are running; for example, see the metrics exposed by the vLLM Engine. The load balancer can then use this data with utilization-based load balancing to balance workloads more efficiently. For example, you can use this feature to monitor the regions with heavier workload use, and then let the load balancer redirect traffic towards the region with more available resources. In the vLLM example, a metric that is useful for tracking utilization is gpu_cache_usage_perc.
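To confirm that your workload exposes the metric you want, you can query its metrics endpoint directly. The following sketch assumes a vLLM server that serves Prometheus metrics on its default port 8000 at /metrics; the Pod name is hypothetical, so adjust it for your deployment.

# Forward the metrics port of a hypothetical Pod named "vllm-0" to your machine.
kubectl port-forward pod/vllm-0 8000:8000 &

# Check that the cache usage gauge appears in the Prometheus output.
curl -s http://localhost:8000/metrics | grep gpu_cache_usage_perc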
Requirements
Requirements for the Pods are the following:
- GKE 1.34.1-gke.1127000 or later, with clusters in the Rapid channel.
- The Gateway API is enabled (one way to enable it is shown after this list).
- Horizontal Pod autoscaling with the performance profile is in use.
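If the Gateway API isn't enabled on your cluster yet, one way to enable it on an existing cluster is with the following command. This is a sketch; replace the placeholders with your cluster's name and location.

# Enable the Gateway API on an existing cluster.
gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --gateway-api=standard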
Requirements for the metrics are the following:
- Metrics must be accessible on an HTTP endpoint.
- Metrics must be in the Prometheus text format (see the example after this list).
- Load balancers place restrictions on metric names. For example, a name can't exceed 64 characters. For the full list of restrictions, see the details about the backends[].customMetrics[].name field in the API reference for BackendService. If your service's metric doesn't comply with these restrictions, you can rename it using the exportName field.
- Only gauge metrics with values between 0 and 1 are supported, with 1 representing 100% utilization.
- Metric labels are ignored, so they can't be used to distinguish between metrics. Ensure that your workload doesn't expose the same metric with multiple labels.
- A maximum of 10 metrics can be exposed per cluster. Other services have their own limits; for example, see the limits and requirements for load balancers. Note that a cluster can use more than one load balancer.
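As an illustration of the Prometheus text format and the 0-to-1 gauge requirement, a compliant metric exposed at your HTTP endpoint might look like the following. The metric name and value here are hypothetical.

# HELP queue_utilization Fraction of the request queue in use (0 to 1).
# TYPE queue_utilization gauge
queue_utilization 0.35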
Expose metrics for load balancing
Choose a metric to expose. You can choose any metric that your server exposes, and that also meets the requirements listed in the previous section.
Add the following custom resource, replacing details that are specific to your metric and Pod.
apiVersion: autoscaling.gke.io/v1beta1
kind: AutoscalingMetric
metadata:
  name: NAME
  namespace: NAMESPACE
spec:
  selector:
    matchLabels:
      name: APP_LABEL_NAME
  endpoints:
  - port: METRIC_PORT
    path: METRIC_PATH
    metrics:
    - name: METRIC
      exportName: METRIC_NEW_NAME
Replace the following to match your workload:
- NAME: the name of the AutoscalingMetric object.
- NAMESPACE: the namespace that the Pods are in.
- APP_LABEL_NAME: the label used for the Pod.
- METRIC_PORT: the port number where the metrics are served.
- METRIC_PATH: the path to the metrics. Verify the path used by your service or application; this path is often /metrics.
- METRIC: the name of the metric that you are exposing.
- Optional: METRIC_NEW_NAME: you can use this field to rename the metric. If the metric name doesn't comply with the name restrictions set by the load balancer, use this field to rename it to a valid name. For the full list of restrictions, see the details about the backends[].customMetrics[].name field in the API reference for BackendService.
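As a concrete illustration, the following filled-in manifest is a hypothetical example for the vLLM case from the introduction. It assumes Pods labeled name: vllm-server that serve metrics on port 8000 at /metrics, and that the server exposes the gauge as vllm:gpu_cache_usage_perc, which is renamed with exportName to satisfy the load balancer's naming rules.

apiVersion: autoscaling.gke.io/v1beta1
kind: AutoscalingMetric
metadata:
  name: vllm-cache-usage
  namespace: default
spec:
  selector:
    matchLabels:
      name: vllm-server
  endpoints:
  - port: 8000
    path: /metrics
    metrics:
    - name: vllm:gpu_cache_usage_perc
      # Rename to a simpler name that complies with the load balancer's
      # metric name restrictions.
      exportName: gpu_cache_usage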
Apply the resource using the following command:
kubectl apply -f FILE_NAME.yaml
Replace FILE_NAME with the name of the YAML file.

After you add the custom resource, the metric is pushed to the autoscaling API. The metric is read every few seconds and sent to the load balancer.
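To confirm that the custom resource exists, you can list AutoscalingMetric objects in the namespace. The plural resource name shown here is an assumption; if it doesn't resolve, use kubectl api-resources to find the right name.

# List AutoscalingMetric resources in the workload's namespace.
kubectl get autoscalingmetrics -n NAMESPACE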
To expose a second metric, follow the same steps to create another custom resource.
Now that you have exposed the metrics to the load balancer, you can configure the load balancer to use these metrics. For details, see Configure the load balancer to use custom metrics.
For more information about working with the load balancer, see Configure utilization-based load balancing for GKE Services.
Troubleshoot metrics exposed to the load balancer
To verify that the metrics are exposed to the load balancer correctly, you can do the following:
- Check the logs of the GKE Metrics Agent. If an error occurred while exposing the metrics, the logs might record it. For more information about how to look for errors, see Troubleshooting system metrics. One way to pull these logs is shown in the sketch after this list.
- You can use the load balancer in dry run mode to look at all of the metrics that it receives. To learn more about testing the metrics using the dryRun flag, see Configure the load balancer to use custom metrics.
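For the first check, the following sketch shows one way to inspect the agent's logs. It assumes the agent runs as the gke-metrics-agent DaemonSet in the kube-system namespace; the Pod name in the second command is hypothetical.

# Find the gke-metrics-agent Pods (one per node).
kubectl get pods -n kube-system -o name | grep gke-metrics-agent

# Inspect the logs of one of those Pods for errors.
kubectl logs -n kube-system pod/gke-metrics-agent-abcde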
What's next
- For more details about utilization-based load balancing, see About utilization-based load balancers for GKE Services.
- Learn how to Configure utilization-based load balancing for GKE Services.