Logging and monitoring for GKE on AWS

This topic shows you how to export logs and metrics from a GKE on AWS user cluster to Cloud Logging and Cloud Monitoring.

Overview

There are multiple options for logging and monitoring with GKE on AWS. GKE Enterprise can be integrated with Cloud Logging and Cloud Monitoring. Because GKE Enterprise is based on open source Kubernetes, many open source and third-party tools are also compatible.

Logging and monitoring options

You have several logging and monitoring options for your GKE Enterprise cluster:

  1. Deploy the Cloud Logging and Cloud Monitoring agents to monitor and view logs from your workloads in the Google Cloud console. This topic explains this solution.

  2. Use open source tools such as Prometheus, Grafana, and Elasticsearch. This topic does not describe this solution.

  3. Use third-party solutions such as Datadog. This topic does not describe this solution.

Cloud Logging and Cloud Monitoring

With GKE Enterprise, Cloud Logging, and Cloud Monitoring, you can create dashboards, send alerts, and monitor and review logs for the workloads running on your cluster. To collect logs and metrics into your Google Cloud project, you must configure the Cloud Logging and Cloud Monitoring agents. If you do not configure these agents, GKE on AWS does not collect any logging or monitoring data.

What data is collected

When configured, the agents collect logs and metric data from your cluster and the workloads running on your cluster. This data is stored in your Google Cloud project. You configure the project ID in the project_id field of a configuration file when you install the log forwarder.

The data collected includes the following:

  • Logs for system services on each of the worker nodes.
  • Application logs for all workloads running on the cluster.
  • Metrics for the cluster and system services. For more information on specific metrics, see GKE Enterprise metrics.
  • Application metrics for Pods, if your applications expose Prometheus scrape targets and are annotated with prometheus.io/scrape, prometheus.io/path, and prometheus.io/port (see the example after this list).
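
If your applications already expose Prometheus metrics, the annotations on a Pod template might look like the following sketch; the port and path values are hypothetical examples:

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "8080"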

The agents can be disabled at any time. For more information, see Cleaning up. Data collected by the agents can be managed and deleted like any other metric and log data, as described in the Cloud Monitoring and Cloud Logging documentation.

Log data is stored according to your configured retention rules. Metrics data retention varies based on type.

Logging and monitoring components

To export cluster-level telemetry from GKE on AWS into Google Cloud, you deploy the following components into your cluster:

  • Stackdriver Log Forwarder (stackdriver-log-forwarder-*). A Fluent Bit DaemonSet that forwards logs from each Kubernetes node to Cloud Logging.
  • GKE Metrics Agent (gke-metrics-agent-*). An OpenTelemetry Collector-based DaemonSet that collects metrics data and forwards it to Cloud Monitoring.

Manifests for these components are in the anthos-samples repository on GitHub.

Prerequisites

  1. A Google Cloud project with billing enabled. For more information on costs, see Pricing for Google Cloud Observability.

    The project must also have the Cloud Logging and Cloud Monitoring APIs enabled. To enable these APIs, run the following commands:

    gcloud services enable logging.googleapis.com
    gcloud services enable monitoring.googleapis.com
    
  2. A GKE on AWS environment, including a user cluster registered with Connect. To verify that your cluster is registered, run the following command:

    gcloud container fleet memberships list
    

    If your cluster is registered, the Google Cloud CLI prints the cluster's name and external ID.

    NAME       EXTERNAL_ID
    cluster-0  1abcdef-1234-4266-90ab-123456abcdef
    

    If you do not see your cluster listed, see Connecting to a cluster with Connect.

  3. Install the git command-line tool on your machine.

Setting up permissions for Google Cloud Observability

Logging and monitoring agents use Fleet Workload Identity to communicate with Cloud Logging and Cloud Monitoring. The identity needs permission to write logs and metrics to your project. To add the permissions, run the following commands:

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[kube-system/stackdriver]" \
  --role=roles/logging.logWriter
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[kube-system/stackdriver]" \
  --role=roles/monitoring.metricWriter

Replace PROJECT_ID with your Google Cloud project ID.
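
To confirm that the bindings are in place, you can optionally list the roles granted to the stackdriver workload identity. The following is a sketch; the formatting flags are illustrative:

gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --format="table(bindings.role, bindings.members)" \
  | grep "kube-system/stackdriver"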

Connect to the bastion host

To connect to your GKE on AWS resources, perform the following steps. Choose the option that matches your setup: either you have an existing AWS VPC (or a direct connection to your VPC), or you created a dedicated VPC when creating your management service.

Existing VPC

If you have a direct or VPN connection to an existing VPC, omit the line env HTTPS_PROXY=http://localhost:8118 from the commands in this topic.

Dedicated VPC

When you create a management service in a dedicated VPC, GKE on AWS includes a bastion host in a public subnet.

To connect to your management service, perform the following steps:

  1. Change to the directory with your GKE on AWS configuration. You created this directory when Installing the management service.

    cd anthos-aws

  2. To open the tunnel, run the bastion-tunnel.sh script. The tunnel forwards to localhost:8118.

    To open a tunnel to the bastion host, run the following command:

    ./bastion-tunnel.sh -N
    

    Messages from the SSH tunnel appear in this window. When you are ready to close the connection, stop the process by pressing Control+C or by closing the window.

  3. Open a new terminal and change into your anthos-aws directory.

    cd anthos-aws
  4. Check that you're able to connect to the cluster with kubectl.

    env HTTPS_PROXY=http://localhost:8118 \
    kubectl cluster-info
    

    The output includes the URL for the management service API server.

Cloud Logging and Cloud Monitoring on control plane nodes

With GKE on AWS 1.8.0 and higher, Cloud Logging and Cloud Monitoring for control plane nodes can be configured automatically when you create new user clusters. To enable Cloud Logging or Cloud Monitoring, populate the controlPlane.cloudOperations section of your AWSCluster configuration; a sketch of where this section sits in the manifest appears after the replacement list below.

cloudOperations:
  projectID: PROJECT_ID
  location: GC_REGION
  enableLogging: ENABLE_LOGGING
  enableMonitoring: ENABLE_MONITORING

Replace the following:

  • PROJECT_ID: your project ID.
  • GC_REGION: the Google Cloud region where you want to store logs (for example, us-central1). Choose a region that is near the AWS region. For more information, see Global Locations - Regions & Zones.
  • ENABLE_LOGGING: true or false, whether Cloud Logging is enabled on control plane nodes.
  • ENABLE_MONITORING: true or false, whether Cloud Monitoring is enabled on control plane nodes.
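
For context, the following sketch shows where the cloudOperations section sits inside an AWSCluster manifest. The apiVersion, metadata, and the other controlPlane fields shown are illustrative placeholders and will differ in your configuration:

apiVersion: multicloud.cluster.gke.io/v1
kind: AWSCluster
metadata:
  name: cluster-0
spec:
  controlPlane:
    # ... other control plane settings ...
    cloudOperations:
      projectID: PROJECT_ID
      location: GC_REGION
      enableLogging: true
      enableMonitoring: true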

Next, follow the steps in Creating a custom user cluster.

Cloud Logging and Cloud Monitoring on worker nodes

Removing the previous version

If you have set up an earlier version of the logging and monitoring agents, which included stackdriver-log-aggregator (Fluentd) and stackdriver-prometheus-k8s (Prometheus), uninstall those components before you continue.
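
To check whether the older components are still running, you can run something like the following sketch; the resource types listed are assumptions about how the earlier agents were deployed:

env HTTPS_PROXY=http://localhost:8118 \
  kubectl get daemonsets,statefulsets,deployments -n kube-system \
  | grep -E 'stackdriver-log-aggregator|stackdriver-prometheus-k8s'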

Installing the logging forwarder

In this section, you install the Stackdriver Log Forwarder onto your cluster.
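
If you have not already downloaded the manifests, clone the anthos-samples repository and change into the aws-logging-monitoring/ directory. This is a sketch that assumes you clone into your current working directory:

git clone https://github.com/GoogleCloudPlatform/anthos-samples.git
cd anthos-samples/aws-logging-monitoring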

  1. From the anthos-samples/aws-logging-monitoring/ directory, change into the logging/ directory.

    cd logging/
    
  2. Modify the file forwarder.yaml to match your project configuration:

    sed -i "s/PROJECT_ID/PROJECT_ID/g" forwarder.yaml
    sed -i "s/CLUSTER_NAME/CLUSTER_NAME/g" forwarder.yaml
    sed -i "s/CLUSTER_LOCATION/GC_REGION/g" forwarder.yaml
    

    Replace the following:

    • PROJECT_ID: your project ID.
    • CLUSTER_NAME: the name of your cluster (for example, cluster-0).
    • GC_REGION: the Google Cloud region where you want to store logs (for example, us-central1). Choose a region that is near the AWS region. For more information, see Global Locations - Regions & Zones.
  3. (Optional) Based on your workloads, the number of nodes in your cluster, and the number of Pods per node, you might need to set memory and CPU resource requests. For more information, see Recommended CPU and memory allocations.

  4. From your anthos-aws directory, use anthos-gke to switch context to your user cluster.

    cd anthos-aws
    env HTTPS_PROXY=http://localhost:8118 \
      anthos-gke aws clusters get-credentials CLUSTER_NAME

    Replace CLUSTER_NAME with your user cluster name.

  5. Create the stackdriver service account if it does not exist and deploy the log forwarder to the cluster.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl create serviceaccount stackdriver -n kube-system
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl apply -f forwarder.yaml
    
  6. Use kubectl to verify that the pods have started up.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl get pods -n kube-system | grep stackdriver-log
    

    You should see one forwarder pod per node in a node pool. For example, in a 6-node cluster, you should see six forwarder pods.

    stackdriver-log-forwarder-2vlxb              2/2     Running   0          21s
    stackdriver-log-forwarder-dwgb7              2/2     Running   0          21s
    stackdriver-log-forwarder-rfrdk              2/2     Running   0          21s
    stackdriver-log-forwarder-sqz7b              2/2     Running   0          21s
    stackdriver-log-forwarder-w4dhn              2/2     Running   0          21s
    stackdriver-log-forwarder-wrfg4              2/2     Running   0          21s
    

Testing log forwarding

In this section, you deploy a workload containing a basic HTTP web server with a load generator to your cluster. You then test that logs are present in Cloud Logging.

Before installing this workload, you can review the manifests for the web server and load generator, as shown in the sketch that follows.
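
To download the manifests locally for review before applying them, you can run the following; the URLs are the same ones used by the apply commands below:

curl -sO https://raw.githubusercontent.com/GoogleCloudPlatform/istio-samples/master/sample-apps/helloserver/server/server.yaml
curl -sO https://raw.githubusercontent.com/GoogleCloudPlatform/istio-samples/master/sample-apps/helloserver/loadgen/loadgen.yaml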

  1. Deploy the web server and load generator to your cluster.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/istio-samples/master/sample-apps/helloserver/server/server.yaml
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/istio-samples/master/sample-apps/helloserver/loadgen/loadgen.yaml
    
  2. To verify that you can view logs from your cluster in the Cloud Logging dashboard, go to the Logs Explorer in the Google Cloud console:

    Go to Logs Explorer

  3. Copy the sample query below into the Query builder field.

    resource.type="k8s_container" resource.labels.cluster_name="CLUSTER_NAME"
    

    Replace CLUSTER_NAME with your cluster name.

  4. Click Run query. You should see recent cluster logs appear under Query results.

    Cluster logs in Google Cloud Observability

  5. After you have confirmed the logs appear in query results, remove the load generator and web server.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/istio-samples/master/sample-apps/helloserver/loadgen/loadgen.yaml
    
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/istio-samples/master/sample-apps/helloserver/server/server.yaml
    

Installing the metrics collector

In this section, you install an agent to send data to Cloud Monitoring.

  1. From the anthos-samples/aws-logging-monitoring/logging/ directory, change into the anthos-samples/aws-logging-monitoring/monitoring/ directory.

    cd ../monitoring
    
  2. Modify the file gke-metrics-agent.yaml to match your project configuration:

    sed -i "s/PROJECT_ID/PROJECT_ID/g" gke-metrics-agent.yaml
    sed -i "s/CLUSTER_NAME/CLUSTER_NAME/g" gke-metrics-agent.yaml
    sed -i "s/CLUSTER_LOCATION/GC_REGION/g" gke-metrics-agent.yaml
    

    Replace the following:

    • PROJECT_ID: your project ID.
    • CLUSTER_NAME: the name of your cluster (for example, cluster-0).
    • GC_REGION: the Google Cloud region where you want to store logs (for example, us-central1). Choose a region that is near the AWS region. For more information, see Global Locations - Regions & Zones.
  3. (Optional) Based on your workloads, the number of nodes in your cluster, and the number of Pods per node, you might need to set memory and CPU resource requests. For more information, see Recommended CPU and memory allocations.

  4. Create the stackdriver service account if it does not exist and deploy the metrics agent to your cluster.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl create serviceaccount stackdriver -n kube-system
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl apply -f gke-metrics-agent.yaml
    
  5. Use the kubectl tool to verify that the gke-metrics-agent Pod is running.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl get pods -n kube-system | grep gke-metrics-agent
    

    You should see one agent pod per node in a node pool. For example, in a 3-node cluster, you should see three agent pods.

    gke-metrics-agent-gjxdj                    2/2     Running   0          102s
    gke-metrics-agent-lrnzl                    2/2     Running   0          102s
    gke-metrics-agent-s6p47                    2/2     Running   0          102s
    
  6. To verify that your cluster metrics are being exported to Cloud Monitoring, go to the Metrics Explorer in the Google Cloud console:

    Go to Metrics Explorer

  7. In Metrics Explorer, click Query editor and then copy in the following query:

    fetch k8s_container
    | metric 'kubernetes.io/anthos/otelcol_exporter_sent_metric_points'
    | filter
        resource.project_id == 'PROJECT_ID'
        && (resource.cluster_name =='CLUSTER_NAME')
    | align rate(1m)
    | every 1m
    

    Replace the following:

    • PROJECT_ID: your project ID.
    • CLUSTER_NAME: the cluster name you used when Creating a user cluster (for example, cluster-0).
  8. Click Run query. The rate of metric points sent to Cloud Monitoring from each gke-metrics-agent pod in your cluster appears.

    Monitoring for the cluster

    Other metrics worth exploring include the following:

    • kubernetes.io/anthos/container_memory_working_set_bytes: container memory usage.
    • kubernetes.io/anthos/container_cpu_usage_seconds_total: container CPU usage.
    • kubernetes.io/anthos/apiserver_aggregated_request_total: kube-apiserver request count, available only if Cloud Monitoring is enabled on the control plane.

    For a complete list of available metrics, see Anthos Metrics. For information on how to use the user interface, see Metrics Explorer. A sample query for container memory usage follows.
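
    For example, a query for container memory usage, modeled on the query above, might look like the following sketch; the mean aligner and one-minute window are illustrative choices:

    fetch k8s_container
    | metric 'kubernetes.io/anthos/container_memory_working_set_bytes'
    | filter
        resource.project_id == 'PROJECT_ID'
        && (resource.cluster_name == 'CLUSTER_NAME')
    | align mean(1m)
    | every 1m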

Creating a Dashboard in Cloud Monitoring

In this section, you create a Cloud Monitoring dashboard that monitors container status in your cluster.

  1. From the anthos-samples/aws-logging-monitoring/monitoring/ directory, change into the anthos-samples/aws-logging-monitoring/monitoring/dashboards directory.

    cd dashboards
    
  2. Replace instances of the CLUSTER_NAME string in pod-status.json with your cluster name.

    sed -i "s/CLUSTER_NAME/CLUSTER_NAME/g" pod-status.json
    

    Replace CLUSTER_NAME with your cluster name.

  3. Create a custom dashboard with the configuration file by running the following command:

    gcloud monitoring dashboards create --config-from-file=pod-status.json
    
  4. To verify that your dashboard is created, go to Cloud Monitoring Dashboards in the Google Cloud console.

    Go to Dashboards

    Open the newly created dashboard with a name in the format of CLUSTER_NAME (Anthos cluster on AWS) pod status.
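
    You can also list dashboards from the command line to confirm that the new dashboard exists; the format flag here is an illustrative choice:

    gcloud monitoring dashboards list --format="value(displayName)"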

Cleaning up

In this section, you remove the logging and monitoring components from your cluster.

  1. Delete the monitoring dashboard in the Dashboards list view in the Google Cloud console by clicking the delete button associated with the dashboard name.

  2. Change into the anthos-samples/aws-logging-monitoring/ directory.

    cd anthos-samples/aws-logging-monitoring
    
  3. To remove all the resources created in this guide, run the following commands:

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl delete -f logging/
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl delete -f monitoring/
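
If you also want to remove the IAM bindings that you added in Setting up permissions for Google Cloud Observability, you can run the following; this sketch assumes the roles were granted exactly as shown earlier:

gcloud projects remove-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[kube-system/stackdriver]" \
  --role=roles/logging.logWriter
gcloud projects remove-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[kube-system/stackdriver]" \
  --role=roles/monitoring.metricWriter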
    

Recommended CPU and memory allocations

This section includes recommended CPU and memory allocations for the individual components used in logging and monitoring. Each of the following tables lists CPU and memory requests and limits for clusters with a range of node counts. You set resource requests and limits for a component in the file listed in the table.

For more information, see Kubernetes best practices: Resource requests and limits and Managing Resources for Containers.

1-10 Nodes

File | Resource | CPU Requests | CPU Limits | Memory Requests | Memory Limits
monitoring/gke-metrics-agent.yaml | gke-metrics-agent | 30m | 100m | 50Mi | 500Mi
logging/forwarder.yaml | stackdriver-log-forwarder | 50m | 100m | 100Mi | 600Mi

10-100 Nodes

File | Resource | CPU Requests | CPU Limits | Memory Requests | Memory Limits
monitoring/gke-metrics-agent.yaml | gke-metrics-agent | 50m | 100m | 50Mi | 500Mi
logging/forwarder.yaml | stackdriver-log-forwarder | 60m | 100m | 100Mi | 600Mi

More than 100 nodes

File | Resource | CPU Requests | CPU Limits | Memory Requests | Memory Limits
monitoring/gke-metrics-agent.yaml | gke-metrics-agent | 50m | 100m | 100Mi | N/A
logging/forwarder.yaml | stackdriver-log-forwarder | 60m | 100m | 100Mi | 600Mi
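
As a sketch, the values from the 1-10 node table would map onto a standard Kubernetes resources block in logging/forwarder.yaml similar to the following; its exact placement under the container spec in that manifest is an assumption:

resources:
  requests:
    cpu: 50m
    memory: 100Mi
  limits:
    cpu: 100m
    memory: 600Mi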

What's next?

Learn more about Cloud Logging.

Learn more about Cloud Monitoring.