Configuring logging and monitoring for Anthos clusters on AWS

This topic shows you how to export logs and metrics from an Anthos clusters on AWS user cluster to Cloud Logging and Cloud Monitoring.

Overview

There are multiple options for logging and monitoring with Anthos clusters on AWS. Anthos integrates with Cloud Logging and Cloud Monitoring. Because Anthos is based on open source Kubernetes, many open source and third-party tools are also compatible.

Logging and monitoring options

You have several logging and monitoring options for your Anthos cluster:

  1. Deploy the Cloud Logging and Cloud Monitoring agents to monitor and view logs from your workloads in the Google Cloud console. This topic explains this solution.

  2. Use open source tools such as Prometheus, Grafana, and Elasticsearch. This topic does not describe this solution.

  3. Use third-party solutions such as Datadog. This topic does not describe this solution.

Cloud Logging and Cloud Monitoring

With Anthos, Cloud Logging, and Cloud Monitoring, you can create dashboards, send alerts, and monitor and review logs for the workloads running on your cluster. To collect logs and metrics into your Google Cloud project, you must configure the Cloud Logging and Cloud Monitoring agents. If you do not configure these agents, Anthos clusters on AWS does not collect any logging or monitoring data.

What data is collected

When configured, the agents collect logs and metric data from your cluster and from the workloads running on it. This data is stored in your Google Cloud project. You configure the project ID in the project_id field of a configuration file when you install the log aggregator and forwarder.

The data collected includes the following:

  • Logs for system services on each of the worker nodes.
  • Application logs for all workloads running on the cluster.
  • Metrics for the cluster and system services. For more information on specific metrics, see Anthos metrics.
  • Application metrics for Pods, if your applications expose Prometheus scrape targets and are annotated with prometheus.io/scrape, prometheus.io/path, and prometheus.io/port. See the sketch after this list.
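
For example, an annotated Pod might look like the following minimal sketch. The Pod name, image, and port are hypothetical; the three prometheus.io annotations are the ones listed above.

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: hello-metrics              # hypothetical application Pod
      annotations:
        prometheus.io/scrape: "true"   # opt this Pod in to metrics collection
        prometheus.io/path: "/metrics" # HTTP path that serves Prometheus metrics
        prometheus.io/port: "8080"     # port that serves Prometheus metrics
    spec:
      containers:
      - name: hello-metrics
        image: gcr.io/example/hello-metrics:1.0  # hypothetical image
        ports:
        - containerPort: 8080
    EOF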

The agents can be disabled at any time. For more information, see Cleaning up. Data collected by the agents can be managed and deleted like any other metric and log data, as described in the Cloud Monitoring and Cloud Logging documentation.

Log data is stored according to your configured retention rules. Metrics data retention varies based on type.

Logging and monitoring components

To export cluster-level telemetry from Anthos clusters on AWS into Google Cloud, you deploy the following components into your cluster:

  • Stackdriver Log Aggregator (stackdriver-log-aggregator-*): a Fluentd StatefulSet that sends logs to Cloud Logging.
  • Stackdriver Log Forwarder (stackdriver-log-forwarder-*): a Fluent Bit DaemonSet that forwards logs from each Kubernetes node to the Stackdriver Log Aggregator.
  • Stackdriver Metrics Collector (stackdriver-prometheus-k8s-*): a Prometheus StatefulSet with a sidecar container that sends Prometheus metrics to Cloud Monitoring. The sidecar is a second container in the same Pod that reads the metrics the Prometheus server stores on disk and writes them to your Cloud project through the Cloud Monitoring API.
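
After you install these components later in this topic, you can confirm that all three are running with a single command, for example:

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl get statefulsets,daemonsets -n kube-system | grep stackdriver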

Manifests for these components are in the anthos-samples repository on GitHub.

Prerequisites

  1. A Google Cloud project with billing enabled. For more information on costs, see Pricing for Google Cloud's operations suite.

    The project must also have the Cloud Logging and Cloud Monitoring APIs enabled.
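
    For example, you can enable both APIs with the gcloud command-line tool:

    gcloud services enable logging.googleapis.com monitoring.googleapis.com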

  2. An Anthos clusters on AWS environment, including a user cluster registered with Connect. Run the following command to verify that your cluster is registered.

    gcloud container hub memberships list
    

    If your cluster is registered, Cloud SDK prints the cluster's name and ID.

    NAME       EXTERNAL_ID
    cluster-0  1abcdef-1234-4266-90ab-123456abcdef
    

    If you do not see your cluster listed, see Connecting to a cluster with Connect.

  3. Install the git command-line tool on your machine.

Connect to the bastion host

To connect to your Anthos clusters on AWS resources, perform the following steps. Choose whether you have an existing AWS VPC (or a direct connection to your VPC) or you created a dedicated VPC when creating your management service.

Existing VPC

If you have a direct or VPN connection to an existing VPC, omit the line env HTTPS_PROXY=http://localhost:8118 \ from the commands in this topic.

Dedicated VPC

When you create a management service in a dedicated VPC, Anthos clusters on AWS includes a bastion host in a public subnet.

To connect to your management service, perform the following steps:

  1. Change to the directory with your Anthos clusters on AWS configuration. You created this directory when Installing the management service.

    cd anthos-aws

  2. To open a tunnel to the bastion host, run the bastion-tunnel.sh script. The tunnel forwards to localhost:8118.

    ./bastion-tunnel.sh -N
    

    Messages from the SSH tunnel appear in this window. When you are ready to close the connection, stop the process by using Control+C or closing the window.

  3. Open a new terminal and change into your anthos-aws directory.

    cd anthos-aws
  4. Check that you're able to connect to the cluster with kubectl.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl cluster-info
    

    The output includes the URL for the management service API server.

Setting up Google Cloud's operations suite

Before you configure your cluster, perform the following steps.

  1. Create a Cloud Monitoring workspace for your Google Cloud project.

  2. Clone the sample repository and change into the anthos-samples/aws-logging-monitoring directory.

    git clone https://github.com/GoogleCloudPlatform/anthos-samples
    cd anthos-samples/aws-logging-monitoring
    
  3. Set an environment variable to the Google Cloud project where you registered your cluster.

    PROJECT_ID="PROJECT_ID"
    

    Replace PROJECT_ID with your project ID.
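
    If the gcloud command-line tool is already configured for the correct project, you can read the value from your active configuration instead:

    PROJECT_ID="$(gcloud config get-value project)"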

  4. Use the gcloud command-line tool to create a Google Cloud service account with permissions to write metrics and logs to the Cloud Monitoring and Cloud Logging APIs.

    gcloud iam service-accounts create anthos-lm-forwarder
    
    gcloud projects add-iam-policy-binding $PROJECT_ID \
      --member="serviceAccount:anthos-lm-forwarder@${PROJECT_ID}.iam.gserviceaccount.com" \
      --role=roles/logging.logWriter
    
    gcloud projects add-iam-policy-binding $PROJECT_ID \
      --member="serviceAccount:anthos-lm-forwarder@${PROJECT_ID}.iam.gserviceaccount.com" \
      --role=roles/monitoring.metricWriter
    
  5. Create and download a key for the service account you just created.

    gcloud iam service-accounts keys create credentials.json \
      --iam-account anthos-lm-forwarder@${PROJECT_ID}.iam.gserviceaccount.com
    
  6. From your anthos-aws directory, use anthos-gke to switch context to your user cluster.

    cd anthos-aws
    env HTTPS_PROXY=http://localhost:8118 \
      anthos-gke aws clusters get-credentials CLUSTER_NAME
    Replace CLUSTER_NAME with your user cluster name.

  7. Use the kubectl command-line tool to create a Secret in the cluster with the key's contents.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl create secret generic google-cloud-credentials \
      -n kube-system --from-file credentials.json
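
    Optionally, confirm that the Secret exists:

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl get secret google-cloud-credentials -n kube-system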
    

Installing the logging aggregator and forwarder

In this section, you install the Stackdriver Log Aggregator and Stackdriver Log Forwarder onto your cluster.

  1. From the anthos-samples/aws-logging-monitoring/ directory, change into the logging/ directory.

    cd logging/
    
  2. Edit the file aggregator.yaml. At the bottom of the file, find and edit the following variables:

    project_id PROJECT_ID
    k8s_cluster_name CLUSTER_NAME
    k8s_cluster_location GC_REGION
    

    Replace the following:

    • PROJECT_ID: your project ID.
    • CLUSTER_NAME: the name of your cluster, for example, cluster-0.
    • GC_REGION: the Google Cloud region where you want to store logs, for example, us-central1. Choose a region near your AWS region. For more information, see Global Locations - Regions & Zones.
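
    For example, with a hypothetical project my-project and cluster cluster-0 storing logs in us-central1, the edited lines would read:

    project_id my-project
    k8s_cluster_name cluster-0
    k8s_cluster_location us-central1
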
  3. (Optional) Based upon your workloads, the number of nodes in your cluster, and the number of pods per node, you might have to set memory and CPU resource requests. For more information, see Recommended CPU and memory allocations.

  4. From your anthos-aws directory, use anthos-gke to switch context to your user cluster.

    cd anthos-aws
    env HTTPS_PROXY=http://localhost:8118 \
      anthos-gke aws clusters get-credentials CLUSTER_NAME
    Replace CLUSTER_NAME with your user cluster name.

  5. Deploy the log aggregator and forwarder to the cluster.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl apply -f aggregator.yaml
    
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl apply -f forwarder.yaml
    
  6. Use kubectl to verify that the pods have started up.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl get pods -n kube-system | grep stackdriver-log
    

    You should see two aggregator Pods and one forwarder Pod per node. For example, in a six-node cluster, you should see two aggregator Pods and six forwarder Pods.

    stackdriver-log-aggregator-0                 1/1     Running   0          139m
    stackdriver-log-aggregator-1                 1/1     Running   0          139m
    stackdriver-log-forwarder-2vlxb              1/1     Running   0          139m
    stackdriver-log-forwarder-dwgb7              1/1     Running   0          139m
    stackdriver-log-forwarder-rfrdk              1/1     Running   0          139m
    stackdriver-log-forwarder-sqz7b              1/1     Running   0          139m
    stackdriver-log-forwarder-w4dhn              1/1     Running   0          139m
    stackdriver-log-forwarder-wrfg4              1/1     Running   0          139m
    
  7. Use kubectl to verify that logs are being sent to Google Cloud.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl logs stackdriver-log-aggregator-0 -n kube-system
    

    The output includes informational messages confirming that logs were sent to Google Cloud.

    2021-01-02 03:04:05 +0000 [info]: #3 [google_cloud] Successfully sent gRPC to Stackdriver Logging API.
    

Testing log forwarding

In this section, you deploy a workload containing a basic HTTP web server with a load generator to your cluster. You then test that logs are present in Cloud Logging.

Before installing this workload, you can review the manifests for the web server and load generator.

  1. Deploy the web server and load generator to your cluster.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl apply -f  https://raw.githubusercontent.com/GoogleCloudPlatform/istio-samples/master/sample-apps/helloserver/server/server.yaml
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/istio-samples/master/sample-apps/helloserver/loadgen/loadgen.yaml
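
    Optionally, confirm that the sample Pods are running before you query for their logs:

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl get pods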
    
  2. To verify that you can view logs from your cluster in the Cloud Logging dashboard, go to the Logs Explorer in the Google Cloud Console:

    Go to Logs Explorer

  3. Copy the sample query below into the Query builder field.

    resource.type="k8s_container" resource.labels.cluster_name="CLUSTER_NAME"
    

    Replace CLUSTER_NAME with your cluster name.

  4. Click Run query. You should see recent cluster logs appear under Query results.


  5. After you have confirmed the logs appear in query results, remove the load generator and web server.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/istio-samples/master/sample-apps/helloserver/loadgen/loadgen.yaml
    
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/istio-samples/master/sample-apps/helloserver/server/server.yaml
    

Installing the metrics collector

In this section, you install an agent to send data to Cloud Monitoring.

  1. From the anthos-samples/aws-logging-monitoring/logging/ directory, change into the anthos-samples/aws-logging-monitoring/monitoring/ directory.

    cd ../monitoring
    
  2. Open prometheus.yaml in a text editor. This file contains a manifest with configuration for the Stackdriver-Prometheus sidecar.

    Find the StatefulSet named stackdriver-prometheus-k8s. Under spec.args, set the following variables to match your environment.

    "--stackdriver.project-id=PROJECT_ID"
    "--stackdriver.kubernetes.location=GC_REGION"
    "--stackdriver.generic.location=GC_REGION"
    "--stackdriver.kubernetes.cluster-name=CLUSTER_NAME"
    

    Replace the following:

    • PROJECT_ID: your project ID.
    • GC_REGION: the Google Cloud region where you want to store logs, for example, us-central1. Choose a region near your AWS region. For more information, see Global Locations - Regions & Zones.
    • CLUSTER_NAME: the name of your cluster.
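
    For example, with the same hypothetical values as before (my-project, us-central1, and cluster-0), the edited arguments would read:

    "--stackdriver.project-id=my-project"
    "--stackdriver.kubernetes.location=us-central1"
    "--stackdriver.generic.location=us-central1"
    "--stackdriver.kubernetes.cluster-name=cluster-0"
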
  3. (Optional) Based upon your workloads, the number of nodes in your cluster, and the number of pods per node, you might have to set memory and CPU resource requests. For more information, see Recommended CPU and memory allocations.

  4. Deploy the stackdriver-prometheus StatefulSet and the exporter sidecar to your cluster.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl apply -f server-configmap.yaml
    
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl apply -f sidecar-configmap.yaml
    
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl apply -f prometheus.yaml
    
  5. Use the kubectl tool to verify that the stackdriver-prometheus Pod is running.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl get pods -n kube-system | grep stackdriver-prometheus
    

    The kubectl tool outputs the status of the stackdriver-prometheus Pod.

    stackdriver-prometheus-k8s-0         2/2     Running   0          5h24m
    
  6. Use the kubectl tool to get the stackdriver-prometheus logs and verify that the pod has started up.

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl logs stackdriver-prometheus-k8s-0 -n kube-system \
      stackdriver-prometheus-sidecar
    

    The output includes status messages from the workload.

    level=info ts=2021-01-02T03:04:05.678Z caller=main.go:598 msg="Web server started"
    level=info ts=2021-01-02T03:04:05.678Z caller=main.go:579 msg="Stackdriver client started"
    
  7. To verify that your cluster metrics are being exported to Cloud Monitoring, go to the Metrics Explorer in the Google Cloud Console:

    Go to Metrics Explorer

  8. In Metrics Explorer, click Query editor and then copy in the following query:

    fetch k8s_container
    | metric 'kubernetes.io/anthos/up'
    | filter
        resource.project_id == 'PROJECT_ID'
        && (resource.cluster_name =='CLUSTER_NAME')
    | group_by 1m, [value_up_mean: mean(value.up)]
    | every 1m
    

    Replace the following:

    • PROJECT_ID: your project ID.
    • CLUSTER_NAME: the cluster name you used when Creating a user cluster, for example, cluster-0.
  9. Click Run query. You should see 1.0 in the chart, indicating that the cluster is up.


Creating a Dashboard in Cloud Monitoring

In this section, you create a Cloud Monitoring dashboard that monitors container status in your cluster.

  1. From the anthos-samples/aws-logging-monitoring/monitoring/ directory, change into the anthos-samples/aws-logging-monitoring/monitoring/dashboards directory.

    cd dashboards
    
  2. Replace instances of the CLUSTER_NAME placeholder in pod-status.json with your cluster name:

    sed -i "s/CLUSTER_NAME/CLUSTER_NAME/g" pod-status.json
    

    In the sed expression, replace the second CLUSTER_NAME with your cluster name.

  3. Create a custom dashboard with the configuration file by running the following command:

    gcloud monitoring dashboards create --config-from-file=pod-status.json
    
  4. To verify that your dashboard is created, go to Cloud Monitoring Dashboards in the Google Cloud Console.

    Go to Dashboards

    Open the newly created dashboard with a name in the format of CLUSTER_NAME (Anthos cluster on AWS) pod status.
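
    If your version of the gcloud command-line tool includes the monitoring dashboards commands, you can also list your dashboards from the command line:

    gcloud monitoring dashboards list --format="value(displayName)"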

Cleaning up

In this section, you remove the logging and monitoring components from your cluster.

  1. Delete the monitoring dashboard in the Dashboards list view in the Google Cloud Console by clicking the delete button associated with the dashboard name.

  2. Change into the anthos-samples/aws-logging-monitoring/ directory.

    cd anthos-samples/aws-logging-monitoring
    
  3. To remove all the resources created in this guide, run the following commands:

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl delete -f logging/
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl delete -f monitoring/
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl delete secret google-cloud-credentials -n kube-system
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl delete -n kube-system \
        pvc/stackdriver-log-aggregator-persistent-volume-claim-stackdriver-log-aggregator-0
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl delete -n kube-system \
        pvc/stackdriver-log-aggregator-persistent-volume-claim-stackdriver-log-aggregator-1
    env HTTPS_PROXY=http://localhost:8118 \
      kubectl delete -n kube-system \
        pvc/stackdriver-prometheus-data-stackdriver-prometheus-k8s-0
    
    rm credentials.json
    gcloud iam service-accounts delete \
      anthos-lm-forwarder@${PROJECT_ID}.iam.gserviceaccount.com
    

Recommended CPU and memory allocations

This section includes recommended CPU and memory allocations for the individual components used in logging and monitoring. Each of the following tables lists CPU and memory requests for a cluster with a range of node counts. You set resource requests for a component in the file listed in the table.

For more information, see Kubernetes best practices: Resource requests and limits and Managing Resources for Containers.
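
As an alternative to editing the manifests before you deploy them, you can adjust a running component with kubectl set resources. For example, the following sketch applies the forwarder values from the tables below, assuming the DaemonSet is named stackdriver-log-forwarder as deployed earlier:

    env HTTPS_PROXY=http://localhost:8118 \
      kubectl set resources daemonset stackdriver-log-forwarder -n kube-system \
        --requests=cpu=30m,memory=100Mi --limits=cpu=100m,memory=600Mi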

1-3 Nodes

File                        Resource                        CPU Requests  CPU Limits  Memory Requests  Memory Limits
monitoring/prometheus.yaml  prometheus-server               100m          390m        330Mi            N/A
monitoring/prometheus.yaml  stackdriver-prometheus-sidecar  25m           340m        370Mi            N/A
logging/aggregator.yaml     stackdriver-log-aggregator      70m           170m        1150Mi           N/A
logging/forwarder.yaml      stackdriver-log-forwarder       30m           100m        100Mi            600Mi

4-10 Nodes

File                        Resource                        CPU Requests  CPU Limits  Memory Requests  Memory Limits
monitoring/prometheus.yaml  prometheus-server               120m          390m        420Mi            N/A
monitoring/prometheus.yaml  stackdriver-prometheus-sidecar  40m           340m        400Mi            N/A
logging/aggregator.yaml     stackdriver-log-aggregator      70m           170m        1300Mi           N/A
logging/forwarder.yaml      stackdriver-log-forwarder       50m           100m        100Mi            600Mi

11-25 Nodes

File                        Resource                        CPU Requests  CPU Limits  Memory Requests  Memory Limits
monitoring/prometheus.yaml  prometheus-server               150m          390m        500Mi            N/A
monitoring/prometheus.yaml  stackdriver-prometheus-sidecar  200m          340m        600Mi            N/A
logging/aggregator.yaml     stackdriver-log-aggregator      80m           300m        1500Mi           N/A
logging/forwarder.yaml      stackdriver-log-forwarder       60m           100m        100Mi            600Mi

26-50 Nodes

File                        Resource                        CPU Requests  CPU Limits  Memory Requests  Memory Limits
monitoring/prometheus.yaml  prometheus-server               150m          390m        650Mi            N/A
monitoring/prometheus.yaml  stackdriver-prometheus-sidecar  200m          340m        1500Mi           N/A
logging/aggregator.yaml     stackdriver-log-aggregator      150m          170m        1600Mi           N/A
logging/forwarder.yaml      stackdriver-log-forwarder       60m           100m        100Mi            600Mi

51-100 Nodes

File                        Resource                        CPU Requests  CPU Limits  Memory Requests  Memory Limits
monitoring/prometheus.yaml  prometheus-server               160m          500m        1800Mi           N/A
monitoring/prometheus.yaml  stackdriver-prometheus-sidecar  200m          500m        1900Mi           N/A
logging/aggregator.yaml     stackdriver-log-aggregator      220m          1100m       1600Mi           N/A
logging/forwarder.yaml      stackdriver-log-forwarder       60m           100m        100Mi            600Mi

More than 100 nodes

File                        Resource                        CPU Requests  CPU Limits  Memory Requests  Memory Limits
monitoring/prometheus.yaml  prometheus-server               400m          N/A         6500Mi           N/A
monitoring/prometheus.yaml  stackdriver-prometheus-sidecar  400m          N/A         7500Mi           N/A
logging/aggregator.yaml     stackdriver-log-aggregator      450m          N/A         1700Mi           N/A
logging/forwarder.yaml      stackdriver-log-forwarder       60m           100m        100Mi            600Mi

What's next?

  • Learn about Cloud Logging.
  • Learn about Cloud Monitoring.