This topic shows you how to export logs and metrics from a GKE on AWS user cluster to Cloud Logging and Cloud Monitoring.
Overview
There are multiple options for logging and monitoring with GKE on AWS. GKE Enterprise can be integrated with Cloud Logging and Cloud Monitoring. Because GKE Enterprise is based on open-source Kubernetes, many open-source and third-party tools are compatible.
Logging and monitoring options
You have several logging and monitoring options for your GKE Enterprise cluster:
Deploy the Cloud Logging and Cloud Monitoring agents to monitor and view logs from your workloads in the Google Cloud console. This topic explains this solution.
Use open source tools such as Prometheus, Grafana, and Elasticsearch. This topic does not describe this solution.
Use third-party solutions such as Datadog. This topic does not describe this solution.
Cloud Logging and Cloud Monitoring
With GKE Enterprise, Cloud Logging, and Cloud Monitoring, you can create dashboards, send alerts, and monitor and review logs for the workloads running on your cluster. You must configure the Cloud Logging and Cloud Monitoring agents to collect logs and metrics into your Google Cloud project. If you do not configure these agents, GKE on AWS does not collect logging or monitoring data.
What data is collected
When configured, the agents collect logs and metric data from your cluster and the workloads running on your cluster. This data is stored in your Google Cloud project. You configure the project ID in the project_id field of a configuration file when you install the log forwarder.
The data collected includes the following:
- Logs for system services on each of the worker nodes.
- Application logs for all workloads running on the cluster.
- Metrics for the cluster and system services. For more information on specific metrics, see GKE Enterprise metrics.
- Application metrics for Pods, if your applications are configured with Prometheus scrape targets and annotated with prometheus.io/scrape, prometheus.io/path, and prometheus.io/port (see the example after this list).
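For example, a Pod that exposes Prometheus metrics might be annotated as follows. This is a minimal sketch; the Pod name, image, metrics path, and port are assumptions for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: metrics-example            # hypothetical Pod name
  annotations:
    prometheus.io/scrape: "true"   # collect application metrics from this Pod
    prometheus.io/path: "/metrics" # assumed path where the app serves metrics
    prometheus.io/port: "8080"     # assumed port where the app serves metrics
spec:
  containers:
  - name: app
    image: example.com/metrics-app:1.0   # hypothetical image that exposes Prometheus metrics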
The agents can be disabled at any time. For more information, see Cleaning up. Data collected by the agents can be managed and deleted like any other metric and log data, as described in the Cloud Monitoring and Cloud Logging documentation.
Log data is stored according to your configured retention rules. Metrics data retention varies based on type.
Logging and monitoring components
To export cluster-level telemetry from GKE on AWS into Google Cloud, you deploy the following components into your cluster:
- Stackdriver Log Forwarder (stackdriver-log-forwarder-*). A Fluent Bit DaemonSet that forwards logs from each Kubernetes node to Cloud Logging.
- GKE Metrics Agent (gke-metrics-agent-*). An OpenTelemetry Collector-based DaemonSet that collects metrics data and forwards it to Cloud Monitoring.
Manifests for these components are in the anthos-samples repository on GitHub.
Prerequisites
A Google Cloud project with billing enabled. For more information on costs, see Pricing for Google Cloud Observability.
The project must also have the Cloud Logging and Cloud Monitoring APIs enabled. To enable these APIs, run the following commands:
gcloud services enable logging.googleapis.com
gcloud services enable monitoring.googleapis.com
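To confirm that both APIs are enabled, you can list the enabled services and filter the output, for example:
gcloud services list --enabled | grep -E 'logging|monitoring'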
A GKE on AWS environment, including a user cluster registered with Connect. Run the following command to verify that your cluster is registered.
gcloud container fleet memberships list
If your cluster is registered, the Google Cloud CLI prints the cluster's name and ID.
NAME         EXTERNAL_ID
cluster-0    1abcdef-1234-4266-90ab-123456abcdef
If you do not see your cluster listed, see Connecting to a cluster with Connect.
Install the git command-line tool on your machine.
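The examples in this topic use manifests from the anthos-samples repository mentioned earlier. With git installed, you can clone the repository and change into the logging and monitoring directory, for example:
git clone https://github.com/GoogleCloudPlatform/anthos-samples.git
cd anthos-samples/aws-logging-monitoring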
Setting up permissions for Google Cloud Observability
Logging and monitoring agents use Fleet Workload Identity to communicate with Cloud Logging and Cloud Monitoring. The identity needs permissions to write logs and metrics in your project. To add the permissions, run the following commands:
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:PROJECT_ID.svc.id.goog[kube-system/stackdriver]" \
--role=roles/logging.logWriter
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:PROJECT_ID.svc.id.goog[kube-system/stackdriver]" \
--role=roles/monitoring.metricWriter
Replace PROJECT_ID with your Google Cloud project ID.
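To confirm the bindings, you can inspect the project's IAM policy. The following is a quick check that filters for the stackdriver identity used above:
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:stackdriver" \
  --format="table(bindings.role, bindings.members)"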
Connect to the bastion host
To connect to your GKE on AWS resources, perform the following steps. Choose the instructions that apply, depending on whether you have an existing AWS VPC (or a direct connection to your VPC) or you created a dedicated VPC when creating your management service.
Existing VPC
If you have a direct or VPN connection to an existing VPC, omit the line env HTTPS_PROXY=http://localhost:8118 from the commands in this topic.
Dedicated VPC
When you create a management service in a dedicated VPC, GKE on AWS includes a bastion host in a public subnet.
To connect to your management service, perform the following steps:
Change to the directory with your GKE on AWS configuration. You created this directory when Installing the management service.
cd anthos-aws
To open the tunnel, run the bastion-tunnel.sh script. The tunnel forwards to localhost:8118. To open a tunnel to the bastion host, run the following command:
./bastion-tunnel.sh -N
Messages from the SSH tunnel appear in this window. When you are ready to close the connection, stop the process by using Control+C or closing the window.
Open a new terminal and change into your anthos-aws directory.
cd anthos-aws
Check that you're able to connect to the cluster with kubectl.
env HTTPS_PROXY=http://localhost:8118 \
  kubectl cluster-info
The output includes the URL for the management service API server.
Cloud Logging and Cloud Monitoring on control plane nodes
With GKE on AWS 1.8.0 and higher, Cloud Logging and Cloud Monitoring for control plane nodes can be automatically configured when creating new user clusters. To enable Cloud Logging or Cloud Monitoring, you populate the controlPlane.cloudOperations section of your AWSCluster configuration.
cloudOperations:
projectID: PROJECT_ID
location: GC_REGION
enableLogging: ENABLE_LOGGING
enableMonitoring: ENABLE_MONITORING
Replace the following:
- PROJECT_ID: your project ID.
- GC_REGION: the Google Cloud region where you want to store logs. Choose a region that is near the AWS region. For more information, see Global Locations - Regions & Zones. For example, us-central1.
- ENABLE_LOGGING: true or false, whether Cloud Logging is enabled on control plane nodes.
- ENABLE_MONITORING: true or false, whether Cloud Monitoring is enabled on control plane nodes.
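The following abbreviated sketch shows where this section sits inside an AWSCluster manifest. Only the relevant nesting is shown; other required fields are omitted, and the exact apiVersion depends on your GKE on AWS version.
apiVersion: multicloud.cluster.gke.io/v1
kind: AWSCluster
metadata:
  name: cluster-0                # example cluster name
spec:
  controlPlane:
    # ... other control plane settings ...
    cloudOperations:
      projectID: PROJECT_ID
      location: GC_REGION
      enableLogging: true        # or false
      enableMonitoring: true     # or false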
Next, follow the steps in Creating a custom user cluster.
Cloud Logging and Cloud Monitoring on worker nodes
Removing the previous version
If you have setup an earlier version of the logging and monitoring agents that
includes stackdriver-log-aggregator
(Fluentd) and stackdriver-prometheus-k8s
(Prometheus), you might want to uninstall them first before moving on.
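To check whether the older agents are still present, you can list their workloads. This is a sketch; the kube-system namespace and resource kinds are assumptions.
env HTTPS_PROXY=http://localhost:8118 \
  kubectl get daemonsets,statefulsets,deployments -n kube-system | grep -E 'stackdriver-log-aggregator|stackdriver-prometheus'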
Installing the logging forwarder
In this section, you install the Stackdriver Log Forwarder onto your cluster.
From the anthos-samples/aws-logging-monitoring/ directory, change into the logging/ directory.
cd logging/
Modify the file forwarder.yaml to match your project configuration:
sed -i "s/PROJECT_ID/PROJECT_ID/g" forwarder.yaml
sed -i "s/CLUSTER_NAME/CLUSTER_NAME/g" forwarder.yaml
sed -i "s/CLUSTER_LOCATION/GC_REGION/g" forwarder.yaml
Replace the following:
- PROJECT_ID: your project ID.
- CLUSTER_NAME: the name of your cluster. For example, cluster-0.
- GC_REGION: the Google Cloud region where you want to store logs. Choose a region that is near the AWS region. For more information, see Global Locations - Regions & Zones. For example, us-central1.
(Optional) Based on your workloads, the number of nodes in your cluster, and the number of pods per node, you might have to set memory and CPU resource requests. For more information, see Recommended CPU and memory allocations.
From your anthos-aws directory, use anthos-gke to switch context to your user cluster.
cd anthos-aws
env HTTPS_PROXY=http://localhost:8118 \
  anthos-gke aws clusters get-credentials CLUSTER_NAME
Replace CLUSTER_NAME with your user cluster name.
Create the stackdriver service account if it does not exist and deploy the log forwarder to the cluster.
env HTTPS_PROXY=http://localhost:8118 \
  kubectl create serviceaccount stackdriver -n kube-system
env HTTPS_PROXY=http://localhost:8118 \
  kubectl apply -f forwarder.yaml
Use kubectl to verify that the pods have started up.
env HTTPS_PROXY=http://localhost:8118 \
  kubectl get pods -n kube-system | grep stackdriver-log
You should see one forwarder pod per node in a node pool. For example, in a 6-node cluster, you should see six forwarder pods.
stackdriver-log-forwarder-2vlxb   2/2     Running   0          21s
stackdriver-log-forwarder-dwgb7   2/2     Running   0          21s
stackdriver-log-forwarder-rfrdk   2/2     Running   0          21s
stackdriver-log-forwarder-sqz7b   2/2     Running   0          21s
stackdriver-log-forwarder-w4dhn   2/2     Running   0          21s
stackdriver-log-forwarder-wrfg4   2/2     Running   0          21s
Testing log forwarding
In this section, you deploy a workload containing a basic HTTP web server with a load generator to your cluster. You then test that logs are present in Cloud Logging.
Before installing this workload, you can verify the manifests for the web server and load generator.
Deploy the web server and load generator to your cluster.
env HTTPS_PROXY=http://localhost:8118 \
  kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/istio-samples/master/sample-apps/helloserver/server/server.yaml
env HTTPS_PROXY=http://localhost:8118 \
  kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/istio-samples/master/sample-apps/helloserver/loadgen/loadgen.yaml
To verify that you can view logs from your cluster in the Cloud Logging dashboard, go to the Logs Explorer in the Google Cloud console:
Copy the sample query below into the Query builder field.
resource.type="k8s_container"
resource.labels.cluster_name="CLUSTER_NAME"
Replace CLUSTER_NAME with your cluster name.
Click Run query. You should see recent cluster logs appear under Query results.
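You can also run an equivalent query from the command line with the Google Cloud CLI, for example:
gcloud logging read \
  'resource.type="k8s_container" AND resource.labels.cluster_name="CLUSTER_NAME"' \
  --project=PROJECT_ID --limit=5
Replace CLUSTER_NAME with your cluster name and PROJECT_ID with your project ID.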
After you have confirmed the logs appear in query results, remove the load generator and web server.
env HTTPS_PROXY=http://localhost:8118 \
  kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/istio-samples/master/sample-apps/helloserver/loadgen/loadgen.yaml
env HTTPS_PROXY=http://localhost:8118 \
  kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/istio-samples/master/sample-apps/helloserver/server/server.yaml
Installing the metrics collector
In this section, you install an agent to send data to Cloud Monitoring.
From the anthos-samples/aws-logging-monitoring/logging/ directory, change into the anthos-samples/aws-logging-monitoring/monitoring/ directory.
cd ../monitoring
Modify the file gke-metrics-agent.yaml to match your project configuration:
sed -i "s/PROJECT_ID/PROJECT_ID/g" gke-metrics-agent.yaml
sed -i "s/CLUSTER_NAME/CLUSTER_NAME/g" gke-metrics-agent.yaml
sed -i "s/CLUSTER_LOCATION/GC_REGION/g" gke-metrics-agent.yaml
Replace the following:
- PROJECT_ID: your project ID.
- CLUSTER_NAME: the name of your cluster. For example, cluster-0.
- GC_REGION: the Google Cloud region where you want to store logs. Choose a region that is near the AWS region. For more information, see Global Locations - Regions & Zones. For example, us-central1.
(Optional) Based on your workloads, the number of nodes in your cluster, and the number of pods per node, you might have to set memory and CPU resource requests. For more information, see Recommended CPU and memory allocations.
Create the stackdriver service account if it does not exist and deploy the metrics agent to your cluster.
env HTTPS_PROXY=http://localhost:8118 \
  kubectl create serviceaccount stackdriver -n kube-system
env HTTPS_PROXY=http://localhost:8118 \
  kubectl apply -f gke-metrics-agent.yaml
Use the kubectl tool to verify that the gke-metrics-agent Pods are running.
env HTTPS_PROXY=http://localhost:8118 \
  kubectl get pods -n kube-system | grep gke-metrics-agent
You should see one agent pod per node in a node pool. For example, in a 3-node cluster, you should see three agent pods.
gke-metrics-agent-gjxdj   2/2     Running   0          102s
gke-metrics-agent-lrnzl   2/2     Running   0          102s
gke-metrics-agent-s6p47   2/2     Running   0          102s
To verify that your cluster metrics are being exported to Cloud Monitoring, go to the Metrics Explorer in the Google Cloud console:
In Metrics Explorer, click Query editor and then copy in the following command:
fetch k8s_container
| metric 'kubernetes.io/anthos/otelcol_exporter_sent_metric_points'
| filter resource.project_id == 'PROJECT_ID'
    && (resource.cluster_name == 'CLUSTER_NAME')
| align rate(1m)
| every 1m
Replace the following:
- PROJECT_ID: your project ID.
- CLUSTER_NAME: the cluster name you used when Creating a user cluster. For example, cluster-0.
Click Run query. The rate of metric points sent to Cloud Monitoring from each gke-metrics-agent pod in your cluster appears.
Other metrics worth trying include, but are not limited to, the following (see the query sketch after this list):
- kubernetes.io/anthos/container_memory_working_set_bytes: container memory usage.
- kubernetes.io/anthos/container_cpu_usage_seconds_total: container CPU usage.
- kubernetes.io/anthos/apiserver_aggregated_request_total: kube-apiserver request count; only available if Cloud Monitoring is enabled on the control plane.
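For example, the following query charts the per-container CPU usage rate for your cluster. It is a sketch that mirrors the query above; replace PROJECT_ID and CLUSTER_NAME as before.
fetch k8s_container
| metric 'kubernetes.io/anthos/container_cpu_usage_seconds_total'
| filter resource.project_id == 'PROJECT_ID'
    && (resource.cluster_name == 'CLUSTER_NAME')
| align rate(1m)
| every 1m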
For a complete list of available metrics, see Anthos Metrics. For information on how to use the user interface, see Metrics Explorer.
Creating a Dashboard in Cloud Monitoring
In this section, you create a Cloud Monitoring dashboard that monitors container status in your cluster.
From the anthos-samples/aws-logging-monitoring/monitoring/ directory, change into the anthos-samples/aws-logging-monitoring/monitoring/dashboards directory.
cd dashboards
Replace instances of the CLUSTER_NAME string in pod-status.json with your cluster name.
sed -i "s/CLUSTER_NAME/CLUSTER_NAME/g" pod-status.json
Replace CLUSTER_NAME with your cluster name.
Create a custom dashboard with the configuration file by running the following command:
gcloud monitoring dashboards create --config-from-file=pod-status.json
To verify that your dashboard is created, go to Cloud Monitoring Dashboards in the Google Cloud console.
Open the newly created dashboard with a name in the format of CLUSTER_NAME (Anthos cluster on AWS) pod status.
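You can also confirm from the command line that the dashboard was created by listing your dashboards, for example:
gcloud monitoring dashboards list --format="value(displayName)"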
Cleaning up
In this section, you remove the logging and monitoring components from your cluster.
Delete the monitoring dashboard in the Dashboards list view in the Google Cloud console by clicking the delete button associated with the dashboard name.
Change into the anthos-samples/aws-logging-monitoring/ directory.
cd anthos-samples/aws-logging-monitoring
To remove all the resources created in this guide, run the following commands:
env HTTPS_PROXY=http://localhost:8118 \
  kubectl delete -f logging/
env HTTPS_PROXY=http://localhost:8118 \
  kubectl delete -f monitoring/
Recommended CPU and memory allocations
This section includes recommended CPU and memory allocations for the individual components used in logging and monitoring. Each of the following tables lists CPU and memory requests for a cluster with a range of node sizes. You set resource requests for a component in the file listed in the table.
For more information, see Kubernetes best practices: Resource requests and limits and Managing Resources for Containers.
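For example, to apply the 1-10 node recommendations to the log forwarder, the container's resources block in logging/forwarder.yaml would look similar to the following sketch. The container name is an assumption; check the manifest for the actual name.
containers:
- name: stackdriver-log-forwarder     # container name is an assumption
  resources:
    requests:
      cpu: 50m          # values from the 1-10 node table below
      memory: 100Mi
    limits:
      cpu: 100m
      memory: 600Mi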
1-10 Nodes
File | Resource | CPU Requests | CPU Limits | Memory Requests | Memory Limits
---|---|---|---|---|---
monitoring/gke-metrics-agent.yaml | gke-metrics-agent | 30m | 100m | 50Mi | 500Mi
logging/forwarder.yaml | stackdriver-log-forwarder | 50m | 100m | 100Mi | 600Mi
10-100 Nodes
File | Resource | CPU Requests | CPU Limits | Memory Requests | Memory Limits
---|---|---|---|---|---
monitoring/gke-metrics-agent.yaml | gke-metrics-agent | 50m | 100m | 50Mi | 500Mi
logging/forwarder.yaml | stackdriver-log-forwarder | 60m | 100m | 100Mi | 600Mi
More than 100 nodes
File | Resource | CPU Requests | CPU Limits | Memory Requests | Memory Limits
---|---|---|---|---|---
monitoring/gke-metrics-agent.yaml | gke-metrics-agent | 50m | 100m | 100Mi | N/A
logging/forwarder.yaml | stackdriver-log-forwarder | 60m | 100m | 100Mi | 600Mi
What's next?
Learn about Cloud Logging:
- Cloud Logging overview
- Using the Logs Explorer
- Building queries for Cloud Logging
- Create logs-based metrics