Google Distributed Cloud nettest identifies connectivity issues in the
Kubernetes objects in your clusters, such as Pods, Nodes, Services, and some
external targets. nettest doesn't check connections from external targets to
Pods, Nodes, or Services. This document describes how to deploy and run nettest
with one of the manifests, nettest.yaml or nettest_rhel.yaml, in the
anthos-samples GitHub repository. Use nettest_rhel.yaml if you run
Google Distributed Cloud on Red Hat Enterprise Linux (RHEL) or CentOS. Use
nettest.yaml if you run Google Distributed Cloud on Ubuntu.
This document also describes how to interpret the logs generated by nettest
to identify connectivity problems with your clusters.
About nettest
The nettest diagnostic tool consists of the following Kubernetes objects. Each
object is specified in the nettest YAML manifest files.
- cloudprober: a DaemonSet and a Service responsible for collecting network connection status, such as error rate and latency.
- echoserver: a DaemonSet and a Service responsible for responding to cloudprober, providing it the metrics for network connectivity.
- nettest: a Pod containing the prometheus and nettest containers. prometheus collects metrics from cloudprober. nettest queries prometheus and displays the network test results in the log.
- nettest-engine: a ConfigMap to configure the nettest container in the nettest Pod.
The manifest also specifies the nettest namespace and a dedicated
ServiceAccount (along with ClusterRole and ClusterRoleBinding) to isolate
nettest from other cluster resources.
Run nettest
Deploy nettest by running the following command for your operating system.
When the nettest Pod starts, the test runs automatically. The test takes about
five minutes to complete.
For Ubuntu:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/anthos-samples/main/anthos-bm-utils/abm-nettest/nettest.yaml
For RHEL or CentOS:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/anthos-samples/main/anthos-bm-utils/abm-nettest/nettest_rhel.yaml
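If you are unsure which manifest applies, the choice can be scripted. The following is a minimal sketch, not part of nettest: the helper name nettest_manifest_for_os is hypothetical, and the strings it matches against are assumptions about how the node OS image is reported (for example, in the osImage field of a Node's status.nodeInfo).

```shell
#!/usr/bin/env bash
# Sketch only: nettest_manifest_for_os is a hypothetical helper, not part of nettest.

# Directory that holds both manifests, copied from the commands above.
BASE_URL="https://raw.githubusercontent.com/GoogleCloudPlatform/anthos-samples/main/anthos-bm-utils/abm-nettest"

# Print the manifest URL for a node OS image string such as
# "Ubuntu 22.04.3 LTS" or "Red Hat Enterprise Linux 8.6 (Ootpa)".
# The substring matching below is an assumption made for illustration.
nettest_manifest_for_os() {
  case "$1" in
    *Ubuntu*)             echo "${BASE_URL}/nettest.yaml" ;;
    *"Red Hat"*|*CentOS*) echo "${BASE_URL}/nettest_rhel.yaml" ;;
    *) echo "unrecognized node OS: $1" >&2; return 1 ;;
  esac
}

# Example use against a live cluster (commented out so the sketch has no side effects):
# os_image=$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.osImage}')
# kubectl apply -f "$(nettest_manifest_for_os "$os_image")"
```

The helper returns a nonzero status for any OS string it doesn't recognize, so a wrapping script stops instead of applying the wrong manifest.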
Get the test results
After the test has completed, which should take around five minutes after the
nettest manifest is deployed, run the following command to see the nettest
results:
kubectl -n nettest logs nettest -c nettest
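Because the test takes several minutes, it can help to check from a script whether a result is available yet. The following is a minimal sketch under stated assumptions: the helper name nettest_verdict is hypothetical, and it assumes that a finished run logs either "Metric validation passed!" or "Engine error: step validateMetrics failed", as in the log entries quoted in this document.

```shell
#!/usr/bin/env bash
# Sketch only: nettest_verdict is a hypothetical helper, not part of nettest.

# Read nettest container logs on stdin and print one of:
#   passed   the success entry was found
#   failed   the metric validation error entry was found
#   pending  neither entry was found, so the test is probably still running
nettest_verdict() {
  local log
  log=$(cat)
  if printf '%s\n' "$log" | grep -q 'Metric validation passed!'; then
    echo passed
  elif printf '%s\n' "$log" | grep -q 'Engine error: step validateMetrics failed'; then
    echo failed
  else
    echo pending
  fi
}

# Example polling loop (commented out; the 30-second interval is arbitrary):
# until [ "$(kubectl -n nettest logs nettest -c nettest | nettest_verdict)" != pending ]; do
#   sleep 30
# done
```

Other error-level entries, such as a Prometheus connection message logged at startup, don't count as a failed result in this sketch.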
While nettest is running, it sends messages like the following to stdout:
I0413 03:33:04.879141 1 collectorui.go:130] Listening on ":8999"
I0413 03:33:04.879258 1 prometheus.go:172] Running prometheus controller
E0413 03:33:04.879628 1 prometheus.go:178] Prometheus controller: failed to
retries probers: Get "http://127.0.0.1:9090/api/v1/targets": dial tcp 127.0.0.1:9090:
connect: connection refused
If nettest runs successfully without identifying any connectivity failures,
you see the following log entry:
I0211 21:58:34.689290 1 validate_metrics.go:78] Metric validation passed!
If nettest finds connection issues, it writes log entries like the following:
E0211 06:40:11.948634 1 collector.go:65] Engine error: step validateMetrics failed:
"Error rate in percentage": probe from "10.200.0.3" to "172.26.115.210:80" has value 100.000000,
threshold is 1.000000
"Error rate in percentage": probe from "10.200.0.3" to "172.26.27.229:80" has value 100.000000,
threshold is 1.000000
"Error rate in percentage": probe from "192.168.3.248" to "echoserver-hostnetwork_10.200.0.2_8080"
has value 2.007046, threshold is 1.000000
Although the default threshold is one percent (1.000000), error rates up to
five percent can be ignored safely. For example, the error rate for connectivity
from IP address 192.168.3.248 to echoserver-hostnetwork_10.200.0.2_8080 in
the preceding example is approximately two percent (2.007046). This is an
example of a reported connectivity issue that you can ignore.
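To separate ignorable entries from real failures, you can filter the log by error rate. The following is a minimal sketch, not part of nettest: the helper name nettest_failures_above is hypothetical, the five percent default comes from the guidance above, and the parsing assumes that each probe entry appears on a single line in the form probe from "SRC" to "DST" has value N, threshold is M.

```shell
#!/usr/bin/env bash
# Sketch only: nettest_failures_above is a hypothetical helper, not part of nettest.

# Read nettest logs on stdin and print "SRC DST VALUE" for each probe entry
# whose error rate is strictly greater than the tolerance.
# $1: tolerance in percent (defaults to 5)
# Assumes one probe entry per line, as described in the lead-in.
nettest_failures_above() {
  awk -v tol="${1:-5}" '
    /probe from .* to .* has value/ {
      line = $0
      sub(/.*probe from /, "", line)            # drop everything before the source
      src = line; sub(/ to .*/, "", src)        # keep the text before " to "
      dst = line; sub(/^.* to /, "", dst); sub(/ has value.*/, "", dst)
      val = line; sub(/.* has value /, "", val); sub(/,.*/, "", val)
      gsub(/"/, "", src); gsub(/"/, "", dst)    # strip the surrounding quotes
      if (val + 0 > tol + 0) print src, dst, val
    }'
}

# Example use (commented out because it needs a live cluster):
# kubectl -n nettest logs nettest -c nettest | nettest_failures_above 5
```

With the default tolerance, the two 100 percent entries from the preceding example are printed and the entry at about two percent is dropped.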
Interpret the test results
When nettest finishes and finds a connectivity issue, you see the following
entry in the nettest Pod logs:
"Error rate in percentage": probe from {src} to {dst} has value 100.000000, threshold is 1.000000
Here, {src} and {dst} can be any of the following:
- echoserver Pod IP: the connection to/from a Pod on the node.
- Node IP: the connection to/from the node.
- Service IP (see the following text for details)
In addition, {dst} can also be:
- google.com: an external connection.
- dns: the connection to a non-hostNetwork Service through DNS, that is echoserver-non-hostnetwork.nettest.svc.cluster.local.
The details for Service IP are in JSON-formatted probe entries in the log, like the following example. The following probe example shows that 172.26.27.229:80 is the address for service-clusterip. There are two probes with this targets value, one for the Pod (pod-service-clusterip) and one for the Node (node-service-clusterip).
probe { name: "node-service-clusterip" … targets { host_names: "172.26.27.229:80" }
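To find which probes target a given Service address, you can search the probe entries in the log. The following is a minimal sketch, not part of nettest: the helper name nettest_probes_for_target is hypothetical, and it assumes that in each probe entry the name field appears before the host_names field, either on the same line or on an earlier line.

```shell
#!/usr/bin/env bash
# Sketch only: nettest_probes_for_target is a hypothetical helper, not part of nettest.

# Read nettest logs on stdin and print the name of every probe whose
# host_names value equals the given address.
# $1: target address as it appears in the log, for example 172.26.27.229:80
nettest_probes_for_target() {
  awk -v addr="$1" '
    {
      # Remember the most recent probe name seen so far.
      if (match($0, /name: "[^"]*"/))
        name = substr($0, RSTART + 7, RLENGTH - 8)
      # Print that name when this line targets the requested address.
      if (index($0, "host_names: \"" addr "\""))
        print name
    }'
}

# Example use (commented out because it needs a live cluster):
# kubectl -n nettest logs nettest -c nettest | nettest_probes_for_target 172.26.27.229:80
```

If both a Pod probe and a Node probe are printed for the same address, comparing their error rates shows whether the Service is unreachable from Pods, from Nodes, or from both.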
Validate your fixes
When you have addressed all reported connectivity issues, remove the nettest
Pod and reapply the nettest manifest to rerun the connectivity tests.
For example, to rerun nettest for Ubuntu, run the following commands:
kubectl -n nettest delete pod nettest
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/anthos-samples/main/anthos-bm-utils/abm-nettest/nettest.yaml
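If you rerun the test often, the two commands can be wrapped in a small function. The following is a minimal sketch, not part of nettest: the helper name rerun_nettest is hypothetical, and the added kubectl wait step with its 120-second timeout is an assumption about a convenient way to confirm that the new Pod has started.

```shell
#!/usr/bin/env bash
# Sketch only: rerun_nettest is a hypothetical helper, not part of nettest.

# Delete the nettest Pod, reapply the given manifest, and wait for the new Pod.
# $1: manifest URL or path (nettest.yaml or nettest_rhel.yaml)
rerun_nettest() {
  local manifest="$1"
  # --ignore-not-found keeps the delete from failing if the Pod is already gone.
  kubectl -n nettest delete pod nettest --ignore-not-found || return 1
  kubectl apply -f "$manifest" || return 1
  # The 120s timeout is an arbitrary choice for this sketch.
  kubectl -n nettest wait --for=condition=Ready pod/nettest --timeout=120s
}

# Example use (commented out because it needs a live cluster):
# rerun_nettest https://raw.githubusercontent.com/GoogleCloudPlatform/anthos-samples/main/anthos-bm-utils/abm-nettest/nettest.yaml
```

A Ready Pod only means the test has started. The results still take about five minutes to appear in the log.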
Clean up nettest
When you're done testing, run the following commands to remove all nettest
resources:
kubectl delete namespace nettest
kubectl delete clusterroles nettest:nettest
kubectl delete clusterrolebindings nettest:nettest