Deploy distributed tracing to observe microservice latency

Last reviewed 2023-08-11 UTC

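This document shows how to deploy the reference architecture that's described in Use distributed tracing to observe microservice latency. The deployment uses OpenTelemetry and Cloud Trace to capture trace information for a microservice application.

The sample application in this deployment consists of two microservices that are written in Go.

This document assumes that you're familiar with the following:

The Go programming language
Google Kubernetes Engine (GKE)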

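Objectives

Create GKE clusters and deploy a sample application.
Review the OpenTelemetry instrumentation code.
Review the traces and logs that the instrumentation generates.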

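Architecture

The following diagram shows the architecture that you deploy.

Architecture of the deployment with two GKE clusters.

You use Cloud Build, a fully managed continuous integration, delivery, and deployment platform, to build container images from the sample code and store them in Artifact Registry. The GKE clusters pull the images from Artifact Registry at deployment time.

The frontend service accepts HTTP requests on the / URL and calls the backend service. The address of the backend service is defined by an environment variable.

The backend service accepts HTTP requests on the / URL and makes an outbound call to an external URL that is defined in an environment variable. After the external call completes, the backend service returns the HTTP status code (for example, 200) to the caller.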

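Costs

In this document, you use the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin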

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Make sure that billing is enabled for your Google Cloud project.

Enable the GKE, Cloud Trace, Cloud Build, Cloud Storage, and Artifact Registry APIs.

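Set up your environment

In this section, you set up your environment with the tools that you use throughout the deployment. You run all the terminal commands in this deployment from Cloud Shell.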

In the Google Cloud console, activate Cloud Shell.

At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

Set an environment variable to the ID of your Google Cloud project:

```
export PROJECT_ID=$(gcloud config list --format 'value(core.project)' 2>/dev/null)
```

Download the required files for this deployment by cloning the associated Git repository:
```
git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/observability/distributed-tracing
WORKDIR=$(pwd)
```
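You make the repository folder the working directory ($WORKDIR), from which you do all of the tasks that are related to this deployment. That way, if you don't want to keep the resources, you can delete the folder when you finish the deployment.

Install tools

In Cloud Shell, install kubectx and kubens: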
```
git clone https://github.com/ahmetb/kubectx $WORKDIR/kubectx
export PATH=$PATH:$WORKDIR/kubectx
```
You use these tools to work with multiple Kubernetes clusters, contexts, and namespaces.
In Cloud Shell, install Apache Bench, an open source load-generation tool:
```
sudo apt-get install apache2-utils
```

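Create a Docker repository

Create a Docker repository to store the sample images for this deployment.

Console

In the Google Cloud console, open the Repositories page.

Click Create Repository.
Specify distributed-tracing-docker-repo as the repository name.
Choose Docker as the format and Standard as the mode.
Under Location Type, select Region, and then choose the location us-west1.
Click Create.

The repository is added to the repository list.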

gcloud

In Cloud Shell, create a new Docker repository named distributed-tracing-docker-repo in the location us-west1, with the description Docker repository for distributed tracing deployment:

```
gcloud artifacts repositories create distributed-tracing-docker-repo --repository-format=docker \
--location=us-west1 --description="Docker repository for distributed tracing deployment"
```

Verify that the repository was created:
```
gcloud artifacts repositories list
```

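Create GKE clusters

In this section, you create two GKE clusters where you deploy the sample application. By default, GKE clusters are created with write-only access to the Cloud Trace API, so you don't need to define access when you create the clusters.

In Cloud Shell, create the clusters: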

```
gcloud container clusters create backend-cluster \
    --zone=us-west1-a \
    --verbosity=none --async

gcloud container clusters create frontend-cluster \
    --zone=us-west1-a \
    --verbosity=none
```

In this tutorial, the clusters are in the us-west1-a zone. For more information, see Geography and regions.

Get the cluster credentials and store them locally:

```
gcloud container clusters get-credentials backend-cluster --zone=us-west1-a
gcloud container clusters get-credentials frontend-cluster --zone=us-west1-a
```

Rename the contexts of the clusters so that they're easier to access later in the deployment:

```
kubectx backend=gke_${PROJECT_ID}_us-west1-a_backend-cluster
kubectx frontend=gke_${PROJECT_ID}_us-west1-a_frontend-cluster
```

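Review OpenTelemetry instrumentation

In the following sections, you review the code from the main.go file in the sample application. The review helps you learn how context propagation allows spans from multiple requests to be attached to a single parent trace.

Review the imports in the application code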

```
import (
	"context"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"os"
	"strconv"

	cloudtrace "github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/trace"
	"github.com/gorilla/mux"
	"go.opentelemetry.io/contrib/detectors/gcp"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/contrib/propagators/autoprop"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/sdk/resource"
	"go.opentelemetry.io/otel/sdk/trace"
)
```

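Note the following about the imports:

The go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp package contains the otelhttp plugin, which can instrument an HTTP server or an HTTP client. Server instrumentation retrieves the span context from the HTTP request and records a span for the server's handling of the request. Client instrumentation injects the span context into the outgoing HTTP request and records a span for the time spent waiting for a response.
The go.opentelemetry.io/contrib/propagators/autoprop package provides an implementation of the OpenTelemetry TextMapPropagator interface, which otelhttp uses to handle propagation. Propagators determine the format and keys that are used to store the trace context in a transport such as HTTP. Specifically, otelhttp passes HTTP headers to the propagator. The propagator either extracts a span context from the headers into a Go context, or encodes the span context from the Go context and injects it into the headers, depending on whether it runs on the server or the client. By default, the autoprop package injects and extracts the span context using the W3C trace context propagation format.
The github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/trace import exports traces to Cloud Trace.
The github.com/gorilla/mux import is the library that the sample application uses to handle requests.
The go.opentelemetry.io/contrib/detectors/gcp import adds attributes to spans, such as cloud.availability_zone, that identify where your application is running in Google Cloud.
The go.opentelemetry.io/otel, go.opentelemetry.io/otel/sdk/trace, and go.opentelemetry.io/otel/sdk/resource imports are used to set up OpenTelemetry.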
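
To make the propagation format more concrete, the following sketch parses a W3C traceparent header value (version 00) using only the Go standard library. It illustrates the header layout and is not the code that autoprop or otelhttp runs; the TraceParent type and the ParseTraceParent function are names invented for this example.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceParent holds the four fields of a version 00 traceparent header.
// This type is hypothetical and exists only for this illustration.
type TraceParent struct {
	Version  string // 1 byte, hex encoded (2 characters)
	TraceID  string // 16 bytes, hex encoded (32 characters)
	ParentID string // 8 bytes, hex encoded (16 characters); the caller's span ID
	Flags    string // 1 byte, hex encoded; bit 0x01 is the sampled flag
}

// ParseTraceParent splits a header value into its fields and checks that
// each field has the expected length and is valid hex.
func ParseTraceParent(h string) (TraceParent, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 {
		return TraceParent{}, errors.New("traceparent: want 4 dash-separated fields")
	}
	wantLen := []int{2, 32, 16, 2}
	for i, p := range parts {
		if len(p) != wantLen[i] {
			return TraceParent{}, fmt.Errorf("traceparent: field %d has length %d, want %d", i, len(p), wantLen[i])
		}
		if _, err := hex.DecodeString(p); err != nil {
			return TraceParent{}, fmt.Errorf("traceparent: field %d is not hex: %w", i, err)
		}
	}
	return TraceParent{Version: parts[0], TraceID: parts[1], ParentID: parts[2], Flags: parts[3]}, nil
}

// Sampled reports whether the sampled bit (0x01) is set in the flags.
func (tp TraceParent) Sampled() bool {
	b, err := hex.DecodeString(tp.Flags)
	return err == nil && len(b) == 1 && b[0]&0x01 == 1
}

func main() {
	tp, err := ParseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(tp.TraceID, tp.ParentID, tp.Sampled())
}
```

Because every service in a call chain forwards the same trace ID and replaces the parent ID with its own span ID, a backend can attach its spans to the trace that the frontend started.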

Review the `main` function

The main function sets up trace export to Cloud Trace and uses a mux router to handle requests that are made to the / URL.

```
func main() {
	ctx := context.Background()
	// Set up the Cloud Trace exporter.
	exporter, err := cloudtrace.New()
	if err != nil {
		log.Fatalf("cloudtrace.New: %v", err)
	}
	// Identify your application using resource detection.
	res, err := resource.New(ctx,
		// Use the GCP resource detector to detect information about the GKE Cluster.
		resource.WithDetectors(gcp.NewDetector()),
		resource.WithTelemetrySDK(),
	)
	if err != nil {
		log.Fatalf("resource.New: %v", err)
	}
	tp := trace.NewTracerProvider(
		trace.WithBatcher(exporter),
		trace.WithResource(res),
	)
	// Set the global TracerProvider which is used by otelhttp to record spans.
	otel.SetTracerProvider(tp)
	// Flush any pending spans on shutdown.
	defer tp.ForceFlush(ctx)

	// Set the global Propagators which is used by otelhttp to propagate
	// context using the w3c traceparent and baggage formats.
	otel.SetTextMapPropagator(autoprop.NewTextMapPropagator())

	// Handle incoming request.
	r := mux.NewRouter()
	r.HandleFunc("/", mainHandler)
	var handler http.Handler = r

	// Use otelhttp to create spans and extract context for incoming http
	// requests.
	handler = otelhttp.NewHandler(handler, "server")
	log.Fatal(http.ListenAndServe(fmt.Sprintf(":%v", os.Getenv("PORT")), handler))
}
```

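Note the following about this code:

You configure an OpenTelemetry TracerProvider, which detects attributes when it runs on Google Cloud and exports traces to Cloud Trace.
You use the otel.SetTracerProvider and otel.SetTextMapPropagator functions to set the global TracerProvider and Propagator settings. By default, instrumentation libraries such as otelhttp use the globally registered TracerProvider to create spans and the Propagator to propagate context.
You wrap the HTTP server with otelhttp.NewHandler to instrument the HTTP server.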

Review the `mainHandler` function

```
func mainHandler(w http.ResponseWriter, r *http.Request) {
	// Use otelhttp to record a span for the outgoing call, and propagate
	// context to the destination.
	destination := os.Getenv("DESTINATION_URL")
	resp, err := otelhttp.Get(r.Context(), destination)
	if err != nil {
		log.Fatal("could not fetch remote endpoint")
	}
	defer resp.Body.Close()
	_, err = ioutil.ReadAll(resp.Body)
	if err != nil {
		log.Fatalf("could not read response from %v", destination)
	}

	fmt.Fprint(w, strconv.Itoa(resp.StatusCode))
}
```

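To capture the latency of outbound requests that are made to the destination, you use the otelhttp plugin to make the HTTP request. You also pass the result of r.Context() to link the incoming request with the outgoing request, as shown in the following listing: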

```
// Use otelhttp to record a span for the outgoing call, and propagate
// context to the destination.
resp, err := otelhttp.Get(r.Context(), destination)
```

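Deploy the application

In this section, you use Cloud Build to build container images for the backend and frontend services, and then you deploy them to your GKE clusters.

Build the Docker container

In Cloud Shell, submit the build from the working directory: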

```
cd $WORKDIR
gcloud builds submit . --tag us-west1-docker.pkg.dev/$PROJECT_ID/distributed-tracing-docker-repo/backend:latest
```

Verify that the container image was created and is available in Artifact Registry:
```
gcloud artifacts docker images list us-west1-docker.pkg.dev/$PROJECT_ID/distributed-tracing-docker-repo
```
The container image was created successfully if the output is similar to the following, where PROJECT_ID is the ID of your Google Cloud project:
```
NAME
us-west1-docker.pkg.dev/PROJECT_ID/distributed-tracing-docker-repo/backend
```

Deploy the backend service

In Cloud Shell, set the kubectx context to the backend cluster:
```
kubectx backend
```

Substitute your project ID into the YAML file for the backend deployment and apply the result:

```
export PROJECT_ID=$(gcloud info --format='value(config.project)')
envsubst < backend-deployment.yaml | kubectl apply -f -
```

Confirm that the pods are running:

```
kubectl get pods
```

The output displays a Status value of Running:

```
NAME                       READY   STATUS    RESTARTS   AGE
backend-645859d95b-7mx95   1/1     Running   0          52s
backend-645859d95b-qfdnc   1/1     Running   0          52s
backend-645859d95b-zsj5m   1/1     Running   0          52s
```

Expose the backend deployment using a load balancer:

```
kubectl expose deployment backend --type=LoadBalancer
```

Get the IP address of the backend service:

```
kubectl get services backend
```

The output is similar to the following:

```
NAME      TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)          AGE
backend   LoadBalancer   10.11.247.58   34.83.88.143   8080:30714/TCP   70s
```

If the value of the EXTERNAL-IP field is <pending>, repeat the command until the value is an IP address.

Capture the IP address from the previous step in a variable:

```
export BACKEND_IP=$(kubectl get svc backend -ojson | jq -r '.status.loadBalancer.ingress[].ip')
```
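
For readers who want to see which part of the Service object the jq filter reads, the following Go sketch extracts the same path, .status.loadBalancer.ingress[].ip, with the standard library. The service type and the ingressIPs function are hypothetical, and the JSON literal is a trimmed, invented sample rather than real kubectl output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// service models only the fields that the jq filter reads.
type service struct {
	Status struct {
		LoadBalancer struct {
			Ingress []struct {
				IP string `json:"ip"`
			} `json:"ingress"`
		} `json:"loadBalancer"`
	} `json:"status"`
}

// ingressIPs returns every IP under .status.loadBalancer.ingress[].
func ingressIPs(raw []byte) ([]string, error) {
	var s service
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, err
	}
	var ips []string
	for _, in := range s.Status.LoadBalancer.Ingress {
		ips = append(ips, in.IP)
	}
	return ips, nil
}

func main() {
	// Trimmed, hypothetical output of `kubectl get svc backend -ojson`.
	raw := []byte(`{"status":{"loadBalancer":{"ingress":[{"ip":"34.83.88.143"}]}}}`)
	ips, err := ingressIPs(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(ips)
}
```

In this sketch, an empty result means that the ingress list has no entries yet, which corresponds to the <pending> state described earlier.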

Deploy the frontend service

In Cloud Shell, set the kubectx context to the frontend cluster:
```
kubectx frontend
```

Substitute your project ID into the YAML file for the frontend deployment and apply the result:

```
export PROJECT_ID=$(gcloud info --format='value(config.project)')
envsubst < frontend-deployment.yaml | kubectl apply -f -
```

Confirm that the pods are running:

```
kubectl get pods
```

The output displays a Status value of Running:

```
NAME                        READY   STATUS    RESTARTS   AGE
frontend-747b445499-v7x2w   1/1     Running   0          57s
frontend-747b445499-vwtmg   1/1     Running   0          57s
frontend-747b445499-w47pf   1/1     Running   0          57s
```

Expose the frontend deployment using a load balancer:

```
kubectl expose deployment frontend --type=LoadBalancer
```

Get the IP address of the frontend service:

```
kubectl get services frontend
```

The output is similar to the following:

```
NAME       TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)         AGE
frontend   LoadBalancer   10.27.241.93   34.83.111.232   8081:31382/TCP  70s
```

If the value of the EXTERNAL-IP field is <pending>, repeat the command until the value is an IP address.

Capture the IP address from the previous step in a variable:

```
export FRONTEND_IP=$(kubectl get svc frontend -ojson | jq -r '.status.loadBalancer.ingress[].ip')
```
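
Both deployment procedures pipe a YAML file through envsubst before applying it. As a rough illustration of that substitution step, the following Go sketch replaces $VAR and ${VAR} references by using os.Expand from the standard library. The manifest fragment is hypothetical, because the contents of backend-deployment.yaml and frontend-deployment.yaml are not shown in this document, and the substitute function is a name chosen for this example.

```go
package main

import (
	"fmt"
	"os"
)

// manifest is a hypothetical fragment; the real deployment YAML files in
// the repository might reference different variables.
const manifest = `image: us-west1-docker.pkg.dev/$PROJECT_ID/distributed-tracing-docker-repo/backend:latest`

// substitute replaces $VAR and ${VAR} references in tmpl with the values
// that lookup returns.
func substitute(tmpl string, lookup func(string) string) string {
	return os.Expand(tmpl, lookup)
}

func main() {
	// os.Getenv returns an empty string for unset variables, so run this
	// with PROJECT_ID exported to see a complete image path.
	fmt.Println(substitute(manifest, os.Getenv))
}
```

Substituting at apply time keeps the project ID out of the checked-in file, so the same file can be used in any project.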

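Generate load on the application and review traces

In this section, you use the Apache Bench utility to create requests for your application. You then review the resulting traces in Cloud Trace.

In Cloud Shell, use Apache Bench to generate 1,000 requests, with 3 requests in flight at a time: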
```
ab -c 3 -n 1000 http://${FRONTEND_IP}:8081/
```
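In the Google Cloud console, go to the Trace list page.
To review the timeline, click one of the URIs that's labeled server.

The trace contains four spans, in the following order:
- The first server span captures the end-to-end latency of handling the HTTP request in the frontend server.
- The first HTTP GET span captures the latency of the GET call that the frontend's client makes to the backend.
- The second server span captures the end-to-end latency of handling the HTTP request in the backend server.
- The second HTTP GET span captures the latency of the GET call that the backend's client makes to google.com.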

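Clean up

The easiest way to avoid incurring charges is to delete the Google Cloud project that you created for this deployment. Alternatively, you can delete the individual resources.

Delete the project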

In the Google Cloud console, go to the Manage resources page.
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the individual resources

To delete individual resources instead of deleting the whole project, run the following commands in Cloud Shell:

```
gcloud container clusters delete frontend-cluster --zone=us-west1-a
gcloud container clusters delete backend-cluster --zone=us-west1-a
gcloud artifacts repositories delete distributed-tracing-docker-repo --location us-west1
```

What's next

Learn about OpenTelemetry.
For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.
