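Create Google Distributed Cloud hybrid clusters on Compute Engine VMs using Terraform

This document shows you how to use Terraform to set up virtual machines (VMs) on Compute Engine so that you can install and try Google Distributed Cloud in high availability (HA) mode. For information about how to use the Google Cloud CLI for this task, see Try Google Distributed Cloud on Compute Engine VMs.

You can try Google Distributed Cloud quickly, without preparing any hardware. The provided Terraform scripts create a network of VMs on Compute Engine that can run Google Distributed Cloud. This tutorial uses the hybrid cluster deployment model.

To run a sample cluster, complete the following steps:

1. Run the Terraform scripts to set up a network of VMs on Compute Engine
2. Deploy a hybrid cluster
3. Verify the cluster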

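Before you begin

The deployment requires the following resources:

- A workstation with internet access and the following tools installed: Git, Google Cloud CLI, and Terraform (>= v0.15.5, < 1.2).
- A Google Cloud project.
- A service account in the project that meets one of the following requirements, with its key file downloaded to the workstation:
  1. The service account has Owner permissions.
  2. The service account has both Editor and Project IAM Admin permissions.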
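
The tool requirements above can be checked with a short script before you start. The following is a minimal sketch, not part of the sample repository: the helper names `version_in_range` and `missing_tools` are hypothetical, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils and recent BSD versions do).

```shell
#!/usr/bin/env bash
# Sketch: check the workstation tools and the Terraform version range
# (>= 0.15.5, < 1.2) named in the prerequisites.

# Returns 0 if version $1 satisfies $2 <= $1 < $3, using version sort.
version_in_range() {
  local ver="$1" min="$2" max="$3"
  # ver >= min holds when min sorts first (or the two are equal).
  [ "$(printf '%s\n%s\n' "$min" "$ver" | sort -V | head -n1)" = "$min" ] || return 1
  # ver < max holds when ver differs from max and sorts first.
  [ "$ver" != "$max" ] || return 1
  [ "$(printf '%s\n%s\n' "$ver" "$max" | sort -V | head -n1)" = "$ver" ]
}

# Prints the name of every listed tool that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing="$(missing_tools git gcloud terraform)"
if [ -n "$missing" ]; then
  echo "Missing tools: $missing"
fi

# Only inspect the Terraform version when the binary is installed.
if command -v terraform >/dev/null 2>&1; then
  tf_version="$(terraform version | head -n1 | sed 's/^Terraform v//')"
  if version_in_range "$tf_version" 0.15.5 1.2; then
    echo "Terraform $tf_version is in the supported range"
  else
    echo "Terraform $tf_version is outside the range >= 0.15.5, < 1.2"
  fi
fi
```

The script only reports problems; it does not install or change anything on the workstation.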

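Set up the VM network on Compute Engine

In this section, you use the Terraform scripts from the anthos-samples repository. The scripts configure Compute Engine with the following resources:

Six VMs to deploy the hybrid cluster:
- One admin VM used to deploy the hybrid cluster to the other machines.
- Three VMs for the three control plane nodes that run the hybrid cluster control plane.
- Two VMs for the two worker nodes that run workloads on the hybrid cluster.
A VxLAN overlay network between all nodes to emulate L2 connectivity.
SSH access to the control plane and worker nodes from the admin VM.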

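Figure: Bare metal infrastructure on Google Cloud using Compute Engine VMs

You can change the number of nodes in the cluster by editing the per-layer counts in the instance_count Terraform variable: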

###################################################################################
# The recommended instance count for High Availability (HA) is 3 for Control plane
# and 2 for Worker nodes.
###################################################################################
variable "instance_count" {
  description = "Number of instances to provision per layer (Control plane and Worker nodes) of the cluster"
  type        = map(any)
  default = {
    "controlplane" : 3
    "worker" : 2
  }
}
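
As an alternative to editing the default, Terraform lets you override a root module variable on the command line with `-var`. The sketch below builds such an argument for `instance_count`; the helper name `instance_count_var` is hypothetical, and the example assumes you run the command from the sample directory. An override replaces the whole map, so both keys must be supplied.

```shell
#!/usr/bin/env bash
# Sketch: build a -var argument that overrides the instance_count map.

# Prints an HCL map assignment for the given control plane and worker counts.
instance_count_var() {
  local controlplane="$1" worker="$2"
  printf 'instance_count={"controlplane":%d,"worker":%d}' "$controlplane" "$worker"
}

# Print (not run) a plan command that keeps 3 control plane nodes and
# raises the worker count to 4.
echo "terraform plan -var='$(instance_count_var 3 4)'"
```

Pass the same `-var` argument to `terraform apply` so that the plan and the applied configuration agree.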

Download the Terraform scripts for the anthos-bm-gcp-terraform sample:

git clone https://github.com/GoogleCloudPlatform/anthos-samples
cd anthos-samples/anthos-bm-gcp-terraform

Update the terraform.tfvars.sample file to include variables specific to your environment:

project_id       = "PROJECT_ID"
region           = "GOOGLE_CLOUD_REGION"
zone             = "GOOGLE_CLOUD_ZONE"
credentials_file = "PATH_TO_GOOGLE_CLOUD_SERVICE_ACCOUNT_KEY_FILE"
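
A placeholder that is left unedited is an easy mistake at this step. The following sketch scans a variables file for the four sample placeholder values shown above; the helper name `find_placeholders` is hypothetical and not part of the sample repository.

```shell
#!/usr/bin/env bash
# Sketch: detect sample placeholder values that were not replaced.

# Prints (with line numbers) every line that still holds a placeholder.
# Returns 0 when no placeholders remain, 1 otherwise.
find_placeholders() {
  local file="$1"
  if grep -nE '"(PROJECT_ID|GOOGLE_CLOUD_REGION|GOOGLE_CLOUD_ZONE|PATH_TO_GOOGLE_CLOUD_SERVICE_ACCOUNT_KEY_FILE)"' "$file"; then
    return 1
  fi
  return 0
}

# Usage, from the sample directory:
#   find_placeholders terraform.tfvars.sample && echo "no placeholders left"
```

The check only matches the exact quoted placeholder strings, so a real project ID that happens to contain one of these words is not flagged.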

Rename the terraform.tfvars.sample file to the default name that Terraform uses for the variables file:
```
mv terraform.tfvars.sample terraform.tfvars
```
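Note: You can skip renaming the variables file if you reference it explicitly with the -var-file flag when you run terraform plan and terraform apply later in this procedure.
Initialize the sample directory as a Terraform working directory. This sets up the required Terraform state management configuration, similar to git init: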
```
terraform init
```
Create a Terraform execution plan. This step compares the state of the resources, validates the scripts, and creates an execution plan:
```
terraform plan
```
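Apply the changes described in the Terraform scripts. This step executes the plan on the given provider (in this case, Google Cloud) to reach the desired state of resources:
```
terraform apply  # when prompted to confirm the Terraform plan, type 'yes' and press Enter
```
Note: The apply command sets up the Compute Engine VM-based infrastructure. It takes approximately 3-5 minutes to set up the infrastructure for the bare metal cluster.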
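
The init, plan, and apply steps can also be chained in a script. The sketch below saves the plan to a file and applies that exact plan, which means `terraform apply` does not ask for confirmation. The function name `provision`, the plan file name `tfplan`, and the `TERRAFORM_BIN` override are all hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run init, plan, and apply in sequence, stopping at the first failure.

# TERRAFORM_BIN is a hypothetical override that substitutes another binary,
# for example `echo`, to preview the commands without running Terraform.
provision() {
  local tf="${TERRAFORM_BIN:-terraform}"
  "$tf" init -input=false &&
    "$tf" plan -input=false -out=tfplan &&
    "$tf" apply -input=false tfplan
}

# Dry run: print the three command lines instead of executing them.
TERRAFORM_BIN=echo provision
```

To run it for real from the sample directory, call `provision` without the override. Because a saved plan is applied as is, review the output of the plan step before relying on this in an unattended script.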

Deploy the hybrid cluster

After the Terraform execution completes, you are ready to deploy a hybrid cluster.

Use SSH to connect to the admin host:
```
gcloud compute ssh tfadmin@cluster1-abm-ws0-001 --project=PROJECT_ID --zone=GOOGLE_CLOUD_ZONE
```
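You can ignore any messages about updating the VM and complete this tutorial. If you plan to keep the VMs as a test environment, you might want to update the OS or upgrade to the next release as described in the Ubuntu documentation.

Run the following code block to create the cluster1 hybrid cluster on the configured Compute Engine VMs: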
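
If you only need to run a single command on the admin host, `gcloud compute ssh` accepts a `--command` flag instead of opening an interactive session. The following sketch prints such a command. The helper names are hypothetical, and the `<cluster ID>-abm-ws0-001` host name pattern is inferred from the host name used in this tutorial rather than from a documented contract.

```shell
#!/usr/bin/env bash
# Sketch: compose a one-off remote command for the admin host.

# Prints the admin VM name for a cluster ID (pattern is an assumption).
admin_host() {
  printf '%s-abm-ws0-001' "$1"
}

# Prints (does not run) a gcloud command that executes one remote command.
print_admin_ssh() {
  local cluster_id="$1" project="$2" zone="$3" remote_cmd="$4"
  printf 'gcloud compute ssh tfadmin@%s --project=%s --zone=%s --command=%s\n' \
    "$(admin_host "$cluster_id")" "$project" "$zone" "'$remote_cmd'"
}

# Replace PROJECT_ID and GOOGLE_CLOUD_ZONE with your own values.
print_admin_ssh cluster1 PROJECT_ID GOOGLE_CLOUD_ZONE 'ls bmctl-workspace'
```

Printing the command first makes it easy to inspect the quoting before you run it.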

sudo ./run_initialization_checks.sh && \
sudo bmctl create config -c cluster1 && \
sudo cp ~/cluster1.yaml bmctl-workspace/cluster1 && \
sudo bmctl create cluster -c cluster1

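Running the bmctl command starts setting up a new hybrid cluster. This includes running preflight checks on the nodes, creating the admin and user clusters, and registering the cluster with Google Cloud using Connect Agent. The whole setup can take up to 15 minutes. You see the following output as the cluster is being created: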

    Created config: bmctl-workspace/cluster1/cluster1.yaml
    Creating bootstrap cluster... OK
    Installing dependency components... OK
    Waiting for preflight check job to finish... OK
    - Validation Category: machines and network
            - [PASSED] 10.200.0.3
            - [PASSED] 10.200.0.4
            - [PASSED] 10.200.0.5
            - [PASSED] 10.200.0.6
            - [PASSED] 10.200.0.7
            - [PASSED] gcp
            - [PASSED] node-network
    Flushing logs... OK
    Applying resources for new cluster
    Waiting for cluster to become ready OK
    Writing kubeconfig file
    kubeconfig of created cluster is at bmctl-workspace/cluster1/cluster1-kubeconfig, please run
    kubectl --kubeconfig bmctl-workspace/cluster1/cluster1-kubeconfig get nodes
    to get cluster node status.
    Please restrict access to this file as it contains authentication credentials of your cluster.
    Waiting for node pools to become ready OK
    Moving admin cluster resources to the created admin cluster
    Flushing logs... OK
    Deleting bootstrap cluster... OK
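
If you capture the output above to a file, a short filter can confirm that every preflight entry is marked `[PASSED]`. This is a sketch that assumes the bracketed status format shown in the output; the function name `failed_checks` is hypothetical, and the markers bmctl uses for failed entries are not shown in this document, so the filter simply reports anything that is not `[PASSED]`.

```shell
#!/usr/bin/env bash
# Sketch: report preflight entries that are not marked [PASSED].

# Reads bmctl output on stdin. Prints entries of the form "- [STATUS] name"
# whose status is not PASSED. Returns 0 when all entries passed, 1 otherwise.
failed_checks() {
  local bad
  bad="$(grep -E '^[[:space:]]*- \[' | grep -v '\[PASSED\]' || true)"
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 1
  fi
  return 0
}

# Example with lines taken from the output above.
failed_checks <<'EOF' && echo "all preflight checks passed"
    - Validation Category: machines and network
            - [PASSED] 10.200.0.3
            - [PASSED] gcp
            - [PASSED] node-network
EOF
```

The category line is ignored because it does not start with a bracketed status.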

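Verify and interact with the cluster

You can find your cluster's kubeconfig file on the admin machine in the bmctl-workspace directory. To verify your deployment, complete the following steps.

If you disconnected from the admin host, use SSH to connect to it again: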

# You can copy the command from the output of the Terraform execution above
gcloud compute ssh tfadmin@cluster1-abm-ws0-001 --project=PROJECT_ID --zone=GOOGLE_CLOUD_ZONE

Set the KUBECONFIG environment variable to the path of the cluster's configuration file so that you can run kubectl commands on the cluster:

export CLUSTER_ID=cluster1
export KUBECONFIG=$HOME/bmctl-workspace/$CLUSTER_ID/$CLUSTER_ID-kubeconfig
kubectl get nodes

You should see the cluster nodes listed, similar to the following output:

NAME                   STATUS   ROLES    AGE   VERSION
cluster1-abm-cp1-001   Ready    master   17m   v1.18.6-gke.6600
cluster1-abm-cp2-001   Ready    master   16m   v1.18.6-gke.6600
cluster1-abm-cp3-001   Ready    master   16m   v1.18.6-gke.6600
cluster1-abm-w1-001    Ready    <none>   14m   v1.18.6-gke.6600
cluster1-abm-w2-001    Ready    <none>   14m   v1.18.6-gke.6600
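
To check the node list in a script rather than by eye, you can count the rows whose STATUS column is `Ready`. The sketch below assumes the column layout shown above; the function name `count_ready_nodes` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count Ready nodes in `kubectl get nodes` output.

# Reads the table on stdin, skips the header row, and prints the number of
# rows whose second column (STATUS) is exactly "Ready".
count_ready_nodes() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage on the admin host, with KUBECONFIG set as described above:
#   kubectl get nodes | count_ready_nodes
```

With the default layout of three control plane nodes and two worker nodes, a healthy cluster should report 5. As an alternative, `kubectl wait --for=condition=Ready nodes --all --timeout=300s` blocks until every node reports Ready or the timeout expires.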

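Log in to your cluster from the Google Cloud console

To observe your workloads in the Google Cloud console, you must log in to the cluster.

For detailed instructions about logging in to your cluster, see Work with clusters from the Google Cloud console.

Clean up

You can clean up the cluster setup in two ways.

Console

If you created a dedicated project for this procedure, delete the Google Cloud project from the Google Cloud console.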

Terraform

Unregister the cluster before you delete all the resources created by Terraform.

# Use SSH to connect to the admin host
gcloud compute ssh tfadmin@cluster1-abm-ws0-001 --project=PROJECT_ID --zone=GOOGLE_CLOUD_ZONE

# Reset the cluster
export CLUSTER_ID=cluster1
export KUBECONFIG=$HOME/bmctl-workspace/$CLUSTER_ID/$CLUSTER_ID-kubeconfig
sudo bmctl reset --cluster $CLUSTER_ID

# log out of the admin host
exit

Use Terraform to delete all resources.

terraform destroy --auto-approve
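
If you prefer to review what will be removed before anything is deleted, Terraform can save a destroy plan and apply it afterward. The following sketch shows that two-step variant; the function names, the plan file name `destroy.tfplan`, and the `TERRAFORM_BIN` override are hypothetical choices for illustration. Run it from the sample directory on your workstation, after resetting the cluster.

```shell
#!/usr/bin/env bash
# Sketch: preview the teardown, then apply exactly the previewed plan.

# TERRAFORM_BIN is a hypothetical override, for example `echo`, that lets
# you print the commands without running Terraform.

# Writes a destroy plan to destroy.tfplan and shows what would be removed.
preview_destroy() {
  local tf="${TERRAFORM_BIN:-terraform}"
  "$tf" plan -destroy -input=false -out=destroy.tfplan
}

# Applies the saved destroy plan without a confirmation prompt.
apply_destroy() {
  local tf="${TERRAFORM_BIN:-terraform}"
  "$tf" apply -input=false destroy.tfplan
}

# Dry run: print both command lines instead of executing them.
TERRAFORM_BIN=echo preview_destroy
TERRAFORM_BIN=echo apply_destroy
```

Keeping the two steps separate gives you a chance to read the plan output between them.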