Managing GKE clusters with Config Controller

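This tutorial shows how to use the GKE cluster blueprint to provision a Google Kubernetes Engine (GKE) cluster with Config Controller. Follow it if you are a GKE cluster operator who wants to manage cluster configuration declaratively.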

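Config Controller is a hosted service for provisioning and orchestrating Anthos and Google Cloud resources. It offers an API endpoint that can provision, actuate, and orchestrate Google Cloud resources as part of Anthos Config Management.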

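KRM blueprints are a way to package resources that are commonly used together, while codifying best practices that can be rolled out across your organization.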

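The GKE cluster blueprint is a KRM blueprint that includes all the resources you need to manage a GKE cluster on top of an existing Google Cloud network. You can instantiate the blueprint multiple times to set up multiple clusters.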
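Instantiating the blueprint several times amounts to fetching it into one directory per cluster. The following is a minimal POSIX shell sketch of that idea and is not part of the blueprint itself. The helper names is_valid_cluster_name and fetch_blueprint_instances are hypothetical. The KPT variable is an assumption that lets you substitute echo for a dry run. The name check covers only the general GKE cluster naming rule: lowercase letters, digits, and hyphens, starting with a letter, not ending with a hyphen, and at most 40 characters.

```shell
#!/bin/sh
# Sketch: fetch one copy of the GKE cluster blueprint per cluster name.
# Hypothetical helpers; set KPT=echo to print the kpt commands instead of running them.

BLUEPRINT_URI="https://github.com/GoogleCloudPlatform/blueprints.git/catalog/gke@v0.3.0"

# Succeeds if $1 follows the general GKE cluster naming rule: a lowercase letter
# first, then lowercase letters, digits, or hyphens, no trailing hyphen, <= 40 chars.
is_valid_cluster_name() {
  [ "${#1}" -ge 1 ] && [ "${#1}" -le 40 ] || return 1
  printf '%s\n' "$1" | grep -Eq '^[a-z]([-a-z0-9]*[a-z0-9])?$'
}

# Fetches the blueprint into ./NAME/ for every NAME argument, stopping at the
# first invalid name or failed fetch.
fetch_blueprint_instances() {
  for name in "$@"; do
    if ! is_valid_cluster_name "$name"; then
      echo "invalid cluster name: $name" >&2
      return 1
    fi
    ${KPT:-kpt} pkg get "$BLUEPRINT_URI" "$name" || return 1
  done
}
```

For example, KPT=echo fetch_blueprint_instances dev-cluster prod-cluster prints the two kpt pkg get commands without running them. The blueprint also derives other resource names from cluster-name, such as the node pool service account. Those resources may impose tighter length limits than this check does.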

Objectives

  • Declaratively configure a GKE cluster.
  • Apply the configuration with Config Controller.

Costs

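This tutorial uses the following billable components of Google Cloud:

For a full list of the resources included in the GKE cluster blueprint, see the Resources section of the GKE package and its subpackages.

To generate a cost estimate based on your projected usage, use the pricing calculator.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Cleaning up.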

Requirements

Before you begin

  1. In the Cloud Console, activate Cloud Shell.

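    At the bottom of the Cloud Console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Cloud SDK already installed, including the gcloud command-line tool, and with values already set for your current project. It can take a few seconds for the session to initialize.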

  2. You can run all commands in this tutorial from Cloud Shell.
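Several commands in this tutorial use placeholders such as PROJECT_ID and PROJECT_NAMESPACE. One option is to define them once as shell variables and check that none is empty before you continue. The sketch below assumes you run Bash or another POSIX shell in Cloud Shell. The values shown are hypothetical examples, and require_vars is a helper introduced here, not part of any Google Cloud tool.

```shell
# Sketch: define the tutorial's placeholders once as environment variables.
# All values are hypothetical examples; replace them with your own.
export CONFIG_CONTROLLER_NAME="my-config-controller"
export COMPUTE_REGION="us-central1"
export CONFIG_CONTROLLER_PROJECT_ID="my-config-controller-project"
export PROJECT_ID="my-project"
export PROJECT_NAMESPACE="config-control"
export CLUSTER_NAME="hello-cluster"
export YOUR_DOMAIN="example.com"

# require_vars NAME...: reports every named variable that is unset or empty and
# returns 1 if there is at least one. NAMEs must be valid shell identifiers,
# because they are passed to eval.
require_vars() {
  missing=0
  for var in "$@"; do
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

require_vars CONFIG_CONTROLLER_NAME COMPUTE_REGION CONFIG_CONTROLLER_PROJECT_ID \
  PROJECT_ID PROJECT_NAMESPACE CLUSTER_NAME YOUR_DOMAIN
```

With the variables exported, you can write, for example, --project "$PROJECT_ID" instead of typing the placeholder value into each command.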

Setting up the environment

In Cloud Shell, run the following commands:

  1. Install kubectl, the primary command-line interface for Kubernetes:

    gcloud components install kubectl
    
  2. Install kpt, the primary command-line interface for KRM blueprints:

    gcloud components install kpt
    
  3. Configure kubectl and kpt to connect to Config Controller:

    gcloud alpha anthos config controller get-credentials CONFIG_CONTROLLER_NAME \
        --location COMPUTE_REGION \
        --project CONFIG_CONTROLLER_PROJECT_ID
    

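    Replace the following:

    • CONFIG_CONTROLLER_NAME: the name of your Config Controller cluster.

    • COMPUTE_REGION: the region of your Config Controller cluster (for example, us-central1).

    • CONFIG_CONTROLLER_PROJECT_ID: the project ID of your Config Controller cluster.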

  4. Enable the Resource Manager API:

    gcloud services enable cloudresourcemanager.googleapis.com \
        --project PROJECT_ID
    

    Replace PROJECT_ID with your project ID.

  5. Install the ResourceGroup CRD, if it is not already installed:

    kpt live install-resource-group
    
  6. Verify that Config Connector is configured and healthy in the project namespace:

    kubectl get ConfigConnectorContext -n PROJECT_NAMESPACE \
        -o "custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HEALTHY:.status.healthy"
    

    Replace PROJECT_NAMESPACE with the namespace you want to use for managing project resources (for example, config-control).

    Example output:

    NAMESPACE        NAME                                                HEALTHY
    config-control   configconnectorcontext.core.cnrm.cloud.google.com   true
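The health check in step 6 can also be scripted, for example in a setup script that should stop when Config Connector is not ready. This sketch assumes the output has exactly the three custom columns shown above, with HEALTHY last. all_contexts_healthy is a hypothetical helper name.

```shell
# all_contexts_healthy: reads the three-column table printed by the
# "kubectl get ConfigConnectorContext ... -o custom-columns=..." command above
# from stdin. Succeeds only if there is at least one data row and the last
# column (HEALTHY) of every row is "true".
all_contexts_healthy() {
  awk '
    NR == 1 { next }    # skip the header row
    NF == 0 { next }    # ignore blank lines
    {
      rows++
      if ($NF != "true") { print "not healthy: " $1 "/" $2; bad++ }
    }
    END { if (rows > 0 && bad == 0) exit 0; exit 1 }
  '
}

# Example (requires access to the Config Controller cluster):
#   kubectl get ConfigConnectorContext -n "$PROJECT_NAMESPACE" \
#     -o "custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HEALTHY:.status.healthy" \
#     | all_contexts_healthy
```

kubectl prints <none> in the HEALTHY column for a status that is missing or not yet populated. This check treats that value as unhealthy.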
    

Configuring a GKE cluster

To configure a GKE cluster with the GKE cluster blueprint, run the following commands.

  1. From your chosen working directory, use kpt to fetch the GKE cluster blueprint:

    kpt pkg get \
        https://github.com/GoogleCloudPlatform/blueprints.git/catalog/gke@v0.3.0 \
        CLUSTER_NAME
    

    Replace CLUSTER_NAME with the name you want to use for the GKE cluster (for example, hello-cluster).

  2. Change into the cluster directory:

    cd ./CLUSTER_NAME/
    
  3. Configure the package by rewriting the setters.yaml file:

    cat > setters.yaml << EOF
    apiVersion: v1
    kind: ConfigMap
    metadata: # kpt-merge: /setters
      name: setters
    data:
      # The cluster name
      cluster-name: CLUSTER_NAME
      # The environment (set as a label on the cluster)
      environment: dev
      # The compute location (region or zone)
      location: us-central1
      # The project in which to manage cluster resources
      platform-project-id: PROJECT_ID
      # The namespace in which to manage cluster resources
      platform-namespace: PROJECT_NAMESPACE
      # The name of the VPC in which to create a dedicated subnet
      network-name: default
      # The project that the VPC is in
      network-project-id: PROJECT_ID
      # The namespace in which to manage network resources
      networking-namespace: PROJECT_NAMESPACE
      # The private IP range for masters to use when peering to the VPC
      master-ip-range: 192.168.0.0/28
      # The private IP range for nodes to use, allocated to the dedicated subnet
      node-ip-range: 10.4.0.0/22
      # The private IP range for pods to use, allocated to the dedicated subnet
      pod-ip-range: 10.5.0.0/16
      # The private IP range for services to use, allocated to the dedicated subnet
      service-ip-range: 10.6.0.0/16
      # The namespace in which to manage service enablement resources
      projects-namespace: PROJECT_NAMESPACE
      # The group in which to manage the list of groups that can be used for RBAC.
      # Must be named exactly 'gke-security-groups'.
      security-group: gke-security-groups@YOUR_DOMAIN
    EOF
    

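    Replace the following:

    • PROJECT_ID: the ID of your project.

      In this tutorial, the cluster and the network are deployed to the same project.

    • PROJECT_NAMESPACE: the namespace to use for managing project resources (for example, config-control).

      In this tutorial, the cluster, the network, and service enablement are all managed in the same namespace.

    • YOUR_DOMAIN: the domain your groups use (for example, example.com).

    All other data fields can be reconfigured as needed.

    The provided defaults should work in an otherwise empty project that has the default network.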

  4. Render the setter values into the templated resources:

    kpt fn render
    

    Example output:

    Package "example/cluster":
    [RUNNING] "gcr.io/kpt-fn/apply-setters:v0.1"
    [PASS] "gcr.io/kpt-fn/apply-setters:v0.1"
      Results:
        [INFO] set field value to "example-us-west4" in file "cluster/cluster.yaml" in field "metadata.name"
        [INFO] set field value to "config-control" in file "cluster/cluster.yaml" in field "metadata.namespace"
        [INFO] set field value to "dev" in file "cluster/cluster.yaml" in field "metadata.labels.gke.io/environment"
        [INFO] set field value to "platform-project-id" in file "cluster/cluster.yaml" in field "metadata.annotations.cnrm.cloud.google.com/project-id"
        ...(10 line(s) truncated, use '--truncate-output=false' to disable)
    
    Package "example/nodepools/primary":
    [RUNNING] "gcr.io/kpt-fn/apply-setters:v0.1"
    [PASS] "gcr.io/kpt-fn/apply-setters:v0.1"
      Results:
        [INFO] set field value to "gke-example-us-east4-primary" in file "nodepools/primary/node-iam.yaml" in field "metadata.name"
        [INFO] set field value to "config-control" in file "nodepools/primary/node-iam.yaml" in field "metadata.namespace"
        [INFO] set field value to "platform-project-id" in file "nodepools/primary/node-iam.yaml" in field "metadata.annotations.cnrm.cloud.google.com/project-id"
        [INFO] set field value to "gke-example-us-east4-primary" in file "nodepools/primary/node-iam.yaml" in field "spec.displayName"
        ...(23 line(s) truncated, use '--truncate-output=false' to disable)
    
    Package "example/subnet":
    [RUNNING] "gcr.io/kpt-fn/apply-setters:v0.1"
    [PASS] "gcr.io/kpt-fn/apply-setters:v0.1"
      Results:
        [INFO] set field value to "platform-project-id-example-us-west4" in file "subnet/subnet.yaml" in field "metadata.name"
        [INFO] set field value to "networking" in file "subnet/subnet.yaml" in field "metadata.namespace"
        [INFO] set field value to "network-project-id" in file "subnet/subnet.yaml" in field "metadata.annotations.cnrm.cloud.google.com/project-id"
        [INFO] set field value to "platform-project-id-example-us-west4" in file "subnet/subnet.yaml" in field "spec.description"
        ...(5 line(s) truncated, use '--truncate-output=false' to disable)
    
    Package "example":
    [RUNNING] "gcr.io/kpt-fn/apply-setters:v0.1"
    [PASS] "gcr.io/kpt-fn/apply-setters:v0.1"
      Results:
        [INFO] set field value to "example" in file "cluster/cluster.yaml" in field "metadata.name"
        [INFO] set field value to "config-control" in file "cluster/cluster.yaml" in field "metadata.namespace"
        [INFO] set field value to "dev" in file "cluster/cluster.yaml" in field "metadata.labels.gke.io/environment"
        [INFO] set field value to "example-project-1234" in file "cluster/cluster.yaml" in field "metadata.annotations.cnrm.cloud.google.com/project-id"
        ...(44 line(s) truncated, use '--truncate-output=false' to disable)
    
    Successfully executed 4 function(s) in 4 package(s).
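The IP ranges in setters.yaml can be changed. However, GKE expects the node, Pod, Service, and control plane (master) ranges not to overlap one another. The following POSIX shell sketch checks the four ranges pairwise before you run kpt fn render again. The function names are hypothetical, and only IPv4 CIDR notation is handled. The check also does not look at other subnets that already exist in the VPC network.

```shell
# Sketch: check that IPv4 CIDR ranges from setters.yaml do not overlap.
# Hypothetical helpers; octets must be plain decimal (no leading zeros).

# ip_to_int A.B.C.D: prints the address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the dotted address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_bounds A.B.C.D/LEN: prints the first and last address of the range as integers.
cidr_bounds() {
  base=$(ip_to_int "${1%/*}")
  size=$(( 1 << (32 - ${1#*/}) ))
  start=$(( base / size * size ))   # round down to the network boundary
  echo "$start $(( start + size - 1 ))"
}

# cidrs_overlap X Y: succeeds if ranges X and Y share at least one address.
cidrs_overlap() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")   # start1 end1 start2 end2
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# check_no_overlaps CIDR...: compares every pair, reports overlaps on stderr,
# and returns 1 if any pair overlaps.
check_no_overlaps() {
  status=0
  for a in "$@"; do
    shift                      # "$@" now holds only the ranges after $a
    for b in "$@"; do
      if cidrs_overlap "$a" "$b"; then
        echo "overlap: $a and $b" >&2
        status=1
      fi
    done
  done
  return "$status"
}

# Ranges from the tutorial's setters.yaml (master, node, pod, service).
check_no_overlaps 192.168.0.0/28 10.4.0.0/22 10.5.0.0/16 10.6.0.0/16
```

With the default values from setters.yaml, the check succeeds, because 192.168.0.0/28, 10.4.0.0/22, 10.5.0.0/16, and 10.6.0.0/16 are disjoint.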
    

Applying configuration changes

The local changes made in the previous steps do not affect the cloud until you apply them.
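Nothing reaches the cloud until you apply it, so this is a convenient point to confirm that no placeholder from setters.yaml was left unreplaced. Otherwise kpt fn render would propagate the literal placeholder text into the rendered resources. The helper below is a hypothetical sketch that searches for the placeholder names used in this tutorial. If it reports a match, fix setters.yaml and run kpt fn render again.

```shell
# find_unreplaced_placeholders [FILE]: hypothetical helper that prints any line
# of FILE (default: setters.yaml) still containing one of this tutorial's
# placeholder names. Returns 1 if it finds one, 2 if FILE is unreadable.
find_unreplaced_placeholders() {
  file=${1:-setters.yaml}
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # Case-sensitive, whole-word matching: "@YOUR_DOMAIN" is found, while
  # lowercase keys such as platform-project-id are not.
  if grep -nwE 'CLUSTER_NAME|PROJECT_ID|PROJECT_NAMESPACE|YOUR_DOMAIN' "$file" >&2; then
    return 1
  fi
  return 0
}

# Example, from the cluster directory:
#   find_unreplaced_placeholders setters.yaml && echo "no placeholders left"
```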

To apply the configuration changes, run the following commands.

  1. Initialize the working directory with kpt. This creates a resource that tracks changes:

    kpt live init --namespace PROJECT_NAMESPACE
    

    Replace PROJECT_NAMESPACE with the namespace used for managing project resources (for example, config-control).

  2. Preview the resources that will be created:

    kpt live apply --dry-run
    

    All resources should show "created (dry-run)".

    Example output:

    service.serviceusage.cnrm.cloud.google.com/example-project-1234-example-container created (dry-run)
    computesubnetwork.compute.cnrm.cloud.google.com/example-project-1234-example created (dry-run)
    containercluster.container.cnrm.cloud.google.com/example created (dry-run)
    containernodepool.container.cnrm.cloud.google.com/example-primary created (dry-run)
    iampolicymember.iam.cnrm.cloud.google.com/artifactreader-gke-example-primary created (dry-run)
    iampolicymember.iam.cnrm.cloud.google.com/logwriter-gke-example-primary created (dry-run)
    iampolicymember.iam.cnrm.cloud.google.com/metricwriter-gke-example-primary created (dry-run)
    iamserviceaccount.iam.cnrm.cloud.google.com/gke-example-primary created (dry-run)
    8 resource(s) applied. 8 created, 0 unchanged, 0 configured, 0 failed (dry-run)
    0 resource(s) pruned, 0 skipped, 0 failed (dry-run)
    
  3. Apply the resources with kpt:

    kpt live apply
    

    All resources should show "created".

    Example output:

    service.serviceusage.cnrm.cloud.google.com/example-project-1234-example-container created
    computesubnetwork.compute.cnrm.cloud.google.com/example-project-1234-example created
    containercluster.container.cnrm.cloud.google.com/example created
    containernodepool.container.cnrm.cloud.google.com/example-primary created
    iampolicymember.iam.cnrm.cloud.google.com/artifactreader-gke-example-primary created
    iampolicymember.iam.cnrm.cloud.google.com/logwriter-gke-example-primary created
    iampolicymember.iam.cnrm.cloud.google.com/metricwriter-gke-example-primary created
    iamserviceaccount.iam.cnrm.cloud.google.com/gke-example-primary created
    8 resource(s) applied. 8 created, 0 unchanged, 0 configured, 0 failed
    0 resource(s) pruned, 0 skipped, 0 failed
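When the apply step runs in a script, you may want the script to stop if any resource fails. This sketch assumes the summary lines have the format shown in the example output above, which may differ between kpt versions. check_apply_summary is a hypothetical helper. It passes the output through and fails if the "applied" summary is missing or if any applied or pruned resource failed.

```shell
# check_apply_summary: reads "kpt live apply" output from stdin, copies it to
# stdout, and fails unless an "N resource(s) applied." summary line is present
# and neither that line nor the "pruned" line reports a non-zero failed count.
# Assumes the summary format shown in this tutorial's example output.
check_apply_summary() {
  awk '
    { print }
    /resource[(]s[)] applied/ { seen = 1 }
    /resource[(]s[)] (applied|pruned)/ && match($0, /[0-9]+ failed/) {
      # substr() yields e.g. "0 failed"; adding 0 keeps the leading number
      if (substr($0, RSTART, RLENGTH) + 0 > 0) failed = 1
    }
    END { if (seen && !failed) exit 0; exit 1 }
  '
}

# Example:
#   kpt live apply | check_apply_summary
```

The exit status of a pipeline is that of its last command. A kpt error is therefore caught here only indirectly, through the missing summary line.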
    

Verifying success

To verify that the changes have been applied and that the resources they specify have been provisioned, run the following commands.

  1. Wait until the resources are ready:

    kpt live status --output table --poll-until current
    

    This command polls until all resources have the status Current and the condition Ready.

    Press Ctrl+C to interrupt it if needed.

    Example output:

    NAMESPACE   RESOURCE                                  STATUS      CONDITIONS      AGE     MESSAGE
    config-con  ComputeSubnetwork/example-project-1234-e  Current     Ready           41m     Resource is Ready
    config-con  ContainerCluster/example                  Current     Ready           41m     Resource is Ready
    config-con  ContainerNodePool/example-primary         Current     Ready           41m     Resource is Ready
    config-con  IAMPolicyMember/artifactreader-gke-examp  Current     Ready           41m     Resource is Ready
    config-con  IAMPolicyMember/logwriter-gke-example-pr  Current     Ready           41m     Resource is Ready
    config-con  IAMPolicyMember/metricwriter-gke-example  Current     Ready           41m     Resource is Ready
    config-con  IAMServiceAccount/gke-example-primary     Current     Ready           41m     Resource is Ready
    config-con  Service/example-project-1234-example-con  Current     Ready           41m     Resource is Ready
    
  2. If there is an error, use the default event output to see the full error messages:

    kpt live status
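To check the result in a script instead of by eye, you can test that every row of a saved table is Current and Ready. This sketch assumes plain-text table output with the columns shown above: NAMESPACE, RESOURCE, STATUS, CONDITIONS, AGE, MESSAGE. all_resources_ready is a hypothetical helper name.

```shell
# all_resources_ready: reads a "kpt live status --output table" snapshot from
# stdin and succeeds only if it has at least one resource row and every row
# has STATUS (column 3) "Current" and CONDITIONS (column 4) "Ready".
# Assumes the plain-text column layout shown in this tutorial's example output.
all_resources_ready() {
  awk '
    NR == 1 { next }    # skip the header row
    NF == 0 { next }    # ignore blank lines
    {
      rows++
      if ($3 != "Current" || $4 != "Ready") { print "not ready: " $2; bad++ }
    }
    END { if (rows > 0 && bad == 0) exit 0; exit 1 }
  '
}

# Example, with a saved snapshot in a (hypothetical) file status.txt:
#   all_resources_ready < status.txt
```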
    

FAQ

Cleaning up

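If you decide to stop using Config Controller, first clean up all resources created with Config Controller, and then delete Config Controller itself.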

  1. From the working directory, delete the resources with kpt:

    kpt live destroy
    
  2. Wait until all resources are deleted:

    until [ -z "$(kubectl get -R -f . --ignore-not-found | tee /dev/fd/2)" ]; \
    do sleep 1; done
    

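    This command polls until none of the package's resources can be found anymore, that is, until all of them have been deleted.

    Press Ctrl+C to interrupt it if needed.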
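The until loop above polls forever if a resource never disappears. A variant with an upper bound is sketched below. wait_until_empty is a hypothetical helper. It counts only the time spent sleeping, so the real wait can be somewhat longer.

```shell
# wait_until_empty LIMIT INTERVAL COMMAND [ARG...]: runs COMMAND every INTERVAL
# seconds until it prints nothing on stdout (returns 0), or until about LIMIT
# seconds of sleeping have passed (returns 1). While waiting, the remaining
# output is copied to stderr, like "tee /dev/fd/2" in the loop above.
# Only sleep time is counted, so the real wait can be somewhat longer.
wait_until_empty() {
  limit=$1
  interval=$2
  shift 2
  elapsed=0
  while :; do
    out=$("$@")
    if [ -z "$out" ]; then
      return 0                        # nothing left: all resources are gone
    fi
    printf '%s\n' "$out" >&2
    if [ "$elapsed" -ge "$limit" ]; then
      echo "gave up after ${limit}s" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
}

# Example, from the working directory (same command as the loop above):
#   wait_until_empty 1800 5 kubectl get -R -f . --ignore-not-found
```

Like the original loop, this also returns success if the command fails without writing anything to stdout. Check stderr if the wait ends suspiciously quickly.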

What's next