Migrate clusters to use Node Agent

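This document describes how to enable Node Agent for new and existing clusters to provide more secure cluster operations. Starting with version 1.33, Google Distributed Cloud for Bare Metal can move cluster operations from Ansible over SSH to a more secure, agent-based model that uses Node Agent. Managing cluster operations with Node Agent addresses the security concerns of requiring SSH access to customer nodes in sensitive environments.

In the new model, the Node Agent binary runs on every node. Node Agent communicates with clients, such as the cluster controller, over a secure gRPC channel to manage all node configuration activities. Google Distributed Cloud enforces mutual Transport Layer Security (mTLS) between the cluster controller and Node Agent, and between bmctl and Node Agent, to authenticate and encrypt the gRPC connections.

The bmctl nodeagent commands make migrating existing clusters to Node Agent simple and reliable. These commands reduce manual work, improve consistency across nodes, and automate key tasks such as certificate creation and rotation. The bmctl nodeagent commands run primarily over SSH, so administrators can deploy or redeploy the agent even when the cluster controller is unhealthy or its standard communication channel is compromised.

Node Agent and the corresponding bmctl nodeagent commands are supported for Google Distributed Cloud for Bare Metal version 1.33.0 and later. You can enable Node Agent on existing clusters at version 1.33 or later, or when you create clusters at version 1.33 or later.

This page is for admins, architects, and operators who manage the lifecycle of the underlying tech infrastructure. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE Enterprise user roles and tasks.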

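Before you begin

Before you migrate a cluster to Node Agent mode, make sure that all cluster nodes meet the following requirements:

  • Each node has an open port dedicated to Node Agent. By default, Node Agent uses port 9192, but you can configure this port when you deploy Node Agent, enable it, or install a new cluster. For more information, see Customize the Node Agent port.

  • Each node has containerd version 1.7 or later installed.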
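The two requirements above can be checked on each node before you migrate. The following is a minimal sketch, not a supported tool: it assumes bash, GNU sort with -V, the ss utility from iproute2, and that containerd --version prints the version as its third field. All function names are hypothetical.

```bash
# Sketch: Node Agent prerequisite checks, run on each node.
# Assumptions (not from the product docs): bash, GNU sort -V, iproute2 ss,
# and `containerd --version` printing the version as its third field.

# version_ge A B: succeed if version A >= version B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# containerd_version: print the installed containerd version without a leading "v".
containerd_version() {
  containerd --version | awk '{print $3}' | sed 's/^v//'
}

# port_is_free PORT: succeed if no TCP socket is listening on PORT.
port_is_free() {
  # ss prints a header line; any further line means a listener exists.
  ! ss -ltn "sport = :$1" | tail -n +2 | grep -q .
}

# check_nodeagent_prereqs [PORT]: check both requirements (default port 9192).
check_nodeagent_prereqs() {
  local port="${1:-9192}" rc=0 ver
  ver="$(containerd_version)"
  if version_ge "$ver" 1.7; then
    echo "containerd $ver: OK"
  else
    echo "containerd $ver: version 1.7 or later required"
    rc=1
  fi
  if port_is_free "$port"; then
    echo "port $port: free"
  else
    echo "port $port: already in use"
    rc=1
  fi
  return "$rc"
}
```

For example, run check_nodeagent_prereqs on each node, or check_nodeagent_prereqs 10086 if you plan to use a custom port.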

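Migrate to Node Agent mode

Migrating to Node Agent mode is a two-step process:

  1. Deploy Node Agent: deploy the Node Agent components to all nodes in the cluster.

  2. Enable Node Agent mode:

    • For existing clusters, enable the mode with the bmctl nodeagent enable command.
    • For new clusters, add the enablement annotation and the corresponding credential paths to the cluster configuration file before you create the cluster.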

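Deploy Node Agent

The bmctl nodeagent deploy command uses SSH to deploy the Node Agent service to one or more target nodes in the specified cluster. The command installs or reinstalls Node Agent: it connects over SSH and performs the necessary steps, including transferring the binary, optionally generating and transferring certificates, and setting up the systemd service. It requires SSH access and sudo privileges on the target nodes.

You can specify the target nodes in several ways: directly with the --nodes flag, from the cluster configuration file with the --cluster flag, or by referencing the Cluster custom resource. For more information about Node Agent commands and options, see the bmctl command reference.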

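Deploy in a fresh environment

For the initial deployment, download the nodeagentd binary and generate a new certificate authority (CA). The following command retrieves the list of nodes from the cluster configuration file. The --sa-key flag provides the credentials needed to download the nodeagentd binary from a Cloud Storage bucket.

  • To deploy Node Agent on a new cluster for the first time, use the following command: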

    bmctl nodeagent deploy \
        --pull-binaries true \
        --generate-ca-creds true \
        --cluster CLUSTER_NAME \
        --ssh-user USERNAME \
        --ssh-key SSH_KEY_PATH \
        --sa-key SERVICE_ACCOUNT_KEY_PATH
    

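    Replace the following:

    • CLUSTER_NAME: the name of the cluster whose nodes you want to deploy Node Agent to.

    • USERNAME: the username that has SSH access to the nodes. By default, SSH is configured to use root, but if you set up a login user, use that username.

    • SSH_KEY_PATH: the path to the SSH private key file.

    • SERVICE_ACCOUNT_KEY_PATH: the path to the key file of a service account that has permission to pull registry images. By default, this is the JSON key file for the anthos-baremetal-gcr service account.

    The command output is similar to the following: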

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_deploy-20250819-175703/nodeagent_deploy.log
    [2025-08-19 17:57:03+0000] INFO: Executing 'nodeagent deploy'...
    [2025-08-19 17:57:05+0000] -------------------- Deployment Plan --------------------
    [2025-08-19 17:57:05+0000]   Target Cluster:            demo-cluster
    [2025-08-19 17:57:05+0000]   SSH User:                  root
    [2025-08-19 17:57:05+0000]   SSH Key:                   rootSSH
    [2025-08-19 17:57:05+0000]   Concurrency:               25
    [2025-08-19 17:57:05+0000]   Generate Credentials:      true
    [2025-08-19 17:57:05+0000]   Deploy Credentials:        true
    [2025-08-19 17:57:05+0000]   Server Cert Validity Days: 1825
    [2025-08-19 17:57:05+0000]   Verify SSH Host Keys:      true
    [2025-08-19 17:57:05+0000]   Node Agent pull version:   1.33.0-gke.799
    [2025-08-19 17:57:05+0000]   Target Nodes Source:       cluster YAML
    [2025-08-19 17:57:05+0000]   Nodes Port:                9192
    [2025-08-19 17:57:05+0000]   Target Nodes (4):          10.200.0.2, 10.200.0.3, 10.200.0.4, 10.200.0.5
    [2025-08-19 17:57:05+0000] ---------------------------------------------------------
    Proceed with deployment? [y/N]: y
    [2025-08-19 17:57:07+0000] INFO: User confirmed.
    [2025-08-19 17:57:07+0000] Downloading Node Agent binary (1.33.0-gke.799)... OK
    [2025-08-19 17:57:08+0000] INFO: Node Agent binary pulled and stored at bmctl-workspace/bins/nodeagentd
    [2025-08-19 17:57:08+0000] INFO: Starting generate credentials (CAs and client credentials) phase...
    [2025-08-19 17:57:08+0000] Generating credentials for the cluster: demo-cluster, 2025-08-19T17:57:08Z
    [2025-08-19 17:57:08+0000] ------------ Credentials Options ------------
    [2025-08-19 17:57:08+0000] Cluster Name:           demo-cluster
    [2025-08-19 17:57:08+0000] Key Algorithm:          rsa
    [2025-08-19 17:57:08+0000] Key Length:             4096
    [2025-08-19 17:57:08+0000] CA Validity (days):     3650
    [2025-08-19 17:57:08+0000] Client Validity (days): 1825
    [2025-08-19 17:57:08+0000] Server CA CN:           Node Agent Server CA
    [2025-08-19 17:57:08+0000] Client CA CN:           Node Agent Client CA
    [2025-08-19 17:57:08+0000] Creds path:             bmctl-workspace/demo-cluster/nodeagent-creds
    [2025-08-19 17:57:08+0000] --------------------------------------------
    [2025-08-19 17:57:08+0000] Generating credentials... OK
    [2025-08-19 17:57:19+0000] Certificates have been created and stored in bmctl-workspace/demo-cluster/nodeagent-creds
    [2025-08-19 17:57:19+0000] INFO: Attempting to load CAs from: bmctl-workspace/demo-cluster/nodeagent-creds
    [2025-08-19 17:57:19+0000] INFO: Server CA loaded successfully. Subject: CN=Node Agent Server CA,O=GCD-SO,L=Sunnyvale,ST=California,C=US, Key Type: *rsa.PrivateKey
    [2025-08-19 17:57:19+0000] INFO: Client CA loaded successfully. Subject: CN=Node Agent Client CA,O=GCD-SO,L=Sunnyvale,ST=California,C=US, Key Type: *rsa.PrivateKey
    [2025-08-19 17:57:19+0000] ===============================================
    [2025-08-19 17:57:19+0000] --- Starting Artifact Preparation ---
    [2025-08-19 17:57:19+0000] Starting artifact preparation for 4 nodes (concurrency: 25)...
    [2025-08-19 17:57:23+0000] --- Finished Artifact Preparation ---
    [2025-08-19 17:57:23+0000] INFO: Preparation SUCCEEDED for node 10.200.0.2
    [2025-08-19 17:57:23+0000] INFO: Preparation SUCCEEDED for node 10.200.0.3
    [2025-08-19 17:57:23+0000] INFO: Preparation SUCCEEDED for node 10.200.0.4
    [2025-08-19 17:57:23+0000] INFO: Preparation SUCCEEDED for node 10.200.0.5
    [2025-08-19 17:57:23+0000] ===============================================
    [2025-08-19 17:57:23+0000] --- Starting Deployment Phase ---
    [2025-08-19 17:57:23+0000] INFO: Starting deployment to 4 nodes (Concurrency: 25)...
    [2025-08-19 17:57:36+0000] INFO: All host deployments finished.
    [2025-08-19 17:57:36+0000] INFO: --- Deployment Phase Completed Successfully ---
    [2025-08-19 17:57:36+0000]
    ===============================================
    --- Deployment Summary ---
      Host: 10.200.0.2, Status: SUCCESS
      Host: 10.200.0.3, Status: SUCCESS
      Host: 10.200.0.4, Status: SUCCESS
      Host: 10.200.0.5, Status: SUCCESS
    -----------------------------------------------
    Total Nodes Attempted: 4 | SUCCESS: 4 | FAILED: 0
    ===============================================
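To use the deployment result in automation, you can parse the Deployment Summary from the captured command output. This sketch assumes that the summary keeps the "Host: ..., Status: ..." and "FAILED: n" lines shown in the sample above; that format is not documented as stable, and the function names are hypothetical.

```bash
# Sketch: summarize `bmctl nodeagent deploy` output captured to a file or pipe.
# Assumption: the output keeps the "Host: <ip>, Status: <STATUS>" and
# "FAILED: <n>" lines shown in the sample; this format may change.

# failed_count: read deploy output on stdin and print the FAILED total.
failed_count() {
  sed -n 's/.*FAILED: \([0-9][0-9]*\).*/\1/p' | tail -n1
}

# failed_hosts: read deploy output on stdin and print each host whose status is not SUCCESS.
failed_hosts() {
  sed -n 's/^ *Host: \([^,]*\), Status: \(.*\)$/\1 \2/p' | awk '$2 != "SUCCESS" { print $1 }'
}
```

For example, capture the output with bmctl nodeagent deploy ... 2>&1 | tee deploy.out, treat a non-zero result from failed_count < deploy.out as a failure, and redeploy to the hosts that failed_hosts < deploy.out prints by using the --nodes flag.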
    

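Upgrade the Node Agent version

Node Agent upgrades are independent of cluster upgrades. To upgrade Node Agent, run the bmctl nodeagent deploy command with --pull-binaries set to true. Also set --generate-ca-creds to false so that the command reuses the existing CAs instead of regenerating them. Regenerating the CAs requires updating the corresponding cluster credentials, so do it only when you rotate credentials. The output is similar to that of a fresh deployment, but without the CA generation logs.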
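The only flag that differs between a first deployment and an upgrade is --generate-ca-creds. The following sketch wraps both cases in one hypothetical bash function so that an upgrade cannot accidentally regenerate the CAs. It assumes that bmctl is on your PATH and uses only the flags shown on this page.

```bash
# Sketch: one entry point for first-time deployment and for upgrades.
# Assumptions: bmctl is on PATH; the function name is hypothetical.
# Only --generate-ca-creds differs: "upgrade" reuses the existing CAs.

# run_nodeagent_deploy MODE CLUSTER_NAME USERNAME SSH_KEY_PATH SERVICE_ACCOUNT_KEY_PATH
run_nodeagent_deploy() {
  local mode="${1:-}" gen
  case "$mode" in
    fresh)   gen=true ;;   # generate new CAs in the bmctl workspace
    upgrade) gen=false ;;  # keep the CAs that the cluster already trusts
    *)
      echo "usage: run_nodeagent_deploy fresh|upgrade CLUSTER USER SSH_KEY SA_KEY" >&2
      return 2
      ;;
  esac
  shift
  if [ "$#" -ne 4 ]; then
    echo "expected 4 arguments after MODE, got $#" >&2
    return 2
  fi
  bmctl nodeagent deploy \
    --pull-binaries true \
    --generate-ca-creds "$gen" \
    --cluster "$1" \
    --ssh-user "$2" \
    --ssh-key "$3" \
    --sa-key "$4"
}
```

For example: run_nodeagent_deploy upgrade CLUSTER_NAME USERNAME SSH_KEY_PATH SERVICE_ACCOUNT_KEY_PATH.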

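Upgrading Node Agent restarts the Node Agent process, which can interrupt running jobs. Although most jobs recover through retry mechanisms, take the following steps to minimize potential disruption:

  1. Make sure that no cluster upgrade or other post-installation configuration activity is in progress.

  2. Verify that the cluster is running.

  3. Start the Node Agent upgrade: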

    bmctl nodeagent deploy \
        --pull-binaries true \
        --generate-ca-creds false \
        --cluster CLUSTER_NAME \
        --ssh-user USERNAME \
        --ssh-key SSH_KEY_PATH \
        --sa-key SERVICE_ACCOUNT_KEY_PATH
    

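    Replace the following:

    • CLUSTER_NAME: the name of the cluster whose Node Agent you want to upgrade.

    • USERNAME: the username that has SSH access to the nodes. By default, SSH is configured to use root, but if you set up a login user, use that username.

    • SSH_KEY_PATH: the path to the SSH private key file.

    • SERVICE_ACCOUNT_KEY_PATH: the path to the key file of a service account that has permission to pull registry images. By default, this is the JSON key file for the anthos-baremetal-gcr service account.

    The command output is similar to the following: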

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_deploy-20250819-180416/nodeagent_deploy.log
    [2025-08-19 18:04:16+0000] INFO: Executing 'nodeagent deploy'...
    [2025-08-19 18:04:18+0000] -------------------- Deployment Plan --------------------
    [2025-08-19 18:04:18+0000]   Target Cluster:            demo-cluster
    [2025-08-19 18:04:18+0000]   SSH User:                  root
    [2025-08-19 18:04:18+0000]   SSH Key:                   rootSSH
    [2025-08-19 18:04:18+0000]   Concurrency:               25
    [2025-08-19 18:04:18+0000]   Generate Credentials:      false
    [2025-08-19 18:04:18+0000]   Deploy Credentials:        true
    [2025-08-19 18:04:18+0000]   Server Cert Validity Days: 1825
    [2025-08-19 18:04:18+0000]   Verify SSH Host Keys:      true
    [2025-08-19 18:04:18+0000]   Node Agent pull version:   1.33.0-gke.799
    [2025-08-19 18:04:18+0000]   Target Nodes Source:       cluster YAML
    [2025-08-19 18:04:18+0000]   Nodes Port:                9192
    [2025-08-19 18:04:18+0000]   Target Nodes (4):          10.200.0.2, 10.200.0.3, 10.200.0.4, 10.200.0.5
    [2025-08-19 18:04:18+0000] ---------------------------------------------------------
    Proceed with deployment? [y/N]: y
    [2025-08-19 18:04:20+0000] INFO: User confirmed.
    [2025-08-19 18:04:20+0000] Downloading Node Agent binary (1.33.0-gke.799)... OK
    [2025-08-19 18:04:22+0000] INFO: Node Agent binary pulled and stored at bmctl-workspace/bins/nodeagentd
    [2025-08-19 18:04:22+0000] INFO: Attempting to load CAs from: bmctl-workspace/demo-cluster/nodeagent-creds
    [2025-08-19 18:04:22+0000] INFO: Server CA loaded successfully. Subject: CN=Node Agent Server CA,O=gcd-SO,L=Sunnyvale,ST=California,C=US, Key Type: *rsa.PrivateKey
    [2025-08-19 18:04:22+0000] INFO: Client CA loaded successfully. Subject: CN=Node Agent Client CA,O=gcd-SO,L=Sunnyvale,ST=California,C=US, Key Type: *rsa.PrivateKey
    [2025-08-19 18:04:22+0000] ===============================================
    [2025-08-19 18:04:22+0000] --- Starting Artifact Preparation ---
    [2025-08-19 18:04:22+0000] Starting artifact preparation for 4 nodes (concurrency: 25)...
    

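Deploy or redeploy to specific nodes

If you add or recover cluster nodes, you can deploy Node Agent to only those nodes instead of to every node in the cluster. Use the --nodes flag to specify the target nodes.

  • To deploy Node Agent to specific nodes, use the following command: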

    bmctl nodeagent deploy \
        --pull-binaries true \
        --cluster CLUSTER_NAME \
        --ssh-user USERNAME \
        --ssh-key SSH_KEY_PATH \
        --sa-key SERVICE_ACCOUNT_KEY_PATH \
        --nodes NODE_IP_ADDRESS_LIST
    

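    Replace the following:

    • CLUSTER_NAME: the name of the cluster whose nodes you want to deploy Node Agent to.

    • USERNAME: the username that has SSH access to the nodes. By default, SSH is configured to use root, but if you set up a login user, use that username.

    • SSH_KEY_PATH: the path to the SSH private key file.

    • SERVICE_ACCOUNT_KEY_PATH: the path to the key file of a service account that has permission to pull registry images. By default, this is the JSON key file for the anthos-baremetal-gcr service account.

    • NODE_IP_ADDRESS_LIST: a comma-separated list of the IP addresses of the nodes to deploy Node Agent to.

    The command output is similar to the following: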

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_deploy-20250819-181751/nodeagent_deploy.log
    [2025-08-19 18:17:51+0000] INFO: Executing 'nodeagent deploy'...
    [2025-08-19 18:17:54+0000] -------------------- Deployment Plan --------------------
    [2025-08-19 18:17:54+0000]   Target Cluster:            demo-cluster
    [2025-08-19 18:17:54+0000]   SSH User:                  user
    [2025-08-19 18:17:54+0000]   SSH Key:                   SSH_KEY_PATH
    [2025-08-19 18:17:54+0000]   Concurrency:               25
    [2025-08-19 18:17:54+0000]   Generate Credentials:      false
    [2025-08-19 18:17:54+0000]   Deploy Credentials:        true
    [2025-08-19 18:17:54+0000]   Server Cert Validity Days: 1825
    [2025-08-19 18:17:54+0000]   Verify SSH Host Keys:      true
    [2025-08-19 18:17:54+0000]   Node Agent pull version:   1.33.0-gke.799
    [2025-08-19 18:17:54+0000]   Target Nodes Source:       nodes flag
    [2025-08-19 18:17:54+0000]   Nodes Port:                9192
    [2025-08-19 18:17:54+0000]   Target Nodes (3):          10.200.0.2, 10.200.0.3
    [2025-08-19 18:17:54+0000] ---------------------------------------------------------
    Proceed with deployment? [y/N]:
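If you keep node addresses in a script or an inventory, a small helper can build the comma-separated value for --nodes. This hypothetical function is a sketch that accepts only dotted-quad IPv4 strings; it does not check that each octet is at most 255.

```bash
# Sketch: build the comma-separated NODE_IP_ADDRESS_LIST for --nodes.
# The function name is hypothetical. The check accepts dotted-quad IPv4
# strings only and does not verify that each octet is <= 255.

join_node_ips() {
  local re='^([0-9]{1,3}\.){3}[0-9]{1,3}$' ip out=""
  for ip in "$@"; do
    if ! [[ "$ip" =~ $re ]]; then
      echo "not an IPv4 address: $ip" >&2
      return 1
    fi
    out="${out:+$out,}$ip"   # add a comma only between entries
  done
  printf '%s\n' "$out"
}
```

For example: bmctl nodeagent deploy ... --nodes "$(join_node_ips 10.200.0.2 10.200.0.3)".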
    

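For the complete list of bmctl nodeagent deploy command options, see nodeagent deploy in the bmctl command reference.

Enable Node Agent

After Node Agent is deployed to all nodes in the cluster, the enable command activates Node Agent mode in the existing, running cluster. The command also creates or updates the Node Agent credentials in the cluster.

Enable Node Agent for an existing running cluster

You can enable Node Agent on existing clusters at version 1.33 and later.

  • To enable Node Agent on an existing cluster, use the following command: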

    ./bmctl nodeagent enable \
        --kubeconfig KUBECONFIG \
        --cluster CLUSTER_NAME \
        --ensure-status=true
    

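    Replace the following:

    • KUBECONFIG: the path to the kubeconfig file of the cluster for which you want to enable Node Agent.

    • CLUSTER_NAME: the name of the cluster for which you want to enable Node Agent.

    The command output is similar to the following: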

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_enable-20250819-183058/nodeagent_enable.log
    [2025-08-19 18:30:58+0000] Enable Node Agent for cluster: demo-cluster
    [2025-08-19 18:31:00+0000] Update Node Agent credentials
    [2025-08-19 18:31:00+0000] ----------------------------------------------------------
    [2025-08-19 18:31:00+0000] Server CA certificate path: bmctl-workspace/demo-cluster/nodeagent-creds/server_ca_cert.pem
    [2025-08-19 18:31:00+0000] Server CA private key path: bmctl-workspace/demo-cluster/nodeagent-creds/server_ca_key.pem
    [2025-08-19 18:31:00+0000] Client CA certificate path: bmctl-workspace/demo-cluster/nodeagent-creds/client_ca_cert.pem
    [2025-08-19 18:31:00+0000] Client CA private key path: bmctl-workspace/demo-cluster/nodeagent-creds/client_ca_key.pem
    [2025-08-19 18:31:00+0000] Client certificate path: bmctl-workspace/demo-cluster/nodeagent-creds/client_cert.pem
    [2025-08-19 18:31:00+0000] Client private key path: bmctl-workspace/demo-cluster/nodeagent-creds/client_key.pem
    [2025-08-19 18:31:00+0000] ----------------------------------------------------------
    [2025-08-19 18:31:00+0000] Node Agent client credentials secret has been created/updated
    [2025-08-19 18:31:00+0000] Node Agent server CA secret has been created/updated
    [2025-08-19 18:31:00+0000] Node Agent client CA secret has been created/updated
    [2025-08-19 18:31:00+0000] Successfully created/updated Node Agent credentials secrets in namespace cluster-demo-cluster
    [2025-08-19 18:31:00+0000] Annotation 'baremetal.cluster.gke.io/node-agent-port' not found on cluster cluster-demo-cluster/demo-cluster, no removal needed.
    [2025-08-19 18:31:00+0000] Successfully enable Node Agent for cluster: demo-cluster
    [2025-08-19 18:31:00+0000] ----------------------------------------------------------
    [2025-08-19 18:31:00+0000] Verifying Node Agent status on all nodes...
    [2025-08-19 18:31:00+0000] --------------------- Total nodes: 3 ----------------------
    [2025-08-19 18:31:00+0000] node: control-0--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1577
    [2025-08-19 18:31:00+0000] node: control-1--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1578
    [2025-08-19 18:31:00+0000] node: control-2--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1581
    [2025-08-19 18:31:00+0000] ----------------------------------------------------------
    [2025-08-19 18:31:00+0000] Verified Node Agent status on all nodes in cluster
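For an existing cluster, the deploy and enable steps run in sequence. The following sketch chains them and stops if deployment fails. It assumes that bmctl is on your PATH, that bmctl exits with a non-zero status when a step fails, and that you answer the deployment confirmation prompt interactively; the function name is hypothetical.

```bash
# Sketch: migrate an existing 1.33+ cluster in the two documented steps.
# Assumptions: bmctl is on PATH, it exits non-zero when a step fails, and
# you answer the deploy confirmation prompt. The function name is hypothetical.

# migrate_to_nodeagent CLUSTER_NAME KUBECONFIG USERNAME SSH_KEY_PATH SERVICE_ACCOUNT_KEY_PATH
migrate_to_nodeagent() {
  if [ "$#" -ne 5 ]; then
    echo "usage: migrate_to_nodeagent CLUSTER KUBECONFIG USER SSH_KEY SA_KEY" >&2
    return 2
  fi
  local cluster="$1" kubeconfig="$2" user="$3" ssh_key="$4" sa_key="$5"

  # Step 1: install nodeagentd on every node and generate new CAs.
  bmctl nodeagent deploy \
    --pull-binaries true \
    --generate-ca-creds true \
    --cluster "$cluster" \
    --ssh-user "$user" \
    --ssh-key "$ssh_key" \
    --sa-key "$sa_key" || return 1

  # Step 2: switch the running cluster to Node Agent mode and check node status.
  bmctl nodeagent enable \
    --kubeconfig "$kubeconfig" \
    --cluster "$cluster" \
    --ensure-status=true
}
```

Running the two commands separately, as shown earlier on this page, gives you a chance to review the deployment summary before you enable the mode.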
    

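Install a new cluster

You can enable Node Agent when you create clusters at version 1.33 and later.

To enable Node Agent for a new cluster, follow these steps:

  1. For a new admin cluster, add the following credential file paths to the top section of the admin cluster configuration file: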

    nodeAgentServerCACertificatePath: bmctl-workspace/demo-cluster/nodeagent-creds/server_ca_cert.pem
    nodeAgentServerCAPrivateKeyPath: bmctl-workspace/demo-cluster/nodeagent-creds/server_ca_key.pem
    nodeAgentClientCACertificatePath: bmctl-workspace/demo-cluster/nodeagent-creds/client_ca_cert.pem
    nodeAgentClientCAPrivateKeyPath: bmctl-workspace/demo-cluster/nodeagent-creds/client_ca_key.pem
    nodeAgentClientCertificatePath: bmctl-workspace/demo-cluster/nodeagent-creds/client_cert.pem
    nodeAgentClientPrivateKeyPath: bmctl-workspace/demo-cluster/nodeagent-creds/client_key.pem
    
  2. Add the Node Agent enablement annotation to the cluster metadata section of the cluster configuration file:

    kind: Cluster
    metadata:
      annotations:
        baremetal.cluster.gke.io/enable-node-agent: ""
    
  3. Create the cluster by following the standard instructions.
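Cluster creation depends on the six credential files referenced in step 1. As a pre-check, the following sketch confirms that they exist and are non-empty in one credentials directory, for example bmctl-workspace/CLUSTER_NAME/nodeagent-creds, where the sample deploy output stores them. The single-directory layout is an assumption, and the array and function names are hypothetical.

```bash
# Sketch: pre-check the Node Agent credential files before cluster creation.
# Assumption: all six files live in one directory, for example
# bmctl-workspace/CLUSTER_NAME/nodeagent-creds. Names are hypothetical.

NODEAGENT_CRED_FILES=(
  server_ca_cert.pem server_ca_key.pem
  client_ca_cert.pem client_ca_key.pem
  client_cert.pem client_key.pem
)

# check_nodeagent_creds DIR: report missing or empty files; fail if any.
check_nodeagent_creds() {
  local dir="$1" f rc=0
  for f in "${NODEAGENT_CRED_FILES[@]}"; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, run check_nodeagent_creds bmctl-workspace/demo-cluster/nodeagent-creds before you create the cluster.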

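For the complete list of bmctl nodeagent enable command options, see nodeagent enable in the bmctl command reference.

Rotate credentials

The rotate-credentials command rotates the Node Agent credentials on the nodes and in the cluster, and it can also rotate the certificate authorities (CAs). The --generate-ca-creds flag tells the command to regenerate the CAs and to use the new CAs to sign the certificates for the server (nodes) and the client (controller).

  • To rotate credentials with newly generated CAs, use the following command: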

    bmctl nodeagent rotate-credentials \
        --kubeconfig KUBECONFIG \
        --generate-ca-creds true \
        --cluster CLUSTER_NAME \
        --ssh-user USERNAME \
        --ssh-key SSH_KEY_PATH
    

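    Replace the following:

    • KUBECONFIG: the path to the kubeconfig file of the cluster whose credentials you want to rotate.

    • CLUSTER_NAME: the name of the cluster whose credentials you want to rotate.

    • USERNAME: the username that has SSH access to the nodes. By default, SSH is configured to use root, but if you set up a login user, use that username.

    • SSH_KEY_PATH: the path to the SSH private key file.

    The command output is similar to the following: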

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_rotate_credentials-20250819-184216/nodeagent_rotate_credentials.log
    [2025-08-19 18:42:16+0000] INFO: Executing 'nodeagent rotate-credentials'...
    [2025-08-19 18:42:18+0000] ------------------- Credentials Rotation  -------------------
    [2025-08-19 18:42:18+0000]   Target Cluster:            demo-cluster
    [2025-08-19 18:42:18+0000]   SSH User:                  root
    [2025-08-19 18:42:18+0000]   SSH Key:                   rootSSH
    [2025-08-19 18:42:18+0000]   Concurrency:               25
    [2025-08-19 18:42:18+0000]   Generate Credentials:      true
    [2025-08-19 18:42:18+0000]   Deploy Credentials:        true
    [2025-08-19 18:42:18+0000]   Server Cert Validity Days: 1825
    [2025-08-19 18:42:18+0000]   Verify SSH Host Keys:      true
    [2025-08-19 18:42:18+0000]   Target Nodes Source:       cluster CR
    [2025-08-19 18:42:18+0000]   Nodes Port:                9192
    [2025-08-19 18:42:18+0000]   Target Nodes (3):          10.200.0.2, 10.200.0.3, 10.200.0.4
    [2025-08-19 18:42:18+0000] ---------------------------------------------------------
    Proceed with credentials rotation? [y/N]: [2025-08-19 18:42:18+0000] INFO: Non-interactive mode enabled; automatically confirming.
    [2025-08-19 18:42:18+0000] INFO: Starting generate credentials (CAs and client credentials) phase...
    [2025-08-19 18:42:18+0000] Generating credentials for the cluster: demo-cluster, 2025-08-19T18:42:18Z
    [2025-08-19 18:42:18+0000] ------------ Credentials Options ------------
    [2025-08-19 18:42:18+0000] Cluster Name:           demo-cluster
    [2025-08-19 18:42:18+0000] Key Algorithm:          rsa
    [2025-08-19 18:42:18+0000] Key Length:             4096
    [2025-08-19 18:42:18+0000] CA Validity (days):     3650
    [2025-08-19 18:42:18+0000] Client Validity (days): 1825
    [2025-08-19 18:42:18+0000] Server CA CN:           Node Agent Server CA
    [2025-08-19 18:42:18+0000] Client CA CN:           Node Agent Client CA
    [2025-08-19 18:42:18+0000] Creds path:             bmctl-workspace/demo-cluster/nodeagent-creds
    [2025-08-19 18:42:18+0000] --------------------------------------------
    [2025-08-19 18:42:18+0000] Generating credentials... OK
    Credential directory 'bmctl-workspace/demo-cluster/nodeagent-creds' already exists. Do you want to back it up and continue? (y/N): y
    [2025-08-19 18:42:27+0000] INFO: User confirmed.
    [2025-08-19 18:42:27+0000] Credentials backup to bmctl-workspace/demo-cluster/nodeagent-creds_backup_20250819_184227
    [2025-08-19 18:42:27+0000] Certificates have been created and stored in bmctl-workspace/demo-cluster/nodeagent-creds
    [2025-08-19 18:42:27+0000] INFO: Attempting to load CAs from: bmctl-workspace/demo-cluster/nodeagent-creds
    [2025-08-19 18:42:27+0000] INFO: Server CA loaded successfully. Subject: CN=Node Agent Server CA,O=gcd-SO,L=Sunnyvale,ST=California,C=US, Key Type: *rsa.PrivateKey
    [2025-08-19 18:42:27+0000] INFO: Client CA loaded successfully. Subject: CN=Node Agent Client CA,O=gcd-SO,L=Sunnyvale,ST=California,C=US, Key Type: *rsa.PrivateKey
    [2025-08-19 18:42:27+0000] ===============================================
    [2025-08-19 18:42:34+0000] INFO: All host deployments finished.
    [2025-08-19 18:42:34+0000] INFO: --- Deployment Phase Completed Successfully ---
    [2025-08-19 18:42:34+0000]
    ===============================================
    --- Deployment Summary ---
      Host: 10.200.0.2, Status: SUCCESS
      Host: 10.200.0.3, Status: SUCCESS
      Host: 10.200.0.4, Status: SUCCESS
    -----------------------------------------------
    Total Nodes Attempted: 3 | SUCCESS: 3 | FAILED: 0
    ===============================================
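After a rotation with --generate-ca-creds true, the sample output shows the previous credentials being backed up to a directory named nodeagent-creds_backup_YYYYMMDD_HHMMSS. The following sketch finds the newest backup and compares server CA fingerprints to confirm that a new CA was written locally. It assumes that openssl is installed and that the backup directory sits next to nodeagent-creds as in the sample; the function names are hypothetical, and the check covers only the local files, not the nodes or the cluster.

```bash
# Sketch: confirm locally that a CA rotation produced a new server CA.
# Assumptions: openssl is installed, and backups are created next to
# nodeagent-creds as nodeagent-creds_backup_YYYYMMDD_HHMMSS (as in the
# sample output). The function names are hypothetical.

# latest_creds_backup DIR: print the newest backup directory under DIR.
latest_creds_backup() {
  # The timestamp suffix sorts lexicographically in chronological order.
  find "$1" -maxdepth 1 -type d -name 'nodeagent-creds_backup_*' | sort | tail -n1
}

# ca_fingerprint CERT: print the SHA-256 fingerprint of a PEM certificate.
ca_fingerprint() {
  openssl x509 -in "$1" -noout -fingerprint -sha256
}

# server_ca_rotated DIR: succeed if the current server CA differs from the newest backup.
server_ca_rotated() {
  local dir="$1" backup cur old
  backup="$(latest_creds_backup "$dir")"
  if [ -z "$backup" ]; then
    echo "no credentials backup found in $dir" >&2
    return 1
  fi
  cur="$(ca_fingerprint "$dir/nodeagent-creds/server_ca_cert.pem")" || return 1
  old="$(ca_fingerprint "$backup/server_ca_cert.pem")" || return 1
  [ "$cur" != "$old" ]
}
```

For example: server_ca_rotated bmctl-workspace/demo-cluster && echo "server CA rotated".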
    

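For the complete list of bmctl nodeagent rotate-credentials command options, see nodeagent rotate-credentials in the bmctl command reference.

Check status

The status command reports the health of Node Agent on nodes. You can specify the target nodes directly with the --nodes flag, from the cluster configuration file with the --cluster flag, or by referencing the Cluster custom resource.

When the nodes come from the cluster configuration file or the --nodes flag, credentials are read from the local file system. When the node source is the Cluster custom resource, credentials are read from the cluster.

The Node Agent port is determined in the following order of precedence:

  1. The --port flag
  2. The kubeconfig file
  3. The cluster configuration file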
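The precedence rule above amounts to a first-non-empty lookup. This sketch only illustrates the rule and is not how bmctl implements it: the function name is hypothetical, the second argument stands for the port obtained through the kubeconfig, and the final fallback is the documented default of 9192.

```bash
# Sketch of the documented port precedence (illustration only, not bmctl code).
# Arguments: value from --port, value obtained through the kubeconfig,
# value from the cluster configuration file. Empty strings mean "not set".
resolve_nodeagent_port() {
  local flag_port="$1" kubeconfig_port="$2" config_port="$3" p
  for p in "$flag_port" "$kubeconfig_port" "$config_port"; do
    if [ -n "$p" ]; then
      printf '%s\n' "$p"
      return 0
    fi
  done
  printf '%s\n' 9192   # documented default port
}
```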

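Verify Node Agent status

With only the --cluster flag, you can check Node Agent status on the nodes specified in the cluster configuration file.

  • To check Node Agent status based on the cluster configuration file, use the following command: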

    ./bmctl nodeagent status \
        --cluster CLUSTER_NAME
    

    Replace CLUSTER_NAME with the name of the cluster that you want to check.

    The command output is similar to the following:

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_status-20250819-205707/nodeagent_status.log
    [2025-08-19 20:57:07+0000] Check Node Agent for cluster: demo-cluster
    [2025-08-19 20:57:09+0000] ----------------------------------------------------------
    [2025-08-19 20:57:09+0000] Verifying Node Agent status on all nodes...
    [2025-08-19 20:57:09+0000] Target Nodes Source: cluster YAML
    [2025-08-19 20:57:09+0000] --------------------- Total nodes: 4 ----------------------
    [2025-08-19 20:57:09+0000] node: control-0--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1175
    [2025-08-19 20:57:09+0000] node: control-1--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1174
    [2025-08-19 20:57:09+0000] node: control-2--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1176
    [2025-08-19 20:57:09+0000] node: worker-0--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1179
    [2025-08-19 20:57:09+0000] ----------------------------------------------------------
    [2025-08-19 20:57:09+0000] Verified Node Agent status on all nodes in cluster
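To confirm from a script that every expected node answered, you can compare the "Total nodes: N" count with the number of "node: ..." lines in the status output. This sketch assumes that the output format matches the samples on this page and that a node that does not respond produces no "node:" line; the function name is hypothetical.

```bash
# Sketch: check `bmctl nodeagent status` output for completeness by comparing
# the "Total nodes: N" header with the number of "node: ..." lines reported.
# Assumption: the output format matches the samples on this page.

# status_all_reported: read status output on stdin; succeed if every node reported.
status_all_reported() {
  awk '
    /Total nodes:/ { for (i = 1; i <= NF; i++) if ($i == "nodes:") total = $(i + 1) }
    / node: /      { seen++ }
    END            { exit !(total != "" && seen + 0 == total + 0) }
  '
}
```

For example: ./bmctl nodeagent status --cluster CLUSTER_NAME 2>&1 | status_all_reported || echo "some nodes did not report".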
    

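Verify Node Agent status from the cluster

By combining the --cluster flag with the --kubeconfig flag, you can check Node Agent status based on the Cluster custom resource.

  • To check Node Agent status based on the Cluster custom resource, use the following command: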

    ./bmctl nodeagent status \
        --cluster CLUSTER_NAME \
        --kubeconfig KUBECONFIG
    

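    Replace the following:

    • CLUSTER_NAME: the name of the cluster that you want to check.

    • KUBECONFIG: the path to the kubeconfig file of the cluster that you want to check.

    The command output is similar to the following: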

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_status-20250819-205712/nodeagent_status.log
    [2025-08-19 20:57:12+0000] Check Node Agent for cluster: demo-cluster
    [2025-08-19 20:57:14+0000] ----------------------------------------------------------
    [2025-08-19 20:57:14+0000] Verifying Node Agent status on all nodes...
    [2025-08-19 20:57:14+0000] Target Nodes Source: cluster CR
    [2025-08-19 20:57:14+0000] --------------------- Total nodes: 3 ----------------------
    [2025-08-19 20:57:14+0000] node: control-0--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1180
    [2025-08-19 20:57:14+0000] node: control-1--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1179
    [2025-08-19 20:57:14+0000] node: control-2--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1180
    [2025-08-19 20:57:14+0000] ----------------------------------------------------------
    [2025-08-19 20:57:14+0000] Verified Node Agent status on all nodes in cluster
    

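Verify Node Agent status from nodes

By combining the --cluster flag with the --nodes flag, you can check Node Agent status on specific cluster nodes.

  • To check Node Agent status on specific nodes, use the following command: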

    ./bmctl nodeagent status \
        --cluster CLUSTER_NAME \
        --nodes NODE_IP_ADDRESS_LIST
    

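    Replace the following:

    • CLUSTER_NAME: the name of the cluster that contains the nodes.

    • NODE_IP_ADDRESS_LIST: a comma-separated list of the IP addresses of the nodes to check.

    The command output is similar to the following: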

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_status-20250819-210050/nodeagent_status.log
    [2025-08-19 21:00:50+0000] Check Node Agent for cluster: demo-cluster
    [2025-08-19 21:00:53+0000] ----------------------------------------------------------
    [2025-08-19 21:00:53+0000] Verifying Node Agent status on all nodes...
    [2025-08-19 21:00:53+0000] Target Nodes Source: nodes flag
    [2025-08-19 21:00:53+0000] --------------------- Total nodes: 1 ----------------------
    [2025-08-19 21:00:53+0000] node: control-0--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1399
    [2025-08-19 21:00:53+0000] ----------------------------------------------------------
    [2025-08-19 21:00:53+0000] Verified Node Agent status on all nodes in cluster
    

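For the complete list of bmctl nodeagent status command options, see nodeagent status in the bmctl command reference.

SSH user permissions

Non-root users can run bmctl nodeagent commands. This requires the user to have either full passwordless sudo privileges or an explicit passwordless sudo allowlist.

The explicit passwordless sudo allowlist for Node Agent contains the following permissions: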

# Permission to create the necessary folders and set permissions.
/bin/mkdir -p /etc/nodeagentd
/bin/chmod 0755 /etc/nodeagentd
/bin/mkdir -p /usr/local/bin
/bin/chmod 0755 /usr/local/bin
/bin/mkdir -p /etc/systemd/system
/bin/chmod 0755 /etc/systemd/system

# Permission to place the main application executable and link it.
/bin/rm -f /usr/local/bin/nodeagentd-*
/bin/touch /usr/local/bin/nodeagentd-*
/bin/cp -f /home/deployer/.deploy_tmp_*/* /usr/local/bin/nodeagentd-*
/bin/chmod 0755 /usr/local/bin/nodeagentd-*
/bin/rm -f /usr/local/bin/nodeagentd
/bin/ln -s /usr/local/bin/nodeagentd-* /usr/local/bin/nodeagentd

# Permission to place configuration files in /etc/nodeagentd and set permissions.
/bin/rm -f /etc/nodeagentd/*
/bin/touch /etc/nodeagentd/*
/bin/cp -f /home/deployer/.deploy_tmp_*/* /etc/nodeagentd/*
/bin/chmod 0600 /etc/nodeagentd/*
/bin/chmod 0644 /etc/nodeagentd/*

# Permission to place the systemd unit file.
/bin/rm -f /etc/systemd/system/nodeagentd.service
/bin/touch /etc/systemd/system/nodeagentd.service
/bin/cp -f /home/deployer/.deploy_tmp_*/* /etc/systemd/system/nodeagentd.service
/bin/chmod 0644 /etc/systemd/system/nodeagentd.service

# Permission to interact with systemd service.
/bin/systemctl daemon-reload
/bin/systemctl stop nodeagentd
/bin/systemctl start nodeagentd
/bin/systemctl enable --now nodeagentd

# Permission to remove the temporary files used for the deployment.
/bin/rm -f /home/deployer/.deploy_tmp_*/*
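If you grant the allowlist instead of full sudo, the following sketch renders it as a single sudoers rule. It assumes that you save the allowlist above to a file, one command per line (blank lines and # comments are skipped), and that the SSH user is deployer to match the /home/deployer paths; the function name is hypothetical.

```bash
# Sketch: render the allowlist above as a sudoers rule for a non-root SSH user.
# Assumptions: the allowlist is saved one command per line in a file
# (blank lines and "#" comments are skipped), and the user matches the
# /home/deployer paths in the list. The function name is hypothetical.

# nodeagent_sudoers USER ALLOWLIST_FILE: print one sudoers rule on stdout.
nodeagent_sudoers() {
  local user="$1" file="$2" line joined
  local -a cmds=()
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    cmds+=("$line")
  done < "$file"
  if [ "${#cmds[@]}" -eq 0 ]; then
    echo "no commands found in $file" >&2
    return 1
  fi
  # sudoers separates the commands in a rule with commas.
  joined="$(printf '%s, ' "${cmds[@]}")"
  printf '%s ALL=(root) NOPASSWD: %s\n' "$user" "${joined%, }"
}
```

Write the rule to a temporary file, validate it with sudo visudo -cf FILE, and only then install it under /etc/sudoers.d with mode 0440. sudoers matches command arguments as a single string, so a * in an argument can match across spaces; review the resulting rule against your security policy.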

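SSH host key verification

Make sure that all nodes are added to the known_hosts file on the admin workstation. Otherwise, use the --enforce-host-key-verify=false flag to disable host key verification during deployment (nodeagent deploy) and credential rotation (nodeagent rotate-credentials).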
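To keep host key verification enabled, you can record each node's host key ahead of time. This sketch uses the OpenSSH tools ssh-keyscan and ssh-keygen; ssh-keyscan trusts whatever key the network returns, so compare the resulting fingerprints with the nodes through a trusted channel. The KNOWN_HOSTS variable and the function name are hypothetical.

```bash
# Sketch: record each node's SSH host key so that host key verification can
# stay enabled. ssh-keyscan trusts whatever key the network returns, so
# compare the fingerprints (ssh-keygen -lf FILE) with the nodes out of band.
# KNOWN_HOSTS and the function name are hypothetical.

add_nodes_to_known_hosts() {
  local kh="${KNOWN_HOSTS:-$HOME/.ssh/known_hosts}" ip
  mkdir -p "$(dirname "$kh")"
  for ip in "$@"; do
    # Skip nodes that already have an entry (also works with hashed entries).
    if ssh-keygen -F "$ip" -f "$kh" >/dev/null 2>&1; then
      continue
    fi
    # ssh-keyscan may exit 0 even when a host is unreachable; review the file afterwards.
    if ! ssh-keyscan -H "$ip" >> "$kh" 2>/dev/null; then
      echo "ssh-keyscan failed for $ip" >&2
    fi
  done
}
```

For example: add_nodes_to_known_hosts 10.200.0.2 10.200.0.3, then review the fingerprints with ssh-keygen -lf ~/.ssh/known_hosts.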

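Customize the Node Agent port

Node Agent supports a custom port. Specify the custom port with the --port flag during deployment; the setting is propagated to the Node Agent configuration on each node. The custom port must match the client configuration, which you set by using one of the following methods.

For existing clusters

To update an existing running cluster, specify the new custom port with the --port flag. This setting is propagated to the client (controller).

For new clusters

When you create a new cluster, add the following annotation to the cluster configuration to specify a custom port for Node Agent: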

kind: Cluster
metadata:
  annotations:
    baremetal.cluster.gke.io/node-agent-port: "10086"
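Because the port must match on the nodes and the client, it can help to validate the value once and reuse it. This sketch checks that the value is a TCP port number (1 to 65535) and prints the annotation fragment shown above, which you merge into the existing metadata of your Cluster resource rather than append as a separate document. Use the same value with --port when you deploy Node Agent. The function name is hypothetical.

```bash
# Sketch: validate a custom Node Agent port and print the Cluster annotation
# fragment for a new cluster's configuration file. Use the same value with
# --port when you deploy Node Agent. The function name is hypothetical.

nodeagent_port_annotation() {
  local port="$1" re='^[0-9]{1,5}$'
  # Accept 1 to 5 digits, then check the TCP port range (10# forces base 10).
  if ! [[ "$port" =~ $re ]] || (( 10#$port < 1 || 10#$port > 65535 )); then
    echo "invalid port: $port" >&2
    return 1
  fi
  port=$((10#$port))   # normalize values such as 010 to 10
  printf 'kind: Cluster\nmetadata:\n  annotations:\n    baremetal.cluster.gke.io/node-agent-port: "%s"\n' "$port"
}
```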

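Performance

Deployment and enablement typically complete in under a minute. Credential rotation takes about as long as a standard deployment, or less.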