Create an admin cluster

This page shows how to create an admin cluster for Google Distributed Cloud. The admin cluster manages user clusters that run your workloads. If you want to use topology domains, see Create an admin cluster for use with topology domains.

This page is for Admins, Architects, and Operators who set up, monitor, and manage the tech infrastructure. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE Enterprise user roles and tasks.

For more details about the admin cluster, see the installation overview.

Procedure overview

These are the primary steps involved in creating an admin cluster:

  1. Fill in your configuration files.
    Specify the details for your new admin cluster by completing and validating an admin cluster configuration file, a credentials configuration file, and possibly an IP block file.
  2. Import OS images to vSphere, and push container images to the private registry if applicable.
    Run gkectl prepare.
  3. Create an admin cluster.
    Use gkectl to create a new admin cluster as specified in your completed configuration files. When Google Distributed Cloud creates an admin cluster, it deploys a Kubernetes in Docker (kind) cluster to temporarily host the Kubernetes controllers needed to create the admin cluster. This transient cluster is called a bootstrap cluster. User clusters are created and upgraded by their managing admin cluster without the use of a bootstrap cluster.
  4. Verify that your admin cluster is running.
    Use kubectl to view your cluster nodes.

At the end of this procedure, you will have a running admin cluster that you can use to create and manage user clusters.

If you use VPC Service Controls, you might see errors when you run some gkectl commands, such as "Validation Category: GCP - [UNKNOWN] GCP service: [Stackdriver] could not get GCP services". To avoid these errors, add the --skip-validation-gcp parameter to your commands.

Before you begin

  • Make sure you have set up and can sign in to your admin workstation as described in Create an admin workstation. The admin workstation has the tools you need to create your admin cluster. Do all the steps in this document on your admin workstation.

  • Review the IP addresses planning document. Ensure that you have enough IP addresses available for the three control-plane nodes and a control-plane VIP. If you plan to create any kubeception user clusters, then you must have enough IP addresses available for the control-plane nodes of those user clusters.

  • Review the load balancing overview and revisit your decision about the kind of load balancer you want to use. For manual load balancers, you must set up the load balancer before you create your admin cluster.

  • Look ahead at the privateRegistry section, and decide whether you want to use a public or private registry for Google Distributed Cloud components.

  • Look ahead at the osImageType field, and decide what type of operating system you want to run on your admin cluster nodes.

  • If your organization requires outbound traffic to pass through a proxy server, make sure to allowlist required APIs and the Artifact Registry address.

  • In version 1.29 and higher, server-side preflight checks are enabled by default. Server-side preflight checks require additional firewall rules. In Firewall rules for admin clusters, search for "Preflight checks" and make sure all required firewall rules are configured. Server-side preflight checks are run on the bootstrap cluster instead of locally on the admin workstation.

Fill in your configuration file

If you used gkeadm to create your admin workstation, it generated a configuration file named admin-cluster.yaml.

If you didn't use gkeadm to create your admin workstation, then generate admin-cluster.yaml by running this command on your admin workstation:

gkectl create-config admin

This configuration file is for creating your admin cluster.

Familiarize yourself with the configuration file by scanning the admin cluster configuration file document. You might want to keep this document open in a separate tab or window, because you will refer to it as you complete the following steps.

name

If you want to specify a name for your admin cluster, fill in the name field.

bundlePath

The bundle is a zipped file that contains cluster components. It is included with the admin workstation. This field is already filled in for you.

vCenter

The fields in this section are already filled in with values that you entered when you created your admin workstation.

enableAdvancedCluster

If you want to enable the preview advanced cluster feature, set enableAdvancedCluster to true.

Note the following limitations with the advanced cluster preview:

  • You can enable advanced cluster at cluster creation time for new 1.31 clusters only.
  • After advanced cluster is enabled, you won't be able to upgrade the cluster to 1.32. Only enable advanced cluster in a test environment.
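
For reference, enabling the feature is a single field in the admin cluster configuration file. The following is a minimal sketch; everything else in the file is omitted:

# Preview: available only for new 1.31 clusters; blocks upgrade to 1.32.
enableAdvancedCluster: true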

network

Fill in the network.controlPlaneIPBlock section and the network.hostConfig section. Also set adminMaster.replicas to 3.

The network.podCIDR and network.serviceCIDR fields have prepopulated values that you can leave unchanged unless they conflict with addresses already being used in your network. Kubernetes uses these ranges to assign IP addresses to Pods and Services in your cluster.

Fill in the rest of the fields in the network section of the configuration file as needed.
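
As an illustration, the following sketch shows the shape of the network.hostConfig and network.controlPlaneIPBlock sections with three control-plane addresses. The addresses and hostnames are placeholders; for a complete network section, see the filled-in example later on this page:

network:
  hostConfig:
    dnsServers:
    - "203.0.113.1"
    ntpServers:
    - "216.239.35.4"
  controlPlaneIPBlock:
    netmask: "255.255.248.0"
    gateway: "21.0.143.254"
    ips:
    - ip: "21.0.140.226"
      hostname: "admin-cp-vm-1"
    - ip: "21.0.141.48"
      hostname: "admin-cp-vm-2"
    - ip: "21.0.141.65"
      hostname: "admin-cp-vm-3"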

loadBalancer

Set aside a VIP for the Kubernetes API server of your admin cluster. Provide your VIP as the value for loadBalancer.vips.controlPlaneVIP.

For more information, see VIPs in the admin cluster subnet.

Decide what type of load balancing you want to use. The options are:

  • MetalLB bundled load balancing. Set loadBalancer.kind to "MetalLB".

  • Manual load balancing. Set loadBalancer.kind to "ManualLB", and remove the manualLB section.

For more information about load balancing options, see Overview of load balancing.
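
For example, with MetalLB bundled load balancing, the loadBalancer section can be as small as the following sketch; the VIP shown is a placeholder:

loadBalancer:
  vips:
    controlPlaneVIP: "172.16.20.59"
  kind: "MetalLB"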

antiAffinityGroups

Set antiAffinityGroups.enabled to true or false according to your preference.

Use this field to specify whether you want Google Distributed Cloud to create VMware Distributed Resource Scheduler (DRS) anti-affinity rules for your admin cluster nodes, causing them to be spread across at least three physical hosts in your data center.

adminMaster

If you want to specify CPU and memory for the control-plane nodes of the admin cluster, fill in the cpus and memoryMB fields in the adminMaster section.

Admin clusters must have three control plane nodes. Set the replicas field in the adminMaster section to 3.
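
For example, the following sketch uses illustrative sizing; choose CPU and memory values that suit your environment:

adminMaster:
  cpus: 4
  memoryMB: 16384
  replicas: 3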

proxy

If the network that will have your admin cluster nodes is behind a proxy server, fill in the proxy section.
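
The following is a minimal sketch of a proxy section. The url and noProxy field names follow the admin cluster configuration file reference, and the values are placeholders; verify the field names against your generated admin-cluster.yaml:

proxy:
  url: "http://proxy.example.local:3128"
  noProxy: "10.0.0.0/8,192.168.0.0/16,.example.local"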

privateRegistry

Decide where you want to keep container images for the Google Distributed Cloud components. The options are:

  • Artifact Registry

  • Your own private Docker registry.

    If you want to use your own private registry, fill in the privateRegistry section, as sketched after this list.
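
Here is a minimal sketch of a privateRegistry section. It assumes the registry credentials are stored in the same credential.yaml file that holds your vCenter credentials; the registry address, credential entry name, and CA certificate path are placeholders:

privateRegistry:
  address: "my-registry.example.local:5000"
  credentials:
    fileRef:
      path: "credential.yaml"
      entry: "privateRegistry"
  caCertPath: "/home/ubuntu/certs/registry-ca.pem"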

componentAccessServiceAccountKeyPath

Google Distributed Cloud uses your component access service account to download cluster components from Artifact Registry. This field holds the path of a JSON key file for your component access service account.

This field is already filled in for you.

gkeConnect

Register your admin cluster to a Google Cloud fleet by filling in the gkeConnect section. If you include the stackdriver and cloudAuditLogging sections in the configuration file, the ID in gkeConnect.projectID must be the same as the ID set in stackdriver.projectID and cloudAuditLogging.projectID. If the project IDs aren't the same, cluster creation fails.

In 1.28 and later, you can optionally specify a region where the Fleet and Connect services run in gkeConnect.location. If you don't include this field, the cluster uses the global instances of these services.

If you include gkeConnect.location, the region that you specify must be the same as the region configured in cloudAuditLogging.clusterLocation, stackdriver.clusterLocation, and gkeOnPremAPI.location. If the regions aren't the same, cluster creation fails.
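
For example, the following sketch registers the cluster to a fleet and pins the Fleet and Connect services to a region; the project ID, region, and key file name are placeholders, and location is optional:

gkeConnect:
  projectID: "my-project-123"
  location: "us-central1"
  registerServiceAccountKeyPath: "connect-register-sa-2203040617.json"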

gkeOnPremAPI

If the GKE On-Prem API is enabled in your Google Cloud project, all clusters in the project are enrolled in the GKE On-Prem API automatically in the region configured in stackdriver.clusterLocation. The gkeOnPremAPI.location region must be the same as the region specified in cloudAuditLogging.clusterLocation, gkeConnect.location, and stackdriver.clusterLocation. If the regions aren't the same, cluster creation fails.

  • If you want to enroll all clusters in the project in the GKE On-Prem API, be sure to do the steps in Before you begin to activate and use the GKE On-Prem API in the project.

  • If you don't want to enroll the cluster in the GKE On-Prem API, include this section and set gkeOnPremAPI.enabled to false, as shown in the sketch after this list. If you don't want to enroll any clusters in the project, disable gkeonprem.googleapis.com (the service name for the GKE On-Prem API) in the project. For instructions, see Disabling services.
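
For example, the following sketch opts the cluster out of enrollment:

gkeOnPremAPI:
  enabled: false
  # To enroll instead, set enabled to true and set location to the same
  # region as stackdriver.clusterLocation, for example:
  # location: "us-central1"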

stackdriver

If you want to enable Cloud Logging and Cloud Monitoring for your cluster, fill in the stackdriver section.

This section is required by default. That is, if you don't fill in this section, then you must include the --skip-validation-stackdriver flag when you run gkectl create admin.

Note the following requirements:

  • If you enable advanced cluster, you must specify the same path in cloudAuditLogging.serviceAccountKeyPath and stackdriver.serviceAccountKeyPath.

  • The ID in stackdriver.projectID must be the same as the ID in gkeConnect.projectID and cloudAuditLogging.projectID.

  • The Google Cloud region set in stackdriver.clusterLocation must be the same as the region set in cloudAuditLogging.clusterLocation and gkeConnect.location. Additionally, if gkeOnPremAPI.enabled is true, the same region must be set in gkeOnPremAPI.location.

If the project IDs and regions aren't the same, cluster creation fails.

cloudAuditLogging

If you want to integrate the audit logs from your cluster's Kubernetes API server with Cloud Audit Logs, fill in the cloudAuditLogging section.

Note the following requirements:

  • If you enable advanced cluster, you must specify the same path in cloudAuditLogging.serviceAccountKeyPath and stackdriver.serviceAccountKeyPath.

  • The ID in cloudAuditLogging.projectID must be the same as the ID in gkeConnect.projectID and stackdriver.projectID.

  • The Google Cloud region set in cloudAuditLogging.clusterLocation must be the same as the region set in stackdriver.clusterLocation and gkeConnect.location (if the field is included in your configuration file). Additionally, if gkeOnPremAPI.enabled is true, the same region must be set in gkeOnPremAPI.location.

If the project IDs and regions aren't the same, cluster creation fails.
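
The following abbreviated sketch shows the consistency requirement across the three sections; the project ID, region, and the audit-logging key file name are placeholders, and other required fields are omitted:

gkeConnect:
  projectID: "my-project-123"
stackdriver:
  projectID: "my-project-123"
  clusterLocation: "us-central1"
cloudAuditLogging:
  projectID: "my-project-123"
  clusterLocation: "us-central1"
  serviceAccountKeyPath: "audit-log-sa-2203040617.json"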

clusterBackup

If you want to enable backing up of the admin cluster, set clusterBackup.datastore to the vSphere datastore where you want to save cluster backups.

If you enable advanced cluster, remove this section. Backing up the admin cluster to a vSphere datastore isn't supported for advanced clusters.

autoRepair

If you want to enable automatic node repair for your admin cluster, set autoRepair.enabled to true.

secretsEncryption

If you want to enable always-on Secrets encryption, fill in the secretsEncryption section.

If you enable advanced cluster, set secretsEncryption.enabled to false. Always-on Secrets encryption isn't supported for advanced clusters.

osImageType

Decide what type of OS image you want to use for the admin cluster nodes, and fill in the osImageType section accordingly.

If you enable advanced cluster, set osImageType to either ubuntu_cgroupv2 or ubuntu_containerd.

Example of filled-in configuration files

Here is an example of a filled-in admin cluster configuration file. The configuration enables some, but not all, of the available features.

vc-01-admin-cluster.yaml

apiVersion: v1
kind: AdminCluster
name: "gke-admin-01"
bundlePath: "/var/lib/gke/bundles/gke-onprem-vsphere-1.28.0-gke.1-full.tgz"
vCenter:
  address: "vc01.example"
  datacenter: "vc-01"
  cluster: "vc01-workloads-1"
  resourcePool: "vc-01-pool-1"
  datastore: "vc01-datastore-1"
  caCertPath: "/usr/local/google/home/me/certs/vc01-cert.pem"
  credentials:
    fileRef:
      path: "credential.yaml"
      entry: "vCenter"
network:
  hostConfig:
    dnsServers:
    - "203.0.113.1"
    - "198.51.100.1"
    ntpServers:
    - "216.239.35.4"
  serviceCIDR: "10.96.232.0/24"
  podCIDR: "192.168.0.0/16"
  vCenter:
    networkName: "vc01-net-1"
  controlPlaneIPBlock:
    netmask: "255.255.248.0"
    gateway: "21.0.143.254"
    ips:
    - ip: "21.0.140.226"
      hostname: "admin-cp-vm-1"
    - ip: "21.0.141.48"
      hostname: "admin-cp-vm-2"
    - ip: "21.0.141.65"
      hostname: "admin-cp-vm-3"
loadBalancer:
  vips:
    controlPlaneVIP: "172.16.20.59"
  kind: "MetalLB"
antiAffinityGroups:
  enabled: true
adminMaster:
  cpus: 4
  memoryMB: 16384
  replicas: 3
componentAccessServiceAccountKeyPath: "sa-key.json"
gkeConnect:
  projectID: "my-project-123"
  registerServiceAccountKeyPath: "connect-register-sa-2203040617.json"
stackdriver:
  projectID: "my-project-123"
  clusterLocation: "us-central1"
  enableVPC: false
  serviceAccountKeyPath: "log-mon-sa-2203040617.json"
  disableVsphereResourceMetrics: false
clusterBackup:
  datastore: "vc-01-datastore-bu"
autoRepair:
  enabled: true
osImageType: "ubuntu_containerd"

Validate your configuration file

After you've filled in your admin cluster configuration file, run gkectl check-config to verify that the file is valid:

gkectl check-config --config ADMIN_CLUSTER_CONFIG

Replace ADMIN_CLUSTER_CONFIG with the path of your admin cluster configuration file.

If the command returns any failure messages, fix the issues and validate the file again.

If you want to skip the more time-consuming validations, pass the --fast flag. To skip individual validations, use the --skip-validation-xxx flags. To learn more about the check-config command, see Running preflight checks.
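
For example, to run only the faster validations:

gkectl check-config --config ADMIN_CLUSTER_CONFIG --fast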

Get OS images

Run gkectl prepare to initialize your vSphere environment:

gkectl prepare --config ADMIN_CLUSTER_CONFIG

The gkectl prepare command performs the following preparatory tasks:

  • Imports OS images to vSphere and marks them as VM templates.

  • If you are using a private Docker registry, pushes the container images to your registry.

  • Optionally, validates the container images' build attestations, thereby verifying that the images were built and signed by Google and are ready for deployment.

Create the admin cluster

Create the admin cluster:

gkectl create admin --config ADMIN_CLUSTER_CONFIG

If you use VPC Service Controls, you might see errors when you run some gkectl commands, such as "Validation Category: GCP - [UNKNOWN] GCP service: [Stackdriver] could not get GCP services". To avoid these errors, add the --skip-validation-gcp parameter to your commands.
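
For example, to create the cluster while skipping the Google Cloud validations:

gkectl create admin --config ADMIN_CLUSTER_CONFIG --skip-validation-gcp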

Resume creation of the admin cluster after a failure

If the admin cluster creation fails or is canceled, you can run the create command again:

gkectl create admin --config ADMIN_CLUSTER_CONFIG

Locate the admin cluster kubeconfig file

The gkectl create admin command creates a kubeconfig file named kubeconfig in the current directory. You will need this kubeconfig file later to interact with your admin cluster.

The kubeconfig file contains the name of your admin cluster. To view the cluster name, you can run:

kubectl config get-clusters --kubeconfig ADMIN_CLUSTER_KUBECONFIG

The output shows the name of the cluster. For example:

NAME
gke-admin-tqk8x

If you like, you can change the name and location of your kubeconfig file.
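
For example, one common approach is to copy the file to a descriptive path and then pass that path with --kubeconfig in later commands; the destination path here is only an illustration:

mkdir -p ~/kubeconfigs
cp kubeconfig ~/kubeconfigs/admin-cluster-kubeconfig
kubectl get nodes --kubeconfig ~/kubeconfigs/admin-cluster-kubeconfig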

Manage the checkpoint.yaml file

This section applies only to non-HA admin clusters. The checkpoint.yaml file isn't used in the creation of HA admin clusters.

When you ran the gkectl create admin command to create the admin cluster, it created a checkpoint file in the same datastore folder as the admin cluster data disk. By default, this file has the name DATA_DISK_NAME-checkpoint.yaml. If the length of DATA_DISK_NAME is greater than or equal to 245 characters, then, due to the vSphere limit on filename length, the name is DATA_DISK_NAME.yaml.

This file contains the admin cluster state and credentials, and is used for future upgrades. Don't delete this file unless you are following the process for deleting an admin cluster.

If you have enabled VM encryption in your instance of vCenter Server, then you must have the Cryptographic operations.Direct Access privilege before you create or upgrade your admin cluster. Otherwise, the checkpoint isn't uploaded. If you can't obtain this privilege, you can disable uploading the checkpoint file by using the hidden flag --disable-checkpoint when you run a relevant command.
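
For example, to create the cluster without uploading the checkpoint file:

gkectl create admin --config ADMIN_CLUSTER_CONFIG --disable-checkpoint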

The checkpoint.yaml file is automatically updated when you run the gkectl upgrade admin command, or when you run a gkectl update command that affects the admin cluster.

Verify that your admin cluster is running

Verify that your admin cluster is running:

kubectl get nodes --kubeconfig ADMIN_CLUSTER_KUBECONFIG

Replace ADMIN_CLUSTER_KUBECONFIG with the path of your admin cluster kubeconfig file.

The output shows the admin cluster nodes. For example:

admin-cp-vm-1   Ready    control-plane,master   ...
admin-cp-vm-2   Ready    control-plane,master   ...
admin-cp-vm-3   Ready    control-plane,master   ...

Back up files

We recommend that you back up your admin cluster kubeconfig file. That is, copy the kubeconfig file from your admin workstation to another location. Then if you lose access to the admin workstation, or if the kubeconfig file on your admin workstation gets accidentally deleted, you still have access to the admin cluster.

We also recommend that you back up the private SSH key for your admin cluster. Then if you lose access to the admin cluster, you can still use SSH to connect to the admin cluster nodes. This will allow you to troubleshoot and investigate any issues with connectivity to the admin cluster.

Extract the SSH key from the admin cluster to a file named admin-cluster-ssh-key:

kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG get secrets -n kube-system sshkeys \
    -o jsonpath='{.data.vsphere_tmp}' | base64 -d > admin-cluster-ssh-key

Now you can back up admin-cluster-ssh-key to another location of your choice.
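
As an example of using the backed-up key, the following command connects to an admin cluster node over SSH. It assumes an Ubuntu node image, where the SSH user is ubuntu; replace NODE_IP with the address of a node, such as one of the control-plane IPs from network.controlPlaneIPBlock:

ssh -i admin-cluster-ssh-key ubuntu@NODE_IP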

RBAC policies

When you fill in the gkeConnect section in your admin cluster configuration file, the cluster is registered to your fleet during creation or update. To enable fleet management functionality, Google Cloud deploys the Connect agent and creates a Google service account that represents the project that the cluster is registered to. The Connect agent establishes a connection with the service account to handle requests to the cluster's Kubernetes API server. This enables access to cluster and workload management features in Google Cloud, including access to the Google Cloud console, which lets you interact with your cluster.

The admin cluster's Kubernetes API server needs to be able to authorize requests from the Connect agent. To ensure this, the following role-based access control (RBAC) policies are configured on the service account:

  • An impersonation policy that authorizes the Connect agent to send requests to the Kubernetes API server on behalf of the service account.

  • A permissions policy that specifies the operations that are allowed on other Kubernetes resources.

The service account and RBAC policies are needed so that you can manage the lifecycle of your user clusters in the Google Cloud console.
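
If you want to inspect the RBAC resources created for the service account, you can list the cluster role bindings in the admin cluster and filter for Connect-related names; the exact binding names vary by version, so treat this as a starting point:

kubectl get clusterrolebindings --kubeconfig ADMIN_CLUSTER_KUBECONFIG | grep -i connect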

Troubleshooting

See Troubleshooting cluster creation and upgrade.

What's next

Create a user cluster