Creating user clusters in a multi-cluster setup

In GKE on Bare Metal, user clusters run your workloads, and in a multi-cluster architecture, user clusters are created and managed by an admin cluster.

Once you've created an admin cluster, calling the bmctl create config command creates a yaml file you can edit to define your user cluster. Then you can use the kubectl command to apply that configuration and create the user cluster.

Keeping workloads off the admin cluster protects sensitive administrative data, like SSH keys stored in the admin cluster, from those who don't need access to that information. Additionally, keeping user clusters separate from each other provides good general security for your workloads.

Prerequisites

  • bmctl downloaded from gs://anthos-baremetal-release/bmctl/1.6.2/linux-amd64/bmctl
  • Working admin cluster with access to the cluster API server (the controlPlaneVIP)
  • Admin cluster nodes need to have network connectivity to all nodes on the target user cluster
  • SSH key used to create user cluster available as a root or SUDO user on all nodes in the user cluster

Create a user cluster config file

The config file for creating a user cluster is almost exactly the same as the one used for creating an admin cluster. The only difference is that you remove the local credentials configuration section to make the config a valid collection of Kubernetes resources. The configuration section is at the top of the file under the bmctl configuration variables section.

By default, user clusters inherit their credentials from the admin cluster that manages them. You can selectively override some or all of these credentials. See the sample user cluster config file for more details.

In the steps below, double check your editing of the config file. Because you are creating the user cluster with the kubectl command, there are limited preflight checks on the user cluster config.

  1. Create a user cluster config file with the bmctl create configcommand:

    bmctl create config -c USER_CLUSTER_NAME
    

    For example, issue the following to create a config file for a user cluster called user1:

    bmctl create config -c user1
    

    The file is written to bmctl-workspace/user1/user1.yaml. The generic path to the file is bmctl-workspace/CLUSTER NAME/CLUSTER_NAME.yaml

  2. Edit the config file with the following changes:

    • Remove the local credentials file paths from the config. They are not needed for a user cluster, and won't work with the kubectl command. Remove the following items:
      ....
        gcrKeyPath: {path to GCR service account key}
        sshPrivateKeyPath: (path to SSH private key, used for node access)
        gkeConnectAgentServiceAccountKeyPath: (path to Connect agent service account key)
        gkeConnectRegisterServiceAccountKeyPath: (path to Hub registration service account key)
        cloudOperationsServiceAccountKeyPath: (path to Cloud Operations service account key)
      ....
            
    • Change the config to specify a cluster type of user instead of admin:
      ....
      spec:
        # Cluster type. This can be:
        #   1) admin:  to create an admin cluster. This can later be used to create
        #   user clusters.
        #   2) user:   to create a user cluster. Requires an existing admin cluster.
        #   3) hybrid: to create a hybrid cluster that runs admin cluster 
        #   components and user workloads.
        #   4) standalone: to create a cluster that manages itself, runs user
        #   workloads, but does not manage other clusters.
        type: user
      ....
      
    • Ensure the admin and user cluster specifications for the load balancer VIPs and address pools are complementary, and do not overlap existing clusters. A sample pair of admin and user cluster configurations, specifying load balancing and address pools, is shown below:
    • ....
      # Sample admin cluster config for load balancer and address pools
        loadBalancer:
          vips:
            controlPlaneVIP: 10.200.0.49
            ingressVIP: 10.200.0.50
          addressPools:
          - name: pool1
            addresses:
            - 10.200.0.50-10.200.0.70
      ....
      ....
      # Sample user cluster config for load balancer and address pools
      loadBalancer:
          vips:
            controlPlaneVIP: 10.200.0.71
            ingressVIP: 10.200.0.72
          addressPools:
          - name: pool1
            addresses:
            - 10.200.0.72-10.200.0.90
      ....
      

      Note that the rest of the user cluster config files is the same as the admin cluster config.

  3. Double-check your user cluster config file. There are limited GKE on Bare Metal preflight checks when you create a user cluster, and they only cover machine level checks (such as OS version, conflicting software versions, and available resources).

Create the user cluster

Issue the kubectl command to apply the user cluster config and create the cluster:

kubectl --kubeconfig ADMIN_KUBECONFIG apply -f USER_CLUSTER_CONFIG

ADMIN_KUBECONFIG specifies the path to the admin cluster kubeconfig file, and USER_CLUSTER_CONFIG specifies the path to the user cluster yaml file you edited in the previous section.

For example, for an admin cluster named admin, and a user cluster config named user1, the command would be:

kubectl --kubeconfig bmctl-workspace/admin/admin-kubeconfig apply /
  -f bmctl-workspace/user1/user1.yaml

Waiting for the user cluster to be ready

To verify the user cluster is ready, use the kubectl wait command to test for a condition. The following waits for 30 minutes to check that the cluster state has finished reconciling, and fetches the created user cluster kubeconfig file:

kubectl --kubeconfig ADMIN_KUBECONFIG wait \
  cluster USER_CLUSTER_NAME -n cluster-USER_CLUSTER_NAME \
  --for=condition=Reconciling=False --timeout=30m && \
  kubectl --kubeconfig ADMIN_KUBECONFIG get secret USER_CLUSTER_NAME-kubeconfig \
  -n cluster-USER_CLUSTER_NAME \
  -o 'jsonpath={.data.value}' | base64 -d > USER_KUBECONFIG

Where:

  • ADMIN_KUBECONFIG specifies the path to the admin cluster kubeconfig file.
  • USER_KUBECONFIG specifies the path to the user kubeconfig file you want to create.
  • USER_CLUSTER_NAME is the name of your user cluster.

For example, for an admin cluster named admin, and a user cluster config named user1, the command would be:

kubectl --kubeconfig bmctl-workspace/admin/admin-kubeconfig wait  \
  cluster user1 -n cluster-user1 --for=condition=Reconciling=False --timeout=30m &&  \
  kubectl --kubeconfig bmctl-workspace/admin/admin-kubeconfig get secret  \
  user1-kubeconfig -n cluster-user1 -o 'jsonpath={.data.value}' | base64  \
  -d > bmctl-workspace/user1/user1-kubeconfig

Waiting for worker node pools to become ready

Most of the time, you will also need to wait for the worker node pools to become ready. Typically, you use worker node pools for your user clusters to run workloads.

To verify that the worker node pools are ready, use the kubectl wait command to test for a condition. In this case, the command again waits for 30 minutes for the nodepools to become ready:

kubectl --kubeconfig ADMIN_KUBECONFIG wait cluster USER_CLUSTER_NAME \
  -n cluster-USER_CLUSTER_NAME --for=condition=Reconciling=False --timeout=30m && \
  kubectl --kubeconfig ADMIN_KUBECONFIG wait nodepool NODE_POOL_NAME \
  -n cluster-USER_CLUSTER_NAME --for=condition=Reconciling=False --timeout=30m && \
  kubectl --kubeconfig ADMIN_KUBECONFIG get secret \
  USER_CLUSTER_NAME-kubeconfig -n cluster-USER_CLUSTER_NAME -o \
  'jsonpath={.data.value}' | base64 -d >  USER_KUBECONFIG
  

NODE_POOL_NAME specifies a node pool you create with the user cluster.

For example, for an admin cluster named admin, a user cluster config named user1, and a node pool named node-pool-1, the command would be:

kubectl --kubeconfig bmctl-workspace/admin/admin-kubeconfig wait \
  cluster user1 -n cluster-user1 --for=condition=Reconciling=False --timeout=30m && \
  kubectl --kubeconfig bmctl-workspace/admin/admin-kubeconfig wait \
  nodepool node-pool-1 -n cluster-user1 --for=condition=Reconciling=False --timeout=30m && \
  kubectl --kubeconfig bmctl-workspace/admin/admin-kubeconfig get secret \
  user1-kubeconfig -n cluster-user1 -o \
  'jsonpath={.data.value}' | base64 -d > bmctl-workspace/user1/user1-kubeconfig

Sample complete user cluster config

The following is a sample user cluster config file created by the bmctl command. Note that in this sample config, placeholder cluster names, VIPs and addresses are used. They may not work for your network.

# Sample user cluster config:

apiVersion: v1
kind: Namespace
metadata:
  name: cluster-user1
---
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: user1
  namespace: cluster-user1
spec:
  # Cluster type. This can be:
  #   1) admin:  to create an admin cluster. This can later be used to create user clusters.
  #   2) user:   to create a user cluster. Requires an existing admin cluster.
  #   3) hybrid: to create a hybrid cluster that runs admin cluster components and user workloads.
  #   4) standalone: to create a cluster that manages itself, runs user workloads, 
  #   but does not manage other clusters.
  type: user
  # Anthos cluster version.
  anthosBareMetalVersion: 1.6.2
  # GKE connect configuration
  gkeConnect:
    projectID: GOOGLE_PROJECT_ID
    # To override default credentials inherited from the admin cluster:
    # 1. Create a new secret in the admin cluster
    # 2. Uncomment the section below and refer to the secret created above
    # # Optionally override default secret inherited from the admin cluster.
    # connectServiceAccountSecret:
    #  name: GKE_CONNECT_AGENT_SA_SECRET
    #  namespace: cluster-user1
    # # Optionally override default secret inherited from the admin cluster.
    # registerServiceAccountSecret:
    #  name: GKE_CONNECT_REGISTER_SA_SECRET
    #  namespace: cluster-user1
  # Control plane configuration
  controlPlane:
    nodePoolSpec:
      nodes:
      # Control plane node pools. Typically, this is either a single machine
      # or 3 machines if using a high availability deployment.
      - address: 10.200.0.4
  # Cluster networking configuration
  clusterNetwork:
    # Pods specify the IP ranges from which Pod networks are allocated.
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    # Services specify the network ranges from which service VIPs are allocated.
    # This can be any RFC 1918 range that does not conflict with any other IP range
    # in the cluster and node pool resources.
    services:
      cidrBlocks:
      - 10.96.0.0/12
  # Credentials specify the secrets that hold SSH key and image pull credential for the new cluster.
  # credentials:
  #  # Optionally override default ssh key secret inherited from the admin cluster.
  #  sshKeySecret:
  #    name: SSH_KEY_SECRET
  #    namespace: cluster-user1
  #  # Optionally override default image pull secret inherited from the admin cluster.
  #  imagePullSecret:
  #    name: IMAGE_PULL_SECRET
  #    namespace: cluster-user1
  # Load balancer configuration
  loadBalancer:
    # Load balancer mode can be either 'bundled' or 'manual'.
    # In 'bundled' mode a load balancer will be installed on load balancer nodes during cluster creation.
    # In 'manual' mode the cluster relies on a manually-configured external load balancer.
    mode: bundled
    # Load balancer port configuration
    ports:
      # Specifies the port the LB serves the kubernetes control plane on.
      # In 'manual' mode the external load balancer must be listening on this port.
      controlPlaneLBPort: 443
    # There are two load balancer VIPs: one for the control plane and one for the L7 Ingress
    # service. The VIPs must be in the same subnet as the load balancer nodes.
    vips:
      # ControlPlaneVIP specifies the VIP to connect to the Kubernetes API server.
      # This address must not be in the address pools below.
      controlPlaneVIP: 10.200.0.71
      # IngressVIP specifies the VIP shared by all services for ingress traffic.
      # Allowed only in non-admin clusters.
      # This address must be in the address pools below.
      ingressVIP: 10.200.0.72
    # AddressPools is a list of non-overlapping IP ranges for the data plane load balancer.
    # All addresses must be in the same subnet as the load balancer nodes.
    # Address pool configuration is only valid for 'bundled' LB mode in non-admin clusters.
    addressPools:
    - name: pool1
      addresses:
      # Each address must be either in the CIDR form (1.2.3.0/24)
      # or range form (1.2.3.1-1.2.3.5).
      - 10.200.0.72-10.200.0.90
    # A load balancer nodepool can be configured to specify nodes used for load balancing.
    # These nodes are part of the kubernetes cluster and run regular workloads as well as load balancers.
    # If the node pool config is absent then the control plane nodes are used.
    # Node pool configuration is only valid for 'bundled' LB mode.
    # nodePoolSpec:
    #  nodes:
    #  - address: 
  # Proxy configuration
  # proxy:
  #   url: http://[username:password@]domain
  #   # A list of IPs, hostnames or domains that should not be proxied.
  #   noProxy:
  #   - 127.0.0.1
  #   - localhost
  # Logging and Monitoring
  clusterOperations:
    # Cloud project for logs and metrics.
    projectID: $GOOGLE_PROJECT_ID
    # Cloud location for logs and metrics.
    location: us-central1
    # Optionally override default secret inherited from the admin cluster.
    # serviceAccountSecret:
    #  name: $CLOUD_OPERATIONS_SA_SECRET
    #  namespace: cluster-user1
    # Whether collection of application logs/metrics should be enabled (in addition to
    # collection of system logs/metrics which correspond to system components such as
    # Kubernetes control plane or cluster management agents).
    # enableApplication: false
  # Storage configuration
  storage:
    # lvpNodeMounts specifies the config for local PersistentVolumes backed by mounted disks.
    # These disks need to be formatted and mounted by the user, which can be done before or after
    # cluster creation.
    lvpNodeMounts:
      # path specifies the host machine path where mounted disks will be discovered and a local PV
      # will be created for each mount.
      path: /mnt/localpv-disk
      # storageClassName specifies the StorageClass that PVs will be created with. The StorageClass
      # is created during cluster creation.
      storageClassName: local-disks
    # lvpShare specifies the config for local PersistentVolumes backed by subdirectories in a shared filesystem.
    # These subdirectories are automatically created during cluster creation.
    lvpShare:
      # path specifies the host machine path where subdirectories will be created on each host. A local PV
      # will be created for each subdirectory.
      path: /mnt/localpv-share
      # storageClassName specifies the StorageClass that PVs will be created with. The StorageClass
      # is created during cluster creation.
      storageClassName: local-shared
      # numPVUnderSharedPath specifies the number of subdirectories to create under path.
      numPVUnderSharedPath: 5
  # Authentication; uncomment this section if you wish to enable authentication to the cluster with OpenID Connect.
  # authentication:
  #   oidc:
  #     # issuerURL specifies the URL of your OpenID provider, such as "https://accounts.google.com". The Kubernetes API
  #     # server uses this URL to discover public keys for verifying tokens. Must use HTTPS.
  #     issuerURL: 
  #     # clientID specifies the ID for the client application that makes authentication requests to the OpenID
  #     # provider.
  #     clientID: 
  #     # clientSecret specifies the secret for the client application.
  #     clientSecret: 
  #     # kubectlRedirectURL specifies the redirect URL (required) for the gcloud CLI, such as
  #     # "http://localhost:[PORT]/callback".
  #     kubectlRedirectURL: <Redirect URL for the gcloud CLI; optional, default is "http://kubectl.redirect.invalid">
  #     # username specifies the JWT claim to use as the username. The default is "sub", which is expected to be a
  #     # unique identifier of the end user.
  #     username: <JWT claim to use as the username; optional, default is "sub">
  #     # usernamePrefix specifies the prefix prepended to username claims to prevent clashes with existing names.
  #     usernamePrefix: 
  #     # group specifies the JWT claim that the provider will use to return your security groups.
  #     group: 
  #     # groupPrefix specifies the prefix prepended to group claims to prevent clashes with existing names.
  #     groupPrefix: 
  #     # scopes specifies additional scopes to send to the OpenID provider as a comma-delimited list.
  #     scopes: 
  #     # extraParams specifies additional key-value parameters to send to the OpenID provider as a comma-delimited
  #     # list.
  #     extraParams: 
  #     # certificateAuthorityData specifies a Base64 PEM-encoded certificate authority certificate of your identity
  #     # provider. It's not needed if your identity provider's certificate was issued by a well-known public CA.
  #     certificateAuthorityData: 
  # Node access configuration; uncomment this section if you wish to use a non-root user
  # with passwordless sudo capability for machine login.
  # nodeAccess:
  #   loginUser: 
---
# Node pools for worker nodes
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: node-pool-1
  namespace: cluster-user1
spec:
  clusterName: user1
  nodes:
  - address: 10.200.0.5