This page shows how to create a user cluster for Anthos clusters on VMware (GKE on-prem).
The instructions here are complete. For a shorter introduction to creating a user cluster, see Create a user cluster (quickstart).
Before you begin
Get an SSH connection to your admin workstation
Get an SSH connection to your admin workstation.
Recall that gkeadm activated your component access service account on the admin workstation.
Do all the remaining steps in this topic on your admin workstation, in the home directory.
Credentials configuration file
When you used gkeadm to create your admin workstation, you filled in a credentials configuration file named credential.yaml. This file holds the username and password for your vCenter server.
User cluster configuration file
When gkeadm created your admin workstation, it generated a configuration file named user-cluster.yaml. This configuration file is for creating your user cluster.
Filling in your configuration file
name
Set the name field to a name of your choice for the user cluster.
gkeOnPremVersion
This field is already filled in for you.
vCenter
The values you set in the vCenter section of your admin cluster configuration file are global. That is, they apply to your admin cluster and your user clusters.
For each user cluster that you create, you have the option of overriding some of the global vCenter values.
If you want to override any of the global vCenter values, fill in the relevant fields in the vCenter section of your user cluster configuration file.
network
Set network.ipMode.type to the same value that you set in your admin cluster configuration file: either "dhcp" or "static".
If you set network.ipMode.type to "static", create an IP block file that provides the static IP addresses for the nodes in your user cluster. Then set network.ipMode.ipBlockFilePath to the path of your IP block file.
Provide values for the remaining fields in the network section.
Regardless of whether you rely on a DHCP server or specify a list of static IP addresses, you need enough IP addresses to satisfy the following:
The nodes in your user cluster
An additional node in the user cluster to be used temporarily during upgrades
As mentioned previously, if you want to use static IP addresses, you need to provide an IP block file. Here is an example of an IP block file with six hosts. These addresses are enough for a cluster that has five nodes plus the occasional sixth node used during upgrades:
blocks:
  - netmask: 255.255.252.0
    gateway: 172.16.23.254
    ips:
    - ip: 172.16.20.21
      hostname: user-host1
    - ip: 172.16.20.22
      hostname: user-host2
    - ip: 172.16.20.23
      hostname: user-host3
    - ip: 172.16.20.24
      hostname: user-host4
    - ip: 172.16.20.25
      hostname: user-host5
    - ip: 172.16.20.26
      hostname: user-host6
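You can check the sizing rule above (one address per node, plus one spare for upgrades) and the relationship between the gateway, the netmask, and the host IPs before you run gkectl. The following is a minimal Python sketch. It assumes that you have already loaded the IP block file into a dictionary with the same structure as the example, for instance with a YAML parser such as PyYAML, which is not part of the standard library. The function name check_ip_block is hypothetical, and gkectl check-config remains the authoritative validation.

```python
import ipaddress


def check_ip_block(ip_block: dict, node_count: int) -> list[str]:
    """Return a list of problems found in an IP block file; empty means none found.

    ip_block mirrors the YAML example above:
    {"blocks": [{"netmask": ..., "gateway": ..., "ips": [{"ip": ..., "hostname": ...}]}]}
    """
    problems = []
    seen_ips = set()
    seen_hosts = set()
    total = 0
    for block in ip_block.get("blocks", []):
        # Subnet implied by the gateway and netmask, e.g. 172.16.20.0/22 for the example.
        subnet = ipaddress.IPv4Network(f"{block['gateway']}/{block['netmask']}", strict=False)
        for entry in block.get("ips", []):
            addr = ipaddress.IPv4Address(entry["ip"])
            total += 1
            if addr not in subnet:
                problems.append(f"{addr} is outside {subnet}")
            if addr in seen_ips:
                problems.append(f"duplicate IP {addr}")
            if entry["hostname"] in seen_hosts:
                problems.append(f"duplicate hostname {entry['hostname']}")
            seen_ips.add(addr)
            seen_hosts.add(entry["hostname"])
    # One address per node, plus one for the temporary node used during upgrades.
    needed = node_count + 1
    if total < needed:
        problems.append(f"{total} addresses provided, {needed} needed")
    return problems


if __name__ == "__main__":
    example = {"blocks": [{
        "netmask": "255.255.252.0",
        "gateway": "172.16.23.254",
        "ips": [{"ip": f"172.16.20.{21 + i}", "hostname": f"user-host{i + 1}"} for i in range(6)],
    }]}
    print(check_ip_block(example, node_count=5))  # prints []
```

For the six-host example and a five-node cluster, the function reports no problems. Asking for six nodes would report that seven addresses are needed.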
loadBalancer
Set aside a VIP for the Kubernetes API server of your user cluster. Set aside another VIP for the ingress service of your user cluster. Provide your VIPs as values for loadBalancer.vips.controlPlaneVIP and loadBalancer.vips.ingressVIP.
Set loadBalancer.kind to the same value that you set in your admin cluster configuration file: "ManualLB", "F5BigIP", or "Seesaw". Then fill in the corresponding section: manualLB, f5BigIP, or seesaw.
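Two settings in this part of the file must agree with the admin cluster configuration file: network.ipMode.type and loadBalancer.kind. The two VIPs also serve different purposes, so they must be separate addresses. The following sketch compares the two files after you load them into dictionaries with a YAML loader of your choice. It assumes that the manualLB, f5BigIP, and seesaw sections are nested under loadBalancer. The function name is hypothetical, and the check is illustrative rather than a replacement for gkectl check-config.

```python
# Section that goes with each loadBalancer.kind value (assumed to sit under loadBalancer).
LB_SECTIONS = {"ManualLB": "manualLB", "F5BigIP": "f5BigIP", "Seesaw": "seesaw"}


def compare_with_admin(admin_cfg: dict, user_cfg: dict) -> list[str]:
    """Check the user cluster settings that must agree with the admin cluster file."""
    problems = []

    # network.ipMode.type must match the admin cluster ("dhcp" or "static").
    admin_mode = admin_cfg["network"]["ipMode"]["type"]
    user_mode = user_cfg["network"]["ipMode"]
    if user_mode.get("type") != admin_mode:
        problems.append(
            f"network.ipMode.type is {user_mode.get('type')!r}, admin cluster uses {admin_mode!r}")
    if user_mode.get("type") == "static" and not user_mode.get("ipBlockFilePath"):
        problems.append("network.ipMode.ipBlockFilePath is required for static IPs")

    # loadBalancer.kind must match the admin cluster, and its section must be present.
    lb = user_cfg["loadBalancer"]
    kind = lb.get("kind")
    admin_kind = admin_cfg["loadBalancer"]["kind"]
    if kind not in LB_SECTIONS:
        problems.append(f"unknown loadBalancer.kind {kind!r}")
    elif kind != admin_kind:
        problems.append(f"loadBalancer.kind is {kind!r}, admin cluster uses {admin_kind!r}")
    elif LB_SECTIONS[kind] not in lb:
        problems.append(f"loadBalancer.{LB_SECTIONS[kind]} section is missing")

    # Two separate VIPs: one for the Kubernetes API server, one for ingress.
    vips = lb.get("vips", {})
    cp, ingress = vips.get("controlPlaneVIP"), vips.get("ingressVIP")
    if not cp or not ingress:
        problems.append("set both loadBalancer.vips.controlPlaneVIP and loadBalancer.vips.ingressVIP")
    elif cp == ingress:
        problems.append("controlPlaneVIP and ingressVIP must be different addresses")
    return problems
```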
proxy
If the network that will have your user cluster nodes is behind a proxy server, fill in the proxy section.
masterNode
Fill in the masterNode section.
nodePools
Fill in the nodePools section.
antiAffinityGroups
Set antiAffinityGroups.enabled to true or false.
authentication
If you want to use OpenID Connect (OIDC) to authenticate users, fill in the authentication.oidc section.
If you want to provide an additional serving certificate for the Kubernetes API server of your user cluster, fill in the authentication.sni section.
stackdriver
Fill in the stackdriver section.
gkeConnect
Fill in the gkeConnect section.
cloudRun
Set cloudRun.enabled to true or false.
usageMetering
If you want to enable usage metering for your cluster, fill in the usageMetering section.
cloudAuditLogging
If you want to integrate the audit logs from your cluster's Kubernetes API server with Cloud Audit Logs, fill in the cloudAuditLogging section.
Validating your configuration file
After you've filled in your user cluster configuration file, run gkectl check-config to verify that the file is valid:
gkectl check-config --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] --config [CONFIG_PATH]
where:
[ADMIN_CLUSTER_KUBECONFIG] is the path of the kubeconfig file for your admin cluster.
[CONFIG_PATH] is the path of your user cluster configuration file.
If the command returns any failure messages, fix the issues and validate the file again.
If you want to skip the more time-consuming validations, pass the --fast flag. To skip individual validations, use the --skip-validation-xxx flags. To learn more about the check-config command, see Running preflight checks.
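If you script cluster creation, you might wrap the validation step so that the script stops when validation fails. The following is a minimal sketch that uses Python's subprocess module. It uses only the --kubeconfig, --config, and --fast flags described above. It assumes that gkectl exits with a non-zero status when validation fails, so confirm that behavior for your version. The function names are hypothetical.

```python
import subprocess


def check_config_cmd(admin_kubeconfig: str, config_path: str, fast: bool = False) -> list[str]:
    """Build the gkectl check-config command line as an argument list."""
    cmd = ["gkectl", "check-config", "--kubeconfig", admin_kubeconfig, "--config", config_path]
    if fast:
        # Skip the more time-consuming validations.
        cmd.append("--fast")
    return cmd


def run_check_config(admin_kubeconfig: str, config_path: str, fast: bool = False) -> bool:
    """Run the validation; True means gkectl exited with status 0 (assumed to mean success)."""
    result = subprocess.run(check_config_cmd(admin_kubeconfig, config_path, fast))
    return result.returncode == 0


# Example (not run on import): run_check_config("kubeconfig", "user-cluster.yaml")
```

Passing an argument list instead of a single shell string means paths containing spaces need no extra quoting.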
Creating a Seesaw load balancer for your user cluster
If you have chosen to use the bundled Seesaw load balancer, do the step in this section. Otherwise, skip this section.
Create and configure the VMs for your Seesaw load balancer:
gkectl create loadbalancer --kubeconfig kubeconfig --config user-cluster.yaml
Creating the user cluster
Create the user cluster:
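gkectl create cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] --config [CONFIG_PATH] --skip-validation-all
where [ADMIN_CLUSTER_KUBECONFIG] is the path of the kubeconfig file for your admin cluster, and [CONFIG_PATH] is the path of your user cluster configuration file. The --skip-validation-all flag skips the preflight checks, which you already ran with gkectl check-config.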
The gkectl create cluster command creates a kubeconfig file named [USER_CLUSTER_NAME]-kubeconfig in the current directory. You will need this kubeconfig file later to interact with your user cluster.
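The Seesaw step and the create step can be chained in the same way. The following sketch builds the two gkectl commands shown above, running create loadbalancer only when loadBalancer.kind is "Seesaw". It then returns the path of the kubeconfig that gkectl create cluster writes to the current directory, [USER_CLUSTER_NAME]-kubeconfig. It assumes that the admin cluster kubeconfig is passed with --kubeconfig, as in the other gkectl commands on this page. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def create_commands(admin_kubeconfig: str, config_path: str, lb_kind: str) -> list[list[str]]:
    """Return the gkectl commands to run, in order, for one user cluster."""
    commands = []
    if lb_kind == "Seesaw":
        # The bundled Seesaw load balancer VMs are created before the cluster.
        commands.append(["gkectl", "create", "loadbalancer",
                         "--kubeconfig", admin_kubeconfig, "--config", config_path])
    # Validation was already done with check-config, so it is skipped here.
    commands.append(["gkectl", "create", "cluster",
                     "--kubeconfig", admin_kubeconfig, "--config", config_path,
                     "--skip-validation-all"])
    return commands


def user_kubeconfig_path(cluster_name: str, workdir: str = ".") -> Path:
    """Path of the [USER_CLUSTER_NAME]-kubeconfig file written to the working directory."""
    return Path(workdir) / f"{cluster_name}-kubeconfig"


def create_user_cluster(admin_kubeconfig: str, config_path: str,
                        lb_kind: str, cluster_name: str) -> Path:
    for cmd in create_commands(admin_kubeconfig, config_path, lb_kind):
        # check=True raises CalledProcessError if a step exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return user_kubeconfig_path(cluster_name)
```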
Verifying that your user cluster is running
Verify that your user cluster is running:
kubectl get nodes --kubeconfig [USER_CLUSTER_KUBECONFIG]
where [USER_CLUSTER_KUBECONFIG] is the path of your user cluster kubeconfig file.
The output shows the user cluster nodes.
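To check more than whether the nodes are listed, you can look at the STATUS column of kubectl get nodes. The following sketch parses the default table output, locating the STATUS column by its header, and returns the nodes whose status is anything other than plain Ready. This includes NotReady, and also Ready,SchedulingDisabled for a cordoned node. The function names are hypothetical. For scripts, kubectl get nodes -o json is a sturdier source than the table.

```python
import subprocess


def not_ready_nodes(get_nodes_output: str) -> list[str]:
    """Return names of nodes whose STATUS column is not exactly "Ready"."""
    lines = [line for line in get_nodes_output.splitlines() if line.strip()]
    if not lines:
        return []
    status_col = lines[0].split().index("STATUS")
    bad = []
    for line in lines[1:]:
        fields = line.split()
        # "NotReady" and "Ready,SchedulingDisabled" are both reported here.
        if fields[status_col] != "Ready":
            bad.append(fields[0])
    return bad


def check_user_cluster(user_kubeconfig: str) -> list[str]:
    """Run kubectl get nodes against the user cluster and return nodes that are not Ready."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "--kubeconfig", user_kubeconfig],
        capture_output=True, text=True, check=True,
    ).stdout
    return not_ready_nodes(out)
```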
Troubleshooting
Diagnosing cluster issues using gkectl
Use gkectl diagnose commands to identify cluster issues and share cluster information with Google. See Diagnosing cluster issues.
Default logging behavior
For gkectl and gkeadm, it is sufficient to use the default logging settings:
- By default, log entries are saved as follows:
  - For gkectl, the default log file is /home/ubuntu/.config/gke-on-prem/logs/gkectl-$(date).log, and the file is symlinked with the logs/gkectl-$(date).log file in the local directory where you run gkectl.
  - For gkeadm, the default log file is logs/gkeadm-$(date).log in the local directory where you run gkeadm.
- All log entries are saved in the log file, even if they are not printed in the terminal (when --alsologtostderr is false).
- The -v5 verbosity level (default) covers all the log entries needed by the support team.
- The log file also contains the command executed and the failure message.
We recommend that you send the log file to the support team when you need help.
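When you collect a log file for the support team, you usually want the most recent one. The exact timestamp format that replaces $(date) in the file name is not specified here, so the following sketch does not try to parse it. Instead, it globs for gkectl-*.log (or gkeadm-*.log) in a logs directory and picks the file with the newest modification time. The function name latest_log is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_log(logs_dir: str, tool: str = "gkectl") -> Optional[Path]:
    """Return the most recently modified <tool>-*.log file in logs_dir, or None."""
    candidates = list(Path(logs_dir).glob(f"{tool}-*.log"))
    if not candidates:
        return None
    # stat() follows symlinks, so the local logs/ links resolve to the real files' times.
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Example: latest_log("logs") in the directory where you ran gkectl, or
# latest_log("/home/ubuntu/.config/gke-on-prem/logs") for the default gkectl location.
```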
Specifying a non-default location for the log file
To specify a non-default location for the gkectl log file, use the --log_file flag. The log file that you specify will not be symlinked with the local directory.
To specify a non-default location for the gkeadm log file, use the --log_file flag.
Locating Cluster API logs in the admin cluster
If a VM fails to start after the admin control plane has started, you can try debugging this by inspecting the Cluster API controllers' logs in the admin cluster:
Find the name of the Cluster API controllers Pod in the kube-system namespace, where [ADMIN_CLUSTER_KUBECONFIG] is the path to the admin cluster's kubeconfig file:
kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system get pods | grep clusterapi-controllers
Open the logs of the Pod's vsphere-controller-manager container, where [POD_NAME] is the name of the Pod. Optionally, use grep or a similar tool to search for errors:
kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system logs [POD_NAME] vsphere-controller-manager
Debugging F5 BIG-IP issues using the admin cluster control plane node's kubeconfig
After an installation, Anthos clusters on VMware generates a kubeconfig file named internal-cluster-kubeconfig-debug in the home directory of your admin workstation. This kubeconfig file is identical to your admin cluster's kubeconfig, except that it points directly at the admin cluster's control plane node, where the admin control plane runs. You can use the internal-cluster-kubeconfig-debug file to debug F5 BIG-IP issues.
gkectl check-config validation fails: can't find F5 BIG-IP partitions
- Symptoms
Validation fails because F5 BIG-IP partitions can't be found, even though they exist.
- Potential causes
An issue with the F5 BIG-IP API can cause validation to fail.
- Resolution
Try running gkectl check-config again.
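Because the resolution for this F5 BIG-IP API issue is to run the validation again, a script can retry a few times before giving up. The following is a minimal sketch. The attempt count and delay are arbitrary illustrative values. The run parameter is any function that runs gkectl check-config and returns True on success. The sketch assumes that a failure that persists across retries should be investigated rather than retried indefinitely. The function name retry is hypothetical.

```python
import time
from typing import Callable


def retry(run: Callable[[], bool], attempts: int = 3, delay_s: float = 30.0) -> bool:
    """Call run() up to `attempts` times, sleeping delay_s seconds between failures."""
    for attempt in range(1, attempts + 1):
        if run():
            return True
        if attempt < attempts:
            time.sleep(delay_s)
    return False


# Example (not run on import), assuming gkectl exits non-zero on validation failure:
#   import subprocess
#   retry(lambda: subprocess.run(["gkectl", "check-config", "--kubeconfig", "kubeconfig",
#                                 "--config", "user-cluster.yaml"]).returncode == 0)
```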
gkectl prepare --validate-attestations fails: could not validate build attestation
- Symptoms
Running gkectl prepare with the optional --validate-attestations flag returns the following error: could not validate build attestation for gcr.io/gke-on-prem-release/.../...: VIOLATES_POLICY
- Potential causes
An attestation might not exist for the affected image(s).
- Resolution
Try downloading and deploying the admin workstation OVA again, as instructed in Creating an admin workstation. If the issue persists, reach out to Google for assistance.
Debugging using the bootstrap cluster's logs
During installation, Anthos clusters on VMware creates a temporary bootstrap cluster. After a successful installation, Anthos clusters on VMware deletes the bootstrap cluster, leaving you with your admin cluster and user cluster. Generally, you should have no reason to interact with this cluster.
If something goes wrong during an installation, and you did pass --cleanup-external-cluster=false to gkectl create cluster, you might find it useful to debug using the bootstrap cluster's logs. You can find the Pod, and then get its logs:
kubectl --kubeconfig /home/ubuntu/.kube/kind-config-gkectl get pods -n kube-system
kubectl --kubeconfig /home/ubuntu/.kube/kind-config-gkectl -n kube-system logs [POD_NAME]
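In the bootstrap cluster, the Pods most worth reading are usually the ones that are not running. The following sketch parses the output of the kubectl get pods command above and lists the Pods whose STATUS is not Running or Completed, which tells you which [POD_NAME] values to pass to kubectl logs. The kubeconfig path is the one shown above. Treating Completed as healthy is an assumption, and the function names are hypothetical.

```python
import subprocess

BOOTSTRAP_KUBECONFIG = "/home/ubuntu/.kube/kind-config-gkectl"


def unhealthy_pods(get_pods_output: str) -> list[str]:
    """Return names of Pods whose STATUS is neither Running nor Completed."""
    lines = [line for line in get_pods_output.splitlines() if line.strip()]
    if not lines:
        return []
    status_col = lines[0].split().index("STATUS")
    names = []
    for line in lines[1:]:
        fields = line.split()
        if fields[status_col] not in ("Running", "Completed"):
            names.append(fields[0])
    return names


def bootstrap_pod_logs(pod: str) -> str:
    """Fetch one Pod's logs from the kube-system namespace of the bootstrap cluster."""
    return subprocess.run(
        ["kubectl", "--kubeconfig", BOOTSTRAP_KUBECONFIG, "-n", "kube-system", "logs", pod],
        capture_output=True, text=True, check=True,
    ).stdout
```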
For more information, refer to Troubleshooting.