Kubernetes GitOps best practices with Config Sync

This page provides a starting point to help you plan and architect CI/CD GitOps pipelines for Kubernetes, which can help you make the most of Config Sync.

This page is for admins, architects, and operators who want to implement GitOps in their environment. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE Enterprise user roles and tasks.

GitOps itself is a universal best practice for organizations managing Kubernetes configuration at scale. But when it comes to architecting that solution, you have many choices. Understanding your options and the benefits and trade-offs of those decisions can help you avoid rewriting your architecture in the future.

You don't need to use every best practice listed on this page. Which best practices you choose to adopt will depend on your unique situation. The goal of this page is to help you make informed decisions when setting up your GitOps architecture.

Use a centralized, private package repository

Using a central repository for public or internal packages (such as Helm or kpt) can help teams find packages more easily. You can use services like Artifact Registry or Git repositories.

The platform team can implement policies where application teams can use packages only from the central repository. Alternatively, they could use the central repository as a set of vetted packages.

You can limit write permissions to the repository to only a small number of engineers. The rest of the organization can have read access. We recommend implementing a process for promoting packages into the central repository and broadcasting updates.

The benefits and downsides of using a centralized, private package repository are as follows:

Benefits

  • Ingest public packages at a defined cadence, which helps avoid breakage from connectivity problems or upstream churn.
  • Review and scan shared packages.
  • Provides an easy way to discover what is in use and supported. For example, teams can more easily find the standard Redis deployment stored in the central repository.
  • Make changes to upstream packages to ensure that they meet internal standards such as default values, labels, and container image repositories.

Downsides

  • Someone must maintain the central repository.
  • Adds more process for application teams.

Create wet repositories

Create repositories with the YAML output that matches the desired state of your cluster or namespace. Changes to the wet (fully hydrated) repository should be easy to review by using a diff. A good practice is to change the wet repository only through a review process (for example, a pull request in GitHub).

The benefits and downsides of creating wet repositories are as follows:

Benefits

  • Easier to examine the diff.
  • No processing needed to see what the intended state of the configuration is.

Downsides

  • Fully hydrating configuration can lead to repeated YAML.

Shift left for validating configs

Waiting until Config Sync starts syncing to check for issues can create unnecessary Git commits and a long feedback loop. Many issues can be found before a config is applied to a cluster by using kpt validator functions.

The benefits and downsides of checking for issues before applying a config are as follows:

Benefits

  • Surfacing config changes in a change request can help prevent errors from making it into a repository.
  • Reduces the impact of issues in shared configurations.

Downsides

  • Must add tooling and logic to your commit process to help catch issues.
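As a rough illustration of what a pre-commit or CI check might look like, the following Python sketch validates objects before they reach the repository. It is a stand-in for the idea, not a kpt validator function: it assumes the manifests have already been parsed into dictionaries (for example, by a YAML library), and the `team` label requirement is a hypothetical organization policy.

```python
from typing import Any

# Hypothetical organization policy: every object must carry these labels.
REQUIRED_LABELS = ("team",)


def validate_manifest(obj: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the object passed."""
    problems = []

    # Every Kubernetes object needs an apiVersion and a kind.
    for field in ("apiVersion", "kind"):
        if not obj.get(field):
            problems.append(f"missing required field: {field}")

    # Treat a missing or empty metadata block the same way.
    metadata = obj.get("metadata") or {}
    if not metadata.get("name"):
        problems.append("missing required field: metadata.name")

    # Enforce the (assumed) labeling policy.
    labels = metadata.get("labels") or {}
    for label in REQUIRED_LABELS:
        if label not in labels:
            problems.append(f"missing required label: {label}")

    return problems
```

A CI step could call `validate_manifest` on every object in a change request and fail the check if any list is non-empty, so the author gets feedback before Config Sync ever sees the commit.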

Use folders instead of branches

Use folders for variants of the configuration instead of branches. With folders, you can use the tree command to see variants. By contrast, with branches, you can't tell if the delta between a prod and stage branch is an upcoming change in configuration or a permanent difference between what stage and prod should look like.

The benefits and downsides of using folders instead of branches are as follows:

Benefits

  • Discovery of folders is easier than branches.
  • Doing a diff on folders is possible with many CLI and GUI tools, while branch diff is less common outside of Git providers.
  • Differentiating between permanent differences and unpromoted differences is easier with folders.
  • You can roll out changes to multiple clusters and namespaces in one change request whereas branches require several change requests to different branches.

Downsides

  • Promoting config changes using a change request to the same files is not possible.
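To make the point about diffing folders concrete, here is a minimal Python sketch that compares two variant folders using only the standard library. The folder names in the usage note (`stage` and `prod`) are assumptions for illustration; any diff tool can do the same job.

```python
import filecmp
from pathlib import Path


def list_files(root: Path) -> set[str]:
    """Collect every file under root as a path relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_variants(left: Path, right: Path) -> dict[str, list[str]]:
    """Report files unique to each variant and files whose contents differ."""
    left_files, right_files = list_files(left), list_files(right)
    common = sorted(left_files & right_files)
    # shallow=False forces a byte-by-byte comparison instead of trusting file metadata.
    changed = [f for f in common if not filecmp.cmp(left / f, right / f, shallow=False)]
    return {
        "only_left": sorted(left_files - right_files),
        "only_right": sorted(right_files - left_files),
        "changed": changed,
    }
```

Calling `compare_variants(Path("stage"), Path("prod"))` from the repository root lists every difference between the two variants in a single checkout, with no branch switching.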

Minimize use of ClusterSelectors

ClusterSelectors let you apply certain parts of a configuration to a subset of clusters. Instead of configuring a separate RootSync or RepoSync, you can modify the resource that is being applied or add labels to the clusters. Over time, however, as the number of ClusterSelectors grows, it can become complicated to understand the final state of the cluster.

Config Sync lets you sync multiple RootSyncs and RepoSyncs at once, meaning you can add the relevant configuration to a separate repository and then sync it to the clusters you want.
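The following Python sketch shows one way to express "one RootSync per source" as data. It builds RootSync objects as dictionaries that can be serialized to JSON and applied to a cluster. The field names reflect my understanding of the `configsync.gke.io/v1beta1` RootSync API and should be checked against the reference for your Config Sync version; the repository URLs, sync names, directories, and revisions in the usage note are hypothetical.

```python
import json


def root_sync(name: str, repo: str, directory: str, revision: str) -> dict:
    """Build a RootSync object that syncs one directory of one Git repository."""
    return {
        "apiVersion": "configsync.gke.io/v1beta1",
        "kind": "RootSync",
        # RootSync objects live in the config-management-system namespace.
        "metadata": {"name": name, "namespace": "config-management-system"},
        "spec": {
            "sourceFormat": "unstructured",
            "sourceType": "git",
            "git": {
                "repo": repo,
                "revision": revision,
                "dir": directory,
                # Assumption: a repository readable without credentials.
                "auth": "none",
            },
        },
    }


def render(syncs: list[dict]) -> str:
    """Serialize the objects as a JSON list for review or further tooling."""
    return json.dumps(syncs, indent=2, sort_keys=True)
```

For example, a cluster that needs the platform baseline plus an optional logging add-on could receive `root_sync("platform", "https://example.com/platform.git", "clusters/base", "v1.2.0")` and `root_sync("logging", "https://example.com/addons.git", "logging", "v0.9.1")`. Clusters that don't need the add-on simply don't get the second object, so no selector logic is evaluated on the cluster.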

The benefits and downsides of not using ClusterSelectors are as follows:

Benefits

  • Easier to assemble the configuration that will be on the cluster into a folder instead of making that decision on the cluster.
  • Reduces the mental load of understanding what will actually be applied to the cluster.

Downsides

  • Labels are a lightweight way to add a trait to a cluster; creating a new RepoSync is more complex.

Avoid managing Jobs with Config Sync

While Config Sync can apply Jobs for you, Jobs are not well suited for GitOps deployment for the following reasons:

  • Immutable fields: Many Job fields are immutable. To change an immutable field, the object must be deleted and recreated. However, Config Sync doesn't delete your object unless you remove it from the source.

  • Unintended running of Jobs: If you sync a Job with Config Sync and then that Job is deleted from the cluster, Config Sync considers that drift from your chosen state and re-creates the Job. If you specify a Job time to live (TTL), the Job is automatically deleted and Config Sync automatically re-creates it, restarting the Job, until you delete the Job from the source of truth. Often, this is not what was intended, because Config Sync runs the Job again.

  • Reconciliation issues: Config Sync normally waits for objects to reconcile after being applied. However, Jobs are considered reconciled when they have started running. This means that Config Sync doesn't wait for the Job to complete before continuing to apply other objects. However, if the Job later fails, that is considered a failure to reconcile. In some cases, this can block other resources from being synced and cause errors until you fix it. In other cases, the syncing might succeed and only reconciling fails.

For these reasons, we don't recommend syncing Jobs with Config Sync.

In most cases, Jobs and other situational tasks should be managed by a service that handles their lifecycle management. You can then manage that service with Config Sync, instead of the Jobs themselves.
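If you want to enforce this recommendation before merge, a small check can flag Jobs in a change request. The Python sketch below is one possible approach under stated assumptions: manifests are already parsed into dictionaries, and only objects of kind Job in a `batch/` API group are flagged.

```python
def find_jobs(manifests: list[dict]) -> list[str]:
    """Return the names of Job objects found in the parsed manifests."""
    flagged = []
    for obj in manifests:
        api_version = str(obj.get("apiVersion", ""))
        if obj.get("kind") == "Job" and api_version.startswith("batch/"):
            # Fall back to a placeholder so unnamed objects are still reported.
            name = (obj.get("metadata") or {}).get("name", "<unnamed>")
            flagged.append(name)
    return flagged
```

A pre-merge step could fail when `find_jobs` returns a non-empty list and point the author to the service that manages Job lifecycles instead.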

The benefits and downsides of not using Config Sync to manage Jobs are as follows:

Benefits

  • Increases GitOps compatibility. Jobs don't work well with the declarative, version-controlled approach of GitOps due to their immutable fields.
  • Reduces unintended consequences. Eliminates the risk of Config Sync automatically recreating deleted Jobs, potentially causing them to run unexpectedly.
  • Fewer sync errors. Potential sync conflicts and errors triggered by failed Jobs are avoided.

Downsides

  • Manual Job management. You need to find another service to manage Jobs.

Use unstructured repositories

Config Sync supports two structures for organizing a repository: unstructured and hierarchical. Unstructured is the recommended approach because it lets you organize a repository in the way that's most convenient for you. Hierarchical repositories, by comparison, enforce a specific structure. For example, CRDs have to be in a specific directory. This can cause issues when you need to share configs. For example, if one team publishes a package that contains a CRD, another team that needs to use that package would have to move the CRD into a cluster directory, adding more overhead to the process.

The benefit and downside of using unstructured repositories are as follows:

Benefits

  • You can reuse shared configuration packages even if they contain CRDs or other cluster-wide definitions.

Downsides

  • Without a process or guidelines, repository structures may vary across teams, which can make it harder to implement fleet-wide tools.

To learn how to convert a hierarchical repository, see Convert a hierarchical repository to an unstructured repository.

Separate code and config repositories

As a mono-repository scales up, it requires a build specific to each folder. Permissions and concerns for people working on the code and working on the cluster configuration are generally different. By keeping code and config repositories separate, each repository can have its own permissions and structure.

The benefits and downsides of separating code and config repositories are as follows:

Benefits

  • Avoids "looping" commits. For example, committing to a code repo might trigger a CI request, which might produce an image, which then requires a code commit, and so on.
  • You can use different permissions for people working on application code and cluster configuration.

Downsides

  • Reduces discovery for app configuration since it's not in the same repository as application code.
  • Managing many repositories can be time-consuming.

Use separate repositories to insulate changes

As a mono-repository scales up, different folders require different permissions. Separating repositories lets you set security boundaries between security, platform, and application configuration. It's also a good idea to separate production and non-production repositories.

The benefits and downsides of insulating changes in separate repositories are as follows:

Benefits

  • Accommodates the different change cadences and permissions of platform, security, and application teams.
  • Permissions remain at the repository level. CODEOWNERS files let organizations limit write permission while still allowing read permission.
  • Config Sync supports multiple syncs per namespace, which can achieve a "mix-in effect" from multiple repositories.

Downsides

  • Managing many repositories is a task in itself. For example, if you create a new repository per cluster, cluster setup and teardown must also include repository management.

Pin package versions

Whether using Helm or Git, you should pin the configuration package version to something that doesn't accidentally get moved forward without an explicit rollout.

The benefit and downside of pinning package versions are as follows:

Benefits

  • Prevents a shared configuration update from having a larger-than-intended impact, which can happen if the package version is not pinned.

Downsides

  • Rollouts require checks when shared packages are updated.
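One way to catch unpinned references during review is a simple heuristic over the revision string. The Python sketch below treats a full 40-character commit SHA or a semantic-version tag as pinned and anything else (such as a branch name) as floating. The rules are assumptions for illustration: Git tags can technically be moved, so a commit SHA is the stricter choice.

```python
import re

# A full Git commit SHA: exactly 40 lowercase hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")
# A semantic-version style tag or chart version, with an optional leading "v".
_SEMVER = re.compile(r"v?\d+\.\d+\.\d+")


def is_pinned(revision: str) -> bool:
    """Return True if the revision looks like a fixed version rather than a moving reference."""
    return bool(_FULL_SHA.fullmatch(revision) or _SEMVER.fullmatch(revision))
```

With these rules, `is_pinned("v1.4.2")` is true while `is_pinned("main")` and `is_pinned("latest")` are false, so a review check could reject the latter two.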

Use Workload Identity Federation for GKE

You can enable Workload Identity Federation for GKE on GKE clusters, which allows Kubernetes workloads to access Google services in a secure and manageable way.

The benefit and downside of using Workload Identity Federation for GKE are as follows:

Benefits

  • Reduces complexity and potential issues with secrets and passwords.

Downsides

  • Services outside of Google Cloud (such as GitHub and GitLab) do not support Workload Identity Federation for GKE.

High-level architecture

At a high level, you likely want at least four types of repositories:

  1. A package repository where shared configuration is stored. This could also be a Helm chart stored in Artifact Registry.
  2. A platform repository where the platform team stores fleet-wide configuration for clusters and namespaces.
  3. An application configuration repository.
  4. An application code repository.

The following diagram shows the layout of these repositories:

Figure: Suggested architecture for a package and platform repository that flows into the application configuration and application code repositories.

The following diagram shows the flow of configuration from application code into an application configuration repository. Development teams push code for applications and application configurations into a repository. The code for both apps and configs is stored in the same place and application teams have control over these repositories. App teams can then push code into a build.

Figure: Suggested application build that shows application code and application configurations that are pushed into a build.
