Best practices for policy management with Anthos Config Management and GitLab

This document highlights how to use Anthos Config Management and GitLab to manage multiple Kubernetes clusters in a production environment. Securing the Anthos Config Management repository is an important deployment step, and this document can help you through that process. This document is useful to you if you're in the process of deploying Anthos Config Management for production use and assumes that you are familiar with Kubernetes and Git.

If you're operating a platform that hosts apps for your organization, it's important that you have policies in place to help you protect your platform. Policies are rules aimed at protecting the platform, and the apps and data on the platform. The platform enforces these policies based on configurations describing the policies. Policy enforcement helps to improve the security and stability of the platform.

Anthos Config Management helps you manage policies at scale for platforms built on top of Google Kubernetes Engine (GKE), Anthos GKE, or other Kubernetes distributions. Anthos Config Management lets you create and manage Kubernetes objects across multiple clusters at once. You can manage any Kubernetes object with Anthos Config Management, but it's especially useful to enforce policies such as the following:

  • PodSecurityPolicies: Prevent Pods from using the root Linux user.
  • NetworkPolicies: Control the network traffic inside your clusters.
  • ClusterRoles and ClusterRoleBindings: Control permissions within a cluster.
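
As an illustration of what such a policy object can look like, the following minimal Python sketch builds a NetworkPolicy manifest that denies all ingress traffic in a namespace and prints it as JSON, which Kubernetes accepts alongside YAML. The policy name default-deny-ingress and the namespace team-a are assumptions chosen for the example, not values required by Anthos Config Management.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks all ingress traffic to Pods in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # The policy name is a hypothetical choice for this example.
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every Pod in the namespace.
            "podSelector": {},
            # Declaring the Ingress policy type without any ingress rules
            # denies all inbound traffic to the selected Pods.
            "policyTypes": ["Ingress"],
        },
    }


if __name__ == "__main__":
    print(json.dumps(default_deny_ingress("team-a"), indent=2))
```

Committing the resulting manifest to the repository, rather than applying it by hand, is what lets the same policy reach every cluster that the repository manages.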

As shown in the following diagram, Anthos Config Management is a GitOps-style tool; it uses a Git repository as its storage mechanism and source of truth.

Architecture of Kubernetes configuration in Git.

With Anthos Config Management, you deploy and manage Kubernetes objects by interacting with this Git repository. Getting write access to the master branch of the Anthos Config Management Git repository means being an administrator of the clusters that Anthos Config Management manages. The Anthos Config Management repository contains information that is useful to a potential attacker, so it's important to secure the Anthos Config Management repository.

This document uses GitLab Source Code Management (SCM) as a Git repository hosting service, and outlines the best practices for using Anthos Config Management and GitLab together to manage multiple Kubernetes clusters.

Anthos Config Management and GitLab architecture

The following diagram shows the deployment architecture of Anthos Config Management with GitLab to manage three Kubernetes clusters: one in GKE, one in GKE on-prem, and one in another cloud provider.

Deployment architecture of Anthos Config Management with GitLab.

The preceding diagram illustrates the following steps in the pipeline:

  1. To make a configuration change to one or all of those clusters, a user submits a modification, called a merge request (MR), that must be validated in GitLab.
  2. The MR triggers an automated pipeline by GitLab CI/CD, the continuous integration and delivery system built into GitLab, to test and validate the configuration.
  3. The MR is either approved or rejected by an administrator. After approval, the change is merged into the Git repository.
  4. After the approval, the Anthos Config Management agents that are running in each cluster read this modification from GitLab and apply it to their cluster.
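
The validation in step 2 can be as simple or as thorough as you need. The following is a minimal sketch of one possible check that a pipeline job might run, assuming the repository uses the top-level directories of the example repository structure described later in this document (system, clusterregistry, cluster, and namespaces). The function names are hypothetical, and a real pipeline would likely add schema and policy checks on top of this.

```python
from pathlib import Path

# Top-level directories from the example repository structure in this document.
EXPECTED_DIRS = ("system", "clusterregistry", "cluster", "namespaces")


def missing_directories(repo_root: Path) -> list[str]:
    """Return the expected top-level directories that are absent from repo_root."""
    return [name for name in EXPECTED_DIRS if not (repo_root / name).is_dir()]


def main(repo_root: str = ".") -> int:
    """Report missing directories and return a process exit code (0 means success)."""
    missing = missing_directories(Path(repo_root))
    for name in missing:
        print(f"missing directory: {name}")
    return 1 if missing else 0


if __name__ == "__main__":
    raise SystemExit(main())
```

A nonzero exit code fails the pipeline job, so a malformed change is visible on the merge request before anyone reviews it.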

GitLab hosting best practices in the context of Anthos Config Management

In this section, we recommend best practices for you to follow when using GitLab SCM to host and manage Anthos Config Management repositories. The general best practices for hosting GitLab, such as setting up a highly available architecture and performing regular backups, also apply, but they're out of scope for this document.

Restrict administrative access

Anyone with root access to the virtual machine (or Kubernetes cluster) hosting GitLab can bypass GitLab security features and should therefore be considered an administrator for Anthos Config Management. The same is true of anyone who is an administrator in GitLab, and of anyone with write access to the GitLab database. In other words, if you give people any of these privileges, they are also able to make changes in Anthos Config Management. For more information, see the Permissions documentation of GitLab.

If you're using Compute Engine, use Identity and Access Management (IAM) and the OS Login service to control who has access to the instance. In particular, the Compute OS Admin Login role grants root access to the instance.

If you're using GKE, use IAM and RBAC to control who can access the cluster.

Be diligent with updates

GitLab has one release per month, and multiple patch releases between releases. Like any software, GitLab provides security patches in some of these releases. For this reason, you should keep your GitLab instance as up to date as possible. For more information, see Updating GitLab.

Follow Google Cloud-specific best practices for hosting GitLab on Google Cloud

If you host GitLab on Google Cloud, you should dedicate a Cloud project to GitLab. You should place this Cloud project directly under your Organization node, not in a folder. These recommendations can reduce the likelihood of the following IAM misconfigurations:

  • If GitLab shares a Cloud project with other apps, then you're at risk of inadvertently granting administrative access to GitLab when granting access to the other apps.
  • If you place that Cloud project in a folder, then you're at risk of inadvertently granting access to that Cloud project by granting access to the folder.

If you're deploying GitLab on Google Cloud, we also recommend that you take advantage of the following managed services:

  • Use Cloud SQL for PostgreSQL to host GitLab's database. Cloud SQL takes care of high availability and backups for you.
  • Use Memorystore for Redis as the GitLab Redis server. Memorystore takes care of high availability for you.
  • Use Cloud Storage to store backups, build artifacts, and user uploads.

For more information, see Deploying production-ready GitLab on Google Kubernetes Engine.

Authentication and access control for GitLab

For auditing and debugging purposes, it's important that you can identify who made a policy change, and that the identities involved are the same ones that your IT staff uses to manage other resources.

Use unified identities

By design, you interact with Anthos Config Management through a Git repository where you create, update, and delete resources. The Git repository is where you control which users can do what, and where you can audit all activities. Implementing access control and auditing is easier if the identities used by your employees are the same everywhere: on Google Cloud, in GitLab, and in your on-premises systems. Unified identities let you cross-reference permissions and audit logs across different systems.

In GitLab, you can audit events across groups, projects, and instances, and query the system logs for GitLab service-level events. Audit Events features are available in the GitLab enterprise licensed tiers, and the available features vary by tier.

If you're not using Google as an identity provider

You can federate Google Cloud with many of the popular IdPs such as Active Directory, Azure Active Directory, or Okta. You can configure GitLab to use Active Directory or Okta.

If you're using Google as an identity provider

If you're using Google as an identity provider, you can either use the Google OAuth 2.0 OmniAuth Provider or Secure LDAP. However, if you want to synchronize groups with GitLab, as recommended by the next section, then you need to use Secure LDAP. Secure LDAP is available with Cloud Identity Premium and G Suite Enterprise. For more information, see About the Secure LDAP service.

Synchronize groups with GitLab

Regardless of the technology used to identify your users, it's useful to group users to configure their access. Groups let you think of these users at a higher level when granting permissions: "I give administrative access to the production environments to my operations group" instead of "I give administrative access to the production environments to Alice, Bob, Claudia, and Dinesh." If your Human Resources processes are well integrated with your IT systems, group membership is also automatically managed when an employee is hired, leaves the company, or changes roles. If you're using groups, this integration means that you don't usually have to update your access control settings.

Because Anthos Config Management is a sensitive system, relying on user groups to control access to Anthos Config Management is important. In GitLab, you can share a project with a group of users and specify what access level members of this group get.

Enforce two-factor authentication

You should use two-factor authentication (2FA), also known as 2-step verification, to improve your organization's security. Stolen credentials and phishing attacks are two common attack vectors, and 2FA helps to protect against both. Because of the sensitive nature of the Anthos Config Management Git repository, the accounts of the users interacting with Anthos Config Management should be protected as much as possible, which means enabling 2FA for those users.

If you configure single sign-on (SSO) for GitLab, you should enforce 2FA with your SSO provider. If you use Google as an SSO provider, you can enable 2FA.

If you haven't configured SSO for GitLab, then you can enforce 2FA on GitLab directly.

Use deploy keys or tokens to connect Anthos Config Management agents to GitLab

As described in the Installing Anthos Config Management document, you can connect Anthos Config Management agents to the repository using the usual methods for Git: HTTP(s) or SSH. If you use HTTP(s) to access repositories hosted on GitLab, don't use a user account. Doing so can cause licensing problems and can lead to users configuring Anthos Config Management with their own accounts. Deploy keys and deploy tokens from GitLab are a better alternative. A deploy key is an SSH key that isn't linked to a particular user, but that automated systems can use to perform Git operations, which is the type of secure access that Anthos Config Management needs. A deploy token serves the same purpose as a deploy key, but you use it to authenticate over HTTP(s) instead of SSH.

Unique deploy keys and tokens let you control and revoke the access of the Anthos Config Management agents to the Git repository on a per-cluster basis. Avoid using global deploy keys and group deploy tokens for Anthos Config Management. Because they can also be used for purposes other than Anthos Config Management, a change to their configuration can have unintended side effects.
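
To make the per-cluster idea concrete, the following minimal sketch builds (but does not send) one HTTP request per cluster to create a project-level deploy token with read-only repository access. It assumes the GitLab REST endpoint POST /api/v4/projects/:id/deploy_tokens with name and scopes fields, so verify the endpoint and fields against the API documentation for your GitLab version. The host name, project ID, cluster names, and the acm- naming prefix are all hypothetical.

```python
import json
import urllib.request


def deploy_token_request(
    base_url: str, project_id: int, cluster: str, api_token: str
) -> urllib.request.Request:
    """Build a request that creates a read-only deploy token for one cluster.

    Assumption: the GitLab deploy token endpoint and its field names, as
    described in the text above. The request is only built, not sent.
    """
    payload = {
        # One token per cluster, so that access can be revoked individually.
        "name": f"acm-{cluster}",
        # The agents only need to read the repository.
        "scopes": ["read_repository"],
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v4/projects/{project_id}/deploy_tokens",
        data=json.dumps(payload).encode("utf-8"),
        headers={"PRIVATE-TOKEN": api_token, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical clusters; replace with your own inventory.
    for name in ("gke-prod", "onprem-prod", "other-cloud-prod"):
        request = deploy_token_request("https://gitlab.example.com", 42, name, "REDACTED")
        print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

Because each cluster gets its own token name, revoking access for one compromised cluster doesn't interrupt synchronization for the others.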

Manage who can approve merge requests

By default, GitLab uses roles to grant permissions on GitLab projects. For example, a user with the Developer role can open a merge request, and a user with the Maintainer role can approve the request. While this permission system can work well at first for Anthos Config Management, you might run into problems as your Kubernetes and Anthos Config Management footprint grows. The maintainers of the Anthos Config Management repository might get overwhelmed by the number of merge requests they have to process and approve.

You can take advantage of GitLab premium features to help with this problem:

  • You can use Code Owners to delegate approval permissions on a file or directory basis. This feature is useful because Anthos Config Management uses a directory structure to describe clusters and namespaces. By using Code Owners, you can delegate approval permissions on a specific namespace across all your clusters, for example.
  • You can use merge request approvals to require people from different teams to approve a request before it's merged. For example, you can require two approvers, one from the operations team and one from the security team, before a merge request can be merged.

The following diagram illustrates the technical division of an organization and demonstrates an example of how approval delegation can work in a production environment.

Example architecture of the hierarchy in an organization.

The preceding diagram has a Chief Technical Officer (CTO) with multiple teams reporting to them: a security team, a platform team, and multiple app teams.

The Anthos Config Management Git repository of this organization has the following structure (showing only directories, not files):

.
├─ system
├─ clusterregistry
├─ cluster
└─ namespaces
   ├─ cicd
   ├─ audit
   └─ applications
      ├─ team-a
      └─ team-b

For more information about each directory, see Using the Anthos Config Management repo.

With this structure, the organization can use the previously mentioned GitLab features to implement the following process:

  • Any modification to the root of the repository or the system directory must be approved by the CTO.
  • Any modification to the clusterregistry directory must be approved by a member of the platform team.
  • Any modification to the cluster directory must be approved by both a member of the platform team and a member of the security team.
  • Any modification to the namespaces directory must be approved by a member of the platform team.
  • Any modifications to the subdirectories in the namespaces directory must be approved by the following teams:

    • The cicd subdirectory represents a namespace dedicated to continuous integration and continuous delivery (CI/CD) tooling. Any change must be approved by a member of the platform team.
    • The audit subdirectory represents a namespace dedicated to auditing tools. Any change must be approved by a member of the security team.
    • The applications subdirectory contains all the resources created for every app namespace. Any change must be approved by both a member of the platform team and a member of the security team.
    • The team-a and team-b subdirectories represent the namespaces dedicated to team A and team B. Any change must be approved by the lead of that team.

Implementing this process is easier if the groups from your identity provider are synchronized with GitLab, because you can then require approvals from different groups for a merge request. For more information, see Synchronize groups with GitLab and Editing approvals.
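
The approval process above can be sketched as a lookup from a changed file to the groups that must approve it. The following minimal Python sketch is an illustration of the logic, not a GitLab configuration. It assumes that the rule for the most specific (deepest) matching directory takes precedence over rules for its parent directories, and the group names (cto, platform-team, security-team, team-a-lead, team-b-lead) are hypothetical.

```python
from pathlib import PurePosixPath

# Directory -> groups that must all approve a change in that directory.
# "." stands for the root of the repository. Group names are hypothetical.
RULES = {
    ".": {"cto"},
    "system": {"cto"},
    "clusterregistry": {"platform-team"},
    "cluster": {"platform-team", "security-team"},
    "namespaces": {"platform-team"},
    "namespaces/cicd": {"platform-team"},
    "namespaces/audit": {"security-team"},
    "namespaces/applications": {"platform-team", "security-team"},
    "namespaces/applications/team-a": {"team-a-lead"},
    "namespaces/applications/team-b": {"team-b-lead"},
}


def required_approvers(changed_file: str) -> set[str]:
    """Return the approver groups for a file, using the deepest matching directory."""
    directory = PurePosixPath(changed_file).parent
    # Walk from the file's own directory up to the repository root.
    for candidate in (directory, *directory.parents):
        rule = RULES.get(str(candidate))
        if rule is not None:
            return set(rule)
    # Fall back to the root rule for paths that match nothing else.
    return set(RULES["."])


def approvers_for_merge_request(changed_files: list[str]) -> set[str]:
    """Return every group that must approve a merge request touching these files."""
    groups: set[str] = set()
    for changed_file in changed_files:
        groups |= required_approvers(changed_file)
    return groups
```

For example, under these assumptions a merge request that changes both cluster/policy.yaml and namespaces/applications/team-a/quota.yaml (hypothetical file names) needs approval from the platform team, the security team, and the lead of team A.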

Disable shared Runners

You can use GitLab CI to automatically test your Anthos Config Management policies before deploying them. GitLab CI uses Runners to run the jobs you want. Those Runners can either be shared with the whole GitLab instance, in which case they're maintained by the same team as the GitLab instance itself, or dedicated to GitLab groups or GitLab projects.

Because of the sensitive nature of the Anthos Config Management repository, you should avoid using shared Runners for testing Anthos Config Management code. Instead, use Runners that are dedicated to the Anthos Config Management projects and that are maintained by the people who can approve merge requests.
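
One way to check this setting across several projects is to inspect project metadata that you have already retrieved from GitLab. The following minimal sketch assumes that each project is a dictionary shaped like the GitLab projects API response, with the shared_runners_enabled and path_with_namespace fields. The sample project paths are hypothetical, and fetching the data is left out of the sketch.

```python
def projects_with_shared_runners(projects: list[dict]) -> list[str]:
    """Return the paths of projects where shared Runners aren't explicitly disabled."""
    return [
        project.get("path_with_namespace", "<unknown>")
        for project in projects
        # Treat a missing field as a finding, so that incomplete data is reviewed.
        if project.get("shared_runners_enabled") is not False
    ]


if __name__ == "__main__":
    # Hypothetical sample data in the assumed response shape.
    sample = [
        {"path_with_namespace": "platform/acm-config", "shared_runners_enabled": False},
        {"path_with_namespace": "platform/acm-staging", "shared_runners_enabled": True},
    ]
    for path in projects_with_shared_runners(sample):
        print(f"shared Runners still enabled: {path}")
```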

Summary of the recommendations

This section summarizes the recommendations from the previous sections. If you use GitLab to host the Anthos Config Management Git repository, we recommend the following:

  • Establish control over who has administrative access to GitLab because those people also get administrative access to Anthos Config Management.
  • Update GitLab regularly, and whenever a security patch is released.
  • Dedicate a Cloud project to GitLab under your Organization node if you host GitLab on Google Cloud.
  • Configure all your systems, including GitLab, to use the same identity provider.
  • If possible, synchronize your existing user groups with GitLab using the LDAP Group Sync feature, which is a licensed feature of GitLab Enterprise.
  • Enforce 2FA for the users interacting with Anthos Config Management to harden against issues with stolen and phished credentials.
  • When configuring Anthos Config Management agents, do the following:
    • Configure Anthos Config Management agents to use SSH and deploy keys or HTTPS and deploy tokens to connect to GitLab.
    • Avoid using non-encrypted HTTP.
    • Create a unique deploy key or token per Kubernetes cluster in your Anthos Config Management repository.
    • Configure the deploy keys and tokens directly in the Anthos Config Management repository and avoid using global deploy keys and group deploy tokens for Anthos Config Management.
  • Watch out for delays in the approval of merge requests on the Anthos Config Management repository. If you see that they're unreasonably increasing, use Code Owners or the advanced merge request approvals to delegate approval permissions to more people in your organization.
  • Disable shared Runners for your Anthos Config Management project and use only Runners dedicated to this project.

What's next