Time-constrained role binding for Kubernetes with GitOps

Last reviewed 2023-01-12 UTC

This document shows you how to grant temporary privileges in Kubernetes clusters by using a GitOps workflow and a new Kubernetes CustomResourceDefinition (CRD) controller. This document is for platform operations teams, security operations teams, and developers who want to grant temporary roles in Kubernetes. It assumes that you're familiar with Google Kubernetes Engine (GKE), Anthos, and working with Git repositories.

When you use Anthos Config Management, you can ensure that the configurations that are stored in Git repositories are consistent with the connected Kubernetes clusters. This GitOps approach lets you manage and deploy common configurations with a process that is auditable, transactional, reviewable, and version-controlled.

When you use GitOps methods, your operations team and the development team don't need persistent write privileges to Kubernetes clusters. Instead, they write to the Git repository and use a Git workflow to control the write operations. A GitOps operator like Config Sync periodically pulls the state from the Git repository to the clusters.

However, there are still scenarios in which operations team or development team members need to perform operations other than read operations on the Kubernetes cluster. For example, if a development team member wants to use the kubectl exec command to run diagnostic commands inside a Pod, they might need a privilege that allows the create action on the pods/exec subresource. Instead of granting a privilege permanently, you can grant the privilege in a time-constrained or transient manner. Granting transient privileges ensures that the grantee can perform the actions allowed by the privileges only during the allotted time.
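
To check whether a given user currently holds that permission, a cluster administrator could use kubectl auth can-i with impersonation. The following is a minimal sketch: the function name can_exec_in_pods and the example user and namespace are hypothetical, and the caller is assumed to have permission to impersonate users.

```shell
# Sketch: report whether USER may run "kubectl exec" in NAMESPACE.
# "kubectl exec" needs the "create" verb on the "exec" subresource of pods.
# can_exec_in_pods is a hypothetical helper name, not part of kubectl.
can_exec_in_pods() {
  local user="$1" namespace="$2"
  # --as impersonates the user; can-i prints "yes" or "no" and
  # sets a matching exit status.
  kubectl auth can-i create pods --subresource=exec \
    --as="$user" --namespace="$namespace"
}

# Example (hypothetical user and namespace):
#   can_exec_in_pods dev@example.com default
```

Running the check before and after a transient grant is one way to confirm that the privilege is not held permanently.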

The Git repository that you clone in this tutorial includes the Kubernetes custom resources TransientRoleBinding and TransientClusterRoleBinding, which let you grant privileges in a transient manner. Compared to the Kubernetes RBAC objects RoleBinding and ClusterRoleBinding, the transient objects have the following additional fields:

  • validFrom: indicates that the role binding is effective only after the specified time.
  • validUntil: indicates that the role binding is effective only before this time.

When a TransientRoleBinding or TransientClusterRoleBinding custom resource is created in the cluster, the grantee can perform the allowed operations only during the specified time.

GitOps workflow

The following diagram shows the GitOps workflow:

A developer follows the GitOps workflow to get temporary privileges from a platform administrator.

The diagram illustrates a scenario in which a developer wants an escalated privilege. The platform administrator is willing to review and approve the developer's request. The diagram shows the following steps:

  1. The developer prepares the request by creating a new TransientRoleBinding object in the Git repository.
  2. The developer submits a GitHub pull request (PR). Although not shown in the diagram, you could alternatively use GitLab and a merge request.
  3. The platform administrator reviews the request and decides whether or not to accept the request.
    1. If the platform administrator denies the request, the developer doesn't gain access to any resources.
    2. If the platform administrator accepts and merges the request, the Config Management Operator in the Kubernetes cluster creates the TransientRoleBinding object in the cluster. The TransientRoleBinding controller creates a corresponding RoleBinding object when the current time reaches the validFrom time.
      1. Between the validFrom and validUntil times that are set on the TransientRoleBinding object, the developer can access the resources granted by the TransientRoleBinding object.
      2. When the TransientRoleBinding object expires, the TransientRoleBinding controller automatically deletes the RoleBinding object and the developer can no longer access the resource.

Objectives

  • Install Metacontroller and the TransientRoleBinding controller.
  • Use Config Sync to set up transient Kubernetes permissions in the GitOps style.

Costs

In this document, you use the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

Before you begin this tutorial, complete the following steps:

  1. In the Google Cloud console, activate Cloud Shell.

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

  2. Create a GKE Standard or Autopilot cluster.
  3. Register the cluster to a Google Cloud fleet.
  4. If you don't have a GitHub account, create a GitHub account.
  5. Fork the Anthos Config Management samples repository in your account.
  6. Add a new deploy key to the repository that you just forked. You need the location of your private key to complete this tutorial. Be mindful about how you download the file and where you store it because it can be used to authenticate to the Git repository.
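
One way to create the key pair for the deploy key in step 6 is with ssh-keygen. This is a sketch under assumptions: the helper name, the key file path, and the comment string are hypothetical, and the empty passphrase is an assumption made so that the key can later be stored in a Kubernetes Secret and used unattended.

```shell
# Sketch: create an SSH key pair to register as a GitHub deploy key.
# make_deploy_key and the file names in the example are hypothetical.
make_deploy_key() {
  local key_path="$1" comment="$2"
  # -t ed25519: key type; -N "": empty passphrase; -f: private key path.
  # ssh-keygen writes the public half to "${key_path}.pub".
  ssh-keygen -q -t ed25519 -C "$comment" -N "" -f "$key_path" || return 1
  # Keep the private key readable only by the current user.
  chmod 600 "$key_path"
}

# Example:
#   make_deploy_key "$HOME/.ssh/acm-samples-deploy-key" "config-sync deploy key"
#   cat "$HOME/.ssh/acm-samples-deploy-key.pub"   # paste this into GitHub
```

Only the contents of the .pub file go to GitHub; the private key stays on your machine until you create the Secret.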

Clone the repository

Create a local clone of the forked Anthos Config Management samples repository to use as your working copy:

  1. In Cloud Shell, clone your fork:

    git clone git@github.com:GIT_USERNAME/anthos-config-management-samples.git
    

    Replace GIT_USERNAME with your GitHub username.

  2. Configure your local system to authenticate to GitHub so that you can push to the repository.

Configure your cluster

In this section, you create Kubernetes Secrets and install Config Sync to configure your cluster.

Create Kubernetes Secrets for the Git credentials

  1. Get the private key that you registered with your GitHub repository.
  2. In Cloud Shell, create a Secret for the root repository using an SSH key pair:

    kubectl create ns config-management-system && \
     kubectl create secret generic git-creds \
      --namespace=config-management-system \
      --from-file=ssh=/PATH_TO_PRIVATE_KEY
    

    Replace PATH_TO_PRIVATE_KEY with the path to the private key file. The private key file is the one without a .pub extension.

  3. Protect the private key on your local disk, or otherwise delete it.

Install Config Sync

In this section, you configure Config Sync in your GKE cluster.

  1. In Cloud Shell, create a file named apply-spec.yaml and copy the following content into it:

    # apply-spec.yaml
    
    applySpecVersion: 1
    spec:
      configSync:
        # Set to true to install and enable Config Sync
        enabled: true
        sourceFormat: unstructured
        syncRepo: git@github.com:GIT_USERNAME/anthos-config-management-samples.git
        syncBranch: main
        secretType: ssh
        policyDir: config-sync-quickstart/multirepo/root
    

    Replace GIT_USERNAME with your GitHub username.

  2. Apply the apply-spec.yaml file to your GKE cluster:

    gcloud beta container hub config-management apply \
      --membership=MEMBERSHIP \
      --config=PATH_TO_APPLY_SPEC \
      --project=PROJECT_ID
    

    Replace the following:

    • MEMBERSHIP: the membership name that you chose when you registered your cluster. To get the name, run gcloud container hub memberships list.
    • PATH_TO_APPLY_SPEC: the path to the apply-spec.yaml file that you created in the preceding step.
    • PROJECT_ID: your Google Cloud project ID.

    When the configuration is complete and Config Sync is installed on your cluster, the following message is displayed:

    Waiting for Feature Config Management to be updated...done.
    

    If you want to verify that the Config Management Operator is running, you can list all Pods running in the config-management-system namespace:

    kubectl get pods -n config-management-system
    

    The output is similar to the following:

    NAME                                       READY   STATUS    RESTARTS   AGE
    admission-webhook-7dbc55cbf5-9thcj         1/1     Running   0          6d18h
    admission-webhook-7dbc55cbf5-pmrxt         1/1     Running   0          6d18h
    ns-reconciler-gamestore-67ff4dcbc4-x4vnh   3/3     Running   0          14m
    reconciler-manager-7cdb699bf8-8lvll        2/2     Running   0          6d18h
    root-reconciler-84f976b74d-mh6zd           3/3     Running   0          14m
    

If you want to check whether objects are synchronized to the Kubernetes cluster, you can use kubectl to examine Config Sync resources or monitor RootSync and RepoSync objects.
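
As a rough sketch of such a check, the following compares the commit that Config Sync reports as synced with the commit at the tip of your local clone. It assumes that the RootSync object is named root-sync and that its status exposes the synced commit at .status.sync.commit; the function name is hypothetical.

```shell
# Sketch: succeed when the cluster has synced the local HEAD commit.
# Assumptions: the RootSync object is named "root-sync" and reports the
# synced commit in .status.sync.commit. is_head_synced is a hypothetical name.
is_head_synced() {
  local synced head
  synced="$(kubectl get rootsyncs.configsync.gke.io root-sync \
    --namespace=config-management-system \
    --output=jsonpath='{.status.sync.commit}')" || return 2
  head="$(git rev-parse HEAD)" || return 2
  echo "synced=${synced} local=${head}"
  [ "$synced" = "$head" ]
}

# Example, run from inside your clone after pushing to the synced branch:
#   is_head_synced && echo "cluster is up to date"
```

The comparison is only meaningful when the local HEAD has been pushed to the branch that Config Sync watches.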

Install controllers

The controller that manages the transient objects is created using Metacontroller. Metacontroller is an add-on for Kubernetes that helps you to write and deploy custom controllers by calling a webhook on Kubernetes object actions. The controller's webhook uses Open Policy Agent (OPA). The logic in this webhook is written in the Rego language, which is the same language used by the Policy Controller component of Anthos Config Management.

In this section, you install Metacontroller and the TransientRoleBinding and TransientClusterRoleBinding controllers to your cluster.

  1. In Cloud Shell, add the Metacontroller version v2.1.3 manifest files to your repository:

    mkdir -p config-sync-quickstart/multirepo/root
    # Create a remote in current git repo
    git remote add metacontroller git@github.com:metacontroller/metacontroller.git
    # Then fetch objects related to the desired tag from the remote
    git fetch --no-write-fetch-head -n metacontroller \
      refs/tags/v2.1.3:refs/metacontroller/tags/v2.1.3
    # Extract only necessary file from the commit corresponding to the tag
    # - the git archive will export files related to the commit as a tar archive
    # - the tar command extracts files except for the "--exclude" files to a desired path
    # - the "--transform" argument renames the file
    git archive 8ad5709134ae1eba02483d4126d57d1be92dd627|\
      tar --transform='flags=r;s|manifests/production\(.*\)|.\1|g' \
      --exclude=kustomization.yaml \
      --exclude=metacontroller-crds-v1beta1.yaml -x \
      -C config-sync-quickstart/multirepo/root manifests/production
    

    The preceding commands are deterministic: wherever and whenever you run them, the result is the same, because the files are extracted from a fixed commit hash. The git and tar commands are used because they ship with major distributions and their behavior is stable across versions.

  2. Prune the .status field from the metacontroller-crds-v1.yaml file. The metacontroller-crds-v1beta1.yaml file was excluded in the preceding step, so it needs no change.

    YQDIGEST="50f1c495254af578c16bdb7d9df164a72fffa2928186cf3c53c67a7303e90c50"
    docker run --rm -v "$(pwd)":/workdir -u "$UID" \
      --security-opt=no-new-privileges --cap-drop all --network none \
      mikefarah/yq@sha256:"$YQDIGEST" \
      -i eval-all 'del(.status)'  \
      config-sync-quickstart/multirepo/root/metacontroller-crds-v1.yaml
    
  3. Add the TransientRoleBinding and TransientClusterRoleBinding controllers manifest files to your repository:

    mkdir -p config-sync-quickstart/multirepo/root
    # Create a remote in current git repository
    git remote add transient-role-binding \
      https://github.com/GoogleCloudPlatform/k8s-transient-role-binding.git
    # Then fetch objects related to the desired reference from the remote
    git fetch --no-write-fetch-head -n transient-role-binding \
      refs/heads/main:refs/transient-role-binding/main
    # Extract only the necessary file from the commit corresponding to the reference
    # - the git archive will export files related to the commit as a tar archive
    # - the tar command only extracts the files to a desired path
    # - the "--transform" argument renames the path
    git archive cfaec879c55cb129a6877967bbbdd10874c4b1cb|\
      tar --transform='flags=r;s|controller\(.*\)|.\1|g' -x \
      -C config-sync-quickstart/multirepo/root controller
    
  4. Commit the changes to your root repository:

    git add config-sync-quickstart/multirepo/root/metacontroller-crds-v1.yaml \
            config-sync-quickstart/multirepo/root/metacontroller-namespace.yaml \
            config-sync-quickstart/multirepo/root/metacontroller-rbac.yaml \
            config-sync-quickstart/multirepo/root/metacontroller.yaml \
            config-sync-quickstart/multirepo/root/opa-webhook/ \
            config-sync-quickstart/multirepo/root/transient-clusterrolebinding-metacontroller.yaml \
            config-sync-quickstart/multirepo/root/transient-clusterrolebinding.yaml \
            config-sync-quickstart/multirepo/root/transient-rolebinding-metacontroller.yaml \
            config-sync-quickstart/multirepo/root/transient-rolebinding.yaml
    git commit -m "Install metacontroller and transient controller"
    git push
    

    If the commands succeed, Config Sync will install the manifests in your cluster. To verify, run the following command:

    kubectl get CompositeController
    

    The output is similar to the following:

    NAME                                       AGE
    transient-clusterrolebindings-controller   12h
    transient-rolebinding-controller           12h
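
Steps 1 and 3 fetch a named ref but extract files from a commit hash. Before extracting, you could confirm that the fetched ref resolves to the commit you expect, so that a moved tag or branch doesn't go unnoticed. A minimal sketch, with a hypothetical helper name:

```shell
# Sketch: succeed only when REF resolves to EXPECTED_COMMIT in the current repo.
# ref_matches_commit is a hypothetical helper name.
ref_matches_commit() {
  local ref="$1" expected="$2" actual
  # "^{commit}" peels an annotated tag down to the commit it points at.
  actual="$(git rev-parse --verify --quiet "${ref}^{commit}")" || {
    echo "cannot resolve ${ref}" >&2
    return 2
  }
  [ "$actual" = "$expected" ]
}

# Example, using the ref and hash from step 1:
#   ref_matches_commit refs/metacontroller/tags/v2.1.3 \
#     8ad5709134ae1eba02483d4126d57d1be92dd627 || echo "ref does not match" >&2
```

Because git archive is given the hash directly, a mismatch doesn't change what gets extracted; the check only tells you that the named ref and the pinned hash have diverged.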
    

Test the controller

The following table shows a mapping between the transient objects and Kubernetes RBAC objects:

Transient object               Kubernetes RBAC object
TransientRoleBinding           RoleBinding
TransientClusterRoleBinding    ClusterRoleBinding

The TransientRoleBinding and TransientClusterRoleBinding objects each have the following fields:

  • validUntil: A timestamp in the RFC 3339 format. The RoleBinding or ClusterRoleBinding will only take effect before this time.
  • validFrom: A timestamp in the RFC 3339 format. The RoleBinding or ClusterRoleBinding will only take effect after this time.
  • roleRef: The same value as in RoleBinding and ClusterRoleBinding.
  • subjects: The same value as in RoleBinding and ClusterRoleBinding.

For each valid TransientRoleBinding or TransientClusterRoleBinding object, if the current time is between the validFrom and validUntil values, the controller creates a corresponding RoleBinding or ClusterRoleBinding object. Otherwise, the controller deletes the corresponding RoleBinding or ClusterRoleBinding object.
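
The decision described above amounts to a time-window check. The following is only an illustrative sketch in shell, not the controller's actual Rego logic. The function name is hypothetical, it relies on GNU date (as found in Cloud Shell) to parse RFC 3339 timestamps, and treating both bounds as inclusive is an assumption.

```shell
# Sketch: succeed when NOW lies between VALID_FROM and VALID_UNTIL.
# Not the controller's real implementation; is_binding_active is hypothetical.
# Requires GNU date for "-d". Inclusive bounds are an assumption.
# Arguments: VALID_FROM VALID_UNTIL [NOW], each an RFC 3339 timestamp.
is_binding_active() {
  local start end now
  # Convert each timestamp to seconds since the epoch for comparison.
  start="$(date -u -d "$1" +%s)" || return 2
  end="$(date -u -d "$2" +%s)" || return 2
  now="$(date -u -d "${3:-now}" +%s)" || return 2
  [ "$start" -le "$now" ] && [ "$now" -le "$end" ]
}

# Example:
#   is_binding_active 2022-02-22T22:00:00Z 2022-02-22T23:00:00Z \
#     && echo "RoleBinding should exist" \
#     || echo "RoleBinding should be absent"
```

Comparing epoch seconds avoids string comparison problems when the two timestamps use different UTC offsets.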

In this section, you use a pull request workflow to test the TransientRoleBinding controller.

  1. In Cloud Shell, create a new branch in the Git repository and switch to it:

    git checkout -b proposal-new-rolebinding
    
  2. Create a new file config-sync-quickstart/multirepo/root/test1-trb.yaml with the following content:

    apiVersion: example.com/v1
    kind: TransientRoleBinding
    metadata:
      name: test1
      namespace: default
    validUntil: VALID_UNTIL_TIME
    validFrom: VALID_FROM_TIME
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: view
    subjects:
    - kind: User
      name: GRANTEE_NAME
    

    Replace the following:

    • VALID_FROM_TIME: the RFC 3339 format time when the role binding should start to be valid. For example, 2022-02-22T22:22:22Z.
    • VALID_UNTIL_TIME: the RFC 3339 format time when the role binding should stop being valid.
    • GRANTEE_NAME: the username that you want to bind the role to.
  3. Commit and push the change:

    git add config-sync-quickstart/multirepo/root/test1-trb.yaml
    git commit -m 'Role Granting Proposal'
    git push origin proposal-new-rolebinding
    
  4. On GitHub, create a pull request for the change, and then merge the pull request.

  5. Verify that the object TransientRoleBinding/test1 has been created in the cluster:

    kubectl get -n default TransientRoleBinding
    

    The output is similar to the following:

    NAME    AGE
    test1   104s
    
  6. During the time between the validFrom and validUntil times, the RoleBinding/test1 object should exist in the cluster. To verify the object exists, run the following command:

    kubectl get -n default RoleBinding/test1
    

    The output is similar to the following:

    NAME    ROLE               AGE
    test1   ClusterRole/view   2m11s
    

    After the validUntil time, the RoleBinding/test1 object should no longer exist. To verify that the object doesn't exist, wait until after the validUntil time, and then run the preceding command again. The output is the following:

    Error from server (NotFound): rolebindings.rbac.authorization.k8s.io "test1" not found
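
For step 2, the timestamps and the manifest can also be generated from the shell instead of edited by hand. The sketch below is one way to do that. The function names and the one-hour window in the example are hypothetical, and the date -d syntax assumes GNU date, as found in Cloud Shell. The manifest fields mirror the example in step 2.

```shell
# Sketch: print an RFC 3339 UTC timestamp for a GNU date expression,
# for example "now" or "+1 hour". rfc3339_utc is a hypothetical name.
rfc3339_utc() {
  date -u -d "$1" +%Y-%m-%dT%H:%M:%SZ
}

# Sketch: print a TransientRoleBinding manifest like the one in step 2.
# print_trb_manifest is a hypothetical name.
# Arguments: NAME NAMESPACE VALID_FROM VALID_UNTIL GRANTEE
print_trb_manifest() {
  cat <<EOF
apiVersion: example.com/v1
kind: TransientRoleBinding
metadata:
  name: $1
  namespace: $2
validUntil: $4
validFrom: $3
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- kind: User
  name: $5
EOF
}

# Example: a binding that is valid for one hour, starting now.
#   print_trb_manifest test1 default \
#     "$(rfc3339_utc now)" "$(rfc3339_utc '+1 hour')" GRANTEE_NAME \
#     > config-sync-quickstart/multirepo/root/test1-trb.yaml
```

Keep in mind that the binding takes effect only after the pull request is merged and synced, so a validFrom time of "now" leaves less than the full window for actual use.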
    

For troubleshooting information, see the Troubleshooting page of the Metacontroller documentation.

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

  1. In the Google Cloud console, go to the Manage resources page.

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete individual resources

  • Delete the cluster that you used for testing.
  • Delete the local clone of your fork.
  • Delete the local private SSH key, if applicable.
  • Delete your fork of the example repository. This action deletes any deploy keys associated with the repository.

What's next