This document shows you how to manage granting temporary privileges in
Kubernetes clusters by using a GitOps workflow and a new Kubernetes
CustomResourceDefinition (CRD) controller. This document is for platform
operations teams, security operations teams, and developers who want to grant
temporary roles in Kubernetes. It assumes that you're familiar with
Google Kubernetes Engine (GKE), Anthos, and working with Git repositories.
When you use Anthos Config Management, you can ensure that the configurations that are stored in Git repositories are consistent with the connected Kubernetes clusters. This GitOps approach lets you manage and deploy common configurations with a process that is auditable, transactional, reviewable, and version-controlled.
When you use GitOps methods, your operations team and the development team don't need persistent write privileges to Kubernetes clusters. Instead, they write to the Git repository and use a Git workflow to control the write operations. A GitOps operator like Config Sync periodically pulls the state from the Git repository to the clusters.
However, there are still scenarios in which operations or development team
members need to perform operations on the Kubernetes cluster other than
read operations. For example, if a development team member wants to use the
kubectl exec command to run diagnostic commands inside a Pod, they need a
privilege that allows the create verb on the pods/exec subresource.
Instead of granting a privilege permanently, you can grant the privilege in a
time-constrained or transient manner. Granting transient privileges ensures
that the grantee can only perform the actions allowed by the privileges during
the allotted time.
The Git repository that you clone in this tutorial includes the Kubernetes
custom resources TransientRoleBinding and TransientClusterRoleBinding, which
let you grant privileges in a transient manner. Compared to the Kubernetes
RBAC objects RoleBinding and ClusterRoleBinding, the transient objects have
the following additional fields:
- validFrom: indicates that the role binding is effective only after the specified time.
- validUntil: indicates that the role binding is effective only before this time.
When the TransientRoleBinding or TransientClusterRoleBinding Kubernetes
custom resources are created in the cluster, the grantee can perform the
allowed operations during the specified time.
GitOps workflow
The following diagram shows the GitOps workflow:
The diagram illustrates a scenario in which a developer wants an escalated privilege. The platform administrator is willing to review and approve the developer's request. The diagram shows the following steps:
- The developer prepares the request by creating a new TransientRoleBinding object in the Git repository.
- The developer submits a GitHub pull request (PR). Although not shown in the diagram, you could alternatively use GitLab and a merge request.
- The platform administrator reviews the request and decides whether or not to accept it.
  - If the platform administrator denies the request, the developer doesn't gain access to any resources.
  - If the platform administrator accepts and merges the request, the Config Management Operator in the Kubernetes cluster creates the TransientRoleBinding object in the cluster. The TransientRoleBinding controller creates a corresponding RoleBinding object when the current time reaches the validFrom time.
    - Between the validFrom and validUntil times that are set on the TransientRoleBinding object, the developer can access the resources granted by the TransientRoleBinding object.
    - When the TransientRoleBinding object expires, the TransientRoleBinding controller automatically deletes the RoleBinding object and the developer can no longer access the resource.
Objectives
- Install Metacontroller and the TransientRoleBinding controller.
- Use Config Sync to set up transient Kubernetes permissions in the GitOps style.
Costs
In this document, you use the following billable components of Google Cloud:
To generate a cost estimate based on your projected usage,
use the pricing calculator.
Before you begin
Before you begin this tutorial, complete the following steps:
- In the Google Cloud console, activate Cloud Shell.
  At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
- Create a GKE Standard or Autopilot cluster.
- Register the cluster to a Google Cloud fleet.
- If you don't have a GitHub account, create a GitHub account.
- Fork the Anthos Config Management samples repository in your account.
- Add a new deploy key to the repository that you just forked. You need the location of your private key to complete this tutorial. Be mindful about how you download the file and where you store it because it can be used to authenticate to the Git repository.
Clone the repository
Create a local clone of the forked Anthos Config Management samples repository to use as your working copy:
In Cloud Shell, clone your fork:
git clone git@github.com:GIT_USERNAME/anthos-config-management-samples.git
Replace GIT_USERNAME with your GitHub username.
Configure your local system to authenticate to GitHub so that you can push to the repository.
Configure your cluster
In this section, you create Kubernetes Secrets and install Config Sync to configure your cluster.
Create Kubernetes Secrets for the Git credentials
- Get the private key that you registered with your GitHub repository.
In Cloud Shell, create a Secret for the root repository using an SSH key pair:
kubectl create ns config-management-system && \
    kubectl create secret generic git-creds \
    --namespace=config-management-system \
    --from-file=ssh=/PATH_TO_PRIVATE_KEY
Replace PATH_TO_PRIVATE_KEY with the name of the private key file. The correct file doesn't have a .pub extension.
Protect the private key on your local disk, or otherwise delete it.
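Before you create the Secret, it can help to confirm that the path points at the private key rather than the public one. The following is a minimal sketch of such a check. The function name check_private_key is a hypothetical helper, not part of the tutorial repository, and restricting the file mode to 600 is an assumption about what protecting the key means on your system.

```shell
# Hypothetical helper: sanity-check the key file before passing it to
# "kubectl create secret". Not part of the tutorial repository.
check_private_key() {
  # $1: path to the private key file
  case "$1" in
    *.pub)
      echo "refusing $1: this looks like the public key" >&2
      return 1
      ;;
  esac
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  # Assumption: owner-only read/write is the desired protection.
  chmod 600 "$1"
}
```

For example, check_private_key "$HOME/.ssh/my-deploy-key" returns a non-zero status if the path ends in .pub or the file doesn't exist. The path shown here is only an illustration.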
Install Config Sync
In this section, you configure Config Sync in your GKE cluster.
In Cloud Shell, create a file named apply-spec.yaml and copy the following content into it:
# apply-spec.yaml
applySpecVersion: 1
spec:
  configSync:
    # Set to true to install and enable Config Sync
    enabled: true
    sourceFormat: unstructured
    syncRepo: git@github.com:GIT_USERNAME/anthos-config-management-samples.git
    syncBranch: main
    secretType: ssh
    policyDir: config-sync-quickstart/multirepo/root
Replace GIT_USERNAME with your GitHub username.
Apply the apply-spec.yaml file to your GKE cluster:
gcloud beta container hub config-management apply \
    --membership=MEMBERSHIP \
    --config=PATH_TO_APPLY_SPEC \
    --project=PROJECT_ID
Replace the following:
- MEMBERSHIP: the membership name that you chose when you registered your cluster. To get the name, run gcloud container hub memberships list.
- PATH_TO_APPLY_SPEC: the path to the apply-spec.yaml file that you created in the preceding step.
- PROJECT_ID: your Google Cloud project ID.
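If you prefer not to edit the file by hand, the apply-spec.yaml content shown above can be generated with a here-document. This is only a sketch: write_apply_spec is a hypothetical helper name, and the field values are copied from the file in this section rather than from any other source.

```shell
# Hypothetical helper: write the apply-spec.yaml from this section with the
# GitHub username filled in.
write_apply_spec() {
  # $1: GitHub username (GIT_USERNAME), $2: output path
  cat > "$2" <<EOF
# apply-spec.yaml
applySpecVersion: 1
spec:
  configSync:
    # Set to true to install and enable Config Sync
    enabled: true
    sourceFormat: unstructured
    syncRepo: git@github.com:$1/anthos-config-management-samples.git
    syncBranch: main
    secretType: ssh
    policyDir: config-sync-quickstart/multirepo/root
EOF
}
```

For example, write_apply_spec my-github-user apply-spec.yaml writes the file to the current directory, where my-github-user stands in for your own username.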
When the configuration is complete and Config Sync is installed on your cluster, the following message is displayed:
Waiting for Feature Config Management to be updated...done.
If you want to verify that the Config Management Operator is running, you can list all Pods running in the config-management-system namespace:
kubectl get pods -n config-management-system
The output is similar to the following:
NAME                                       READY   STATUS    RESTARTS   AGE
admission-webhook-7dbc55cbf5-9thcj         1/1     Running   0          6d18h
admission-webhook-7dbc55cbf5-pmrxt         1/1     Running   0          6d18h
ns-reconciler-gamestore-67ff4dcbc4-x4vnh   3/3     Running   0          14m
reconciler-manager-7cdb699bf8-8lvll        2/2     Running   0          6d18h
root-reconciler-84f976b74d-mh6zd           3/3     Running   0          14m
If you want to check whether objects are synchronized to the Kubernetes cluster, you can use kubectl to examine Config Sync resources or monitor RootSync and RepoSync objects.
Install controllers
The controller that manages the transient objects is built with Metacontroller. Metacontroller is an add-on for Kubernetes that helps you to write and deploy custom controllers by calling a webhook on Kubernetes object actions. The controller's webhook uses Open Policy Agent (OPA). The logic in this webhook is written in the Rego language, which is the same language used by the Policy Controller component of Anthos Config Management.
In this section, you install Metacontroller and the TransientRoleBinding and
TransientClusterRoleBinding controllers to your cluster.
In Cloud Shell, add the Metacontroller version v2.1.3 manifest files to your repository:
mkdir -p config-sync-quickstart/multirepo/root

# Create a remote in current git repo
git remote add metacontroller git@github.com:metacontroller/metacontroller.git

# Then fetch objects related to the desired tag from the remote
git fetch --no-write-fetch-head -n metacontroller \
    refs/tags/v2.1.3:refs/metacontroller/tags/v2.1.3

# Extract only necessary file from the commit corresponding to the tag
# - the git archive will export files related to the commit as a tar archive
# - the tar command extracts files except for the "--exclude" files to a desired path
# - the "--transform" argument renames the file
git archive 8ad5709134ae1eba02483d4126d57d1be92dd627 | \
    tar --transform='flags=r;s|manifests/production\(.*\)|.\1|g' \
    --exclude=kustomization.yaml \
    --exclude=metacontroller-crds-v1beta1.yaml -x \
    -C config-sync-quickstart/multirepo/root manifests/production
The preceding commands are deterministic: wherever and whenever you run them, they produce the same result, because they extract files from a fixed commit. The git and tar commands are used because they ship with major distributions and because their behavior is stable across versions.
Prune the .status field from the metacontroller-crds-v1.yaml file. The metacontroller-crds-v1beta1.yaml file was excluded in the preceding step, so it doesn't need pruning.
YQDIGEST="50f1c495254af578c16bdb7d9df164a72fffa2928186cf3c53c67a7303e90c50"
docker run --rm -v "$(pwd)":/workdir -u "$UID" \
    --security-opt=no-new-privileges --cap-drop all --network none \
    mikefarah/yq@sha256:"$YQDIGEST" \
    -i eval-all 'del(.status)' \
    config-sync-quickstart/multirepo/root/metacontroller-crds-v1.yaml
Add the TransientRoleBinding and TransientClusterRoleBinding controller manifest files to your repository:
mkdir -p config-sync-quickstart/multirepo/root

# Create a remote in current git repository
git remote add transient-role-binding \
    https://github.com/GoogleCloudPlatform/k8s-transient-role-binding.git

# Then fetch objects related to the desired reference from the remote
git fetch --no-write-fetch-head -n transient-role-binding \
    refs/heads/main:refs/transient-role-binding/main

# Extract only the necessary file from the commit corresponding to the reference
# - the git archive will export files related to the commit as a tar archive
# - the tar command only extracts the files to a desired path
# - the "--transform" argument renames the path
git archive cfaec879c55cb129a6877967bbbdd10874c4b1cb | \
    tar --transform='flags=r;s|controller\(.*\)|.\1|g' -x \
    -C config-sync-quickstart/multirepo/root controller
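To see what the git archive and tar pipeline does without touching your working copy, you can try it on a throwaway repository. The following sketch assumes GNU tar and a local git installation. The directory layout and file names are invented for the demonstration, and the explicit -f - option (read the archive from standard input) is added here for clarity; it isn't in the tutorial commands.

```shell
# Throwaway demo: all paths and file contents below are invented.
demo_dir="$(mktemp -d)"
git init -q "$demo_dir/repo"
mkdir -p "$demo_dir/repo/manifests/production" "$demo_dir/out"
echo "kind: Namespace" > "$demo_dir/repo/manifests/production/metacontroller-namespace.yaml"
echo "resources: []" > "$demo_dir/repo/manifests/production/kustomization.yaml"
git -C "$demo_dir/repo" add .
git -C "$demo_dir/repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "demo commit"

# Pin the commit by hash, as the tutorial does, so the export is reproducible.
commit="$(git -C "$demo_dir/repo" rev-parse HEAD)"

# Export the commit as a tar stream, strip the "manifests/production" prefix,
# skip kustomization.yaml, and extract the rest into the output directory.
git -C "$demo_dir/repo" archive "$commit" | \
    tar --transform='flags=r;s|manifests/production\(.*\)|.\1|g' \
    --exclude=kustomization.yaml -x -f - \
    -C "$demo_dir/out" manifests/production

ls "$demo_dir/out"
```

In this demo, only metacontroller-namespace.yaml should end up directly under the output directory, without the manifests/production prefix. Remove the temporary directory when you're done.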
Commit the changes to your root repository:
git add config-sync-quickstart/multirepo/root/metacontroller-crds-v1.yaml \
    config-sync-quickstart/multirepo/root/metacontroller-namespace.yaml \
    config-sync-quickstart/multirepo/root/metacontroller-rbac.yaml \
    config-sync-quickstart/multirepo/root/metacontroller.yaml \
    config-sync-quickstart/multirepo/root/opa-webhook/ \
    config-sync-quickstart/multirepo/root/transient-clusterrolebinding-metacontroller.yaml \
    config-sync-quickstart/multirepo/root/transient-clusterrolebinding.yaml \
    config-sync-quickstart/multirepo/root/transient-rolebinding-metacontroller.yaml \
    config-sync-quickstart/multirepo/root/transient-rolebinding.yaml
git commit -m "Install metacontroller and transient controller"
git push
If the commands succeed, Config Sync will install the manifests in your cluster. To verify, run the following command:
kubectl get CompositeController
The output is similar to the following:
NAME                                        AGE
transient-clusterrolebindings-controller    12h
transient-rolebinding-controller            12h
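Because Config Sync pulls from the repository periodically, the CompositeController objects might not exist immediately after you push. One way to wait is a small polling loop. The retry function below is a hypothetical helper written for this illustration, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt follows.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (requires access to the cluster):
#   retry 30 10 kubectl get CompositeController transient-rolebinding-controller
```

The commented example polls every 10 seconds, up to 30 times, for one of the controller names shown in the preceding output.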
Test the controller
The following table shows a mapping between the transient objects and Kubernetes RBAC objects:
Transient objects | Kubernetes RBAC objects |
---|---|
TransientRoleBinding | RoleBinding |
TransientClusterRoleBinding | ClusterRoleBinding |
The TransientRoleBinding and TransientClusterRoleBinding objects each have
the following fields:
- validUntil: A timestamp in the RFC 3339 format. The RoleBinding or ClusterRoleBinding will only take effect before this time.
- validFrom: A timestamp in the RFC 3339 format. The RoleBinding or ClusterRoleBinding will only take effect after this time.
- roleRef: The same value as in RoleBinding and ClusterRoleBinding.
- subjects: The same value as in RoleBinding and ClusterRoleBinding.
For each valid Transient* object, if the current time is between the
validFrom and validUntil values, the controller creates a corresponding
RoleBinding or ClusterRoleBinding object. Otherwise, the controller deletes
the corresponding RoleBinding or ClusterRoleBinding object.
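The time-window rule described above can be sketched in shell as follows. This isn't the controller's implementation, which is written in Rego; it's an illustration that assumes GNU date and treats the window as inclusive of validFrom and exclusive of validUntil, a boundary choice that the source doesn't specify. The function names are hypothetical.

```shell
# Convert an RFC 3339 timestamp to seconds since the epoch (GNU date).
to_epoch() {
  date -u -d "$1" +%s
}

# Hypothetical helper: succeed if NOW falls inside [validFrom, validUntil).
# Usage: binding_is_active VALID_FROM VALID_UNTIL [NOW]
binding_is_active() {
  local from_s until_s now_s
  from_s="$(to_epoch "$1")" || return 2
  until_s="$(to_epoch "$2")" || return 2
  # Default to the current time when NOW is omitted.
  now_s="$(to_epoch "${3:-now}")" || return 2
  [ "$now_s" -ge "$from_s" ] && [ "$now_s" -lt "$until_s" ]
}
```

For example, binding_is_active 2022-02-22T22:00:00Z 2022-02-22T23:00:00Z 2022-02-22T22:30:00Z succeeds, and the same call with 2022-02-22T23:30:00Z as the third argument fails, which corresponds to the point at which the controller would delete the binding.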
In this section, you use a pull request workflow to test the
TransientRoleBinding controller.
In Cloud Shell, create a new branch in the Git repository and switch to it:
git checkout -b proposal-new-rolebinding
Create a new file config-sync-quickstart/multirepo/root/test1-trb.yaml with the following content:
apiVersion: example.com/v1
kind: TransientRoleBinding
metadata:
  name: test1
  namespace: default
validUntil: VALID_UNTIL_TIME
validFrom: VALID_FROM_TIME
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- kind: User
  name: GRANTEE_NAME
Replace the following:
- VALID_FROM_TIME: the RFC 3339 format time when the role binding should start to be valid. For example, 2022-02-22T22:22:22Z.
- VALID_UNTIL_TIME: the RFC 3339 format time when the role binding should stop being valid.
- GRANTEE_NAME: the username that you want to bind the role to.
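If you don't want to type the timestamps by hand, you can generate RFC 3339 values in UTC with GNU date, which is an assumption about your shell environment. The helper name rfc3339_in and the one-hour window below are illustrative choices, not requirements of the controller.

```shell
# Hypothetical helper: print the UTC time N minutes from now in RFC 3339 form.
rfc3339_in() {
  # $1: offset in minutes from the current time (0 means now)
  date -u -d "+${1} minutes" +%Y-%m-%dT%H:%M:%SZ
}

# Illustrative window: valid from now, for one hour.
VALID_FROM_TIME="$(rfc3339_in 0)"
VALID_UNTIL_TIME="$(rfc3339_in 60)"

echo "validFrom: $VALID_FROM_TIME"
echo "validUntil: $VALID_UNTIL_TIME"
```

Keep in mind that the binding only takes effect after the pull request is merged and synced, so a window that starts immediately is shortened by however long the review takes.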
Commit and push the change:
git add config-sync-quickstart/multirepo/root/test1-trb.yaml
git commit -m 'Role Granting Proposal'
git push origin proposal-new-rolebinding
On GitHub, create a pull request for the change, and then merge the pull request.
Verify that the object TransientRoleBinding/test1 has been created in the cluster:
kubectl get -n default TransientRoleBinding
The output is similar to the following:
NAME    AGE
test1   104s
Between the validFrom and validUntil times, the RoleBinding/test1 object should exist in the cluster. To verify that the object exists, run the following command:
kubectl get -n default RoleBinding/test1
The output is similar to the following:
NAME    ROLE               AGE
test1   ClusterRole/view   2m11s
After the validUntil time, the RoleBinding/test1 object should no longer exist. To verify that the object doesn't exist, wait until after the validUntil time, and then run the preceding command again. The output is the following:
Error from server (NotFound): rolebindings.rbac.authorization.k8s.io "test1" not found
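In addition to checking that the RoleBinding object exists, you can check the grantee's effective access with kubectl auth can-i and user impersonation. The following sketch assumes that your own account is allowed to impersonate users in the cluster. The function name grantee_can is a hypothetical wrapper for illustration.

```shell
# Hypothetical wrapper around "kubectl auth can-i" with impersonation.
# Usage: grantee_can USER VERB RESOURCE NAMESPACE
grantee_can() {
  kubectl auth can-i "$2" "$3" --as "$1" --namespace "$4"
}

# Example (requires access to the cluster and impersonation rights):
#   grantee_can GRANTEE_NAME get pods default
```

Because the example binding references the view ClusterRole, the commented call should print yes between the validFrom and validUntil times and no afterward, provided that the grantee has no other bindings that allow the same action.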
For troubleshooting information, see the Troubleshooting page of the Metacontroller documentation.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete individual resources
- Delete the cluster that you used for testing.
- Delete the local clone of your fork.
- Delete the local private SSH key, if applicable.
- Delete your fork of the example repository. This action deletes any deploy keys associated with the repository.
What's next
- Learn about validating configs.