Use cases for troubleshooting access problems on Google Cloud

Last reviewed 2022-09-29 UTC

This document describes how to use Google Cloud tools to troubleshoot problems accessing Google Cloud resources. It doesn't describe how to troubleshoot end-user access to your applications. It assumes that you're familiar with Troubleshooting policy and access problems on Google Cloud, which describes the Google Cloud services that can enforce access policies and the troubleshooting tools that Google Cloud provides.

Troubleshooting approach

The first step in troubleshooting an access-related problem is deciding how to troubleshoot the issue. The following diagram provides a flowchart of one approach to troubleshooting access problems. The flowchart assumes that you have the appropriate permissions to complete the troubleshooting steps or that you can work with someone who has the required permissions.

A flowchart of one approach to troubleshooting access problems.

The preceding diagram outlines the following steps:

  1. Verify user access in the Google Cloud console and in Cloud Shell. If all access is denied, then check audit logs for error severity entries.
    1. If there are error severity entries, check required permissions.
      1. If you can grant permissions to resolve access issues, resolve the issue.
      2. If you can't resolve access issues, then contact Cloud Customer Care.
    2. If there aren't error severity entries, contact Customer Care.
  2. If there aren't access issues, then check for network problems. If you find network problems, resolve the issue.
  3. If there aren't network issues, then check quota allocation. If you find quota allocation problems, follow your process to increase quota and then resolve the issue.
  4. If there aren't quota allocation issues, then check audit logs for error severity entries.
    1. If there are error severity entries, check required permissions.
      1. If you can grant permissions to resolve access issues, resolve the issue.
      2. If you can't resolve access issues, then contact Customer Care.
    2. If there aren't error severity entries, contact Customer Care.

The following sections provide details about how to complete each troubleshooting step.

Verify user access

Check whether user access is denied in both the Google Cloud console and the Google Cloud CLI:

  1. Log in to the Google Cloud console as the affected user.
  2. Try to access the resource; for example, if the user reported that they can't start a VM, try starting a VM.
  3. In the Google Cloud console, open Cloud Shell, and run the following gcloud CLI command from a session that the user is logged in to. This command helps to verify whether the user is logged in to the correct identity and whether they can access the resource using the gcloud CLI.

    gcloud auth list
    

    The output returns the account that the user is logged in to.

  4. Check whether the preceding command returns the correct identity.

    • If the preceding command returns the wrong identity, ask the user to log in to the correct identity. Then determine whether access is still a problem when they are using the correct identity.
    • If the preceding command returns the correct identity and you get a permission denied message, run the gcloud CLI command for the action that the user wants to complete. To get more detail about the denial, add the flags --log-http and --verbosity=debug.
  5. If you identify a permissions-related issue, skip to Check required permissions.
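
As a minimal sketch of the last two steps, the following helper re-runs a failing action with the two diagnostic flags. It assumes that the reported action is starting a VM; the function name debug_start_vm and its arguments are hypothetical, and you would substitute the gcloud CLI command for whatever action the user reported.

```shell
# Hypothetical helper: re-run the failing action with extra diagnostics.
# $1: instance name (placeholder), $2: zone (placeholder)
debug_start_vm() {
  # --log-http prints the HTTP requests and responses that gcloud sends.
  # --verbosity=debug prints detailed gcloud diagnostics, including errors.
  gcloud compute instances start "$1" \
    --zone="$2" \
    --log-http \
    --verbosity=debug
}
```

For example, calling debug_start_vm my-vm us-central1-a from the user's Cloud Shell session should surface the permission that the API reports as missing, if the denial is IAM-related.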

Check for network problems

  1. Check for network problems by using the VPC Service Controls troubleshooting guidance. If you see a VPC Service Controls denial error message, resolve the issue.
  2. Check the network paths from source to destination by using Connectivity Tests. For information about how to test connectivity between two VM instances in the same or peered networks, see Testing within VPC networks.
  3. Check the firewall configuration by using Firewall Insights to show any shadowed firewall rules and any deny rules that might be affecting access paths.
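
Connectivity Tests can also be run from the gcloud CLI. The following is a sketch only: the function name, the test name, and the instance URIs are hypothetical placeholders, and it assumes a TCP path between two VM instances.

```shell
# Hypothetical helper: create a connectivity test and then show its result.
# $1: test name, $2: project ID,
# $3: source instance URI (projects/P/zones/Z/instances/NAME),
# $4: destination instance URI, $5: destination TCP port
run_connectivity_test() {
  gcloud network-management connectivity-tests create "$1" \
    --project="$2" \
    --source-instance="$3" \
    --destination-instance="$4" \
    --protocol=TCP \
    --destination-port="$5"
  # The description of the test includes the reachability analysis.
  gcloud network-management connectivity-tests describe "$1" \
    --project="$2"
}
```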

Check quota allocation

  • If you don't find any network-related issues, then check your quota allocation. If there appears to be a quota-related problem, then follow your defined process to increase quota if appropriate.
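
For Compute Engine, one way to review quota limits and usage is from the gcloud CLI. The following sketch assumes that the affected resource is a Compute Engine resource; the function name is hypothetical, and other services report quota elsewhere (for example, on the Quotas page in the Google Cloud console).

```shell
# Hypothetical helper: list Compute Engine quota limits and current usage.
# $1: project ID, $2: region
show_compute_quotas() {
  # Project-wide quotas.
  gcloud compute project-info describe --project="$1" \
    --flatten="quotas[]" \
    --format="table(quotas.metric,quotas.limit,quotas.usage)"
  # Quotas for a single region.
  gcloud compute regions describe "$2" --project="$1" \
    --flatten="quotas[]" \
    --format="table(quotas.metric,quotas.limit,quotas.usage)"
}
```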

Check audit logs

  • Check the audit log files by using the Logs Explorer. Logs Explorer provides a summary of the severity of a log entry. An error log severity is recorded when an API call fails; for example, an error is recorded if a user tries to create a Cloud Storage bucket but doesn't have the permissions to call storage.buckets.create.

    The summary of a log entry provides the following details:

    • Target resource name
    • Principal (who is trying to access the resource)
    • API call that the principal tried to execute
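
The same entries can be read from the command line. The following sketch builds a Logging filter for error-severity audit log entries from one principal and passes it to gcloud logging read. The function names, the 7-day window, and the chosen output columns are assumptions for illustration.

```shell
# Hypothetical helper: build a filter for failed, audited API calls
# that were made by one principal.
# $1: principal email
audit_error_filter() {
  printf 'logName:"cloudaudit.googleapis.com" AND severity>=ERROR AND protoPayload.authenticationInfo.principalEmail="%s"' "$1"
}

# Hypothetical helper: show the time, API call, target resource, and
# status message for the matching entries.
# $1: project ID, $2: principal email
read_denied_calls() {
  gcloud logging read "$(audit_error_filter "$2")" \
    --project="$1" \
    --freshness=7d \
    --limit=20 \
    --format="table(timestamp,protoPayload.methodName,protoPayload.resourceName,protoPayload.status.message)"
}
```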

Check required permissions

To debug why the principal doesn't have the required permissions, you use Policy Troubleshooter:

  1. If the checks indicate that access is not granted, review which roles Policy Troubleshooter indicates contain the permission.
  2. Use the Policy Analyzer to see what other principals have access to the resource that the principal is denied access to.
  3. Add the principal's identity to the Google group that has a binding to the appropriate role.
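
The steps above can be sketched with the gcloud CLI as follows. The function names are hypothetical, and all resource names, email addresses, and the organization ID are placeholders that the caller supplies.

```shell
# Hypothetical helper: ask Policy Troubleshooter whether a principal has
# a permission on a resource.
# $1: full resource name, for example
#     //cloudresourcemanager.googleapis.com/projects/PROJECT_ID
# $2: principal email, $3: permission
troubleshoot_permission() {
  gcloud policy-troubleshoot iam "$1" \
    --principal-email="$2" \
    --permission="$3"
}

# Hypothetical helper: ask Policy Analyzer which principals have a
# permission on a resource.
# $1: organization ID, $2: full resource name, $3: permission
who_has_access() {
  gcloud asset analyze-iam-policy \
    --organization="$1" \
    --full-resource-name="$2" \
    --permissions="$3"
}

# Hypothetical helper: add the principal to the group that is bound to
# the appropriate role.
# $1: group email, $2: member email
add_to_group() {
  gcloud identity groups memberships add \
    --group-email="$1" \
    --member-email="$2"
}
```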

Contact Customer Care

If you have completed the preceding troubleshooting sections and you're unable to resolve the issue, then contact Customer Care for help. Provide as much information as possible, as described in the troubleshooting guide section Escalating to Customer Care.

Example use cases for troubleshooting

This section provides in-depth walkthroughs for how to troubleshoot specific use cases using the preceding troubleshooting steps. For all the use cases, you must have the appropriate permissions to use the troubleshooting tools that are described in Troubleshooting policy and access problems on Google Cloud.

The following use cases assume that you are using Google Groups to manage user access. Using Google Groups to grant permissions lets you manage access at scale. Each member of a Google Group inherits the Identity and Access Management (IAM) roles that are granted to that group. This inheritance means that you can use a group's membership to manage users' roles instead of granting IAM roles to individual users.

Role delegator troubleshoots developer access to a Compute Admin role

As a role delegator, I need to understand why I cannot grant a certain role to a developer. I regularly grant Compute Admin roles to new developers when they join my team. Today, I tried to grant the Compute Instance Admin role and was denied.

Following the flowchart to verify user access and check audit logs, you can confirm that this is a permission problem.

To be able to grant roles, you need the resourcemanager.projects.setIamPolicy permission. This permission can be granted as part of the following roles:

  • Organization Administrator role (roles/resourcemanager.organizationAdmin)
  • Folder IAM Admin role (roles/resourcemanager.folderIamAdmin)
  • Project IAM Admin role (roles/resourcemanager.projectIamAdmin)

To determine whether the role delegator has the resourcemanager.projects.setIamPolicy permission assigned, you use Policy Troubleshooter. If the permission is no longer assigned, check the following:

  1. Check whether an IAM recommendation was applied that might have rescinded the policy.
  2. If you know the last time that you were able to grant roles, check the logs between then and now to see whether any SetIamPolicy calls were made that might have changed the policies applied.
  3. Use the Policy Analyzer to check which principals have the resourcemanager.projects.setIamPolicy permission. The Policy Analyzer can help to verify whether the role delegator was removed from a group.
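
As a sketch of steps 2 and 3, the following helpers list recent policy changes from the audit logs and then list the principals that hold the permission on the project. The function names and the output columns are assumptions for illustration.

```shell
# Hypothetical helper: list audited SetIamPolicy calls in a project.
# $1: project ID, $2: how far back to look, for example 30d
recent_policy_changes() {
  gcloud logging read 'protoPayload.methodName:"SetIamPolicy"' \
    --project="$1" \
    --freshness="$2" \
    --format="table(timestamp,protoPayload.authenticationInfo.principalEmail,protoPayload.resourceName)"
}

# Hypothetical helper: list principals that can set the project's
# IAM policy, according to Policy Analyzer.
# $1: organization ID, $2: project ID
who_can_set_project_policy() {
  gcloud asset analyze-iam-policy \
    --organization="$1" \
    --full-resource-name="//cloudresourcemanager.googleapis.com/projects/$2" \
    --permissions="resourcemanager.projects.setIamPolicy"
}
```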

Cloud administrator troubleshoots developer access to BigQuery

As a cloud administrator, I need to understand why one of the developers can no longer run a query against a BigQuery dataset.

To troubleshoot this use case, first you verify user access and resolve any related issues. Then you check for network problems. This example assumes that you have determined there isn't an identity or network issue, but there is a permissions problem.

To troubleshoot the permissions problem, first you check team member permissions. If you don't find any discrepancies, you check logs to identify potential issues. If you don't find any issues from the logs, you can contact Customer Care for help.

Check team member permissions

To check team member permissions, ask the developer when they were last able to successfully run the query. Then determine whether anyone on the developer's team was previously able to run the query, and if that person can still successfully run the query. If no team members can run the query, proceed to the Check logs section.

If a team member can still run the query, complete the following steps:

  1. Check the IAM permissions that are granted to both developers and determine whether the permissions differ. When you review permissions, look for the following:
  2. If the permissions don't differ, proceed to the next section, Check logs. If the permissions do differ, complete the following steps:
    1. Check whether both team members are in the same Google group.
      • If they aren't in the same Google group, determine whether they should be.
      • If they were previously in the same Google group, check with the group administrator to determine why changes were made.
  3. After you address the permissions issue, check whether the developer is able to run the query.
    • If the developer can run the query, resolve the issue.
    • If the developer can't run the query, proceed to the next section, Check logs.
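
One way to compare the two developers is to list the roles that each one is bound to. The following sketch uses a hypothetical helper and only reads the project's own allow policy: it doesn't show roles that are inherited from a folder or organization, and it shows group-based roles only if you pass the group (for example, group:bq-devs@example.com) as the member.

```shell
# Hypothetical helper: print the sorted roles that are bound directly to
# one member in a project's allow policy.
# $1: project ID, $2: member, for example user:dev@example.com
roles_for_member() {
  gcloud projects get-iam-policy "$1" \
    --flatten="bindings[].members" \
    --filter="bindings.members:$2" \
    --format="value(bindings.role)" | sort
}
```

Running the helper once for each developer and comparing the two outputs (for example, with diff) shows which roles only one of them holds.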

Check logs

If no team members can complete the query, or if addressing permissions issues didn't resolve the problem, you check logs to determine what might have changed since the developer was last able to complete the query.

  1. Determine where to view the logs for the last successfully completed task. In this example, the logs are exported to BigQuery.
  2. Run queries against the exported logs in BigQuery:
    1. Run one query that includes the last successful date that the developer had access so that you can see what success looks like.
    2. Run the same query for a time when the request failed.
  3. If there is something identifiable in the logs, resolve the issue using Policy Troubleshooter and the Policy Analyzer as described in the Check required permissions section.
  4. If you're still unable to resolve the issue, contact Customer Care.
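
The two queries in step 2 can share one template that differs only in the date. The following sketch makes several assumptions that you need to check against your own log sink: that Data Access audit logs are exported to a table named cloudaudit_googleapis_com_data_access, that the table isn't date-sharded, and that the audit payload is in the protopayload_auditlog column. The function names are hypothetical.

```shell
# Hypothetical helper: build a query for one principal's audited calls
# on one day.
# $1: PROJECT_ID.DATASET that holds the exported logs (assumed layout)
# $2: principal email, $3: date as YYYY-MM-DD
audit_log_sql() {
  printf 'SELECT timestamp, protopayload_auditlog.methodName AS method, protopayload_auditlog.status.message AS status_message FROM `%s.cloudaudit_googleapis_com_data_access` WHERE protopayload_auditlog.authenticationInfo.principalEmail = "%s" AND DATE(timestamp) = DATE "%s" ORDER BY timestamp' "$1" "$2" "$3"
}

# Hypothetical helper: run the query with the bq command-line tool.
run_audit_log_query() {
  bq query --use_legacy_sql=false "$(audit_log_sql "$1" "$2" "$3")"
}
```

Running the helper once with the last date that the query succeeded and once with a date that it failed gives two result sets to compare, for example by looking for a status message that appears only on the failing date.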

Developer needs permissions to GKE

As a developer, I need to understand why I cannot start, delete, or update a Pod or create a deployment in the Google Kubernetes Engine (GKE) cluster that I have access to. I'm not sure which principal I am when I make the call with the kubectl command-line tool, or what permissions I have.

The IAM role that lets a developer start, delete, or update a Pod or create a deployment in the GKE cluster is the Google Kubernetes Engine Developer role (roles/container.developer). The role should be granted in the project where the GKE cluster resides.

To troubleshoot this use case, first you verify user access and resolve any related issues. After you validate identity, you ensure that the kubectl tool is configured to point to the right cluster. For information about how to ensure that the identity used by the kubectl tool is correct and that the kubectl tool is pointing to the correct cluster, see Configuring cluster access for kubectl. This example assumes that you have determined that there isn't a network issue or a quota-related issue, but there is a permissions problem.

To begin troubleshooting the permissions problem, you check the audit logs to see what has changed between the last successful action from the developer and the time the issue was first reported.

  1. If the developer had access before, check whether a team member who also has permissions to do the same actions can still complete the actions. If the team member has access, use the Policy Analyzer to help determine what access the team member has. If you're following best practices, both developers should have the same group membership and permissions.

    1. If their permissions are the same and neither developer can carry out the actions against the resource, check whether IAM recommendations were applied that could affect access.
    2. If their permissions are different, investigate why the difference occurred:
      1. Check the audit logs for the last time the developer could successfully carry out the task. Compare the logs to when they most recently tried and couldn't complete the task.
      2. Check IAM recommendations and apply any recommendations.
  2. If there isn't another team member to validate with, use Policy Troubleshooter and the Policy Analyzer as described in Check required permissions. For more information, see the following resources:

  3. If you're still unable to resolve the issue, contact Customer Care.
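
Before comparing permissions, it can help to confirm which identity and cluster the kubectl tool is using and what that identity is allowed to do. The following is a minimal sketch: the function name is hypothetical, and it assumes a regional cluster (for a zonal cluster, use --zone instead of --region).

```shell
# Hypothetical helper: confirm identity, cluster, and allowed actions.
# $1: cluster name, $2: region, $3: project ID, $4: Kubernetes namespace
check_gke_access() {
  # The account that the gcloud CLI, and therefore kubectl, acts as.
  gcloud config get-value account
  # Point kubectl at the intended cluster.
  gcloud container clusters get-credentials "$1" \
    --region="$2" \
    --project="$3"
  kubectl config current-context
  # Ask the cluster whether the current identity can do each action.
  kubectl auth can-i create deployments --namespace="$4"
  kubectl auth can-i delete pods --namespace="$4"
}
```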

Security administrator troubleshoots developer access

As a security administrator, I need to understand why a developer couldn't perform an action. What is the best role and the location to assign to that role so that the role doesn't provide more access than the user needs?

In this scenario, the developer needs to be able to do the following:

  • Upload objects to a Cloud Storage bucket. The developer shouldn't be able to view, delete, or overwrite existing objects in the bucket.
  • Start and stop instances in their development project.

To understand what permissions are required in order to carry out the task that your developer needs to undertake, you use Policy Troubleshooter and the IAM understanding roles reference page. In this example, you need to grant your developer a role that includes the following permissions:

  • To allow the developer to stop and start instances: compute.instances.start and compute.instances.stop
  • To allow the developer to upload objects to Cloud Storage buckets: storage.objects.create

The following roles include the preceding permissions and adhere to the principle of least privilege:

  • At the bucket level for the bucket that the developer is allowed to upload objects to, grant the Storage Object Creator role (roles/storage.objectCreator).
  • At the project level of the developer's assigned project or at the instance that the developer needs to be able to restart, grant the Compute Instance Admin role (roles/compute.instanceAdmin).

Typically, managing instances might also require actions such as adding disks. In that case, the roles/compute.instanceAdmin role might be an appropriate way to grant the required permissions while still adhering to the principle of least privilege.
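
As a sketch of how these grants might be checked and applied, the following helpers list the permissions in a role and then bind the two roles at the bucket and project levels. The function names are hypothetical, and the member is assumed to be a Google group, in keeping with the group-based access management that this document assumes.

```shell
# Hypothetical helper: list the permissions that a predefined role
# contains, to confirm that it isn't broader than needed.
# $1: role, for example roles/storage.objectCreator
role_permissions() {
  gcloud iam roles describe "$1" --format="value(includedPermissions)"
}

# Hypothetical helper: grant the two roles at the narrowest levels
# described above.
# $1: bucket name, $2: project ID,
# $3: member, for example group:devs@example.com
grant_developer_roles() {
  gcloud storage buckets add-iam-policy-binding "gs://$1" \
    --member="$3" \
    --role="roles/storage.objectCreator"
  gcloud projects add-iam-policy-binding "$2" \
    --member="$3" \
    --role="roles/compute.instanceAdmin"
}
```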

Cloud administrator troubleshoots why an application can't write to Cloud Storage

As a cloud administrator, I need to understand why an application running on GKE can no longer write to Cloud Storage.

In this scenario, an application running on GKE needs to be configured as follows:

  • On a specified bucket, the application can add, update, and delete objects.
  • The application can't have access to any other buckets in the organization.

The following troubleshooting approach assumes that you're using Workload Identity, which we recommend. Using Workload Identity, you can configure a Kubernetes service account to act as a Google service account. Pods running as the Kubernetes service account automatically authenticate as the Google service account when they access Google Cloud APIs.

In this example, you validate that you've granted appropriate permissions to the Google service account that you're using for Workload Identity for your cluster. To understand the permissions that are required to complete your application's tasks, you use Policy Troubleshooter and the IAM understanding roles reference page. To configure and verify permissions, do the following:

  1. Assign the following permissions to the Google service account that you're using for Workload Identity:

    1. At the bucket for which the application is allowed to have full control of objects, including listing, creating, viewing, and deleting objects, grant the Storage Object Admin role (roles/storage.objectAdmin).
    2. To configure the Kubernetes service account to impersonate the Google service account, set an IAM policy binding:

      gcloud iam service-accounts add-iam-policy-binding \
        --role roles/iam.workloadIdentityUser \
        --member "serviceAccount:PROJECT_ID.svc.id.goog[KUBERNETES_NAMESPACE/KSA_NAME]" \
        GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
      

      Replace the following values:

      • PROJECT_ID: your project ID
      • KSA_NAME: the Kubernetes service account that is making the request
      • KUBERNETES_NAMESPACE: the Kubernetes namespace where the Kubernetes service account is defined
      • GSA_NAME: the Google service account
  2. Set the iam.serviceAccounts.setIamPolicy permission on the project:

    • Add the following annotation to the Kubernetes service account:

      iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
      
  3. Verify that the Google service account has the right permissions and that Workload Identity is configured correctly:

    1. At the bucket for which the application is allowed to have full control of objects, view the IAM policy for the bucket and verify that the Google service account has the roles/storage.objectAdmin role.
    2. If the permissions aren't correct, amend the policy to grant the Google service account the required permission.
  4. Verify that Workload Identity is configured correctly by checking that there is a binding to the Kubernetes service account:

    gcloud iam service-accounts get-iam-policy \
      GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
    

    The output looks like the following:

    - members:
      - serviceAccount:PROJECT_ID.svc.id.goog[KUBERNETES_NAMESPACE/KSA_NAME]
      role: roles/iam.workloadIdentityUser
    

    If the binding is incorrect, repeat the preceding steps to assign permissions to the service account.
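
The annotation and the bucket-level check in the preceding steps can be sketched as follows. The function names are hypothetical; the annotation value is assumed to be the full email address of the Google service account, and the arguments correspond to the placeholders used above.

```shell
# Hypothetical helper: annotate the Kubernetes service account with the
# Google service account that it acts as.
# $1: KSA_NAME, $2: KUBERNETES_NAMESPACE, $3: GSA_NAME, $4: PROJECT_ID
annotate_ksa() {
  kubectl annotate serviceaccount "$1" \
    --namespace="$2" \
    "iam.gke.io/gcp-service-account=$3@$4.iam.gserviceaccount.com"
}

# Hypothetical helper: list the roles that the Google service account
# holds on the bucket; roles/storage.objectAdmin is the expected result.
# $1: bucket name, $2: Google service account email
bucket_roles_for_gsa() {
  gcloud storage buckets get-iam-policy "gs://$1" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:$2" \
    --format="value(bindings.role)"
}
```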

What's next

  • Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.