Authenticate to Dataproc


This document describes how to authenticate to Dataproc programmatically.

For more information about Google Cloud authentication, see the authentication overview.

API access

Dataproc supports programmatic access. How you authenticate to Dataproc depends on how you access the API. You can access the API in the following ways:

Client libraries

The Dataproc client libraries provide high-level language support for authenticating to Dataproc programmatically. Client libraries support Application Default Credentials (ADC); the libraries look for credentials in a set of defined locations and use those credentials to authenticate requests to the API. With ADC, you can make credentials available to your application in a variety of environments, such as local development or production, without needing to modify your application code.

Google Cloud CLI

When you use the gcloud CLI to access Dataproc, you log in to the gcloud CLI with a Google Account, which provides the credentials used by the gcloud CLI commands.

If your organization's security policies prevent user accounts from having the required permissions, you can impersonate a service account instead. You can do this either by setting the auth/impersonate_service_account property, which applies to every subsequent gcloud CLI command, or by using the --impersonate-service-account flag, which affects only the command for which you use it.
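As a rough sketch of the two impersonation options, the helpers below wrap each one. The function names are hypothetical and exist only for illustration; the service account email and region are values you supply.

```shell
# Hypothetical helper: list Dataproc clusters while impersonating a service
# account for this one command only.
#   $1: service account email   $2: Dataproc region
list_clusters_as() {
  gcloud dataproc clusters list \
    --region="$2" \
    --impersonate-service-account="$1"
}

# Hypothetical helper: make impersonation the default for later gcloud commands
# by setting the auth/impersonate_service_account property.
#   $1: service account email
set_default_impersonation() {
  gcloud config set auth/impersonate_service_account "$1"
}
```

For example, `list_clusters_as SERVICE_ACCOUNT_EMAIL REGION` leaves your gcloud configuration unchanged, while `set_default_impersonation SERVICE_ACCOUNT_EMAIL` persists until you unset the property.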

For more information about using the gcloud CLI with Dataproc, see the gcloud CLI reference pages.


REST

When you call the Dataproc REST API from the command line, you can authenticate by using your gcloud CLI credentials or Application Default Credentials. For more information, see Authenticate using REST.

If you want to use the API without using a client library, you can use Google's authentication library for your programming language. Alternatively, you can implement authentication in your code.
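The following is a minimal sketch of a REST call that authenticates with an ADC access token. It assumes that ADC is already set up locally and that you want to list clusters in one region. The function name is hypothetical, and the x-goog-user-project header is included on the assumption that user credentials need a quota project for this call.

```shell
# Hypothetical helper: list Dataproc clusters over REST using an ADC token.
#   $1: project ID   $2: Dataproc region
list_clusters_rest() {
  curl -s -X GET \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "x-goog-user-project: $1" \
    "https://dataproc.googleapis.com/v1/projects/$1/regions/$2/clusters"
}
```

You would call it as `list_clusters_rest PROJECT_ID REGION`. The response is the JSON body returned by the API.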

Set up authentication for Dataproc

How you set up authentication depends on the environment where your code is running.

The following options for setting up authentication are the most commonly used. For more options and information about authentication, see Authentication at Google.

Before you can complete these instructions, you must complete the basic setup for Dataproc, including enabling the API and installing the gcloud CLI.

For a local development environment

If you plan to use client libraries, run code snippets, or use third-party tools such as Terraform in a local development environment, you must set up Application Default Credentials (ADC) in that environment. For REST requests from the command line, you use your gcloud credentials.

For information about the difference between your local ADC credentials and your gcloud credentials, see ADC credentials and gcloud credentials.

Client libraries or third-party tools

Set up Application Default Credentials (ADC) in your local environment:

  1. Create authentication credentials for your Google Account:

    gcloud auth application-default login

A login screen is displayed. After you log in, your credentials are stored in the local credential file used by ADC.
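One way to confirm that the login worked is to ask ADC for an access token. The helper below is a sketch with a hypothetical name; it only reports whether a token could be obtained and does not print the token.

```shell
# Hypothetical helper: report whether ADC can produce an access token.
check_adc() {
  if gcloud auth application-default print-access-token >/dev/null 2>&1; then
    echo "ADC is configured"
  else
    echo "ADC is not configured; run: gcloud auth application-default login"
    return 1
  fi
}
```

Running `check_adc` after the login step should print the first message. A nonzero exit status suggests that the local credential file is missing or no longer valid.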

For more information about working with ADC in a local environment, see Local development environment.

REST requests from the command line

When you make a REST request from the command line, you can use your gcloud credentials by including gcloud auth print-access-token as part of the command that sends the request. The following example lists service accounts for the specified project. You can use the same pattern for any REST request.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your Google Cloud project ID.
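A sketch of such a request, assuming curl is installed, might look like the following. The function name is hypothetical; the request uses the IAM API endpoint that lists service accounts and passes the gcloud access token as a bearer token.

```shell
# Hypothetical helper: list the service accounts in a project over REST,
# authenticating with your gcloud credentials.
#   $1: project ID
list_service_accounts() {
  curl -s -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://iam.googleapis.com/v1/projects/$1/serviceAccounts"
}
```

Call it as `list_service_accounts PROJECT_ID`. The same Authorization header works for other REST requests, including requests to the Dataproc API.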



On Google Cloud

To authenticate a workload running on Google Cloud, you use the credentials of the service account attached to the compute resource where your code is running. For example, you can attach a service account to a Compute Engine virtual machine (VM) instance, a Cloud Run service, or a Dataflow job. This approach is the preferred authentication method for code running on a Google Cloud compute resource.

For most services, you must attach the service account when you create the resource that will run your code; you cannot add or replace the service account later. Compute Engine is an exception—it lets you attach a service account to a VM instance at any time.

  1. Set up authentication:

    1. Create the service account:

      gcloud iam service-accounts create SERVICE_ACCOUNT_NAME

      Replace SERVICE_ACCOUNT_NAME with a name for the service account.

    2. To provide access to your project and your resources, grant a role to the service account:

      gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com" --role=ROLE

      Replace the following:

      • SERVICE_ACCOUNT_NAME: the name of the service account
      • PROJECT_ID: the project ID where you created the service account
      • ROLE: the role to grant
    3. To grant another role to the service account, run the command as you did in the previous step.
    4. Grant your Google Account a role that lets you use the service account's roles and attach the service account to other resources:

      gcloud iam service-accounts add-iam-policy-binding SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com --member="user:USER_EMAIL" --role=roles/iam.serviceAccountUser

      Replace the following:

      • SERVICE_ACCOUNT_NAME: the name of the service account
      • PROJECT_ID: the project ID where you created the service account
      • USER_EMAIL: the email address for your Google Account
  2. Create the resource that will run your code, and attach the service account to that resource. For example, if you use Compute Engine:

    Create a Compute Engine instance. Configure the instance as follows:
    • Replace INSTANCE_NAME with your preferred instance name.
    • Set the --zone flag to the zone in which you want to create your instance.
    • Set the --service-account flag to the email address for the service account that you created.

      gcloud compute instances create INSTANCE_NAME --zone=ZONE --service-account=SERVICE_ACCOUNT_EMAIL
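The sketch below covers two details that the steps above leave implicit: how the service account email is formed from its name and project, and how code on the VM can obtain a token for the attached service account from the Compute Engine metadata server. Both function names are hypothetical. Client libraries that use ADC perform the metadata lookup for you, so the second helper is mainly useful for checking the setup by hand.

```shell
# Hypothetical helper: build the email address of a service account.
#   $1: service account name   $2: project ID
service_account_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Hypothetical helper, intended to run on the VM itself: request an access
# token for the attached service account from the metadata server.
# The response is JSON that contains an access_token field.
fetch_metadata_token() {
  curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"
}
```

For example, `service_account_email SERVICE_ACCOUNT_NAME PROJECT_ID` produces the value to pass to the --service-account flag.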

On-premises or on a different cloud provider

The preferred method to set up authentication from outside of Google Cloud is to use workload identity federation. For more information, see On-premises or another cloud provider in the authentication documentation.
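As a rough illustration only, the helper below generates a credential configuration file for a workload running on AWS. It assumes that a workload identity pool and provider already exist and that the workload impersonates a service account. The function name and all argument values are hypothetical, and other providers use different flags in place of --aws.

```shell
# Hypothetical helper: write a credential configuration file for workload
# identity federation from AWS (an assumed provider type).
#   $1: project number   $2: pool ID   $3: provider ID
#   $4: service account email   $5: output file path
create_federation_config() {
  gcloud iam workload-identity-pools create-cred-config \
    "projects/$1/locations/global/workloadIdentityPools/$2/providers/$3" \
    --service-account="$4" \
    --aws \
    --output-file="$5"
}
```

After the file is created, pointing the GOOGLE_APPLICATION_CREDENTIALS environment variable at it lets ADC pick it up.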

What's next