Stay organized with collections Save and categorize content based on your preferences.

Connect to Blob Storage

As a BigQuery administrator, you can create a connection to let data analysts access data stored in Azure Blob Storage.

BigQuery Omni accesses Blob Storage data through connections. There are two methods for securely accessing data from Blob Storage. You can either use identity federation by granting a Google Cloud service account access to your Azure application, or you can directly grant access to your Azure Active Directory (AD) application in your tenant:

  • Use an Azure federated identity. BigQuery Omni supports Azure workload identity federation. With BigQuery Omni supporting Azure workload identity federation, you can grant a Google service account access to the Azure application that's within your tenant. Federated identity access is more secure than a non-federated access because the application client exists within your Azure tenant. The application client's secrets are not managed by you or Google.

    With identity federation, you have more control over the application because your application to which you have granted access to your data exists within your tenant.

  • Use a non-federated identity. Each connection has its own unique Azure Active Directory (Azure AD) application. Permissions are granted to the applications through Azure Identity and Access Management (IAM) roles. The granted Azure IAM roles determine what data BigQuery can access for each connection.

After you create a BigQuery Azure connection, you can either query the Blob Storage data or export query results to Blob Storage.

Before you begin

Quotas

Your project can create up to 50 Azure connections for each project. The quota includes both federated and non-federated Azure connections. For more information about quotas, see BigQuery Connection API.

Use an Azure federated identity

To access data using a federated identity, follow these steps:

  1. Create an application in your Azure tenant.
  2. Create the BigQuery Azure connection.
  3. Add a federated credential.
  4. Assign a role to BigQuery Azure AD applications.

For more information about using federated identity credentials to access data in Azure, see Workload identity federation.

Create an application in your Azure tenant

To create an application in your Azure tenant, follow these steps:

  1. In the Azure portal, go to App registrations, and then click New registration.

  2. For Names, enter a name for your application.

  3. For Supported account types, select Accounts in this organizational directory only.

  4. To register the new application, click Register.

  5. Make a note of the Application (client) ID. You need to provide this ID when you create the connection.

    Azure portal for creating applications

For more information, see how to register an application in Azure.

Create a connection

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Add data menu, select External data source.

  3. In the External data source pane, enter the following information:

    • For Connection type, select BigLake on Azure (via BigQuery Omni).
    • For Connection ID, enter an identifier for the connection resource. You can use letters, numbers, dashes, and underscores.
    • For Connection location, select azure-eastus2.
    • Optional: For Friendly name, enter a user-friendly name for the connection, such as My connection resource. The friendly name can be any value that helps you identify the connection resource if you need to modify it later.
    • Optional: For Description, enter a description for the connection resource.
    • For Azure tenant id, enter the Azure tenant ID, which is also referred to as the Directory (tenant) ID.
    • Enable the Use federated identity checkbox and then enter the Azure federated application (client) ID.

      To learn how to get Azure IDs, see Create an application in your Azure tenant.

  4. Click Create connection.

  5. Click Go to connection.

  6. In the Connection info section, note the value of Identity, which is the service account ID. This ID is for the Google Cloud service account that you authorize to access your application.

bq

Use the bq mk command. To get the output in JSON format, use the --format=json parameter.

bq mk --connection --connection_type='Azure' \
  --tenant_id=TENANT_ID \
  --location=AZURE_LOCATION \
  --federated_azure=true \
  --federated_app_client_id=APP_ID \
  CONNECTION_NAME

Replace the following:

  • TENANT_ID: the tenant ID of the Azure directory that contains the Azure Storage account.
  • AZURE_LOCATION: the Azure region where your Azure Storage data is located. BigQuery Omni supports the azure-eastus2 region.
  • APP_ID: the Azure Application (client) ID. To learn how to get this ID, see Create application in Azure tenant.
  • CONNECTION_NAME: the name of the connection.

The output is similar to the following:

Connection CONNECTION_NAME successfully created
Please add the following identity to your Azure application APP_ID
Identity: SUBJECT_ID

This output includes the following values:

  • APP_ID: the ID of the application that you created.

  • SUBJECT_ID: the ID of the Google Cloud service account that the user authorizes to access their application. This value is required when you create a federated credential in Azure.

Note the APP_ID and the SUBJECT_ID values for use in the next steps.

Next, add a federated credential for your application.

Add a federated credential

To create a federated credential, follow these steps:

  1. In the Azure portal, go to App registrations, and then click your application.

  2. Select Certificates & secrets > Federated credentials > Add credentials. Then, do the following:

    1. From the Federated credential scenario list, select Other issuer.

    2. For Issuer, enter https://accounts.google.com.

    3. For Subject identifier, enter the ID of the Google Cloud service account that you got when you created the connection.

    4. For Name, enter a name for the credential.

    5. Click Add.

For more information, see Configure an app to trust an external identity provider.

Assign a role to BigQuery's Azure applications

To assign a role to BigQuery's Azure application, use the Azure Portal, the Azure PowerShell, or the Microsoft Management REST API:

Azure Portal

You can perform role assignments in the Azure Portal by logging in as a user with the Microsoft.Authorization/roleAssignments/write permission. The role assignment lets the BigQuery Azure connection access the Azure Storage data as specified in the roles policy.

To add role assignments using the Azure Portal, follow these steps:

  1. From your Azure Storage account, enter IAM in the search bar.

  2. Click Access Control (IAM).

  3. Click Add and select Add role assignments.

  4. To provide read-only access, select the Storage Blob Data Reader role. To provide read-write access, select the Storage Blob Data Contributor role.

  5. Set Assign access to to User, group, or service principal.

  6. Click Select members.

  7. In the Select field, enter the Azure application name that you gave when you created the application in the Azure tenant.

  8. Click Save.

For more information, see Assign Azure roles using the Azure portal.

Azure PowerShell

To add a role assignment for a service principal at a resource scope, you can use the New-AzRoleAssignment command:

  New-AzRoleAssignment`
   -SignInName APP_NAME`
   -RoleDefinitionName ROLE_NAME`
   -ResourceName RESOURCE_NAME`
   -ResourceType RESOURCE_TYPE`
   -ParentResource PARENT_RESOURCE`
   -ResourceGroupName RESOURCE_GROUP_NAME

Replace the following:

  • APP_NAME: the application name.
  • ROLE_NAME: the role name you want to assign.
  • RESOURCE_NAME: the resource name.
  • RESOURCE_TYPE: the resource type.
  • PARENT_RESOURCE: the parent resource.
  • RESOURCE_GROUP_NAME: the resource group name.

For more information about using Azure PowerShell to add a new service principal, see the Assign Azure roles using Azure PowerShell.

Azure CLI

To add a role assignment for a service principal at a resource scope, you can use the Azure command-line tool. You must have the Microsoft.Authorization/roleAssignments/write permission for the storage account to grant roles.

To assign a role, such as the Storage Blob Data Contributor role, to the service principal, run the az role assignment create command:

  az role assignment create --role "Storage Blob Data Contributor" \
    --assignee-object-id ${SP_ID} \
    --assignee-principal-type ServicePrincipal \
    --scope   subscriptions/SUBSCRIPTION_ID/resourcegroups/RESOURCE_GROUP_NAME/providers/Microsoft.Storage/storageAccounts/STORAGE_ACCOUNT_NAME

Replace the following:

  • SP_ID: the service principal ID. This service principal is for the application that you created. To get the service principal for a federated connection, see Service principal object.
  • STORAGE_ACCOUNT_NAME: the storage account name.
  • RESOURCE_GROUP_NAME: the resource group name.
  • SUBSCRIPTION_ID: the subscription ID.

For more information, see Assign Azure roles using Azure CLI.

Microsoft REST API

To add role assignments for a service principal, you can send an HTTP request to Microsoft Management.

To call the Microsoft Graph REST API, retrieve an OAuth token for an application. For more information, see Get access without a user. The application that called the Microsoft Graph REST API must have the Application.ReadWrite.All application permission.

To generate an OAuth token, run the following command:

  export TOKEN=$(curl -X POST \
    https://login.microsoftonline.com/TENANT_ID/oauth2/token \
    -H 'cache-control: no-cache' \
    -H 'content-type: application/x-www-form-urlencoded' \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "resource=https://graph.microsoft.com/" \
    --data-urlencode "client_id=CLIENT_ID" \
    --data-urlencode "client_secret=CLIENT_SECRET" \
  | jq --raw-output '.access_token')

Replace the following:

  • TENANT_ID: the tenant ID matching the ID of the Azure directory that contains the Azure Storage account.
  • CLIENT_ID: the Azure client ID.
  • CLIENT_SECRET: the Azure client secret.

Get the ID of the Azure built-in roles that you want to assign to the service principal.

These are some common roles:

To assign a role to the service principal, call the Microsoft Graph REST API to the Azure Resource Management REST API:

  export ROLE_ASSIGNMENT_ID=$(uuidgen)
  curl -X PUT \
'https://management.azure.com/subscriptions/SUBSCRIPTION_ID/resourcegroups/RESOURCE_GROUP_NAME/providers/Microsoft.Storage/storageAccounts/STORAGE_ACCOUNT_NAME/providers/Microsoft.Authorization/roleAssignments/ROLE_ASSIGNMENT_ID?api-version=2018-01-01-preview' \
    -H "authorization: Bearer ${TOKEN?}" \
    -H 'cache-control: no-cache' \
    -H 'content-type: application/json' \
    -d '{
        "properties": {
            "roleDefinitionId": "subscriptions/SUBSCRIPTION_ID/resourcegroups/RESOURCE_GROUP_NAME/providers/Microsoft.Storage/storageAccounts/STORAGE_ACCOUNT_NAME/providers/Microsoft.Authorization/roleDefinitions/ROLE_ID",
            "principalId": "SP_ID"
        }
    }'

Replace the following:

  • ROLE_ASSIGNMENT_ID: the role ID.
  • SP_ID: the service principal ID. This service principal is for the application that you created. To get the service principal for a federated connection, see Service principal object.
  • SUBSCRIPTION_ID: the subscription ID.
  • RESOURCE_GROUP_NAME: the resource group name.
  • STORAGE_ACCOUNT_NAME: the storage account name.
  • SUBSCRIPTION_ID: the subscription ID.

The connection is now ready to use. However, there might be a propagation delay for a role assignment in Azure. If you are not able to use the connection due to permission issues, then retry after some time.

Use a non-federated identity

To access data using a non-federated identity, follow these steps:

  1. Create the BigQuery Azure connection.
  2. Create an Azure AD service principal.
  3. Assign a role to BigQuery's Azure AD applications.

For more information about using non-federated identity to access data in Azure, see Azure AD application.

Create a connection

To connect to Blob Storage, use the Google Cloud console or the bq command-line tool:

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Add data menu, select External data source.

  3. In the External data source pane, enter the following information:

    • For Connection type, select BigLake on Azure (via BigQuery Omni).
    • For Connection ID, enter an identifier for the connection resource. You can use letters, numbers, dashes, and underscores.
    • For Connection location, select azure-eastus2.
    • Optional: For Friendly name, enter a user-friendly name for the connection, such as My connection resource. The friendly name can be any value that helps you identify the connection resource if you need to modify it later.
    • Optional: For Description, enter a description for the connection resource.
    • For Azure tenant id, enter the Azure tenant ID, which is also referred to as the Directory (tenant) ID.

  4. Click Create connection.

  5. Click Go to connection.

  6. In the Connection info section, note the values of Azure app ID and Azure app name. These values are required when you assign a role to BigQuery's Azure application.

bq

Use the bq mk command. To get the output in JSON format, use the --format=json parameter.

bq mk --connection --connection_type='Azure' \
  --tenant_id=TENANT_ID \
  --location=AZURE_LOCATION \
  CONNECTION_NAME

Replace the following:

  • TENANT_ID: the tenant ID of the Azure directory that contains the Azure Storage account.
  • AZURE_LOCATION: the Azure region where your Azure Storage data is located. BigQuery Omni supports the azure-eastus2 region.
  • CONNECTION_NAME: the name of the connection.

The output is similar to the following:

  Please create a Service Principal in your directory for
  appId: APP_ID,
  and perform role assignment to
  app: APP_NAME
  to allow BigQuery to access your Azure data.

This output includes the following values:

  • APP_ID: the ID of the application that you created.
  • APP_NAME: the app name that you need to assign roles to, so BigQuery can access your Azure data.

Note the APP_ID and the APP_NAME values for use in the next steps.

For more information, see Create a connection.

Create an Azure AD service principal

To create an Azure AD service principal, use the Google Cloud console, Azure PowerShell, the Azure command-line tool, or the Microsoft Graph REST API:

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer panel, click the connection that you created.

  3. On the Connection info pane, click Login to Azure account.

  4. Log in to your Azure account.

  5. On the Permissions Requested page, click Accept.

Azure PowerShell

To create the service principal for the app ID APP_ID that was returned previously, the user must have the Microsoft.directory/servicePrincipals/create permission.

To create the service principal, run the New-AzADServicePrincipal command:

New-AzADServicePrincipal`
-ApplicationId APP_ID`
-SkipAssignment

Replace APP_ID with the app ID that was returned previously.

Azure CLI

To create the service principal for the app ID APP_ID that was returned previously, you can use the Azure command-line tool. You must have the Microsoft.directory/servicePrincipals/create permission to create a service principal.

To create the service principal, run the az ad sp command:

export SP_ID=$(az ad sp create --id APP_ID | jq --raw-output '.objectId')

Replace APP_ID with the app ID that was returned previously.

Microsoft REST API

To create the service principal for the app ID APP_ID that was returned previously, you can send an HTTP request to the Microsoft Graph REST API.

To call the Microsoft Graph REST API, retrieve an OAuth token for an application. For more information, see Get access without a user. The application used to call the Microsoft Graph REST API should have the Application.ReadWrite.All application permission.

The TENANT_ID should match the ID of the Azure directory that contains the Azure Storage account.

To generate an OAuth token, run the following command:

export TOKEN=$(curl -X POST \
https://login.microsoftonline.com/TENANT_ID/oauth2/token \
-H 'cache-control: no-cache' \
-H 'content-type: application/x-www-form-urlencoded' \
--data-urlencode "grant_type=client_credentials" \
--data-urlencode "resource=https://graph.microsoft.com/" \
--data-urlencode "client_id=CLIENT_ID" \
--data-urlencode "client_secret=CLIENT_SECRET" \
| jq --raw-output '.access_token')

Replace the following:

  • TENANT_ID: the tenant ID of the Azure directory that contains the Azure Storage account.
  • CLIENT_ID: the Azure client ID.
  • CLIENT_SECRET: the Azure client secret.

Run the following command:

export APP_ID=APP_ID

Replace APP_ID with the app ID that was returned.

To create a service principal by calling the Microsoft Graph REST API, run the following command:

export SP_ID=$(curl -X POST \
https://graph.microsoft.com/v1.0/serviceprincipals \
-H "authorization: Bearer ${TOKEN?}" \
-H 'cache-control: no-cache' \
-H 'content-type: application/json' \
-d "{
  \"appId\": \"${APP_ID?}\"
}" | jq --raw-output '.id')

Replace the following:

  • TOKEN: the OAuth token for the application.
  • APP_ID: the app ID that was returned previously.

Assign a role to BigQuery's Azure AD applications

Azure Portal

You can perform role assignments in the Azure Portal by logging in as a user with the Microsoft.Authorization/roleAssignments/write permission. The role assignment lets the BigQuery Azure connection access the Azure Storage data as specified in the roles policy.

To add role assignments using the Azure Portal, follow these steps:

  1. From your Azure Storage account, enter IAM in the search bar.

  2. Click Access Control (IAM).

  3. Click Add and select Add role assignments.

  4. To provide read-only access, select the Storage Blob Data Reader role. To provide read-write access, select the Storage Blob Data Contributor role.

  5. Set Assign access to to User, group, or service principal.

  6. Click Select members.

  7. In the Select field, enter the Azure app name that was returned when you created the BigQuery Azure connection.

  8. Click Save.

For more information, see Assign Azure roles using the Azure portal.

Azure PowerShell

To add a role assignment for a service principal at a resource scope, you can use the New-AzRoleAssignment command:

  New-AzRoleAssignment`
   -SignInName APP_NAME`
   -RoleDefinitionName ROLE_NAME`
   -ResourceName RESOURCE_NAME`
   -ResourceType RESOURCE_TYPE`
   -ParentResource PARENT_RESOURCE`
   -ResourceGroupName RESOURCE_GROUP_NAME

Replace the following:

  • APP_NAME: the application name.
  • ROLE_NAME: the role name you want to assign.
  • RESOURCE_NAME: the resource name.
  • RESOURCE_TYPE: the resource type.
  • PARENT_RESOURCE: the parent resource.
  • RESOURCE_GROUP_NAME: the resource group name.

For more information about using Azure PowerShell to add a new service principal, see the Assign Azure roles using Azure PowerShell.

Azure CLI

To add a role assignment for a service principal at a resource scope, you can use the Azure command-line tool. You must have the Microsoft.Authorization/roleAssignments/write permission for the storage account to grant roles.

To assign a role, such as the Storage Blob Data Contributor role, to the service principal, run the az role assignment create command:

  az role assignment create --role "Storage Blob Data Contributor" \
    --assignee-object-id ${SP_ID} \
    --assignee-principal-type ServicePrincipal \
    --scope   subscriptions/SUBSCRIPTION_ID/resourcegroups/RESOURCE_GROUP_NAME/providers/Microsoft.Storage/storageAccounts/STORAGE_ACCOUNT_NAME

Replace the following:

  • SP_ID: the service principal ID.
  • STORAGE_ACCOUNT_NAME: the storage account name.
  • RESOURCE_GROUP_NAME: the resource group name.
  • SUBSCRIPTION_ID: the subscription ID.

For more information, see Assign Azure roles using Azure CLI.

Microsoft REST API

To add role assignments for a service principal, you can send an HTTP request to Microsoft Management.

To call the Microsoft Graph REST API, retrieve an OAuth token for an application. For more information, see Get access without a user. The application that called the Microsoft Graph REST API must have the Application.ReadWrite.All application permission.

To generate an OAuth token, run the following command:

  export TOKEN=$(curl -X POST \
    https://login.microsoftonline.com/TENANT_ID/oauth2/token \
    -H 'cache-control: no-cache' \
    -H 'content-type: application/x-www-form-urlencoded' \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "resource=https://graph.microsoft.com/" \
    --data-urlencode "client_id=CLIENT_ID" \
    --data-urlencode "client_secret=CLIENT_SECRET" \
  | jq --raw-output '.access_token')

Replace the following:

  • TENANT_ID: the tenant ID matching the ID of the Azure directory that contains the Azure Storage account.
  • CLIENT_ID: the Azure client ID.
  • CLIENT_SECRET: the Azure client secret.

Get the ID of the Azure built-in roles that you want to assign to the service principal.

These are some common roles:

To assign a role to the service principal, call the Microsoft Graph REST API to the Azure Resource Management REST API:

  export ROLE_ASSIGNMENT_ID=$(uuidgen)
  curl -X PUT \
'https://management.azure.com/subscriptions/SUBSCRIPTION_ID/resourcegroups/RESOURCE_GROUP_NAME/providers/Microsoft.Storage/storageAccounts/STORAGE_ACCOUNT_NAME/providers/Microsoft.Authorization/roleAssignments/ROLE_ASSIGNMENT_ID?api-version=2018-01-01-preview' \
    -H "authorization: Bearer ${TOKEN?}" \
    -H 'cache-control: no-cache' \
    -H 'content-type: application/json' \
    -d '{
        "properties": {
            "roleDefinitionId": "subscriptions/SUBSCRIPTION_ID/resourcegroups/RESOURCE_GROUP_NAME/providers/Microsoft.Storage/storageAccounts/STORAGE_ACCOUNT_NAME/providers/Microsoft.Authorization/roleDefinitions/ROLE_ID",
            "principalId": "SP_ID"
        }
    }'

Replace the following:

  • ROLE_ASSIGNMENT_ID: the role ID.
  • SP_ID: the service principal ID.
  • SUBSCRIPTION_ID: the subscription ID.
  • RESOURCE_GROUP_NAME: the resource group name.
  • STORAGE_ACCOUNT_NAME: the storage account name.
  • SUBSCRIPTION_ID: the subscription ID.

The connection is now ready to use. However, there might be a propagation delay for a role assignment in Azure. If you are not able to use the connection due to permission issues, then retry after some time.

Share connections with users

You can grant the following roles to let users query data and manage connections:

  • roles/bigquery.connectionUser: enables users to use connections to connect with external data sources and run queries on them.

  • roles/bigquery.connectionAdmin: enables users to manage connections.

For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Select one of the following options:

Console

  1. Go to the BigQuery page.

    Go to BigQuery

    Connections are listed in your project, in a group called External connections.

  2. In the Explorer pane, click your project name > External connections > connection.

  3. In the Details pane, click Share to share a connection. Then do the following:

    1. In the Connection permissions dialog, share the connection with other principals by adding or editing principals.

    2. Click Save.

bq

You cannot share a connection with the bq command-line tool. To share a connection, use the Google Cloud console or the BigQuery Connections API method to share a connection.

API

Use the projects.locations.connections.setIAM method in the BigQuery Connections REST API reference section, and supply an instance of the policy resource.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.api.resourcenames.ResourceName;
import com.google.cloud.bigquery.connection.v1.ConnectionName;
import com.google.cloud.bigqueryconnection.v1.ConnectionServiceClient;
import com.google.iam.v1.Binding;
import com.google.iam.v1.Policy;
import com.google.iam.v1.SetIamPolicyRequest;
import java.io.IOException;

// Sample to share connections
public class ShareConnection {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String location = "MY_LOCATION";
    String connectionId = "MY_CONNECTION_ID";
    shareConnection(projectId, location, connectionId);
  }

  public static void shareConnection(String projectId, String location, String connectionId)
      throws IOException {
    try (ConnectionServiceClient client = ConnectionServiceClient.create()) {
      ResourceName resource = ConnectionName.of(projectId, location, connectionId);
      Binding binding =
          Binding.newBuilder()
              .addMembers("group:example-analyst-group@google.com")
              .setRole("roles/bigquery.connectionUser")
              .build();
      Policy policy = Policy.newBuilder().addBindings(binding).build();
      SetIamPolicyRequest request =
          SetIamPolicyRequest.newBuilder()
              .setResource(resource.toString())
              .setPolicy(policy)
              .build();
      client.setIamPolicy(request);
      System.out.println("Connection shared successfully");
    }
  }
}

What's next