Connect to Blob Storage
As a BigQuery administrator, you can create a connection to let data analysts access data stored in Azure Blob Storage.
BigQuery Omni accesses Blob Storage data through connections. BigQuery Omni supports Azure workload identity federation, which lets you grant a Google service account access to an Azure application in your tenant. Neither you nor Google has to manage application client secrets.
After you create a BigQuery Azure connection, you can either query the Blob Storage data or export query results to Blob Storage.
Before you begin
Ensure that you have created the following resources:
A Google Cloud project with BigQuery Connection API enabled.
If you are on the capacity-based pricing model, then ensure that you have enabled BigQuery Reservation API for your project. For information about pricing, see BigQuery Omni pricing.
An Azure tenant with an Azure subscription.
An Azure Storage account that meets the following specifications:
It's a general-purpose V2 account or a Blob Storage account.
It uses a hierarchical namespace. For more information, see Create a storage account to use with Azure Data Lake Storage Gen2.
Data is populated in one of the supported formats.
Data is in the azure-eastus2 region.
Required roles
- To get the permissions that you need to create a connection to access Azure Blob Storage data, ask your administrator to grant you the BigQuery Connection Admin (roles/bigquery.connectionAdmin) IAM role on the project. For more information about granting roles, see Manage access to projects, folders, and organizations. You might also be able to get the required permissions through custom roles or other predefined roles.
- Ensure that you have the following Azure IAM permissions on your tenant:
  - Application.ReadWrite.All
  - AppRoleAssignment.ReadWrite.All
Quotas
For more information about quotas, see BigQuery Connection API.
Create an Azure connection
To create an Azure connection, follow these steps:
- Create an application in your Azure tenant.
- Create the BigQuery Azure connection.
- Add a federated credential.
- Assign a role to BigQuery Azure AD applications.
For more information about using federated identity credentials to access data in Azure, see Workload identity federation.
Create an application in your Azure tenant
To create an application in your Azure tenant, follow these steps:
Azure Portal
In the Azure portal, go to App registrations, and then click New registration.
For Name, enter a name for your application.
For Supported account types, select Accounts in this organizational directory only.
To register the new application, click Register.
Make a note of the Application (client) ID. You need to provide this ID when you create the connection.
Terraform
Add the following to your Terraform configuration file:
    resource "azuread_application" "example" {
      display_name = "bigquery-omni-connector"
    }

    resource "azuread_service_principal" "example" {
      application_id               = azuread_application.example.application_id
      app_role_assignment_required = false
    }
For more information, see how to register an application in Azure.
Create a connection
Console
In the Google Cloud console, go to the BigQuery page.
In the Add data menu, select External data source.
In the External data source pane, enter the following information:
- For Connection type, select BigLake on Azure (via BigQuery Omni).
- For Connection ID, enter an identifier for the connection resource. You can use letters, numbers, dashes, and underscores.
- Select the location where you want to create the connection.
- Optional: For Friendly name, enter a user-friendly name for the connection, such as My connection resource. The friendly name can be any value that helps you identify the connection resource if you need to modify it later.
- Optional: For Description, enter a description for the connection resource.
- For Azure tenant id, enter the Azure tenant ID, which is also referred to as the Directory (tenant) ID.
Enable the Use federated identity checkbox and then enter the Azure federated application (client) ID.
To learn how to get Azure IDs, see Create an application in your Azure tenant.
Click Create connection.
Click Go to connection.
In the Connection info section, note the value of BigQuery Google identity, which is the service account ID. This ID is for the Google Cloud service account that you authorize to access your application.
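The Connection ID rule stated in the steps above (letters, numbers, dashes, and underscores) can be checked in a script before you create a connection. The following is a minimal sketch, not part of any Google tool: the function name is_valid_connection_id is hypothetical, and the check covers only the character rule quoted on this page, so any length or naming limits the service enforces are not modeled.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the ID is non-empty and
# contains only letters, numbers, dashes, and underscores.
is_valid_connection_id() {
  case "$1" in
    "" | *[!A-Za-z0-9_-]*) return 1 ;;  # empty, or contains a disallowed character
    *) return 0 ;;
  esac
}

# Example use (the message text is illustrative only):
#   is_valid_connection_id "omni-azure-connection" || echo "invalid connection ID" >&2
```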
Terraform
    resource "google_bigquery_connection" "connection" {
      connection_id = "omni-azure-connection"
      location      = "azure-eastus2"
      description   = "created by terraform"
      azure {
        customer_tenant_id              = "TENANT_ID"
        federated_application_client_id = azuread_application.example.application_id
      }
    }
Replace TENANT_ID with the tenant ID of the Azure directory that contains the Blob Storage account.
bq
Use the bq mk command. To get the output in JSON format, use the --format=json parameter.

    bq mk --connection --connection_type='Azure' \
      --tenant_id=TENANT_ID \
      --location=AZURE_LOCATION \
      --federated_azure=true \
      --federated_app_client_id=APP_ID \
      CONNECTION_ID
Replace the following:
- TENANT_ID: the tenant ID of the Azure directory that contains the Azure Storage account.
- AZURE_LOCATION: the Azure region where your Azure Storage data is located. BigQuery Omni supports the azure-eastus2 region.
- APP_ID: the Azure Application (client) ID. To learn how to get this ID, see Create an application in your Azure tenant.
- CONNECTION_ID: the name of the connection.
The output is similar to the following:
    Connection CONNECTION_ID successfully created
    Please add the following identity to your Azure application APP_ID
    Identity: SUBJECT_ID
This output includes the following values:
- APP_ID: the ID of the application that you created.
- SUBJECT_ID: the ID of the Google Cloud service account that the user authorizes to access their application. This value is required when you create a federated credential in Azure.

Note the APP_ID and the SUBJECT_ID values for use in the next steps.
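If you script this step, the SUBJECT_ID can be pulled out of the command's text output instead of copied by hand. The following is a minimal sketch that assumes the output contains the label Identity: followed by the value, as in the sample output above; the function name extract_identity is hypothetical, and the real output layout may differ between bq versions.

```shell
#!/bin/sh
# Hypothetical helper: reads bq mk output on stdin and prints whatever follows
# the "Identity:" label. Lines without the label are ignored.
extract_identity() {
  sed -n 's/.*Identity:[[:space:]]*//p'
}

# Example use (placeholders as described above):
#   SUBJECT_ID=$(bq mk --connection ... CONNECTION_ID | extract_identity)
```

Matching on the label rather than on a line number keeps the helper working whether the identity appears on its own line or at the end of a longer line.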
Next, add a federated credential for your application.
Add a federated credential
To create a federated credential, follow these steps:
Azure Portal
In the Azure portal, go to App registrations, and then click your application.
Select Certificates & secrets > Federated credentials > Add credentials. Then, do the following:
From the Federated credential scenario list, select Other issuer.
For Issuer, enter https://accounts.google.com.
For Subject identifier, enter the BigQuery Google identity of the Google Cloud service account that you got when you created the connection.
For Name, enter a name for the credential.
Click Add.
Terraform
Add the following to your Terraform configuration file:
    resource "azuread_application" "example" {
      display_name = "bigquery-omni-connector"
    }

    resource "azuread_service_principal" "example" {
      application_id               = azuread_application.example.application_id
      app_role_assignment_required = false
    }

    resource "azuread_application_federated_identity_credential" "example" {
      application_object_id = azuread_application.example.object_id
      display_name          = "omni-federated-credential"
      description           = "BigQuery Omni federated credential"
      audiences             = ["api://AzureADTokenExchange"]
      issuer                = "https://accounts.google.com"
      subject               = google_bigquery_connection.connection.azure[0].identity
    }
For more information, see Configure an app to trust an external identity provider.
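The credential fields used above (name, issuer, subject, and audience) can also be assembled as JSON for scripted setups. The following is a rough sketch under stated assumptions: the function name federated_credential_json is hypothetical, the values are assumed to contain no quotes or backslashes, and passing the result to the Azure CLI's az ad app federated-credential create command is an assumption that this page does not cover.

```shell
#!/bin/sh
# Hypothetical helper: prints a JSON document describing a federated credential.
#   $1: credential name
#   $2: subject identifier (the BigQuery Google identity noted earlier)
# The issuer and audience values are the ones used in the Terraform example above.
federated_credential_json() {
  printf '{"name":"%s","issuer":"https://accounts.google.com","subject":"%s","audiences":["api://AzureADTokenExchange"]}\n' "$1" "$2"
}

# Possible use with the Azure CLI (an assumption, not taken from this page):
#   az ad app federated-credential create --id APP_ID \
#     --parameters "$(federated_credential_json omni-federated-credential SUBJECT_ID)"
```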
Assign a role to BigQuery's Azure applications
To assign a role to BigQuery's Azure application, use the Azure Portal, Terraform, Azure PowerShell, the Azure CLI, or the Microsoft REST API:
Azure Portal
You can perform role assignments in the Azure Portal by logging in as a user with the Microsoft.Authorization/roleAssignments/write permission. The role assignment lets the BigQuery Azure connection access the Azure Storage data as specified in the role's policy.
To add role assignments using the Azure Portal, follow these steps:
From your Azure Storage account, enter IAM in the search bar.
Click Access Control (IAM).
Click Add and select Add role assignments.
To provide read-only access, select the Storage Blob Data Reader role. To provide read-write access, select the Storage Blob Data Contributor role.
Set Assign access to to User, group, or service principal.
Click Select members.
In the Select field, enter the Azure application name that you gave when you created the application in the Azure tenant.
Click Save.
For more information, see Assign Azure roles using the Azure portal.
Terraform
Add the following to your Terraform configuration file:
    resource "azurerm_role_assignment" "data-contributor-role" {
      scope = data.azurerm_storage_account.example.id
      # Read-write permission for Omni on the storage account
      role_definition_name = "Storage Blob Data Contributor"
      principal_id         = azuread_service_principal.example.id
    }
Azure PowerShell
To add a role assignment for a service principal at a resource scope, you can use the New-AzRoleAssignment command:

    New-AzRoleAssignment `
      -SignInName APP_NAME `
      -RoleDefinitionName ROLE_NAME `
      -ResourceName RESOURCE_NAME `
      -ResourceType RESOURCE_TYPE `
      -ParentResource PARENT_RESOURCE `
      -ResourceGroupName RESOURCE_GROUP_NAME
Replace the following:
- APP_NAME: the application name.
- ROLE_NAME: the role name you want to assign.
- RESOURCE_NAME: the resource name.
- RESOURCE_TYPE: the resource type.
- PARENT_RESOURCE: the parent resource.
- RESOURCE_GROUP_NAME: the resource group name.
For more information about using Azure PowerShell to add a new service principal, see Assign Azure roles using Azure PowerShell.
Azure CLI
To add a role assignment for a service principal at a resource scope, you can use the Azure command-line tool. You must have the Microsoft.Authorization/roleAssignments/write permission for the storage account to grant roles.
To assign a role, such as the Storage Blob Data Contributor role, to the service principal, run the az role assignment create command:

    az role assignment create --role "Storage Blob Data Contributor" \
      --assignee-object-id ${SP_ID} \
      --assignee-principal-type ServicePrincipal \
      --scope subscriptions/SUBSCRIPTION_ID/resourcegroups/RESOURCE_GROUP_NAME/providers/Microsoft.Storage/storageAccounts/STORAGE_ACCOUNT_NAME
Replace the following:
- SP_ID: the service principal ID. This service principal is for the application that you created. To get the service principal for a federated connection, see Service principal object.
- STORAGE_ACCOUNT_NAME: the storage account name.
- RESOURCE_GROUP_NAME: the resource group name.
- SUBSCRIPTION_ID: the subscription ID.
For more information, see Assign Azure roles using Azure CLI.
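The --scope value is the longest and most error-prone argument in the command above, so it can help to build it from its three parts. The following is a minimal sketch: the function name storage_account_scope is hypothetical, and it adds a leading slash, which is the form Azure's own documentation uses for resource IDs; remove the slash from the format string to match the command above exactly.

```shell
#!/bin/sh
# Hypothetical helper: prints the resource scope for a storage account.
#   $1: subscription ID
#   $2: resource group name
#   $3: storage account name
storage_account_scope() {
  printf '/subscriptions/%s/resourcegroups/%s/providers/Microsoft.Storage/storageAccounts/%s\n' "$1" "$2" "$3"
}

# Example use with the command above:
#   az role assignment create --role "Storage Blob Data Contributor" \
#     --assignee-object-id "${SP_ID}" \
#     --assignee-principal-type ServicePrincipal \
#     --scope "$(storage_account_scope SUBSCRIPTION_ID RESOURCE_GROUP_NAME STORAGE_ACCOUNT_NAME)"
```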
Microsoft REST API
To add role assignments for a service principal, you can send an HTTP request to Microsoft Management.
To call the Microsoft Graph REST API, retrieve an OAuth token for an application. For more information, see Get access without a user.
The application that called the Microsoft Graph REST API must have the Application.ReadWrite.All application permission.
To generate an OAuth token, run the following command:
    export TOKEN=$(curl -X POST \
      https://login.microsoftonline.com/TENANT_ID/oauth2/token \
      -H 'cache-control: no-cache' \
      -H 'content-type: application/x-www-form-urlencoded' \
      --data-urlencode "grant_type=client_credentials" \
      --data-urlencode "resource=https://graph.microsoft.com/" \
      --data-urlencode "client_id=CLIENT_ID" \
      --data-urlencode "client_secret=CLIENT_SECRET" \
      | jq --raw-output '.access_token')
Replace the following:
- TENANT_ID: the tenant ID matching the ID of the Azure directory that contains the Azure Storage account.
- CLIENT_ID: the Azure client ID.
- CLIENT_SECRET: the Azure client secret.
Get the ID of the Azure built-in roles that you want to assign to the service principal.
These are some common roles:
- Storage Blob Data Contributor: ba92f5b4-2d11-453d-a403-e96b0029c9fe
- Storage Blob Data Reader: 2a2b9908-6ea1-4ae2-8e65-a410df84e7d1
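In a script, the two role IDs listed above can be kept in a small lookup so that the rest of the script refers to roles by name. The following is a minimal sketch; the function name builtin_role_id is hypothetical, and it covers only the two roles listed on this page.

```shell
#!/bin/sh
# Hypothetical helper: prints the ID of a built-in role given its name.
# Only the two roles listed above are known; any other name is an error.
builtin_role_id() {
  case "$1" in
    "Storage Blob Data Contributor") echo "ba92f5b4-2d11-453d-a403-e96b0029c9fe" ;;
    "Storage Blob Data Reader")      echo "2a2b9908-6ea1-4ae2-8e65-a410df84e7d1" ;;
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Example use:
#   ROLE_ID=$(builtin_role_id "Storage Blob Data Reader")
```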
To assign a role to the service principal, call the Azure Resource Management REST API:
    export ROLE_ASSIGNMENT_ID=$(uuidgen)
    curl -X PUT \
      'https://management.azure.com/subscriptions/SUBSCRIPTION_ID/resourcegroups/RESOURCE_GROUP_NAME/providers/Microsoft.Storage/storageAccounts/STORAGE_ACCOUNT_NAME/providers/Microsoft.Authorization/roleAssignments/ROLE_ASSIGNMENT_ID?api-version=2018-01-01-preview' \
      -H "authorization: Bearer ${TOKEN?}" \
      -H 'cache-control: no-cache' \
      -H 'content-type: application/json' \
      -d '{
        "properties": {
          "roleDefinitionId": "subscriptions/SUBSCRIPTION_ID/resourcegroups/RESOURCE_GROUP_NAME/providers/Microsoft.Storage/storageAccounts/STORAGE_ACCOUNT_NAME/providers/Microsoft.Authorization/roleDefinitions/ROLE_ID",
          "principalId": "SP_ID"
        }
      }'
Replace the following:
- ROLE_ASSIGNMENT_ID: the role assignment ID, such as the UUID generated by uuidgen in the preceding command.
- ROLE_ID: the ID of the built-in role that you want to assign.
- SP_ID: the service principal ID. This service principal is for the application that you created. To get the service principal for a federated connection, see Service principal object.
- SUBSCRIPTION_ID: the subscription ID.
- RESOURCE_GROUP_NAME: the resource group name.
- STORAGE_ACCOUNT_NAME: the storage account name.
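The command above exports a generated ROLE_ASSIGNMENT_ID, but the request URL carries literal placeholders that you replace by hand. One way to wire the values in from a script is to build the URL from arguments. The following is a minimal sketch; the function name role_assignment_url is hypothetical, and the URL layout and api-version are copied from the command above.

```shell
#!/bin/sh
# Hypothetical helper: prints the role assignment URL used in the PUT request.
#   $1: subscription ID
#   $2: resource group name
#   $3: storage account name
#   $4: role assignment ID (for example, the output of uuidgen)
role_assignment_url() {
  printf 'https://management.azure.com/subscriptions/%s/resourcegroups/%s/providers/Microsoft.Storage/storageAccounts/%s/providers/Microsoft.Authorization/roleAssignments/%s?api-version=2018-01-01-preview\n' "$1" "$2" "$3" "$4"
}

# Example use:
#   URL=$(role_assignment_url SUBSCRIPTION_ID RESOURCE_GROUP_NAME STORAGE_ACCOUNT_NAME "$(uuidgen)")
#   curl -X PUT "$URL" ...
```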
The connection is now ready to use. However, there might be a propagation delay for a role assignment in Azure. If you are not able to use the connection due to permission issues, then retry after some time.
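Because a role assignment can take time to propagate, a script that uses the connection right after setup may need to retry. The following is a minimal sketch of a fixed-delay retry loop; the function name retry_with_delay is hypothetical, and suitable attempt counts and delays are not specified on this page, so the values in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: runs a command until it succeeds or attempts run out.
#   $1: maximum number of attempts
#   $2: delay in seconds between attempts
#   remaining arguments: the command to run
# Note: POSIX sh has no local variables, so attempts, delay, and i are global.
retry_with_delay() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0            # command succeeded
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"      # wait before the next attempt
    fi
    i=$((i + 1))
  done
  return 1                # every attempt failed
}

# Example use (run_my_query stands for any command that uses the connection):
#   retry_with_delay 5 60 run_my_query
```

The loop retries on any failure, not only on permission errors, so a command that fails for an unrelated reason also consumes all attempts.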
Share connections with users
You can grant the following roles to let users query data and manage connections:
- roles/bigquery.connectionUser: enables users to use connections to connect with external data sources and run queries on them.
- roles/bigquery.connectionAdmin: enables users to manage connections.
For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Select one of the following options:
Console
Go to the BigQuery page.
Connections are listed in your project, in a group called External connections.
In the Explorer pane, click your project name > External connections > connection.
In the Details pane, click Share to share a connection. Then do the following:
In the Connection permissions dialog, share the connection with other principals by adding or editing principals.
Click Save.
bq
You cannot share a connection with the bq command-line tool. To share a connection, use the Google Cloud console or the BigQuery Connections API.
API
Use the projects.locations.connections.setIAM method in the BigQuery Connections REST API reference section, and supply an instance of the policy resource.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
What's next
- Learn about different connection types.
- Learn about managing connections.
- Learn more about BigQuery Omni.
- Learn about BigLake tables.
- Learn how to query Blob Storage data.
- Learn how to export query results to Blob Storage.