Work with connections to external data sources
Overview
The BigQuery Connection API enables users to set up a connection from BigQuery to an external data source. You can use the connection to submit queries against the external data source from BigQuery without moving or copying data into BigQuery.
You must complete an initial one-time setup to create a connection resource in BigQuery. After that you can perform the following tasks:
- Query a connection resource by writing a query
with the SQL function
EXTERNAL_QUERY
- Set up Cloud SQL database connections
- Set up Cloud Spanner database connections
- Create the BigQuery AWS connection
Before you begin
Before you work with a connection resource, you must perform the tasks explained in the following sections.
Enable the BigQuery connection service
- Open the BigQuery Connection API page in the API library.
- From the drop-down menu, select the project that contains your external data source.
Click the ENABLE button.
Service Account
BigQuery uses a service account to connect to your external data source. When you enable the BigQuery Connection API, a Google Cloud–managed Identity and Access Management (IAM) service account is automatically created on your behalf.
To view a list of service accounts in your project, see Listing service accounts.
Required permissions
To create and maintain a connection resource, you need the following IAM permissions:
bigquery.connections.create
bigquery.connections.get
bigquery.connections.list
bigquery.connections.update
bigquery.connections.use
bigquery.connections.delete
The predefined IAM role roles/bigquery.admin
includes the
permissions that you need in order to create and maintain a connection resource.
To grant permissions to another user so they can use the connection resource, see Share a connection resource.
Granting bigquery.admin
access
To grant the bigquery.admin
role:
Console
Open the IAM page in the Google Cloud console
Click Select a project.
Select a project and click Open.
Click Add to add new members to the project and set their permissions.
In the Add members dialog:
- For Members, enter the email address of the user or group.
- In the Select a role drop-down, click BigQuery > BigQuery Admin.
Click Add.
gcloud
You can use the Google Cloud CLI to grant a user or group the
bigquery.admin
role.
To add a single binding to your project's IAM policy, type
the following command. To add a user, supply the --member
flag in the
format user:user@example.com
. To add a group, supply the --member
flag
in the format group:group@example.com
.
gcloud projects add-iam-policy-binding project_id \ --member principal:address \ --role roles/bigquery.admin
Where:
- project_id is your project ID.
- principal is either
group
oruser
. - address is the user or group's email address.
For example:
gcloud projects add-iam-policy-binding myproject \
--member group:group@example.com \
--role roles/bigquery.admin
The command outputs the updated policy:
bindings: - members: - group:group@example.com role: roles/bigquery.admin
For more information on IAM roles in BigQuery, see Predefined roles and permissions.
Create a connection resource
To avoid writing database credentials as cleartext in a federated query, you need to first create a database connection resource per database in BigQuery and then reference the connection resource in your federated query.
The connection resource has a set of IAM permissions, which you can grant to other users. The connection resource is encrypted and stored securely in the BigQuery connection service, and it can only be used for federated queries. For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.
You can set up a connection resource for Cloud SQL, Cloud Spanner, and AWS (via BigQuery Omni). To create a connection resource for Cloud SQL, follow these steps:
Console
To create a connection resource, go to the BigQuery page in the Google Cloud console.
In the
Add data menu, select External data source.In the External data source dialog, enter the following information:
- For Connection type, select the type of source, for example MySQL or Postgres.
- For Connection ID, enter an identifier for the connection
resource. Letter, numbers, and underscores are allowed. For example,
bq_sql_connection
. - For Data location, select a BigQuery location (or region) that is compatible with your external data source region.
- Optional: For Friendly name, enter a user-friendly name for
the connection, such as
My connection resource
. The friendly name can be any value that helps you identify the connection resource if you need to modify it later. - Optional: For Description, enter a description for this connection resource.
- If you chose Cloud SQL MySQL or Postgres for the connection type,
for Cloud SQL instance ID, enter the full
name of the Cloud SQL instance,
usually in the format
project-id:location-id:instance-id
. You can find the instance ID on the detail page of the Cloud SQL instance you want to query. - For Database name, enter the name of the database.
- For Database username, enter the username for the database.
For Database password, enter the password for the database.
- Optional: To see the password, click Show password.
Click Create connection.
bq
Enter the bq mk
command and supply the connection flag:
--connection
. The following flags are also required:
--connection_type
--properties
--connection_credential
--project_id
--location
The following flags are optional:
--display_name
The friendly name for the connection.--description
A description of the connection.
The connection_id
is an optional parameter that can be added as the last argument of the command which
is used for storage internally. If a connection ID is not provided a unique ID is automatically generated.
The connection_id
can contain letters, numbers, and underscores.
bq mk --connection --display_name='friendly name' --connection_type=TYPE \
--properties=PROPERTIES --connection_credential=CREDENTIALS \
--project_id=PROJECT_ID --location=LOCATION \
CONNECTION_ID
Replace the following:
TYPE
: the type of the external data source.PROPERTIES
: the parameters for the created connection in JSON format. For example:--properties='{"param":"param_value"}'
. For creating a connection resource, you must supply theinstanceID
,database
, andtype
parameters.CREDENTIALS
: the parametersusername
andpassword
.PROJECT_ID
: your project ID.LOCATION
: the region your Cloud SQL instance is located in.CONNECTION_ID
: the connection identifier.
For example, the following command creates a new connection resource
named my_new_connection (friendly name: "My new connection") in a project
with the ID federation-test
.
bq mk --connection --display_name='friendly name' --connection_type='CLOUD_SQL' \
--properties='{"instanceId":"federation-test:us-central1:mytestsql","database":"mydatabase","type":"MYSQL"}' \
--connection_credential='{"username":"myusername", "password":"mypassword"}' \
--project_id=federation-test --location=us my_connection_id
API
Within the BigQuery Connection API, you can invoke CreateConnection
within
ConnectionService
to instantiate a connection. For more information,
see the BigQuery Connection API Client Libraries.
Query a connection resource
Once a connection is established, you can use the
EXTERNAL_QUERY
function to execute a federated query.
Query another database through the same connection resource
When you create a connection to an external data source, you specify credentials for that data source. If the same user credentials are valid for other data in the source, the same connection resource can be used.
Get information about a connection resource
After you create a connection resource, you can get information about the connection resource's configuration. The configuration includes the values you supplied when you created the transfer.
Required permissions
To get information about a connection resource, you need the bigquery.connections.get
IAM permission.
Each of the following predefined IAM roles includes the permissions that you need in order to get information about a connection resource:
roles/bigquery.admin
roles/bigquery.connectionAdmin
roles/bigquery.connectionUser
For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Viewing a connection resource
Console
To view the status of your connection resources, go to the BigQuery page in the Google Cloud console.
Connection resources are listed in your project, in a group called External connections.
In the Explorer panel, click your project name > External connections > a connection resource to see information about a connection resource.
bq
Enter the bq show
command and supply the connection flag:
--connection
. The fully qualified connection_id is required.
bq show --connection project.location.connection_id
For example, the following command gets information about a connection
resource named my_new_connection
in a project with the ID
federation-test
located in the region us
.
bq show --connection federation-test.us.my_new_connection
API
Use the
projects.locations.connections.get
method
in the REST API reference section.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
List all connection resources
Required permissions
To list all the connection resources in a project, you need the bigquery.connections.list
IAM permission.
Each of the following predefined IAM roles includes the permissions that you need in order to list all the connection resources:
roles/bigquery.admin
roles/bigquery.connectionAdmin
roles/bigquery.connectionUser
For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Listing connection resources
To list all the connection resources in a project:
Console
To view the status of a connection resource, go to the BigQuery page in the Google Cloud console.
Connection resources are listed in your project, in a group called External connections.
In the Explorer panel, click your project name > External connections to see a list of all connection resources.
bq
Enter the bq show
command and supply the connection flag:
--connection
. The fully qualified connection_id
is required.
bq ls --connection --project_id=[project_id] --location=[location]
For example, the following command lists the connection resources in a
project with the ID federation-test
located in the region us
.
bq ls --connection --project_id=bigquery-federation-test --location=us
API
Use the projects.locations.connections.list
method
in the REST API reference section.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
Share a connection resource
Required permissions
To allow another user to use a connection resource for Cloud SQL federated
queries, the bigquery.admin
user can grant the two different
roles in IAM:
roles/bigquery.connectionUser
The BigQuery Connection User can run queries with the connection resource.
To run queries with the connection resource, you need the following IAM permissions:
bigquery.connections.get
bigquery.connections.list
bigquery.connections.use
bigquery.connections.getIamPolicy
The predefined IAM role
roles/bigquery.connectionUser
includes the permissions that you need in order to run queries with the connection resource.roles/bigquery.connectionAdmin
The BigQuery Connection Admin can manage the connection resources. In addition to the permissions included in the BigQuery Connection User role, this role includes permissions to create, update, and delete the connection resources, and to set IAM policy on the connection resources.
To manage the connection resources, you need the following IAM permissions:
bigquery.connections.get
bigquery.connections.list
bigquery.connections.use
bigquery.connections.getIamPolicy
bigquery.connections.create
bigquery.connections.update
bigquery.connections.setIamPolicy
bigquery.connections.delete
The predefined IAM role
roles/bigquery.connectionAdmin
includes the permissions that you need in order to manage the connection resources.For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Sharing a connection resource
Console
To share a connection resource, go to the BigQuery page in the Google Cloud console.
Connection resources are listed in your project, in a group called External connections.
In the Explorer panel, click your project name > External connections > a connection resource to view the connection's details.
In the Details panel, click Share to share a connection resource. Then do the following:
In the Connection permissions dialog, users with the role BigQuery Admin or BigQuery Connection Admin can share the connection resource with other principals by adding or editing principals.
Click Save.
bq
Currently, you cannot share a resource with the bq
command-line tool.
To share a connection resource, use the Google Cloud console or
the BigQuery Connections API method to share a connection.
API
See the
projects.locations.connections.setIAM
method
in the BigQuery Connections REST API reference section, and
supply an instance of the policy
resource.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
Edit or update a connection resource
Once the connection resource is added, you can edit the connection resource. You can edit most of the fields populated during connection resource creation, including the username credentials.
A connection uses the credentials of the user who created it. If you need to change the user attached to a connection resource, you can update the user's credentials. This is useful if the user who created the connection is no longer with your organization.
You cannot edit the following elements of a connection resource:
- Connection type
- Connection ID
- Location
Required permissions
To update a connection resource, you need the bigquery.connections.update
IAM permission.
Each of the following predefined IAM roles includes the permissions that you need in order to update a connection resource:
roles/bigquery.admin
roles/bigquery.connectionAdmin
For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Editing a connection resource
To edit a connection:
Console
In the Google Cloud console, go to the BigQuery page.
Connection resources are listed in your project, in a group called External connections.
In the Explorer panel, click your project name > External connections > a connection resource to view the connection's details.
In the Details panel, to edit details, click
Edit details. Then do the following:In the Edit connection dialog, edit the connection details including the user credentials.
Click Update connection.
bq
Enter the bq update
command and supply the connection flag:
--connection
. The fully qualified connection_id
is required.
bq update --connection --connection_type='CLOUD_SQL'
--properties='{"instanceId" : "instance",
"database" : "db", "type" : "MYSQL" }'
--connection_credential='{"username":"u", "password":"p"}'
project.location.connection_id
For example, the following command updates the connection resources in a
project with the ID federation-test
and connection ID test-mysql
.
bq update --connection --connection_type='CLOUD_SQL'
--properties='{"instanceId" : "federation-test:us-central1:new-mysql",
"database" : "imdb2", "type" : "MYSQL" }'
--connection_credential='{"username":"my_username",
"password":"my_password"}' federation-test.us.test-mysql
API
See the
projects.locations.connections.patch
method
in the REST API reference section, and supply an instance of the connection
resource.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
Delete a connection resource
You can delete a connection resource if you have the correct permissions to do so.
Required permissions
To delete a connection resource, you need the bigquery.connections.delete
IAM permission.
Each of the following predefined IAM roles includes the permissions that you need in order to delete a connection resource:
roles/bigquery.admin
roles/bigquery.connectionAdmin
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Deleting a connection resource
To delete a connection:
Console
In the Google Cloud console, go to the BigQuery page.
Connection resources are listed in your project, in a group called External connections.
In the Explorer panel, click your project name > External connections > a connection resource to view the connection's details.
In the Details panel, click
Delete to delete the connection resource.In the Delete connection? dialog box, enter
delete
to confirm deletion.Click Delete.
bq
Enter the bq rm
command and supply the connection flag:
--connection
. The fully qualified connection_id
is required.
bq rm --connection project.location.connection_id
For example, the following command updates the connection resources in a
project with the ID federation-test
and connection ID test-mysql
.
bq rm --connection federation-test.us.test-mysql
API
See the
projects.locations.connections.delete
method
in the REST API reference section.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
Audit logging
For audit logging of connection resources, see BigQuery audit logs overview.
Troubleshooting
This section helps you troubleshoot issues you might encounter when setting up a new connection. This section does not encompass all possible error messages or issues.
When diagnosing general connection issues, verify the following:
- You have completed all the steps in the Before you begin section.
- The connection configuration properties are correct.
- You have the appropriate permissions to create a connection.
If your connection properties are correct, and the appropriate permissions are granted, refer to the following for solutions to common issues.
- Issue: BigQuery and the external data source are not colocated.
- Resolution: Federated querying is only supported in regions
that support both BigQuery and the external data source. The
BigQuery dataset and the data source instance must be in the same
region, or, the BigQuery dataset must be in a
multi-region location such as
US
andEU
in the same geographic area that is compatible with the data source region. For information about regions and region compatibility, see Supported regions. - Issue: Performance is slower than expected.
- Resolution: Performance of federated queries is not as high as querying data stored in BigQuery because the federated query has to externally query the source data, return the data to a temporary BigQuery table, map the data to a BigQuery data type, then execute the query in BigQuery. Although the query performance is not as high, the data does not need to be copied, moved, or stored again.
- Issue: How to format the connection name.
- Resolution: The connection name should include project, location,
and connection ID. The connection ID should conform to this pattern:
project_id.location_id.connection_id
, for example,federation-test.us.my_new_connection