Work with connections to external data sources

Overview

The BigQuery Connection API enables users to set up a connection from BigQuery to an external data source. You can use the connection to submit queries against the external data source from BigQuery without moving or copying data into BigQuery.

You must complete an initial one-time setup to create a connection resource in BigQuery. After that you can perform the following tasks:

Before you begin

Enable the BigQuery connection service

  1. Open the BigQuery Connection API page in the API library.
  2. From the drop-down menu, select the project that contains your external data source.
  3. Click the ENABLE button.

    BigQuery Connection API

Service Account

BigQuery uses a service account to connect to your external data source. When you enable the BigQuery Connection API, a Google Cloud–managed Identity and Access Management (IAM) service account is automatically created on your behalf.

Required permissions

Grant IAM roles that give users the necessary permissions to perform each task in this document. The permissions required to perform a task (if any) are listed in the Required permissions section of the task.

To create and maintain a connection resource, you need the following IAM permissions:

  • bigquery.connections.create
  • bigquery.connections.get
  • bigquery.connections.list
  • bigquery.connections.update
  • bigquery.connections.use
  • bigquery.connections.delete

The predefined IAM role roles/bigquery.admin includes the permissions that you need in order to create and maintain a connection resource.

To grant permissions to another user so they can use the connection resource, see Share a connection resource.

Granting bigquery.admin access

To grant the bigquery.admin role:

Console

  1. Open the IAM page in the Cloud Console

    Open the IAM page

  2. Click Select a project.

  3. Select a project and click Open.

  4. Click Add to add new members to the project and set their permissions.

  5. In the Add members dialog:

    • For Members, enter the email address of the user or group.
    • In the Select a role drop-down, click BigQuery > BigQuery Admin.
    • Click Add.

      Grant admin

gcloud

You can use the gcloud command-line tool to grant a user or group the bigquery.admin role.

To add a single binding to your project's IAM policy, type the following command. To add a user, supply the --member flag in the format user:user@example.com. To add a group, supply the --member flag in the format group:group@example.com.

gcloud projects add-iam-policy-binding project_id \
--member principal:address \
--role roles/bigquery.admin

Where:

  • project_id is your project ID.
  • principal is either group or user.
  • address is the user or group's email address.

For example:

gcloud projects add-iam-policy-binding myproject \
--member group:group@example.com \
--role roles/bigquery.admin

The command outputs the updated policy:

    bindings:
    - members:
      - group:group@example.com
        role: roles/bigquery.admin
    

For more information on IAM roles in BigQuery, see Predefined roles and permissions.

Create a connection resource

To avoid writing database credentials as cleartext in a federated query, you need to first create a database connection resource per database in BigQuery and then reference the connection resource in your federated query.

The connection resource has a set of IAM permissions, which you can grant to other users. The connection resource is encrypted and stored securely in the BigQuery connection service, and it can only be used for federated queries.

For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Console

  1. To create a connection resource, go to the BigQuery page in the Cloud Console.

    Go to the BigQuery page

  2. In the Add data menu, select External data source.

  3. In the External data source dialog, enter the following information:

    • For Connection type, select the type of source, for example MySQL or Postgres.
    • For Connection ID, enter an identifier for the connection resource. Letter, numbers, and underscores are allowed.
    • For Data location, select a BigQuery location (or region) that is compatible with your external data source region.
    • Optional: For Friendly name, enter a user-friendly name for the connection, such as My connection resource. The friendly name can be any value that helps you identify the connection resource if you need to modify it later.
    • Optional: For Description, enter a description for this connection resource.
    • If you chose Cloud SQL MySQL or Postgres for the connection type, for Cloud SQL instance ID, enter the full name of the Cloud SQL instance, usually in the format project-id:location-id:instance-id. You can find the instance ID on the detail page of the Cloud SQL instance you want to query.
    • For Database name, enter the name of the database.
    • For Database username, enter the username for the database.
    • For Database password, enter the password for the database.

      • Optional: To see the password, click Show password.
  4. Click Create connection.

bq

Enter the bq mk command and supply the connection flag: --connection. The following flags are also required:

  • --connection_type
  • --properties
  • --connection_credential
  • --project_id
  • --location

The following flags are optional:

  • --display_name The friendly name for the connection.
  • --description A description of the connection.

The connection_id is an optional parameter that can be added as the last argument of the command which is used for storage internally. If a connection ID is not provided a unique ID is automatically generated. The connection_id can contain letters, numbers, and underscores.

    bq mk --connection --display_name='friendly name' --connection_type=TYPE \
      --properties=PROPERTIES --connection_credential=CREDENTIALS \
      --project_id=PROJECT_ID --location=LOCATION \
      CONNECTION_ID

Replace the following:

  • TYPE: the type of the external data source.
  • PROPERTIES: the parameters for the created connection in JSON format. For example: --properties='{"param":"param_value"}'. For creating a connection resource, you must supply the instanceID, database, and type parameters.
  • CREDENTIALS: the parameters username and password.
  • PROJECT_ID: your project ID.
  • LOCATION: the region your Cloud SQL instance is located in.
  • CONNECTION_ID: the connection identifier.

For example, the following command creates a new connection resource named my_new_connection (friendly name: "My new connection") in a project with the ID federation-test.

bq mk --connection --display_name='friendly name' --connection_type='CLOUD_SQL' \
  --properties='{"instanceId":"federation-test:us-central1:mytestsql","database":"mydatabase","type":"MYSQL"}' \
  --connection_credential='{"username":"myusername", "password":"mypassword"}' \
  --project_id=federation-test --location=us my_connection_id

API

Within the BigQuery Connection API, you can invoke CreateConnection within the ConnectionService to instantiate a connection. See the client library page for more details.

You can also create a connection using the programming language specific client libraries. See the connection type topics for language specific examples.

Query a connection resource

Once a connection is established, you can use the EXTERNAL_QUERY() function to execute a federated query. See Federated query syntax for more information and sample queries.

Query another database through the same connection resource

When you create a connection to an external data source, you specify credentials for that data source. If the same user credentials are valid for other data in the source, the same connection resource can be used.

Get information about a connection resource

After you create a connection resource, you can get information about the connection resource's configuration. The configuration includes the values you supplied when you created the transfer.

Required permissions

To get information about a connection resource, you need the bigquery.connections.get IAM permission.

Each of the following predefined IAM roles includes the permissions that you need in order to get information about a connection resource:

  • roles/bigquery.admin
  • roles/bigquery.connection.admin
  • roles/bigquery.connection.user

For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Viewing a connection resource

Console

  1. To view the status of your connection resources, go to the BigQuery page in the Cloud Console.

    Go to BigQuery

    Connection resources are listed in your project, in a group called External connections.

  2. In the Explorer panel, click your project name > External connections > a connection resource to see information about a connection resource.

bq

Enter the bq show command and supply the connection flag: --connection. The fully qualified connection_id is required.

    bq show --connection project.location.connection_id

For example, the following command gets information about a connection resource named my_new_connection in a project with the ID federation-test located in the region us.

    bq show --connection federation-test.us.my_new_connection

API

Use the projects.locations.connections.get method in the REST API reference section.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.connection.v1.Connection;
import com.google.cloud.bigquery.connection.v1.ConnectionName;
import com.google.cloud.bigquery.connection.v1.GetConnectionRequest;
import com.google.cloud.bigqueryconnection.v1.ConnectionServiceClient;
import java.io.IOException;

// Sample to get connection
public class GetConnection {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String location = "MY_LOCATION";
    String connectionId = "MY_CONNECTION_ID";
    getConnection(projectId, location, connectionId);
  }

  public static void getConnection(String projectId, String location, String connectionId)
      throws IOException {
    try (ConnectionServiceClient client = ConnectionServiceClient.create()) {
      ConnectionName name = ConnectionName.of(projectId, location, connectionId);
      GetConnectionRequest request =
          GetConnectionRequest.newBuilder().setName(name.toString()).build();
      Connection response = client.getConnection(request);
      System.out.println("Connection info retrieved successfully :" + response.getName());
    }
  }
}

List all connection resources

Required permissions

To list all the connection resources in a project, you need the bigquery.connections.list IAM permission.

Each of the following predefined IAM roles includes the permissions that you need in order to list all the connection resources:

  • roles/bigquery.admin
  • roles/bigquery.connection.admin
  • roles/bigquery.connection.user

For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Listing connection resources

To list all the connection resources in a project:

Console

  1. To view the status of a connection resource, go to the BigQuery page in the Cloud Console.

    Go to BigQuery

    Connection resources are listed in your project, in a group called External connections.

  2. In the Explorer panel, click your project name > External connections to see a list of all connection resources.

bq

Enter the bq show command and supply the connection flag: --connection. The fully qualified connection_id is required.

    bq ls --connection --project_id=[project_id] --location=[location]

For example, the following command lists the connection resources in a project with the ID federation-test located in the region us.

    bq ls --connection --project_id=bigquery-federation-test --location=us

API

Use the projects.locations.connections.list method in the REST API reference section.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.connection.v1.ListConnectionsRequest;
import com.google.cloud.bigquery.connection.v1.LocationName;
import com.google.cloud.bigqueryconnection.v1.ConnectionServiceClient;
import java.io.IOException;

// Sample to get list of connections
public class ListConnections {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String location = "MY_LOCATION";
    listConnections(projectId, location);
  }

  public static void listConnections(String projectId, String location) throws IOException {
    try (ConnectionServiceClient client = ConnectionServiceClient.create()) {
      LocationName parent = LocationName.of(projectId, location);
      int pageSize = 10;
      ListConnectionsRequest request =
          ListConnectionsRequest.newBuilder()
              .setParent(parent.toString())
              .setPageSize(pageSize)
              .build();
      client
          .listConnections(request)
          .iterateAll()
          .forEach(con -> System.out.println("Connection Id :" + con.getName()));
    }
  }
}

Share a connection resource

Required permissions

To allow another user to use a connection resource for Cloud SQL federated queries, the bigquery.admin user can grant the two different roles in IAM:

  • roles/bigquery.connectionUser

    The BigQuery Connection User can run queries with the connection resource.

    To run queries with the connection resource, you need the following IAM permissions:

    • bigquery.connections.get
    • bigquery.connections.list
    • bigquery.connections.use
    • bigquery.connections.getIamPolicy

    The predefined IAM role roles/bigquery.connection.user includes the permissions that you need in order to run queries with the connection resource.

  • roles/bigquery.connectionAdmin

    The BigQuery Connection Admin can manage the connection resources. In addition to the permissions included in the BigQuery Connection User role, this role includes permissions to create, update, and delete the connection resources, and to set IAM policy on the connection resources.

    To manage the connection resources, you need the following IAM permissions:

    • bigquery.connections.get
    • bigquery.connections.list
    • bigquery.connections.use
    • bigquery.connections.getIamPolicy
    • bigquery.connections.create
    • bigquery.connections.update
    • bigquery.connections.setIamPolicy
    • bigquery.connections.delete

    The predefined IAM role roles/bigquery.connection.admin includes the permissions that you need in order to manage the connection resources.

    For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Sharing a connection resource

Console

  1. To share a connection resource, go to the BigQuery page in the Cloud Console.

    Go to BigQuery

    Connection resources are listed in your project, in a group called External connections.

  2. In the Explorer panel, click your project name > External connections > a connection resource to view the connection's details.

  3. In the Details panel, click Share to share a connection resource. Then do the following:

    1. In the Connection permissions dialog, users with the role BigQuery Admin or BigQuery Connection Admin can share the connection resource with other principals by adding or editing principals.

    2. Click Save.

bq

Currently, you cannot share a resource with the bq command-line tool. To share a connection resource, use the Cloud Console or the BigQuery Connections API method to share a connection.

API

See the projects.locations.connections.setIAM method in the BigQuery Connections REST API reference section, and supply an instance of the policy resource.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.api.resourcenames.ResourceName;
import com.google.cloud.bigquery.connection.v1.ConnectionName;
import com.google.cloud.bigqueryconnection.v1.ConnectionServiceClient;
import com.google.iam.v1.Binding;
import com.google.iam.v1.Policy;
import com.google.iam.v1.SetIamPolicyRequest;
import java.io.IOException;

// Sample to share connections
public class ShareConnection {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String location = "MY_LOCATION";
    String connectionId = "MY_CONNECTION_ID";
    shareConnection(projectId, location, connectionId);
  }

  public static void shareConnection(String projectId, String location, String connectionId)
      throws IOException {
    try (ConnectionServiceClient client = ConnectionServiceClient.create()) {
      ResourceName resource = ConnectionName.of(projectId, location, connectionId);
      Binding binding =
          Binding.newBuilder()
              .addMembers("group:example-analyst-group@google.com")
              .setRole("roles/bigquery.connectionUser")
              .build();
      Policy policy = Policy.newBuilder().addBindings(binding).build();
      SetIamPolicyRequest request =
          SetIamPolicyRequest.newBuilder()
              .setResource(resource.toString())
              .setPolicy(policy)
              .build();
      client.setIamPolicy(request);
      System.out.println("Connection shared successfully");
    }
  }
}

Edit or update a connection resource

Once the connection resource is added, you can edit the connection resource. You can edit most of the fields populated during connection resource creation, including the username credentials.

A connection uses the credentials of the user who created it. If you need to change the user attached to a connection resource, you can update the user's credentials. This is useful if the user who created the connection is no longer with your organization.

You cannot edit the following elements of a connection resource:

  • Connection type
  • Connection ID
  • Location

Required permissions

To update a connection resource, you need the bigquery.connections.update IAM permission.

Each of the following predefined IAM roles includes the permissions that you need in order to update a connection resource:

  • roles/bigquery.admin
  • roles/bigquery.connection.admin

For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Editing a connection resource

To edit a connection:

Console

  1. In the Cloud Console, go to the BigQuery page.

    Go to BigQuery

    Connection resources are listed in your project, in a group called External connections.

  2. In the Explorer panel, click your project name > External connections > a connection resource to view the connection's details.

  3. In the Details panel, to edit details, click Edit details. Then do the following:

    1. In the Edit connection dialog, edit the connection details including the user credentials.

    2. Click Update connection.

bq

Enter the bq update command and supply the connection flag: --connection. The fully qualified connection_id is required.

    bq update --connection --connection_type='CLOUD_SQL'
   --properties='{"instanceId" : "instance",
   "database" : "db", "type" : "MYSQL" }'
   --connection_credential='{"username":"u", "password":"p"}'
   project.location.connection_id

For example, the following command updates the connection resources in a project with the ID federation-test and connection ID test-mysql.

    bq update --connection --connection_type='CLOUD_SQL'
   --properties='{"instanceId" : "federation-test:us-central1:new-mysql",
   "database" : "imdb2", "type" : "MYSQL" }'
   --connection_credential='{"username":"my_username",
   "password":"my_password"}' federation-test.us.test-mysql

API

See the projects.locations.connections.patch method in the REST API reference section, and supply an instance of the connection resource.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.connection.v1.Connection;
import com.google.cloud.bigquery.connection.v1.ConnectionName;
import com.google.cloud.bigquery.connection.v1.UpdateConnectionRequest;
import com.google.cloud.bigqueryconnection.v1.ConnectionServiceClient;
import com.google.protobuf.FieldMask;
import com.google.protobuf.util.FieldMaskUtil;
import java.io.IOException;

// Sample to update connection
public class UpdateConnection {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String location = "MY_LOCATION";
    String connectionId = "MY_CONNECTION_ID";
    String description = "MY_DESCRIPTION";
    Connection connection = Connection.newBuilder().setDescription(description).build();
    updateConnection(projectId, location, connectionId, connection);
  }

  public static void updateConnection(
      String projectId, String location, String connectionId, Connection connection)
      throws IOException {
    try (ConnectionServiceClient client = ConnectionServiceClient.create()) {
      ConnectionName name = ConnectionName.of(projectId, location, connectionId);
      FieldMask updateMask = FieldMaskUtil.fromString("description");
      UpdateConnectionRequest request =
          UpdateConnectionRequest.newBuilder()
              .setName(name.toString())
              .setConnection(connection)
              .setUpdateMask(updateMask)
              .build();
      Connection response = client.updateConnection(request);
      System.out.println("Connection updated successfully :" + response.getDescription());
    }
  }
}

Delete a connection resource

You can delete a connection resource if you have the correct permissions to do so.

Required permissions

To delete a connection resource, you need the bigquery.connections.delete IAM permission.

Each of the following predefined IAM roles includes the permissions that you need in order to delete a connection resource:

  • roles/bigquery.admin
  • roles/bigquery.connection.admin

For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Deleting a connection resource

To delete a connection:

Console

  1. In the Google Cloud Console, go to the BigQuery page.

    Go to BigQuery

    Connection resources are listed in your project, in a group called External connections.

  2. In the Explorer panel, click your project name > External connections > a connection resource to view the connection's details.

  3. In the Details panel, click Delete to delete the connection resource.

  4. In the Delete connection? dialog box, enter delete to confirm deletion.

  5. Click Delete.

bq

Enter the bq rm command and supply the connection flag: --connection. The fully qualified connection_id is required.

    bq rm --connection project.location.connection_id

For example, the following command updates the connection resources in a project with the ID federation-test and connection ID test-mysql.

    bq rm --connection federation-test.us.test-mysql

API

See the projects.locations.connections.delete method in the REST API reference section.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.connection.v1.ConnectionName;
import com.google.cloud.bigquery.connection.v1.DeleteConnectionRequest;
import com.google.cloud.bigqueryconnection.v1.ConnectionServiceClient;
import java.io.IOException;

// Sample to delete a connection
public class DeleteConnection {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String location = "MY_LOCATION";
    String connectionName = "MY_CONNECTION_NAME";
    deleteConnection(projectId, location, connectionName);
  }

  public static void deleteConnection(String projectId, String location, String connectionName)
      throws IOException {
    try (ConnectionServiceClient client = ConnectionServiceClient.create()) {
      ConnectionName name = ConnectionName.of(projectId, location, connectionName);
      DeleteConnectionRequest request =
          DeleteConnectionRequest.newBuilder().setName(name.toString()).build();
      client.deleteConnection(request);
      System.out.println("Connection deleted successfully");
    }
  }
}

Audit logging

For audit logging of connection resources, see BigQuery audit logs overview.

Troubleshooting

This section helps you troubleshoot issues you might encounter when setting up a new connection. This section does not encompass all possible error messages or issues.

When diagnosing general connection issues, verify the following:

  • You have completed all the steps in the Before you begin section.
  • The connection configuration properties are correct.
  • You have the appropriate permissions to create a connection.

If your connection properties are correct, and the appropriate permissions are granted, refer to the following for solutions to common issues.

Issue: BigQuery and the external data source are not colocated.
Resolution: Federated querying is only supported in regions that support both BigQuery and the external data source. The BigQuery dataset and the data source instance must be in the same region, or, the BigQuery dataset must be in a multi-region location such as US and EU in the same geographic area that is compatible with the data source region. For information about regions and region compatibility, see Supported regions.
Issue: Performance is slower than expected.
Resolution: Performance of federated queries is not as high as querying data stored in BigQuery because the federated query has to externally query the source data, return the data to a temporary BigQuery table, map the data to a BigQuery data type, then execute the query in BigQuery. Although the query performance is not as high, the data does not need to be copied, moved, or stored again.
Issue: How to format the connection name.
Resolution: The connection name should include project, location, and connection ID. The connection ID should conform to this pattern: project_id.location_id.connection_id, for example, federation-test.us.my_new_connection