Create Google Drive external tables

This document describes how to create an external table over data stored in Google Drive.

BigQuery supports external tables over both personal Drive files and shared files. For more information on Drive, see Drive training and help.

You can create external tables over files in Drive that have the following formats:

  • Comma-separated values (CSV)
  • Newline-delimited JSON
  • Avro
  • Google Sheets

Before you begin

Before you create an external table, gather some information and make sure you have permission to create the table.

Retrieve Drive URIs

To create an external table for a Google Drive data source, you must provide the Drive URI. You can retrieve the Drive URI directly from the URL of your Drive data:

URI format

  • https://docs.google.com/spreadsheets/d/FILE_ID

    or

  • https://drive.google.com/open?id=FILE_ID

where FILE_ID is the alphanumeric ID for your Drive file.

Enable Drive access

Accessing data hosted within Drive requires an additional OAuth scope. While not enabled by default, it can be included in the Google Cloud console, in the bq command-line tool, or via the API via the following mechanisms:

Console

Follow the web-based authentication steps when creating a permanent table in the Google Cloud console. When you are prompted, click Allow to give BigQuery Client Tools access to Drive.

gcloud

To enable Drive access:

  1. Enter the following command to ensure that you have the latest version of the Google Cloud CLI.

    gcloud components update
    
  2. Enter the following command to authenticate with Drive.

    gcloud auth login --enable-gdrive-access
    

API

If you are using the BigQuery API, request the OAuth scope for Drive in addition to the scope for BigQuery.

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

from google.cloud import bigquery
import google.auth

# Create credentials with Drive & BigQuery API scopes.
# Both APIs must be enabled for your project before running this code.
#
# If you are using credentials from gcloud, you must authorize the
# application first with the following command:
#
# gcloud auth application-default login \
#   --scopes=https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/cloud-platform
credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/cloud-platform",
    ]
)

# Construct a BigQuery client object.
client = bigquery.Client(credentials=credentials, project=project)

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

GoogleCredentials credentials =
    ServiceAccountCredentials.getApplicationDefault()
        .createScoped(
            ImmutableSet.of(
                "https://www.googleapis.com/auth/bigquery",
                "https://www.googleapis.com/auth/drive"));

// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests.
BigQuery bigquery =
    BigQueryOptions.newBuilder().setCredentials(credentials).build().getService();

Required roles

To create an external table, you need the bigquery.tables.create BigQuery Identity and Access Management (IAM) permission.

Each of the following predefined Identity and Access Management roles includes this permission:

  • BigQuery Data Editor (roles/bigquery.dataEditor)
  • BigQuery Data Owner (roles/bigquery.dataOwner)
  • BigQuery Admin (roles/bigquery.admin)

If you are not a principal in any of these roles, ask your administrator to grant you access or to create the external table for you.

For more information on Identity and Access Management roles and permissions in BigQuery, see Predefined roles and permissions.

Create external tables

You can create a permanent table linked to your external data source by:

To create an external table:

Console

  1. In the Google Cloud console, open the BigQuery page.

Go to BigQuery

  1. In the Explorer panel, expand your project and select a dataset.

  2. Expand the Actions option and click Open.

  3. In the details panel, click Create table .

  4. On the Create table page, in the Source section:

    • For Create table from, select Drive.

    • In the Select Drive URI field, enter the Drive URI. Note that wildcards are not supported for Drive URIs.

    • For File format, select the format of your data. Valid formats for Drive data include:

      • Comma-separated values (CSV)
      • Newline delimited JSON
      • Avro
      • Sheets
  5. (Optional) If you choose Sheets, in the Sheet range (Optional) box, specify the sheet and cell range to query. You can specify a sheet name, or you can specify sheet_name!top_left_cell_id:bottom_right_cell_id for a cell range; for example, "Sheet1!A1:B20". If Sheet range is not specified, the first sheet in the file is used.

  6. On the Create table page, in the Destination section:

    • For Dataset name, choose the appropriate dataset, and in the Table name field, enter the name of the table you're creating in BigQuery.

      Select dataset

    • Verify that Table type is set to External table.

  7. In the Schema section, enter the schema definition.

    • For JSON or CSV files, you can check the Auto-detect option to enable schema auto-detect. Auto-detect is not available for Datastore exports, Firestore exports, and Avro files. Schema information for these file types is automatically retrieved from the self-describing source data.
    • Enter schema information manually by:
      • Enabling Edit as text and entering the table schema as a JSON array. Note: You can view the schema of an existing table in JSON format by entering the following command in the bq command-line tool: bq show --format=prettyjson DATASET.TABLE.
      • Using Add field to manually input the schema.
  8. Click Create table.

  9. If necessary, select your account and then click Allow to give the BigQuery client tools access to Drive.

You can then run a query against the table as if it were a standard BigQuery table, subject to the limitations on external data sources.

After your query completes, you can download the results as CSV or JSON, save the results as a table, or save the results to Sheets. See Download, save, and export data for more information.

bq

You create a table in the bq command-line tool using the bq mk command. When you use the bq command-line tool to create a table linked to an external data source, you can identify the table's schema using:

  • A table definition file (stored on your local machine)
  • An inline schema definition
  • A JSON schema file (stored on your local machine)

To create a permanent table linked to your Drive data source using a table definition file, enter the following command.

bq mk \
--external_table_definition=DEFINITION_FILE \
DATASET.TABLE

Where:

  • DEFINITION_FILE is the path to the table definition file on your local machine.
  • DATASET is the name of the dataset that contains the table.
  • TABLE is the name of the table you're creating.

For example, the following command creates a permanent table named mytable using a table definition file named mytable_def.

bq mk --external_table_definition=/tmp/mytable_def mydataset.mytable

To create a permanent table linked to your external data source using an inline schema definition, enter the following command.

bq mk \
--external_table_definition=SCHEMA@SOURCE_FORMAT=DRIVE_URI \
DATASET.TABLE

Where:

  • SCHEMA is the schema definition in the format FIELD:DATA_TYPE,FIELD:DATA_TYPE.
  • SOURCE_FORMAT is CSV, NEWLINE_DELIMITED_JSON, AVRO, or GOOGLE_SHEETS.
  • DRIVE_URI is your Drive URI.
  • DATASET is the name of the dataset that contains the table.
  • TABLE is the name of the table you're creating.

For example, the following command creates a permanent table named sales linked to a Sheets file stored in Drive with the following schema definition: Region:STRING,Quarter:STRING,Total_sales:INTEGER.

bq mk \
--external_table_definition=Region:STRING,Quarter:STRING,Total_sales:INTEGER@GOOGLE_SHEETS=https://drive.google.com/open?id=1234_AbCD12abCd \
mydataset.sales

To create a permanent table linked to your external data source using a JSON schema file, enter the following command.

bq mk \
--external_table_definition=SCHEMA_FILE@SOURCE_FORMAT=DRIVE_URI \
DATASET.TABLE

Where:

  • SCHEMA_FILE is the path to the JSON schema file on your local machine.
  • SOURCE_FORMAT is CSV, NEWLINE_DELIMITED_JSON, AVRO, or GOOGLE_SHEETS.
  • DRIVE_URI is your Drive URI.
  • DATASET is the name of the dataset that contains the table.
  • TABLE is the name of the table you're creating.

For example, the following command creates a table named sales linked to a CSV file stored in Drive using the /tmp/sales_schema.json schema file.

bq mk \
--external_table_definition=/tmp/sales_schema.json@CSV=https://drive.google.com/open?id=1234_AbCD12abCd \
mydataset.sales

After the permanent table is created, you can then run a query against the table as if it were a standard BigQuery table, subject to the limitations on external data sources.

After your query completes, you can download the results as CSV or JSON, save the results as a table, or save the results to Sheets. See Download, save, and export data for more information.

API

Create an ExternalDataConfiguration when you use the tables.insert API method. Specify the schema property or set the autodetect property to true to enable schema auto detection for supported data sources.

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

from google.cloud import bigquery
import google.auth

# Create credentials with Drive & BigQuery API scopes.
# Both APIs must be enabled for your project before running this code.
#
# If you are using credentials from gcloud, you must authorize the
# application first with the following command:
#
# gcloud auth application-default login \
#   --scopes=https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/cloud-platform
credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/bigquery",
    ]
)

# Construct a BigQuery client object.
client = bigquery.Client(credentials=credentials, project=project)

# TODO(developer): Set dataset_id to the ID of the dataset to fetch.
# dataset_id = "your-project.your_dataset"

# Configure the external data source.
dataset = client.get_dataset(dataset_id)
table_id = "us_states"
schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("post_abbr", "STRING"),
]
table = bigquery.Table(dataset.table(table_id), schema=schema)
external_config = bigquery.ExternalConfig("GOOGLE_SHEETS")
# Use a shareable link or grant viewing access to the email address you
# used to authenticate with BigQuery (this example Sheet is public).
sheet_url = (
    "https://docs.google.com/spreadsheets"
    "/d/1i_QCL-7HcSyUZmIbP9E6lO_T5u3HnpLe7dnpHaijg_E/edit?usp=sharing"
)
external_config.source_uris = [sheet_url]
options = external_config.google_sheets_options
assert options is not None
options.skip_leading_rows = 1  # Optionally skip header row.
options.range = (
    "us-states!A20:B49"  # Optionally set range of the sheet to query from.
)
table.external_data_configuration = external_config

# Create a permanent table linked to the Sheets file.
table = client.create_table(table)  # Make an API request.

# Example query to find states starting with "W".
sql = 'SELECT * FROM `{}.{}` WHERE name LIKE "W%"'.format(dataset_id, table_id)

results = client.query_and_wait(sql)  # Make an API request.

# Wait for the query to complete.
w_states = list(results)
print(
    "There are {} states with names starting with W in the selected range.".format(
        len(w_states)
    )
)

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.auth.oauth2.GoogleCredentials;
import com.google.auth.oauth2.ServiceAccountCredentials;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.ExternalTableDefinition;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.GoogleSheetsOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;
import com.google.cloud.bigquery.TableResult;
import com.google.common.collect.ImmutableSet;
import java.io.IOException;

// Sample to queries an external data source using a permanent table
public class QueryExternalSheetsPerm {

  public static void main(String[] args) {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri =
        "https://docs.google.com/spreadsheets/d/1i_QCL-7HcSyUZmIbP9E6lO_T5u3HnpLe7dnpHaijg_E/edit?usp=sharing";
    Schema schema =
        Schema.of(
            Field.of("name", StandardSQLTypeName.STRING),
            Field.of("post_abbr", StandardSQLTypeName.STRING));
    String query =
        String.format("SELECT * FROM %s.%s WHERE name LIKE 'W%%'", datasetName, tableName);
    queryExternalSheetsPerm(datasetName, tableName, sourceUri, schema, query);
  }

  public static void queryExternalSheetsPerm(
      String datasetName, String tableName, String sourceUri, Schema schema, String query) {
    try {

      // Create credentials with Drive & BigQuery API scopes.
      // Both APIs must be enabled for your project before running this code.
      GoogleCredentials credentials =
          ServiceAccountCredentials.getApplicationDefault()
              .createScoped(
                  ImmutableSet.of(
                      "https://www.googleapis.com/auth/bigquery",
                      "https://www.googleapis.com/auth/drive"));

      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery =
          BigQueryOptions.newBuilder().setCredentials(credentials).build().getService();

      // Skip header row in the file.
      GoogleSheetsOptions sheetsOptions =
          GoogleSheetsOptions.newBuilder()
              .setSkipLeadingRows(1) // Optionally skip header row.
              .setRange("us-states!A20:B49") // Optionally set range of the sheet to query from.
              .build();

      TableId tableId = TableId.of(datasetName, tableName);
      // Create a permanent table linked to the Sheets file.
      ExternalTableDefinition externalTable =
          ExternalTableDefinition.newBuilder(sourceUri, sheetsOptions).setSchema(schema).build();
      bigquery.create(TableInfo.of(tableId, externalTable));

      // Example query to find states starting with 'W'
      TableResult results = bigquery.query(QueryJobConfiguration.of(query));

      results
          .iterateAll()
          .forEach(row -> row.forEach(val -> System.out.printf("%s,", val.toString())));

      System.out.println("Query on external permanent table performed successfully.");
    } catch (BigQueryException | InterruptedException | IOException e) {
      System.out.println("Query not performed \n" + e.toString());
    }
  }
}

Query external tables

For more information, see Query Drive data.

The _FILE_NAME pseudo-column

Tables based on external data sources provide a pseudo column named _FILE_NAME. This column contains the fully qualified path to the file to which the row belongs. This column is available only for tables that reference external data stored in Cloud Storage and Google Drive.

The _FILE_NAME column name is reserved, which means that you cannot create a column by that name in any of your tables.