Querying Google Drive Data

This page describes how to use BigQuery to query data stored in Google Drive.

BigQuery supports queries against both personal Google Drive files and Team Drive files. For more information on Google Drive, see the GSuite Learning Center.

You can query files in Google Drive in the following formats:

To directly query a Google Drive external data source, provide the Google Drive URI path to your data and create an external table that references that data source. The table used to reference the Google Drive data source can be a permanent table or a temporary table.

Retrieving the Google Drive URI

To create an external table for a Google Drive data source, you must provide the Google Drive URI. To retrieve the Google Drive URI:

  1. Go to Google Drive.

  2. Right-click your file and select Get shareable link. The URI should look like this: https://drive.google.com/open?id=[FILE_ID].

    Where:

    • [FILE_ID] is the alphanumeric id for your Google Drive file

Enabling Google Drive access

Web UI


Follow the web-based authentication steps when creating a permanent table in the web UI. When you are prompted, click Allow to give BigQuery Client Tools access to Google Drive.

CLI


To enable Google Drive access:

  1. Enter the following command to ensure that you have the latest version of the command-line tool.

    gcloud components update
    
  2. Enter the following command to authenticate with Google Drive.

    gcloud auth login --enable-gdrive-access
    

API


If you are using the BigQuery API, request the OAuth scope for Google Drive in addition to the scope for BigQuery.

Access controls and scopes

Access controls for permanent external tables

You can share access to a permanent external table linked to a Google Drive data source. You can share access with users (including service accounts) or groups. To query the external table, your users or groups need (at a minimum):

  • READER or bigquery.dataViewer access to the dataset containing the external table
  • bigquery.user access to the project containing the dataset (in order to run query jobs)
  • Can view access to the Google Drive file linked to the external table

Scopes for Compute Engine instances

When you create a Compute Engine instance, you can specify a list of scopes for the instance. The scopes control the instance's access to Google Cloud Platform products, including Google Drive. Applications running on the VM use the service account to call Google Cloud APIs.

If you set up a Compute Engine instance to run as a service account, and that service account accesses an external table linked to a Google Drive data source, you must add the OAuth scope for Google Drive (https://www.googleapis.com/auth/drive) to the instance.

For information on applying scopes to a Compute Engine instance, see Changing the service account and access scopes for an instance. For more information on Compute Engine service accounts, see Service Accounts.

Permanent versus temporary external tables

You can query an external data source in BigQuery using a permanent table or a temporary table. When you use a permanent table, you create a table in a BigQuery dataset that is linked to your external data source. Because the table is permanent, you can use dataset-level access controls to share the table with others who also have access to the underlying external data source, and you can query the table at any time.

When you query an external data source using a temporary table, you submit a command that includes a query and creates a non-permanent table linked to the external data source. When you use a temporary table, you do not create a table in one of your BigQuery datasets. Because the table is not permanently stored in a dataset, it cannot be shared with others. Querying an external data source using a temporary table is useful for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL) processes.

Querying Google Drive data using permanent external tables

To query an external data source using a permanent table, you create a table in a BigQuery dataset that is linked to your external data source. The data is not stored in the BigQuery table. Because the table is permanent, you can use dataset-level access controls to share the table with others who also have access to the underlying external data source.

There are three ways to specify schema information when you create a permanent external table in BigQuery:

  • If you are using the API to create a permanent external table, you begin by creating a table definition file that defines the schema and metadata for the external data source. When you create a table definition file, you can enable schema auto-detection for supported data sources.
  • If you are using the CLI to create a permanent external table, you can use a table definition file, you can create and use your own schema file, or you can enter the schema inline (on the command line).
  • If you are using the web UI to create a permanent external table, you can enter the table schema manually or use schema auto-detection for supported data sources.

To query Google Drive data using a permanent external table, you:

  • Create a table definition file (for the API and optionally for the CLI)
  • Create a table in BigQuery linked to the external data source
  • Query the table linked to the external data source

Creating a permanent external table

You can create a permanent table linked to your external data source using the web UI, the CLI, or the API.

Web UI

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. In the navigation panel, hover on a dataset, click the down arrow icon down arrow icon image, and click Create new table.

  3. On the Create Table page, in the Source Data section:

    • For Location, select Google Drive and in the source field, enter the Google Drive URI. Note that wildcards are not supported for Google Drive URIs.
    • For File format, select the format of your data. Valid formats for Google Drive data include:

      • Comma-separated values (CSV)
      • JSON (newline delimited)
      • Avro
      • Google Sheets (first tab only)
  4. On the Create Table page, in the Destination Table section:

    • For Table name, choose the appropriate dataset, and in the table name field, enter the name of the permanent table you're creating in BigQuery.
    • Verify that Table type is set to External table.
  5. In the Schema section, enter the schema information.

    • For JSON or CSV files, you can check the Auto-detect option to enable schema auto-detect. Schema auto-detection in the web UI is not currently supported for Google Sheets (though it is available using the CLI and API). Also, Auto-detect is not available for Avro data sources. Schema information is automatically retrieved from Avro files.

    • For CSV, JSON, or Google Sheets files, you can enter schema information manually by:

      • Clicking Edit as text and entering the table schema in JSON format
      • Using Add Field to manually input the schema
  6. Select applicable items in the Options section and then click Create Table.

  7. Select your account and then click Allow to give the BigQuery client tools access to Google Drive.

    Client tools access dialog

You can then run a query against the table as if it were a native BigQuery table, subject to the limitations on external data sources.

After your query completes, you can download the results as CSV or JSON, save the results as a table, or save the results to Google Sheets. See Download, save, and export data for more information.

CLI

You create a table in the BigQuery command-line tool using the bq mk command. When you use the CLI to create a table linked to an external data source, you can identify the table's schema using:

  • A table definition file (stored on your local machine)
  • An inline schema definition
  • A JSON schema file (stored on your local machine)

To create a permanent table linked to your Google Drive data source using a table definition file, enter the following command.

bq mk --external_table_definition=[DEFINITION_FILE] [DATASET_ID].[TABLE_NAME]

Where:

  • [DEFINITION_FILE] is the path to the table definition file on your local machine
  • [DATASET_ID] is the name of the dataset that contains the table
  • [TABLE_NAME] is the name of the table you're creating

For example, the following command creates a permanent table named mytable using a table definition file named mytable_def.

bq mk --external_table_definition=/tmp/mytable_def mydataset.mytable

To create a permanent table linked to your external data source using an inline schema definition, enter the following command.

bq mk --external_table_definition=[SCHEMA]@[SOURCE_FORMAT]=[DRIVE_URI] [DATASET_ID].[TABLE_NAME]

Where:

  • [SCHEMA] is the schema definition (in the format [FIELD]:[DATA_TYPE], [FIELD]:[DATA_TYPE])
  • [SOURCE_FORMAT] is CSV, NEWLINE_DELIMITED_JSON, AVRO, or GOOGLE_SHEETS
  • [DRIVE_URI] is your Google Drive URI
  • [DATASET_ID] is the name of the dataset that contains the table
  • [TABLE_NAME] is the name of the table you're creating

For example, the following command creates a permanent table named sales linked to a Google Sheets file stored in Google Drive with the following schema definition: Region:STRING,Quarter:STRING,Total_sales:INTEGER.

bq mk --external_table_definition=Region:STRING,Quarter:STRING,Total_sales:INTEGER@GOOGLE_SHEETS=https://drive.google.com/open?id=1234_AbCD12abCd mydataset.sales

To create a permanent table linked to your external data source using a JSON schema file, enter the following command.

bq mk --external_table_definition=[SCHEMA_FILE]@[SOURCE_FORMAT]=[DRIVE_URI] [DATASET_ID].[TABLE_NAME]

Where:

  • [SCHEMA_FILE] is the path to the JSON schema file on your local machine
  • [SOURCE_FORMAT] is CSV, NEWLINE_DELIMITED_JSON, AVRO, or GOOGLE_SHEETS
  • [DRIVE_URI] is your Google Drive URI
  • [DATASET_ID] is the name of the dataset that contains the table
  • [TABLE_NAME] is the name of the table you're creating

For example, the following command creates a table named sales linked to a CSV file stored in Google Drive using the /tmp/sales_schema.json schema file.

bq mk --external_table_definition=/tmp/sales_schema.json@CSV=https://drive.google.com/open?id=1234_AbCD12abCd mydataset.sales

After the permanent table is created, you can then run a query against the table as if it were a native BigQuery table, subject to the limitations on external data sources.

After your query completes, you can download the results as CSV or JSON, save the results as a table, or save the results to Google Sheets. See Download, save, and export data for more information.

API

Specify the external data source using the externalDataConfiguration properties.

Querying Google Drive data using temporary tables

To query an external data source without creating a permanent table, you run a command to combine:

  • A table definition file with a query
  • An inline schema definition with a query
  • A JSON schema definition file with a query

The table definition file or supplied schema is used to create the temporary external table, and the query runs against the temporary external table. Querying an external data source using a temporary table is supported by the BigQuery CLI and API.

When you use a temporary external table, you do not create a table in one of your BigQuery datasets. Because the table is not permanently stored in a dataset, it cannot be shared with others. Querying an external data source using a temporary table is useful for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL) processes.

Creating and querying a temporary table

You can create and query a temporary table linked to an external data source using the CLI or the API.

CLI

You query a temporary table linked to an external data source using the bq query command with the --external_table_definition flag. When you use the CLI to query a temporary table linked to an external data source, you can identify the table's schema using:

  • A table definition file (stored on your local machine)
  • An inline schema definition
  • A JSON schema file (stored on your local machine)

To query a temporary table linked to your external data source using a table definition file, enter the following command.

bq query --external_table_definition=[TABLE_NAME]::[DEFINITION_FILE] '[QUERY]'

Where:

  • [TABLE_NAME] is the name of the temporary table you're creating
  • [DEFINITION_FILE] is the path to the table definition file on your local machine
  • [QUERY] is the query you're submitting to the temporary table

For example, the following command creates and queries a temporary table named sales using a table definition file named sales_def.

bq query --external_table_definition=sales::sales_def 'SELECT Region,Total_sales FROM sales;'

To query a temporary table linked to your external data source using an inline schema definition, enter the following command.

bq query --external_table_definition=[TABLE_NAME]::[SCHEMA]@[SOURCE_FORMAT]=[DRIVE_URI] '[QUERY]'

Where:

  • [TABLE_NAME] is the name of the temporary table you're creating
  • [SCHEMA] is the inline schema definition (in the format [FIELD]:[DATA_TYPE],[FIELD]:[DATA_TYPE])
  • [SOURCE_FORMAT] is CSV, NEWLINE_DELIMITED_JSON, AVRO, or GOOGLE_SHEETS
  • [DRIVE_URI] is your Google Drive URI
  • [QUERY] is the query you're submitting to the temporary table

For example, the following command creates and queries a temporary table named sales linked to a CSV file stored in Google Drive with the following schema definition: Region:STRING,Quarter:STRING,Total_sales:INTEGER.

bq query --external_table_definition=sales::Region:STRING,Quarter:STRING,Total_sales:INTEGER@CSV=https://drive.google.com/open?id=1234_AbCD12abCd 'SELECT Region,Total_sales FROM sales;'

To query a temporary table linked to your external data source using a JSON schema file, enter the following command.

bq query --external_table_definition=[SCHEMA_FILE]@[SOURCE_FORMAT]=[DRIVE_URI] '[QUERY]'

Where:

  • [SCHEMA_FILE] is the path to the JSON schema file on your local machine
  • [SOURCE_FORMAT] is CSV, NEWLINE_DELIMITED_JSON, AVRO, or GOOGLE_SHEETS
  • [DRIVE_URI] is your Google Drive URI
  • [QUERY] is the query you're submitting to the temporary table

For example, the following command creates and queries a temporary table named sales linked to a CSV file stored in Google Drive using the /tmp/sales_schema.json schema file.

bq query --external_table_definition=sales::/tmp/sales_schema.json@CSV=https://drive.google.com/open?id=1234_AbCD12abCd 'SELECT Total_sales FROM sales;'

API

The _FILE_NAME pseudo column

Tables based on external data sources provide a pseudo column named _FILE_NAME. This column contains the fully qualified path to the file to which the row belongs. This column is available only for tables that reference external data stored in Google Cloud Storage and Google Drive.

The _FILE_NAME column name is reserved, which means that you cannot create a column by that name in any of your tables. To select the value of _FILE_NAME, you must use an alias. The following example query demonstrates selecting _FILE_NAME by assigning the alias fn to the pseudo column.

bq query --project_id=[PROJECT_ID] --use_legacy_sql=false 'SELECT name, _FILE_NAME AS fn from [DATASET].[TABLE_NAME] where name contains "Alex";' 

Where:

  • [PROJECT_ID] is a valid project ID (this flag is not required if you use Cloud Shell or if you set a default project in the Cloud SDK)
  • [DATASET] is the name of the dataset that stores the permanent external table
  • [TABLE_NAME] is the name of the permanent external table

Send feedback about...