Creating a Table Definition File for an External Data Source

This page describes how to create a table definition file for an external data source. An external data source (also known as a federated data source) is a data source that you can query directly even though the data is not stored in BigQuery.

Table definition files

A table definition file contains an external table's schema definition and metadata, such as the table's data format and related properties. When you create a table definition file, you can use schema auto-detection to define the schema for an external data source, you can provide the schema inline (on the command line), or you can provide a JSON file containing the schema definition.

Table definition files are used with the BigQuery CLI and API. You do not use table definition files when you create an external table using the web UI.

You can create table definition files for the following external data sources:

  • Google Cloud Storage

    • Comma-separated values (CSV)
    • JSON ( newline-delimited )
    • Avro files
    • Google Cloud Datastore backup files
  • Google Drive

    • Comma-separated values (CSV)
    • JSON (newline-delimited)
    • Avro files
    • Google Sheets (first tab only)
  • Google Cloud Bigtable

Before you begin

To create a table definition file, you need the URI for your data source:

Permanent versus temporary external tables

You can query an external data source in BigQuery using a permanent table or a temporary table. When you use a permanent table, you create a table in a BigQuery dataset that is linked to your external data source. Because the table is permanent, you can use dataset-level access controls to share the table with others who also have access to the underlying external data source, and you can query the table at any time.

When you query an external data source using a temporary table, you submit a command that includes a query and creates a non-permanent table linked to the external data source. When you use a temporary table, you do not create a table in one of your BigQuery datasets. Because the table is not permanently stored in a dataset, it cannot be shared with others. Querying an external data source using a temporary table is useful for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL) processes.

You can use a table definition file to describe a permanent or temporary external table.

Creating a table definition using schema auto-detect

If you specify a CSV, JSON, or Google Sheets file without including an inline schema description or a schema file, you can use the --autodetect flag to set the "autodetect" option to true in the table definition file. When auto-detect is enabled, BigQuery makes a best-effort attempt to automatically infer the schema. See Schema Auto-Detection in BigQuery for more information.

You can use a schema auto-detect when creating table definitions for:

  • JSON files stored in Google Cloud Storage or Google Drive
  • CSV files stored in Google Cloud Storage or Google Drive
  • Google Sheets files stored in Google Drive

JSON table definitions

You can create a table definition with auto-detect enabled for newline-delimited JSON files stored in Google Cloud Storage or Google Drive.

Google Cloud Storage

CLI

  1. Use the command-line tool's mkdef command with the --autodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [BUCKET_URI] with your Cloud Storage URI, for example, gs://mybucket/myfile.

    bq mkdef --autodetect --source_format=NEWLINE_DELIMITED_JSON "[BUCKET_URI]" > /tmp/[FILE_NAME]
    
  2. (Optional) Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is set to true.

    {
      "autodetect": true,
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "sourceUris": [
        "[BUCKET_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. There are no configuration settings specific to JSON source files, but many of the general settings apply to JSON files. For more information, see configuration.query.tableDefinitions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference. For information on schema auto-detect, see configuration.query.tableDefinitions.(key).autodetect.

Google Drive

CLI

  1. Use the command-line tool's mkdef command with the --autodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [DRIVE_URI] with your Google Drive URI, for example, https://drive.google.com/open?id=123ABCD123AbcD123Abcd.

    bq mkdef --autodetect --source_format=NEWLINE_DELIMITED_JSON "[DRIVE_URI]" > /tmp/[FILE_NAME]
    
  2. Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is set to true.

    {
      "autodetect": true,
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "sourceUris": [
        "[DRIVE_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. There are no configuration settings specific to JSON source files, but many of the general settings apply to JSON files. For more information, see configuration.query.tableDefinitions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference. For information on schema auto-detect, see configuration.query.tableDefinitions.(key).autodetect in the API Reference.

CSV table definitions

You can create a table definition with auto-detect enabled for CSV files stored in Google Cloud Storage or Google Drive.

Google Cloud Storage

CLI

  1. Use the command-line tool's mkdef command with the --autodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [BUCKET_URI] with your Cloud Storage URI, for example, gs://mybucket/myfile.

    bq mkdef --autodetect --source_format=CSV "[BUCKET_URI]" > /tmp/[FILE_NAME]
    
  2. (Optional) Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is set to true.

    {
      "autodetect": true,
      "csvOptions": {
        "allowJaggedRows": false,
        "allowQuotedNewlines": false,
        "encoding": "UTF-8",
        "fieldDelimiter": ",",
        "quote": "\"",
        "skipLeadingRows": 0
      },
      "sourceFormat": "CSV",
      "sourceUris": [
        "[BUCKET_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. For example, if your file contains a header row, you would change "skipLeadingRows" to 1.

    There are several settings specific to CSV files such as "allowJaggedRows". For more information, see configuration.query.tableDefinitions.(key).csvOptions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference. For information on schema auto-detect options, see configuration.query.tableDefinitions.(key).autodetect. For information on CSV options, see configuration.query.tableDefinitions.(key).csvOptions.

Google Drive

CLI

  1. Use the command-line tool's mkdef command with the --autodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [DRIVE_URI] with your Google Drive URI, for example, https://drive.google.com/open?id=123ABCD123AbcD123Abcd.

    bq mkdef --autodetect --source_format=CSV "[DRIVE_URI]" > /tmp/[FILE_NAME]
    
  2. (Optional) Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is set to true.

    {
      "autodetect": true,
      "csvOptions": {
        "allowJaggedRows": false,
        "allowQuotedNewlines": false,
        "encoding": "UTF-8",
        "fieldDelimiter": ",",
        "quote": "\"",
        "skipLeadingRows": 0
      },
      "sourceFormat": "CSV",
      "sourceUris": [
        "[DRIVE_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. For example, if your file contains a header row, you would change "skipLeadingRows" to 1.

    There are several settings specific to CSV files such as "allowJaggedRows". For more information, see configuration.query.tableDefinitions.(key).csvOptions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference. For information on schema auto-detect options, see configuration.query.tableDefinitions.(key).autodetect in the API Reference. For information on CSV options, see configuration.query.tableDefinitions.(key).csvOptions in the API Reference.

Google Sheets table definitions

You can create a table definition with auto-detect enabled for Google Sheets files (first tab only) stored in Google Drive.

CLI

  1. Use the command-line tool's mkdef command with the --autodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [DRIVE_URI] with your Google Drive URI, for example, https://drive.google.com/open?id=123ABCD123AbcD123Abcd.

    bq mkdef --autodetect --source_format=GOOGLE_SHEETS "[DRIVE_URI]" > /tmp/[FILE_NAME]
    
  2. Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is set to true.

    {
      "autodetect": true,
      "sourceFormat": "GOOGLE_SHEETS",
      "sourceUris": [
        "[DRIVE_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. For information on settings specific to Google Sheets files, see configuration.query.tableDefinitions.(key).googleSheetsOptions in the API Reference. Also, many of the common settings apply to Google Sheets files. For more information, see configuration.query.tableDefinitions in the API Reference. For information on schema auto-detect options, see configuration.query.tableDefinitions.(key).autodetect in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference. For information on schema auto-detect options, see configuration.query.tableDefinitions.(key).autodetect in the API Reference. For information on settings specific to Google Sheets files, see configuration.query.tableDefinitions.(key).googleSheetsOptions.

Creating a table definition using an inline schema

If you do not want to use schema auto-detect, you can create a table definition file by providing an inline schema definition. To provide an inline schema definition, list the fields and data types on the command line in the following format: [FIELD]:[DATA_TYPE], [FIELD]:[DATA_TYPE].

You can use an inline schema definition when creating table definition files for:

  • JSON files stored in Google Cloud Storage or Google Drive
  • CSV files stored in Google Cloud Storage or Google Drive
  • Google Sheets files stored in Google Drive

JSON table definitions

You can create a table definition using an inline schema for newline-delimited JSON files stored in Google Cloud Storage or Google Drive.

Google Cloud Storage

CLI

  1. Use the command-line tool's mkdef command with the --noautodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [BUCKET_URI] with your Cloud Storage URI, for example, gs://mybucket/myfile. Replace [FIELD]:[DATA_TYPE], [FIELD]:[DATA_TYPE] with your schema information, for example, Name:STRING,Address:STRING, and so on.

    bq mkdef --noautodetect --source_format=NEWLINE_DELIMITED_JSON "[BUCKET_URI]" [FIELD]:[DATA_TYPE], [FIELD]:[DATA_TYPE] > /tmp/[FILE_NAME]
    
  2. (Optional) Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is not enabled, and the schema information is written to the table definition file.

    {
      "schema": {
        "fields": [
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          },
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          }
          ...
        ]
      },
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "sourceUris": [
        "[BUCKET_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. There are no configuration settings specific to JSON source files, but many of the general settings apply to JSON files. For more information, see configuration.query.tableDefinitions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference.

Google Drive

CLI

  1. Use the command-line tool's mkdef command with the --noautodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [DRIVE_URI] with your Google Drive URI, for example, https://drive.google.com/open?id=123ABCD123AbcD123Abcd. Replace [FIELD]:[DATA_TYPE], [FIELD]:[DATA_TYPE] with your schema information, for example, Name:STRING,Address:STRING, and so on.

    bq mkdef --noautodetect --source_format=NEWLINE_DELIMITED_JSON "[DRIVE_URI]" [FIELD]:[DATA_TYPE], [FIELD]:[DATA_TYPE] > /tmp/[FILE_NAME]
    
  2. Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is not enabled, and the schema information is written to the table definition file.

    {
      "schema": {
        "fields": [
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          },
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          }
          ...
        ]
      },
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "sourceUris": [
        "[DRIVE_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. There are no configuration settings specific to JSON source files, but many of the general settings apply to JSON files. For more information, see configuration.query.tableDefinitions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference.

CSV table definitions

You can create a table definition using an inline schema for CSV files stored in Google Cloud Storage or Google Drive.

Google Cloud Storage

CLI

  1. Use the command-line tool's mkdef command with the --noautodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [BUCKET_URI] with your Cloud Storage URI, for example, gs://mybucket/myfile. Replace [FIELD]:[DATA_TYPE], [FIELD]:[DATA_TYPE] with your schema information, for example, Name:STRING,Address:STRING, and so on.

    bq mkdef --noautodetect --source_format=CSV "[BUCKET_URI]" [FIELD]:[DATA_TYPE], [FIELD]:[DATA_TYPE] > /tmp/[FILE_NAME]
    
  2. (Optional) Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is not enabled, and the schema information is written to the table definition file.

    {
      "csvOptions": {
        "allowJaggedRows": false,
        "allowQuotedNewlines": false,
        "encoding": "UTF-8",
        "fieldDelimiter": ",",
        "quote": "\"",
        "skipLeadingRows": 0
      },
      "schema": {
        "fields": [
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          },
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          }
          ...
        ]
      },
      "sourceFormat": "CSV",
      "sourceUris": [
        "[BUCKET_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. For example, if your file contains a header row, you would change "skipLeadingRows" to 1.

    There are several settings specific to CSV files such as "allowJaggedRows". For more information, see configuration.query.tableDefinitions.(key).csvOptions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference. For information on settings specific to CSV files, see configuration.query.tableDefinitions.(key).csvOptions in the API Reference.

Google Drive

CLI

  1. Use the command-line tool's mkdef command with the --noautodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [DRIVE_URI] with your Google Drive URI, for example, https://drive.google.com/open?id=123ABCD123AbcD123Abcd. Replace [FIELD]:[DATA_TYPE], [FIELD]:[DATA_TYPE] with your schema information, for example, Name:STRING,Address:STRING, and so on.

    bq mkdef --noautodetect --source_format=CSV "[DRIVE_URI]" [FIELD]:[DATA_TYPE], [FIELD]:[DATA_TYPE] > /tmp/[FILE_NAME]
    
  2. (Optional) Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is not enabled, and the schema information is written to the table definition file.

    {
      "csvOptions": {
        "allowJaggedRows": false,
        "allowQuotedNewlines": false,
        "encoding": "UTF-8",
        "fieldDelimiter": ",",
        "quote": "\"",
        "skipLeadingRows": 0
      },
      "schema": {
        "fields": [
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          },
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          }
          ...
        ]
      },
      "sourceFormat": "CSV",
      "sourceUris": [
        "[DRIVE_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. For example, if your file contains a header row, you would change "skipLeadingRows" to 1.

    There are several settings specific to CSV files such as "allowJaggedRows". For more information, see configuration.query.tableDefinitions.(key).csvOptions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference. For information on settings specific to CSV files, see configuration.query.tableDefinitions.(key).csvOptions in the API Reference.

Google Sheets table definitions

You can create a table definition using an inline schema for Google Sheets files (first tab only) stored in Google Drive.

CLI

  1. Use the command-line tool's mkdef command with the --noautodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [DRIVE_URI] with your Google Drive URI, for example, https://drive.google.com/open?id=123ABCD123AbcD123Abcd. Replace [FIELD]:[DATA_TYPE], [FIELD]:[DATA_TYPE] with your schema information, for example, Name:STRING,Address:STRING, and so on.

    bq mkdef --noautodetect --source_format=GOOGLE_SHEETS "[DRIVE_URI]" [FIELD]:[DATA_TYPE], [FIELD]:[DATA_TYPE] > /tmp/[FILE_NAME]
    
  2. Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is not enabled, and the schema information is written to the table definition file.

    {
      "googleSheetsOptions": {
        "skipLeadingRows": 0
      },
      "schema": {
        "fields": [
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          },
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          },
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          }
        ]
      },
      "sourceFormat": "GOOGLE_SHEETS",
      "sourceUris": [
        "[DRIVE_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. For information on settings specific to Google Sheets files, see configuration.query.tableDefinitions.sheetsOptions in the API Reference. Also, many of the common settings apply to Google Sheets files. For more information, see configuration.query.tableDefinitions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference. For information on settings specific to Google Sheets files, see configuration.query.tableDefinitions.sheetsOptions in the API Reference.

Creating a table definition using a JSON schema file

If you do not want to use auto-detect or provide an inline schema definition, you can create a JSON schema file and reference it when creating your table definition file. You must create the JSON schema file manually, and it must be on your local machine. Referencing a JSON schema file in Google Cloud Storage or Google Drive is not supported.

You can use a JSON schema file when creating table definitions for:

  • JSON files stored in Google Cloud Storage or Google Drive
  • CSV files stored in Google Cloud Storage or Google Drive
  • Google Sheets files stored in Google Drive

JSON table definitions

You can create a table definition using a JSON schema file for newline-delimited JSON files stored in Google Cloud Storage or Google Drive.

Google Cloud Storage

CLI

  1. Use the command-line tool's mkdef command with the --noautodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file. Replace [BUCKET_URI] with your Cloud Storage URI, for example, gs://mybucket/myfile. Replace [PATH_TO_SCHEMA] with the location of the JSON schema file on your local machine.

    bq mkdef --noautodetect --source_format=NEWLINE_DELIMITED_JSON "[BUCKET_URI]" [PATH_TO_SCHEMA] > /tmp/[FILE_NAME]
    
  2. (Optional) Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is not enabled, and the schema information is written to the table definition file.

    {
      "schema": {
        "fields": [
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          },
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          }
          ...
        ]
      },
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "sourceUris": [
        "[BUCKET_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. There are no configuration settings specific to JSON source files, but many of the general settings apply to JSON files. For more information, see configuration.query.tableDefinitions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference.

Google Drive

CLI

  1. Use the command-line tool's mkdef command with the --noautodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [DRIVE_URI] with your Google Drive URI, for example, https://drive.google.com/open?id=123ABCD123AbcD123Abcd. Replace [PATH_TO_SCHEMA] with the location of the JSON schema file on your local machine.

    bq mkdef --noautodetect --source_format=NEWLINE_DELIMITED_JSON "[DRIVE_URI]" [PATH_TO_SCHEMA] > /tmp/[FILE_NAME]
    
  2. Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is not enabled, and the schema information is written to the table definition file.

    {
      "schema": {
        "fields": [
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          },
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          }
          ...
        ]
      },
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "sourceUris": [
        "[DRIVE_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. There are no configuration settings specific to JSON source files, but many of the general settings apply to JSON files. For more information, see configuration.query.tableDefinitions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference.

CSV table definitions

You can create a table definition using a JSON schema file for CSV files stored in Google Cloud Storage or Google Drive.

Google Cloud Storage

CLI

  1. Use the command-line tool's mkdef command with the --noautodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [BUCKET_URI] with your Cloud Storage URI, for example, gs://mybucket/myfile. Replace [PATH_TO_SCHEMA] with the location of the JSON schema file on your local machine.

    bq mkdef --noautodetect --source_format=CSV "[BUCKET_URI]" [PATH_TO_SCHEMA] > /tmp/[FILE_NAME]
    
  2. (Optional) Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is not enabled, and the schema information is written to the table definition file.

    {
      "csvOptions": {
        "allowJaggedRows": false,
        "allowQuotedNewlines": false,
        "encoding": "UTF-8",
        "fieldDelimiter": ",",
        "quote": "\"",
        "skipLeadingRows": 0
      },
      "schema": {
        "fields": [
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          },
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          }
          ...
        ]
      },
      "sourceFormat": "CSV",
      "sourceUris": [
        "[BUCKET_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. For example, if your file contains a header row, you would change "skipLeadingRows" to 1.

    There are several settings specific to CSV files such as "allowJaggedRows". For more information, see configuration.query.tableDefinitions.(key).csvOptions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference. For information on settings specific to CSV files, see configuration.query.tableDefinitions.(key).csvOptions in the API Reference.

Google Drive

CLI

  1. Use the command-line tool's mkdef command with the --noautodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [DRIVE_URI] with your Google Drive URI, for example, https://drive.google.com/open?id=123ABCD123AbcD123Abcd. Replace [PATH_TO_SCHEMA] with the location of the JSON schema file on your local machine.

    bq mkdef --noautodetect --source_format=CSV "[DRIVE_URI]" [PATH_TO_SCHEMA] > /tmp/[FILE_NAME]
    
  2. (Optional) Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is not enabled, and the schema information is written to the table definition file.

    {
      "csvOptions": {
        "allowJaggedRows": false,
        "allowQuotedNewlines": false,
        "encoding": "UTF-8",
        "fieldDelimiter": ",",
        "quote": "\"",
        "skipLeadingRows": 0
      },
      "schema": {
        "fields": [
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          },
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          }
          ...
        ]
      },
      "sourceFormat": "CSV",
      "sourceUris": [
        "[DRIVE_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. For example, if your file contains a header row, you would change "skipLeadingRows" to 1.

    There are several settings specific to CSV files such as "allowJaggedRows". For more information, see configuration.query.tableDefinitions.(key).csvOptions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference. For information on settings specific to CSV files, see configuration.query.tableDefinitions.(key).csvOptions in the API Reference.

Google Sheets table definitions

You can create a table definition using a JSON schema file for Google Sheets files (first tab only) stored in Google Drive.

CLI

  1. Use the command-line tool's mkdef command with the --noautodetect flag to create a table definition. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [DRIVE_URI] with your Google Drive URI, for example, https://drive.google.com/open?id=123ABCD123AbcD123Abcd. Replace [PATH_TO_SCHEMA] with the location of the JSON schema file on your local machine.

    bq mkdef --noautodetect --source_format=GOOGLE_SHEETS "[DRIVE_URI]" [PATH_TO_SCHEMA] > /tmp/[FILE_NAME]
    
  2. Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is not enabled, and the schema information is written to the table definition file.

    {
      "googleSheetsOptions": {
        "skipLeadingRows": 0
      },
      "schema": {
        "fields": [
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          },
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          },
          {
            "name": "[FIELD]",
            "type": "[DATA_TYPE]"
          }
        ]
      },
      "sourceFormat": "GOOGLE_SHEETS",
      "sourceUris": [
        "[DRIVE_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. For information on settings specific to Google Sheets files, see configuration.query.tableDefinitions.sheetsOptions in the API Reference. Also, many of the common settings apply to Google Sheets files. For more information, see configuration.query.tableDefinitions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference.

Creating Avro table definitions

If you use an Avro file as an external data source, BigQuery automatially retrieves the schema using the source data. When creating a table definition for Avro files, you do not need to use schema auto-detect, and you do not need to provide an inline schema definiton or schema file.

You can create a table definition file for Avro data stored in Google Cloud Storage or Google Drive.

Google Cloud Storage

CLI

  1. Use the command-line tool's mkdef command to create a table definition. You do not need to use the --noautodetect flag with Avro files. Schema auto-detect is disabled for Avro files. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [BUCKET_URI] with your Cloud Storage URI.

    bq mkdef --source_format=AVRO "[BUCKET_URI]" > /tmp/[FILE_NAME]
    
  2. (Optional) Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice there is no need for the "autodetect" setting.

    {
      "sourceFormat": "AVRO",
      "sourceUris": [
        "[BUCKET_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. There are no configuration settings specific to Avro source files, but many of the general settings apply to Avro files. For more information, see configuration.query.tableDefinitions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference.

Google Drive

CLI

  1. Use the command-line tool's mkdef command to create a table definition. You do not need to use the --noautodetect flag with Avro files. Schema auto-detect is disabled for Avro files. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [DRIVE_URI] with your Google Drive URI.

    bq mkdef --source_format=AVRO "[DRIVE_URI]" > /tmp/[FILE_NAME]
    
  2. Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice "autodetect" is set to true.

    {
      "autodetect": true,
      "sourceFormat": "AVRO",
      "sourceUris": [
        "[DRIVE_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. There are no configuration settings specific to Avro source files, but many of the general settings apply to Avro files. For more information, see configuration.query.tableDefinitions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference.

Creating Cloud Datastore backup table definitions

If you use a Google Cloud Datastore backup as an external data source, BigQuery automatially retrieves the schema using the source data. When creating a table definition for Cloud Datastore backup files, you do not need to use schema auto-detect, and you do not need to provide an inline schema definiton or schema file.

You can create a table definition file for Google Cloud Datastore backup data stored in Google Cloud Storage.

CLI

  1. Use the command-line tool's mkdef command to create a table definition. You do not need to use the --noautodetect flag with Cloud Datastore backup files. Schema auto-detect is disabled for Cloud Datastore backup files. The mkdef command generates a table definition file in JSON format. The following example creates a table definition and writes the output to a file - /tmp/[FILE_NAME]. Replace [FILE_NAME] with the name of your table definition file, and replace [BUCKET_URI] with your Cloud Storage URI.

    bq mkdef --source_format=DATASTORE_BACKUP "[BUCKET_URI]" > /tmp/[FILE_NAME]
    
  2. (Optional) Open the table definition file in a text editor. For example, the command nano /tmp/[FILE_NAME] opens the file in nano. The file should look like the following. Notice there is no need for the "autodetect" setting.

    {
      "sourceFormat": "DATASTORE_BACKUP",
      "sourceUris": [
        "gs://[BUCKET_URI]"
      ]
    }
    

  3. (Optional) Manually edit the table definition file to modify, add, or delete settings. There are no configuration settings specific to Cloud Datastore backup files, but many of the general settings apply to Cloud Datastore backup files. For more information, see configuration.query.tableDefinitions in the API Reference.

API

If you are using the BigQuery API, define a table definition as part of your job configuration. For more information, see configuration.query.tableDefinitions in the API Reference.

Creating Cloud Bigtable table definitions

When you create a table definition file for Cloud Bigtable, you manually generate the file in JSON format. Using the mkdef command to create a table definition is not currently supported for Cloud Bigtable data sources. Schema auto-detect is also unsupported for Cloud Bigtable. For a list of Cloud Bigtable table definition options, see configuration.query.tableDefinitions.(key).bigtableOptions in the API Reference.

A JSON table definition file for Cloud Bigtable looks like the following. Using this table definition file, BigQuery reads data from a single column family, interpreting the values as binary encoded integers.

{
    "sourceFormat": "BIGTABLE",
    "sourceUris": [
        "https://googleapis.com/bigtable/projects/[PROJECT_ID]/instances/[INSTANCE_ID]/tables/[TABLE_NAME]"
    ],
    "bigtableOptions": {
        "columnFamilies" : [
            {
                "familyId": "family_int",
                "type": "INTEGER",
                "encoding": "BINARY"
            }
        ],
    }
}

Wildcard support for table definition files

If your Google Cloud Storage data is separated into multiple files that share a common base-name, you can use a wildcard in the URI in the table definition file. You append an asterisk (*) to the base-name and enclose the bucket and filename in quotes. For example, if you have two files named fed-sample000001.csv and fed-sample000002.csv, the bucket URI would be "gs://mybucket/fed-sample*".

You can only use one wildcard for objects (filenames) within your bucket. The wildcard can appear inside the object name or at the end of the object name. Appending a wildcard to the bucket name is unsupported.

For Google Cloud Bigtable data, only one URI can be specified and it has be a fully specified and valid HTTPS URL for a Google Cloud Bigtable table. For Google Cloud Datastore backups, only one URI can be specified, and it must end with .backup_info.

The * wildcard character is not allowed when creating table definition files for:

  • Cloud Bigtable data sources
  • Cloud Datastore backups stored in Google Cloud Storage
  • Data stored in Google Drive

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...