Managing Tables

This document describes how to list, create, copy, browse, update, and delete BigQuery tables.

Overview

A BigQuery table is a standard, two-dimensional table with individual records organized in rows, and a data type assigned to each column (also called a field). Individual fields within a record can contain nested and repeated child fields.

Every table is described by a schema that specifies the field names, types, and other information. You can specify the schema of a table during the initial table creation request, or you can create a table without a schema and declare the schema in the query or load job that first populates the table. If you need to change the schema later, you can update it.

For more information about allowed types and data formats, see preparing data for BigQuery.

Listing tables in a dataset

List the tables in a dataset by calling bigquery.tables.list. You must have access to the specified dataset to list its tables.

Example

Java

This sample uses the Google APIs Client Library for Java.

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.Bigquery.Tables;
import com.google.api.services.bigquery.model.Table;
import com.google.api.services.bigquery.model.TableList;

import java.io.IOException;
import java.util.List;

// List all tables in the given dataset
private static void listTables(Bigquery service, String project, String dataset) throws IOException {

  Tables.List listTablesReply = service.tables().list(project, dataset);
  TableList tableList = listTablesReply.execute();

  if (tableList.getTables() != null) {

    List<TableList.Tables> tables = tableList.getTables();

    System.out.println("Tables list:");

    for (TableList.Tables table : tables) {
      System.out.format("%s\n", table.getId());
    }

  }
}

// Retrieve the specified table resource
private static void getTable(Bigquery service, String project, String dataset, String tableId) throws IOException {

  Tables tableRequest = service.tables();
  Table table = tableRequest.get(project, dataset, tableId).execute();

  System.out.format("Printing table resource: %s:%s.%s\n", project,dataset,tableId);
  System.out.println(table.toPrettyString());

}

Python

This sample uses the Google Cloud Client Library for Python.

from google.cloud import bigquery


def list_tables(dataset_name, project=None):
    """Lists all of the tables in a given dataset.

    If no project is specified, then the currently active project is used.
    """
    bigquery_client = bigquery.Client(project=project)
    dataset = bigquery_client.dataset(dataset_name)

    if not dataset.exists():
        print('Dataset {} does not exist.'.format(dataset_name))
        return

    tables = []
    page_token = None

    while True:
        results, page_token = dataset.list_tables(page_token=page_token)
        tables.extend(results)

        if not page_token:
            break

    for table in tables:
        print(table.name)

PHP

This sample uses the Google Cloud Client Library for PHP.

use Google\Cloud\ServiceBuilder;

/**
 * @param string $projectId The Google project ID.
 * @param string $datasetId The BigQuery dataset ID.
 */
function list_tables($projectId, $datasetId)
{
    $builder = new ServiceBuilder([
        'projectId' => $projectId,
    ]);
    $bigQuery = $builder->bigQuery();
    $dataset = $bigQuery->dataset($datasetId);
    $tables = $dataset->tables();
    foreach ($tables as $table) {
        print($table->id() . PHP_EOL);
    }
}

/**
 * @param string $projectId The Google project ID.
 * @param string $datasetId The BigQuery dataset ID.
 * @param string $tableId   The BigQuery table ID.
 */
function get_table($projectId, $datasetId, $tableId)
{
    $builder = new ServiceBuilder([
        'projectId' => $projectId,
    ]);
    $bigQuery = $builder->bigQuery();
    $dataset = $bigQuery->dataset($datasetId);
    return $dataset->table($tableId);
}

Node.js

This sample uses the Google Cloud Client Library for Node.js.

var BigQuery = require('@google-cloud/bigquery');

function listTables (datasetId, callback) {
  var bigquery = BigQuery();
  var dataset = bigquery.dataset(datasetId);

  // List the tables in the specified dataset
  // See https://googlecloudplatform.github.io/google-cloud-node/#/docs/bigquery/latest/bigquery/dataset?method=getTables
  dataset.getTables(function (err, tables) {
    if (err) {
      return callback(err);
    }

    console.log('Found %d table(s)!', tables.length);
    return callback(null, tables);
  });
}

Creating a table

BigQuery offers several ways to create a new table: you can create an empty table by calling tables.insert (or by using the web UI or the bq command-line tool), you can create and populate a table by running a load job, or you can save query results to a destination table.

All tables defined over data in managed BigQuery storage support appending data to the table.
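
As an illustration, here is a minimal sketch of creating an empty table with a simple schema through the API, in the same Google APIs Client Library for Python style as the copy sample later in this document. The field names and types are placeholder assumptions, not part of the original samples:

def create_table(service, project_id, dataset_id, table_id):
    # Describe the new table: a table reference plus a placeholder
    # two-column schema (illustrative names and types).
    body = {
        'tableReference': {
            'projectId': project_id,
            'datasetId': dataset_id,
            'tableId': table_id,
        },
        'schema': {
            'fields': [
                {'name': 'name', 'type': 'STRING', 'mode': 'REQUIRED'},
                {'name': 'age', 'type': 'INTEGER', 'mode': 'NULLABLE'},
            ]
        },
    }
    # tables.insert creates the empty table; data can be loaded later.
    return service.tables().insert(
        projectId=project_id, datasetId=dataset_id, body=body).execute()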

Exporting a table

You can export a table in CSV, JSON, or Avro format to a Cloud Storage bucket. For more information, see exporting data from BigQuery.
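
As a minimal sketch, the following configures an extract job through bigquery.jobs.insert, in the same Google APIs Client Library for Python style as the copy sample later in this document; the bucket URI is a placeholder:

def export_table(service, project_id, dataset_id, table_id, gcs_uri):
    # Configure an extract (export) job. destinationFormat can be CSV,
    # NEWLINE_DELIMITED_JSON, or AVRO.
    job_data = {
        'configuration': {
            'extract': {
                'sourceTable': {
                    'projectId': project_id,
                    'datasetId': dataset_id,
                    'tableId': table_id,
                },
                'destinationUris': [gcs_uri],  # e.g. 'gs://my-bucket/table.csv'
                'destinationFormat': 'CSV',
            }
        }
    }
    return service.jobs().insert(projectId=project_id, body=job_data).execute()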

Appending data

You can load additional data into a table either from source files or by appending query results. Note that the schema of the loaded data must match the schema of the existing table, but you can update the schema before appending.

To append data from a source file:

  1. Create additional source files.
  2. Upload the files, using either code or the web UI.
  3. Load them into your existing table by referencing the existing table. In code, set writeDisposition=WRITE_APPEND (see the sketch after these steps).

To append data from a query result:

  1. Run an asynchronous query, pass in the name of your existing table, and set writeDisposition=WRITE_APPEND.
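
As a minimal sketch of the load-job route, the following appends rows from a Cloud Storage file to an existing table by setting writeDisposition to WRITE_APPEND, again in the Google APIs Client Library for Python style of the copy sample below; the source URI and format are placeholders. For a query result, set the same writeDisposition together with a destinationTable on the query configuration.

def append_from_file(service, project_id, dataset_id, table_id, gcs_uri):
    # Configure a load job that appends to an existing table. The loaded
    # data must match the existing table's schema.
    job_data = {
        'configuration': {
            'load': {
                'destinationTable': {
                    'projectId': project_id,
                    'datasetId': dataset_id,
                    'tableId': table_id,
                },
                'sourceUris': [gcs_uri],  # e.g. 'gs://my-bucket/new-rows.csv'
                'sourceFormat': 'CSV',
                'writeDisposition': 'WRITE_APPEND',  # append rather than overwrite
            }
        }
    }
    return service.jobs().insert(projectId=project_id, body=job_data).execute()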

Updating table schema

Once your table is created, you can update the schema by calling the tables.update or tables.patch methods. For example, using the BigQuery command-line tool: bq update -t <dataset>.<table> <schema>.

Allowed operations include:

  • Adding NULLABLE or REPEATED columns at the end
  • Making REQUIRED fields NULLABLE
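
As a minimal sketch of the tables.patch route, the following adds a NULLABLE column at the end of an existing table's schema using the Google APIs Client Library for Python; the new column name is a placeholder. The schema must be sent in full, so the existing fields are fetched first:

def add_nullable_column(service, project_id, dataset_id, table_id):
    # Fetch the current schema, then append the new NULLABLE field to it.
    table = service.tables().get(
        projectId=project_id, datasetId=dataset_id, tableId=table_id).execute()
    fields = table['schema']['fields']
    fields.append({'name': 'new_column', 'type': 'STRING', 'mode': 'NULLABLE'})

    # tables.patch updates only the properties included in the request body.
    return service.tables().patch(
        projectId=project_id, datasetId=dataset_id, tableId=table_id,
        body={'schema': {'fields': fields}}).execute()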

Browsing through table data

Browse through a table's data by calling bigquery.tabledata.list, specifying the name of the table and an optional row offset. This method lets you specify the maximum number of rows to return per page. If you request a row index beyond the last row, the method will return successfully, but without a rows property.

Values are returned wrapped in a JSON object that you must parse, as described in the reference documentation.

Tip: You can find out how many rows are in your table either by querying for one page of results and examining the totalRows property, or by running the query SELECT COUNT(*) FROM [myProject:mydataset.myTable].

Examples

Python

This sample uses the Google Cloud Client Library for Python.

from google.cloud import bigquery


def list_rows(dataset_name, table_name, project=None):
    """Prints rows in the given table.

    Prints at most 25 rows for brevity, as tables can contain large
    numbers of rows.

    If no project is specified, then the currently active project is used.
    """
    bigquery_client = bigquery.Client(project=project)
    dataset = bigquery_client.dataset(dataset_name)
    table = dataset.table(table_name)

    if not table.exists():
        print('Table {}:{} does not exist.'.format(dataset_name, table_name))
        return

    # Reload the table so that the schema is available.
    table.reload()

    rows = []
    page_token = None

    # Load at most 25 results. You can change this to `while True` and change
    # the max_results argument to load more rows from BigQuery, but note
    # that this can take some time. It's preferred to use a query.
    while len(rows) < 25:
        results, total_rows, page_token = table.fetch_data(
            max_results=25, page_token=page_token)
        rows.extend(results)

        if not page_token:
            break

    # Use format to create a simple table.
    format_string = '{:<16} ' * len(table.schema)

    # Print schema field names
    field_names = [field.name for field in table.schema]
    print(format_string.format(*field_names))

    for row in rows:
        print(format_string.format(*row))

PHP

This sample uses the Google Cloud Client Library for PHP.

use Google\Cloud\ServiceBuilder;

/**
 * Browse a bigquery table.
 * Example:
 * ```
 * browse_table($projectId, $datasetId, $tableId);
 * ```
 *
 * @param string $projectId  The Google project ID.
 * @param string $datasetId  The BigQuery dataset ID.
 * @param string $tableId    The BigQuery table ID.
 * @param int    $maxResults The number of rows to return at a time.
 * @param int    $startIndex The row index to start on.
 *
 * @return int number of rows returned
 */
function browse_table($projectId, $datasetId, $tableId, $maxResults = 10, $startIndex = 0)
{
    $options = [
        'maxResults' => $maxResults,
        'startIndex' => $startIndex
    ];
    $builder = new ServiceBuilder([
        'projectId' => $projectId,
    ]);
    $bigQuery = $builder->bigQuery();
    $dataset = $bigQuery->dataset($datasetId);
    $table = $dataset->table($tableId);
    $numRows = 0;
    foreach ($table->rows($options) as $row) {
        print('---');
        foreach ($row as $column => $value) {
            printf('%s: %s' . PHP_EOL, $column, $value);
        }
        $numRows++;
    }

    return $numRows;
}

Node.js

This sample uses the Google Cloud Client Library for Node.js.

var BigQuery = require('@google-cloud/bigquery');

function browseRows (datasetId, tableId, callback) {
  var bigquery = BigQuery();
  var table = bigquery.dataset(datasetId).table(tableId);

  // Retrieve rows from the specified table
  // See https://googlecloudplatform.github.io/google-cloud-node/#/docs/bigquery/latest/bigquery/table?method=getRows
  table.getRows(function (err, rows) {
    if (err) {
      return callback(err);
    }

    console.log('Found %d row(s)!', rows.length);
    return callback(null, rows);
  });
}

Modifying or deleting rows or data

BigQuery tables are append-only. The query language does not currently support updating or deleting data. To update or delete data, you must delete the table and then recreate it with new data. Alternatively, you can write a query that transforms the data and writes the results to a new table, as sketched below.
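
As a minimal sketch of that approach, the following runs a query that filters out unwanted rows and writes the result to a results table, in the same Google APIs Client Library for Python style as the copy sample below; the filter condition and table names are placeholders:

def rewrite_table(service, project_id, dataset_id, table_id):
    # Select everything except the rows to "delete", and write the result
    # to a new results table.
    job_data = {
        'configuration': {
            'query': {
                'query': 'SELECT * FROM [{0}:{1}.{2}] WHERE state != "obsolete"'
                         .format(project_id, dataset_id, table_id),
                'destinationTable': {
                    'projectId': project_id,
                    'datasetId': dataset_id,
                    'tableId': table_id + '_filtered',
                },
                'writeDisposition': 'WRITE_TRUNCATE',  # replace any existing data
                'allowLargeResults': True,
            }
        }
    }
    return service.jobs().insert(projectId=project_id, body=job_data).execute()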

Deleting a table

When you delete a table, the table and all its data are removed immediately.

Example

Java

This sample uses the Google APIs Client Library for Java.

import com.google.api.services.bigquery.Bigquery;

import java.io.IOException;

private static void deleteTable(Bigquery service, String projectId, String datasetId, String tableId)
  throws IOException {
    service.tables().delete(projectId,
                            datasetId,
                            tableId).execute();
}

Python

This sample uses the Google Cloud Client Library for Python.

from google.cloud import bigquery


def delete_table(dataset_name, table_name, project=None):
    """Deletes a table in a given dataset.

    If no project is specified, then the currently active project is used.
    """
    bigquery_client = bigquery.Client(project=project)
    dataset = bigquery_client.dataset(dataset_name)
    table = dataset.table(table_name)

    table.delete()

    print('Table {}:{} deleted.'.format(dataset_name, table_name))

PHP

This sample uses the Google Cloud Client Library for PHP.

use Google\Cloud\ServiceBuilder;

/**
 * @param string $projectId The Google project ID.
 * @param string $datasetId The BigQuery dataset ID.
 * @param string $tableId   The BigQuery table ID.
 */
function delete_table($projectId, $datasetId, $tableId)
{
    $builder = new ServiceBuilder([
        'projectId' => $projectId,
    ]);
    $bigQuery = $builder->bigQuery();
    $dataset = $bigQuery->dataset($datasetId);
    $table = $dataset->table($tableId);
    $table->delete();
}

Node.js

This sample uses the Google Cloud Client Library for Node.js.

var BigQuery = require('@google-cloud/bigquery');

function deleteTable (datasetId, tableId, callback) {
  var bigquery = BigQuery();
  var table = bigquery.dataset(datasetId).table(tableId);

  // Delete the specified table
  // See https://googlecloudplatform.github.io/google-cloud-node/#/docs/bigquery/latest/bigquery/table?method=delete
  table.delete(function (err) {
    if (err) {
      return callback(err);
    }

    console.log('Deleted table %s from %s', tableId, datasetId);
    return callback(null);
  });
}

Undeleting a Table

It's possible to undelete a table within two days of table deletion by using table decorators and the copy command. Undeletions are performed on a best-effort basis and are not guaranteed. You cannot undelete a table from a deleted dataset, or a deleted table that has been replaced by a new table with the same name.

To undelete a table, determine a UNIX timestamp (in milliseconds) from when the table still existed. Then, copy the table as of that time to a new table. The new table must have a different name from the deleted table.

For example, the following statement copies mydataset.mytable at the time 1418864998000 into a new table mydataset.newtable:

bq cp mydataset.mytable@1418864998000 mydataset.newtable

Copying an existing table

There are three ways to copy a table: using the web UI, using the command-line tool (bq cp <source_table> <destination_table>), or programmatically through the API.

To copy a table using the API:

Call the bigquery.jobs.insert method and configure a copy job. You must specify the following in your job configuration:

"copy": {
      "sourceTable": {       // Required
        "projectId": string, // Required
        "datasetId": string, // Required
        "tableId": string    // Required
      },
      "destinationTable": {  // Required
        "projectId": string, // Required
        "datasetId": string, // Required
        "tableId": string    // Required
      },
      "createDisposition": string,  // Optional
      "writeDisposition": string,   // Optional
    },

where sourceTable provides information about the table to be copied, destinationTable provides information about the new table, createDisposition specifies whether to create the table if it doesn't exist, and writeDisposition specifies whether to overwrite or append to an existing table.

Example

Python

The following Python sample shows how to copy a table using the API; it uses the Google APIs Client Library for Python. If you run this script on the command line, it prompts you for information about the source and destination tables and then runs the copy job:

import pprint
import time

from googleapiclient.errors import HttpError


def copyTable(service):
    try:
        sourceProjectId = raw_input("What is your source project? ")
        sourceDatasetId = raw_input("What is your source dataset? ")
        sourceTableId = raw_input("What is your source table? ")

        targetProjectId = raw_input("What is your target project? ")
        targetDatasetId = raw_input("What is your target dataset? ")
        targetTableId = raw_input("What is your target table? ")

        jobCollection = service.jobs()
        jobData = {
            "configuration": {
                "copy": {
                    "sourceTable": {
                        "projectId": sourceProjectId,
                        "datasetId": sourceDatasetId,
                        "tableId": sourceTableId,
                    },
                    "destinationTable": {
                        "projectId": targetProjectId,
                        "datasetId": targetDatasetId,
                        "tableId": targetTableId,
                    },
                    "createDisposition": "CREATE_IF_NEEDED",
                    "writeDisposition": "WRITE_TRUNCATE"
                }
            }
        }

        insertResponse = jobCollection.insert(
            projectId=targetProjectId, body=jobData).execute()

        # Poll for status until the job is done, with a short pause between calls.
        while True:
            status = jobCollection.get(
                projectId=targetProjectId,
                jobId=insertResponse['jobReference']['jobId']).execute()
            if 'DONE' == status['status']['state']:
                break
            print 'Waiting for the copy to complete...'
            time.sleep(10)

        if 'errors' in status['status']:
            print 'Error copying table: ', pprint.pprint(status)
            return

        print 'Copied the table: ', pprint.pprint(status)

        # Now query and print out the generated results table.
        # (queryTableData is defined elsewhere in the full sample.)
        queryTableData(service, targetProjectId, targetDatasetId, targetTableId)

    except HttpError as err:
        print 'Error in copyTable: ', pprint.pprint(err.resp)

Node.js

This sample uses the Google Cloud Client Library for Node.js.

var BigQuery = require('@google-cloud/bigquery');

function copyTable (srcDatasetId, srcTableId, destDatasetId, destTableId, callback) {
  var bigquery = BigQuery();

  var srcTable = bigquery.dataset(srcDatasetId).table(srcTableId);
  var destTable = bigquery.dataset(destDatasetId).table(destTableId);

  // See https://googlecloudplatform.github.io/google-cloud-node/#/docs/bigquery/latest/bigquery/table?method=copy
  srcTable.copy(destTable, function (err, job, apiResponse) {
    if (err) {
      return callback(err);
    }

    console.log('Started job: %s', job.id);
    job
      .on('error', callback)
      .on('complete', function (metadata) {
        console.log('Completed job: %s', job.id);
        return callback(null, metadata, apiResponse);
      });
  });
}
