BigQuery

Managing Tables

This document describes how to manage tables.

Overview

A BigQuery table is a standard two-dimensional table, with individual records organized in rows and a data type assigned to each column (also called a field). Individual fields within a record may contain nested and repeated child fields.

Every table is described by a schema that specifies field names, types, and other information. You can specify the schema of a table during the initial table creation request, or you can create a table without a schema and declare the schema in the query or load job that first populates the table. If you need to change the schema later, you can update it.

For more information about allowed types and data formats, see preparing data for BigQuery.
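For illustration, a schema with nested and repeated fields might be expressed as the following JSON structure (the field names here are hypothetical); this is the shape of the schema property you pass when creating a table:

```python
# A hypothetical schema for a table of people. The "children" field is a
# RECORD with mode REPEATED, i.e. a nested, repeated child field.
schema = {
    "fields": [
        {"name": "name", "type": "STRING", "mode": "REQUIRED"},
        {"name": "age", "type": "INTEGER", "mode": "NULLABLE"},
        {"name": "children", "type": "RECORD", "mode": "REPEATED",
         "fields": [
             {"name": "name", "type": "STRING", "mode": "NULLABLE"},
             {"name": "age", "type": "INTEGER", "mode": "NULLABLE"},
         ]},
    ]
}
```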

Listing Tables

List the tables in a dataset by calling bigquery.tables.list. You must have access to the specified dataset to list its tables.

Example

Java

This sample uses the Google APIs Client Library for Java.

// List all tables in the specified dataset.
private static void listTables(Bigquery service, String project, String dataset) throws IOException {

  Bigquery.Tables.List listTablesReply = service.tables().list(project, dataset);
  TableList tableList = listTablesReply.execute();

  if (tableList.getTables() != null) {
    List<TableList.Tables> tables = tableList.getTables();
    System.out.println("Tables list:");
    for (TableList.Tables table : tables) {
      System.out.format("%s\n", table.getId());
    }
  }
}

// Retrieve the specified table resource.
private static void getTable(Bigquery service, String project, String dataset, String tableId) throws IOException {

  Bigquery.Tables tableRequest = service.tables();
  Table table = tableRequest.get(project, dataset, tableId).execute();

  System.out.format("Printing table resource: %s:%s.%s\n", project, dataset, tableId);
  System.out.println(table.toPrettyString());
}
  

Python

This sample uses the Google APIs Client Library for Python.

# List all tables
def listTables(service, project, dataset):
  try:
      tables = service.tables()
      listReply = tables.list(projectId=project, datasetId=dataset).execute()
      print 'Tables list:'
      pprint.pprint(listReply)

  except HttpError as err:
      print 'Error in listTables:'
      pprint.pprint(err.content)

# Retrieve the specified table resource
def getTable(service, projectId, datasetId, tableId):
  tableCollection = service.tables()
  try:
    tableReply = tableCollection.get(projectId=projectId,
                                   datasetId=datasetId,
                                   tableId=tableId).execute()
    print 'Printing table resource %s:%s.%s' % (projectId, datasetId, tableId)
    pprint.pprint(tableReply)

  except HttpError as err:
    print 'Error in getTable:'
    pprint.pprint(err.content)
  

Creating a Table

BigQuery offers two ways to create a new table:

  • Load data from source files into a new table.
  • Run a query and save the results to a destination table.

Both methods support creating a new table from the data, populating an existing empty table that has an appropriate schema, or appending data to a table that already has data.
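As a sketch of the load-based method, a jobs.insert body for a load job that creates a new table might look like the following (all resource names and the bucket URI are placeholders):

```python
# Sketch of a jobs.insert request body for a load job. The dispositions
# control the create/populate/append behaviors described above; every
# name here is a placeholder.
load_job = {
    "configuration": {
        "load": {
            "sourceUris": ["gs://my-bucket/data.csv"],  # placeholder URI
            "destinationTable": {
                "projectId": "myProject",
                "datasetId": "mydataset",
                "tableId": "myTable",
            },
            "createDisposition": "CREATE_IF_NEEDED",  # create if absent
            "writeDisposition": "WRITE_EMPTY",        # fail if table has data
        }
    }
}
```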

Exporting a Table

You can export a table in CSV, JSON, or Avro format to a Google Cloud Storage bucket. For more information, see exporting data from BigQuery.
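A minimal sketch of an extract (export) job configuration, assuming placeholder table and bucket names:

```python
# Sketch of a jobs.insert request body for an extract job that exports
# a table to Cloud Storage as newline-delimited JSON. The "*" in the
# URI lets BigQuery shard large exports across multiple files.
extract_job = {
    "configuration": {
        "extract": {
            "sourceTable": {
                "projectId": "myProject",
                "datasetId": "mydataset",
                "tableId": "myTable",
            },
            "destinationUris": ["gs://my-bucket/export-*.json"],
            "destinationFormat": "NEWLINE_DELIMITED_JSON",
        }
    }
}
```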

Appending Data

You can load additional data into a table either from source files or by appending query results. Note that the schema of the loaded data must match the schema of the existing table, but you can update the schema before appending.

To append data from a source file:

  1. Create additional source files.
  2. Upload the files, using either code or the web UI.
  3. Load them into your existing table by referencing the existing table. In code, set writeDisposition=WRITE_APPEND.

To append data from a query result:

  1. Run an asynchronous query, pass in the name of your existing table, and set writeDisposition=WRITE_APPEND.
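Both procedures come down to setting writeDisposition in the job configuration; as a sketch, a query job that appends its results to an existing table might be configured like this (the query text and table names are placeholders):

```python
# Sketch of a jobs.insert request body for a query job whose results
# are appended to an existing table. All names are placeholders.
append_job = {
    "configuration": {
        "query": {
            "query": "SELECT name, age FROM mydataset.myTable",
            "destinationTable": {
                "projectId": "myProject",
                "datasetId": "mydataset",
                "tableId": "myTable2",
            },
            "writeDisposition": "WRITE_APPEND",  # append, don't overwrite
        }
    }
}
```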

Updating Table Schema

Once your table is created, you can update the schema by calling the tables.update or tables.patch methods. For example, using the BigQuery command-line tool: bq update -t <schema>.

Allowed operations include:

  • Adding NULLABLE or REPEATED columns at the end
  • Making REQUIRED fields NULLABLE
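A minimal sketch of the first operation: take the table's current schema field list, append the new NULLABLE field at the end, and send the full list back via tables.patch (the column name here is hypothetical):

```python
def add_nullable_column(schema_fields, name, field_type):
    """Return a new field list with a NULLABLE column appended at the end.

    schema_fields is the "fields" list from the table's current schema;
    the result is what you would send back as the schema in a
    tables.patch request body (existing fields repeated unchanged).
    """
    return schema_fields + [{"name": name, "type": field_type, "mode": "NULLABLE"}]

existing = [{"name": "name", "type": "STRING", "mode": "REQUIRED"}]
patched = add_nullable_column(existing, "nickname", "STRING")  # "nickname" is hypothetical
```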

Browsing Through Table Data

Browse through a table's data by calling bigquery.tabledata.list, specifying the name of the table and an optional row offset. This method lets you specify the maximum number of rows to return per page. If you request a row index beyond the last row, the method will return successfully, but without a rows property.

Values are returned wrapped in a JSON object that you must parse, as described in the reference documentation.
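Each returned row is an object with an f (fields) array whose cells each hold a v (value); a minimal parser, shown here against a fabricated response, might look like this:

```python
def flatten_rows(table_data):
    """Convert a tabledata.list response dict into a list of value lists."""
    return [[cell["v"] for cell in row["f"]]
            for row in table_data.get("rows", [])]

# Fabricated response for illustration only.
sample = {"rows": [{"f": [{"v": "alice"}, {"v": "30"}]},
                   {"f": [{"v": "bob"}, {"v": "25"}]}]}
# flatten_rows(sample) → [["alice", "30"], ["bob", "25"]]
```

Note that a response past the last row simply lacks the rows property, which is why the parser defaults to an empty list.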

Tip: You can find out how many rows are in your table either by requesting one page of results and examining the totalRows property, or by running the query SELECT COUNT(*) FROM myProject:mydataset.myTable.

Browsing Through Table Data Examples

Java

This sample uses the Google APIs Client Library for Java.

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.TableCell;
import com.google.api.services.bigquery.model.TableDataList;
import com.google.api.services.bigquery.model.TableRow;

import java.io.IOException;

// ...

public class MyClass {

  // ...

  /**
   * Prints the rows of the specified table.
   *
   * @param service An authorized BigQuery client.
   * @param projectId The current project id.
   * @param datasetId The BigQuery dataset id.
   * @param tableId The BigQuery table id.
   * @throws IOException
   */
  private static void listTableData(Bigquery service, String projectId, String datasetId, String tableId)
      throws IOException {

    TableDataList dataList = service.tabledata().list(projectId, datasetId, tableId).execute();

    if (dataList.getRows() != null) {
      System.out.format("Printing table %s:%s.%s\n", projectId, datasetId, tableId);
      for (TableRow row : dataList.getRows()) {
        for (TableCell cell : row.getF()) {
          System.out.format("%s ", cell.getV());
        }
        System.out.println();
      }
    }

  }

}
  

Python

This sample uses the Google APIs Client Library for Python.

# Query for table data, with an optional start index.
def queryTableData(service, project, dataset, table, startIndex=0):
  tableDataJob = service.tabledata()
  try:
    queryReply = tableDataJob.list(projectId=project,
                                   datasetId=dataset,
                                   tableId=table,
                                   startIndex=startIndex).execute()
    print 'Printing table %s:%s.%s' % (project, dataset, table)

    # When we've printed the last page of results, the next
    # page does not have a rows[] array.
    while 'rows' in queryReply:
      printTableData(queryReply, startIndex)
      startIndex += len(queryReply['rows'])
      queryReply = tableDataJob.list(projectId=project,
                                     datasetId=dataset,
                                     tableId=table,
                                     startIndex=startIndex).execute()
  except HttpError as err:
    print 'Error in queryTableData:'
    pprint.pprint(err.content)

def printTableData(data, startIndex):
  for row in data['rows']:
    rowVal = []
    for cell in row['f']:
      rowVal.append(cell['v'])
    print 'Row %d: %s' % (startIndex, rowVal)
    startIndex += 1
  

Modifying or Deleting Rows or Data

BigQuery tables are append-only. The query language does not currently support either updating or deleting data. In order to update or delete data, you must delete the table, then recreate the table with new data. Alternatively, you could write a query that modifies the data and specify a new results table.
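For example, to simulate a row deletion you could write a query that selects every row except the unwanted ones and writes the result to a new results table (the filter and all names here are placeholders):

```python
# Sketch of a jobs.insert request body for a query job that "deletes"
# rows by copying all other rows into a new results table.
rewrite_job = {
    "configuration": {
        "query": {
            # Keep every row except the ones to remove (placeholder filter).
            "query": "SELECT * FROM mydataset.myTable WHERE id != 42",
            "destinationTable": {
                "projectId": "myProject",
                "datasetId": "mydataset",
                "tableId": "myTableFiltered",
            },
            "writeDisposition": "WRITE_TRUNCATE",  # overwrite the results table
        }
    }
}
```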

Deleting a Table

When you delete a table, the table and all its data are removed immediately.

Example

Java

This sample uses the Google APIs Client Library for Java.

private static void deleteTable(Bigquery service, String projectId, String datasetId, String tableId)
  throws IOException {
    service.tables().delete(projectId,
                            datasetId,
                            tableId).execute();
}
  

Python

This sample uses the Google APIs Client Library for Python.

def deleteTable(service, projectId, datasetId, tableId):
  service.tables().delete(projectId=projectId,
                          datasetId=datasetId,
                          tableId=tableId).execute()
  

Undeleting a Table

It's possible to undelete a table within two days of table deletion by using table decorators and the copy command. Undeletions are performed on a best-effort basis and are not guaranteed.

To undelete a table, determine a UNIX timestamp of when the table existed (in milliseconds). Then, copy the table at that time to a new table. The new table must have a different name than the deleted table.

For example, the following statement copies mydataset.mytable at the time 1418864998000 into a new table mydataset.newtable:

bq cp mydataset.mytable@1418864998000 mydataset.newtable
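The snapshot timestamp is milliseconds since the UNIX epoch; the value above corresponds to 2014-12-18 01:09:58 UTC and could be computed like this:

```python
import calendar
import datetime

# Milliseconds since the epoch for a UTC time at which the table existed.
when = datetime.datetime(2014, 12, 18, 1, 9, 58)
snapshot_ms = calendar.timegm(when.timetuple()) * 1000
# snapshot_ms → 1418864998000
```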

Copying an Existing Table

There are three ways to copy a table: using the web UI, using the command-line tool, or programmatically through the API.

To copy a table using the API:

You can copy an existing table through the API by calling the bigquery.jobs.insert method, and configuring a copy job. You must specify the following in your job configuration:

"copy": {
      "sourceTable": {       // Required
        "projectId": string, // Required
        "datasetId": string, // Required
        "tableId": string    // Required
      },
      "destinationTable": {  // Required
        "projectId": string, // Required
        "datasetId": string, // Required
        "tableId": string    // Required
      },
      "createDisposition": string,  // Optional
      "writeDisposition": string,   // Optional
    },

where sourceTable provides information about the table to be copied, destinationTable provides information about the new table, createDisposition specifies whether to create the table if it doesn't exist, and writeDisposition specifies whether to overwrite or append to an existing table.

Python Sample

The following Python sample describes how to copy a table using the API. If you run this script on the command line, it prompts you for information about the source table and the destination table and performs the copy job:

def copyTable(service):
  try:
    sourceProjectId = raw_input("What is your source project? ")
    sourceDatasetId = raw_input("What is your source dataset? ")
    sourceTableId = raw_input("What is your source table? ")

    targetProjectId = raw_input("What is your target project? ")
    targetDatasetId = raw_input("What is your target dataset? ")
    targetTableId = raw_input("What is your target table? ")

    jobCollection = service.jobs()
    jobData = {
      "projectId": targetProjectId,
      "configuration": {
        "copy": {
          "sourceTable": {
            "projectId": sourceProjectId,
            "datasetId": sourceDatasetId,
            "tableId": sourceTableId,
          },
          "destinationTable": {
            "projectId": targetProjectId,
            "datasetId": targetDatasetId,
            "tableId": targetTableId,
          },
          "createDisposition": "CREATE_IF_NEEDED",
          "writeDisposition": "WRITE_TRUNCATE"
        }
      }
    }

    insertResponse = jobCollection.insert(projectId=targetProjectId, body=jobData).execute()

    # Poll for status until the job is done, with a short pause between calls.
    import time
    while True:
      status = jobCollection.get(projectId=targetProjectId,
                                 jobId=insertResponse['jobReference']['jobId']).execute()
      if status['status']['state'] == 'DONE':
        break
      print 'Waiting for the copy to complete...'
      time.sleep(10)

    if 'errors' in status['status']:
      print 'Error copying table:'
      pprint.pprint(status)
      return

    print 'Copied the table:'
    pprint.pprint(status)

    # Now query and print out the copied table's data.
    queryTableData(service, targetProjectId, targetDatasetId, targetTableId)

  except HttpError as err:
    print 'Error in copyTable:'
    pprint.pprint(err.resp)
