Managing Table Data

This document describes how to manage table data in BigQuery. You can work with BigQuery table data in the following ways:

  • Load data into a table
  • Browse (or preview) table data
  • Query table data
  • Append to or overwrite table data
  • Modify table data using the Data Manipulation Language
  • Copy table data
  • Export table data

For information on managing table schemas, see Modifying Table Schemas.

Loading data into a table

You can load data when you create a table, or you can create an empty table and load the data later. When you load data, you can use schema auto-detect for supported data formats, or you can specify the schema.
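
For instance, here is a minimal sketch of loading a CSV file from Cloud Storage with schema auto-detection using the Python client library; the bucket, dataset, and table names are placeholders.

# A minimal load-job sketch: ingest a CSV file from Cloud Storage with
# schema auto-detection. All resource names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig()
job_config.autodetect = True  # let BigQuery infer the schema
job_config.source_format = bigquery.SourceFormat.CSV

load_job = client.load_table_from_uri(
    'gs://mybucket/mydata.csv',  # hypothetical source URI
    client.dataset('mydataset').table('mytable'),
    job_config=job_config)
load_job.result()  # block until the load job completes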

For more information on loading data, see the documentation for your source data's format and location.

Browsing table data

You can browse table data using the BigQuery web UI, the command-line tool's bq head command, or by calling the tabledata.list API method.

Required permissions

At the dataset level, browsing table data requires READER access to the dataset that contains the table being browsed.

Instead of using dataset-level permissions, you can use a project-level IAM role that includes bigquery.tables.getData permissions. The following predefined, project-level IAM roles include bigquery.tables.getData permissions for every table in the project:

  • bigquery.dataViewer
  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

In addition, because the bigquery.user role has bigquery.datasets.create permissions, a user assigned to the bigquery.user role can read the data in any table in any dataset that user creates. When a user assigned to the bigquery.user role creates a dataset, that user is given OWNER access to the dataset. OWNER access to a dataset gives the user full control over it and all the tables in it.

For more information on IAM roles and permissions in BigQuery, see access control. For more information on dataset-level roles, see Primitive roles for datasets.

Browsing table data

To browse table data:

Web UI

  1. In the BigQuery web UI, in the navigation pane, click the blue arrow to the left of your dataset to expand it, or double-click the dataset name. This displays the tables and views in the dataset.

  2. Click a table in the list.

  3. Click Details and note the value in Number of Rows. You may need this value to control the starting point for your results using the CLI or API.

  4. Click Preview. A sample set of data is displayed.


Command-line

Issue the bq head command with the --max_rows flag to list all fields in a particular number of table rows. If --max_rows is not specified, the default is 100.

To browse a subset of fields in the table (including nested and repeated fields), use the --selected_fields flag and enter the fields as a comma-separated list.

To specify the number of rows to skip before displaying table data, use the --start_row=[INTEGER] flag (or the -s shortcut). The default value is 0. You can retrieve the number of rows in a table by using the bq show command to retrieve table information.
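
For example, enter the following command to display table information, including the total number of rows, for mydataset.mytable in your default project:

bq show mydataset.mytable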

If the table you're browsing is in a project other than your default project, add the project ID to the command in the following format: [PROJECT_ID]:[DATASET].[TABLE].

bq head --max_rows [INTEGER1] --start_row [INTEGER2] --selected_fields "[FIELDS]" [PROJECT_ID]:[DATASET].[TABLE]

Where:

  • [INTEGER1] is the number of rows to display.
  • [INTEGER2] is the number of rows to skip before displaying data.
  • [FIELDS] is a comma-separated list of fields.
  • [PROJECT_ID] is your project ID.
  • [DATASET] is the name of the dataset containing the table.
  • [TABLE] is the name of the table to browse.

Examples:

Enter the following command to list all fields in the first 10 rows in mydataset.mytable. mydataset is in your default project.

bq head --max_rows=10 mydataset.mytable

Enter the following command to list all fields in the first 100 rows in mydataset.mytable. mydataset is in myotherproject, not your default project.

bq head myotherproject:mydataset.mytable

Enter the following command to display only field1 and field2 in mydataset.mytable. The command uses the --start_row flag to skip to row 100. mydataset.mytable is in your default project.

bq head --start_row 100 --selected_fields "field1,field2" mydataset.mytable

Because the bq head command does not create a query job, bq head commands do not appear in your query history, and you are not charged for them.

API

Browse through a table's data by calling tabledata.list. Specify the name of the table in the tableId parameter.

Configure these optional parameters to control the output:

  • maxResults — Maximum number of results to return
  • selectedFields — Comma-separated list of fields to return; if unspecified, all fields are returned
  • startIndex — Zero-based index of the starting row to read

Values are returned wrapped in a JSON object that you must parse, as described in the tabledata.list reference documentation.
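
As an illustration, here is a minimal sketch of calling tabledata.list over REST from Python, assuming you already have an OAuth 2.0 access token with a BigQuery read scope; the project, dataset, and table names are placeholders.

# A minimal sketch of a tabledata.list request. The access token and all
# resource names below are hypothetical placeholders.
import requests

access_token = 'YOUR_ACCESS_TOKEN'  # obtain via your auth flow
url = ('https://www.googleapis.com/bigquery/v2/projects/myproject'
       '/datasets/mydataset/tables/mytable/data')
params = {'maxResults': 10, 'startIndex': 0}
resp = requests.get(url, params=params,
                    headers={'Authorization': 'Bearer ' + access_token})
resp.raise_for_status()

# Each row is returned as {"f": [{"v": value}, ...]}; unwrap the values.
for row in resp.json().get('rows', []):
    print([cell['v'] for cell in row['f']])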

C#

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

public int TableDataList(
    string datasetId, string tableId, int pageSize, BigQueryClient client)
{
    int recordCount = 0;
    var result = client.ListRows(datasetId, tableId, null,
        new ListRowsOptions { PageSize = pageSize });
    // If there are more rows than were returned in the first page of results, 
    // iterating over the rows will lazily evaluate the results each time, 
    // making further requests as necessary.
    foreach (var row in result)
    {
        Console.WriteLine($"{row["title"]}: {row["unique_words"]}");
        recordCount++;
    }
    return recordCount;
}

Go

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

The Google Cloud Client Library for Go automatically paginates by default, so you do not need to implement pagination yourself. For example:

table := client.Dataset(datasetID).Table(tableID)
it := table.Read(ctx)
for {
	var row []bigquery.Value
	err := it.Next(&row)
	if err == iterator.Done {
		break
	}
	if err != nil {
		return err
	}
	fmt.Println(row)
}

Java

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

TableId tableIdObject = TableId.of(datasetName, tableName);
// This example reads the results 100 rows per RPC call. If there's no need to limit the
// number, simply omit the option.
TableResult tableData =
    bigquery.listTableData(tableIdObject, TableDataListOption.pageSize(100));
for (FieldValueList row : tableData.iterateAll()) {
  // do something with the row
}

Node.js

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

The Google Cloud Client Library for Node.js automatically paginates by default, so you do not need to implement pagination yourself. For example:

// Imports the Google Cloud client library
const BigQuery = require('@google-cloud/bigquery');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = "your-project-id";
// const datasetId = "my_dataset";
// const tableId = "my_table";

// Creates a client
const bigquery = new BigQuery({
  projectId: projectId,
});

// Lists rows in the table
bigquery
  .dataset(datasetId)
  .table(tableId)
  .getRows()
  .then(results => {
    const rows = results[0];
    console.log('Rows:');
    rows.forEach(row => console.log(row));
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

PHP

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

Pagination happens automatically in the Google Cloud Client Library for PHP using the generator function rows, which fetches the next page of results during iteration.

use Google\Cloud\BigQuery\BigQueryClient;

/**
 * Browse a bigquery table.
 * Example:
 * ```
 * browse_table($projectId, $datasetId, $tableId);
 * ```
 *
 * @param string $projectId  The Google project ID.
 * @param string $datasetId  The BigQuery dataset ID.
 * @param string $tableId    The BigQuery table ID.
 * @param int    $maxResults The number of results to return at a time.
 * @param int    $startIndex The row index to start on.
 *
 * @return int number of rows returned
 */
function browse_table($projectId, $datasetId, $tableId, $maxResults = 10, $startIndex = 0)
{
    $options = [
        'maxResults' => $maxResults,
        'startIndex' => $startIndex
    ];
    $bigQuery = new BigQueryClient([
        'projectId' => $projectId,
    ]);
    $dataset = $bigQuery->dataset($datasetId);
    $table = $dataset->table($tableId);
    $numRows = 0;
    foreach ($table->rows($options) as $row) {
        print('---');
        foreach ($row as $column => $value) {
            printf('%s: %s' . PHP_EOL, $column, $value);
        }
        $numRows++;
    }

    return $numRows;
}

Python

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# from google.cloud import bigquery
# client = bigquery.Client()

dataset_ref = client.dataset('samples', project='bigquery-public-data')
table_ref = dataset_ref.table('shakespeare')
table = client.get_table(table_ref)  # API call

# Load all rows from a table
rows = client.list_rows(table)
assert len(list(rows)) == table.num_rows

# Load the first 10 rows
rows = client.list_rows(table, max_results=10)
assert len(list(rows)) == 10

# Specify selected fields to limit the results to certain columns
fields = table.schema[:2]  # first two columns
rows = client.list_rows(table, selected_fields=fields, max_results=10)
assert len(rows.schema) == 2
assert len(list(rows)) == 10

# Use the start index to load an arbitrary portion of the table
rows = client.list_rows(table, start_index=10, max_results=10)

# Print row data in tabular format
format_string = '{!s:<16} ' * len(rows.schema)
field_names = [field.name for field in rows.schema]
print(format_string.format(*field_names))  # prints column headers
for row in rows:
    print(format_string.format(*row))      # prints row data

Ruby

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

Pagination happens automatically in the Google Cloud Client Library for Ruby using Table#data and Data#next.

# project_id = "Your Google Cloud project ID"
# dataset_id = "ID of the dataset containing table"
# table_id   = "ID of the table to display data for"

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new project: project_id
dataset  = bigquery.dataset dataset_id
table    = dataset.table table_id

table.data.each do |row|
  row.each do |column_name, value|
    puts "#{column_name} = #{value}"
  end
end

Querying table data

After you load your data into BigQuery, you can query the data in your tables. BigQuery supports two types of queries: interactive and batch.

By default, BigQuery runs interactive queries, which means that the query is executed as soon as possible.

BigQuery also offers batch queries. BigQuery queues each batch query on your behalf and starts the query as soon as idle resources are available, usually within a few minutes.
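
For example, here is a minimal sketch of submitting a batch-priority query with the Python client library; the query text and table name are placeholders.

# A minimal batch-query sketch. The query below is a hypothetical example.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig()
job_config.priority = bigquery.QueryPriority.BATCH

query_job = client.query(
    'SELECT name FROM `mydataset.mytable` LIMIT 10',  # hypothetical query
    job_config=job_config)
rows = query_job.result()  # waits until the queued job runs and finishes
for row in rows:
    print(row)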

You can run interactive and batch queries by using the BigQuery web UI, the command-line tool's bq query command, or by calling the jobs.insert API method.

For more information on querying BigQuery tables, see Introduction to Querying BigQuery Data.

In addition to querying data stored in BigQuery tables, you can query data stored externally. For more information, see Introduction to External Data Sources.

Appending to and overwriting table data

You can overwrite table data using a load or query operation. You can append additional data to an existing table by performing a load-append operation or by appending query results to the table.

For more information on appending to or overwriting a table when loading data, see the documentation for your source data format.

To append to or overwrite a table using query results, specify a destination table and set the write disposition to either:

  • Append to table — Appends the query results to an existing table.
  • Overwrite table — Overwrites an existing table with the same name using the query results.

For more information on using query results to append to or overwrite data, see Writing Query Results.
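
As a sketch, the following Python client library example appends query results to an existing destination table; swap WRITE_APPEND for WRITE_TRUNCATE to overwrite the table instead. The dataset, table, and query are hypothetical.

# A minimal sketch of writing query results to a destination table with an
# explicit write disposition. All resource names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig()
job_config.destination = client.dataset('mydataset').table('mytable')
job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND

query_job = client.query(
    'SELECT name, count FROM `mydataset.sourcetable`',  # hypothetical query
    job_config=job_config)
query_job.result()  # wait for the query to finish writing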

Modifying table data

Using the standard SQL dialect, you can modify data in a table using DML statements. DML statements allow you to perform bulk row updates, insertions, and deletions. For examples on using DML, see Updating Table Data Using the Data Manipulation Language.
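
For instance, a minimal sketch of a bulk UPDATE issued through the Python client library might look like the following; the table, column, and filter are hypothetical.

# A minimal DML sketch: DML statements run as ordinary query jobs.
# The table and column names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
query_job = client.query(
    """
    UPDATE `mydataset.mytable`
    SET status = 'archived'
    WHERE created_date < '2017-01-01'
    """)
query_job.result()  # wait for the DML statement to complete
print('Modified {} rows'.format(query_job.num_dml_affected_rows))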

The legacy SQL dialect does not support DML statements. To update or delete data using legacy SQL, you must delete the table and then recreate it with new data. Alternatively, you can write a query that modifies the data and write the query results to a new destination table.

Copying table data

You can copy a table by using the BigQuery web UI, the command-line tool's bq cp command, or by calling the jobs.insert API method and configuring a copy job.
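
For example, enter the following command to copy mydataset.mytable to a new table, mydataset2.mytable2, in your default project:

bq cp mydataset.mytable mydataset2.mytable2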

For more information on copying tables, see Copying a table.

Exporting table data

You can export table data in CSV, JSON, or Avro format. Currently, you must export data into a Cloud Storage bucket. Exporting to your local machine is not supported; however, you can download and save query results using the BigQuery web UI.
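
For example, here is a minimal sketch of exporting a table to a Cloud Storage bucket in CSV format using the Python client library; the bucket and table names are placeholders.

# A minimal extract-job sketch. All resource names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
table_ref = client.dataset('mydataset').table('mytable')

extract_job = client.extract_table(
    table_ref,
    'gs://mybucket/mytable.csv')  # hypothetical destination URI
extract_job.result()  # block until the export completes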

For more information, see Exporting Table Data.
