Migrating from Python Client Library v0.27

The Google BigQuery client library for Python v0.28 includes some significant design changes compared to v0.27 and earlier versions. These changes can be summarized as follows:

  • Query and view operations default to the Standard SQL dialect.
  • Client functions related to jobs, like running queries, immediately start the job.
  • Functions to create, get, update, delete datasets and tables moved to the client class.

This topic describes the changes that you need to make to your Python code so that you can use the latest version of the BigQuery Python client library.

Running previous versions of the client library

You are not required to upgrade your Python client library to the latest version. However, new functionality in the BigQuery API is only supported in the v0.28 and later versions.

If you want to continue using a previous version of the Python client library and you do not want to migrate your code, pin the version of the Python client library that your app uses. To pin a specific library version, edit the requirements.txt file as shown in the following example:

google-cloud-bigquery==0.27

Running the latest version of the client library

To install the latest version of the Python client library, use the pip command:

pip install --upgrade google-cloud-bigquery

For more detailed instructions, see BigQuery client libraries.

Importing the library and creating a client

Importing the Python client library and creating a client object works the same way in previous and newer versions of the library.

from google.cloud import bigquery

client = bigquery.Client()

Query code changes

Querying data with the standard SQL dialect

Changes in v0.28 and later include:

  • Standard SQL is the default SQL dialect.
  • Use the QueryJobConfig class to configure the query job, as shown in the sketch after this list.
  • client.query() makes an API request to immediately start the query.
  • A job ID is optional. If one is not supplied, the client library generates one for you.
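
The following sketch is not part of the original samples; it illustrates these points together in v0.28 by configuring a destination table and write disposition through QueryJobConfig (mirroring the previous-version sample below) and passing an explicit job ID to client.query(). The dataset, table, and job ID names are placeholders.

# from google.cloud import bigquery
# client = bigquery.Client()

job_config = bigquery.QueryJobConfig()
job_config.destination = client.dataset('my_dataset').table('my_table')
job_config.write_disposition = 'WRITE_TRUNCATE'

query_job = client.query(
    'SELECT 17 AS answer',
    job_config=job_config,
    job_id='my-unique-job-id')  # Optional: omit job_id to auto-generate one.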

The following sample shows how to run a query.

Previous versions of the client libraries:

import uuid

client = bigquery.Client()
query_job = client.run_async_query(str(uuid.uuid4()), query)

# Use standard SQL syntax.
query_job.use_legacy_sql = False

# Set a destination table.
dest_dataset = client.dataset(dest_dataset_id)
dest_table = dest_dataset.table(dest_table_id)
query_job.destination = dest_table

# Allow the results table to be overwritten.
query_job.write_disposition = 'WRITE_TRUNCATE'

query_job.begin()
query_job.result()  # Wait for query to finish.

rows = query_job.query_results().fetch_data()
for row in rows:
    print(row)

In version 0.25.0 or earlier of the google-cloud-bigquery library, instead of job.result(), the following code was required to wait for the job to finish:

import time

while True:
    job.reload()  # Refreshes the state via a GET request.
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)

In version 0.25.0 or earlier of the google-cloud-bigquery library, instead of job.query_results().fetch_data(), the following code was used to get the resulting rows:

rows = query_job.results().fetch_data()

Latest version of the client library:

The Python client library now uses standard SQL by default.

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()

query = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 100')
query_job = client.query(
    query,
    # Location must match that of the dataset(s) referenced in the query.
    location='US')  # API request - starts the query

for row in query_job:  # API request - fetches results
    # Row values can be accessed by field name or index
    assert row[0] == row.name == row['name']
    print(row)

For more samples of running queries with the latest version of the Python client library, see the BigQuery documentation.

Querying data with the legacy SQL dialect

The following sample shows how to run a query using the legacy SQL dialect.

Previous versions of the client libraries:

Previous versions of the client libraries defaulted to legacy SQL syntax. For information on how to configure and run a query, see the query sample.

Latest version of the client library:

The client library defaults to standard SQL syntax. Set the use_legacy_sql property to True to use legacy SQL. For information on how to configure and run a query, see the query sample.
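
As a minimal sketch (assuming a client already exists), use_legacy_sql is set on a QueryJobConfig that is passed to client.query():

# from google.cloud import bigquery
# client = bigquery.Client()

query = (
    'SELECT name FROM [bigquery-public-data:usa_names.usa_1910_2013] '
    'WHERE state = "TX" '
    'LIMIT 100')

job_config = bigquery.QueryJobConfig()
job_config.use_legacy_sql = True

query_job = client.query(query, job_config=job_config)  # API request

for row in query_job:  # API request - fetches results
    print(row)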

Querying data synchronously

In v0.28 and later, the Client.query() method is recommended because it gives you access to statistics and other properties of the query in the returned QueryJob.

Previous versions of the client libraries:

import six

query_results = client.run_sync_query(query)
query_results.use_legacy_sql = False

query_results.run()

# The query might not complete in a single request. To account for a
# long-running query, force the query results to reload until the query
# is complete.
while not query_results.complete:
    query_iterator = query_results.fetch_data()
    try:
        six.next(iter(query_iterator))
    except StopIteration:
        pass

rows = query_results.fetch_data()
for row in rows:
    print(row)

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()

query = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 100')
query_job = client.query(
    query,
    # Location must match that of the dataset(s) referenced in the query.
    location='US')  # API request - starts the query

for row in query_job:  # API request - fetches results
    # Row values can be accessed by field name or index
    assert row[0] == row.name == row['name']
    print(row)
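
Because client.query() returns a QueryJob, you can also inspect job statistics after the job finishes. A short sketch continuing from the sample above; the property names are from the v0.28 job classes, so verify them against the API reference:

query_job.result()  # Waits for the query to finish.
print('Job ID: {}'.format(query_job.job_id))
print('State: {}'.format(query_job.state))
print('Bytes processed: {}'.format(query_job.total_bytes_processed))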

Table code changes

Table references

Use a TableReference object to refer to a table without its additional properties, and a Table object to refer to a full table resource. Several properties that formerly used the Table class now use the TableReference class in v0.28 and later. For example:

  • QueryJob.destination is now a TableReference.
  • client.dataset('mydataset').table('mytable') now returns a TableReference.

For an example that uses both the TableReference and Table classes, see how to create a table.
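
As an illustration of the distinction, this sketch (with placeholder names) obtains a TableReference through the client and then builds a full Table object from it:

# from google.cloud import bigquery
# client = bigquery.Client()

table_ref = client.dataset('my_dataset').table('my_table')  # TableReference
table = bigquery.Table(table_ref)  # Full table resource (not yet created)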

Loading data from a local file

The following sample shows how to load a local file into a BigQuery table.

Previous versions of the client libraries:

import time

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)

# Reload the table to get the schema.
table.reload()

with open(source_file_name, 'rb') as source_file:
    # This example uses CSV, but you can use other formats.
    # See https://cloud.google.com/bigquery/loading-data
    job = table.upload_from_file(
        source_file, source_format='text/csv')

# Wait for the load job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()
# filename = '/path/to/file.csv'
# dataset_id = 'my_dataset'
# table_id = 'my_table'

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.skip_leading_rows = 1
job_config.autodetect = True

with open(filename, 'rb') as source_file:
    job = client.load_table_from_file(
        source_file,
        table_ref,
        location='US',  # Must match the destination dataset location.
        job_config=job_config)  # API request

job.result()  # Waits for table load to complete.

print('Loaded {} rows into {}:{}.'.format(
    job.output_rows, dataset_id, table_id))

For more details, see loading data from a local data source.

Loading data from Google Cloud Storage

The following sample shows how to load a file from Google Cloud Storage into a BigQuery table.

Previous versions of the client libraries:

import time
import uuid

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
job_id = str(uuid.uuid4())

job = client.load_table_from_storage(
    job_id, table, 'gs://bucket_name/object_name')
job.begin()

# Wait for the load job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'my_dataset'

dataset_ref = client.dataset(dataset_id)
job_config = bigquery.LoadJobConfig()
job_config.schema = [
    bigquery.SchemaField('name', 'STRING'),
    bigquery.SchemaField('post_abbr', 'STRING')
]
job_config.skip_leading_rows = 1
# The source format defaults to CSV, so the line below is optional.
job_config.source_format = bigquery.SourceFormat.CSV
uri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv'

load_job = client.load_table_from_uri(
    uri,
    dataset_ref.table('us_states'),
    job_config=job_config)  # API request
print('Starting job {}'.format(load_job.job_id))

load_job.result()  # Waits for table load to complete.
print('Job finished.')

destination_table = client.get_table(dataset_ref.table('us_states'))
print('Loaded {} rows.'.format(destination_table.num_rows))

For more details, see loading data from Google Cloud Storage.

Extracting a table to Google Cloud Storage

The following sample shows how to extract a table to Google Cloud Storage.

Previous versions of the client libraries:

import time
import uuid

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
job_id = str(uuid.uuid4())

job = client.extract_table_to_storage(
    job_id, table, 'gs://bucket_name/object_name')
job.begin()

# Wait for the job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()
# bucket_name = 'my-bucket'
project = 'bigquery-public-data'
dataset_id = 'samples'
table_id = 'shakespeare'

destination_uri = 'gs://{}/{}'.format(bucket_name, 'shakespeare.csv')
dataset_ref = client.dataset(dataset_id, project=project)
table_ref = dataset_ref.table(table_id)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location='US')  # API request
extract_job.result()  # Waits for job to complete.

print('Exported {}:{}.{} to {}'.format(
    project, dataset_id, table_id, destination_uri))

For more details, see exporting data.

Copying a table

The following sample shows how to copy a table to another table.

Previous versions of the client libraries:

import time
import uuid

import google.cloud.bigquery.job

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
destination_table = dataset.table(new_table_name)

job_id = str(uuid.uuid4())
job = client.copy_table(job_id, destination_table, table)

job.create_disposition = (
        google.cloud.bigquery.job.CreateDisposition.CREATE_IF_NEEDED)
job.begin()

# Wait for the job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()

source_dataset = client.dataset('samples', project='bigquery-public-data')
source_table_ref = source_dataset.table('shakespeare')

# dataset_id = 'my_dataset'
dest_table_ref = client.dataset(dataset_id).table('destination_table')

job = client.copy_table(
    source_table_ref,
    dest_table_ref,
    # Location must match that of the source and destination tables.
    location='US')  # API request

job.result()  # Waits for job to complete.

assert job.state == 'DONE'
dest_table = client.get_table(dest_table_ref)  # API request
assert dest_table.num_rows > 0

For more details, see copying a table.

Streaming data into a table

The following sample shows how to write rows to a table's streaming buffer.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)

# Reload the table to get the schema.
table.reload()

rows = [('values', 'in', 'same', 'order', 'as', 'schema')]
errors = table.insert_data(rows)

if not errors:
    print('Loaded 1 row into {}:{}'.format(dataset_name, table_name))
else:
    do_something_with(errors)

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# TODO(developer): Uncomment the lines below and replace with your values.
# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'my_dataset'  # replace with your dataset ID
# For this sample, the table must already exist and have a defined schema
# table_id = 'my_table'  # replace with your table ID
# table_ref = client.dataset(dataset_id).table(table_id)
# table = client.get_table(table_ref)  # API request

rows_to_insert = [
    (u'Phred Phlyntstone', 32),
    (u'Wylma Phlyntstone', 29),
]

errors = client.insert_rows(table, rows_to_insert)  # API request

assert errors == []

For more details, see streaming data into BigQuery.

Listing tables

The following sample shows how to list tables in a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
for table in dataset.list_tables():
    print(table.name)

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_ref = client.dataset('my_dataset')

tables = list(client.list_tables(dataset_ref))  # API request(s)
assert len(tables) == 0

table_ref = dataset_ref.table('my_table')
table = bigquery.Table(table_ref)
client.create_table(table)                      # API request
tables = list(client.list_tables(dataset_ref))  # API request(s)

assert len(tables) == 1
assert tables[0].table_id == 'my_table'

For more details, see listing tables in a dataset.

Get a table

The following sample shows how to get a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.reload()

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'my_dataset'
# table_id = 'my_table'

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
table = client.get_table(table_ref)  # API Request

# View table properties
print(table.schema)
print(table.description)
print(table.num_rows)

For more details, see getting information about tables.

Check that a table exists

The BigQuery API does not provide a native exists method. Instead, get the table resource and check if that request results in a 404 error. Previously, the client library provided the exists() helper to perform this check. The exists() helper allowed some inefficient use cases such as calling exists() before trying to get the full resource. As a result, the exists() helper was removed from the client library.

The following sample shows how to check whether a table exists.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
if table.exists():
    pass  # do something
else:
    pass  # do something else

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

def table_exists(client, table_reference):
    """Return if a table exists.

    Args:
        client (google.cloud.bigquery.client.Client):
            A client to connect to the BigQuery API.
        table_reference (google.cloud.bigquery.table.TableReference):
            A reference to the table to look for.

    Returns:
        bool: ``True`` if the table exists, ``False`` otherwise.
    """
    from google.cloud.exceptions import NotFound

    try:
        client.get_table(table_reference)
        return True
    except NotFound:
        return False
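
A hypothetical usage of this helper, with placeholder names:

# from google.cloud import bigquery
# client = bigquery.Client()
table_ref = client.dataset('my_dataset').table('my_table')
if table_exists(client, table_ref):
    print('Table exists.')
else:
    print('Table does not exist.')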

Create a table

The following sample shows how to create a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.create()

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_ref = client.dataset('my_dataset')

schema = [
    bigquery.SchemaField('full_name', 'STRING', mode='REQUIRED'),
    bigquery.SchemaField('age', 'INTEGER', mode='REQUIRED'),
]
table_ref = dataset_ref.table('my_table')
table = bigquery.Table(table_ref, schema=schema)
table = client.create_table(table)  # API request

assert table.table_id == 'my_table'

For more details, see creating a table.

Update a table

The following sample shows how to update a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.patch(description='new description')

Note that previous versions of the library do not check versions of the table resource via the etag property, so a read-modify-write is unsafe.

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()
# table_ref = client.dataset('my_dataset').table('my_table')
# table = client.get_table(table_ref)  # API request

assert table.description == 'Original description.'
table.description = 'Updated description.'

table = client.update_table(table, ['description'])  # API request

assert table.description == 'Updated description.'

For more details, see updating table properties.

Delete a table

The following sample shows how to delete a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.delete()

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'my_dataset'
# table_id = 'my_table'

table_ref = client.dataset(dataset_id).table(table_id)
client.delete_table(table_ref)  # API request

print('Table {}:{} deleted.'.format(dataset_id, table_id))

For more details, see deleting a table.

Dataset code changes

Dataset references

Use a DatasetReference object to refer to a dataset without its additional properties, and a Dataset object to refer to a full dataset resource. Some methods that formerly used the Dataset class now use the DatasetReference class in v0.28 and later. For example:

  • client.dataset('mydataset') now returns a DatasetReference.

For an example that uses both the DatasetReference and Dataset classes, see how to create a dataset.
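
As an illustration, this sketch (with a placeholder dataset ID) obtains a DatasetReference through the client and builds a full Dataset object from it:

# from google.cloud import bigquery
# client = bigquery.Client()

dataset_ref = client.dataset('my_dataset')  # DatasetReference
dataset = bigquery.Dataset(dataset_ref)  # Full dataset resource (not yet created)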

Listing datasets

The following sample shows how to list datasets in a project.

Previous versions of the client libraries:

for dataset in client.list_datasets():
    print(dataset.name)

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()

datasets = list(client.list_datasets())
project = client.project

if datasets:
    print('Datasets in project {}:'.format(project))
    for dataset in datasets:  # API request(s)
        print('\t{}'.format(dataset.dataset_id))
else:
    print('{} project does not contain any datasets.'.format(project))

For more details, see listing datasets.

Get a dataset

The following sample shows how to get a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.reload()

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'my_dataset'

dataset_ref = client.dataset(dataset_id)
dataset = client.get_dataset(dataset_ref)  # API request

# View dataset properties
print('Dataset ID: {}'.format(dataset_id))
print('Description: {}'.format(dataset.description))
print('Labels:')
labels = dataset.labels
if labels:
    for label, value in labels.items():
        print('\t{}: {}'.format(label, value))
else:
    print("\tDataset has no labels defined.")

# View tables in dataset
print('Tables:')
tables = list(client.list_tables(dataset_ref))  # API request(s)
if tables:
    for table in tables:
        print('\t{}'.format(table.table_id))
else:
    print('\tThis dataset does not contain any tables.')

For more details, see getting information about datasets.

Check that a dataset exists

The BigQuery API does not provide a native exists method. Instead, get the dataset resource and check if that request results in a 404 error. Previously, the client library provided the exists() helper to perform this check. The exists() helper allowed some inefficient use cases such as calling exists() before trying to get the full resource. As a result, the exists() helper was removed from the client library.

The following sample shows how to check whether a dataset exists.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
if dataset.exists():
    pass  # do something
else:
    pass  # do something else

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

def dataset_exists(client, dataset_reference):
    """Return if a dataset exists.

    Args:
        client (google.cloud.bigquery.client.Client):
            A client to connect to the BigQuery API.
        dataset_reference (google.cloud.bigquery.dataset.DatasetReference):
            A reference to the dataset to look for.

    Returns:
        bool: ``True`` if the dataset exists, ``False`` otherwise.
    """
    from google.cloud.exceptions import NotFound

    try:
        client.get_dataset(dataset_reference)
        return True
    except NotFound:
        return False
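
Similarly, a hypothetical usage of this helper, with a placeholder dataset ID:

# dataset_ref = client.dataset('my_dataset')
print(dataset_exists(client, dataset_ref))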

Create a dataset

The following sample shows how to create a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.create()

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'my_dataset'

# Create a DatasetReference using a chosen dataset ID.
# The project defaults to the Client's project if not specified.
dataset_ref = client.dataset(dataset_id)

# Construct a full Dataset object to send to the API.
dataset = bigquery.Dataset(dataset_ref)
# Specify the geographic location where the dataset should reside.
dataset.location = "US"

# Send the dataset to the API for creation.
# Raises google.api_core.exceptions.Conflict if the Dataset already
# exists within the project.
dataset = client.create_dataset(dataset)  # API request

For more details, see creating datasets.

Update a dataset

The following sample shows how to update a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.patch(description='new description')

Note that previous versions of the library do not check versions of the dataset resource via the etag property, so a read-modify-write is unsafe.

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_ref = client.dataset('my_dataset')
# dataset = client.get_dataset(dataset_ref)  # API request

assert dataset.description == 'Original description.'
dataset.description = 'Updated description.'

dataset = client.update_dataset(dataset, ['description'])  # API request

assert dataset.description == 'Updated description.'

For more details, see updating dataset properties.

Delete a dataset

The following sample shows how to delete a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.delete()

Latest version of the client library:

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()

# Delete a dataset that does not contain any tables
# dataset1_id = 'my_empty_dataset'
dataset1_ref = client.dataset(dataset1_id)
client.delete_dataset(dataset1_ref)  # API request

print('Dataset {} deleted.'.format(dataset1_id))

# Use the delete_contents parameter to delete a dataset and its contents
# dataset2_id = 'my_dataset_with_tables'
dataset2_ref = client.dataset(dataset2_id)
client.delete_dataset(dataset2_ref, delete_contents=True)  # API request

print('Dataset {} deleted.'.format(dataset2_id))

For more details, see deleting datasets.
