Migrating from Python Client Library v0.27

The Google BigQuery client library for Python v0.28 includes significant changes to the client library design used in v0.27 and earlier. These changes can be summarized as follows:

  • Query and view operations default to the Standard SQL dialect.
  • Client functions related to jobs, like running queries, immediately start the job.
  • Functions to create, get, update, and delete datasets and tables moved to the client class.

This topic details the changes that you need to make to your Python code so that you can use the latest version of the BigQuery Python client library.

Running previous versions of the client library

You are not required to upgrade your Python client library to the latest version. However, new functionality in the BigQuery API is only supported in v0.28 and later versions.

If you want to continue using a previous version of the Python client library and you do not want to migrate your code, pin the version of the Python client library that is used by your app. To pin a library version, edit the requirements.txt file as shown in the following example:

google-cloud-bigquery==0.27
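If you install the library with pip directly instead of through a requirements.txt file, the same pin can be applied on the command line:

pip install google-cloud-bigquery==0.27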

Running the latest version of the client library

To install the latest version of the Python client library, use the pip command:

pip install --upgrade google-cloud-bigquery

For more detailed instructions, see BigQuery client libraries.

Importing the library and creating a client

Importing the Python client library and creating a client object is the same in previous and newer versions of the library.

from google.cloud import bigquery

client = bigquery.Client()

Query code changes

Querying data with the standard SQL dialect

Changes in v0.28 and later include:

  • Standard SQL is the default SQL dialect.
  • Use the QueryJobConfig class to configure the query job.
  • client.query() makes an API request to immediately start the query.
  • A job ID is optional. If one is not supplied, the client library generates one for you. A sketch of supplying one follows this list.
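As a minimal sketch of supplying your own job ID, reusing the client created earlier (the job_id parameter is an assumption based on the v0.28 Client.query() API; the ID format is illustrative):

import uuid

job_id = 'my-query-{}'.format(uuid.uuid4())  # Hypothetical ID format.
query_job = client.query('SELECT 1', job_id=job_id)  # job_id is an assumed parameter.
query_job.result()  # Wait for the query to finish.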

The following sample shows how to run a query.

Previous versions of the client libraries:

client = bigquery.Client()
query_job = client.run_async_query(str(uuid.uuid4()), query)

# Use standard SQL syntax.
query_job.use_legacy_sql = False

# Set a destination table.
dest_dataset = client.dataset(dest_dataset_id)
dest_table = dest_dataset.table(dest_table_id)
query_job.destination = dest_table

# Allow the results table to be overwritten.
query_job.write_disposition = 'WRITE_TRUNCATE'

query_job.begin()
query_job.result()  # Wait for query to finish.

rows = query_job.query_results().fetch_data()
for row in rows:
    print(row)

In version 0.25.0 or earlier of the google-cloud-bigquery library, instead of job.result(), the following code was required to wait for the job object to finish:

while True:
    job.reload()  # Refreshes the state via a GET request.
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        return
    time.sleep(1)

In version 0.25.0 or earlier of the google-cloud-bigquery library, instead of job.query_results().fetch_data(), the following code was used to get the resulting rows:

rows = query_job.results().fetch_data()

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

QUERY = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 100')
TIMEOUT = 30  # in seconds
query_job = client.query(QUERY)  # API request - starts the query

# Waits for the query to finish
iterator = query_job.result(timeout=TIMEOUT)
rows = list(iterator)

assert query_job.state == 'DONE'
assert len(rows) == 100
row = rows[0]
assert row[0] == row.name == row['name']
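To reproduce the destination table and write disposition settings from the earlier sample, pass a QueryJobConfig to client.query(). A minimal sketch, with hypothetical dataset and table IDs:

dest_table_ref = client.dataset('my_dataset').table('my_table')  # Hypothetical IDs.

job_config = bigquery.QueryJobConfig()
job_config.destination = dest_table_ref
job_config.write_disposition = 'WRITE_TRUNCATE'  # Allow the table to be overwritten.

query_job = client.query(QUERY, job_config=job_config)  # API request - starts the query
query_job.result()  # Waits for the query to finish.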

Querying data with the legacy SQL dialect

The following sample shows how to run a query using the legacy SQL dialect.

Previous versions of the client libraries:

Previous versions of the client libraries defaulted to legacy SQL syntax. For information on how to configure and run a query, see the query sample.

Latest version of the client library:

The client library defaults to standard SQL syntax. Set the use_legacy_sql property to True to use legacy SQL. For information on how to configure and run a query, see the query sample.
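A minimal sketch of that configuration, assuming the client from the earlier examples (the query itself is illustrative):

job_config = bigquery.QueryJobConfig()
job_config.use_legacy_sql = True

query_job = client.query(
    'SELECT word FROM [bigquery-public-data:samples.shakespeare] LIMIT 10',
    job_config=job_config)  # API request - starts the query

for row in query_job.result():  # Waits for the query to finish.
    print(row)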

Querying data synchronously

In v0.28 and later, the Client.query() method is recommended because it gives you access to statistics and other properties of the query through the returned QueryJob.

Previous versions of the client libraries:

query_results = client.run_sync_query(query)
query_results.use_legacy_sql = False

query_results.run()

# The query might not complete in a single request. To account for a
# long-running query, force the query results to reload until the query
# is complete.
while not query_results.complete:
    query_iterator = query_results.fetch_data()
    try:
        six.next(iter(query_iterator))
    except StopIteration:
        pass

rows = query_results.fetch_data()
for row in rows:
    print(row)

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

QUERY = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 100')
query_job = client.query(QUERY)

for row in query_job:  # API request
    # Row values can be accessed by field name or index
    assert row[0] == row.name == row['name']
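Unlike the old synchronous API, the returned QueryJob also carries job metadata. Continuing from the sample above, two properties common to all job types:

print(query_job.job_id)  # Generated by the library if not supplied.
print(query_job.state)   # 'DONE' once the results have been fetched.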

Table code changes

Table references

Use a TableReference object to refer to a table without additional properties and a Table to refer to a full table resource. Several properties which formerly used the Table class now use the TableReference class in v0.28 and later. For example:

  • QueryJob.destination is now a TableReference.
  • client.dataset('mydataset').table('mytable') now returns a TableReference.

For an example that uses both the TableReference and Table classes, see how to create a table.
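As a minimal sketch of the distinction (the IDs are hypothetical):

table_ref = client.dataset('my_dataset').table('my_table')  # TableReference: identifies the table.
table = bigquery.Table(table_ref)  # Table: full resource, e.g. for client.create_table().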

Loading data from a local file

The following sample shows how to load a local file into a BigQuery table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)

# Reload the table to get the schema.
table.reload()

with open(source_file_name, 'rb') as source_file:
    # This example uses CSV, but you can use other formats.
    # See https://cloud.google.com/bigquery/loading-data
    job = table.upload_from_file(
        source_file, source_format='text/csv')

# Wait for the load job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        return
    time.sleep(1)

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

csv_file = six.BytesIO(b"""full_name,age
Phred Phlyntstone,32
Wylma Phlyntstone,29
""")

table_ref = dataset.table(TABLE_ID)
job_config = bigquery.LoadJobConfig()
job_config.source_format = 'CSV'
job_config.skip_leading_rows = 1
job_config.autodetect = True
job = client.load_table_from_file(
    csv_file, table_ref, job_config=job_config)  # API request
job.result()  # Waits for table load to complete.

For more details, see loading data from a local data source.

Loading data from Google Cloud Storage

The following sample shows how to load a file from Google Cloud Storage into a BigQuery table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
job_id = str(uuid.uuid4())

job = client.load_table_from_storage(
    job_id, table, 'gs://bucket_name/object_name')
job.begin()

# Wait for the load job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        return
    time.sleep(1)

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

table_ref = dataset.table('my_table')
GS_URL = 'gs://{}/{}'.format(bucket_name, blob_name)
job_id_prefix = "my_job"
job_config = bigquery.LoadJobConfig()
job_config.create_disposition = 'CREATE_NEVER'
job_config.skip_leading_rows = 1
job_config.source_format = 'CSV'
job_config.write_disposition = 'WRITE_EMPTY'
load_job = client.load_table_from_uri(
    GS_URL, table_ref, job_config=job_config,
    job_id_prefix=job_id_prefix)  # API request

assert load_job.state == 'RUNNING'
assert load_job.job_type == 'load'

load_job.result()  # Waits for table load to complete.

assert load_job.state == 'DONE'
assert load_job.job_id.startswith(job_id_prefix)

For more details, see loading data from Google Cloud Storage.

Extracting a table to Google Cloud Storage

The following sample shows how to extract a table to Google Cloud Storage.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
job_id = str(uuid.uuid4())

job = client.extract_table_to_storage(
    job_id, table, 'gs://bucket_name/object_name')
job.begin()

# Wait for the job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        return
    time.sleep(1)

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# client = bigquery.Client()
# bucket_name = 'my-bucket'
destination_uri = 'gs://{}/{}'.format(bucket_name, 'shakespeare.csv')
dataset_ref = client.dataset('samples', project='bigquery-public-data')
table_ref = dataset_ref.table('shakespeare')

extract_job = client.extract_table(
    table_ref, destination_uri)  # API request
extract_job.result()  # Waits for job to complete.
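If the export needs customization, an ExtractJobConfig can be supplied, mirroring the LoadJobConfig pattern shown earlier. A sketch; the compression setting is illustrative:

job_config = bigquery.ExtractJobConfig()
job_config.compression = 'GZIP'  # Write gzip-compressed output.

extract_job = client.extract_table(
    table_ref, destination_uri, job_config=job_config)  # API request
extract_job.result()  # Waits for job to complete.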

For more details, see exporting data.

Copying a table

The following sample shows how to copy a table to another table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
destination_table = dataset.table(new_table_name)

job_id = str(uuid.uuid4())
job = client.copy_table(job_id, destination_table, table)

job.create_disposition = (
        google.cloud.bigquery.job.CreateDisposition.CREATE_IF_NEEDED)
job.begin()

# Wait for the job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        return
    time.sleep(1)

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# client = bigquery.Client()
source_dataset = client.dataset('samples', project='bigquery-public-data')
source_table_ref = source_dataset.table('shakespeare')

# dataset_id = 'my_dataset'
dest_table_ref = client.dataset(dataset_id).table('destination_table')

job = client.copy_table(source_table_ref, dest_table_ref)  # API request
job.result()  # Waits for job to complete.

assert job.state == 'DONE'
dest_table = client.get_table(dest_table_ref)  # API request
assert dest_table.num_rows > 0

For more details, see copying a table.

Streaming data into a table

The following sample shows how to write rows to a table's streaming buffer.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)

# Reload the table to get the schema.
table.reload()

rows = [('values', 'in', 'same', 'order', 'as', 'schema')]
errors = table.insert_data(rows)

if not errors:
    print('Loaded 1 row into {}:{}'.format(dataset_name, table_name))
else:
    do_something_with(errors)

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

rows_to_insert = [
    (u'Phred Phlyntstone', 32),
    (u'Wylma Phlyntstone', 29),
]

errors = client.insert_rows(table, rows_to_insert)  # API request

assert errors == []
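Note that insert_rows() uses the table's schema to serialize the rows, so the full table resource is typically fetched first. A minimal sketch with hypothetical IDs:

table_ref = client.dataset('my_dataset').table('my_table')  # Hypothetical IDs.
table = client.get_table(table_ref)  # API request fetches the schema.

errors = client.insert_rows(table, rows_to_insert)  # API request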

For more details, see streaming data into BigQuery.

Listing tables

The following sample shows how to list tables in a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
for table in dataset.list_tables():
    print(table.name)

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# client = bigquery.Client()
# dataset_ref = client.dataset('my_dataset')

tables = list(client.list_tables(dataset_ref))  # API request(s)
assert len(tables) == 0

table_ref = dataset_ref.table('my_table')
table = bigquery.Table(table_ref)
client.create_table(table)                      # API request
tables = list(client.list_tables(dataset_ref))  # API request(s)

assert len(tables) == 1
assert tables[0].table_id == 'my_table'

For more details, see listing tables in a dataset.

Get a table

The following sample shows how to get a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.reload()

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# client = bigquery.Client()
# dataset_id = 'my_dataset'
# table_id = 'my_table'

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
table = client.get_table(table_ref)  # API Request

# View table properties
print(table.schema)
print(table.description)
print(table.num_rows)

For more details, see getting information about tables.

Check that a table exists

The BigQuery API does not provide a native exists method. Instead, get the table resource and check if that request results in a 404 error. Previously, the client library provided the exists() helper to perform this check. The exists() helper allowed some inefficient use cases such as calling exists() before trying to get the full resource. As a result, the exists() helper was removed from the client library.

The following sample shows how to check whether a table exists.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
if table.exists():
    pass  # do something
else:
    pass  # do something else

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

def table_exists(client, table_reference):
    """Return if a table exists.

    Args:
        client (google.cloud.bigquery.client.Client):
            A client to connect to the BigQuery API.
        table_reference (google.cloud.bigquery.table.TableReference):
            A reference to the table to look for.

    Returns:
        bool: ``True`` if the table exists, ``False`` otherwise.
    """
    from google.cloud.exceptions import NotFound

    try:
        client.get_table(table_reference)
        return True
    except NotFound:
        return False
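Example usage of the helper, with hypothetical IDs:

table_ref = client.dataset('my_dataset').table('my_table')
if table_exists(client, table_ref):
    print('Table exists.')
else:
    print('Table does not exist.')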

Create a table

The following sample shows how to create a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.create()

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# client = bigquery.Client()
# dataset_ref = client.dataset('my_dataset')

schema = [
    bigquery.SchemaField('full_name', 'STRING', mode='REQUIRED'),
    bigquery.SchemaField('age', 'INTEGER', mode='REQUIRED'),
]
table_ref = dataset_ref.table('my_table')
table = bigquery.Table(table_ref, schema=schema)
table = client.create_table(table)  # API request

assert table.table_id == 'my_table'

For more details, see creating a table.

Update a table

The following sample shows how to update a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.patch(description='new description')

Note that previous versions of the library do not check versions of the table resource via the etag property, so a read-modify-write is unsafe.

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# client = bigquery.Client()
# table_ref = client.dataset('my_dataset').table('my_table')
# table = client.get_table(table_ref)  # API request

assert table.description == 'Original description.'
table.description = 'Updated description.'

table = client.update_table(table, ['description'])  # API request

assert table.description == 'Updated description.'

For more details, see updating table properties.

Delete a table

The following sample shows how to delete a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.delete()

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# client = bigquery.Client()
# dataset_id = 'my_dataset'
# table_id = 'my_table'

table_ref = client.dataset(dataset_id).table(table_id)
client.delete_table(table_ref)  # API request

print('Table {}:{} deleted.'.format(dataset_id, table_id))

For more details, see deleting a table.

Dataset code changes

Dataset references

Use a DatasetReference object to refer to a dataset without additional properties and a Dataset to refer to a full dataset resource. Some methods that formerly used the Dataset class now use the DatasetReference class in v0.28 and later. For example:

  • client.dataset('mydataset') now returns a DatasetReference.

For an example that uses both the DatasetReference and Dataset classes, see how to create a dataset.
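As a minimal sketch of the distinction (the ID is hypothetical):

dataset_ref = client.dataset('my_dataset')  # DatasetReference: identifies the dataset, no API call.
dataset = bigquery.Dataset(dataset_ref)  # Dataset: full resource, e.g. for client.create_dataset().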

Listing datasets

The following sample shows how to list datasets in a project.

Previous versions of the client libraries:

for dataset in client.list_datasets():
    print(dataset.name)

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# client = bigquery.Client()
datasets = list(client.list_datasets())
project = client.project

if datasets:
    print('Datasets in project {}:'.format(project))
    for dataset in datasets:  # API request(s)
        print('\t{}'.format(dataset.dataset_id))
else:
    print('{} project does not contain any datasets.'.format(project))

For more details, see listing datasets.

Get a dataset

The following sample shows how to get a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.reload()

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# client = bigquery.Client()
# dataset_id = 'my_dataset'

dataset_ref = client.dataset(dataset_id)
dataset = client.get_dataset(dataset_ref)  # API request

# View dataset properties
print('Dataset ID: {}'.format(dataset_id))
print('Description: {}'.format(dataset.description))
print('Labels:')
for label, value in dataset.labels.items():
    print('\t{}: {}'.format(label, value))
# View tables in dataset
print('Tables:')
tables = list(client.list_tables(dataset_ref))  # API request(s)
if tables:
    for table in tables:
        print('\t{}'.format(table.table_id))
else:
    print('\tThis dataset does not contain any tables.')

For more details, see getting information about datasets.

Check that a dataset exists

The BigQuery API does not provide a native exists method. Instead, get the dataset resource and check if that request results in a 404 error. Previously, the client library provided the exists() helper to perform this check. The exists() helper allowed some inefficient use cases such as calling exists() before trying to get the full resource. As a result, the exists() helper was removed from the client library.

The following sample shows how to check whether a dataset exists.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
if dataset.exists():
    pass  # do something
else:
    pass  # do something else

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

def dataset_exists(client, dataset_reference):
    """Return if a dataset exists.

    Args:
        client (google.cloud.bigquery.client.Client):
            A client to connect to the BigQuery API.
        dataset_reference (google.cloud.bigquery.dataset.DatasetReference):
            A reference to the dataset to look for.

    Returns:
        bool: ``True`` if the dataset exists, ``False`` otherwise.
    """
    from google.cloud.exceptions import NotFound

    try:
        client.get_dataset(dataset_reference)
        return True
    except NotFound:
        return False

Create a dataset

The following sample shows how to create a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.create()

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# client = bigquery.Client()
# dataset_id = 'my_dataset'

# Create a DatasetReference using a chosen dataset ID.
# The project defaults to the Client's project if not specified.
dataset_ref = client.dataset(dataset_id)

# Construct a full Dataset object to send to the API.
dataset = bigquery.Dataset(dataset_ref)
# Specify the geographic location where the dataset should reside.
dataset.location = 'US'

# Send the dataset to the API for creation.
# Raises google.api_core.exceptions.AlreadyExists if the Dataset already
# exists within the project.
dataset = client.create_dataset(dataset)  # API request

For more details, see creating datasets.

Update a dataset

The following sample shows how to update a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.patch(description='new description')

Note that previous versions of the library do not check versions of the dataset resource via the etag property, so a read-modify-write is unsafe.

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# client = bigquery.Client()
# dataset_ref = client.dataset('my_dataset')
# dataset = client.get_dataset(dataset_ref)  # API request

assert dataset.description == 'Original description.'
dataset.description = 'Updated description.'

dataset = client.update_dataset(dataset, ['description'])  # API request

assert dataset.description == 'Updated description.'

For more details, see updating dataset properties.

Delete a dataset

The following sample shows how to delete a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.delete()

Latest version of the client library:

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# client = bigquery.Client()

# Delete a dataset that does not contain any tables
# dataset1_id = 'my_empty_dataset'
dataset1_ref = client.dataset(dataset1_id)
client.delete_dataset(dataset1_ref)  # API request

print('Dataset {} deleted.'.format(dataset1_id))

# Use the delete_contents parameter to delete a dataset and its contents
# dataset2_id = 'my_dataset_with_tables'
dataset2_ref = client.dataset(dataset2_id)
client.delete_dataset(dataset2_ref, delete_contents=True)  # API request

print('Dataset {} deleted.'.format(dataset2_id))

For more details, see deleting datasets.
