Migrating to Python Client Library v0.28

The Google BigQuery client library for Python v0.28 differs significantly in design from previous versions of the library. These changes can be summarized as follows:

  • Query and view operations default to the Standard SQL dialect.
  • Client functions related to jobs, like running queries, immediately start the job.
  • Functions to create, get, update, delete datasets and tables moved to the client class.

This topic details the changes that you need to make to your Python code so that you can use the v0.28 BigQuery client library.

Running previous versions of the client library

You are not required to upgrade your Python client library to v0.28. However, new BigQuery API functionality is supported only in v0.28 and later versions.

If you want to continue using a previous version of the Python client library and you do not want to migrate your code, pin the version of the library that your app uses. To pin a specific library version, edit the requirements.txt file as shown in the following example:

google-cloud-bigquery==0.27

Running the latest version of the client library

To install the latest version of the Python client library, use the pip command:

pip install --upgrade google-cloud-bigquery

For more detailed instructions, see BigQuery client libraries.

Importing the library and creating a client

Importing the Python client library and creating a client object is the same in previous and newer versions of the library.

from google.cloud import bigquery

client = bigquery.Client()

Query code changes

Querying data with the standard SQL dialect

Changes in v0.28 and later include:

  • Standard SQL is the default SQL dialect.
  • Use the QueryJobConfig class to configure the query job.
  • client.query() makes an API request to immediately start the query.
  • A job ID is optional. If one is not supplied, the client library generates one for you. (A sketch of supplying your own job ID follows the sample below.)

The following sample shows how to run a query.

Previous versions of the client libraries:

import uuid

client = bigquery.Client()
query_job = client.run_async_query(str(uuid.uuid4()), query)

# Use standard SQL syntax.
query_job.use_legacy_sql = False

# Set a destination table.
dest_dataset = client.dataset(dest_dataset_id)
dest_table = dest_dataset.table(dest_table_id)
query_job.destination = dest_table

# Allow the results table to be overwritten.
query_job.write_disposition = 'WRITE_TRUNCATE'

query_job.begin()
query_job.result()  # Wait for query to finish.

rows = query_job.query_results().fetch_data()
for row in rows:
    print(row)

In version 0.25.0 or earlier of the google-cloud-bigquery library, instead of job.result(), the following code was required to wait for the job object to finish:

import time

while True:
    job.reload()  # Refreshes the state via a GET request.
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break  # Exit the polling loop once the job has finished.
    time.sleep(1)

In version 0.25.0 or earlier of the google-cloud-bigquery library, instead of job.query_results().fetch_data(), the following code was used to get the resulting rows:

rows = query_job.results().fetch_data()

Python Client Library v0.28:

QUERY_W_PARAM = (
    'SELECT name, state '
    'FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = @state '
    'LIMIT 100')
TIMEOUT = 30  # in seconds
param = bigquery.ScalarQueryParameter('state', 'STRING', 'TX')
job_config = bigquery.QueryJobConfig()
job_config.query_parameters = [param]
query_job = client.query(
    QUERY_W_PARAM, job_config=job_config)  # API request - starts the query
assert query_job.state == 'RUNNING'

# Waits for the query to finish
iterator = query_job.result(timeout=TIMEOUT)
rows = list(iterator)

assert query_job.state == 'DONE'
assert len(rows) == 100
row = rows[0]
assert row[0] == row.name == row['name']
assert row.state == 'TX'
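
A job ID of your own can also be supplied instead of relying on a generated one. The following is a minimal sketch; the job_id parameter is assumed from the v0.28 client.query() signature, mirroring the job_id_prefix pattern shown for load jobs later in this topic:

import uuid

# Supply an explicit job ID rather than letting the library generate
# one (the job_id parameter and query text here are for illustration).
job_id = 'my-query-{}'.format(uuid.uuid4())
query_job = client.query(
    'SELECT 1', job_id=job_id)  # API request - starts the query
assert query_job.job_id == job_id
query_job.result()  # Waits for the query to finish.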

For more details, see running interactive and batch queries.

Querying data with the legacy SQL dialect

The following sample shows how to run a query using the legacy SQL dialect.

Previous versions of the client libraries:

Previous versions of the client libraries defaulted to legacy SQL syntax. For information on how to configure and run a query, see the query sample.

Python Client Library v0.28:

The client library defaults to standard SQL syntax. Set use_legacy_sql to True to use legacy SQL syntax. For information on how to configure and run a query, see the query sample.
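
For example, a minimal sketch of running a legacy SQL query in v0.28 (the query itself is a placeholder):

QUERY = (
    'SELECT word FROM [bigquery-public-data:samples.shakespeare] '
    'LIMIT 10')
job_config = bigquery.QueryJobConfig()
job_config.use_legacy_sql = True

query_job = client.query(QUERY, job_config=job_config)  # API request
for row in query_job.result():  # Waits for the query to finish.
    print(row)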

Querying data synchronously

In v0.28 and later, the Client.query() method is recommended because it returns a QueryJob, which gives you access to statistics and other properties of the query. For simple use cases, the synchronous query_rows() method may be used instead.
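
For example, once the query finishes, statistics are available on the QueryJob that query_rows() does not expose. This is a minimal sketch; the statistics properties shown are assumed from v0.28:

QUERY = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'LIMIT 100')
query_job = client.query(QUERY)  # API request - starts the query
rows = list(query_job.result())  # Waits for the query to finish.

# Job statistics, not available through query_rows() (property names
# assumed from v0.28).
print('Processed {} bytes'.format(query_job.total_bytes_processed))
print('Cache hit: {}'.format(query_job.cache_hit))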

Previous versions of the client libraries:

import six

query_results = client.run_sync_query(query)
query_results.use_legacy_sql = False

query_results.run()

# The query might not complete in a single request. To account for a
# long-running query, force the query results to reload until the query
# is complete.
while not query_results.complete:
    query_iterator = query_results.fetch_data()
    try:
        six.next(iter(query_iterator))
    except StopIteration:
        pass

rows = query_results.fetch_data()
for row in rows:
    print(row)

Python Client Library v0.28:

QUERY = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 100')
TIMEOUT = 30  # in seconds
rows = list(client.query_rows(QUERY, timeout=TIMEOUT))  # API request

assert len(rows) == 100
row = rows[0]
assert row[0] == row.name == row['name']

Table code changes

Table references

Use a TableReference object to refer to a table by its ID alone, and a Table object to represent a full table resource, including its schema and other metadata. Several properties that formerly used the Table class use the TableReference class in v0.28. For example:

  • QueryJob.destination is now a TableReference.
  • client.dataset('mydataset').table('mytable') now returns a TableReference.

For an example that uses both the TableReference and Table classes, see how to create a table.
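
As an illustration, here is a minimal sketch of the distinction (the dataset and table IDs are placeholders):

# A TableReference only identifies a table; it carries no schema or
# other metadata.
table_ref = client.dataset('mydataset').table('mytable')

# A Table wraps a reference plus the full table resource.
table = bigquery.Table(table_ref)

# Job configurations take references, for example as a query destination.
job_config = bigquery.QueryJobConfig()
job_config.destination = table_ref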

Loading data from a local file

The following sample shows how to load a local file into a BigQuery table.

Previous versions of the client libraries:

import time

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)

# Reload the table to get the schema.
table.reload()

with open(source_file_name, 'rb') as source_file:
    # This example uses CSV, but you can use other formats.
    # See https://cloud.google.com/bigquery/loading-data
    job = table.upload_from_file(
        source_file, source_format='text/csv')

# Wait for the load job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)

Python Client Library v0.28:

import six

csv_file = six.BytesIO(b"""full_name,age
Phred Phlyntstone,32
Wylma Phlyntstone,29
""")

table_ref = dataset.table(TABLE_ID)
job_config = bigquery.LoadJobConfig()
job_config.source_format = 'CSV'
job_config.skip_leading_rows = 1
job = client.load_table_from_file(
    csv_file, table_ref, job_config=job_config)  # API request
job.result()  # Waits for table load to complete.

For more details, see loading data from a local data source.

Loading data from Google Cloud Storage

The following sample shows how to load a file from Google Cloud Storage into a BigQuery table.

Previous versions of the client libraries:

import time
import uuid

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
job_id = str(uuid.uuid4())

job = client.load_table_from_storage(
    job_id, table, 'gs://bucket_name/object_name')
job.begin()

# Wait for the load job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)

Python Client Library v0.28:

table_ref = dataset.table('person_ages')
table = bigquery.Table(table_ref)
table.schema = [
    bigquery.SchemaField('full_name', 'STRING', mode='required'),
    bigquery.SchemaField('age', 'INTEGER', mode='required')
]
client.create_table(table)  # API request
GS_URL = 'gs://{}/{}'.format(bucket_name, blob_name)
job_id_prefix = "my_job"
job_config = bigquery.LoadJobConfig()
job_config.create_disposition = 'CREATE_NEVER'
job_config.skip_leading_rows = 1
job_config.source_format = 'CSV'
job_config.write_disposition = 'WRITE_EMPTY'
load_job = client.load_table_from_uri(
    GS_URL, table_ref, job_config=job_config,
    job_id_prefix=job_id_prefix)  # API request

assert load_job.state == 'RUNNING'
assert load_job.job_type == 'load'

load_job.result()  # Waits for table load to complete.

assert load_job.state == 'DONE'
assert load_job.job_id.startswith(job_id_prefix)

For more details, see loading data from Google Cloud Storage.

Extracting a table to Google Cloud Storage

The following sample shows how to extract a table to Google Cloud Storage.

Previous versions of the client libraries:

import time
import uuid

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
job_id = str(uuid.uuid4())

job = client.extract_table_to_storage(
    job_id, table, 'gs://bucket_name/object_name')
job.begin()

# Wait for the job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)

Python Client Library v0.28:

from google.cloud.storage import Client as StorageClient

storage_client = StorageClient()
bucket = storage_client.create_bucket(bucket_name)  # API request
destination_blob_name = 'person_ages_out.csv'
destination = bucket.blob(destination_blob_name)

destination_uri = 'gs://{}/{}'.format(bucket_name, destination_blob_name)
extract_job = client.extract_table(
    table_ref, destination_uri)  # API request
extract_job.result(timeout=100)  # Waits for job to complete.

got = destination.download_as_string().decode('utf-8')  # API request
assert 'Bharney Rhubble' in got
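
Extract jobs follow the same job_config pattern as load and copy jobs. For example, a minimal sketch of exporting compressed newline-delimited JSON; the ExtractJobConfig properties are assumed from v0.28, and table_ref and destination_uri are as in the sample above:

job_config = bigquery.ExtractJobConfig()
job_config.destination_format = 'NEWLINE_DELIMITED_JSON'
job_config.compression = 'GZIP'

extract_job = client.extract_table(
    table_ref, destination_uri, job_config=job_config)  # API request
extract_job.result()  # Waits for job to complete.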

For more details, see exporting data.

Copying a table

The following sample shows how to copy a table to another table.

Previous versions of the client libraries:

import time
import uuid

import google.cloud.bigquery

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
destination_table = dataset.table(new_table_name)

job_id = str(uuid.uuid4())
job = client.copy_table(job_id, destination_table, table)

job.create_disposition = (
    google.cloud.bigquery.job.CreateDisposition.CREATE_IF_NEEDED)
job.begin()

# Wait for the job to complete.
while True:
    job.reload()
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)

Python Client Library v0.28:

source_dataset = bigquery.DatasetReference(
    'bigquery-public-data', 'samples')
source_table_ref = source_dataset.table('shakespeare')

dest_dataset = bigquery.Dataset(client.dataset(DATASET_ID))
dest_dataset = client.create_dataset(dest_dataset)  # API request
dest_table_ref = dest_dataset.table('destination_table')

job_config = bigquery.CopyJobConfig()
job = client.copy_table(
    source_table_ref, dest_table_ref, job_config=job_config)  # API request
job.result()  # Waits for job to complete.

assert job.state == 'DONE'
dest_table = client.get_table(dest_table_ref)  # API request
assert dest_table.table_id == 'destination_table'

For more details, see copying a table.

Streaming data into a table

The following sample shows how to write rows to a table's streaming buffer.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)

# Reload the table to get the schema.
table.reload()

rows = [('values', 'in', 'same', 'order', 'as', 'schema')]
errors = table.insert_data(rows)

if not errors:
    print('Loaded 1 row into {}:{}'.format(dataset_name, table_name))
else:
    do_something_with(errors)

Python Client Library v0.28:

ROWS_TO_INSERT = [
    (u'Phred Phlyntstone', 32),
    (u'Wylma Phlyntstone', 29),
]

errors = client.create_rows(table, ROWS_TO_INSERT)  # API request

assert errors == []
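
If any rows fail to insert, create_rows returns one entry per failed row instead of an empty list. Here is a sketch of inspecting them; the entry structure is assumed from the underlying insertAll API response:

for entry in errors:
    # Each entry is assumed to carry the index of the failed row and
    # the associated error details.
    print('Row {} failed: {}'.format(entry['index'], entry['errors']))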

For more details, see streaming data into BigQuery.

Get a table

The following sample shows how to get a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.reload()

Python Client Library v0.28:

assert table.description == ORIGINAL_DESCRIPTION
table.description = LOCALLY_CHANGED_DESCRIPTION
table = client.get_table(table)  # API request
assert table.description == ORIGINAL_DESCRIPTION

For more details, see getting information about tables.

Create a table

The following sample shows how to create a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.create()

Python Client Library v0.28:

SCHEMA = [
    bigquery.SchemaField('full_name', 'STRING', mode='required'),
    bigquery.SchemaField('age', 'INTEGER', mode='required'),
]
table_ref = dataset.table('my_table')
table = bigquery.Table(table_ref, schema=SCHEMA)
table = client.create_table(table)      # API request

assert table.table_id == 'my_table'

For more details, see creating a table.

Update a table

The following sample shows how to update a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.patch(description='new description')

Note that previous versions of the library do not check the version of the table resource via the etag property, so a read-modify-write is unsafe.

Python Client Library v0.28:

assert table.description == ORIGINAL_DESCRIPTION
table.description = UPDATED_DESCRIPTION

table = client.update_table(table, ['description'])  # API request

assert table.description == UPDATED_DESCRIPTION

For more details, see updating table properties.

Delete a table

The following sample shows how to delete a table.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
table = dataset.table(table_name)
table.delete()

Python Client Library v0.28:

import pytest
from google.cloud.exceptions import NotFound

client.delete_table(table)  # API request

with pytest.raises(NotFound):
    client.get_table(table)  # API request

For more details, see deleting a table.

Dataset code changes

Dataset references

Use a DatasetReference object to refer to a dataset by its ID alone, and a Dataset object to represent a full dataset resource. Some methods that formerly used the Dataset class use the DatasetReference class in v0.28. For example:

  • client.dataset('mydataset') now returns a DatasetReference.

For an example that uses both the DatasetReference and Dataset classes, see how to create a dataset.
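
As an illustration, here is a minimal sketch of the distinction (the dataset ID is a placeholder):

# A DatasetReference only identifies a dataset.
dataset_ref = client.dataset('mydataset')

# A Dataset wraps a reference plus the full dataset resource, which
# can then be created or modified through the client.
dataset = bigquery.Dataset(dataset_ref)
dataset.description = 'my dataset'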

Get a dataset

The following sample shows how to get a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.reload()

Python Client Library v0.28:

assert dataset.description == ORIGINAL_DESCRIPTION
dataset.description = LOCALLY_CHANGED_DESCRIPTION
assert dataset.description == LOCALLY_CHANGED_DESCRIPTION
dataset = client.get_dataset(dataset)  # API request
assert dataset.description == ORIGINAL_DESCRIPTION

For more details, see getting information about datasets.

Create a dataset

The following sample shows how to create a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.create()

Python Client Library v0.28:

# DATASET_ID = 'dataset_ids_are_strings'
dataset_ref = client.dataset(DATASET_ID)
dataset = bigquery.Dataset(dataset_ref)
dataset.description = 'my dataset'
dataset = client.create_dataset(dataset)  # API request

For more details, see creating datasets.

Update a dataset

The following sample shows how to update a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.patch(description='new description')

Note that previous versions of the library do not check the version of the dataset resource via the etag property, so a read-modify-write is unsafe.

Python Client Library v0.28:

assert dataset.description == ORIGINAL_DESCRIPTION
dataset.description = UPDATED_DESCRIPTION

dataset = client.update_dataset(dataset, ['description'])  # API request

assert dataset.description == UPDATED_DESCRIPTION

For more details, see updating datasets.

Delete a dataset

The following sample shows how to delete a dataset.

Previous versions of the client libraries:

dataset = client.dataset(dataset_name)
dataset.delete()

Python Client Library v0.28:

import pytest
from google.cloud.exceptions import NotFound

client.delete_dataset(dataset)  # API request

with pytest.raises(NotFound):
    client.get_dataset(dataset)  # API request

For more details, see deleting datasets.
