Migrating from the datalab Python package
The datalab Python package is used to interact with Google Cloud services via Datalab notebooks. The datalab Python package includes Jupyter magics and Python modules, such as google.datalab.bigquery, that support a subset of the BigQuery API methods.
The BigQuery client library, google-cloud-bigquery, is the official Python library for interacting with BigQuery. The client library provides a Jupyter cell magic for running queries, functions for sending and retrieving data by using pandas DataFrames, and support for the full range of BigQuery functionality. The following code examples illustrate how to perform common BigQuery operations using the google-cloud-bigquery library for developers who are already familiar with the datalab Python package.
See the requirements.txt file to view the versions of the libraries used for these code snippets.
Using Jupyter magics and shell commands
Both libraries support querying data stored in BigQuery with a cell magic. Key differences in the two libraries' approaches to magics include:
| | datalab | google-cloud-bigquery |
| --- | --- | --- |
| Magic name | bq | bigquery |
| Jupyter extension name (used for loading the magics) | google.datalab.kernel | google.cloud.bigquery |
| Query execution | Query definition and execution can be performed in separate steps. | The query is always immediately executed when the magic command is run. |
| Functionality covered by magics | Some features are not supported. | Only queries can be performed through magics. For other BigQuery functionality, use the command-line tool or google.cloud.bigquery.Client methods. |
| Saving query results | Query results can be saved to a destination table through the query magic, but cannot be saved to a variable. To save query results to a variable, execute the query using Python instead of magics (see example). | Query results can be saved to a variable through the query magic, but cannot be saved to a destination table. To save the query results to a destination table, run the query using Python instead of magics (see example). |
Installing the Python client library
To install the BigQuery client library along with the dependencies required for working with pandas DataFrames, enter the following command in your notebook:
!pip install --upgrade 'google-cloud-bigquery[bqstorage,pandas]'
Restart your kernel after installing the package.
Loading magics
Jupyter magics are notebook-specific shortcuts that allow you to run commands with minimal syntax. Jupyter notebooks come pre-loaded with many built-in commands. The datalab and google-cloud-python Python packages implement additional magic commands that you can load into Jupyter notebooks (including Datalab) to interact with Google Cloud.
datalab
The datalab magics are pre-loaded into Datalab notebooks. To load the magics in a Jupyter notebook, enter the following command:
%load_ext google.datalab.kernel
See more options for loading the magics in the datalab library source repo.
google-cloud-bigquery
To load the BigQuery magic, enter the following command:
%load_ext google.cloud.bigquery
The BigQuery cell magic works in any notebook where the google-cloud-bigquery package is installed.
Running queries
The following examples show how to use a cell magic to run a query. Both examples run the query and display the results below the input cell.
datalab
google-cloud-bigquery
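The following cells sketch what each magic looks like. The query against the public shakespeare sample table is an illustrative assumption, and running either cell requires a notebook with authenticated Google Cloud credentials.

datalab:

```python
%%bq query
SELECT word, SUM(word_count) AS count
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY word
ORDER BY count DESC
LIMIT 10
```

google-cloud-bigquery:

```python
%%bigquery
SELECT word, SUM(word_count) AS count
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY word
ORDER BY count DESC
LIMIT 10
```

In both cases, the results render as a table below the cell.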
Running a query and storing the results in a variable
The following example shows how to run a query, and how to store the results in a variable called my_variable.
datalab
The datalab query magic can save a SQL query without running it by passing a name with the --name or -n flag.
google-cloud-bigquery
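As a sketch (query and variable names are illustrative, and an authenticated notebook is required): with datalab, name the query in the cell magic and then execute it in Python; with google-cloud-bigquery, pass the variable name directly to the magic.

datalab:

```python
%%bq query --name my_query
SELECT word, SUM(word_count) AS count
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY word
ORDER BY count DESC
LIMIT 10
```

```python
# Execute the saved query and store the results in a pandas DataFrame.
my_variable = my_query.execute().result().to_dataframe()
```

google-cloud-bigquery:

```python
%%bigquery my_variable
SELECT word, SUM(word_count) AS count
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY word
ORDER BY count DESC
LIMIT 10
```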
Additional commands
The datalab library contains magics for many types of BigQuery operations, while the google-cloud-bigquery library has one cell magic for running queries. To run commands for operations other than querying, use the bq command-line tool. The following examples show how to list all tables in the samples dataset of the bigquery-public-data project using a datalab cell magic or a BigQuery shell command.
datalab
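One possible form of the datalab magic for this operation is sketched below (verify the exact flags against the datalab reference):

```python
%bq tables list --dataset bigquery-public-data.samples
```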
bq command-line tool
To get started with the bq command-line tool, which is installed as part of the Google Cloud CLI, follow the Google Cloud CLI installation instructions. Note that shell commands in a notebook must be prepended with a !. Once the bq command-line tool is set up and available from your notebook, enter the following command, which is equivalent to the datalab cell magic above.
!bq ls bigquery-public-data:samples
For a full list of commands, enter the following:
!bq help
Using Python code
In addition to Jupyter magics, you can also perform BigQuery operations using Python methods in both the datalab and google-cloud-bigquery packages.
Running a query
Both libraries support running queries and returning the results as a pandas DataFrame.
datalab
google-cloud-bigquery
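The snippets below sketch the equivalent calls in each library. The sample query is an illustrative assumption, and both snippets require authenticated credentials to run.

datalab:

```python
import google.datalab.bigquery as bq

sql = """
SELECT word, SUM(word_count) AS count
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY word
ORDER BY count DESC
LIMIT 10
"""

# Execute the query and convert the results to a pandas DataFrame.
df = bq.Query(sql).execute().result().to_dataframe()
```

google-cloud-bigquery:

```python
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT word, SUM(word_count) AS count
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY word
ORDER BY count DESC
LIMIT 10
"""

# query() starts the job; to_dataframe() waits for it and downloads the rows.
df = client.query(sql).to_dataframe()
```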
Loading data into a BigQuery table
The following example shows how to create a new dataset and load data from a CSV file from Cloud Storage into a new table.
datalab
google-cloud-bigquery
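A sketch of each approach follows. The dataset and table names are illustrative; the CSV path points at a public Cloud Storage sample file, and running either snippet requires authenticated credentials.

datalab:

```python
import google.datalab.bigquery as bq

# Create the dataset, then an empty table with an explicit schema.
bq.Dataset('import_sample').create()
table = bq.Table('import_sample.us_states').create(schema=[
    {'name': 'name', 'type': 'STRING'},
    {'name': 'post_abbr', 'type': 'STRING'},
])

# Load the CSV from Cloud Storage, skipping the header row.
table.load(
    'gs://cloud-samples-data/bigquery/us-states/us-states.csv',
    mode='append',
    source_format='csv',
    csv_options=bq.CSVOptions(skip_leading_rows=1),
)
```

google-cloud-bigquery:

```python
from google.cloud import bigquery

client = bigquery.Client(location='US')
dataset = client.create_dataset('import_sample')

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField('name', 'STRING'),
        bigquery.SchemaField('post_abbr', 'STRING'),
    ],
    skip_leading_rows=1,
    source_format=bigquery.SourceFormat.CSV,
)
load_job = client.load_table_from_uri(
    'gs://cloud-samples-data/bigquery/us-states/us-states.csv',
    dataset.table('us_states'),
    job_config=job_config,
)
load_job.result()  # Wait for the load job to complete.
```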
For more examples of using the BigQuery Python client library, see Batch loading data and Streaming data into BigQuery.
Loading a pandas DataFrame to a BigQuery table
The following example shows how to create a new dataset and load data from a pandas DataFrame into a new table.
Locations are required for certain BigQuery operations such as creating a dataset. If a location is provided to a google-cloud-bigquery client when it is initialized, it will be the default location for jobs, datasets, and tables created with the client. The datalab library does not provide a way to specify dataset or job locations, which may lead to unexpected behavior. See Dataset locations for more information.
datalab
The datalab library performs a streaming insert when loading data from a pandas DataFrame into a BigQuery table. Because of this, the data may not be immediately available to queries. See Streaming data into BigQuery for more information.
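A sketch of each approach follows. The dataset, table, and column names are illustrative, authenticated credentials are required, and the exists_ok parameter assumes a recent google-cloud-bigquery version.

datalab (streaming insert):

```python
import pandas as pd
import google.datalab.bigquery as bq

df = pd.DataFrame({'name': ['alpha', 'beta'], 'value': [1, 2]})

# Derive a schema from the DataFrame, create the table, then stream the rows.
schema = bq.Schema.from_data(df)
table = bq.Table('import_sample.df_table').create(schema=schema)
table.insert(df)  # Streaming insert; rows may not be queryable immediately.
```

google-cloud-bigquery (load job):

```python
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client(location='US')
client.create_dataset('import_sample', exists_ok=True)

df = pd.DataFrame({'name': ['alpha', 'beta'], 'value': [1, 2]})

# load_table_from_dataframe runs a load job, so the rows are available to
# queries as soon as the job completes.
job = client.load_table_from_dataframe(df, 'import_sample.df_table')
job.result()
```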