Query data in BigQuery tables from within JupyterLab


This page shows you how to query data that is stored in BigQuery from within the JupyterLab interface of your Vertex AI Workbench managed notebooks instance.

Overview

Vertex AI Workbench managed notebooks instances include a BigQuery integration that lets you browse and query data from within the JupyterLab interface.

The BigQuery integration also generates a code segment that imports the BigQuery client library for Python, runs your query in a notebook cell, and stores the results in a pandas dataframe.

This generated code shortens the time it takes to start working with BigQuery data in JupyterLab notebook files.
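The generated code follows the standard BigQuery client library pattern. The following is a minimal sketch of that pattern, not the exact code the integration emits; it assumes the google-cloud-bigquery package is installed and Application Default Credentials are configured, and the public dataset in the query is only an illustration:

```python
# Sketch of the pattern the integration generates; the query and table
# are illustrative, not the exact code the integration produces.
QUERY = """
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY name
ORDER BY total DESC
LIMIT 10
"""

def query_to_dataframe(query: str):
    """Run `query` in BigQuery and return the results as a pandas DataFrame.

    Requires google-cloud-bigquery and Application Default Credentials,
    so the import is deferred to keep this sketch importable anywhere.
    """
    from google.cloud import bigquery
    client = bigquery.Client()
    return client.query(query).to_dataframe()

# df = query_to_dataframe(QUERY)  # uncomment inside a configured notebook
```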

Methods for querying BigQuery data in notebook (ipynb) files

The BigQuery integration for managed notebooks provides two query editors that let you query data from within the JupyterLab interface. This guide describes how to use these query editors and other related features of the BigQuery integration.

You can also use the BigQuery client library for Python to run queries directly, but this guide does not cover that approach. For more information, see Visualizing BigQuery data in a Jupyter notebook.

Before you begin

If you haven't already, create a managed notebooks instance.

Open JupyterLab

  1. In the Google Cloud console, go to the Managed notebooks page.

    Go to Managed notebooks

  2. Next to your managed notebooks instance's name, click Open JupyterLab.

    Your managed notebooks instance opens JupyterLab.

Browse BigQuery resources

The BigQuery integration provides a pane for browsing the BigQuery resources that you have access to.

  1. In JupyterLab, click the BigQuery in Notebooks button.

    The BigQuery integration lists the available datasets.

  2. Double-click a dataset name to view a description of the dataset.

  3. Expand the dataset to show the dataset's tables, views, and models.

  4. Double-click a table, view, or model to open a summary description as a tab in JupyterLab.

See the next section of this guide to query tables.

Query tables

The BigQuery integration provides the following methods for writing queries:

  • The In-cell query editor is a cell type that you can use within your notebook files.

  • The Stand-alone query editor opens as a separate tab in JupyterLab.

In-cell

To use the in-cell query editor to query data in a BigQuery table, complete the following steps:

  1. In JupyterLab, open a notebook (ipynb) file or create a new one.

  2. To create an in-cell query editor, click the cell, and then to the right of the cell, click the BigQuery Integration button. Alternatively, in a markdown cell, type #@BigQuery.

    The BigQuery integration converts the cell into an in-cell query editor.

  3. On a new line below #@BigQuery, write your query using the supported statements and SQL dialects of BigQuery. If errors are detected in your query, an error message appears in the top right corner of the query editor. If the query is valid, the estimated number of bytes to be processed appears.

  4. Click Submit Query. Your query results appear. By default, results are paginated at 100 rows per page and limited to 1,000 rows total; you can change both settings at the bottom of the results table. While you iterate in the query editor, limit the query to only the data that you need to verify it. You can rerun the query in a notebook cell later and raise the limit to retrieve the full result set.

  5. Optional: Click Query and load as DataFrame to automatically add a new cell containing a code segment that imports the BigQuery client library for Python, runs your query, and stores the results in a pandas dataframe named df.
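After the generated cell has run, df is an ordinary pandas DataFrame. The following small example is illustrative only: it uses a locally constructed DataFrame in place of real query results, with made-up names and totals.

```python
import pandas as pd

# Stand-in for query results that the generated cell stores in `df`;
# the names and totals here are invented for illustration.
df = pd.DataFrame({"name": ["James", "Mary", "John"],
                   "total": [4_997_327, 3_730_638, 4_869_607]})

# Typical first steps after loading results.
print(df.head())
print(df.sort_values("total", ascending=False).iloc[0]["name"])
```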

Stand-alone

To use the stand-alone query editor to query data in a BigQuery table, complete the following steps:

  1. In JupyterLab, in the BigQuery in Notebooks pane, right-click a table and select Query table. Alternatively, double-click the table to open its description in a separate tab, and then click the Query table link.

  2. Write your query using the supported statements and SQL dialects of BigQuery. If errors are detected in your query, an error message appears in the top right corner of the query editor. If the query is valid, the estimated number of bytes to be processed appears.

  3. Click Submit Query. Your query results appear. By default, results are paginated at 100 rows per page and limited to 1,000 rows total; you can change both settings at the bottom of the results table. While you iterate in the query editor, limit the query to only the data that you need to verify it. You can rerun the query in a notebook cell later and raise the limit to retrieve the full result set.

  4. Optional: Click Copy code for DataFrame to copy a code segment that imports the BigQuery client library for Python, runs your query, and stores the results in a pandas dataframe named df. Paste this code into the notebook cell where you want to run it.

View query history

To view your query history as a tab in JupyterLab, complete the following steps:

  1. In JupyterLab, click the BigQuery in Notebooks button.

  2. At the bottom of the BigQuery in Notebooks pane, click Query history.

    A list of your queries opens in a new tab.

  3. Click a query to view details such as its job ID, when the query was run, and how long it took.

  4. Click Open query in editor to revise the query and run it again.
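Query history is also available programmatically through the client library's job-listing API. The following is a hedged sketch, assuming google-cloud-bigquery is installed and Application Default Credentials are configured:

```python
def recent_queries(max_results: int = 10):
    """Yield (job_id, query) pairs for recent query jobs in the current project.

    Assumes google-cloud-bigquery is installed and Application Default
    Credentials are configured; the import is deferred for that reason.
    """
    from google.cloud import bigquery
    client = bigquery.Client()
    for job in client.list_jobs(max_results=max_results):
        if job.job_type == "query":
            yield job.job_id, job.query
```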

What's next