Use SQL cells
This guide describes how to use SQL cells to query data from within a Colab Enterprise notebook.
Overview
A SQL cell is a code cell for writing, editing, and running SQL queries from within your Colab Enterprise notebook. SQL cells provide an alternative workflow to IPython Magics for BigQuery.
Capabilities
SQL cells provide the following capabilities:
- Dry-run support: SQL statement validation and an approximation of the number of bytes processed by the query
- Formatting: Keyword linting and syntax highlighting
- BigQuery DataFrame output variable naming: Refer to the output variable from within other notebook cells
- Variable replacement: Refer to Python variables and SQL cells to support parameterization and the ability to query the results of a previous query
- Result set viewer: Lightweight tabular result set viewer with pagination for large result sets
Supported SQL dialect and data source
Colab Enterprise SQL cells support GoogleSQL.
You can run SQL queries on BigQuery data.
Limitations
Consider the following limitations when you plan your project:
- You can run multiple SQL statements in a single SQL cell, but only the results of the last SQL statement are saved to a DataFrame.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
-
Create a project: To create a project, you need the Project Creator
(
roles/resourcemanager.projectCreator
), which contains theresourcemanager.projects.create
permission. Learn how to grant roles.
-
Verify that billing is enabled for your Google Cloud project.
-
Enable the BigQuery, Compute Engine, Dataform, and Vertex AI APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (
roles/serviceusage.serviceUsageAdmin
), which contains theserviceusage.services.enable
permission. Learn how to grant roles. -
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
-
Create a project: To create a project, you need the Project Creator
(
roles/resourcemanager.projectCreator
), which contains theresourcemanager.projects.create
permission. Learn how to grant roles.
-
Verify that billing is enabled for your Google Cloud project.
-
Enable the BigQuery, Compute Engine, Dataform, and Vertex AI APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (
roles/serviceusage.serviceUsageAdmin
), which contains theserviceusage.services.enable
permission. Learn how to grant roles.
Required roles
To get the permissions that you need to create a Colab Enterprise notebook, run the notebook's code on a runtime, and use BigQuery data in the notebook, ask your administrator to grant you the following IAM roles on the project:
-
BigQuery User (
roles/bigquery.user
) -
Colab Enterprise User (
roles/aiplatform.colabEnterpriseUser
)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Create a SQL cell
To create a SQL cell in Colab Enterprise, do the following:
-
In the Google Cloud console, go to the Colab Enterprise My notebooks page.
-
In the Region menu, select the region that contains your notebook.
-
Click the notebook that you want to open. If you haven't created a notebook yet, create a notebook.
-
In the toolbar, to add a SQL cell, click the
Insert code cell options menu and select Add SQL cell.Your SQL cell is added to your notebook.
Enter and run a query
-
In your SQL cell, enter a SQL query. For an overview of supported statements and SQL dialects, see Introduction to SQL in BigQuery.
You can refer to Python variables in expressions by enclosing the variable name in braces (
{ }
). For example, if you specified a value in a Python variable namedmy_threshold
, you might limit your result set with a query similar to the following:SELECT * FROM my_dataset.my_table WHERE x > {my_threshold};
-
Hold the pointer over the SQL cell that you want to run, and then click the
Run cell button.
The output of the query is automatically saved as a BigQuery DataFrame with the same name as the title of the SQL cell.
Interact with the result set
You can interact with the result set as a BigQuery DataFrame or a pandas DataFrame.
You can chain SQL statements using the same SQL cell variable name. For
example, you can use BigQuery DataFrames generated by the result set
as tables in a following query by enclosing the DataFrame name in braces
({ }
). See the following example, which references a previous query's
output that was saved as a DataFrame named df
:
SELECT * FROM {df};