Introduction to BigQuery administration
This document provides an introduction to BigQuery administration tasks, and the BigQuery features that help you accomplish them.
BigQuery administrators typically do the following types of tasks:
- Manage resources, such as projects, datasets, and tables.
- Secure resources, so that access is limited to the principals who need it.
- Manage workloads, such as jobs, queries, and compute capacity (reservations).
- Monitor resources, including quotas, jobs, and compute usage.
- Optimize workloads for best performance while controlling costs.
- Troubleshoot error messages, billing issues, and quotas.
This document gives an overview of the features that BigQuery provides to help you do these tasks.
BigQuery provides several interfaces you can use for administration tasks.
Often a particular task can be done with
multiple tools, letting you choose the tool that works best for you. For
example, you can create a table by using the Explorer pane in the
Google Cloud console, a
bq mk --table command, or a
CREATE TABLE SQL
- console. The Google Cloud console has several pages dedicated to BigQuery administration. For more information, see Use the console.
SQL statements. The BigQuery page in the console has a query editor where you can do administrative tasks by using DDL and DCL statements. For more information, see Data definition language (DDL) and Data control language (DCL).
You can use stored procedures to automate administration tasks that use SQL statements. For more information, see Work with stored procedures.
bqcommand-line tool lets you do many administrative tasks by using
bqcommands. You can use the
bqcommand-line tool to do tasks that aren't supported in the console, to prototype functionality before encoding it in queries or API methods, or if you prefer working in a command-line interface. For more information, see Use the
BigQuery resources include organizations, folders, projects, datasets, and tables. This section describes how to manage your organization's resources.
For information about the BigQuery resource hierarchy, see Organize BigQuery resources. In particular, you can create an Organization resource, which lets you do some tasks, such as setting access controls, at the organization level.
Datasets are containers for tables. You can create tables in a dataset, and then manage them as a group. For example, you can configure a dataset's default table expiration time, which applies to all tables in the dataset unless you override it. You can copy a group of tables by making a copy of their dataset, and you can control access to tables at the dataset level.
Refer to the following documents for more information about dataset administration:
- For more information about creating, copying, moving, and updating datasets, see Introduction to datasets.
- For more information about dataset-level access control, see Controlling access to datasets and Authorized datasets.
In BigQuery, data is stored in tables, where it can be queried. You can create tables, load data into tables from various types of sources and in various formats, partition tables based on a specific column or by ingestion time, cluster tables, update table properties, and export table data.
Refer to the following documents for more information about table administration:
- For more information about loading data into BigQuery tables, see Introduction to loading tables.
- For more information about managing tables and exporting table data, see Introduction to tables.
- For more information about partitioning and clustering tables, see Introduction to partitioned tables and Introduction to clustered tables.
To help organize your BigQuery resources, you can add labels to
your datasets, tables, and views. Labels are key-value pairs that you can attach
to a resource. After labeling your resources, you can search for them based on
label values. For example, you could use labels to group datasets by department
by adding labels like
Then you could break down your billed charges
by department using the labels.
For more information, see Introduction to labels.
Get resource metadata
You can get information about your BigQuery resources by querying
BigQuery provides a view on this
table for each type of resource. The view contains metadata for the resource;
for example, the
INFORMATION_SCHEMA.TABLES view contains table metadata.
BigQuery provides an
INFORMATION_SCHEMA view for each of the
following types of resources:
The following are a few examples of information you can obtain by querying
- See when a table was created.
- Get the names and data types of each column in a table.
- Find all the jobs currently running in a project.
- Get a list of the table snapshots that were created from a base table.
- For a dataset, table, view, or routine, get the DDL statement that can be used to create the resource.
- Get the options that were used to create a table (for example, table expiration).
- Find the partitioning and clustering columns on a table.
- Get a project's currently assigned reservation and its slot capacity.
For more information, see Introduction to BigQuery
You might want to create copies of your data for various reasons, such as protection from human error, or to retain data for comparison in the future. BigQuery provides several options for copying a table's data from a particular point in time.
Time travel. You might need to access a table's state as it was sometime within the past week; for example, if the data became corrupted due to human error. BigQuery retains historical data for your tables for seven days. You can access a table's recent historical data by using the time travel feature.
For more information, see Access historical data using time travel.
Table snapshots. If you want to be able to access a table's state from earlier than one week in the past, consider creating table snapshots periodically. Table snapshots are lightweight, read-only copies that let you preserve the state of your tables indefinitely. With table snapshots, for example, you can compare a table's current data to the data from the beginning of the year, which isn't possible by using time travel. You are only charged for storing the data that differs between the base table and its table snapshot.
For more information, see Introduction to table snapshots.
Table clones (Preview). If you want to make a lightweight, mutable copy of a table, you can use table clones. You only pay for storage of the data that differs between a base table and its table clone. For example, you could create table clones in a test environment so that you can experiment with copies of production data without impacting the production data and without having to pay for storage of full copies of the tables.
For more information, see Introduction to table clones.
BigQuery security is based on Google Cloud Identity and Access Management. BigQuery lets you control access to your resources at many levels, including access to the organization, folders, projects, datasets, tables, table columns, and table rows.
For information about controlling access to your BigQuery resources, see Overview of data security and governance.
BigQuery does many tasks on behalf of your users, including ingesting, querying, and exporting data. Each task is done by a BigQuery job. This section describes how you can monitor and manage your organization's jobs.
Jobs are actions that BigQuery runs on a user's behalf to load,
export, query, or copy data. When a user initiates one of these tasks by using
the Google Cloud console, the
bq command-line tool, a
SQL statement, or an
API call, BigQuery automatically
creates a job to execute the task.
As a BigQuery administrator, you can monitor, manage, and troubleshoot your organization's jobs to ensure they are running smoothly.
For more information, see Jobs overview.
When BigQuery executes queries, it uses units of computing called slots. BigQuery calculates how many slots are needed to run each query, depending on the size and complexity of the query.
BigQuery has two pricing models for charging for the slots that run your queries:
- On-demand billing. Your queries use a shared pool of slots, and you are charged for the number of bytes your queries process.
- Flat-rate billing. You purchase dedicated slot capacity to run your queries, and you are charged a fixed price for the dedicated slots.
These pricing models apply per project, so you can have some projects that use on-demand billing and some projects that use flat-rate billing.
With on-demand billing, after your monthly allocation of free usage is consumed, you are charged for the number of bytes that each query processes. Throughput is limited to a predefined slot quota, which is shared among the queries that run in a project.
With flat-rate billing, you purchase slots for your organization, which give you dedicated query processing capacity. Then you assign pools of those dedicated slots to individual projects in your organization. The queries that run in a project share the project's dedicated slots. You are charged based on the number of slots your purchase, rather than the number of bytes processed.
Refer to the following documents for more information about managing compute capacity for processing your queries:
For slots, and the trade-offs between on-demand billing and flat-rate billing, see Introduction to reservations.
For the different options for flat-rate billing (flex slots, monthly, or annual commitments), see Capacity commitment plans.
For creating pools of flat-rate slots, called slot reservations, see Work with slot reservations.
For allocating slot reservations to specific projects, see Work with reservation assignments.
For estimating the right number of slots to allocate to your workloads, see Estimate slot capacity requirements.
Google Cloud provides the capability to monitor and audit your resources, including BigQuery resources. This section describes the Google Cloud monitoring and auditing capabilities that apply to BigQuery.
For more information, see Introduction to BigQuery monitoring.
The Cloud Monitoring dashboard
Cloud Monitoring provides a dashboard for monitoring BigQuery. Use this dashboard to view information about BigQuery incidents, datasets, tables, projects, query times, and slot utilization.
For more information, see View the Monitoring dashboard.
Administration charts and alerts
You can use Cloud Monitoring to create custom charts based on the resources, metrics, and any aggregation that you specify.
For more information, see Dashboards and charts.
You can also create alert policies that notify you if the configured alert is triggered. For example, you could create an alert that sends an email to a specified email address if the execution time of a query exceeds a specified limit.
For more information, see Creating an alert.
You can monitor your slot usage on the Capacity management page of the Google Cloud console. You can view your capacity commitments and see where your slot reservations have been assigned. You can also use the Slot estimator (Preview) to help you estimate your organization's capacity requirements based on historical performance metrics.
For more information, see Monitoring BigQuery reservations.
Google Cloud sets limits on the use of resources, including BigQuery resources, both to ensure fair usage of shared resources, and to protect you from runaway costs. You can view your usage of BigQuery resources that have quotas, and request a higher quota, if needed, by using the console.
For more information, see BigQuery quotas and limits.
Cloud Audit Logs maintain a record of Google Cloud events, including BigQuery events. You can use the Logs Explorer to query the logs about events related to BigQuery jobs, datasets, transfers, and more. The Logs Dashboard displays information about recent errors, and you can use Logs-based metrics to count the log entries that match a given filter.
For more information, see the Google Cloud logging documentation.
You can optimize your BigQuery configuration to control both storage and query processing costs.
For assistance managing your BigQuery storage costs, see Optimize storage in BigQuery.
For assistance managing your BigQuery processing costs, see Control costs in BigQuery.
For assistance optimizing your BigQuery queries, see Introduction to optimizing query performance.
This guide provides an understanding of BigQuery reliability. Each document explores a critical facet of reliability with a use case of BigQuery implementation. Reliability topics include the following:
- An overview of reliability features - Insight into availability, durability, data consistency, consistency of performance, and data recovery inBigQuery, as well as a review of error handling considerations.
- Importing data - An in-depth look at how these dimensions of reliability apply to importing data into the managed storage system in BigQuery.
- Querying data - How to reliably query data within your BigQuery environment.
- Reading data - How reliability applies to reading data out of the BigQuery managed storage system.
- Disaster recovery - Specific failures from the loss of an individual machine all the way through catastrophic loss of a region.
In addition to the features described in this document for monitoring and managing your organization's BigQuery system, the following resources are available for troubleshooting problems that might arise:
If you need additional assistance, see Get support.
- For a video series on various topics related to BigQuery
BigQuery admin reference guide.