Introduction to datasets

This page provides an overview of datasets in BigQuery.

Datasets

A dataset is contained within a specific project. Datasets are top-level containers that are used to organize and control access to your tables and views. A table or view must belong to a dataset, so you need to create at least one dataset before loading data into BigQuery. Use the format projectname.datasetname to fully qualify a dataset name when using GoogleSQL, or the format projectname:datasetname to fully qualify a dataset name when using the bq command-line tool.
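The two qualification formats differ only in the separator. As a minimal sketch (the helper names and the example project and dataset names are hypothetical, not part of any BigQuery library), the formats could be built like this:

```python
def qualify_for_googlesql(project: str, dataset: str) -> str:
    """Return the projectname.datasetname form used in GoogleSQL."""
    return f"{project}.{dataset}"


def qualify_for_bq_cli(project: str, dataset: str) -> str:
    """Return the projectname:datasetname form used by the bq command-line tool."""
    return f"{project}:{dataset}"


# Hypothetical names, for illustration only.
print(qualify_for_googlesql("myproject", "mydataset"))  # myproject.mydataset
print(qualify_for_bq_cli("myproject", "mydataset"))     # myproject:mydataset
```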

Location

You specify a location for storing your BigQuery data when you create a dataset. For a list of BigQuery dataset locations, see BigQuery locations. After you create the dataset, you cannot change its location, but you can copy the dataset to a different location or manually move it by recreating it in a different location.

BigQuery processes queries in the same location as the dataset that contains the tables you're querying. BigQuery stores your data in the selected location in accordance with the Service Specific Terms.

Limitations

BigQuery datasets are subject to the following limitations:

  • The dataset location can only be set at creation time. After a dataset is created, its location cannot be changed.

  • All tables that are referenced in a query must be stored in datasets in the same location.

  • When you copy a table, the datasets that contain the source table and destination table must reside in the same location.

  • Dataset names must be unique for each project.
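The same-location limitations can be checked before you submit a query or copy job. The following is only a sketch under stated assumptions: the function name is hypothetical, and the dataset-to-location mapping is supplied by hand rather than fetched from BigQuery.

```python
def common_location(dataset_locations: dict[str, str], referenced: list[str]) -> str:
    """Return the single location shared by all referenced datasets.

    Raises ValueError if the datasets span more than one location,
    or if no datasets are referenced.
    """
    locations = {dataset_locations[name] for name in referenced}
    if len(locations) != 1:
        raise ValueError(f"expected exactly one location, found: {sorted(locations)}")
    return locations.pop()


# Hypothetical datasets and their locations, for illustration only.
locations = {"sales": "US", "marketing": "US", "archive": "EU"}

print(common_location(locations, ["sales", "marketing"]))  # US

try:
    common_location(locations, ["sales", "archive"])
except ValueError as err:
    print(f"cannot query together: {err}")
```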

Quotas

For more information on dataset quotas and limits, see Quotas and limits.

Data retention

Datasets use time travel in conjunction with the fail-safe period to retain deleted and modified data for a short time, in case you need to recover it. For more information, see Data retention with time travel and fail-safe.

Storage billing models

When you create a dataset, the storage used by that dataset is billed to you using logical bytes as the default unit of consumption. However, you can choose to use physical bytes for billing instead. You can also change an existing dataset's storage billing model to use physical bytes.

When you change a dataset's billing model, it takes 24 hours for the change to take effect. Any tables or table partitions in long-term storage are not reset to active storage when you change a dataset's billing model. Query performance and query latency are not affected by changing a dataset's billing model.

Once you change a dataset's storage billing model, you must wait 14 days before you can change the storage billing model again.
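The two waiting periods can be worked out with simple date arithmetic. This sketch assumes that both the 24-hour delay and the 14-day wait are counted from the moment you request the change; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# Intervals taken from the text above.
EFFECTIVE_DELAY = timedelta(hours=24)  # time until the change takes effect
CHANGE_COOLDOWN = timedelta(days=14)   # time until the model can be changed again


def billing_change_timeline(changed_at: datetime) -> tuple[datetime, datetime]:
    """Return (effective_at, next_change_allowed_at) for a billing model change.

    Assumption: both intervals start at changed_at.
    """
    return changed_at + EFFECTIVE_DELAY, changed_at + CHANGE_COOLDOWN


effective_at, next_change_at = billing_change_timeline(datetime(2024, 1, 1, 12, 0))
print(effective_at)    # 2024-01-02 12:00:00
print(next_change_at)  # 2024-01-15 12:00:00
```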

When you set your storage billing model to use physical bytes, the total active storage costs you are billed for include the bytes used for time travel and fail-safe storage. You can configure the time travel window to balance storage costs with your data retention needs. For more information on forecasting your storage costs, see Forecast storage billing.

Eligibility criteria:

You can choose the storage billing model for a dataset, and enroll the dataset for physical storage billing, only if your organization has no existing flat-rate slot commitments located in the same region as the dataset.

Pricing

You are not charged for creating, updating, or deleting a dataset.

For more information on BigQuery pricing, see Pricing.

Security

To learn how to control access to datasets in BigQuery, see Controlling access to datasets. For information about data encryption, see Encryption at rest.
