This document describes how to create datasets in BigQuery.
You can create datasets by:
- Using the Cloud Console or the classic BigQuery web UI
- Using the
bq mk
CLI command - Calling the
datasets.insert
API method - Using the client libraries
- Copying an existing dataset.
To see steps for copying a dataset, including across regions, see Copying datasets.
Dataset limitations
BigQuery datasets are subject to the following limitations:
- You can set the geographic location at creation time only. After a dataset has
been created, the location becomes immutable and can't be changed in the
Cloud Console, the classic BigQuery web UI, the command-line tool, or by
calling the
patch
orupdate
API methods. All tables referenced in a query must be stored in datasets in the same location.
When copying a table, the datasets containing the source table and destination table must reside in the same location.
Dataset names must be unique per project.
As you approach thousands of datasets in a project, classic UI performance begins to degrade, and listing datasets becomes slower.
Naming datasets
When you create a dataset in BigQuery, the dataset name must be unique per project. The dataset name:
- May contain up to 1,024 characters
Can contain letters (upper or lower case), numbers, and underscores
Is case-sensitive:
mydataset
andMyDataset
can co-exist in the same project
Dataset names cannot:
- Contain spaces or special characters such as -, &, @, or %
Required permissions
At a minimum, to create a dataset, you must be granted
bigquery.datasets.create
permissions. The following predefined
Cloud IAM roles include bigquery.datasets.create
permissions:
bigquery.dataEditor
bigquery.dataOwner
bigquery.user
bigquery.admin
For more information on Cloud IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Creating a dataset
To create a dataset:
Console
Open the BigQuery web UI in the Cloud Console.
In the navigation panel, in the Resources section, select your project.
On the right side of the window, in the details panel, click Create dataset.
On the Create dataset page:
- For Dataset ID, enter a unique dataset name.
(Optional) For Data location, choose a geographic location for the dataset. If you leave the value set to Default, the location is set to
US
. After a dataset is created, the location can't be changed.For Default table expiration, choose one of the following options:
- Never: (Default) Tables created in the dataset are never automatically deleted. You must delete them manually.
- Number of days after table creation: This value determines when a newly created table in the dataset is deleted. This value is applied if you do not set a table expiration when the table is created.
Click Create dataset.
Classic UI
Click the down arrow icon
next to your project name in the navigation and click Create new dataset.
In the Create Dataset dialog:
- For Dataset ID, enter a unique dataset name.
For Data location, choose a location for the dataset. The default value is Unspecified which sets the dataset location to
US
. After a dataset is created, the location can't be changed.For Data expiration, choose one of the following options:
- Never: (Default) Tables created in the dataset are never automatically deleted. You must delete them manually.
In integer days: Any table created in the dataset is deleted after integer days from its creation time. This value is applied if you do not set a table expiration when the table is created.
Data expiration refers to the default table expiration for new tables created in the dataset. You cannot currently set a default partition expiration in the BigQuery web UI when you create a dataset. You can set a default partition expiration after the dataset is created by using the command-line tool or the API.
Click OK.
CLI
Use the bq mk
command with the --location
flag to create a new dataset.
Optional parameters include --default_table_expiration
,
--default_partition_expiration
, and --description
.
To create a dataset in a project other than your default project, add the
project ID to the dataset name in the following format:
project_id:dataset
.
bq --location=location mk \ --dataset \ --default_table_expiration integer1 \ --default_partition_expiration integer2 \ --description description \ project_id:dataset
Where:
location is the dataset's location. After a dataset is created, the location can't be changed. You can set a default value for the location by using the .bigqueryrc file.
integer1 is the default lifetime (in seconds) for newly-created tables. The minimum value is 3600 seconds (one hour). The expiration time evaluates to the current time plus the integer value. Any table created in the dataset is deleted after [INTEGER] seconds from its creation time. This value is applied if you do not set a table expiration when the table is created.
integer2 is the default lifetime (in seconds) for partitions in newly-created partitioned tables. The default partition expiration has no minimum value. The expiration time evaluates to the partition's date plus the integer value. Any partition created in a partitioned table in the dataset is deleted after [INTEGER2] seconds from the partition's date. If you supply the
--time_partitioning_expiration
flag when you create or update a partitioned table, the table-level partition expiration takes precedence over the dataset-level default partition expiration.description is a description of the dataset in quotes.
project_id is your project ID.
dataset is the name of the dataset you're creating.
For example, the following command creates a dataset named mydataset
with data
location set to US
, a default table expiration of 3600 seconds (1 hour), and a
description of This is my dataset
. Instead of using the --dataset
flag, the
command uses the -d
shortcut. If you omit -d
and --dataset
, the command
defaults to creating a dataset.
bq --location=US mk -d \
--default_table_expiration 3600 \
--description "This is my dataset." \
mydataset
API
Call the datasets.insert
method with a defined dataset resource.
C#
Before trying this sample, follow the C# setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery C# API reference documentation .
Go
Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Go API reference documentation .
Java
Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Java API reference documentation .
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Node.js API reference documentation .
PHP
Before trying this sample, follow the PHP setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery PHP API reference documentation .
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .
Ruby
Before trying this sample, follow the Ruby setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Ruby API reference documentation .
Next steps
- For more information on assigning access controls to datasets, see Controlling access to datasets.
- For more information on listing datasets in a project, see Listing datasets.
- For more information on dataset metadata, see Getting information about datasets.
- For more information on changing dataset properties, see Updating datasets.
- For more information on creating and managing labels, see Creating and managing labels.