Creating datasets

This document describes how to create datasets in BigQuery.

You can create datasets by:

  • Using the GCP Console or the classic BigQuery web UI
  • Using the bq mk CLI command
  • Calling the datasets.insert API method.

Dataset limitations

BigQuery datasets are subject to the following limitations:

  • You can set the geographic location at creation time only. After a dataset has been created, the location becomes immutable and can't be changed in the GCP Console, the classic BigQuery web UI, the command-line tool, or by calling the patch or update API methods.
  • All tables referenced in a query must be stored in datasets in the same location.

  • When copying a table, the datasets containing the source table and destination table must reside in the same location.

  • Dataset names must be unique per project.

  • As you approach thousands of datasets in a project, classic UI performance begins to degrade, and listing datasets becomes slower.

Naming datasets

When you create a dataset in BigQuery, the dataset name must be unique per project. The dataset name:

  • May contain up to 1,024 characters
  • Can contain letters (upper or lower case), numbers, and underscores

  • Is case-sensitive: mydataset and MyDataset can co-exist in the same project

Dataset names cannot:

  • Contain spaces or special characters such as -, &, @, or %

Required permissions

To create a dataset, you must be assigned a project-level primitive role or a predefined IAM role that provides bigquery.datasets.create permissions. The following predefined, project-level IAM roles include bigquery.datasets.create permissions:

  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.user
  • bigquery.admin

For more information on IAM roles and permissions in BigQuery, see Access control.

Creating a dataset

To create a dataset:

Console

  1. Open the BigQuery web UI in the GCP Console.

    Go to the BigQuery web UI

  2. In the navigation panel, in the Resources section, select your project.

  3. On the right side of the window, in the details panel, click Create dataset.

  4. On the Create dataset page:

    • For Dataset ID, enter a unique dataset name.
    • (Optional) For Data location, choose a geographic location for the dataset. If you leave the value set to Default, the location is set to US. After a dataset is created, the location can't be changed.

    • For Default table expiration, choose one of the following options:

      • Never: (Default) Tables created in the dataset are never automatically deleted. You must delete them manually.
      • Number of days after table creation: This value determines when a newly created table in the dataset is deleted. This value is applied if you do not set a table expiration when the table is created.
    • Click Create dataset.

Classic UI

  1. Click the down arrow icon down arrow icon next to your project name in the navigation and click Create new dataset.

  2. In the Create Dataset dialog:

    • For Dataset ID, enter a unique dataset name.
    • For Data location, choose a location for the dataset. The default value is Unspecified which sets the dataset location to US. After a dataset is created, the location can't be changed.

    • For Data expiration, choose one of the following options:

      • Never: (Default) Tables created in the dataset are never automatically deleted. You must delete them manually.
      • In [INTEGER] days: Any table created in the dataset is deleted after [INTEGER] days from its creation time. This value is applied if you do not set a table expiration when the table is created.

        Create dataset

        Data expiration refers to the default table expiration for new tables created in the dataset. You cannot currently set a default partition expiration in the BigQuery web UI when you create a dataset. You can set a default partition expiration after the dataset is created by using the command-line tool or the API.

    • Click OK.

Command-line

Use the bq mk command with the --location flag to create a new dataset. Optional parameters include --default_table_expiration, --default_partition_expiration, and --description.

To create a dataset in a project other than your default project, add the project ID to the dataset name in the following format: [PROJECT_ID]:[DATASET].

bq --location=[LOCATION] mk --dataset \
--default_table_expiration [INTEGER] \
--default_partition_expiration [INTEGER2] \
--description [DESCRIPTION] \
[PROJECT_ID]:[DATASET]

Where:

  • [LOCATION] is the dataset's location. After a dataset is created, the location can't be changed. You can set a default value for the location by using the .bigqueryrc file.

  • [INTEGER] is the default lifetime (in seconds) for newly-created tables. The minimum value is 3600 seconds (one hour). The expiration time evaluates to the current time plus the integer value. Any table created in the dataset is deleted after [INTEGER] seconds from its creation time. This value is applied if you do not set a table expiration when the table is created.

  • [INTEGER2] is the default lifetime (in seconds) for partitions in newly-created partitioned tables. The default partition expiration has no minimum value. The expiration time evaluates to the partition's date plus the integer value. Any partition created in a partitioned table in the dataset is deleted after [INTEGER2] seconds from the partition's date. If you supply the --time_partitioning_expiration flag when you create or update a partitioned table, the table-level partition expiration takes precedence over the dataset-level default partition expiration.

  • [DESCRIPTION] is a description of the dataset in quotes.

  • [PROJECT_ID] is your project ID.

  • [DATASET] is the name of the dataset you're creating.

For example, the following command creates a dataset named mydataset with data location set to US, a default table expiration of 3600 seconds (1 hour), and a description of This is my dataset. Instead of using the --dataset flag, the command uses the -d shortcut. If you omit -d and --dataset, the command defaults to creating a dataset.

bq --location=US mk -d \
--default_table_expiration 3600 \
--description "This is my dataset." \
mydataset

You can enter the bq ls command to confirm that the dataset was created. Also, you can create a table when you create a new dataset using the format: bq mk -t [DATASET].[TABLE]. For more information on creating tables, see Creating a table.

API

Call the datasets.insert method with a defined dataset resource.

C#

Before trying this sample, follow the C# setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery C# API reference documentation .


using Google.Cloud.BigQuery.V2;
using System;

public class BigQueryCreateDataset
{
    public BigQueryDataset CreateDataset(
        string projectId = "your-project-id",
        string location = "US"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        var createDatasetOptions = new CreateDatasetOptions()
        {
            // Specify the geographic location where the dataset should reside.
            Location = location
        };
        // Create the dataset
        return client.CreateDataset(
            datasetId: "your_new_dataset_id", options: createDatasetOptions);
    }
}

Go

Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Go API reference documentation .

// To run this sample, you will need to create (or reuse) a context and
// an instance of the bigquery client.  For example:
// import "cloud.google.com/go/bigquery"
// ctx := context.Background()
// client, err := bigquery.NewClient(ctx, "your-project-id")
meta := &bigquery.DatasetMetadata{
	Location: "US", // Create the dataset in the US.
}
if err := client.Dataset(datasetID).Create(ctx, meta); err != nil {
	return err
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Java API reference documentation .

Dataset dataset = null;
DatasetInfo datasetInfo = DatasetInfo.newBuilder(datasetName).build();
try {
  // the dataset was created
  dataset = bigquery.create(datasetInfo);
} catch (BigQueryException e) {
  // the dataset was not created
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Node.js API reference documentation .

// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');

async function createDataset() {
  // Creates a new dataset named "my_dataset".

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_new_dataset";

  // Create a client
  const bigqueryClient = new BigQuery();

  // Specify the geographic location where the dataset should reside
  const options = {
    location: 'US',
  };

  // Create a new dataset
  const [dataset] = await bigqueryClient.createDataset(datasetId, options);
  console.log(`Dataset ${dataset.id} created.`);
}

createDataset();

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery PHP API reference documentation .

use Google\Cloud\BigQuery\BigQueryClient;

/** Uncomment and populate these variables in your code */
// $projectId = 'The Google project ID';
// $datasetId = 'The BigQuery dataset ID';

$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$dataset = $bigQuery->createDataset($datasetId);
printf('Created dataset %s' . PHP_EOL, $datasetId);

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

from google.cloud import bigquery

# TODO(developer): Construct a BigQuery client object.
# client = bigquery.Client()

# TODO(developer): Set dataset_id to the ID of the dataset to create.
# dataset_id = "{}.your_dataset".format(client.project)

# Construct a full Dataset object to send to the API.
dataset = bigquery.Dataset(dataset_id)

# TODO(developer): Specify the geographic location where the dataset should reside.
dataset.location = "US"

# Send the dataset to the API for creation.
# Raises google.api_core.exceptions.Conflict if the Dataset already
# exists within the project.
dataset = client.create_dataset(dataset)  # API request
print("Created dataset {}.{}".format(client.project, dataset.dataset_id))

Ruby

Before trying this sample, follow the Ruby setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Ruby API reference documentation .

require "google/cloud/bigquery"

def create_dataset dataset_id = "my_dataset", location = "US"
  bigquery = Google::Cloud::Bigquery.new

  # Create the dataset in a specified geographic location
  bigquery.create_dataset dataset_id, location: location

  puts "Created dataset: #{dataset_id}"
end

Next steps

Oliko tästä sivusta apua? Kerro mielipiteesi

Palautteen aihe:

Tämä sivu
BigQuery
Tarvitsetko apua? Siirry tukisivullemme.