Managing datasets

This document describes how to manage datasets in BigQuery. After creating a dataset, you can manage the dataset in the following ways:

Renaming datasets

You cannot change the name of an existing dataset. As a workaround, you can copy the dataset; see Copying datasets.

Copying datasets

To see steps for copying a dataset, including across regions, see Copying datasets.

Moving a dataset

To manually move a dataset from one location to another, follow this process:

  1. Export the data from your BigQuery tables to a regional or multi-region Cloud Storage bucket in the same location as your dataset. For example, if your dataset is in the EU multi-region location, export your data into a regional or multi-region bucket in the EU.

    There are no charges for exporting data from BigQuery, but you do incur charges for storing the exported data in Cloud Storage. BigQuery exports are subject to the limits on export jobs.

  2. Copy or move the data from your Cloud Storage bucket to a regional or multi-region bucket in the new location. For example, if you are moving your data from the US multi-region location to the Tokyo regional location, you would transfer the data to a regional bucket in Tokyo. For information on transferring Cloud Storage objects, see Renaming, copying, and moving objects in the Cloud Storage documentation.

    Note that transferring data between regions incurs network egress charges in Cloud Storage.

  3. After you transfer the data to a Cloud Storage bucket in the new location, create a new BigQuery dataset (in the new location). Then, load your data from the Cloud Storage bucket into BigQuery.

    You are not charged for loading the data into BigQuery, but you will incur charges for storing the data in Cloud Storage until you delete the data or the bucket. You are also charged for storing the data in BigQuery after it is loaded. Loading data into BigQuery is subject to the limits on load jobs.

You can also use Cloud Composer to move and copy large datasets programmatically.

For more information on using Cloud Storage to store and move large datasets, see Using Cloud Storage with big data.
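
The three manual steps above can be sketched with the Python client libraries for BigQuery and Cloud Storage. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names (staging_uri, move_dataset), the bucket arguments, and the choice of Avro as the interchange format are illustrative and do not come from this page. It handles regular tables only (not views), assumes both buckets already exist in the right locations, and uses a simple object copy that may need to be replaced for very large objects.

```python
def staging_uri(bucket_name, table_id):
    """Return a wildcard Cloud Storage URI for one table's exported files."""
    return "gs://{}/{}/*.avro".format(bucket_name, table_id)


def move_dataset(source_dataset, source_location, source_bucket,
                 dest_dataset, dest_location, dest_bucket):
    """Export, transfer, and reload every regular table in source_dataset.

    Dataset IDs use the "project.dataset" form. Bucket names and the Avro
    format are assumptions made for this sketch.
    """
    # Imported here so staging_uri stays usable without the client libraries.
    from google.cloud import bigquery, storage

    bq_client = bigquery.Client()
    gcs_client = storage.Client()

    # list_tables also returns views; keep only regular tables.
    tables = [
        t for t in bq_client.list_tables(source_dataset)
        if t.table_type == "TABLE"
    ]

    # Step 1: export each table to the bucket in the dataset's location.
    extract_config = bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.AVRO
    )
    for table in tables:
        bq_client.extract_table(
            table.reference,
            staging_uri(source_bucket, table.table_id),
            location=source_location,
            job_config=extract_config,
        ).result()  # Wait for the export job to finish.

    # Step 2: copy the exported objects to the bucket in the new location.
    src = gcs_client.bucket(source_bucket)
    dst = gcs_client.bucket(dest_bucket)
    for table in tables:
        prefix = table.table_id + "/"
        for blob in gcs_client.list_blobs(source_bucket, prefix=prefix):
            src.copy_blob(blob, dst, blob.name)

    # Step 3: create the dataset in the new location, then load the data.
    dataset = bigquery.Dataset(dest_dataset)
    dataset.location = dest_location
    bq_client.create_dataset(dataset, exists_ok=True)

    load_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.AVRO
    )
    for table in tables:
        bq_client.load_table_from_uri(
            staging_uri(dest_bucket, table.table_id),
            "{}.{}".format(dest_dataset, table.table_id),
            location=dest_location,
            job_config=load_config,
        ).result()  # Wait for the load job to finish.
```

For the US-to-Tokyo example in step 2, a call might look like move_dataset("my-project.mydataset", "US", "my-us-bucket", "my-project.mydataset_tokyo", "asia-northeast1", "my-tokyo-bucket"), where all project, dataset, and bucket names are placeholders. Remember to delete the exported objects afterwards to stop incurring Cloud Storage charges.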

Deleting datasets

You can delete a dataset in the following ways:

  • Using the Cloud Console.
  • Using the bq rm command in the bq command-line tool.
  • Calling the datasets.delete API method.
  • Using the client libraries.

Required permissions

At a minimum, to delete a dataset, you must be granted bigquery.datasets.delete permissions. If the dataset contains tables or views, bigquery.tables.delete is also required. The following predefined IAM roles include both bigquery.datasets.delete and bigquery.tables.delete permissions:

  • bigquery.dataOwner
  • bigquery.admin

In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to delete datasets and tables they create.

For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Checking whether a dataset exists

Java

Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;
import com.google.cloud.bigquery.DatasetId;

// Sample to check whether a dataset exists
public class DatasetExists {

  public static void main(String[] args) {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    datasetExists(datasetName);
  }

  public static void datasetExists(String datasetName) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      Dataset dataset = bigquery.getDataset(DatasetId.of(datasetName));
      // getDataset returns null when the dataset does not exist.
      if (dataset != null) {
        System.out.println("Dataset already exists");
      } else {
        System.out.println("Dataset not found");
      }
    } catch (BigQueryException e) {
      System.out.println("Could not check whether the dataset exists. \n" + e.toString());
    }
  }
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

from google.cloud import bigquery
from google.cloud.exceptions import NotFound

client = bigquery.Client()

# TODO(developer): Set dataset_id to the ID of the dataset to determine existence.
# dataset_id = "your-project.your_dataset"

try:
    client.get_dataset(dataset_id)  # Make an API request.
    print("Dataset {} already exists".format(dataset_id))
except NotFound:
    print("Dataset {} is not found".format(dataset_id))

Deleting a dataset

When you delete a dataset by using the Cloud Console, tables and views in the dataset (and the data they contain) are deleted. When you delete a dataset by using the bq command-line tool, you must use the -r flag to delete the dataset's tables and views.

To delete a dataset:

Console

  1. Select your dataset from the Resources pane, and then click Delete dataset on the right side of the window.

  2. In the Delete dataset dialog, type the name of the dataset into the text box, and then click Delete.

bq

Use the bq rm command with the (optional) --dataset or -d shortcut flag to delete a dataset. When you use the bq command-line tool to remove a dataset, you must confirm the command. You can use the -f flag to skip confirmation.

In addition, if the dataset contains tables, you must use the -r flag to remove all tables in the dataset. If you are deleting a dataset in a project other than your default project, add the project ID to the dataset name in the following format: project_id:dataset.

bq rm -r -f -d project_id:dataset

Replace the following:

  • project_id is your project ID.
  • dataset is the name of the dataset you're deleting.

Examples:

Enter the following command to remove mydataset and all the tables in it from your default project. The command uses the optional -d shortcut.

bq rm -r -d mydataset

When prompted, type y and press Enter.

Enter the following command to remove mydataset and all the tables in it from myotherproject. The command does not use the optional -d shortcut. The -f flag is used to skip confirmation.

bq rm -r -f myotherproject:mydataset
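
If you script these commands, the flag rules above can be captured in a small helper. The following is a sketch only: build_bq_rm_args and run_bq_rm are hypothetical names, and the helper assumes the bq command-line tool is installed and on your PATH. It uses only the flags described in this section (-r, -f, -d) and the project_id:dataset format.

```python
import subprocess


def build_bq_rm_args(dataset, project_id=None, recursive=False, force=False):
    """Build the argument list for `bq rm` on a dataset."""
    args = ["bq", "rm"]
    if recursive:
        args.append("-r")  # Also remove the tables in the dataset.
    if force:
        args.append("-f")  # Skip the confirmation prompt.
    args.append("-d")  # Treat the target as a dataset.
    # Prefix the project ID when the dataset is not in the default project.
    target = "{}:{}".format(project_id, dataset) if project_id else dataset
    args.append(target)
    return args


def run_bq_rm(dataset, **options):
    """Run the command; raises CalledProcessError if bq exits with an error."""
    subprocess.run(build_bq_rm_args(dataset, **options), check=True)
```

Without force=True, bq still prompts for confirmation on the terminal, so unattended scripts would normally pass force=True.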

API

Call the datasets.delete method to delete the dataset and set the deleteContents parameter to true to delete the tables in it.
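
As a rough illustration of the raw call, the sketch below builds and sends the HTTP DELETE request using only the Python standard library. The helper names (build_delete_request, delete_dataset_rest) are hypothetical, and the sketch assumes you already hold a valid OAuth 2.0 access token (for example, one printed by gcloud auth print-access-token). In most cases the client libraries shown in the other tabs are the more convenient option.

```python
import urllib.parse
import urllib.request

API_ROOT = "https://bigquery.googleapis.com/bigquery/v2"


def build_delete_request(project_id, dataset_id, access_token,
                         delete_contents=False):
    """Build a datasets.delete request without sending it."""
    url = "{}/projects/{}/datasets/{}".format(
        API_ROOT,
        urllib.parse.quote(project_id, safe=""),
        urllib.parse.quote(dataset_id, safe=""),
    )
    if delete_contents:
        # deleteContents=true also deletes the tables in the dataset.
        url += "?" + urllib.parse.urlencode({"deleteContents": "true"})
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": "Bearer " + access_token},
    )


def delete_dataset_rest(project_id, dataset_id, access_token,
                        delete_contents=False):
    """Send the request and return the HTTP status code."""
    request = build_delete_request(
        project_id, dataset_id, access_token, delete_contents
    )
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response:
        return response.status
```

Splitting request construction from sending keeps the URL logic easy to inspect before anything is deleted.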

C#

Before trying this sample, follow the C# setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery C# API reference documentation.


using Google.Cloud.BigQuery.V2;
using System;

public class BigQueryDeleteDataset
{
    public void DeleteDataset(
        string projectId = "your-project-id",
        string datasetId = "your_empty_dataset"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        // Delete a dataset that does not contain any tables
        client.DeleteDataset(datasetId: datasetId);
        Console.WriteLine($"Dataset {datasetId} deleted.");
    }
}

Go

Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.

import (
	"context"
	"fmt"

	"cloud.google.com/go/bigquery"
)

// deleteDataset demonstrates the deletion of an empty dataset.
func deleteDataset(projectID, datasetID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	ctx := context.Background()

	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	// To recursively delete a dataset and contents, use DeleteWithContents.
	if err := client.Dataset(datasetID).Delete(ctx); err != nil {
		return fmt.Errorf("Delete: %v", err)
	}
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQuery.DatasetDeleteOption;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.DatasetId;

public class DeleteDataset {

  public static void runDeleteDataset() {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String datasetName = "MY_DATASET_NAME";
    deleteDataset(projectId, datasetName);
  }

  public static void deleteDataset(String projectId, String datasetName) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      DatasetId datasetId = DatasetId.of(projectId, datasetName);
      boolean success = bigquery.delete(datasetId, DatasetDeleteOption.deleteContents());
      if (success) {
        System.out.println("Dataset deleted successfully");
      } else {
        System.out.println("Dataset was not found");
      }
    } catch (BigQueryException e) {
      System.out.println("Dataset was not deleted. \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.

// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function deleteDataset() {
  // Deletes a dataset named "my_dataset".

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = 'my_dataset';

  // Create a reference to the existing dataset
  const dataset = bigquery.dataset(datasetId);

  // Delete the dataset and its contents
  await dataset.delete({force: true});
  console.log(`Dataset ${dataset.id} deleted.`);
}

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery PHP API reference documentation.

use Google\Cloud\BigQuery\BigQueryClient;

/** Uncomment and populate these variables in your code */
// $projectId = 'The Google project ID';
// $datasetId = 'The BigQuery dataset ID';

$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$dataset = $bigQuery->dataset($datasetId);
$dataset->delete();
printf('Deleted dataset %s' . PHP_EOL, $datasetId);

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.


from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set dataset_id to the ID of the dataset to delete.
# dataset_id = 'your-project.your_dataset'

# Use the delete_contents parameter to delete a dataset and its contents.
# Use the not_found_ok parameter to not receive an error if the dataset has already been deleted.
client.delete_dataset(
    dataset_id, delete_contents=True, not_found_ok=True
)  # Make an API request.

print("Deleted dataset '{}'.".format(dataset_id))
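
Because delete_contents=True removes every table and view in the dataset, it can be useful to list the contents before deleting. The helper below is a sketch of that idea, not part of the official sample: delete_dataset_with_preview and its confirm callback are hypothetical names, and client is assumed to be a bigquery.Client like the one constructed above.

```python
def delete_dataset_with_preview(client, dataset_id, confirm):
    """List a dataset's tables and views, then delete it if confirm agrees.

    confirm is a callback (an assumption of this sketch) that receives the
    list of table IDs and returns True to proceed or False to cancel.
    Returns True if the delete call was made, False otherwise.
    """
    # list_tables makes an API request and includes views as well as tables.
    table_ids = [table.table_id for table in client.list_tables(dataset_id)]
    if not confirm(table_ids):
        return False
    client.delete_dataset(dataset_id, delete_contents=True, not_found_ok=True)
    return True
```

For example, delete_dataset_with_preview(client, dataset_id, lambda ids: input("Delete {}? [y/N] ".format(ids)) == "y") asks for confirmation on the terminal. Note that list_tables raises an error if the dataset does not exist, so not_found_ok only covers a dataset that disappears between the two calls.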

Ruby

Before trying this sample, follow the Ruby setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Ruby API reference documentation.

require "google/cloud/bigquery"

def delete_dataset dataset_id = "my_empty_dataset"
  bigquery = Google::Cloud::Bigquery.new

  # Delete a dataset that does not contain any tables
  dataset = bigquery.dataset dataset_id
  dataset.delete
  puts "Dataset #{dataset_id} deleted."
end

Next steps