Updating dataset properties

This document describes how to update dataset properties in BigQuery. After creating a dataset, you can update the following dataset properties:

  • Description
  • Default table expiration time
  • Default partition expiration time
  • Access controls

Required permissions

At a minimum, to update dataset properties, you must be granted bigquery.datasets.update and bigquery.datasets.get permissions. The following predefined Cloud IAM roles include bigquery.datasets.update and bigquery.datasets.get permissions:

  • bigquery.dataOwner
  • bigquery.admin

In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives users the ability to update properties for datasets they create.

For more information on Cloud IAM roles and permissions in BigQuery, see Access control.

Updating dataset descriptions

You can update a dataset's description by:

  • Using the Cloud Console or the classic BigQuery web UI
  • Using the bq update CLI command
  • Calling the datasets.patch API method
  • Using the client libraries

To update a dataset's description:

Console

  1. In the Resources pane, select your dataset.

  2. In the Details page, click the pencil icon next to Description to edit the description text.

  3. In the dialog, enter a description in the box or edit the existing description. Click Update to save the new description text.

Classic UI

  1. In the navigation pane, select your dataset.

  2. On the Dataset Details page, in the Description section: if the dataset has no description, click Describe this dataset to open the description box; otherwise, click the existing description text.

  3. Enter a description in the box or edit the existing description. When you click away from the box, the text is saved.

CLI

Issue the bq update command with the --description flag. If you are updating a dataset in a project other than your default project, add the project ID to the dataset name in the following format: project_id:dataset.

bq update \
--description "string" \
project_id:dataset

Where:

  • string is the text that describes the dataset, enclosed in quotes.
  • project_id is your project ID.
  • dataset is the name of the dataset you're updating.

Examples:

Enter the following command to change the description of mydataset to "Description of mydataset." mydataset is in your default project.

bq update --description "Description of mydataset" mydataset

Enter the following command to change the description of mydataset to "Description of mydataset." The dataset is in myotherproject, not your default project.

bq update \
--description "Description of mydataset" \
myotherproject:mydataset

API

Call datasets.patch and update the description property in the dataset resource. Because the datasets.update method replaces the entire dataset resource, the datasets.patch method is preferred.
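
For illustration, here is a minimal sketch using the Google API Python client (google-api-python-client); the project and dataset IDs are placeholders, and the sketch assumes application default credentials are configured:

from googleapiclient.discovery import build

# Build the BigQuery v2 service object (uses application default credentials).
service = build("bigquery", "v2")

# datasets.patch changes only the fields included in the request body.
service.datasets().patch(
    projectId="your-project",  # Placeholder project ID.
    datasetId="your_dataset",  # Placeholder dataset ID.
    body={"description": "Updated description."},
).execute()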

Go

Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.

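// This excerpt assumes an initialized *bigquery.Client (client), a
// context.Context (ctx), and the target dataset's ID (datasetID).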
ds := client.Dataset(datasetID)
meta, err := ds.Metadata(ctx)
if err != nil {
	return err
}
update := bigquery.DatasetMetadataToUpdate{
	Description: "Updated Description.",
}
if _, err = ds.Update(ctx, update, meta.ETag); err != nil {
	return err
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.

Create a Dataset.Builder instance from an existing Dataset instance with the Dataset.toBuilder() method. Configure the dataset builder object. Build the updated dataset with the Dataset.Builder.build() method, and call the Dataset.update() method to send the update to the API.
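// This excerpt assumes an initialized BigQuery client (bigquery), the target
// dataset's name (datasetName), and the new description string (newDescription).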
Dataset oldDataset = bigquery.getDataset(datasetName);
DatasetInfo datasetInfo = oldDataset.toBuilder().setDescription(newDescription).build();
Dataset newDataset = bigquery.update(datasetInfo);

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.

// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function updateDatasetDescription() {
  // Updates a dataset's description.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_dataset";

  // Retrieve current dataset metadata
  const dataset = bigquery.dataset(datasetId);
  const [metadata] = await dataset.getMetadata();

  // Set new dataset description
  const description = 'New dataset description.';
  metadata.description = description;

  const [apiResponse] = await dataset.setMetadata(metadata);
  const newDescription = apiResponse.description;

  console.log(`${datasetId} description: ${newDescription}`);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

Configure the Dataset.description property and call Client.update_dataset() to send the update to the API.
# TODO(developer): Import the client library.
# from google.cloud import bigquery

# TODO(developer): Construct a BigQuery client object.
# client = bigquery.Client()

# TODO(developer): Set dataset_id to the ID of the dataset to fetch.
# dataset_id = 'your-project.your_dataset'

dataset = client.get_dataset(dataset_id)  # Make an API request.
dataset.description = "Updated description."
dataset = client.update_dataset(dataset, ["description"])  # Make an API request.

full_dataset_id = "{}.{}".format(dataset.project, dataset.dataset_id)
print(
    "Updated dataset '{}' with description '{}'.".format(
        full_dataset_id, dataset.description
    )
)

Updating default table expiration times

You can update a dataset's default table expiration time by:

  • Using the Cloud Console or the classic BigQuery web UI
  • Using the bq update CLI command
  • Calling the datasets.patch API method
  • Using the client libraries

You can set a default table expiration time at the dataset level, or you can set a table's expiration time when the table is created. If you set the expiration when the table is created, the dataset's default table expiration is ignored. If you do not set a default table expiration at the dataset level, and you do not set a table expiration when the table is created, the table never expires and you must delete the table manually.

When you update a dataset's default table expiration setting:

  • If you change the value from Never to a defined expiration time, any tables that already exist in the dataset will not expire unless the expiration time was set on the table when it was created.
  • If you change the value for the default table expiration, any tables that already exist expire according to the original table expiration setting. Any new tables created in the dataset have the new table expiration setting applied unless you specify a different table expiration on the table when it is created.

The value for default table expiration is expressed differently depending on where the value is set. Use the method that gives you the appropriate level of granularity:

  • In the Cloud Console and the classic BigQuery web UI, expiration is expressed in days.
  • In the command-line tool, expiration is expressed in seconds.
  • In the API, expiration is expressed in milliseconds.

To update the default expiration time for a dataset:

Console

  1. In the Resources pane, select your dataset.

  2. In the Details page, click the pencil icon next to Dataset info to edit the expiration.

  3. In the Dataset info dialog, in the Default table expiration section, enter a value for Number of days after table creation.

  4. Click Save.

Classic UI

  1. In the navigation pane, select your dataset.

  2. On the Dataset Details page, in the Details section, to the right of Default Table Expiration, click Edit.

  3. In the Update Expiration dialog, for Data expiration, click In and enter the expiration time in days. The default value is Never.

CLI

To update the default expiration time for newly created tables in a dataset, enter the bq update command with the --default_table_expiration flag. If you are updating a dataset in a project other than your default project, add the project ID to the dataset name in the following format: project_id:dataset.

bq update \
--default_table_expiration integer \
project_id:dataset

Where:

  • integer is the default lifetime (in seconds) for newly created tables. The minimum value is 3600 seconds (one hour). The expiration time evaluates to the current UTC time plus the integer value. Specify 0 to remove the existing expiration time. Any table created in the dataset is deleted integer seconds after its creation time. This value is applied if you do not set a table expiration when the table is created.
  • project_id is your project ID.
  • dataset is the name of the dataset you're updating.

Examples:

Enter the following command to set the default table expiration for new tables created in mydataset to two hours (7200 seconds) from the current time. The dataset is in your default project.

bq update --default_table_expiration 7200 mydataset

Enter the following command to set the default table expiration for new tables created in mydataset to two hours (7200 seconds) from the current time. The dataset is in myotherproject, not your default project.

bq update --default_table_expiration 7200 myotherproject:mydataset

API

Call datasets.patch and update the defaultTableExpirationMs property in the dataset resource. The expiration is expressed in milliseconds in the API. Because the datasets.update method replaces the entire dataset resource, the datasets.patch method is preferred.
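
If you prefer to call the REST endpoint directly, here is a sketch using google-auth; the URL follows the BigQuery v2 dataset resource pattern, and the project and dataset IDs are placeholders:

import google.auth
from google.auth.transport.requests import AuthorizedSession

# Obtain application default credentials with a BigQuery scope.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/bigquery"]
)
session = AuthorizedSession(credentials)

# Placeholder project and dataset IDs in the v2 dataset URL.
url = (
    "https://bigquery.googleapis.com/bigquery/v2"
    "/projects/your-project/datasets/your_dataset"
)

# 24 hours expressed in milliseconds, as the API expects.
response = session.patch(url, json={"defaultTableExpirationMs": "86400000"})
response.raise_for_status()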

Go

Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.

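// This excerpt assumes an initialized *bigquery.Client (client), a
// context.Context (ctx), and the target dataset's ID (datasetID).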
ds := client.Dataset(datasetID)
meta, err := ds.Metadata(ctx)
if err != nil {
	return err
}
update := bigquery.DatasetMetadataToUpdate{
	DefaultTableExpiration: 24 * time.Hour,
}
if _, err := client.Dataset(datasetID).Update(ctx, update, meta.ETag); err != nil {
	return err
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.

Create a Dataset.Builder instance from an existing Dataset instance with the Dataset.toBuilder() method. Configure the dataset builder object. Build the updated dataset with the Dataset.Builder.build() method, and call the Dataset.update() method to send the update to the API.

Configure the default expiration time with the Dataset.Builder.setDefaultTableLifetime() method.

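// This excerpt assumes an initialized BigQuery client (bigquery) and an
// existing Dataset instance (dataset).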
Long beforeExpiration = dataset.getDefaultTableLifetime();

Long oneDayMilliseconds = 24 * 60 * 60 * 1000L;
DatasetInfo.Builder builder = dataset.toBuilder();
builder.setDefaultTableLifetime(oneDayMilliseconds);
bigquery.update(builder.build());  // API request.

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.

// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function updateDatasetExpiration() {
  // Updates the lifetime of all tables in the dataset, in milliseconds.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_dataset";

  // Retrieve current dataset metadata
  const dataset = bigquery.dataset(datasetId);
  const [metadata] = await dataset.getMetadata();

  // Set new dataset metadata
  const expirationTime = 24 * 60 * 60 * 1000;
  metadata.defaultTableExpirationMs = expirationTime.toString();

  const [apiResponse] = await dataset.setMetadata(metadata);
  const newExpirationTime = apiResponse.defaultTableExpirationMs;

  console.log(`${datasetId} expiration: ${newExpirationTime}`);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

Configure the Dataset.default_table_expiration_ms property and call Client.update_dataset() to send the update to the API.
# TODO(developer): Import the client library.
# from google.cloud import bigquery

# TODO(developer): Construct a BigQuery client object.
# client = bigquery.Client()

# TODO(developer): Set dataset_id to the ID of the dataset to fetch.
# dataset_id = 'your-project.your_dataset'

dataset = client.get_dataset(dataset_id)  # Make an API request.
dataset.default_table_expiration_ms = 24 * 60 * 60 * 1000  # In milliseconds.

dataset = client.update_dataset(
    dataset, ["default_table_expiration_ms"]
)  # Make an API request.

full_dataset_id = "{}.{}".format(dataset.project, dataset.dataset_id)
print(
    "Updated dataset {} with new expiration {}".format(
        full_dataset_id, dataset.default_table_expiration_ms
    )
)

Updating default partition expiration times

You can update a dataset's default partition expiration by:

  • Using the bq update CLI command
  • Calling the datasets.patch API method
  • Using the client libraries

Setting or updating a dataset's default partition expiration is not currently supported by the Cloud Console or the classic BigQuery web UI.

You can set a default partition expiration time at the dataset level that affects all newly created partitioned tables, or you can set a partition expiration time for individual tables when the partitioned tables are created. If you set both the default table expiration and the default partition expiration at the dataset level, new partitioned tables have only a partition expiration: the default partition expiration overrides the default table expiration.

If you set the partition expiration time when the partitioned table is created, that value overrides the dataset-level default partition expiration if it exists.

If you do not set a default partition expiration at the dataset level, and you do not set a partition expiration when the table is created, the partitions never expire and you must delete the partitions manually.

When you set a default partition expiration on a dataset, the expiration applies to all partitions in all partitioned tables created in the dataset. When you set the partition expiration on a table, the expiration applies to all partitions created in the specified table. Currently, you cannot apply different expiration times to different partitions in the same table.

When you update a dataset's default partition expiration setting:

  • If you change the value from "never" to a defined expiration time, any partitions that already exist in partitioned tables in the dataset will not expire unless the partition expiration time was set on the table when it was created.
  • If you change the value for the default partition expiration, any partitions in existing partitioned tables expire according to the original default partition expiration. Any new partitioned tables created in the dataset have the new default partition expiration setting applied unless you specify a different partition expiration on the table when it is created.

The value for default partition expiration is expressed differently depending on where the value is set. Use the method that gives you the appropriate level of granularity:

  • In the command-line tool, expiration is expressed in seconds.
  • In the API, expiration is expressed in milliseconds.

To update the default partition expiration time for a dataset:

Console

Updating a dataset's default partition expiration is not currently supported by the Cloud Console.

Classic UI

Updating a dataset's default partition expiration is not currently supported by the classic BigQuery web UI.

CLI

To update the default partition expiration time for newly created partitioned tables in a dataset, enter the bq update command with the --default_partition_expiration flag. If you are updating a dataset in a project other than your default project, add the project ID to the dataset name in the following format: project_id:dataset.

bq update \
--default_partition_expiration integer \
project_id:dataset

Where:

  • integer is the default lifetime (in seconds) for partitions in newly created partitioned tables. This flag has no minimum value. Specify 0 to remove the existing expiration time. Any partition in a newly created partitioned table is deleted integer seconds after the partition's UTC date. This value is applied if you do not set a partition expiration on the table when it is created.
  • project_id is your project ID.
  • dataset is the name of the dataset you're updating.

Examples:

Enter the following command to set the default partition expiration for new partitioned tables created in mydataset to 26 hours (93,600 seconds). The dataset is in your default project.

bq update --default_partition_expiration 93600 mydataset

Enter the following command to set the default partition expiration for new partitioned tables created in mydataset to 26 hours (93,600 seconds). The dataset is in myotherproject, not your default project.

bq update --default_partition_expiration 93600 myotherproject:mydataset

API

Call datasets.patch and update the defaultPartitionExpirationMs property in the dataset resource. The expiration is expressed in milliseconds. Because the datasets.update method replaces the entire dataset resource, the datasets.patch method is preferred.
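
Client library support for this property varies by release. As a sketch, with a version of the google-cloud-bigquery Python library that exposes the default_partition_expiration_ms property (the dataset ID is a placeholder):

from google.cloud import bigquery

client = bigquery.Client()

dataset = client.get_dataset("your-project.your_dataset")  # Make an API request.

# 26 hours, in milliseconds.
dataset.default_partition_expiration_ms = 26 * 60 * 60 * 1000

dataset = client.update_dataset(
    dataset, ["default_partition_expiration_ms"]
)  # Make an API request.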

Updating dataset access controls

The process for updating a dataset's access controls is very similar to the process for assigning access controls to a dataset. Access controls cannot be applied during dataset creation using the Cloud Console, the classic BigQuery web UI, or the command-line tool. You must create the dataset first and then update its access controls. The API lets you update dataset access controls by calling the datasets.patch method.

When you update access controls on a dataset, you can modify access for the following entities:

  • Google account e-mail: Grants an individual Google account access to the dataset
  • Google Group: Grants all members of a Google group access to the dataset
  • Google Apps Domain: Grants all users and groups in a Google domain access to the dataset
  • Service account: Grants a service account access to the dataset
  • Anybody: Enter "allUsers" to grant access to the general public
  • All Google accounts: Enter "allAuthenticatedUsers" to grant access to any user signed in to a Google Account
  • Authorized views: Grants an authorized view access to the dataset

To update access controls on a dataset:

Console

  1. In the navigation pane, in the Resources section, click your dataset.

  2. Click Share Dataset.

  3. In the Share Dataset dialog, to delete existing entries, expand the entry and then click the delete icon (trash can).

  4. In the Share Dataset dialog, to add new entries:

    1. Enter the entity in the Add members box.

    2. For Select a role, choose an appropriate Cloud IAM role from the list. For more information on the permissions assigned to each predefined BigQuery role, see the Predefined roles and permissions page.

    3. Click Add.

  5. To add an authorized view, click the Authorized View tab and enter the project, dataset, and view, and then click Add.

  6. When you are done adding or deleting your access controls, click Done.

Classic UI

  1. Click the drop-down arrow to the right of the dataset and choose Share Dataset.

  2. In the Share Dataset dialog, to modify existing entries:

    • Remove existing entries by clicking the X icon to the right of the entity.
    • Change permissions for an entity by clicking the permissions button and choosing an appropriate access level: Is owner (OWNER), Can edit (WRITER), or Can view (READER). For more information on dataset-level roles, see Primitive roles and permissions.
  3. In the Share Dataset dialog, to add new entries:

    1. Click the drop-down to the left of the Add People field and choose the appropriate option.

    2. Type a value in the text box. For example, if you chose User by e-mail, type the user's email address.

    3. To the right of the Add People field, click Can view and choose the appropriate role from the list.

    4. Click Add.

  4. When you are done adding, deleting, or modifying your access controls, click Save changes.

  5. Verify your access controls by clicking the drop-down arrow to the right of the dataset and choosing Share Dataset. You can confirm the settings in the Share Dataset dialog.

CLI

  1. Write the existing dataset information (including access controls) to a JSON file using the show command. If the dataset is in a project other than your default project, add the project ID to the dataset name in the following format: project_id:dataset.

    bq show \
    --format=prettyjson \
    project_id:dataset > path_to_file
    

    Where:

    • project_id is your project ID.
    • dataset is the name of your dataset.
    • path_to_file is the path to the JSON file on your local machine.

    Examples:

    Enter the following command to write the access controls for mydataset to a JSON file. mydataset is in your default project.

    bq show --format=prettyjson mydataset > /tmp/mydataset.json
    

    Enter the following command to write the access controls for mydataset to a JSON file. mydataset is in myotherproject.

    bq show --format=prettyjson \
    myotherproject:mydataset > /tmp/mydataset.json
    
  2. Make your changes to the "access" section of the JSON file. You can add or remove any of the specialGroup entries: projectOwners, projectWriters, projectReaders, and allAuthenticatedUsers. You can also add, remove, or modify any of the following: userByEmail, groupByEmail, and domain.

    For example, the access section of a dataset's JSON file would look like the following:

    {
     "access": [
      {
       "role": "READER",
       "specialGroup": "projectReaders"
      },
      {
       "role": "WRITER",
       "specialGroup": "projectWriters"
      },
      {
       "role": "OWNER",
       "specialGroup": "projectOwners"
      },
      {
       "role": "READER",
       "specialGroup": "allAuthenticatedUsers"
      },
      {
       "role": "READER",
       "domain": "[DOMAIN_NAME]"
      },
      {
       "role": "WRITER",
       "userByEmail": "[USER_EMAIL]"
      },
      {
       "role": "READER",
       "groupByEmail": "[GROUP_EMAIL]"
      }
     ]
    }
    

  3. When your edits are complete, use the update command and include the JSON file using the --source flag. If the dataset is in a project other than your default project, add the project ID to the dataset name in the following format: project_id:dataset.

    bq update --source path_to_file project_id:dataset
    

    Where:

    • path_to_file is the path to the JSON file on your local machine.
    • project_id is your project ID.
    • dataset is the name of your dataset.

    Examples:

    Enter the following command to update the access controls for mydataset. mydataset is in your default project.

    bq update --source /tmp/mydataset.json mydataset
    

    Enter the following command to update the access controls for mydataset. mydataset is in myotherproject.

    bq update --source /tmp/mydataset.json myotherproject:mydataset
    
  4. To verify your access control changes, enter the show command again without writing the information to a file.

    bq show --format=prettyjson dataset
    

    or

    bq show --format=prettyjson project_id:dataset
    

API

Call the datasets.patch method and update the access property in the dataset resource.

Because the datasets.update method replaces the entire dataset resource, datasets.patch is the preferred method for updating access controls.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.

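// This excerpt assumes an initialized *bigquery.Client (client), a
// context.Context (ctx), and the target dataset's ID (datasetID).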
ds := client.Dataset(datasetID)
meta, err := ds.Metadata(ctx)
if err != nil {
	return err
}
// Append a new access control entry to the existing access list.
update := bigquery.DatasetMetadataToUpdate{
	Access: append(meta.Access, &bigquery.AccessEntry{
		Role:       bigquery.ReaderRole,
		EntityType: bigquery.UserEmailEntity,
		Entity:     "sample.bigquery.dev@gmail.com"},
	),
}

// Leverage the ETag for the update to assert there's been no modifications to the
// dataset since the metadata was originally read.
if _, err := ds.Update(ctx, update, meta.ETag); err != nil {
	return err
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.

Create a Dataset.Builder instance from an existing Dataset instance with the Dataset.toBuilder() method. Configure the dataset builder object. Build the updated dataset with the Dataset.Builder.build() method, and call the Dataset.update() method to send the update to the API.

Configure the access controls with the Dataset.Builder.setAcl() method.

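// This excerpt assumes an initialized BigQuery client (bigquery) and an
// existing Dataset instance (dataset).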
List<Acl> beforeAcls = dataset.getAcl();

// Make a copy of the ACLs so that they can be modified.
ArrayList<Acl> acls = new ArrayList<>(beforeAcls);
acls.add(Acl.of(new Acl.User("sample.bigquery.dev@gmail.com"), Acl.Role.READER));
DatasetInfo.Builder builder = dataset.toBuilder();
builder.setAcl(acls);

bigquery.update(builder.build());  // API request.

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.

// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function updateDatasetAccess() {
  // Updates a dataset's access controls.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_dataset";

  // Create new role metadata
  const newRole = {
    role: 'READER',
    entity_type: 'userByEmail',
    userByEmail: 'sample.bigquery.dev@gmail.com',
  };

  // Retrieve current dataset metadata
  const dataset = bigquery.dataset(datasetId);
  const [metadata] = await dataset.getMetadata();

  // Add new role to the access array
  metadata.access.push(newRole);
  const [apiResponse] = await dataset.setMetadata(metadata);
  const newAccessRoles = apiResponse.access;
  newAccessRoles.forEach(role => console.log(role));
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

Set the dataset.access_entries property with the access controls for a dataset. Then call the Client.update_dataset() function to update the property.
from google.cloud import bigquery

# TODO(developer): Construct a BigQuery client object.
# client = bigquery.Client()

# TODO(developer): Set dataset_id to the ID of the dataset to fetch.
# dataset_id = 'your-project.your_dataset'

dataset = client.get_dataset(dataset_id)  # Make an API request.

entry = bigquery.AccessEntry(
    role="READER",
    entity_type="userByEmail",
    entity_id="sample.bigquery.dev@gmail.com",
)

entries = list(dataset.access_entries)
entries.append(entry)
dataset.access_entries = entries

dataset = client.update_dataset(dataset, ["access_entries"])  # Make an API request.

full_dataset_id = "{}.{}".format(dataset.project, dataset.dataset_id)
print(
    "Updated dataset '{}' with modified user permissions.".format(full_dataset_id)
)
