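Create partitioned tables

This page describes how to create partitioned tables in BigQuery. For an overview of partitioned tables, see Introduction to partitioned tables.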

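Before you begin

Grant Identity and Access Management (IAM) roles that give users the permissions they need to perform each task in this document.

Required roles

To get the permissions that you need to create a table, ask your administrator to grant you the following IAM roles:

- BigQuery Job User (roles/bigquery.jobUser) on the project, if you create a table by loading data or by saving query results to a table.
- BigQuery Data Editor (roles/bigquery.dataEditor) on the dataset where you create the table.

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to create a table. The exact permissions are listed in the following section.

Required permissions

You need the following permissions to create a table:

- bigquery.tables.create on the dataset where you create the table.
- bigquery.tables.getData on all tables and views that the query references, if you save query results as a table.
- bigquery.jobs.create on the project, if you create the table by loading data or by saving query results to a table.
- bigquery.tables.updateData on the table, if you append to or overwrite the table with query results.

You might also be able to get these permissions with custom roles or other predefined roles.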

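Create an empty partitioned table

The steps for creating a partitioned table in BigQuery are similar to those for creating a standard table, except that you also specify the partitioning options along with any other table options.

Create a time-unit column-partitioned table

To create an empty time-unit column-partitioned table with a schema definition:

Console

In the Google Cloud console, go to the BigQuery page.

In the Explorer pane, expand your project, and then select a dataset.
In the Dataset info section, click Create table.
In the Create table panel, specify the following details:

In the Source section, select Empty table in the Create table from list.
In the Destination section, specify the following details:
1. For Dataset, select the dataset in which you want to create the table.
2. In the Table field, enter the name of the table that you want to create.
3. Verify that the Table type field is set to Native table.
In the Schema section, enter the schema definition. The schema must include a DATE, TIMESTAMP, or DATETIME column to use as the partitioning column. For more information, see Specifying a schema. You can enter schema information manually by using one of the following methods:
- Option 1: Click Edit as text and paste the schema in the form of a JSON array. When you use a JSON array, you generate the schema using the same process as creating a JSON schema file. You can view the schema of an existing table in JSON format by entering the following command:
```
bq show --format=prettyjson dataset.table
```
- Option 2: Click Add field and enter the table schema. Specify the name, type, and mode of each field.
In the Partition and cluster settings section, in the Partitioning list, select Partition by field, and then choose the partitioning column. This option is available only if the schema contains a DATE, TIMESTAMP, or DATETIME column.
Optional: To require a partition filter on all queries for this table, select the Require partition filter checkbox. A partition filter can reduce cost and improve performance. For more information, see Set partition filter requirements.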
Select the Partitioning type to choose daily, hourly, monthly, or yearly partitioning.
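Optional: If you want to use a customer-managed encryption key, in the Advanced options section, select the Use a customer-managed encryption key (CMEK) option. By default, BigQuery encrypts customer content stored at rest by using a Google-owned and Google-managed encryption key.
Click Create table.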

SQL

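To create a time-unit column-partitioned table, use the CREATE TABLE DDL statement with a PARTITION BY clause.

The following example creates a table with daily partitions based on the transaction_date column:

In the Google Cloud console, go to the BigQuery page.

In the query editor, enter the following statement: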

CREATE TABLE
  mydataset.newtable (transaction_id INT64, transaction_date DATE)
PARTITION BY
  transaction_date
  OPTIONS (
    partition_expiration_days = 3,
    require_partition_filter = TRUE);

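Use the OPTIONS clause to set table options such as the partition expiration and the partition filter requirement.

Click Run.

For more information about how to run queries, see Run an interactive query.

The default partitioning type for DATE columns is daily partitioning. To specify a different partitioning type, include the DATE_TRUNC function in the PARTITION BY clause. For example, the following query creates a table with monthly partitions: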

CREATE TABLE
  mydataset.newtable (transaction_id INT64, transaction_date DATE)
PARTITION BY
  DATE_TRUNC(transaction_date, MONTH)
  OPTIONS (
    partition_expiration_days = 3,
    require_partition_filter = TRUE);

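You can also specify a TIMESTAMP or DATETIME column as the partitioning column. In that case, include the TIMESTAMP_TRUNC or DATETIME_TRUNC function in the PARTITION BY clause to specify the partitioning type. For example, the following statement creates a table with daily partitions based on a TIMESTAMP column: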

CREATE TABLE
  mydataset.newtable (transaction_id INT64, transaction_ts TIMESTAMP)
PARTITION BY
  TIMESTAMP_TRUNC(transaction_ts, DAY)
  OPTIONS (
    partition_expiration_days = 3,
    require_partition_filter = TRUE);
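
To make the effect of the truncation functions concrete, the following Python sketch computes which partition a timestamp falls into for each partitioning type. It is an illustration rather than BigQuery code: the partition_id helper is hypothetical, and it assumes the usual partition ID formats (YYYYMMDDHH, YYYYMMDD, YYYYMM, and YYYY) with boundaries evaluated in UTC.

```python
from datetime import datetime, timezone

# strftime patterns for the partition ID of each partitioning type (assumed formats).
_FORMATS = {
    "HOUR": "%Y%m%d%H",
    "DAY": "%Y%m%d",
    "MONTH": "%Y%m",
    "YEAR": "%Y",
}


def partition_id(ts: datetime, unit: str = "DAY") -> str:
    """Return the ID of the partition that the timezone-aware `ts` belongs to."""
    if unit not in _FORMATS:
        raise ValueError(f"unsupported partitioning type: {unit}")
    # Convert to UTC first, then truncate by formatting to the chosen granularity.
    return ts.astimezone(timezone.utc).strftime(_FORMATS[unit])


ts = datetime(2018, 2, 1, 23, 30, tzinfo=timezone.utc)
for unit in ("HOUR", "DAY", "MONTH", "YEAR"):
    print(unit, partition_id(ts, unit))
```

With monthly partitioning, every row from February 2018 shares one partition, which is why coarser types suit tables with little data per day.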

bq

In the Google Cloud console, activate Cloud Shell.

At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

Use the bq mk command with the --table flag (or -t shortcut):
```
bq mk \
   --table \
   --schema SCHEMA \
   --time_partitioning_field COLUMN \
   --time_partitioning_type UNIT_TIME \
   --time_partitioning_expiration EXPIRATION_TIME \
   --require_partition_filter=BOOLEAN \
   PROJECT_ID:DATASET.TABLE
```
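Replace the following:
- SCHEMA: A schema definition in the format column:data_type,column:data_type, or the path to a JSON schema file on your local machine. For more information, see Specifying a schema.
- COLUMN: The name of the partitioning column. In the table schema, this column must be of type TIMESTAMP, DATETIME, or DATE.
- UNIT_TIME: The partitioning type. Supported values are DAY, HOUR, MONTH, and YEAR.
- EXPIRATION_TIME: The expiration time for the table's partitions, in seconds. The --time_partitioning_expiration flag is optional. For more information, see Set the partition expiration.
- BOOLEAN: If true, queries on this table must include a partition filter. The --require_partition_filter flag is optional. For more information, see Set partition filter requirements.
- PROJECT_ID: The project ID. If omitted, your default project is used.
- DATASET: The name of a dataset in your project.
- TABLE: The name of the table to create.
For other command-line options, see bq mk.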
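
If you generate this command from a script, the optional flags are easy to get wrong. The following Python sketch assembles the argument list from the flags described above. The function name and its parameters are hypothetical, and the sketch only builds the arguments; running them (for example with subprocess.run) assumes that the bq tool is installed and authenticated.

```python
import shlex


def bq_mk_partitioned_table(table, schema, field, unit="DAY",
                            expiration_seconds=None, require_filter=None):
    """Build the argument list for `bq mk --table` with time-unit partitioning."""
    args = [
        "bq", "mk", "--table",
        "--schema", schema,
        "--time_partitioning_field", field,
        "--time_partitioning_type", unit,
    ]
    # Both of the following flags are optional, so add them only when set.
    if expiration_seconds is not None:
        args += ["--time_partitioning_expiration", str(expiration_seconds)]
    if require_filter is not None:
        args.append(f"--require_partition_filter={str(require_filter).lower()}")
    args.append(table)  # the table reference goes last
    return args


cmd = bq_mk_partitioned_table(
    "mydataset.mytable", "ts:TIMESTAMP,qtr:STRING,sales:FLOAT", "ts",
    unit="HOUR", expiration_seconds=259200)
print(shlex.join(cmd))  # shell-safe rendering of the command
```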

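The following example creates a table named mytable that is partitioned on the ts column, using hourly partitioning. The partition expiration is 259,200 seconds (3 days).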
```
bq mk \
   -t \
   --schema 'ts:TIMESTAMP,qtr:STRING,sales:FLOAT' \
   --time_partitioning_field ts \
   --time_partitioning_type HOUR \
   --time_partitioning_expiration 259200  \
   mydataset.mytable
```
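
The interfaces on this page express the partition expiration in different units: the SQL option partition_expiration_days takes days, the bq flag --time_partitioning_expiration takes seconds, and the Terraform expiration_ms argument takes milliseconds. The following Python sketch converts one value into all three; the helper name is hypothetical.

```python
from datetime import timedelta


def expiration_units(days: float) -> dict:
    """Express a partition expiration given in days in each interface's unit."""
    seconds = int(timedelta(days=days).total_seconds())
    return {
        "partition_expiration_days": days,        # SQL OPTIONS clause, in days
        "time_partitioning_expiration": seconds,  # bq flag, in seconds
        "expiration_ms": seconds * 1000,          # Terraform argument, in milliseconds
    }


print(expiration_units(3))
```

Checking a value this way helps avoid passing a millisecond figure to the bq flag, which would make partitions live 1,000 times longer than intended.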

Terraform

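Use the google_bigquery_table resource.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The following example creates a table named mytable that is partitioned by day: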

resource "google_bigquery_dataset" "default" {
  dataset_id                      = "mydataset"
  default_partition_expiration_ms = 2592000000  # 30 days
  default_table_expiration_ms     = 31536000000 # 365 days
  description                     = "dataset description"
  location                        = "US"
  max_time_travel_hours           = 96 # 4 days

  labels = {
    billing_group = "accounting",
    pii           = "sensitive"
  }
}

resource "google_bigquery_table" "default" {
  dataset_id          = google_bigquery_dataset.default.dataset_id
  table_id            = "mytable"
  deletion_protection = false # set to "true" in production

  time_partitioning {
    type          = "DAY"
    field         = "Created"
    expiration_ms = 432000000 # 5 days
  }
  require_partition_filter = true

  schema = <<EOF
[
  {
    "name": "ID",
    "type": "INT64",
    "mode": "NULLABLE",
    "description": "Item ID"
  },
  {
    "name": "Created",
    "type": "TIMESTAMP",
    "description": "Record creation timestamp"
  },
  {
    "name": "Item",
    "type": "STRING",
    "mode": "NULLABLE"
  }
]
EOF

}

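To apply your Terraform configuration in a Google Cloud project, complete the steps in the following sections.

Prepare Cloud Shell

Launch Cloud Shell.
Set the default Google Cloud project where you want to apply your Terraform configurations.

You only need to run this command once per project, and you can run it in any directory.
```
export GOOGLE_CLOUD_PROJECT=PROJECT_ID
```
Environment variables are overridden if you set explicit values in the Terraform configuration file.

Prepare the directory

Each Terraform configuration file must have its own directory (also called a root module).

In Cloud Shell, create a directory and a new file within that directory. The filename must have the .tf extension, for example main.tf. In this tutorial, the file is referred to as main.tf.
```
mkdir DIRECTORY && cd DIRECTORY && touch main.tf
```
If you are following a tutorial, you can copy the sample code in each section or step.

Copy the sample code into the newly created main.tf.

Optionally, copy the code from GitHub. This is recommended when the Terraform snippet is part of an end-to-end solution.
Review and modify the sample parameters to apply to your environment.
Save your changes.
Initialize Terraform. You only need to do this once per directory.
```
terraform init
```
Optionally, to use the latest Google provider version, include the -upgrade option:
```
terraform init -upgrade
```

Apply the changes

Review the configuration and verify that the resources that Terraform is going to create or update match your expectations:
```
terraform plan
```
Make corrections to the configuration as necessary.
Apply the Terraform configuration by running the following command and entering yes at the prompt:
```
terraform apply
```
Wait until Terraform displays the "Apply complete!" message.
Open your Google Cloud project to view the results. In the Google Cloud console, navigate to your resources in the UI to make sure that Terraform has created or updated them.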

API

Call the tables.insert method with a defined table resource that specifies the timePartitioning property and the schema property.
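
As a rough sketch of what such a table resource might look like, the following Python code builds the JSON request body for the transaction_date example used earlier. The helper name, the project ID my-project, and the chosen values are assumptions for illustration. The code only constructs and prints the body; sending it requires an authenticated HTTP client, which is outside this sketch.

```python
import json


def time_partitioned_table_body(project_id, dataset_id, table_id, fields,
                                partition_field, unit="DAY",
                                expiration_ms=None, require_filter=False):
    """Build a table resource for tables.insert with time-unit column partitioning."""
    time_partitioning = {"type": unit, "field": partition_field}
    if expiration_ms is not None:
        # 64-bit integers are written as strings in the JSON representation.
        time_partitioning["expirationMs"] = str(expiration_ms)
    return {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "schema": {"fields": fields},
        "timePartitioning": time_partitioning,
        "requirePartitionFilter": require_filter,
    }


body = time_partitioned_table_body(
    "my-project", "mydataset", "newtable",  # hypothetical identifiers
    fields=[
        {"name": "transaction_id", "type": "INTEGER"},
        {"name": "transaction_date", "type": "DATE"},
    ],
    partition_field="transaction_date",
    expiration_ms=259200000,  # 3 days
    require_filter=True,
)
print(json.dumps(body, indent=2))
```

Omitting the field key from timePartitioning would describe an ingestion-time partitioned table instead.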

Go

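Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.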

import (
	"context"
	"fmt"
	"time"

	"cloud.google.com/go/bigquery"
)

// createTablePartitioned demonstrates creating a table and specifying a time partitioning configuration.
func createTablePartitioned(projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydatasetid"
	// tableID := "mytableid"
	ctx := context.Background()

	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	sampleSchema := bigquery.Schema{
		{Name: "name", Type: bigquery.StringFieldType},
		{Name: "post_abbr", Type: bigquery.StringFieldType},
		{Name: "date", Type: bigquery.DateFieldType},
	}
	metadata := &bigquery.TableMetadata{
		TimePartitioning: &bigquery.TimePartitioning{
			Field:      "date",
			Expiration: 90 * 24 * time.Hour,
		},
		Schema: sampleSchema,
	}
	tableRef := client.Dataset(datasetID).Table(tableID)
	if err := tableRef.Create(ctx, metadata); err != nil {
		return err
	}
	return nil
}

Java

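Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.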

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;
import com.google.cloud.bigquery.TimePartitioning;

// Sample to create a partition table
public class CreatePartitionedTable {

  public static void main(String[] args) {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    Schema schema =
        Schema.of(
            Field.of("name", StandardSQLTypeName.STRING),
            Field.of("post_abbr", StandardSQLTypeName.STRING),
            Field.of("date", StandardSQLTypeName.DATE));
    createPartitionedTable(datasetName, tableName, schema);
  }

  public static void createPartitionedTable(String datasetName, String tableName, Schema schema) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);

      TimePartitioning partitioning =
          TimePartitioning.newBuilder(TimePartitioning.Type.DAY)
              .setField("date") //  name of column to use for partitioning
              .setExpirationMs(7776000000L) // 90 days
              .build();

      StandardTableDefinition tableDefinition =
          StandardTableDefinition.newBuilder()
              .setSchema(schema)
              .setTimePartitioning(partitioning)
              .build();
      TableInfo tableInfo = TableInfo.newBuilder(tableId, tableDefinition).build();

      bigquery.create(tableInfo);
      System.out.println("Partitioned table created successfully");
    } catch (BigQueryException e) {
      System.out.println("Partitioned table was not created. \n" + e.toString());
    }
  }
}

Node.js

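Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.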

// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function createTablePartitioned() {
  // Creates a new partitioned table named "my_table" in "my_dataset".

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_dataset";
  // const tableId = "my_table";
  const schema = 'Name:string, Post_Abbr:string, Date:date';

  // For all options, see https://cloud.google.com/bigquery/docs/reference/v2/tables#resource
  const options = {
    schema: schema,
    location: 'US',
    timePartitioning: {
      type: 'DAY',
      expirationMS: '7776000000',
      field: 'date',
    },
  };

  // Create a new table in the dataset
  const [table] = await bigquery
    .dataset(datasetId)
    .createTable(tableId, options);
  console.log(`Table ${table.id} created with partitioning: `);
  console.log(table.metadata.timePartitioning);
}

Python

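Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.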

from google.cloud import bigquery

client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create,
# in the format "your-project.your_dataset.your_table_name".
table_id = "your-project.your_dataset.your_table_name"
schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("post_abbr", "STRING"),
    bigquery.SchemaField("date", "DATE"),
]
table = bigquery.Table(table_id, schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="date",  # name of column to use for partitioning
    expiration_ms=1000 * 60 * 60 * 24 * 90,
)  # 90 days

table = client.create_table(table)

print(
    f"Created table {table.project}.{table.dataset_id}.{table.table_id}, "
    f"partitioned on column {table.time_partitioning.field}."
)

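Create an ingestion-time partitioned table

To create an empty ingestion-time partitioned table with a schema definition:

Console

Open the BigQuery page in the Google Cloud console.

In the Explorer panel, expand your project and select a dataset.
Expand the Actions option and click Open.
In the details panel, click Create table.
On the Create table page, in the Source section, select Empty table.
In the Destination section:
- For Dataset name, choose the appropriate dataset.
- In the Table name field, enter the name of the table.
- Verify that Table type is set to Native table.
In the Schema section, enter the schema definition.
In the Partition and cluster settings section, for Partitioning, click Partition by ingestion time.
Optional: To require a partition filter on all queries for this table, select the Require partition filter checkbox. Requiring a partition filter can reduce cost and improve performance. For more information, see Set partition filter requirements.
Click Create table.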

SQL

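To create an ingestion-time partitioned table, use the CREATE TABLE statement with a PARTITION BY clause that partitions on _PARTITIONDATE.

The following example creates a table with daily partitions:

In the Google Cloud console, go to the BigQuery page.

In the query editor, enter the following statement: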

CREATE TABLE
  mydataset.newtable (transaction_id INT64)
PARTITION BY
  _PARTITIONDATE
  OPTIONS (
    partition_expiration_days = 3,
    require_partition_filter = TRUE);

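Use the OPTIONS clause to set table options such as the partition expiration and the partition filter requirement.

Click Run.

For more information about how to run queries, see Run an interactive query.

The default partitioning type for ingestion-time partitioning is daily partitioning. To specify a different partitioning type, include the DATE_TRUNC function in the PARTITION BY clause. For example, the following query creates a table with monthly partitions: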

CREATE TABLE
  mydataset.newtable (transaction_id INT64)
PARTITION BY
  DATE_TRUNC(_PARTITIONTIME, MONTH)
  OPTIONS (
    partition_expiration_days = 3,
    require_partition_filter = TRUE);

bq

In the Google Cloud console, activate Cloud Shell.

Use the bq mk command with the --table flag (or -t shortcut):
```
bq mk \
   --table \
   --schema SCHEMA \
   --time_partitioning_type UNIT_TIME \
   --time_partitioning_expiration EXPIRATION_TIME \
   --require_partition_filter=BOOLEAN  \
   PROJECT_ID:DATASET.TABLE
```
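Replace the following:
- SCHEMA: A schema definition in the format column:data_type,column:data_type, or the path to a JSON schema file on your local machine. For more information, see Specifying a schema.
- UNIT_TIME: The partitioning type. Supported values are DAY, HOUR, MONTH, and YEAR.
- EXPIRATION_TIME: The expiration time for the table's partitions, in seconds. The --time_partitioning_expiration flag is optional. For more information, see Set the partition expiration.
- BOOLEAN: If true, queries on this table must include a partition filter. The --require_partition_filter flag is optional. For more information, see Set partition filter requirements.
- PROJECT_ID: The project ID. If omitted, your default project is used.
- DATASET: The name of a dataset in your project.
- TABLE: The name of the table to create.
For other command-line options, see bq mk.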

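The following example creates an ingestion-time partitioned table named mytable. The table has daily partitioning, and its partitions expire after 259,200 seconds (3 days).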
```
bq mk \
   -t \
   --schema qtr:STRING,sales:FLOAT,year:STRING \
   --time_partitioning_type DAY \
   --time_partitioning_expiration 259200 \
   mydataset.mytable
```

Terraform

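Use the google_bigquery_table resource.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The following example creates a table named mytable that is partitioned by ingestion time: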

resource "google_bigquery_dataset" "default" {
  dataset_id                      = "mydataset"
  default_partition_expiration_ms = 2592000000  # 30 days
  default_table_expiration_ms     = 31536000000 # 365 days
  description                     = "dataset description"
  location                        = "US"
  max_time_travel_hours           = 96 # 4 days

  labels = {
    billing_group = "accounting",
    pii           = "sensitive"
  }
}

resource "google_bigquery_table" "default" {
  dataset_id          = google_bigquery_dataset.default.dataset_id
  table_id            = "mytable"
  deletion_protection = false # set to "true" in production

  time_partitioning {
    type          = "MONTH"
    expiration_ms = 604800000 # 7 days
  }
  require_partition_filter = true

  schema = <<EOF
[
  {
    "name": "ID",
    "type": "INT64",
    "mode": "NULLABLE",
    "description": "Item ID"
  },
  {
    "name": "Item",
    "type": "STRING",
    "mode": "NULLABLE"
  }
]
EOF

}

To apply your Terraform configuration in a Google Cloud project, follow the same steps (prepare Cloud Shell, prepare the directory, and apply the changes) that are described for time-unit column-partitioned tables earlier on this page.

API

Call the tables.insert method with a defined table resource that specifies the timePartitioning property and the schema property.

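Create an integer-range partitioned table

To create an empty integer-range partitioned table with a schema definition:

Console

Open the BigQuery page in the Google Cloud console.

In the Explorer panel, expand your project and select a dataset.
Expand the Actions option and click Open.
In the details panel, click Create table.
On the Create table page, in the Source section, select Empty table.
In the Destination section:
- For Dataset name, choose the appropriate dataset.
- In the Table name field, enter the name of the table.
- Verify that Table type is set to Native table.
In the Schema section, enter the schema definition. Make sure the schema includes an INTEGER column to use as the partitioning column. For more information, see Specifying a schema.
In the Partition and cluster settings section, in the Partitioning drop-down list, select Partition by field and choose the partitioning column. This option is available only if the schema contains an INTEGER column.
Provide values for Start, End, and Interval:
- Start is the start of the first partition range (inclusive).
- End is the end of the last partition range (exclusive).
- Interval is the width of each partition range.
Values outside of these ranges go into a special __UNPARTITIONED__ partition.
Optional: To require a partition filter on all queries for this table, select the Require partition filter checkbox. Requiring a partition filter can reduce cost and improve performance. For more information, see Set partition filter requirements.
Click Create table.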

SQL

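To create an integer-range partitioned table, use the CREATE TABLE DDL statement with a PARTITION BY clause.

The following example creates a table that is partitioned on the customer_id column with start 0, end 100, and interval 10:

In the Google Cloud console, go to the BigQuery page.

In the query editor, enter the following statement: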

CREATE TABLE mydataset.newtable (customer_id INT64, date1 DATE)
PARTITION BY
  RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100, 10))
  OPTIONS (
    require_partition_filter = TRUE);

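Use the OPTIONS clause to set table options such as the partition filter requirement.

Click Run.

For more information about how to run queries, see Run an interactive query.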
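
To see how the start, end, and interval values divide the column into partitions, the following Python sketch mimics the assignment for the 0, 100, 10 example. It is a simplified model, not BigQuery's implementation: the function name is hypothetical, it assumes that the range is an exact multiple of the interval, and it assumes that NULL values land in a partition named __NULL__.

```python
def range_partition(value, start, end, interval):
    """Return the partition that `value` falls into.

    Assumes (end - start) is a multiple of interval, as in the examples.
    """
    if value is None:
        return "__NULL__"
    if value < start or value >= end:
        # start is inclusive and end is exclusive.
        return "__UNPARTITIONED__"
    lower = start + ((value - start) // interval) * interval
    return (lower, lower + interval)  # half-open range [lower, upper)


for v in (0, 9, 10, 99, 100, -1, None):
    print(v, range_partition(v, 0, 100, 10))
```

The value 100 is outside the table's range because the end value is exclusive, so it is stored in the __UNPARTITIONED__ partition.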

bq

In the Google Cloud console, activate Cloud Shell.

Use the bq mk command with the --table flag (or -t shortcut):
```
bq mk \
   --table \
   --schema SCHEMA \
   --range_partitioning=COLUMN_NAME,START,END,INTERVAL \
   --require_partition_filter=BOOLEAN  \
   PROJECT_ID:DATASET.TABLE
```
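Replace the following:
- SCHEMA: An inline schema definition in the format column:data_type,column:data_type, or the path to a JSON schema file on your local machine. For more information, see Specifying a schema.
- COLUMN_NAME: The name of the partitioning column. In the table schema, this column must be of type INTEGER.
- START: The start of the first partition range (inclusive).
- END: The end of the last partition range (exclusive).
- INTERVAL: The width of each partition range.
- BOOLEAN: If true, queries on this table must include a partition filter. The --require_partition_filter flag is optional. For more information, see Set partition filter requirements.
- PROJECT_ID: The project ID. If omitted, your default project is used.
- DATASET: The name of a dataset in your project.
- TABLE: The name of the table to create.
Values outside of the partition range go into a special __UNPARTITIONED__ partition.

For other command-line options, see bq mk.

The following example creates a table named mytable that is partitioned on the customer_id column.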
```
bq mk \
   -t \
   --schema 'customer_id:INTEGER,qtr:STRING,sales:FLOAT' \
   --range_partitioning=customer_id,0,100,10 \
   mydataset.mytable
```

Terraform

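Use the google_bigquery_table resource.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The following example creates a table named mytable that is partitioned by integer range: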

resource "google_bigquery_dataset" "default" {
  dataset_id                      = "mydataset"
  default_partition_expiration_ms = 2592000000  # 30 days
  default_table_expiration_ms     = 31536000000 # 365 days
  description                     = "dataset description"
  location                        = "US"
  max_time_travel_hours           = 96 # 4 days

  labels = {
    billing_group = "accounting",
    pii           = "sensitive"
  }
}

resource "google_bigquery_table" "default" {
  dataset_id          = google_bigquery_dataset.default.dataset_id
  table_id            = "mytable"
  deletion_protection = false # set to "true" in production

  range_partitioning {
    field = "ID"
    range {
      start    = 0
      end      = 1000
      interval = 10
    }
  }
  require_partition_filter = true

  schema = <<EOF
[
  {
    "name": "ID",
    "type": "INT64",
    "description": "Item ID"
  },
  {
    "name": "Item",
    "type": "STRING",
    "mode": "NULLABLE"
  }
]
EOF

}

To apply your Terraform configuration in a Google Cloud project, follow the same steps (prepare Cloud Shell, prepare the directory, and apply the changes) that are described for time-unit column-partitioned tables earlier on this page.

API

Call the tables.insert method with a defined table resource that specifies the rangePartitioning property and the schema property.
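
The following Python sketch shows one way the rangePartitioning property could be assembled and validated before it is placed in the table resource. The helper name and the validation rules are assumptions for illustration; the values match the customer_id example used in the bq tab.

```python
import json


def range_partitioning(field, start, end, interval):
    """Build the rangePartitioning property of a table resource."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if start >= end:
        raise ValueError("start must be less than end")
    # 64-bit integers are written as strings in the JSON representation.
    return {
        "field": field,
        "range": {
            "start": str(start),
            "end": str(end),
            "interval": str(interval),
        },
    }


table_resource = {
    "tableReference": {
        "projectId": "my-project",  # hypothetical project ID
        "datasetId": "mydataset",
        "tableId": "mytable",
    },
    "schema": {"fields": [
        {"name": "customer_id", "type": "INTEGER"},
        {"name": "qtr", "type": "STRING"},
        {"name": "sales", "type": "FLOAT"},
    ]},
    "rangePartitioning": range_partitioning("customer_id", 0, 100, 10),
}
print(json.dumps(table_resource, indent=2))
```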

Java

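Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.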

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.RangePartitioning;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;

// Sample to create a range partitioned table
public class CreateRangePartitionedTable {

  public static void main(String[] args) {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    Schema schema =
        Schema.of(
            Field.of("integerField", StandardSQLTypeName.INT64),
            Field.of("stringField", StandardSQLTypeName.STRING),
            Field.of("booleanField", StandardSQLTypeName.BOOL),
            Field.of("dateField", StandardSQLTypeName.DATE));
    createRangePartitionedTable(datasetName, tableName, schema);
  }

  public static void createRangePartitionedTable(
      String datasetName, String tableName, Schema schema) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);

      // Note: The field must be a top-level, NULLABLE/REQUIRED field.
      // The only supported type is INTEGER/INT64
      RangePartitioning partitioning =
          RangePartitioning.newBuilder()
              .setField("integerField")
              .setRange(
                  RangePartitioning.Range.newBuilder()
                      .setStart(1L)
                      .setInterval(2L)
                      .setEnd(10L)
                      .build())
              .build();

      StandardTableDefinition tableDefinition =
          StandardTableDefinition.newBuilder()
              .setSchema(schema)
              .setRangePartitioning(partitioning)
              .build();
      TableInfo tableInfo = TableInfo.newBuilder(tableId, tableDefinition).build();

      bigquery.create(tableInfo);
      System.out.println("Range partitioned table created successfully");
    } catch (BigQueryException e) {
      System.out.println("Range partitioned table was not created. \n" + e.toString());
    }
  }
}

Node.js

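Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.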

// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function createTableRangePartitioned() {
  // Creates a new integer range partitioned table named "my_table"
  // in "my_dataset".

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_dataset";
  // const tableId = "my_table";

  const schema = [
    {name: 'fullName', type: 'STRING'},
    {name: 'city', type: 'STRING'},
    {name: 'zipcode', type: 'INTEGER'},
  ];

  // To use integer range partitioning, select a top-level REQUIRED or
  // NULLABLE column with INTEGER / INT64 data type. Values that are
  // outside of the range of the table will go into the UNPARTITIONED
  // partition. Null values will be in the NULL partition.
  const rangePartition = {
    field: 'zipcode',
    range: {
      start: 0,
      end: 100000,
      interval: 10,
    },
  };

  // For all options, see https://cloud.google.com/bigquery/docs/reference/v2/tables#resource
  const options = {
    schema: schema,
    rangePartitioning: rangePartition,
  };

  // Create a new table in the dataset
  const [table] = await bigquery
    .dataset(datasetId)
    .createTable(tableId, options);

  console.log(`Table ${table.id} created with integer range partitioning: `);
  console.log(table.metadata.rangePartitioning);
}

Python

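Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.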

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

schema = [
    bigquery.SchemaField("full_name", "STRING"),
    bigquery.SchemaField("city", "STRING"),
    bigquery.SchemaField("zipcode", "INTEGER"),
]

table = bigquery.Table(table_id, schema=schema)
table.range_partitioning = bigquery.RangePartitioning(
    # To use integer range partitioning, select a top-level REQUIRED /
    # NULLABLE column with INTEGER / INT64 data type.
    field="zipcode",
    range_=bigquery.PartitionRange(start=0, end=100000, interval=10),
)
table = client.create_table(table)  # Make an API request.
print(
    "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
)

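Create a partitioned table from a query result

You can create a partitioned table from a query result in the following ways:

- In SQL, use a CREATE TABLE ... AS SELECT statement. You can use this approach to create a table that is partitioned by time-unit column or integer range, but not by ingestion time.
- Use the bq command-line tool or the BigQuery API to set a destination table for a query. When the query runs, BigQuery writes the results to the destination table. You can use this approach for any partitioning type.
- Call the jobs.insert API method and specify the partitioning in either the timePartitioning property or the rangePartitioning property.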

SQL

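Use the CREATE TABLE ... AS SELECT statement. Include a PARTITION BY clause to configure the partitioning.

The following example creates a table that is partitioned on the transaction_date column:

In the Google Cloud console, go to the BigQuery page.

In the query editor, enter the following statement: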

CREATE TABLE
  mydataset.newtable (transaction_id INT64, transaction_date DATE)
PARTITION BY
  transaction_date
AS (
  SELECT
    transaction_id, transaction_date
  FROM
    mydataset.mytable
);

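Use the OPTIONS clause to set table options such as the partition filter requirement.

Click Run.

For more information about how to run queries, see Run an interactive query.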

bq

In the Google Cloud console, activate Cloud Shell.

To create a partitioned table from a query, use the bq query command with the --destination_table flag and either the --time_partitioning_type flag or, for integer-range partitioning, the --range_partitioning flag.

Time-unit column partitioning:

bq query \
   --use_legacy_sql=false \
   --destination_table PROJECT_ID:DATASET.TABLE \
   --time_partitioning_field COLUMN \
   --time_partitioning_type UNIT_TIME \
   'QUERY_STATEMENT'

Ingestion-time partitioning:

bq query \
   --use_legacy_sql=false \
   --destination_table PROJECT_ID:DATASET.TABLE \
   --time_partitioning_type UNIT_TIME \
   'QUERY_STATEMENT'

Integer-range partitioning:

bq query \
   --use_legacy_sql=false \
   --destination_table PROJECT_ID:DATASET.TABLE \
   --range_partitioning COLUMN,START,END,INTERVAL \
   'QUERY_STATEMENT'

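Replace the following:

- PROJECT_ID: The project ID. If omitted, your default project is used.
- DATASET: The name of a dataset in your project.
- TABLE: The name of the table to create.
- COLUMN: The name of the partitioning column.
- UNIT_TIME: The partitioning type. Supported values are DAY, HOUR, MONTH, and YEAR.
- START: The start of the range partitioning (inclusive).
- END: The end of the range partitioning (exclusive).
- INTERVAL: The width of each range within the partition.
- QUERY_STATEMENT: The query used to populate the table.

The following example creates a table that is partitioned on the transaction_date column, using monthly partitioning.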

bq query \
   --use_legacy_sql=false \
   --destination_table mydataset.newtable \
   --time_partitioning_field transaction_date \
   --time_partitioning_type MONTH \
   'SELECT transaction_id, transaction_date FROM mydataset.mytable'

The following example creates a table that is partitioned on the customer_id column, using integer-range partitioning.

bq query \
   --use_legacy_sql=false \
   --destination_table mydataset.newtable \
   --range_partitioning customer_id,0,100,10 \
   'SELECT * FROM mydataset.ponies'

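For ingestion-time partitioned tables, you can also use a partition decorator to load data into a specific partition. The following example creates a new ingestion-time partitioned table and loads data into the 20180201 (February 1, 2018) partition: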

bq query \
   --use_legacy_sql=false  \
   --time_partitioning_type=DAY \
   --destination_table='mydataset.newtable$20180201' \
   'SELECT * FROM mydataset.mytable'
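
When the target date is computed in a script, the decorator can be derived from a date value instead of being typed by hand. The following Python sketch shows the idea for daily partitions; the helper name is hypothetical.

```python
from datetime import date


def with_partition_decorator(table: str, day: date) -> str:
    """Append a daily partition decorator in the form TABLE$YYYYMMDD."""
    return f"{table}${day:%Y%m%d}"


print(with_partition_decorator("mydataset.newtable", date(2018, 2, 1)))
```

In a shell, keep the resulting value in single quotes, as in the preceding command, so that the $ character is not treated as the start of a variable.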

API

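To save query results to a partitioned table, call the jobs.insert method. Configure a query job. Specify the destination table in destinationTable. Specify the partitioning in either the timePartitioning property or the rangePartitioning property.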
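
A minimal sketch of such a request body, written in Python, might look like the following. The helper name and the project ID my-project are assumptions, and the check that exactly one partitioning property is set is a convenience of this sketch rather than something the source describes. The code only builds and prints the body.

```python
import json


def query_job_body(sql, project_id, dataset_id, table_id,
                   time_partitioning=None, range_partitioning=None):
    """Build a jobs.insert request body that writes to a partitioned table."""
    # This sketch expects exactly one partitioning scheme for the destination.
    if (time_partitioning is None) == (range_partitioning is None):
        raise ValueError("set exactly one of time_partitioning or range_partitioning")
    query = {
        "query": sql,
        "useLegacySql": False,
        "destinationTable": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
    }
    if time_partitioning is not None:
        query["timePartitioning"] = time_partitioning
    else:
        query["rangePartitioning"] = range_partitioning
    return {"configuration": {"query": query}}


body = query_job_body(
    "SELECT transaction_id, transaction_date FROM mydataset.mytable",
    "my-project", "mydataset", "newtable",  # hypothetical identifiers
    time_partitioning={"type": "MONTH", "field": "transaction_date"},
)
print(json.dumps(body, indent=2))
```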

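Convert date-sharded tables into ingestion-time partitioned tables

If you previously created date-sharded tables, you can convert the entire set of related tables into a single ingestion-time partitioned table by using the partition command in the bq command-line tool.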

bq --location=LOCATION partition \
    --time_partitioning_type=PARTITION_TYPE \
    --time_partitioning_expiration INTEGER \
    PROJECT_ID:SOURCE_DATASET.SOURCE_TABLE \
    PROJECT_ID:DESTINATION_DATASET.DESTINATION_TABLE

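Replace the following:

- LOCATION: The name of your location. The --location flag is optional.
- PARTITION_TYPE: The partitioning type. Possible values are DAY, HOUR, MONTH, and YEAR.
- INTEGER: The partition expiration time, in seconds. There is no minimum value. The expiration time evaluates to the partition's UTC date plus the integer value. The --time_partitioning_expiration flag is optional.
- PROJECT_ID: Your project ID.
- SOURCE_DATASET: The dataset that contains the date-sharded tables.
- SOURCE_TABLE: The prefix of your date-sharded tables.
- DESTINATION_DATASET: The dataset for the new partitioned table.
- DESTINATION_TABLE: The name of the partitioned table to create.

The partition command does not support the --label, --expiration, --add_tags, or --description flags. You can add labels, a table expiration, tags, and a description to the table after it is created.

When you run the partition command, BigQuery creates a copy job that generates partitions from the sharded tables.

The following example creates an ingestion-time partitioned table named mytable_partitioned from a set of date-sharded tables prefixed with sourcetable_. The new table is partitioned daily, with a partition expiration of 259,200 seconds (3 days).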

bq partition \
    --time_partitioning_type=DAY \
    --time_partitioning_expiration 259200 \
    mydataset.sourcetable_ \
    mydataset.mytable_partitioned

If the date-sharded tables were sourcetable_20180126 and sourcetable_20180127, this command would create the following partitions: mydataset.mytable_partitioned$20180126 and mydataset.mytable_partitioned$20180127.
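
The naming rule in this example can be sketched in Python as follows. The function is a hypothetical helper that only models how a shard suffix becomes a partition decorator; it assumes eight-digit YYYYMMDD suffixes, as in the daily example, and does not call BigQuery.

```python
import re


def shard_to_partition(shard_table, prefix, destination):
    """Map a date-sharded table name to the partition it is copied into."""
    # A shard name is the prefix followed by an eight-digit date suffix.
    match = re.fullmatch(re.escape(prefix) + r"(\d{8})", shard_table)
    if match is None:
        raise ValueError(f"{shard_table!r} is not a shard of {prefix!r}")
    return f"{destination}${match.group(1)}"


shards = ["sourcetable_20180126", "sourcetable_20180127"]
partitions = [
    shard_to_partition(s, "sourcetable_", "mydataset.mytable_partitioned")
    for s in shards
]
print(partitions)
```

A helper like this can be used to list the partitions you expect before running the command and to compare them with the result afterward.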

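Partitioned table security

Access control for partitioned tables is the same as access control for standard tables. For more information, see Introduction to table access controls.

What's next

- To learn how to manage and update partitioned tables, see Managing partitioned tables.
- To learn how to query partitioned tables, see Querying partitioned tables.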