
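Loading ORC data from Cloud Storage

This page provides an overview of loading ORC data from Cloud Storage into BigQuery.

ORC is an open source, column-oriented data format that is widely used in the Apache Hadoop ecosystem.

When you load ORC data from Cloud Storage, you can load the data into a new table or partition, append it to an existing table or partition, or overwrite an existing table or partition. When your data is loaded into BigQuery, it is converted into the columnar Capacitor format (BigQuery's storage format).

When you load data from Cloud Storage into a BigQuery table, the dataset that contains the table must be in the same regional or multi-regional location as the Cloud Storage bucket.

For information about loading ORC data from a local file, see Loading data into BigQuery from a local data source.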

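Limitations

The following limitations apply when you load data into BigQuery from a Cloud Storage bucket:

BigQuery does not guarantee data consistency for external data sources. Changes to the underlying data while a query is running can result in unexpected behavior.
BigQuery does not support Cloud Storage object versioning. If you include a generation number in the Cloud Storage URI, the load job fails.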

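Before you begin

Grant Identity and Access Management (IAM) roles that give users the permissions needed to perform each task in this document, and create a dataset to store your data.

Required permissions

To load data into BigQuery, you need IAM permissions to run a load job and to load data into BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need IAM permissions to access the bucket that contains your data.

Permissions to load data into BigQuery

To load data into a new BigQuery table or partition, or to append to or overwrite an existing table or partition, you need the following IAM permissions: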

bigquery.tables.create
bigquery.tables.updateData
bigquery.tables.update
bigquery.jobs.create

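Each of the following predefined IAM roles includes the permissions that you need to load data into a BigQuery table or partition:

roles/bigquery.dataEditor
roles/bigquery.dataOwner
roles/bigquery.admin (includes the bigquery.jobs.create permission)
roles/bigquery.user (includes the bigquery.jobs.create permission)
roles/bigquery.jobUser (includes the bigquery.jobs.create permission)

Additionally, if you have the bigquery.datasets.create permission, you can use a load job to create and update tables in the datasets that you create.

For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.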

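Permissions to load data from Cloud Storage

To get the permissions that you need to load data from a Cloud Storage bucket, ask your administrator to grant you the Storage Admin (roles/storage.admin) IAM role on the bucket. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to load data from a Cloud Storage bucket. The following permissions are required: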

storage.buckets.get
storage.objects.get
storage.objects.list (required if you are using a URI wildcard)

You might also be able to get these permissions with custom roles or other predefined roles.

Create a dataset

Create a BigQuery dataset to store your data.

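ORC schemas

When you load ORC files into BigQuery, the table schema is automatically retrieved from the self-describing source data. When BigQuery retrieves the schema from the source data, it uses the alphabetically last file.

For example, suppose you have the following ORC files in Cloud Storage: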

gs://mybucket/00/
  a.orc
  z.orc
gs://mybucket/01/
  b.orc

Running the following command in the bq command-line tool loads all of the files (as a comma-separated list), and the schema is derived from mybucket/01/b.orc:

bq load \
--source_format=ORC \
dataset.table \
"gs://mybucket/00/*.orc","gs://mybucket/01/*.orc"

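When BigQuery detects the schema, some ORC data types are converted to BigQuery data types to make them compatible with GoogleSQL syntax. All fields in the detected schema are NULLABLE. For more information, see ORC conversions.

When you load multiple ORC files that have different schemas, identical fields (with the same name and the same nesting level) specified in multiple schemas must map to the same converted BigQuery data type in each schema definition.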
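The file-selection rule from the example earlier in this section can be sketched in plain Python. This is only an illustration: `schema_source` is a hypothetical helper, and it assumes that the wildcard URIs have already been expanded to full object URIs and that ordinary string comparison matches the ordering that BigQuery applies.

```python
def schema_source(object_uris):
    """Return the URI that sorts last alphabetically (hypothetical helper)."""
    if not object_uris:
        raise ValueError("at least one URI is required")
    # Plain lexicographic comparison; assumed to approximate BigQuery's ordering.
    return max(object_uris)


uris = [
    "gs://mybucket/00/a.orc",
    "gs://mybucket/00/z.orc",
    "gs://mybucket/01/b.orc",
]
print(schema_source(uris))  # gs://mybucket/01/b.orc
```

Note that the directory prefix takes part in the comparison, which is why 01/b.orc wins over 00/z.orc.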

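To provide a table schema for creating external tables, set the referenceFileSchemaUri property in the BigQuery API, or the --reference_file_schema_uri parameter in the bq command-line tool, to the URL of the reference file.

For example, --reference_file_schema_uri="gs://mybucket/schema.orc".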

ORC compression

BigQuery supports the following compression codecs for ORC file contents:

Zlib
Snappy
LZO
LZ4
ZSTD

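Data in ORC files does not remain compressed after it is loaded into BigQuery. Data storage is reported in logical bytes or physical bytes, depending on the dataset storage billing model. To get information about storage usage, query the INFORMATION_SCHEMA.TABLE_STORAGE view.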

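Loading ORC data into a new table

You can load ORC data into a new table by using one of the following:

The Google Cloud console
The bq command-line tool's bq load command
The jobs.insert API method, configured with a load job
The client libraries

To load ORC data from Cloud Storage into a new BigQuery table:

Console

In the Google Cloud console, go to the BigQuery page.

Go to BigQuery
In the Explorer pane, expand your project, and then select a dataset.
In the Dataset info section, click Create table.
In the Create table panel, specify the following details:

In the Source section, select Google Cloud Storage in the Create table from list. Then, do the following:
1. Select a file from the Cloud Storage bucket, or enter the Cloud Storage URI. You cannot include multiple URIs in the Google Cloud console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you want to create, append to, or overwrite.
2. For File format, select ORC.
In the Destination section, specify the following details:
1. For Dataset, select the dataset in which you want to create the table.
2. In the Table field, enter the name of the table that you want to create.
3. Verify that the Table type field is set to Native table.
In the Schema section, no action is necessary. The schema is self-described in ORC files.
Optional: Specify Partition and cluster settings. For more information, see Creating partitioned tables and Creating and using clustered tables.
Click Advanced options and do the following:
- For Write preference, leave Write if empty selected. This option creates a new table and loads your data into it.
- If you want to ignore values in a row that are not present in the table's schema, select Unknown values.
- For Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.
Click Create table.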

SQL

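Use the LOAD DATA DDL statement. The following example loads an ORC file into the new table mytable:

In the Google Cloud console, go to the BigQuery page.

Go to BigQuery

In the query editor, enter the following statement: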

LOAD DATA OVERWRITE mydataset.mytable
FROM FILES (
  format = 'ORC',
  uris = ['gs://bucket/path/file.orc']);

Click Run.

For more information about how to run queries, see Run an interactive query.

bq

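Use the bq load command, specify ORC as the source_format, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs, or a URI containing a wildcard.

(Optional) Supply the --location flag and set the value to your location.

Other optional flags include:

--time_partitioning_type: Enables time-based partitioning on a table and sets the partition type. Possible values are HOUR, DAY, MONTH, and YEAR. This flag is optional when you create a table partitioned on a DATE, DATETIME, or TIMESTAMP column. The default partition type for time-based partitioning is DAY. You cannot change the partitioning specification of an existing table.
--time_partitioning_expiration: An integer that specifies (in seconds) when a time-based partition should be deleted. The expiration time evaluates to the partition's UTC date plus the integer value.
--time_partitioning_field: The DATE or TIMESTAMP column used to create a partitioned table. If time-based partitioning is enabled without this value, an ingestion-time partitioned table is created.
--require_partition_filter: When enabled, this option requires users to include a WHERE clause that specifies the partitions to query. Requiring a partition filter can reduce cost and improve performance. For more information, see Require a partition filter in queries.
--clustering_fields: A comma-separated list of up to four column names used to create a clustered table.
--destination_kms_key: The Cloud KMS key used to encrypt the table data.

For more information about partitioned tables, see:
- Creating partitioned tables
For more information about clustered tables, see:
- Creating and using clustered tables
For more information about table encryption, see:
- Protecting data with Cloud KMS keys

To load ORC data into BigQuery, enter the following command: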

bq --location=location load \
--source_format=format \
dataset.table \
path_to_source

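Where:

location is your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location by using the .bigqueryrc file.
format is ORC.
dataset is an existing dataset.
table is the name of the table into which you are loading data.
path_to_source is a fully qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.

Examples:

The following command loads data from gs://mybucket/mydata.orc into a table named mytable in mydataset.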

    bq load \
    --source_format=ORC \
    mydataset.mytable \
    gs://mybucket/mydata.orc

The following command loads data from gs://mybucket/mydata.orc into a new ingestion-time partitioned table named mytable in mydataset.

    bq load \
    --source_format=ORC \
    --time_partitioning_type=DAY \
    mydataset.mytable \
    gs://mybucket/mydata.orc

The following command loads data from gs://mybucket/mydata.orc into a partitioned table named mytable in mydataset. The table is partitioned on the mytimestamp column.

    bq load \
    --source_format=ORC \
    --time_partitioning_field mytimestamp \
    mydataset.mytable \
    gs://mybucket/mydata.orc

The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The Cloud Storage URI uses a wildcard.

    bq load \
    --source_format=ORC \
    mydataset.mytable \
    gs://mybucket/mydata*.orc

The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The command includes a comma-separated list of Cloud Storage URIs with wildcards.

    bq load --autodetect \
    --source_format=ORC \
    mydataset.mytable \
    "gs://mybucket/00/*.orc","gs://mybucket/01/*.orc"

API

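Create a load job that points to the source data in Cloud Storage.
(Optional) Specify your location in the location property in the jobReference section of the job resource.
The source URIs property must be fully qualified, in the format gs://bucket/object. Each URI can contain one '*' wildcard character.
Specify the ORC data format by setting the sourceFormat property to ORC.
To check the job status, call jobs.get(job_id), where job_id is the ID of the job returned by the initial request.
- If status.state = DONE, the job has finished running.
- If the status.errorResult property is present, the request failed, and that object includes information describing what went wrong. When a request fails, no table is created and no data is loaded.
- If status.errorResult is absent, the job finished successfully, although there might have been some nonfatal errors, such as problems importing a few rows. Nonfatal errors are listed in the returned job object's status.errors property.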
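The status checks above can be sketched as a small function over the JSON body returned by jobs.get. This is a minimal sketch rather than client-library code: `summarize_load_job` is a hypothetical helper, it takes the response as an already-parsed dict, and the sample responses are made up for illustration.

```python
def summarize_load_job(job):
    """Classify a jobs.get response (parsed into a dict) per the rules above."""
    status = job.get("status", {})
    if status.get("state") != "DONE":
        return "running"
    if "errorResult" in status:
        # The job failed: no table was created and no data was loaded.
        message = status["errorResult"].get("message", "unknown error")
        return "failed: " + message
    if status.get("errors"):
        # The job succeeded, but nonfatal errors were reported.
        return "succeeded with {} nonfatal error(s)".format(len(status["errors"]))
    return "succeeded"


# Hypothetical responses, for illustration only.
print(summarize_load_job({"status": {"state": "RUNNING"}}))
print(summarize_load_job({"status": {"state": "DONE"}}))
```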

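API notes:

Load jobs are atomic and consistent: if a load job fails, none of the data is available, and if a load job succeeds, all of the data is available.
As a best practice, generate a unique ID and pass it as jobReference.jobId when you call jobs.insert to create a load job. This approach is more robust to network failure because the client can poll or retry on the known job ID.
Calling jobs.insert on a given job ID is idempotent. You can retry as many times as you like on the same job ID, and at most one of those operations succeeds.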
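As a rough illustration of the job ID practice, the following sketch builds a jobs.insert request body with a client-generated ID. It only constructs the dict and does not send any request; `build_load_request` and the `orc_load_` ID prefix are hypothetical choices, and the project, dataset, and table names are placeholders.

```python
import uuid


def build_load_request(project_id, dataset_id, table_id, source_uris, job_id=None):
    """Build a jobs.insert body for an ORC load job (hypothetical helper)."""
    # Generate the ID on the client so that a retry can reuse the same value.
    job_id = job_id or "orc_load_" + uuid.uuid4().hex
    return {
        "jobReference": {"projectId": project_id, "jobId": job_id},
        "configuration": {
            "load": {
                "sourceUris": list(source_uris),
                "sourceFormat": "ORC",
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        },
    }


first = build_load_request("my-project", "mydataset", "mytable",
                           ["gs://mybucket/mydata.orc"])
# On a network failure, rebuild the request with the same job ID and retry.
retry = build_load_request("my-project", "mydataset", "mytable",
                           ["gs://mybucket/mydata.orc"],
                           job_id=first["jobReference"]["jobId"])
print(first == retry)  # True
```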

C#

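Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.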


using Google.Apis.Bigquery.v2.Data;
using Google.Cloud.BigQuery.V2;
using System;

public class BigQueryLoadTableGcsOrc
{
    public void LoadTableGcsOrc(
        string projectId = "your-project-id",
        string datasetId = "your_dataset_id"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        var gcsURI = "gs://cloud-samples-data/bigquery/us-states/us-states.orc";
        var dataset = client.GetDataset(datasetId);
        TableReference destinationTableRef = dataset.GetTableReference(
            tableId: "us_states");
        // Create job configuration
        var jobOptions = new CreateLoadJobOptions()
        {
            SourceFormat = FileFormat.Orc
        };
        // Create and run job
        var loadJob = client.CreateLoadJob(
            sourceUri: gcsURI,
            destination: destinationTableRef,
            // Pass null as the schema because the schema is inferred when
            // loading Orc data
            schema: null,
            options: jobOptions
        );
        loadJob = loadJob.PollUntilCompleted().ThrowOnAnyError();  // Waits for the job to complete.
        // Display the number of rows uploaded
        BigQueryTable table = client.GetTable(destinationTableRef);
        Console.WriteLine(
            $"Loaded {table.Resource.NumRows} rows to {table.FullyQualifiedId}");
    }
}

Go

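Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.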

import (
	"context"
	"fmt"

	"cloud.google.com/go/bigquery"
)

// importORC demonstrates loading Apache ORC data from Cloud Storage into a table.
func importORC(projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	// tableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states.orc")
	gcsRef.SourceFormat = bigquery.ORC
	loader := client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)

	job, err := loader.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}

	if status.Err() != nil {
		return fmt.Errorf("job completed with error: %v", status.Err())
	}
	return nil
}

Java

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.TableId;

// Sample to load ORC data from Cloud Storage into a new BigQuery table
public class LoadOrcFromGCS {

  public static void runLoadOrcFromGCS() {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/us-states/us-states.orc";
    Schema schema =
        Schema.of(
            Field.of("name", StandardSQLTypeName.STRING),
            Field.of("post_abbr", StandardSQLTypeName.STRING));
    loadOrcFromGCS(datasetName, tableName, sourceUri, schema);
  }

  public static void loadOrcFromGCS(
      String datasetName, String tableName, String sourceUri, Schema schema) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);
      LoadJobConfiguration loadConfig =
          LoadJobConfiguration.newBuilder(tableId, sourceUri, FormatOptions.orc())
              .setSchema(schema)
              .build();

      // Load data from a GCS ORC file into the table
      Job job = bigquery.create(JobInfo.of(loadConfig));
      // Blocks until this load table job completes its execution, either failing or succeeding.
      job = job.waitFor();
      if (job.isDone() && job.getStatus().getError() == null) {
        System.out.println("ORC from GCS successfully loaded into the table");
      } else {
        System.out.println(
            "BigQuery was unable to load into the table due to an error:"
                + job.getStatus().getError());
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("ORC data was not loaded from GCS \n" + e.toString());
    }
  }
}

Node.js

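Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.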

// Import the Google Cloud client libraries
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

// Instantiate clients
const bigquery = new BigQuery();
const storage = new Storage();

/**
 * This sample loads the ORC file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.orc
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states.orc';

async function loadTableGCSORC() {
  // Imports a GCS file into a table with ORC source format.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = 'my_dataset';
  // const tableId = 'my_table';

  // Configure the load job. For full list of options, see:
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
  const metadata = {
    sourceFormat: 'ORC',
    location: 'US',
  };

  // Load data from a Google Cloud Storage file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename), metadata);

  // load() waits for the job to finish
  console.log(`Job ${job.id} completed.`);

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}

PHP

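Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.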

use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Core\ExponentialBackoff;

/** Uncomment and populate these variables in your code */
// $projectId  = 'The Google project ID';
// $datasetId  = 'The BigQuery dataset ID';

// instantiate the bigquery table service
$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$dataset = $bigQuery->dataset($datasetId);
$table = $dataset->table('us_states');

// create the import job
$gcsUri = 'gs://cloud-samples-data/bigquery/us-states/us-states.orc';
$loadConfig = $table->loadFromStorage($gcsUri)->sourceFormat('ORC');
$job = $table->runJob($loadConfig);
// poll the job until it is complete
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    print('Waiting for job to complete' . PHP_EOL);
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});
// check if the job has errors
if (isset($job->info()['status']['errorResult'])) {
    $error = $job->info()['status']['errorResult']['message'];
    printf('Error running job: %s' . PHP_EOL, $error);
} else {
    print('Data imported successfully' . PHP_EOL);
}

Python

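Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.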

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.ORC)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.orc"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))

Ruby

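Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.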

require "google/cloud/bigquery"

def load_table_gcs_orc dataset_id = "your_dataset_id"
  bigquery = Google::Cloud::Bigquery.new
  dataset  = bigquery.dataset dataset_id
  gcs_uri  = "gs://cloud-samples-data/bigquery/us-states/us-states.orc"
  table_id = "us_states"

  load_job = dataset.load_job table_id, gcs_uri, format: "orc"
  puts "Starting job #{load_job.job_id}"

  load_job.wait_until_done! # Waits for table load to complete.
  puts "Job finished."

  table = dataset.table table_id
  puts "Loaded #{table.rows_count} rows to table #{table.id}"
end

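Appending to or overwriting a table with ORC data

You can load additional data into a table either from source files or by appending query results.

In the Google Cloud console, use the Write preference option to specify what action to take when you load data from a source file or from a query result.

You have the following options when you load additional data into a table:

Console option	bq tool flag	BigQuery API property	Description
Write if empty	Not supported	`WRITE_EMPTY`	Writes the data only if the table is empty.
Append to table	`--noreplace` or `--replace=false`; if `--[no]replace` is unspecified, the default is append	`WRITE_APPEND`	(Default) Appends the data to the end of the table.
Overwrite table	`--replace` or `--replace=true`	`WRITE_TRUNCATE`	Erases all existing data in a table before writing the new data. This action also deletes the table schema and row-level security, and removes any Cloud KMS key.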
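The table above can be restated as a small lookup. The following is only a sketch: `write_disposition` is a hypothetical helper that mirrors how the bq `--[no]replace` flag is described to map to the API property, and the dictionary simply repeats the console-to-API column pairing.

```python
# Console option -> BigQuery API writeDisposition value, as listed in the table.
CONSOLE_TO_API = {
    "Write if empty": "WRITE_EMPTY",
    "Append to table": "WRITE_APPEND",
    "Overwrite table": "WRITE_TRUNCATE",
}


def write_disposition(replace=None):
    """Map the bq --[no]replace flag to a writeDisposition (hypothetical helper).

    replace=True  models --replace or --replace=true
    replace=False models --noreplace or --replace=false
    replace=None  models the flag being omitted (append is the default)
    """
    return "WRITE_TRUNCATE" if replace is True else "WRITE_APPEND"


print(write_disposition())              # WRITE_APPEND
print(write_disposition(replace=True))  # WRITE_TRUNCATE
```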

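If you load data into an existing table, the load job can append the data or overwrite the table.

You can append to or overwrite a table by using one of the following:

The Google Cloud console
The bq command-line tool's bq load command
The jobs.insert API method, configured with a load job
The client libraries

To append to or overwrite a table with ORC data: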

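Console

In the Google Cloud console, go to the BigQuery page.

Go to BigQuery
In the Explorer pane, expand your project, and then select a dataset.
In the Dataset info section, click Create table.
In the Create table panel, specify the following details:

In the Source section, select Google Cloud Storage in the Create table from list. Then, do the following:
1. Select a file from the Cloud Storage bucket, or enter the Cloud Storage URI. You cannot include multiple URIs in the Google Cloud console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you want to create, append to, or overwrite.
2. For File format, select ORC.

In the Destination section, specify the following details:
1. For Dataset, select the dataset in which you want to create the table.
2. In the Table field, enter the name of the table that you want to create.
3. Verify that the Table type field is set to Native table.
In the Schema section, no action is necessary. The schema is self-described in ORC files.

Optional: Specify Partition and cluster settings. For more information, see Creating partitioned tables and Creating and using clustered tables. You cannot convert a table to a partitioned or clustered table by appending to or overwriting it. The Google Cloud console does not support appending to or overwriting partitioned or clustered tables in a load job.
Click Advanced options and do the following:
- For Write preference, choose Append to table or Overwrite table.
- If you want to ignore values in a row that are not present in the table's schema, select Unknown values.
- For Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.
Click Create table.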

SQL

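Use the LOAD DATA DDL statement. The following example appends an ORC file to the table mytable:

In the Google Cloud console, go to the BigQuery page.

Go to BigQuery

In the query editor, enter the following statement: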

LOAD DATA INTO mydataset.mytable
FROM FILES (
  format = 'ORC',
  uris = ['gs://bucket/path/file.orc']);

Click Run.

For more information about how to run queries, see Run an interactive query.

bq

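To overwrite the table, enter the bq load command with the --replace flag. To append data to the table, use the --noreplace flag. If no flag is specified, the default is to append data. Supply the --source_format flag and set it to ORC. Because ORC schemas are automatically retrieved from the self-describing source data, you do not need to provide a schema definition.

(Optional) Supply the --location flag and set the value to your location.

Other optional flags include:

--destination_kms_key: The Cloud KMS key used to encrypt the table data.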

bq --location=location load \
--[no]replace \
--source_format=format \
dataset.table \
path_to_source

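Where:

location is your location. The --location flag is optional. You can set a default value for the location by using the .bigqueryrc file.
format is ORC.
dataset is an existing dataset.
table is the name of the table into which you are loading data.
path_to_source is a fully qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.

Examples:

The following command loads data from gs://mybucket/mydata.orc and overwrites a table named mytable in mydataset.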

    bq load \
    --replace \
    --source_format=ORC \
    mydataset.mytable \
    gs://mybucket/mydata.orc

The following command loads data from gs://mybucket/mydata.orc and appends the data to a table named mytable in mydataset.

    bq load \
    --noreplace \
    --source_format=ORC \
    mydataset.mytable \
    gs://mybucket/mydata.orc

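For information about appending to and overwriting partitioned tables by using the bq command-line tool, see Appending to and overwriting partitioned table data.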

API

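Create a load job that points to the source data in Cloud Storage.
(Optional) Specify your location in the location property in the jobReference section of the job resource.
The source URIs property must be fully qualified, in the format gs://bucket/object. You can include multiple URIs as a comma-separated list. Wildcards are also supported.
Specify the data format by setting the configuration.load.sourceFormat property to ORC.
Specify the write preference by setting the configuration.load.writeDisposition property to WRITE_TRUNCATE or WRITE_APPEND.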

C#

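Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.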


using Google.Apis.Bigquery.v2.Data;
using Google.Cloud.BigQuery.V2;
using System;

public class BigQueryLoadTableGcsOrcTruncate
{
    public void LoadTableGcsOrcTruncate(
        string projectId = "your-project-id",
        string datasetId = "your_dataset_id",
        string tableId = "your_table_id"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        var gcsURI = "gs://cloud-samples-data/bigquery/us-states/us-states.orc";
        var dataset = client.GetDataset(datasetId);
        TableReference destinationTableRef = dataset.GetTableReference(
            tableId: tableId);
        // Create job configuration
        var jobOptions = new CreateLoadJobOptions()
        {
            SourceFormat = FileFormat.Orc,
            WriteDisposition = WriteDisposition.WriteTruncate
        };
        // Create and run job
        var loadJob = client.CreateLoadJob(
            sourceUri: gcsURI,
            destination: destinationTableRef,
            // Pass null as the schema because the schema is inferred when
            // loading Orc data
            schema: null, options: jobOptions);
        loadJob = loadJob.PollUntilCompleted().ThrowOnAnyError();  // Waits for the job to complete.
        // Display the number of rows uploaded
        BigQueryTable table = client.GetTable(destinationTableRef);
        Console.WriteLine(
            $"Loaded {table.Resource.NumRows} rows to {table.FullyQualifiedId}");
    }
}

Go

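Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.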

import (
	"context"
	"fmt"

	"cloud.google.com/go/bigquery"
)

// importORCTruncate demonstrates loading Apache ORC data from Cloud Storage into a table
// and overwriting/truncating existing data in the table.
func importORCTruncate(projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	// tableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states.orc")
	gcsRef.SourceFormat = bigquery.ORC
	loader := client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)
	// Default for import jobs is to append data to a table.  WriteTruncate
	// specifies that existing data should instead be replaced/overwritten.
	loader.WriteDisposition = bigquery.WriteTruncate

	job, err := loader.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}

	if status.Err() != nil {
		return fmt.Errorf("job completed with error: %v", status.Err())
	}
	return nil
}

Java

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

// Sample to overwrite the BigQuery table data by loading a ORC file from GCS
public class LoadOrcFromGcsTruncate {

  public static void runLoadOrcFromGcsTruncate() {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/us-states/us-states.orc";
    loadOrcFromGcsTruncate(datasetName, tableName, sourceUri);
  }

  public static void loadOrcFromGcsTruncate(
      String datasetName, String tableName, String sourceUri) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);
      LoadJobConfiguration loadConfig =
          LoadJobConfiguration.newBuilder(tableId, sourceUri)
              .setFormatOptions(FormatOptions.orc())
              // Set the write disposition to overwrite existing table data
              .setWriteDisposition(JobInfo.WriteDisposition.WRITE_TRUNCATE)
              .build();

      // Load data from a GCS ORC file into the table
      Job job = bigquery.create(JobInfo.of(loadConfig));
      // Blocks until this load table job completes its execution, either failing or succeeding.
      job = job.waitFor();
      if (job.isDone() && job.getStatus().getError() == null) {
        System.out.println("Table is successfully overwritten by ORC file loaded from GCS");
      } else {
        System.out.println(
            "BigQuery was unable to load into the table due to an error:"
                + job.getStatus().getError());
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Table was not overwritten by ORC file loaded from GCS \n" + e.toString());
    }
  }
}

Node.js

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Import the Google Cloud client libraries
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

// Instantiate the clients
const bigquery = new BigQuery();
const storage = new Storage();

/**
 * This sample loads the ORC file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.orc
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states.orc';

async function loadORCFromGCSTruncate() {
  /**
   * Imports a GCS file into a table and overwrites
   * table data if table already exists.
   */

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_dataset";
  // const tableId = "my_table";

  // Configure the load job. For full list of options, see:
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
  const metadata = {
    sourceFormat: 'ORC',
    // Set the write disposition to overwrite existing table data.
    writeDisposition: 'WRITE_TRUNCATE',
    location: 'US',
  };

  // Load data from a Google Cloud Storage file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename), metadata);
  // load() waits for the job to finish
  console.log(`Job ${job.id} completed.`);

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}

PHP

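Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.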

use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Core\ExponentialBackoff;

/** Uncomment and populate these variables in your code */
// $projectId = 'The Google project ID';
// $datasetId = 'The BigQuery dataset ID';
// $tableId = 'The BigQuery table ID';

// instantiate the bigquery table service
$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$table = $bigQuery->dataset($datasetId)->table($tableId);

// create the import job
$gcsUri = 'gs://cloud-samples-data/bigquery/us-states/us-states.orc';
$loadConfig = $table->loadFromStorage($gcsUri)->sourceFormat('ORC')->writeDisposition('WRITE_TRUNCATE');
$job = $table->runJob($loadConfig);

// poll the job until it is complete
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    print('Waiting for job to complete' . PHP_EOL);
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});

// check if the job has errors
if (isset($job->info()['status']['errorResult'])) {
    $error = $job->info()['status']['errorResult']['message'];
    printf('Error running job: %s' . PHP_EOL, $error);
} else {
    print('Data imported successfully' . PHP_EOL);
}

Python

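To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

To replace the rows in an existing table, set the LoadJobConfig.write_disposition property to WRITE_TRUNCATE.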

import io

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
)

body = io.BytesIO(b"Washington,WA")
client.load_table_from_file(body, table_id, job_config=job_config).result()
previous_rows = client.get_table(table_id).num_rows
assert previous_rows > 0

job_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    source_format=bigquery.SourceFormat.ORC,
)

uri = "gs://cloud-samples-data/bigquery/us-states/us-states.orc"
load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))

Ruby

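Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.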

require "google/cloud/bigquery"

def load_table_gcs_orc_truncate dataset_id = "your_dataset_id",
                                table_id   = "your_table_id"
  bigquery = Google::Cloud::Bigquery.new
  dataset  = bigquery.dataset dataset_id
  gcs_uri  = "gs://cloud-samples-data/bigquery/us-states/us-states.orc"

  load_job = dataset.load_job table_id,
                              gcs_uri,
                              format: "orc",
                              write:  "truncate"
  puts "Starting job #{load_job.job_id}"

  load_job.wait_until_done! # Waits for table load to complete.
  puts "Job finished."

  table = dataset.table table_id
  puts "Loaded #{table.rows_count} rows to table #{table.id}"
end

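Loading Hive-partitioned ORC data

BigQuery supports loading Hive-partitioned ORC data stored on Cloud Storage, and it populates the Hive partitioning columns as columns in the destination BigQuery managed table. For more information, see Loading externally partitioned data from Cloud Storage.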

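ORC conversions

BigQuery converts ORC data types to the following BigQuery data types:

Primitive types

ORC data type	BigQuery data type	Notes
boolean	BOOLEAN
byte	INTEGER
short	INTEGER
int	INTEGER
long	INTEGER
float	FLOAT
double	FLOAT
string	STRING	UTF-8 only
varchar	STRING	UTF-8 only
char	STRING	UTF-8 only
binary	BYTES
date	DATE	An attempt to convert any value in the ORC data that is less than -719162 days or greater than 2932896 days returns an `invalid date value` error. If this affects you, contact Support to have unsupported values converted to BigQuery's minimum value of `0001-01-01` or maximum value of `9999-12-31`, as appropriate.
timestamp	TIMESTAMP	ORC supports nanosecond precision, but BigQuery converts sub-microsecond values to microseconds when the data is read. An attempt to convert any value in the ORC data that is less than -719162 days or greater than 2932896 days returns an `invalid date value` error. If this affects you, contact Support to have unsupported values converted to BigQuery's minimum value of `0001-01-01` or maximum value of `9999-12-31`, as appropriate.
decimal	NUMERIC, BIGNUMERIC, or STRING	See Decimal type.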
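The day limits in the date row line up with BigQuery's minimum and maximum DATE values when the ORC value is read as a day count relative to 1970-01-01. The following sketch checks that arithmetic; it assumes that epoch-relative interpretation, and `orc_date_to_bigquery` is a hypothetical helper, not part of any BigQuery library.

```python
import datetime

EPOCH = datetime.date(1970, 1, 1)
MIN_DAYS = -719162   # corresponds to 0001-01-01
MAX_DAYS = 2932896   # corresponds to 9999-12-31


def orc_date_to_bigquery(days_since_epoch):
    """Convert an epoch-relative day count to a date, rejecting out-of-range values."""
    if not MIN_DAYS <= days_since_epoch <= MAX_DAYS:
        raise ValueError("invalid date value")
    return EPOCH + datetime.timedelta(days=days_since_epoch)


print(orc_date_to_bigquery(MIN_DAYS))  # 0001-01-01
print(orc_date_to_bigquery(MAX_DAYS))  # 9999-12-31
```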

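Decimal type

Decimal logical types can be converted to NUMERIC, BIGNUMERIC, or STRING types. The converted type depends on the precision and scale parameters of the decimal logical type and on the specified decimal target types. Specify the decimal target type as follows:

For a load job using the jobs.insert API: use the JobConfigurationLoad.decimalTargetTypes field.
For a load job using the bq load command in the bq command-line tool: use the --decimal_target_types flag.
For a query against a table with external sources: use the ExternalDataConfiguration.decimalTargetTypes field.
For a persistent external table created with DDL: use the decimal_target_types option.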
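This page does not spell out the selection rule itself. As a rough sketch under stated assumptions, the following function tries the candidate types in the order NUMERIC, BIGNUMERIC, STRING and picks the first one that is both in the target list and wide enough. The digit limits used here (29 integer and 9 fractional digits for NUMERIC, 38 and 38 for BIGNUMERIC) are assumptions taken from the general description of those types; check the decimalTargetTypes reference before relying on them. `pick_decimal_type` is a hypothetical helper.

```python
def pick_decimal_type(precision, scale, target_types):
    """Pick a BigQuery type for decimal(precision, scale) (illustrative only)."""
    integer_digits = precision - scale
    fits = {
        # Assumed limits: NUMERIC holds 29 integer digits and 9 fractional digits.
        "NUMERIC": scale <= 9 and integer_digits <= 29,
        # Assumed limits: BIGNUMERIC holds 38 integer digits and 38 fractional digits.
        "BIGNUMERIC": scale <= 38 and integer_digits <= 38,
        "STRING": True,
    }
    for candidate in ("NUMERIC", "BIGNUMERIC", "STRING"):
        if candidate in target_types and fits[candidate]:
            return candidate
    return None  # No listed target type can hold the value.


print(pick_decimal_type(38, 9, ["NUMERIC", "BIGNUMERIC"]))   # NUMERIC
print(pick_decimal_type(38, 10, ["NUMERIC", "BIGNUMERIC"]))  # BIGNUMERIC
```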

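Complex types

ORC data type	BigQuery data type	Notes
struct	RECORD	All fields are NULLABLE. The order of fields is ignored. The name of a field must be a valid column name.
map<K,V>	RECORD	An ORC map<K,V> field is converted to a repeated RECORD that contains two fields: a key of the same data type as K, and a value of the same data type as V. Both fields are NULLABLE.
list	repeated fields	Nested lists and lists of maps are not supported.
union	RECORD	When a union has only one variant, it is converted to a NULLABLE field. Otherwise, a union is converted to a RECORD with a list of NULLABLE fields. The NULLABLE fields have suffixes such as field_0, field_1, and so on. Only one of these fields is assigned a value when the data is read.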
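To make the map<K,V> row concrete, the following sketch reshapes a Python dict into the repeated-record form described in the table. It is an illustration of the shape only: the field names `key` and `value` are an assumption based on the description, and `map_to_repeated_record` is a hypothetical helper.

```python
def map_to_repeated_record(orc_map):
    """Turn a mapping into a list of key/value records (illustrative shape only)."""
    # Each entry becomes one record; both fields may hold None (NULLABLE).
    return [{"key": k, "value": v} for k, v in orc_map.items()]


print(map_to_repeated_record({"WA": "Washington", "OR": None}))
# [{'key': 'WA', 'value': 'Washington'}, {'key': 'OR', 'value': None}]
```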

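Column names

A column name can contain letters (a-z, A-Z), numbers (0-9), or underscores (_), and it must start with a letter or underscore. If you use flexible column names, BigQuery supports starting a column name with a number. Exercise caution when starting columns with a number, because using such column names with the BigQuery Storage Read API or BigQuery Storage Write API requires special handling. For more information about flexible column name support, see flexible column names.

Column names have a maximum length of 300 characters. Column names can't use any of the following prefixes: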

_TABLE_
_FILE_
_PARTITION
_ROW_TIMESTAMP
__ROOT__
_COLIDENTIFIER

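Duplicate column names are not allowed, even if the case differs. For example, a column named Column1 is considered identical to a column named column1. To learn more about column naming rules, see Column names in the GoogleSQL reference.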
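The standard (non-flexible) naming rules above can be checked before a load with a short validator. This is a minimal sketch: `check_column_names` is a hypothetical helper, and matching the reserved prefixes case-insensitively is an assumption, since this page does not say whether the prefix check is case-sensitive.

```python
import re

RESERVED_PREFIXES = ("_TABLE_", "_FILE_", "_PARTITION",
                     "_ROW_TIMESTAMP", "__ROOT__", "_COLIDENTIFIER")
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_column_names(names):
    """Return (name, reason) pairs for names that break the rules above."""
    problems = []
    seen = set()
    for name in names:
        if not NAME_PATTERN.fullmatch(name):
            problems.append((name, "invalid characters or first character"))
        if len(name) > 300:
            problems.append((name, "longer than 300 characters"))
        # Assumption: reserved prefixes are compared case-insensitively.
        if name.upper().startswith(RESERVED_PREFIXES):
            problems.append((name, "reserved prefix"))
        if name.lower() in seen:
            problems.append((name, "duplicate (case-insensitive)"))
        seen.add(name.lower())
    return problems


print(check_column_names(["Column1", "column1", "1st", "_TABLE_x", "ok"]))
```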

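If a table name (for example, test) is the same as one of its column names (for example, test), the SELECT expression interprets the test column as a STRUCT containing all other table columns. To avoid this collision, use one of the following methods:

Avoid using the same name for a table and its columns.
Assign the table a different alias. For example, the following query assigns the table alias t to the table project1.dataset.test: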
```
SELECT test FROM project1.dataset.test AS t;
```
Include the table name when referencing a column. For example:
```
SELECT test.test FROM project1.dataset.test;
```

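Flexible column names

You have more flexibility in what you name columns, including expanded access to characters in languages other than English as well as additional symbols. If a flexible column name is a quoted identifier, make sure to enclose it in backtick (`) characters.

Flexible column names support the following characters:

Any letter in any language, as represented by the Unicode regular expression \p{L}.
Any numeric character in any language, as represented by the Unicode regular expression \p{N}.
Any connector punctuation character, including underscores, as represented by the Unicode regular expression \p{Pc}.
A hyphen or dash, as represented by the Unicode regular expression \p{Pd}.
Any mark intended to accompany another character, as represented by the Unicode regular expression \p{M}. For example, accents, umlauts, or enclosing boxes.
The following special characters:
- An ampersand (&), as represented by the Unicode regular expression \u0026.
- A percent sign (%), as represented by the Unicode regular expression \u0025.
- An equals sign (=), as represented by the Unicode regular expression \u003D.
- A plus sign (+), as represented by the Unicode regular expression \u002B.
- A colon (:), as represented by the Unicode regular expression \u003A.
- An apostrophe ('), as represented by the Unicode regular expression \u0027.
- A less-than sign (<), as represented by the Unicode regular expression \u003C.
- A greater-than sign (>), as represented by the Unicode regular expression \u003E.
- A number sign (#), as represented by the Unicode regular expression \u0023.
- A vertical line (|), as represented by the Unicode regular expression \u007c.
- Whitespace.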

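Flexible column names do not support the following special characters:

An exclamation mark (!), as represented by the Unicode regular expression \u0021.
A quotation mark ("), as represented by the Unicode regular expression \u0022.
A dollar sign ($), as represented by the Unicode regular expression \u0024.
A left parenthesis ((), as represented by the Unicode regular expression \u0028.
A right parenthesis ()), as represented by the Unicode regular expression \u0029.
An asterisk (*), as represented by the Unicode regular expression \u002A.
A comma (,), as represented by the Unicode regular expression \u002C.
A period (.), as represented by the Unicode regular expression \u002E. Periods are not replaced by underscores in Parquet file column names when a column name character map is used. For more information, see flexible column limitations.
A slash (/), as represented by the Unicode regular expression \u002F.
A semicolon (;), as represented by the Unicode regular expression \u003B.
A question mark (?), as represented by the Unicode regular expression \u003F.
An at sign (@), as represented by the Unicode regular expression \u0040.
A left square bracket ([), as represented by the Unicode regular expression \u005B.
A backslash (\), as represented by the Unicode regular expression \u005C.
A right square bracket (]), as represented by the Unicode regular expression \u005D.
A circumflex accent (^), as represented by the Unicode regular expression \u005E.
A grave accent (`), as represented by the Unicode regular expression \u0060.
A left curly bracket ({), as represented by the Unicode regular expression \u007B.
A right curly bracket (}), as represented by the Unicode regular expression \u007D.
A tilde (~), as represented by the Unicode regular expression \u007E.

For additional guidelines, see Column names.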
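The supported and unsupported character lists can be approximated with Python's `unicodedata` module, which exposes the same general categories (L, N, M, Pc, Pd) that the \p{...} classes refer to. This is a sketch, not an official validator: `unsupported_characters` is a hypothetical helper, and treating only the plain space character as allowed whitespace is an assumption.

```python
import unicodedata

# Special characters listed as supported; only " " is assumed for whitespace.
ALLOWED_SPECIAL = set("&%=+:'<>#| ")


def unsupported_characters(name):
    """Return the characters in name that fall outside the supported set."""
    bad = []
    for ch in name:
        category = unicodedata.category(ch)
        # Letters (L*), numbers (N*), marks (M*), connector and dash punctuation.
        if category[0] in ("L", "N", "M") or category in ("Pc", "Pd"):
            continue
        if ch in ALLOWED_SPECIAL:
            continue
        bad.append(ch)
    return bad


print(unsupported_characters("name-列"))    # []
print(unsupported_characters("price($)"))  # ['(', '$', ')']
```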

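The expanded column characters are supported by both the BigQuery Storage Read API and the BigQuery Storage Write API. To use the expanded list of Unicode characters with the BigQuery Storage Read API, you must set a flag. You can use the displayName attribute to retrieve the column name. The following example shows how to set the flag with the Python client: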

from google.cloud.bigquery_storage import types
requested_session = types.ReadSession()

# Set Avro serialization options for flexible column names.
options = types.AvroSerializationOptions()
options.enable_display_name_attribute = True
requested_session.read_options.avro_serialization_options = options

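To use the expanded list of Unicode characters with the BigQuery Storage Write API, you must provide the schema with column_name annotations, unless you are using the JsonStreamWriter writer object. The following example shows how to provide the schema: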

syntax = "proto2";
package mypackage;
// Source protos located in github.com/googleapis/googleapis
import "google/cloud/bigquery/storage/v1/annotations.proto";

message FlexibleSchema {
  optional string item_name_column = 1
  [(.google.cloud.bigquery.storage.v1.column_name) = "name-列"];
  optional string item_description_column = 2
  [(.google.cloud.bigquery.storage.v1.column_name) = "description-列"];
}

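In this example, item_name_column and item_description_column are placeholder names that must comply with the protocol buffer naming convention. Note that column_name annotations always take precedence over placeholder names.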

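Limitations

Flexible column names are not supported with external tables.

NULL values

Note that for load jobs, BigQuery ignores NULL elements in the list compound type, because they would otherwise be translated to NULL ARRAY elements, which cannot be persisted to a table (see Data types for details).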
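The effect on a single list value can be pictured with a short sketch. It only models the outcome described above (NULL elements are dropped), with Python's None standing in for NULL; `drop_null_elements` is a hypothetical helper and does not reflect how BigQuery implements the load.

```python
def drop_null_elements(values):
    """Model a loaded list column: None (NULL) elements are left out."""
    return [v for v in values if v is not None]


print(drop_null_elements(["WA", None, "OR"]))  # ['WA', 'OR']
```

If downstream logic depends on element positions, keep in mind that the positions shift once NULL elements are removed.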

For more information about ORC data types, see the Apache ORC™ Specification v1.
