Create a Dataflow pipeline using Python

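This document shows you how to use the Apache Beam SDK for Python to build a program that defines a pipeline. You then run the pipeline by using a direct local runner or a cloud-based runner such as Dataflow. For an introduction to the WordCount pipeline, see the video "How to use WordCount in Apache Beam".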

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Install the Google Cloud CLI.

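If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command: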

gcloud init

Create or select a Google Cloud project.

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your Google Cloud project name.

Verify that billing is enabled for your Google Cloud project.

Enable the Dataflow, Compute Engine, Cloud Logging, Cloud Storage, Google Cloud Storage JSON, BigQuery, Cloud Pub/Sub, Cloud Datastore, and Cloud Resource Manager APIs:

gcloud services enable dataflow compute_component logging storage_component storage_api bigquery pubsub datastore.googleapis.com cloudresourcemanager.googleapis.com

Create local authentication credentials for your user account:

gcloud auth application-default login

If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/iam.serviceAccountUser

gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE

Replace the following:

PROJECT_ID: your project ID.
USER_IDENTIFIER: the identifier for your user account—for example, myemail@example.com.
ROLE: the IAM role that you grant to your user account.

Grant roles to your Compute Engine default service account. Run the following command once for each of the following IAM roles:
- roles/dataflow.admin
- roles/dataflow.worker
- roles/storage.objectAdmin
```
gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" --role=SERVICE_ACCOUNT_ROLE
```
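- Replace PROJECT_ID with your project ID.
- Replace PROJECT_NUMBER with your project number. To find your project number, see "Identify projects" or use the gcloud projects describe command.
- Replace SERVICE_ACCOUNT_ROLE with each individual role.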
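
Because the command above has to be run once per role, it can help to generate the three invocations rather than edit the command by hand. The following is a minimal sketch, not part of the official procedure: the helper name `build_grant_commands` and the sample project values are hypothetical, and the script only prints the commands so that you can review them before running anything.

```python
import shlex
from typing import List

# Roles listed in the step above for the Compute Engine default service account.
SERVICE_ACCOUNT_ROLES = [
    "roles/dataflow.admin",
    "roles/dataflow.worker",
    "roles/storage.objectAdmin",
]


def build_grant_commands(project_id: str, project_number: str) -> List[str]:
    """Return one gcloud command string per role (hypothetical helper)."""
    member = (
        f"serviceAccount:{project_number}-compute@developer.gserviceaccount.com"
    )
    commands = []
    for role in SERVICE_ACCOUNT_ROLES:
        argv = [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}",
            f"--role={role}",
        ]
        # shlex.join quotes any argument that a POSIX shell would misread.
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # "my-project" and "123456789012" are placeholders, not real identifiers.
    for command in build_grant_commands("my-project", "123456789012"):
        print(command)
```

Printing the commands instead of executing them keeps the sketch free of side effects; paste each line into a terminal once the project ID and project number are correct.
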
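Create a Cloud Storage bucket and configure it as follows:
- Set the storage class to S (Standard).
- Set the storage location to US (United States).
- Replace BUCKET_NAME with a unique bucket name. Don't include sensitive information in the bucket name, because the bucket namespace is global and publicly visible.
- Copy the Google Cloud project ID and the Cloud Storage bucket name. You need these values later in this document.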
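
The bucket settings and the two copied values can be captured in a small script, which makes it harder to mistype them later. This is a sketch under stated assumptions: the function names are hypothetical, `gcloud storage buckets create` is one way to create the bucket with these settings, and the `us-central1` region and the `tmp/` prefix are illustrative choices that do not come from this document. The pipeline options shown are the standard Apache Beam options for the Dataflow runner.

```python
from typing import List


def bucket_create_args(bucket_name: str) -> List[str]:
    """gcloud arguments for a Standard-class bucket in the US location."""
    return [
        "gcloud", "storage", "buckets", "create", f"gs://{bucket_name}",
        "--default-storage-class=STANDARD",
        "--location=US",
    ]


def dataflow_pipeline_args(
    project_id: str, bucket_name: str, region: str
) -> List[str]:
    """Pipeline options that reuse the copied project ID and bucket name."""
    return [
        "--runner=DataflowRunner",
        f"--project={project_id}",
        f"--region={region}",  # Assumed value; pick the region you need.
        f"--temp_location=gs://{bucket_name}/tmp/",  # Illustrative prefix.
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own project ID and bucket name.
    print(" ".join(bucket_create_args("my-bucket")))
    print(" ".join(dataflow_pipeline_args("my-project", "my-bucket", "us-central1")))
```

Both functions only build argument lists, so nothing is created or submitted until you pass the values to the gcloud CLI or to a pipeline yourself.
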
