Questa pagina è stata tradotta dall'API Cloud Translation.

Crea una pipeline Dataflow utilizzando Java

Questo documento mostra come configurare il progetto Google Cloud Platform, creare una pipeline di esempio creata con l'SDK Apache Beam per Java ed eseguire la pipeline di esempio nel servizio Dataflow. La pipeline legge un file di testo da Cloud Storage, conta il numero di parole uniche nel file e poi scrive i conteggi delle parole in Cloud Storage. Per un'introduzione alla pipeline WordCount, guarda il video Come utilizzare WordCount in Apache Beam.

Questo tutorial richiede Maven, ma è anche possibile convertire il progetto di esempio da Maven a Gradle. Per scoprire di più, consulta (Facoltativo) Convertire da Maven a Gradle.

Per seguire le indicazioni dettagliate per questa attività direttamente nella Google Cloud console, fai clic su Procedura guidata:

Procedura guidata

Prima di iniziare

Sign in to your Google Cloud Platform account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Install the Google Cloud CLI.

Se utilizzi un provider di identità (IdP) esterno, devi prima accedere a gcloud CLI con la tua identità federata.

Per inizializzare gcloud CLI, esegui questo comando:

gcloud init

Create or select a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your Google Cloud project name.

Verify that billing is enabled for your Google Cloud project.

Enable the Dataflow, Compute Engine, Cloud Logging, Cloud Storage, Google Cloud Storage JSON, BigQuery, Cloud Pub/Sub, Cloud Datastore, and Cloud Resource Manager APIs:

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

gcloud services enable dataflow compute_component logging storage_component storage_api bigquery pubsub datastore.googleapis.com cloudresourcemanager.googleapis.com

Create local authentication credentials for your user account:

gcloud auth application-default login

If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/iam.serviceAccountUser

gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE

Replace the following:

PROJECT_ID: your project ID.
USER_IDENTIFIER: the identifier for your user account—for example, myemail@example.com.
ROLE: the IAM role that you grant to your user account.

Install the Google Cloud CLI.

Se utilizzi un provider di identità (IdP) esterno, devi prima accedere a gcloud CLI con la tua identità federata.

Per inizializzare gcloud CLI, esegui questo comando:

gcloud init

Create or select a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your Google Cloud project name.

Verify that billing is enabled for your Google Cloud project.

Enable the Dataflow, Compute Engine, Cloud Logging, Cloud Storage, Google Cloud Storage JSON, BigQuery, Cloud Pub/Sub, Cloud Datastore, and Cloud Resource Manager APIs:

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

gcloud services enable dataflow compute_component logging storage_component storage_api bigquery pubsub datastore.googleapis.com cloudresourcemanager.googleapis.com

Create local authentication credentials for your user account:

gcloud auth application-default login

If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/iam.serviceAccountUser

gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE

Replace the following:

PROJECT_ID: your project ID.
USER_IDENTIFIER: the identifier for your user account—for example, myemail@example.com.
ROLE: the IAM role that you grant to your user account.

Concedi ruoli al account di servizio Compute Engine predefinito. Esegui il seguente comando una volta per ciascuno dei seguenti ruoli IAM:
- roles/dataflow.admin
- roles/dataflow.worker
- roles/storage.objectAdmin
```
gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" --role=SERVICE_ACCOUNT_ROLE
```
- Sostituisci PROJECT_ID con l'ID progetto.
- Sostituisci PROJECT_NUMBER con il numero del tuo progetto. Per trovare il numero del progetto, consulta Identifica i progetti o utilizza il comando gcloud projects describe.
- Sostituisci SERVICE_ACCOUNT_ROLE con ogni singolo ruolo.
Create a Cloud Storage bucket and configure it as follows:
- Set the storage class to S (Standard).
- Imposta la posizione di archiviazione su: US (Stati Uniti).
- Sostituisci BUCKET_NAME con un nome di bucket univoco. Non includere informazioni sensibili nel nome del bucket perché lo spazio dei nomi dei bucket è globale e visibile pubblicamente.
- Copia quanto segue, perché ti servirà in una sezione successiva:
  - Il nome del bucket Cloud Storage.
  - L'ID progetto Google Cloud . Per trovare questo ID, consulta Identificazione dei progetti.
- Scarica e installa la versione 11 del Java Development Kit (JDK). (Dataflow continua a supportare la versione 8.) Verifica che la variabile di ambiente JAVA_HOME sia impostata e punti all'installazione di JDK.
- Scarica e installa Apache Maven seguendo la guida all'installazione di Maven per il tuo sistema operativo specifico.

Crea una pipeline Dataflow utilizzando Java

Prima di iniziare

Recupera il codice della pipeline

Linux o macOS

Windows

Linux o macOS

Windows

Esegui la pipeline in locale

Esegui la pipeline sul servizio Dataflow

Visualizza i tuoi risultati

Esegui la pulizia

Elimina il progetto

Elimina le singole risorse

Passaggi successivi