Quickstart Using Java and Eclipse

This page shows you how to set up your Java development environment with the Cloud Dataflow Plugin for Eclipse and run an example pipeline from within the Eclipse IDE.

Before you begin

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

  3. Enable billing for your project.

  4. Enable the Google Dataflow, Compute Engine, Stackdriver Logging, Google Cloud Storage, Google Cloud Storage JSON, BigQuery, Google Cloud Pub/Sub, and Google Cloud Datastore APIs.

  5. Install and initialize the Cloud SDK.
  6. Authenticate with the Cloud Platform. Run the following command to get Application Default Credentials.
        gcloud auth application-default login
  7. Ensure you have installed Eclipse IDE version 4.4 (Luna) or later.
  8. Download and install the Java Development Kit (JDK) version 1.7 or later. Verify that the JAVA_HOME environment variable is set and points to your JDK installation.

Install the Cloud Dataflow plugin for Eclipse

  1. In the Eclipse IDE, select Help > Install New Software.
  2. In the Work with text box, enter https://dl.google.com/dataflow/eclipse/.
  3. Click Next.
  4. Select Google Cloud Dataflow and click Next.
  5. Review the installation details and click Next.
  6. Review the license, select I accept the terms of the license agreement, and then click Finish.
  7. If prompted to restart Eclipse, click Yes.

When you install the Cloud Dataflow Plugin for Eclipse, Eclipse automatically downloads the Cloud Dataflow SDK for Java.

Create a Cloud Dataflow project in Eclipse

  1. Select File > New > Project.
  2. In the wizard list, expand Google Cloud Platform and select Cloud Dataflow Java Project.
  3. Click Next. You should see the Create a Cloud Dataflow Project wizard dialog.
  4. Enter my.group.id for Group ID.
  5. Enter my-artifact for Artifact ID.
  6. Select Example Pipelines for Project Template.
  7. Enter com.google.cloud.dataflow.examples for Package.
  8. Click Next.
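If you prefer the command line, the wizard's project generation roughly corresponds to generating a project from the Dataflow examples Maven archetype; a sketch, assuming Maven is installed and that these archetype coordinates match your SDK version:

```shell
mvn archetype:generate \
  -DarchetypeGroupId=com.google.cloud.dataflow \
  -DarchetypeArtifactId=google-cloud-dataflow-java-archetypes-examples \
  -DgroupId=my.group.id \
  -DartifactId=my-artifact \
  -Dpackage=com.google.cloud.dataflow.examples \
  -DinteractiveMode=false
```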

Configure execution options

You should now see the Set Default Cloud Dataflow Run Options dialog.

  1. Enter your Cloud Platform Project ID for Cloud Platform Project ID.
  2. Enter a unique name for a Cloud Storage bucket for Cloud Storage Staging Location. Do not include sensitive information in the bucket name, because the bucket namespace is global and publicly visible.
  3. Click Create. This creates a new Google Cloud Storage bucket for staging your pipeline's code and storing your pipeline's output.
  4. Click Finish.
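If you would rather create the staging bucket from the command line, gsutil from the Cloud SDK does the equivalent; a sketch, where my-project-id and my-unique-bucket are placeholders for your own values:

```shell
# Create a Cloud Storage bucket for staging; bucket names are globally unique.
gsutil mb -p my-project-id gs://my-unique-bucket
```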

Run the WordCount example pipeline on the Cloud Dataflow service

  1. Select Run > Run Configurations.
  2. Select Dataflow Pipeline and click the New Launch Configuration button.
  3. Click the Main tab.
  4. Click the Select button and select WordCount. Click OK.
  5. Click the Pipeline Arguments tab.
  6. In the Runner category, select BlockingDataflowPipelineRunner.
  7. Click Run.
  8. When the execution finishes, among other output, you should see the following line in the Eclipse console:
      Submitted job: <job_id>
  9. Check that your job succeeded:

    1. Open the Cloud Dataflow Monitoring UI in the Google Cloud Platform Console.

      You should see your wordcount job with a status of Running at first, and then Succeeded.

    2. Open the Cloud Storage Browser in the Google Cloud Platform Console.

      In your bucket, you should see the output files and staging files that your job created.
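The Run Configuration above corresponds to running the WordCount main class with pipeline options from a terminal; a sketch, assuming a Maven-based project and placeholder values for the project ID and bucket:

```shell
mvn compile exec:java \
  -Dexec.mainClass=com.google.cloud.dataflow.examples.WordCount \
  -Dexec.args="--project=my-project-id \
    --stagingLocation=gs://my-bucket/staging \
    --runner=BlockingDataflowPipelineRunner"
```

The BlockingDataflowPipelineRunner submits the job to the Cloud Dataflow service and waits for it to finish before returning.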

Clean up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:

  1. Open the Cloud Storage browser in the Google Cloud Platform Console.
  2. Select the checkbox next to the bucket that you created.
  3. Click DELETE.
  4. Click Delete to permanently delete the bucket and its contents.
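The cleanup steps above can also be done with gsutil; a sketch, where my-unique-bucket is the bucket you created:

```shell
# Permanently delete the bucket and all of its contents.
gsutil rm -r gs://my-unique-bucket
```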
