Setting up Cloud Dataflow in Eclipse

This page shows you how to create a Cloud Dataflow project and run an example pipeline from within Eclipse.

Before you begin

  1. Install Eclipse IDE for Java EE Developers, version 4.5 or later.

    Download Eclipse

  2. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  3. Select or create a Cloud Platform project.

    Go to the Manage resources page

  4. Enable billing for your project.

    Enable billing

  5. Enable the Cloud Dataflow, Compute Engine, Stackdriver Logging, Google Cloud Storage, Google Cloud Storage JSON, BigQuery, Google Cloud Pub/Sub, Google Cloud Datastore, and Google Cloud Resource Manager APIs.

    Enable the APIs

  6. Ensure you have installed the latest version of the Google Cloud SDK.

  7. Ensure you have installed Java Development Kit version 1.7 or later.

  8. Ensure you have installed the latest version of the Cloud Tools for Eclipse plugin.

    1. If you have not done so already, follow the Quickstart guide to install the plugin.

    2. If the plugin is already installed, select Help -> Check for Updates to update it to the latest version.

Create a Cloud Dataflow project in Eclipse

To create a new project, use the New Project wizard to generate a template application that you can use as the starting point for your own application.

If you don't have an application, you can run the WordCount sample app to complete the rest of these procedures.

  1. Select File -> New -> Project.

  2. In the Google Cloud Platform directory, select Google Cloud Dataflow Java Project.

    Screenshot: The New Project wizard, with the Google Cloud Platform directory expanded to show Google App Engine Flexible Java Project, Google App Engine Standard Java Project, and Google Cloud Dataflow Java Project.

  3. Enter the Group ID.

  4. Enter the Artifact ID.

  5. Select the Project Template. For the WordCount sample, select Example pipelines.

  6. Select the Project Dataflow Version. For the WordCount sample, select 2.0.0.

  7. Enter the Package name. For the WordCount sample, enter com.google.cloud.dataflow.examples.

    Screenshot: The New Cloud Dataflow Project wizard, with fields for the group ID, artifact ID, project template, Dataflow version, package name, workspace location, and name template.

  8. Click Next.

Configure execution options

You should now see the Set Default Cloud Dataflow Run Options dialog.

  1. Select the account associated with your Google Cloud Platform project or add a new account. To add a new account:

    1. Select Add a new account... in the Account drop-down menu.

    2. Complete the sign-in process in the browser window that opens.

  2. Enter your Cloud Platform Project ID.

  3. Select a Cloud Storage Staging Location or create a new staging location. To create a new staging location:

    1. Enter a unique name for the staging location. The bucket name must follow these rules:

      • Do not include sensitive information in the bucket name since the bucket namespace is global and publicly visible.

      • Bucket names must contain only lowercase letters, numbers, dashes (-), underscores (_), and dots (.). Names containing dots require verification.
      • Bucket names must start and end with a number or letter.
      • Bucket names must contain 3 to 63 characters. Names containing dots can contain up to 222 characters, but each dot-separated component can be no longer than 63 characters.
      • Bucket names cannot be represented as an IP address in dotted-decimal notation (for example, 192.168.5.4).
      • Bucket names cannot begin with the "goog" prefix.
      • Bucket names cannot contain "google" or close misspellings of "google".

      Also, for DNS compliance and future compatibility, you should not use underscores (_) or have a period adjacent to another period or dash. For example, ".." or "-." or ".-" are not valid in DNS names.

    2. Click Create.

      Screenshot: The Set Default Cloud Dataflow Run Options dialog, with fields for the account, Cloud Platform project ID, and Cloud Storage staging location, and a Create button for creating a new staging location.

  4. Click Finish.
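
The account, project ID, and staging location that you set in this dialog correspond to Dataflow pipeline options. For reference only, the following sketch shows roughly how the same options could be set in code with the Beam 2.x SDK; the project ID and bucket name are placeholders, and in this quickstart the Eclipse run configuration supplies these values for you.

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class RunOptionsSketch {
      public static void main(String[] args) {
        // Parse any command-line arguments into pipeline options.
        DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
            .withValidation()
            .as(DataflowPipelineOptions.class);

        // Placeholder values: use your own project ID and staging bucket.
        options.setProject("my-project-id");
        options.setStagingLocation("gs://my-staging-bucket/staging");

        // Run on the Cloud Dataflow service rather than locally.
        options.setRunner(DataflowRunner.class);
      }
    }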

Run the WordCount example pipeline on Cloud Dataflow

After creating your Cloud Dataflow project, you can create pipelines that run on the Cloud Dataflow service. As an example, you can run the WordCount sample pipeline.
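
The Example pipelines project template generates the complete WordCount sources under the package you specified. As a rough sketch only (the generated example is more complete and its class and option names may differ), the pipeline reads text, splits it into words, counts each word, and writes the counts to the location given by the --output argument:

    // A simplified sketch of a WordCount pipeline using Apache Beam 2.x APIs.
    package com.google.cloud.dataflow.examples;

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.KV;

    public class WordCountSketch {
      public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
        Pipeline p = Pipeline.create(options);

        // Read a public sample text file from Cloud Storage.
        p.apply("ReadLines",
                TextIO.read().from("gs://apache-beam-samples/shakespeare/kinglear.txt"))
            // Split each line into individual words.
            .apply("ExtractWords", ParDo.of(new DoFn<String, String>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                for (String word : c.element().split("[^\\p{L}]+")) {
                  if (!word.isEmpty()) {
                    c.output(word);
                  }
                }
              }
            }))
            // Count the occurrences of each word.
            .apply(Count.<String>perElement())
            // Format each word and its count as a line of text.
            .apply("FormatResults", ParDo.of(new DoFn<KV<String, Long>, String>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                c.output(c.element().getKey() + ": " + c.element().getValue());
              }
            }))
            // Placeholder path: in this quickstart the output path comes from the
            // --output program argument that you set in the run configuration below.
            .apply("WriteCounts", TextIO.write().to("gs://YOUR_BUCKET/output/counts"));

        p.run();
      }
    }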

  1. Select Run -> Run Configurations.

  2. In the left menu, select Dataflow Pipeline.

  3. Click New Launch Configuration.

    Screenshot: The Run Configurations dialog, with Dataflow Pipeline selected in the list of configuration types and the pointer over the New Launch Configuration button.

  4. Click the Main tab.

  5. Click Browse to select your Dataflow project.

  6. Click Search... and select the WordCount Main Type.

  7. Click the Pipeline Arguments tab.

  8. Select the DataflowRunner runner.

  9. Click the Arguments tab.

  10. In the Program arguments field, set the --output option to a path under your Cloud Storage Staging Location (for example, --output=gs://<your_staging_location>/output/results).

    Screenshot: The Arguments tab, with the --output option in the Program arguments field set to the writable staging location.

  11. Click Run.

    When the job finishes, among other output, you should see the following line in the Eclipse console:

     Submitted job: <job_id>

Clean up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:

  1. Open the Cloud Storage browser in the Google Cloud Platform Console.

  2. Select the checkbox next to the bucket that you created.

  3. Click Delete.

  4. Click Delete to confirm that you want to permanently delete the bucket and its contents.
