Quickstart Using Java and Eclipse

This page describes how to create a Cloud Dataflow project and run an example pipeline from within Eclipse.

The Cloud Dataflow Eclipse plugin works only with the Cloud Dataflow SDK distribution, versions 2.0.0 through 2.5.0. It does not work with the Apache Beam SDK distribution.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. Select or create a GCP project.

    Go to the Manage resources page

  3. Make sure that billing is enabled for your project.

    Learn how to enable billing

  4. Enable the Cloud Dataflow, Compute Engine, Stackdriver Logging, Google Cloud Storage, Google Cloud Storage JSON, BigQuery, Google Cloud Pub/Sub, Google Cloud Datastore, and Google Cloud Resource Manager APIs.

    Enable the APIs

  5. Install and initialize the Cloud SDK.
  6. Ensure you have installed Eclipse IDE version 4.6 or later.
  7. Ensure you have installed the Java Development Kit (JDK) version 1.8 or later. The Cloud Dataflow SDK 2.x requires Java 8.
  8. Ensure you have installed the latest version of the Cloud Dataflow plugin.
    1. If you have not done so already, follow the Cloud Dataflow Quickstart to install the plugin.
    2. If the plugin is already installed, select Help -> Check for Updates to update it to the latest version.

Create a Cloud Dataflow project in Eclipse

To create a new project, use the New Project wizard to generate a template application that you can use as the starting point for your own application.

If you don't have an application of your own, you can generate the WordCount sample app and use it to complete the rest of these procedures.

  1. Select File -> New -> Project.
  2. In the Google Cloud Platform directory, select Cloud Dataflow Java Project.
  3. Enter the Group ID.
  4. Enter the Artifact ID.
  5. Select the Project Template. For the WordCount sample, select Example pipelines. (A minimal sketch of such a pipeline follows this list.)
  6. Select the Project Dataflow Version. For the WordCount sample, select 2.5.0.
  7. Enter the Package name. For the WordCount sample, enter com.google.cloud.dataflow.examples.
  8. Click Next.
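
For orientation before you run it, here is a minimal sketch of the kind of word-count pipeline the Example pipelines template produces. It is not the exact generated code: the class name and the input and output paths are illustrative, and it assumes the Cloud Dataflow SDK 2.5.0 (Apache Beam API) is on the classpath.

    import java.util.Arrays;

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.Filter;
    import org.apache.beam.sdk.transforms.FlatMapElements;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.TypeDescriptors;

    public class MinimalWordCount {
      public static void main(String[] args) {
        // Pipeline options (runner, project, staging location) come from the
        // command line, for example --runner=DataflowRunner.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        p.apply("ReadLines",
                TextIO.read().from("gs://apache-beam-samples/shakespeare/kinglear.txt"))
            // Split each line into individual words.
            .apply("ExtractWords",
                FlatMapElements.into(TypeDescriptors.strings())
                    .via((String line) -> Arrays.asList(line.split("[^\\p{L}]+"))))
            .apply("DropEmptyWords", Filter.by((String word) -> !word.isEmpty()))
            // Count the occurrences of each distinct word.
            .apply(Count.perElement())
            .apply("FormatResults",
                MapElements.into(TypeDescriptors.strings())
                    .via((KV<String, Long> wordCount) ->
                        wordCount.getKey() + ": " + wordCount.getValue()))
            // Placeholder output prefix; replace with your own bucket.
            .apply("WriteCounts", TextIO.write().to("gs://YOUR_BUCKET/output/wordcounts"));

        p.run().waitUntilFinish();
      }
    }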

Configure execution options

You should now see the Set Default Cloud Dataflow Run Options dialog. The steps below configure this dialog; a sketch of the equivalent programmatic configuration follows the list.

  1. Select the account associated with your Google Cloud Platform project or add a new account. To add a new account:
    1. Select Add a new account... in the Account drop-down menu.
    2. A new browser window opens so that you can complete the sign-in process.
  2. Enter your Cloud Platform Project ID.
  3. Select a Cloud Storage Staging Location or create a new staging location. To create a new staging location:
    1. Enter a unique name for the Cloud Storage Staging Location. The name must include the bucket name and a folder; objects are created in your Cloud Storage bucket inside the specified folder. Do not include sensitive information in the bucket name, because the bucket namespace is global and publicly visible.
    2. Click Create Bucket.
  4. Click Browse to navigate to your service account key.
  5. Click Finish.
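
The fields in this dialog correspond to Cloud Dataflow pipeline options. For reference, the following minimal sketch shows how the same values can be set programmatically with the Dataflow SDK 2.x (Apache Beam) API; the class name, project ID, and staging path are placeholders you would replace with your own values.

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class DefaultRunOptions {
      public static void main(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);

        // Placeholder values; use your own project ID and bucket.
        options.setProject("YOUR_PROJECT_ID");
        options.setStagingLocation("gs://YOUR_BUCKET/staging");
        options.setRunner(DataflowRunner.class);

        // Credentials are usually resolved from the environment, for example
        // the GOOGLE_APPLICATION_CREDENTIALS variable pointing at a service
        // account key file, which corresponds to the Browse step above.
      }
    }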

Run the WordCount example pipeline on the Cloud Dataflow service

After creating your Cloud Dataflow project, you can create pipelines that run on the Cloud Dataflow service. As an example, you can run the WordCount sample pipeline.

  1. Select Run -> Run Configurations.
  2. In the left menu, select Dataflow Pipeline.
  3. Click New Launch Configuration.
  4. Click the Main tab.
  5. Click Browse to select your Cloud Dataflow project.
  6. Click Search... and select the WordCount Main Type.
  7. Click the Pipeline Arguments tab.
  8. Select the DataflowRunner runner.
  9. Click the Arguments tab.
  10. In the Program arguments field, set the --output option to a writable path in your Cloud Storage staging location.
  11. Click Run.
  12. When the job finishes, you should see, among other output, the following line in the Eclipse console (a sketch of retrieving this job ID programmatically follows the list):
    Submitted job: <job_id>
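
The Submitted job line reports the Cloud Dataflow job ID. If you launch a pipeline from your own code instead of a run configuration, the following minimal sketch shows one way to obtain the same ID. It assumes the pipeline was configured to use DataflowRunner, as above; the class and method names are illustrative.

    import org.apache.beam.runners.dataflow.DataflowPipelineJob;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.PipelineResult;

    public class SubmitJob {
      // Illustrative helper: submit a pipeline and report its Dataflow job ID.
      static void submitAndReport(Pipeline p) {
        PipelineResult result = p.run();
        // The result is a DataflowPipelineJob only when running on DataflowRunner.
        if (result instanceof DataflowPipelineJob) {
          String jobId = ((DataflowPipelineJob) result).getJobId();
          // Matches the line Eclipse prints: "Submitted job: <job_id>"
          System.out.println("Submitted job: " + jobId);
        }
      }
    }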

Clean up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart, delete the bucket you created (a programmatic alternative is sketched after these steps):

  1. Open the Cloud Storage browser in the Google Cloud Platform Console.
  2. Select the checkbox next to the bucket that you created.
  3. Click DELETE.
  4. Click Delete to confirm that you want to permanently delete the bucket and its contents.
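
If you prefer to clean up from code, the following minimal sketch deletes the staging bucket using the google-cloud-storage client library, which is a separate dependency (com.google.cloud:google-cloud-storage) and not part of the Dataflow SDK; the bucket name is a placeholder.

    import com.google.cloud.storage.Blob;
    import com.google.cloud.storage.Storage;
    import com.google.cloud.storage.StorageOptions;

    public class DeleteStagingBucket {
      public static void main(String[] args) {
        String bucketName = "YOUR_BUCKET"; // placeholder: the bucket you created

        Storage storage = StorageOptions.getDefaultInstance().getService();

        // A bucket must be empty before it can be deleted, so remove its
        // objects first.
        for (Blob blob : storage.list(bucketName).iterateAll()) {
          blob.delete();
        }
        storage.delete(bucketName);
      }
    }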
