Quickstart Using Java and Eclipse

This page shows you how to create a Cloud Dataflow project and run an example pipeline from within Eclipse.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. Select or create a GCP project on the Manage resources page of the GCP Console.

  3. Make sure that billing is enabled for your project.

  4. Enable the Cloud Dataflow, Compute Engine, Stackdriver Logging, Google Cloud Storage, Google Cloud Storage JSON, BigQuery, Google Cloud Pub/Sub, Google Cloud Datastore, and Google Cloud Resource Manager APIs.

  5. Install and initialize the Cloud SDK.
  6. Ensure you have installed Eclipse IDE version 4.5 or later.
  7. Ensure you have installed the Java Development Kit (JDK) version 1.7 or later.
  8. Ensure you have installed the latest version of the Cloud Tools for Eclipse plugin.
    1. If you have not installed the plugin yet, follow the Cloud Tools for Eclipse Quickstart to install it.
    2. If the plugin is already installed, select Help -> Check for Updates to update it to the latest version.
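To confirm the JDK requirement from code, the following minimal sketch reads the standard java.specification.version system property and compares it with the 1.7 minimum listed above. The class name JdkVersionCheck is hypothetical. The check only reports on the JVM that runs it, and Eclipse may be configured to build your project with a different installed JRE or JDK.

```java
/**
 * Minimal sketch: checks whether the JVM running this code meets the JDK 1.7
 * minimum from the prerequisites. Relies on the standard
 * "java.specification.version" property ("1.7", "1.8", "9", "11", ...).
 */
class JdkVersionCheck {

    /** Minimum Java feature release from the prerequisites (JDK 1.7 == 7). */
    static final int MINIMUM_FEATURE_RELEASE = 7;

    /** Maps the legacy "1.x" scheme to x, and "9", "11", "17" to themselves. */
    static int featureRelease(String specVersion) {
        String[] parts = specVersion.split("\\.");
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]); // Java 8 and earlier report "1.7", "1.8"
        }
        return Integer.parseInt(parts[0]);     // Java 9 and later report "9", "11", ...
    }

    /** True if the given specification version is at least the minimum. */
    static boolean meetsMinimum(String specVersion) {
        return featureRelease(specVersion) >= MINIMUM_FEATURE_RELEASE;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("Java specification version: " + spec);
        System.out.println(meetsMinimum(spec)
                ? "OK: meets the JDK 1.7 minimum."
                : "Too old: install JDK 1.7 or later.");
    }
}
```

Running it as a Java Application inside Eclipse reports on the JRE that Eclipse uses for that launch, which is usually the one that matters for this quickstart.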

Create a Cloud Dataflow project in Eclipse

To create a new project, use the New Project wizard to generate a template application that you can use as the starting point for your own application.

If you don't have an application, you can run the WordCount sample app to complete the rest of these procedures.

  1. Select File -> New -> Project.
  2. In the Google Cloud Platform directory, select Google Cloud Dataflow Java Project.
    [Figure: the New Project wizard, with the Google Cloud Platform directory expanded to show Google App Engine Flexible Java Project, Google App Engine Standard Java Project, and Google Cloud Dataflow Java Project.]
  3. Enter the Group ID.
  4. Enter the Artifact ID.
  5. Select the Project Template. For the WordCount sample, select Example pipelines.
  6. Select the Project Dataflow Version. For the WordCount sample, select 2.4.0.
  7. Enter the Package name. For the WordCount sample, enter com.google.cloud.dataflow.examples.
    [Figure: the new Cloud Dataflow project wizard, with fields for the group ID, artifact ID, project template, Dataflow version, package name, workspace location, and name template.]
  8. Click Next.

Configure execution options

You should now see the Set Default Cloud Dataflow Run Options dialog.

  1. Select the account associated with your Google Cloud Platform project or add a new account. To add a new account:
    1. Select Add a new account... in the Account drop-down menu.
    2. Complete the sign-in process in the browser window that opens.
  2. Enter your Cloud Platform Project ID.
  3. Select a Cloud Storage Staging Location or create a new staging location. To create a new staging location:
    1. Enter a unique name for the Cloud Storage bucket to use as the Cloud Storage Staging Location. Do not include sensitive information in the bucket name, because the bucket namespace is global and publicly visible.
    2. Click Create.
    [Figure: the Set Default Cloud Dataflow Run Options dialog, with fields for the GCP account, Cloud Platform project ID, and Cloud Storage Staging Location, plus a Create button for creating a new staging location.]
  4. Click Finish.
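Only Cloud Storage itself can confirm that a bucket name is unique, but a candidate name can be pre-checked locally before you click Create. The following sketch (hypothetical class BucketNameCheck) tests a subset of the documented Cloud Storage bucket naming rules: only lowercase letters, digits, dashes, underscores, and dots; a letter or digit at each end; 3 to 63 characters (up to 222 for names containing dots, with no dot-separated component longer than 63); and no "goog" prefix or "google" substring. It is not exhaustive: it does not reject names formatted as IP addresses or close misspellings of "google", and names containing dots have additional requirements it does not check.

```java
import java.util.regex.Pattern;

/**
 * Partial pre-check for a Cloud Storage bucket name against a subset of the
 * documented naming rules. Not exhaustive, and it cannot check uniqueness.
 */
class BucketNameCheck {

    // Lowercase letters, digits, '-', '_' and '.', starting and ending with a letter or digit.
    private static final Pattern ALLOWED = Pattern.compile("[a-z0-9][a-z0-9._-]*[a-z0-9]");

    static boolean looksValid(String name) {
        if (name == null || !ALLOWED.matcher(name).matches()) {
            return false;
        }
        if (name.startsWith("goog") || name.contains("google")) {
            return false; // reserved prefix and word
        }
        if (!name.contains(".")) {
            return name.length() >= 3 && name.length() <= 63;
        }
        // Names with dots: at most 222 characters, each component at most 63.
        if (name.length() > 222) {
            return false;
        }
        for (String component : name.split("\\.")) {
            if (component.length() > 63) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] candidates = {"my-dataflow-staging-1234", "My_Bucket", "ab", "goog-staging"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + looksValid(candidate));
        }
    }
}
```

A name that passes this check can still be rejected when you click Create, for example because another project already owns it.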

Run the WordCount example pipeline on the Cloud Dataflow service

After creating your Cloud Dataflow project, you can create pipelines that run on the Cloud Dataflow service. As an example, you can run the WordCount sample pipeline.
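For orientation, WordCount reads text, splits it into words, and counts how often each distinct word occurs. The following plain-Java sketch mirrors that logic in memory without using Apache Beam, so it illustrates what the pipeline computes, not how the sample is written. The class name WordCountSketch and the split-on-non-letters tokenizer are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java illustration (no Apache Beam) of what the WordCount example
 * computes: split lines into words, count each distinct word, print "word: count".
 */
class WordCountSketch {

    // Assumed tokenizer: split on runs of characters that are not letters.
    private static final String NON_LETTERS = "[^\\p{L}]+";

    static Map<String, Integer> countWords(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted only for readable output
        for (String line : lines) {
            for (String word : line.split(NON_LETTERS)) {
                if (word.isEmpty()) {
                    continue; // a line starting with a non-letter yields a leading empty token
                }
                Integer previous = counts.get(word);
                counts.put(word, previous == null ? 1 : previous + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(Arrays.asList(
                "To be, or not to be:", "that is the question."));
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

The sketch counts case-sensitively, so "To" and "to" are separate entries. In the real pipeline, the same steps run as distributed transforms on the Cloud Dataflow service, and the results are written as text files under the --output location.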

  1. Select Run -> Run Configurations.
  2. In the left menu, select Dataflow Pipeline.
  3. Click New Launch Configuration.
    [Figure: the Run Configurations dialog with Dataflow Pipeline selected and the New launch configuration tooltip displayed.]
  4. Click the Main tab.
  5. Click Browse to select your Dataflow project.
  6. Click Search... and select the WordCount Main Type.
  7. Click the Pipeline Arguments tab.
  8. Select the DataflowRunner runner.
  9. Click the Arguments tab.
  10. In the Program arguments field, set the --output option to a writable location in your Cloud Storage Staging Location.
    [Figure: the Arguments tab, with the --output option in the Program arguments field set to the writable staging location.]
  11. Click Run.
  12. After the job is submitted, among other output, you should see the following line in the Eclipse console:
    Submitted job: <job_id>
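The run configuration above amounts to a set of pipeline options passed to WordCount. The following sketch (hypothetical class WordCountArgs) assembles the options this page mentions: --runner, --project, --stagingLocation, and WordCount's --output. The project ID, the bucket name, and the /output path are placeholders, the mapping of the staging location to --stagingLocation is an assumption, and the Cloud Tools for Eclipse plugin may pass additional options (for example, a temporary location) that are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the program arguments that correspond to the run configuration.
 * Option names are standard Dataflow / WordCount options; values are placeholders.
 */
class WordCountArgs {

    static String[] dataflowArgs(String projectId, String bucket) {
        String bucketUrl = "gs://" + bucket;
        List<String> args = new ArrayList<>();
        args.add("--runner=DataflowRunner");         // selected on the Pipeline Arguments tab
        args.add("--project=" + projectId);          // assumed: from the run options dialog
        args.add("--stagingLocation=" + bucketUrl);  // assumed: the Cloud Storage Staging Location
        args.add("--output=" + bucketUrl + "/output"); // hypothetical output prefix in the bucket
        return args.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String arg : dataflowArgs("my-project-id", "my-dataflow-staging-1234")) {
            System.out.println(arg);
        }
    }
}
```

Keeping the output under the staging bucket, as the steps above suggest, means the Clean up procedure below also removes the WordCount results.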

Clean up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:

  1. Open the Cloud Storage browser in the Google Cloud Platform Console.
  2. Select the checkbox next to the bucket that you created.
  3. Click DELETE.
  4. Click Delete to confirm that you want to permanently delete the bucket and its contents.

