This page shows you how to create a Cloud Dataflow project and run an example pipeline from within Eclipse.
Before you begin
- Sign in to your Google account. If you don't already have one, sign up for a new account.
- Select or create a Cloud Platform project.
- Enable billing for your project.
- Enable the Cloud Dataflow, Compute Engine, Stackdriver Logging, Google Cloud Storage, Google Cloud Storage JSON, BigQuery, Google Cloud Pub/Sub, Google Cloud Datastore, and Google Cloud Resource Manager APIs.
- Install and initialize the Cloud SDK.
- Ensure you have installed Eclipse IDE version 4.5 or later.
- Ensure you have installed the Java Development Kit (JDK) version 1.7 or later.
- Ensure you have installed the latest version of the Cloud Tools for Eclipse plugin.
  - If you have not done so already, follow the Cloud Tools for Eclipse Quickstart to install the plugin.
  - Or, select Help -> Check for Updates to update your plugin to the latest version.
Create a Cloud Dataflow project in Eclipse
To create a new project, use the New Project wizard to generate a template application that you can use as the start for your own application.
If you don't have an application, you can run the WordCount sample app to complete the rest of these procedures.
- Select File -> New -> Project.
- In the Google Cloud Platform directory, select Google Cloud Dataflow Java Project.
- Enter the Group ID.
- Enter the Artifact ID.
- Select the Project Template. For the WordCount sample, select Example pipelines.
- Select the Project Dataflow Version. For the WordCount sample, select 2.0.0.
- Enter the Package name. For the WordCount sample, enter com.google.cloud.dataflow.examples.
- Click Next.
Configure execution options
You should now see the Set Default Cloud Dataflow Run Options dialog.
- Select the account associated with your Google Cloud Platform project or add a new account. To add a new account:
  - Select Add a new account... in the Account drop-down menu.
  - A new browser window opens, where you complete the sign-in process.
- Enter your Cloud Platform Project ID.
- Select a Cloud Storage Staging Location or create a new staging location. To create a new staging location:
  - Enter a unique name for a Cloud Storage bucket for Cloud Storage Staging Location. Do not include sensitive information in the bucket name because the bucket namespace is global and publicly visible.
    - Bucket names must contain only lowercase letters, numbers, dashes (-), underscores (_), and dots (.). Names containing dots require verification.
    - Bucket names must start and end with a number or letter.
    - Bucket names must contain 3 to 63 characters. Names containing dots can contain up to 222 characters, but each dot-separated component can be no longer than 63 characters.
    - Bucket names cannot be represented as an IP address in dotted-decimal notation (for example, 192.168.5.4).
    - Bucket names cannot begin with the "goog" prefix.
    - Bucket names cannot contain "google" or close misspellings of "google".
    - Also, for DNS compliance and future compatibility, you should not use underscores (_) or have a period adjacent to another period or dash. For example, "..", "-.", and ".-" are not valid in DNS names.
  - Click Create.
- Click Finish.
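The bucket naming rules listed in this section can be checked locally before you type a name into the dialog. The following is a minimal sketch in plain Java, not part of any Google library: the class name BucketNameCheck and its methods are hypothetical, the "close misspellings of google" rule and the domain verification for dotted names are not covered, and Cloud Storage itself remains the authority on whether a name is accepted.

```java
import java.util.regex.Pattern;

// Hypothetical helper that mirrors the bucket naming rules listed in this section.
final class BucketNameCheck {
    // Only lowercase letters, digits, dashes, underscores, and dots are allowed.
    private static final Pattern ALLOWED = Pattern.compile("[a-z0-9._-]+");
    // Four dot-separated groups of 1 to 3 digits, such as 192.168.5.4.
    private static final Pattern DOTTED_DECIMAL =
        Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    private static boolean isLetterOrDigit(char c) {
        return (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
    }

    // Returns true if the name passes the hard rules from the list.
    static boolean isValid(String name) {
        if (name == null || !ALLOWED.matcher(name).matches()) {
            return false;
        }
        if (!isLetterOrDigit(name.charAt(0))
                || !isLetterOrDigit(name.charAt(name.length() - 1))) {
            return false;
        }
        // 3 to 63 characters, or up to 222 when the name contains dots.
        int maxLength = name.contains(".") ? 222 : 63;
        if (name.length() < 3 || name.length() > maxLength) {
            return false;
        }
        // Each dot-separated component can be no longer than 63 characters.
        for (String component : name.split("\\.", -1)) {
            if (component.length() > 63) {
                return false;
            }
        }
        if (DOTTED_DECIMAL.matcher(name).matches()) {
            return false;
        }
        // No "goog" prefix and no "google" anywhere (misspellings are not checked).
        return !name.startsWith("goog") && !name.contains("google");
    }

    // Returns true if the name also follows the DNS recommendations.
    static boolean isDnsFriendly(String name) {
        return !name.contains("_") && !name.contains("..")
            && !name.contains("-.") && !name.contains(".-");
    }

    public static void main(String[] args) {
        String[] samples = {"my-dataflow-staging-1", "my_bucket", "192.168.5.4"};
        for (String sample : samples) {
            System.out.println(sample + " valid=" + isValid(sample)
                + " dnsFriendly=" + isDnsFriendly(sample));
        }
    }
}
```

The hard rules and the DNS recommendations are kept in separate methods because the section words the latter as advice ("should not") and not as a requirement.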
Run the WordCount example pipeline on the Cloud Dataflow service
After creating your Cloud Dataflow project, you can create pipelines that run on the Cloud Dataflow service. As an example, you can run the WordCount sample pipeline.
- Select Run -> Run Configurations.
- In the left menu, select Dataflow Pipeline.
- Click New Launch Configuration.
- Click the Main tab.
- Click Browse to select your Dataflow project.
- Click Search... and select the WordCount Main Type.
- Click the Pipeline Arguments tab.
- Select the DataflowRunner runner.
- Click the Arguments tab.
- In the Program arguments field, set the output to your Cloud Storage Staging Location.
- Click Run.
- When the job finishes, among other output, you should see the following line in the Eclipse console:
  Submitted job: <job_id>
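For orientation, the settings chosen in the steps above correspond to command-line style pipeline arguments. The sketch below assembles such an argument list in plain Java. It is an illustration under assumptions, not what the Eclipse plugin runs internally: the class WordCountLaunchArgs, the sample project ID and bucket name, and the staging and output folder names are all hypothetical. The flags --runner, --project, and --stagingLocation are standard Dataflow pipeline options, and --output is the option defined by the WordCount example. In Eclipse, only the --output value normally needs to go in the Program arguments field, because the dialog supplies the rest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds the arguments for a WordCount run on Dataflow.
final class WordCountLaunchArgs {
    // projectId and bucket are placeholders for your own values.
    static List<String> build(String projectId, String bucket) {
        String root = "gs://" + bucket;
        List<String> args = new ArrayList<>();
        args.add("--runner=DataflowRunner");
        args.add("--project=" + projectId);
        // Folder names under the bucket are arbitrary choices for this sketch.
        args.add("--stagingLocation=" + root + "/staging");
        args.add("--output=" + root + "/output");
        return args;
    }

    public static void main(String[] unused) {
        StringBuilder line = new StringBuilder();
        for (String arg : build("my-project-id", "my-staging-bucket")) {
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(arg);
        }
        // Prints the arguments as a single space-separated line.
        System.out.println(line);
    }
}
```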
Clean up
To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:
- Open the Cloud Storage browser in the Google Cloud Platform Console.
- Select the checkbox next to the bucket that you created.
- Click DELETE.
- Click Delete to confirm that you want to permanently delete the bucket and its contents.
What's next
- Read about the Dataflow Programming Model.
- Learn how to design and create your own pipeline.
- Work through the WordCount and Mobile Gaming examples.