This page explains how to create a Cloud Data Fusion instance.
Before you begin
- Enable the Cloud Data Fusion API.
- The following permission is required to create Cloud Data Fusion instances: datafusion.instances.create. For more information, see Access control.
- Cloud Data Fusion instances run as the Compute Engine default service account. For information about the types and roles available, see Service accounts.
- By default, Cloud Data Fusion executes pipelines using a Dataproc cluster in your project. Ensure that your project meets the Dataproc networking requirements.
- New projects start with a default network. The default network is pre-populated with a firewall rule, default-allow-ssh, that allows incoming connections on TCP port 22 from any source to any instance in the network. If the network used by your Cloud Data Fusion instance doesn't have a rule that allows ingress on TCP port 22, you must create one.
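The firewall requirement above can be checked offline against an exported list of a network's firewall rules. The following is a minimal sketch, not an official tool. It assumes each rule is a dictionary shaped like a Compute Engine firewall resource (fields `direction`, `disabled`, and `allowed` with `IPProtocol` and `ports`), and the function names are hypothetical. It ignores source ranges, target tags, and rule priority, so treat a positive result as a hint rather than proof.

```python
from typing import Any, Dict, Iterable


def _port_in_spec(port: int, spec: str) -> bool:
    """Return True if `port` falls inside a port spec such as "22" or "20-25"."""
    if "-" in spec:
        low, high = spec.split("-", 1)
        return int(low) <= port <= int(high)
    return int(spec) == port


def allows_ssh_ingress(rule: Dict[str, Any]) -> bool:
    """Check whether a single firewall rule allows ingress on TCP port 22."""
    if rule.get("disabled", False):
        return False
    # Assumption: a rule without an explicit direction is an ingress rule.
    if rule.get("direction", "INGRESS") != "INGRESS":
        return False
    for allowed in rule.get("allowed", []):
        if allowed.get("IPProtocol") not in ("tcp", "all"):
            continue
        ports = allowed.get("ports")
        # An entry without a port list applies to every port of the protocol.
        if not ports or any(_port_in_spec(22, spec) for spec in ports):
            return True
    return False


def network_allows_ssh(rules: Iterable[Dict[str, Any]]) -> bool:
    """Return True if at least one rule in the network allows SSH ingress."""
    return any(allows_ssh_ingress(rule) for rule in rules)
```

For example, passing a list that contains the default-allow-ssh rule returns `True`, while an empty list or a list of egress-only rules returns `False`.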
Creating an instance
If the API is enabled, the Cloud Data Fusion section in the Cloud Console shows an Instances page where you can manage your Cloud Data Fusion instances. When no instances exist, the page has a link to create an instance, along with some useful links to documentation and samples.
- Click Create Instance.
- Enter an Instance name.
- Enter a Description for your instance.
- Specify the Region in which to create the instance.
- Specify the Cloud Data Fusion Version you prefer.
- Select the Cloud Data Fusion Edition you prefer.
- Specify the Dataproc service account to use for running your Cloud Data Fusion pipeline in Dataproc. You must also authorize Cloud Data Fusion to grant the Service Account User and Data Fusion Runner roles to the Cloud Data Fusion service account.
- Specify any additional settings. If you do not specify anything for the additional settings, the following defaults are used:

| Category | Setting | Description | Default |
| --- | --- | --- | --- |
| Edition | Developer, Basic or Enterprise | Instance and pipeline features | Basic |
| Advanced Options | | | |
| Private IP | Enable private IP addresses | Create a Cloud Data Fusion instance that uses private IP addresses. | Disabled |
| Logging and Monitoring | Enable Cloud Logging service | Option to enable Cloud Logging | Disabled |
| | Enable Cloud Monitoring service | Option to enable Cloud Monitoring | Disabled |
| Labels | <Key> <Value> pair(s) | The resource labels for the instance to use to annotate any related underlying resources, such as Compute Engine VMs. Label keys and label values can only contain letters, numbers, dashes, and underscores. Label keys must start with a letter or number. | None |

- Click Create. It takes up to 30 minutes for the instance creation process to complete.
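The defaults and label rules described in these steps can also be captured in code, for example when preparing a request for the Cloud Data Fusion REST API instead of using the console. The sketch below only builds and validates a request body and makes no network call. The field names (`type`, `privateInstance`, `enableStackdriverLogging`, `enableStackdriverMonitoring`, `labels`) are an assumption based on the v1 Instance resource and should be verified against the API reference. The helper names are hypothetical.

```python
import re
from typing import Dict, Optional

# Label keys and values: letters, numbers, dashes, and underscores only.
_LABEL_CHARS = re.compile(r"[A-Za-z0-9_-]*")
_EDITIONS = ("DEVELOPER", "BASIC", "ENTERPRISE")


def validate_labels(labels: Dict[str, str]) -> None:
    """Raise ValueError if a label breaks the rules stated in the settings list."""
    for key, value in labels.items():
        # Keys must start with a letter or number.
        if not key or not key[0].isalnum():
            raise ValueError(f"label key must start with a letter or number: {key!r}")
        if not _LABEL_CHARS.fullmatch(key):
            raise ValueError(f"invalid characters in label key: {key!r}")
        if not _LABEL_CHARS.fullmatch(value):
            raise ValueError(f"invalid characters in label value: {value!r}")


def build_instance_body(
    edition: str = "BASIC",
    description: str = "",
    private_ip: bool = False,
    enable_logging: bool = False,
    enable_monitoring: bool = False,
    labels: Optional[Dict[str, str]] = None,
) -> Dict[str, object]:
    """Build a request body whose defaults mirror the documented defaults."""
    if edition not in _EDITIONS:
        raise ValueError(f"edition must be one of {_EDITIONS}, got {edition!r}")
    labels = dict(labels or {})
    validate_labels(labels)
    # Field names are assumed from the v1 Instance resource; verify before use.
    return {
        "type": edition,
        "description": description,
        "privateInstance": private_ip,
        "enableStackdriverLogging": enable_logging,
        "enableStackdriverMonitoring": enable_monitoring,
        "labels": labels,
    }
```

Calling `build_instance_body()` with no arguments yields a Basic edition instance with private IP, logging, and monitoring disabled and no labels, which matches the defaults listed above.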
Create an instance: