Introduction to Cloud Data Fusion: Console

This page introduces the Cloud Data Fusion: Console, also known as the control plane. It's a set of API operations and a Google Cloud console interface that let you manage a Cloud Data Fusion instance. For example, using the console, you can create, delete, restart, or update an instance.
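As a rough illustration of the API side of the control plane, the following Python sketch maps the four operations named above to REST calls. It assumes the v1 REST surface at datafusion.googleapis.com; the helper name instance_request and the sample project, region, and instance IDs are hypothetical, and the sketch only builds the request line without sending anything.

```python
# Sketch only: builds (method, URL) pairs, does not call the API.
# Assumes the v1 REST surface; all IDs below are placeholders.
BASE_URL = "https://datafusion.googleapis.com/v1"


def instance_request(operation, project, location, instance_id):
    """Return the (HTTP method, URL) pair for a control-plane operation."""
    parent = f"{BASE_URL}/projects/{project}/locations/{location}/instances"
    name = f"{parent}/{instance_id}"
    requests_by_operation = {
        "create": ("POST", f"{parent}?instanceId={instance_id}"),
        "delete": ("DELETE", name),
        "restart": ("POST", f"{name}:restart"),
        "update": ("PATCH", name),
    }
    if operation not in requests_by_operation:
        raise ValueError(f"unsupported operation: {operation}")
    return requests_by_operation[operation]


method, url = instance_request("restart", "my-project", "us-central1", "my-instance")
print(method, url)
```

A real call would also need an OAuth 2.0 access token and, for create and update, a JSON request body describing the instance.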

Cloud Data Fusion: Console overview

The following sections describe important aspects of the console.

Instances

An instance is a unique, independent deployment of Cloud Data Fusion. To start using Cloud Data Fusion, you create an instance in the Google Cloud console. You can create multiple instances in a single Google Cloud project and specify a Google Cloud region for each one. Each instance contains a set of services that handle pipeline lifecycle management, orchestration, coordination, and metadata management. These services run on long-running resources in a tenant project.

When you create the instance, consider the following options.

Edition

You create the instance in one of the following Cloud Data Fusion editions: Developer, Basic, or Enterprise. Choose the edition based on the following criteria:

  • Cost
  • Concurrency limits for pipeline execution
  • Role-based access control (RBAC) availability

The editions are intended for the following use cases:

Cloud Data Fusion edition    Use case
Developer edition            For development, testing, or small-scale integrations
Basic edition                For production with moderate needs
Enterprise edition           For large-scale, mission-critical data pipelines with RBAC
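If you create instances through the API instead of the console, the edition is part of the request body. The following Python sketch assumes the v1 Instance resource fields type and enableRbac; the helper name edition_fields is hypothetical, and the check that RBAC requires the Enterprise edition is an assumption drawn from the table above.

```python
# Sketch only: builds the edition-related part of an instance request body.
VALID_EDITIONS = {"DEVELOPER", "BASIC", "ENTERPRISE"}


def edition_fields(edition, enable_rbac=False):
    """Return the edition-related fields for an instance request body."""
    edition = edition.upper()
    if edition not in VALID_EDITIONS:
        raise ValueError(f"unknown edition: {edition}")
    # Assumption based on the use-case table: RBAC needs Enterprise.
    if enable_rbac and edition != "ENTERPRISE":
        raise ValueError("RBAC is assumed to require the Enterprise edition")
    fields = {"type": edition}
    if enable_rbac:
        fields["enableRbac"] = True
    return fields


print(edition_fields("developer"))
print(edition_fields("enterprise", enable_rbac=True))
```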

Public or private instance

Depending on your requirements, decide whether you need a public or a private instance. The key differences between public and private instances are network connectivity and security:

Cloud Data Fusion instance type    Behavior

Public instance
  • Network connectivity: uses public IP addresses to connect to the internet.
  • Data access: directly accesses data sources on the public internet.
  For more information, see Create a public instance.

Private instance
  • Network connectivity: uses private IP addresses within a Virtual Private Cloud (VPC) network.
  • Data access: requires preconfigured connections to access data sources. The following connections are supported:
    • On-premises data sources connected through VPN or Cloud Interconnect.
    • Other Google Cloud services running privately within the same VPC.
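For a private instance created through the API, the network settings travel in the request body. The following Python sketch assumes the v1 Instance fields privateInstance and networkConfig (with network and ipAllocation); the helper name private_instance_fields and the sample network name and CIDR range are hypothetical. The private-range check is a local sanity check added for illustration, not a rule stated by the service.

```python
# Sketch only: builds the private-networking part of an instance request body.
import ipaddress


def private_instance_fields(network, ip_allocation):
    """Return request-body fields for a private instance."""
    # Local sanity check (illustrative): the range must be valid private CIDR.
    cidr = ipaddress.ip_network(ip_allocation)
    if not cidr.is_private:
        raise ValueError(f"{ip_allocation} is not a private IP range")
    return {
        "privateInstance": True,
        "networkConfig": {
            "network": network,
            "ipAllocation": str(cidr),
        },
    }


print(private_instance_fields("my-vpc", "10.89.48.0/22"))
```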

Authorization and service account

Cloud Data Fusion typically has two service accounts:

Design-time service account
This Google-managed service account, called the Cloud Data Fusion API Service Agent, is used in the tenant project of Cloud Data Fusion to access customer project resources.
Execution-time service account
This is the service account that Cloud Data Fusion uses to deploy jobs that access other Google Cloud resources. By default, it is the default Compute Engine service account. It attaches to a Dataproc cluster VM so that Cloud Data Fusion can access Dataproc resources during a pipeline run.

For more information, see Service accounts in Cloud Data Fusion.

Logging and monitoring

Cloud Logging and Cloud Monitoring provide insight into the health and performance of your Cloud Data Fusion pipelines. You can enable Logging and Monitoring when you create the Cloud Data Fusion instance.

Enabling Logging and Monitoring lets you view Cloud Data Fusion pipeline logs in the Google Cloud console on the Logging viewer page.

Monitoring provides built-in dashboards for Cloud Data Fusion. You can also create custom dashboards to monitor specific metrics.
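When an instance is created through the API, Logging and Monitoring are switched on with two boolean fields. The following Python sketch assumes the v1 Instance fields enableStackdriverLogging and enableStackdriverMonitoring; the helper name observability_fields is hypothetical, and the snippet only prints the resulting JSON fragment.

```python
# Sketch only: builds the logging/monitoring part of an instance request body.
import json


def observability_fields(logging=True, monitoring=True):
    """Return request-body fields that toggle Logging and Monitoring."""
    return {
        "enableStackdriverLogging": bool(logging),
        "enableStackdriverMonitoring": bool(monitoring),
    }


# Merge with other instance fields to form a complete request body.
body = {"type": "BASIC", **observability_fields()}
print(json.dumps(body, indent=2))
```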

Lineage integration with Dataplex

Cloud Data Fusion provides an integration with Dataplex for lineage. For more information, see View lineage in Dataplex.

Encryption

Customer-managed encryption keys (CMEK) enable encryption of data at rest with a key that you can control through the Cloud Key Management Service. CMEK provides user control over the data written to Google Cloud internal resources in tenant projects and data written by Cloud Data Fusion pipelines. For more information, see Customer managed data encryption.
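As an illustration, a CMEK key is referenced by its full Cloud KMS resource name. The following Python sketch assembles that name and places it in the cryptoKeyConfig.keyReference field, which is assumed here from the v1 Instance resource; the helper name cmek_fields and all sample IDs are hypothetical.

```python
# Sketch only: builds the CMEK part of an instance request body.
def cmek_fields(project, location, key_ring, key):
    """Return request-body fields that reference a Cloud KMS key."""
    key_name = (
        f"projects/{project}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )
    return {"cryptoKeyConfig": {"keyReference": key_name}}


print(cmek_fields("my-project", "us-central1", "my-ring", "my-key"))
```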

Manage permissions with role-based access control (RBAC)

Cloud Data Fusion lets you control access with Identity and Access Management (IAM).

To grant granular permissions for operations performed in Cloud Data Fusion: Studio, use RBAC. For more information, see the RBAC overview.

Version upgrades

Cloud Data Fusion releases are versioned. You can upgrade an instance to a later version in the Cloud Data Fusion console. For more information, see Versioning in Cloud Data Fusion.

What's next