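The tenant project can hold multiple Cloud Data Fusion instances. You manage the resources and services it holds when you access an instance in the Cloud Data Fusion UI or Google Cloud CLI.
For more information, see the Service Infrastructure documentation about tenant projects.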
Customer project
The customer creates and owns this project. By default, Cloud Data Fusion creates an ephemeral Dataproc cluster in this project to run your pipelines.
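Cloud Data Fusion instance
A Cloud Data Fusion instance is a unique deployment of Cloud Data Fusion, where you design and execute pipelines.
You can create multiple instances in a single project and specify the Google Cloud region in which to create each Cloud Data Fusion instance.
Based on your requirements and cost constraints, you can create an instance that uses the Developer, Basic, or Enterprise edition of Cloud Data Fusion.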
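The following minimal Python sketch makes the region and edition choices concrete. It only assembles a create-instance request for the Cloud Data Fusion REST API and never sends it. Several details are assumptions that you should check against the API reference: the endpoint shape (`https://datafusion.googleapis.com/v1/projects/.../locations/.../instances?instanceId=...`) and the `type` values `DEVELOPER`, `BASIC`, and `ENTERPRISE`. The project, region, and instance IDs are hypothetical, and authentication is omitted.

```python
import json
from urllib.parse import urlencode

# ASSUMPTION: v1 REST endpoint for Cloud Data Fusion; confirm in the API reference.
API_ROOT = "https://datafusion.googleapis.com/v1"

# Editions named on this page, mapped to the instance "type" values assumed here.
EDITION_TO_TYPE = {
    "Developer": "DEVELOPER",
    "Basic": "BASIC",
    "Enterprise": "ENTERPRISE",
}


def build_create_instance_request(project, region, instance_id, edition):
    """Return (url, json_body) for a create-instance call; nothing is sent."""
    if edition not in EDITION_TO_TYPE:
        raise ValueError(f"unknown edition: {edition!r}")
    # The region is part of the resource path, so each request targets one region.
    parent = f"projects/{project}/locations/{region}"
    url = f"{API_ROOT}/{parent}/instances?" + urlencode({"instanceId": instance_id})
    body = json.dumps({"type": EDITION_TO_TYPE[edition]})
    return url, body


# Two hypothetical instances in the same project, in different regions and editions.
for inst_id, region, edition in [("dev", "us-central1", "Developer"),
                                 ("prod", "europe-west1", "Enterprise")]:
    print(*build_create_instance_request("my-project", region, inst_id, edition))
```

The region is part of the resource path, so each request creates an instance in exactly one region. To create instances in several regions, send one request per region.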
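Each instance contains a unique, independent Cloud Data Fusion deployment with a set of services that handle pipeline lifecycle management, orchestration, coordination, and metadata management. These services run on long-running resources in the tenant project.
Network diagram
The following diagrams show the connections when you build data pipelines that extract, transform, blend, aggregate, and load data from various on-premises and cloud data sources.
See the diagrams for controlling egress in a private instance and connecting to a public source.
Pipeline design and execution
Cloud Data Fusion separates the design and execution environments, which lets you design a pipeline once and then execute it in multiple environments. The design environment resides in the tenant project, while the execution environment is in one or more customer projects.
Example: You design your pipeline using Cloud Data Fusion services such as Wrangler and Preview. Those services run in the tenant project, where access to data is controlled by the Google-managed Cloud Data Fusion Service Agent role. You then execute the pipeline in your customer project so that it uses your Dataproc cluster. In the customer project, the default Compute Engine service account controls access to data. You can configure your project to use a custom service account.
For more information about configuring service accounts, see Cloud Data Fusion service accounts.
Design environment
When you create a Cloud Data Fusion instance in your customer project, Cloud Data Fusion automatically creates a separate, Google-managed tenant project. This project runs the services that manage the lifecycle of pipelines and metadata, the Cloud Data Fusion UI, and design-time tools such as Preview and Wrangler.
DNS resolution in Cloud Data Fusion
Use DNS peering (available starting in Cloud Data Fusion 6.7.0) to resolve domain names in your design-time environment when you wrangle and preview the data that you're transferring into Google Cloud. DNS peering lets you use domain names or hostnames for sources and sinks, which you don't need to reconfigure as often as IP addresses.
DNS resolution is recommended in your design-time environment when you test connections and preview pipelines that use domain names of on-premises or other servers (such as databases or FTP servers) in a private VPC network.
For more information, see DNS peering and Cloud DNS forwarding.
Execution environment
After you verify and deploy your pipeline in an instance, you can execute it manually. Alternatively, it can run on a time schedule or on a pipeline state trigger.
Whether Cloud Data Fusion or the customer provisions and manages the execution environment, the environment exists in your customer project.
Note: As an execution environment and a managed service, Dataproc has its own network and firewall requirements.
Public instances (default)
The easiest way to provision a Cloud Data Fusion instance is to create a public instance. It serves well as a starting point and provides access to external endpoints on the public internet.
A public instance in Cloud Data Fusion uses the default VPC network in your project.
Note: Both design and execution environments have external IP addresses that are kept behind a firewall to control access.
The default VPC network has the following:
Autogenerated subnets for each region
Routing tables
Firewall rules to ensure communication among your computing resources
Networking across regions
When you create a new project, the default VPC network automatically creates one subnet per region, each using a predefined IP address range expressed as a CIDR block. The ranges start with 10.128.0.0/20, 10.132.0.0/20, and so on across the Google Cloud global regions.
To ensure that your computing resources can connect to each other across regions, the default VPC network sets default local routes to each subnet. The default route to the internet (0.0.0.0/0) gives you access to the internet and captures any unrouted network traffic.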
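Firewall rules
The default VPC network provides a set of firewall rules:
default-allow-icmp: Enables icmp for source 0.0.0.0/0.
default-allow-internal: Enables tcp:0-65535; udp:0-65535; icmp for source 10.128.0.0/9, which covers IP addresses from 10.128.0.1 to 10.255.255.254.
default-allow-rdp: Enables tcp:3389 for source 0.0.0.0/0.
default-allow-ssh: Enables tcp:22 for source 0.0.0.0/0.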
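You can check these default rules with Python's standard `ipaddress` module. This sketch models the four default ingress rules as data and reports which one, if any, admits a packet. It ignores rule priorities, target tags, and egress, so treat it as an illustration rather than a reimplementation of VPC firewall evaluation. It also confirms two facts about `10.128.0.0/9`: its usable host addresses run from 10.128.0.1 to 10.255.255.254, and it contains an auto-mode regional subnet such as `10.132.0.0/20`. The helper `matching_rule` and the sample addresses are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Source ranges used by the default rules.
INTERNAL = ipaddress.ip_network("10.128.0.0/9")
ANYWHERE = ipaddress.ip_network("0.0.0.0/0")


@dataclass(frozen=True)
class Rule:
    name: str
    source: ipaddress.IPv4Network
    # protocol -> inclusive (low, high) port range; None means the protocol has no ports (icmp).
    allowed: Dict[str, Optional[Tuple[int, int]]]


# Simplified model of the default ingress rules; priorities are ignored.
DEFAULT_RULES = [
    Rule("default-allow-icmp", ANYWHERE, {"icmp": None}),
    Rule("default-allow-internal", INTERNAL,
         {"tcp": (0, 65535), "udp": (0, 65535), "icmp": None}),
    Rule("default-allow-rdp", ANYWHERE, {"tcp": (3389, 3389)}),
    Rule("default-allow-ssh", ANYWHERE, {"tcp": (22, 22)}),
]


def matching_rule(source_ip: str, protocol: str, port: Optional[int] = None) -> Optional[str]:
    """Return the name of the first default rule that allows this ingress packet, else None."""
    src = ipaddress.ip_address(source_ip)
    for rule in DEFAULT_RULES:
        if src not in rule.source or protocol not in rule.allowed:
            continue
        ports = rule.allowed[protocol]
        if ports is None or (port is not None and ports[0] <= port <= ports[1]):
            return rule.name
    return None  # no allow rule matched, so the implied deny-ingress rule applies


print(INTERNAL[1], INTERNAL[-2])                              # 10.128.0.1 10.255.255.254
print(ipaddress.ip_network("10.132.0.0/20").subnet_of(INTERNAL))  # True
print(matching_rule("203.0.113.7", "tcp", 22))                # default-allow-ssh
print(matching_rule("203.0.113.7", "tcp", 8080))              # None
```

Under this model, a public address can reach SSH, RDP, and ICMP, and any other port falls through to the implied deny-ingress rule. Addresses inside `10.128.0.0/9` can reach every TCP and UDP port.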
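These default VPC network settings minimize the prerequisites for setting up cloud services, including Cloud Data Fusion. Because of network security concerns, organizations often don't allow the default VPC network to be used for business operations. Without the default VPC network, you can't create a Cloud Data Fusion public instance; create a private instance instead.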
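The default VPC network does not grant open access to resources. Instead, Identity and Access Management (IAM) controls access:
A validated identity is required to sign in to Google Cloud.
After you sign in, you need explicit permission (for example, the Viewer role) to view Google Cloud services.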
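A toy model can illustrate the two-step check in the list above. Everything in it is hypothetical. The role table lists only two illustrative permissions, and real IAM evaluation also involves the resource hierarchy, conditions, and deny policies. What the model shows is that network placement is not an input: only the signed-in identity and its explicit grants decide access.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# HYPOTHETICAL role table for illustration; real roles carry many more permissions.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "roles/viewer": {"datafusion.instances.get", "datafusion.instances.list"},
}


@dataclass
class Principal:
    email: str
    authenticated: bool  # step 1: a validated identity has signed in
    roles: Set[str] = field(default_factory=set)  # step 2: explicit role grants


def is_allowed(principal: Principal, permission: str) -> bool:
    """Network location is not an input; only identity and granted roles matter."""
    if not principal.authenticated:
        return False
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in principal.roles)


viewer = Principal("analyst@example.com", authenticated=True, roles={"roles/viewer"})
print(is_allowed(viewer, "datafusion.instances.get"))     # True
print(is_allowed(viewer, "datafusion.instances.delete"))  # False
```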
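Private instances
Some organizations require that all of their production systems be isolated from public IP addresses. A Cloud Data Fusion private instance meets that requirement in all kinds of VPC network settings.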
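When you create a private instance that uses VPC network peering, you pick an internal IP range for the instance, and that range must not collide with ranges already in use in your VPC network. The sketch below uses `ipaddress` to find the first free block, then assembles an instance body. Several details are assumptions to verify in the private-instance setup guide and the API reference: the `/22` default block size, the field names `privateInstance`, `networkConfig.network`, and `networkConfig.ipAllocation`, and the sample subnets.

```python
import ipaddress
from typing import Iterable, Optional


def first_free_block(space: str, used: Iterable[str],
                     prefixlen: int = 22) -> Optional[ipaddress.IPv4Network]:
    """First /prefixlen block inside `space` that overlaps none of the `used` ranges."""
    used_nets = [ipaddress.ip_network(u) for u in used]
    for block in ipaddress.ip_network(space).subnets(new_prefix=prefixlen):
        if not any(block.overlaps(u) for u in used_nets):
            return block
    return None


def private_instance_body(network: str, ip_allocation: str, edition_type: str = "BASIC") -> dict:
    # ASSUMED field names (privateInstance, networkConfig.network,
    # networkConfig.ipAllocation); confirm against the API reference.
    return {
        "type": edition_type,
        "privateInstance": True,
        "networkConfig": {"network": network, "ipAllocation": ip_allocation},
    }


# Hypothetical subnets already in use in the customer VPC network.
in_use = ["10.0.0.0/22", "10.0.4.0/24"]
block = first_free_block("10.0.0.0/16", in_use)
print(block)  # 10.0.8.0/22
print(private_instance_body("my-vpc", str(block)))
```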
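Private Service Connect in Cloud Data Fusion
Cloud Data Fusion instances might need to connect to resources located on-premises, on Google Cloud, or on other cloud providers. When you use Cloud Data Fusion with internal IP addresses, connections to external resources are established over the VPC network in your Google Cloud project, and that traffic doesn't go through the public internet. Giving Cloud Data Fusion access to your VPC network through VPC network peering has limitations, which become apparent in large-scale networks.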
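Instead of peering, a private instance can reach your VPC network through a Private Service Connect interface. This sketch builds the network portion of an instance request for that mode. Three details are assumptions about the Cloud Data Fusion API that you should confirm before use: the `connectionType` value `PRIVATE_SERVICE_CONNECT_INTERFACES`, the `privateServiceConnectConfig.networkAttachment` field, and the same-region check. The `projects/.../regions/.../networkAttachments/...` pattern is the Compute Engine resource-name format for a network attachment. The project and attachment names are hypothetical.

```python
import re

# Resource-name shape of a (regional) Compute Engine network attachment.
ATTACHMENT_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/regions/(?P<region>[^/]+)"
    r"/networkAttachments/(?P<name>[^/]+)$"
)


def psc_network_config(instance_region: str, network_attachment: str) -> dict:
    """Build an ASSUMED networkConfig that selects a PSC interface instead of peering."""
    m = ATTACHMENT_RE.match(network_attachment)
    if m is None:
        raise ValueError("expected projects/P/regions/R/networkAttachments/N")
    # Assumption: the attachment must be in the same region as the instance.
    if m.group("region") != instance_region:
        raise ValueError(
            f"attachment is in {m.group('region')}, instance is in {instance_region}"
        )
    return {
        "connectionType": "PRIVATE_SERVICE_CONNECT_INTERFACES",
        "privateServiceConnectConfig": {"networkAttachment": network_attachment},
    }


cfg = psc_network_config(
    "us-central1",
    "projects/my-project/regions/us-central1/networkAttachments/cdf-attachment",
)
print(cfg["connectionType"])
```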
With Private Service Connect interfaces, Cloud Data Fusion connects to your VPC network without using VPC network peering. The Private Service Connect interface is a type of Private Service Connect that lets Cloud Data Fusion initiate private and secure connections to consumer VPC networks. It keeps the flexibility and ease of access of VPC network peering and adds the explicit authorization and consumer-side control that Private Service Connect offers. For more information, see Create a private instance with Private Service Connect.
Access to data in design and execution environments
In a public instance, network communication happens over the open internet, which is not recommended for critical environments. To securely access your data sources, always execute your pipelines from a private instance in your execution environment.
Access to sources
When accessing data sources, public and private instances:
Make outgoing calls to Google Cloud APIs using Private Google Access
Communicate with an execution (Dataproc) environment through VPC peering
The following table compares public and private instances during design and execution for various data sources:
What's next
Access control in Cloud Data Fusion
Service accounts in Cloud Data Fusion
Creating a public instance
Creating a private instance
Last updated (UTC): 2025-09-04.