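The tenant project can have multiple Cloud Data Fusion instances. You manage the resources and services it holds when you access an instance in the Cloud Data Fusion UI or the Google Cloud CLI.
For more information, see the Service Infrastructure documentation about tenant projects.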
Customer project
The customer creates and owns this project. By default, Cloud Data Fusion creates an ephemeral Dataproc cluster in this project to run your pipelines.
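Cloud Data Fusion instance
A Cloud Data Fusion instance is a unique deployment of Cloud Data Fusion, where you design and execute pipelines.

You can create multiple instances in a single project and specify the Google Cloud region in which to create each Cloud Data Fusion instance.

Based on your requirements and cost constraints, you can create an instance that uses the Developer, Basic, or Enterprise edition of Cloud Data Fusion.

Each instance contains its own independent Cloud Data Fusion deployment, with a set of services that handle pipeline lifecycle management, orchestration, coordination, and metadata management. These services run on long-running resources in the tenant project.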
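Network diagram
The following diagrams show the connections when you build data pipelines that extract, transform, blend, aggregate, and load data from various on-premises and cloud data sources.

See the diagrams for controlling egress in a private instance and connecting to a public source.

Pipeline design and execution
Cloud Data Fusion separates the design and execution environments, which lets you design a pipeline once and then execute it in multiple environments. The design environment resides in the tenant project, while the execution environment is in one or more customer projects.

Example: You design your pipeline using Cloud Data Fusion services, such as Wrangler and Preview. Those services run in the tenant project, where access to data is controlled by the Google-managed Cloud Data Fusion Service Agent role. You then execute the pipeline in your customer project so that it uses your Dataproc cluster. In the customer project, the default Compute Engine service account controls access to data. You can configure your project to use a custom service account.

For more information about configuring service accounts, see Cloud Data Fusion service accounts.

Design environment
When you create a Cloud Data Fusion instance in your customer project, Cloud Data Fusion automatically creates a separate, Google-managed tenant project to run the services required to manage the lifecycle of pipelines and metadata, the Cloud Data Fusion UI, and design-time tools like Preview and Wrangler.

DNS resolution in Cloud Data Fusion
To resolve domain names in your design-time environment when you wrangle and preview the data that you're transferring into Google Cloud, use DNS peering (available starting in Cloud Data Fusion 6.7.0). DNS peering lets you refer to sources and sinks by domain name or hostname, which you don't need to reconfigure as often as IP addresses.

DNS resolution is recommended in your design-time environment when you test connections and preview pipelines that use the domain names of on-premises or other servers (such as databases or FTP servers) in a private VPC network.

For more information, see DNS Peering and Cloud DNS Forwarding.

Execution environment
After you verify and deploy your pipeline in an instance, you either execute the pipeline manually, or it executes on a time schedule or a pipeline state trigger.

Whether the execution environment is provisioned and managed by Cloud Data Fusion or by the customer, the environment exists in your customer project.

Note: As an execution environment and a managed service, Dataproc has its own network and firewall requirements.

Public instances (default)
The easiest way to provision a Cloud Data Fusion instance is to create a public instance. It serves well as a starting point and provides access to external endpoints on the public internet.

A public instance in Cloud Data Fusion uses the default VPC network in your project.

Note: Both design and execution environments have external IP addresses that are kept behind a firewall to control access.

The default VPC network has the following:
- Autogenerated subnets for each region
- Routing tables
- Firewall rules to ensure communication among your computing resources

Networking across regions
When you create a new project, a benefit of the default VPC network is that it autopopulates one subnet per region using a predefined IP address range, expressed as a CIDR block. The IP address ranges start with 10.128.0.0/20, 10.132.0.0/20, and so on across the Google Cloud regions.

To ensure that your computing resources connect to each other across regions, the default VPC network sets default local routes to each subnet. The default route to the internet (0.0.0.0/0) gives you internet access and captures any traffic that doesn't match a more specific route.

Firewall rules
The default VPC network provides the following firewall rules:
- default-allow-internal: allows tcp:0-65535, udp:0-65535, and icmp from source 10.128.0.0/9, which covers IP addresses from 10.128.0.1 (minimum) to 10.255.255.254 (maximum).
- default-allow-rdp: allows tcp:3389 from source 0.0.0.0/0.
- default-allow-ssh: allows tcp:22 from source 0.0.0.0/0.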
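To make the rule list concrete, the following is a minimal sketch in Python that models the three default allow rules and checks which one, if any, admits a given packet. It is a simplification, not how Google Cloud evaluates firewall rules: it assumes every rule is an ingress rule that applies to all instances at the same priority, and it treats anything unmatched as hitting the implied deny. It also verifies the 10.128.0.1 to 10.255.255.254 host range quoted for 10.128.0.0/9. The names `AllowRule`, `matching_rule`, and `net` are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Simplified model of the default VPC allow rules. Assumptions: every rule is
# an ingress rule that applies to all instances, rules share one priority, and
# anything no rule allows falls through to the implied deny.


@dataclass(frozen=True)
class AllowRule:
    name: str
    protocol: str                   # "tcp", "udp", or "icmp"
    ports: Optional[range]          # None for portless protocols (icmp)
    source: ipaddress.IPv4Network


def net(cidr: str) -> ipaddress.IPv4Network:
    return ipaddress.IPv4Network(cidr)


INTERNAL = net("10.128.0.0/9")
ANYWHERE = net("0.0.0.0/0")
ALL_PORTS = range(0, 65536)         # tcp:0-65535 / udp:0-65535

DEFAULT_RULES = [
    AllowRule("default-allow-internal", "tcp", ALL_PORTS, INTERNAL),
    AllowRule("default-allow-internal", "udp", ALL_PORTS, INTERNAL),
    AllowRule("default-allow-internal", "icmp", None, INTERNAL),
    AllowRule("default-allow-rdp", "tcp", range(3389, 3390), ANYWHERE),
    AllowRule("default-allow-ssh", "tcp", range(22, 23), ANYWHERE),
]


def matching_rule(protocol: str, port: Optional[int], src: str) -> Optional[str]:
    """Return the name of the first rule that allows the packet, or None (implied deny)."""
    src_ip = ipaddress.ip_address(src)
    for rule in DEFAULT_RULES:
        if rule.protocol != protocol or src_ip not in rule.source:
            continue
        if rule.ports is None or (port is not None and port in rule.ports):
            return rule.name
    return None


# Usable host range of the internal rule's source CIDR.
first_host = INTERNAL.network_address + 1    # 10.128.0.1
last_host = INTERNAL.broadcast_address - 1   # 10.255.255.254

# Auto-mode regional subnets such as 10.128.0.0/20 and 10.132.0.0/20 fall
# inside 10.128.0.0/9, so default-allow-internal admits traffic between them.
regional_in_range = all(
    net(cidr).subnet_of(INTERNAL) for cidr in ["10.128.0.0/20", "10.132.0.0/20"]
)
```

For example, `matching_rule("tcp", 22, "203.0.113.5")` returns `"default-allow-ssh"`, while `matching_rule("tcp", 5432, "10.0.0.5")` returns `None` because 10.0.0.5 lies outside 10.128.0.0/9.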
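The default VPC network settings described in this section minimize the prerequisites for setting up cloud services, including Cloud Data Fusion. Because of network security concerns, organizations often don't let you use the default VPC network for business operations. Without the default VPC network, you can't create a Cloud Data Fusion public instance. Instead, create a private instance.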
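The default VPC network doesn't grant open access to resources. Instead, Identity and Access Management (IAM) controls access:
- A validated identity is required to sign in to Google Cloud.
- After you sign in, you need explicit permission (for example, the Viewer role) to view Google Cloud services.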
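The two IAM checks above can be pictured with a small, hypothetical Python model: no identity means no access, and an identity needs a role binding that includes viewing permissions. The policy dictionary borrows the shape of an IAM policy's `bindings` (a role plus its members), but the function `can_view` is an illustration only; it ignores resource-hierarchy inheritance, conditions, groups, and custom roles.

```python
from typing import Optional

# Hypothetical, simplified model of the two IAM checks. The policy dict uses
# the shape of an IAM policy's "bindings" (a role plus its members); it
# ignores inheritance, conditions, groups, and custom roles.

# Basic roles whose permissions include everything the Viewer role grants.
VIEW_ROLES = {"roles/viewer", "roles/editor", "roles/owner"}


def can_view(identity: Optional[str], policy: dict) -> bool:
    if identity is None:
        # No validated identity: the caller can't sign in at all.
        return False
    member = f"user:{identity}"
    return any(
        binding.get("role") in VIEW_ROLES and member in binding.get("members", [])
        for binding in policy.get("bindings", [])
    )


policy = {"bindings": [{"role": "roles/viewer", "members": ["user:alice@example.com"]}]}
```

With this policy, `can_view("alice@example.com", policy)` is `True`, while an unlisted user or a missing identity gets `False`, regardless of how the VPC network is configured.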
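Private instances
Some organizations require that all of their production systems be isolated from public IP addresses. A Cloud Data Fusion private instance meets that requirement in all kinds of VPC network settings.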
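If you need to audit that isolation requirement, a minimal sketch is to flag any publicly routable address in an inventory of your production resources. The inventory below is hypothetical, and the check relies only on Python's `ipaddress` `is_global` property; it says nothing about how Cloud Data Fusion itself keeps a private instance off public IP addresses.

```python
import ipaddress
from typing import Iterable, List


def publicly_routable(addresses: Iterable[str]) -> List[str]:
    """Return the addresses that are globally routable, that is, public."""
    return [a for a in addresses if ipaddress.ip_address(a).is_global]


# Hypothetical inventory of addresses attached to production resources.
inventory = ["10.128.0.12", "192.168.1.20", "172.16.5.4", "8.8.8.8"]
```

Here `publicly_routable(inventory)` returns `["8.8.8.8"]`, the only address that would violate the isolation requirement.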
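Private Service Connect in Cloud Data Fusion
Cloud Data Fusion instances might need to connect to resources located on-premises, on Google Cloud, or on other cloud providers. When you use Cloud Data Fusion with internal IP addresses, connections to external resources are established over the VPC network in your Google Cloud project, so traffic doesn't go through the public internet. When Cloud Data Fusion is given access to your VPC network through VPC network peering, there are limitations, which become apparent when you use large-scale networks.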
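One well-known constraint of VPC network peering, which may or may not be the specific limitation meant here, is that peered networks can't have overlapping subnet IP ranges. Because each private instance has its own VPC network and subnet in the tenant project, the range used for that subnet has to stay clear of your existing ranges, and free ranges get scarcer as a network grows. The following is a minimal Python sketch of that check; the helper name `conflicting_ranges` and all ranges are hypothetical.

```python
import ipaddress
from typing import List


def conflicting_ranges(candidate: str, existing: List[str]) -> List[str]:
    """Return each existing range that overlaps the candidate range."""
    cand = ipaddress.ip_network(candidate)
    return [cidr for cidr in existing if cand.overlaps(ipaddress.ip_network(cidr))]


# Hypothetical subnet ranges already used in your VPC network.
existing = ["10.128.0.0/20", "10.132.0.0/20", "192.168.0.0/16"]
```

For example, `conflicting_ranges("10.132.4.0/22", existing)` returns `["10.132.0.0/20"]`, so that range can't be used for a peered subnet, while `conflicting_ranges("10.200.0.0/22", existing)` returns an empty list.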
With Private Service Connect interfaces, Cloud Data Fusion connects to your VPC network without using VPC network peering. A Private Service Connect interface is a type of Private Service Connect that lets Cloud Data Fusion initiate private and secure connections to consumer VPC networks. It provides the flexibility and ease of access of VPC network peering, along with the explicit authorization and consumer-side control that Private Service Connect offers. For more information, see Create a private instance with Private Service Connect.

Access to data in design and execution environments
In a public instance, network communication happens over the open internet, which is not recommended for critical environments. To access your data sources securely, always execute your pipelines from a private instance in your execution environment.

Access to sources
When accessing data sources, public and private instances:
- make outgoing calls to Google Cloud APIs using Private Google Access
- communicate with an execution (Dataproc) environment through VPC peering

The following table compares public and private instances during design and execution for various data sources:

What's next
- Access control in Cloud Data Fusion
- Service accounts in Cloud Data Fusion
- Creating a public instance
- Creating a private instance

Last updated 2025-09-04 (UTC).