### Pipeline execution on pre-created internal IP Dataproc clusters

You can use a private Cloud Data Fusion instance with the remote Hadoop provisioner. The Dataproc cluster must be on the VPC network peered with Cloud Data Fusion. The remote Hadoop provisioner is configured with the internal IP address of the master node of the Dataproc cluster.
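As an illustrative sketch, you can look up the master node's internal IP address with the gcloud CLI. The cluster name `my-cluster` and the zone are placeholders; Dataproc names the first master node `<cluster-name>-m`:

```shell
# Hypothetical cluster name and zone; replace with your own values.
# Dataproc names the first master node "<CLUSTER_NAME>-m".
gcloud compute instances describe my-cluster-m \
    --zone=us-central1-a \
    --format='get(networkInterfaces[0].networkIP)'
```

The internal IP address this prints is the value you supply when configuring the remote Hadoop provisioner.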
Access control
--------------
- Managing access to the Cloud Data Fusion instance: RBAC-enabled instances support managing access at a namespace level through Identity and Access Management. RBAC-disabled instances only support managing access at an instance level. If you have access to an instance, you have access to all pipelines and metadata in that instance.

- Pipeline access to your data: Pipeline access to data is provided by granting access to the service account, which can be a custom service account that you specify.
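For example, instance-level access on an RBAC-disabled instance is typically granted through IAM roles such as `roles/datafusion.viewer`. This is a sketch; the project ID and user email are placeholders:

```shell
# Hypothetical project and user; grants instance-level read access
# to all Cloud Data Fusion instances in the project.
gcloud projects add-iam-policy-binding my-project \
    --member="user:analyst@example.com" \
    --role="roles/datafusion.viewer"
```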
### Firewall rules
For a pipeline execution, you control ingress and egress by setting the appropriate firewall rules on the customer VPC on which the pipeline is executed.
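As a minimal sketch, an ingress rule like the following allows SSH (tcp:22) to Dataproc cluster nodes. The network name and source range are placeholders; substitute the range allocated to your private Cloud Data Fusion instance:

```shell
# Hypothetical network name and source range; replace the range with
# the one allocated to your private Cloud Data Fusion instance.
gcloud compute firewall-rules create allow-datafusion-ssh \
    --network=my-network \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:22 \
    --source-ranges=10.124.40.0/22
```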
Key storage
-----------

Passwords, keys, and other data are securely stored in Cloud Data Fusion and encrypted using keys stored in Cloud Key Management Service. At runtime, Cloud Data Fusion calls Cloud Key Management Service to retrieve the key used to decrypt stored secrets.
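Stored secrets can be managed through the secure-key REST endpoint that the instance exposes. The following is an illustrative sketch only; the instance endpoint, access token, and key name are placeholders:

```shell
# Hypothetical endpoint and token; stores a secret named "db-password"
# in the "default" namespace of the instance.
curl -X PUT \
    "https://INSTANCE_ENDPOINT/api/v3/namespaces/default/securekeys/db-password" \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" \
    -d '{"description": "Database password", "data": "my-secret", "properties": {}}'
```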
Encryption
----------
By default, data at rest is encrypted using Google-owned and Google-managed encryption keys, and data in transit is encrypted using TLS v1.2. You can use customer-managed encryption keys (CMEK) to control the data written by Cloud Data Fusion pipelines, including Dataproc cluster metadata and Cloud Storage, BigQuery, and Pub/Sub data sources and sinks.
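For illustration, a CMEK key is created in Cloud KMS and then referenced when you create the instance. The key ring, key, and location names below are placeholders:

```shell
# Hypothetical names; the key ring and key must be in a location
# supported by your Cloud Data Fusion instance.
gcloud kms keyrings create my-keyring --location=us-central1
gcloud kms keys create my-cmek-key \
    --keyring=my-keyring \
    --location=us-central1 \
    --purpose=encryption
```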
Service accounts
----------------
Cloud Data Fusion pipelines run on Dataproc clusters in the customer project and can be configured to run using a customer-specified (custom) service account. The custom service account must be granted the Service Account User role.
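For example, the Service Account User role can be granted on the custom service account with the gcloud CLI. The service account and user emails shown are placeholders:

```shell
# Hypothetical emails; allows the member to run pipelines as the
# custom service account.
gcloud iam service-accounts add-iam-policy-binding \
    custom-sa@my-project.iam.gserviceaccount.com \
    --member="user:pipeline-dev@example.com" \
    --role="roles/iam.serviceAccountUser"
```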
Projects
--------
Cloud Data Fusion services are created in a Google-managed tenant project that users can't access. Cloud Data Fusion pipelines run on Dataproc clusters inside the customer project. Customers can access these clusters during their lifetime.
Audit logs
----------

Cloud Data Fusion audit logs are available from [Logging](/logging/docs).

Plugins and artifacts
---------------------

Operators and admins should be wary of installing untrusted plugins or artifacts, as they might present a security risk.

Workforce identity federation
-----------------------------

[Workforce identity federation](/iam/docs/workforce-identity-federation) users can perform operations in Cloud Data Fusion, such as creating, deleting, upgrading, and listing instances. For more information about limitations, see [Workforce identity federation: supported products and limitations](/iam/docs/federated-identity-supported-services#data-fusion).

Last updated: 2025-09-04 (UTC)