To specify the disk type, use the following format: compute.googleapis.com/projects/PROJECT_ID/zones/ZONE/diskTypes/DISK_TYPE. A sketch of passing this value from the Python SDK appears after the list below.
Replace the following:
PROJECT_ID: your project ID
ZONE: the zone for the Persistent Disk, for example us-central1-b
DISK_TYPE: the disk type, either pd-ssd or pd-standard
For more information, see the Compute Engine API reference page for diskTypes.
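As an illustration, the following is a minimal sketch of passing this value through the worker_disk_type pipeline option in the Apache Beam Python SDK; the project ID, zone, region, and bucket shown are placeholder assumptions, not values from this page.

    # A minimal sketch, assuming placeholder project, region, zone, and bucket
    # values: setting the worker disk type from the Apache Beam Python SDK.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",                      # assumed project ID
        region="us-central1",
        temp_location="gs://my-bucket/temp",       # assumed staging bucket
        worker_disk_type=(
            "compute.googleapis.com/projects/my-project"
            "/zones/us-central1-b/diskTypes/pd-ssd"
        ),
    )

    with beam.Pipeline(options=options) as pipeline:
        _ = pipeline | beam.Create(["a", "b", "c"]) | beam.Map(print)

The same value can also be passed on the command line with the option named for your SDK, for example --workerDiskType in Java or --worker_disk_type in Python.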
Disk size
The Persistent Disk size.
Java
Set the diskSizeGb pipeline option.
Python
Set the disk_size_gb pipeline option.
Go
Set the disk_size_gb pipeline option.
If you set this option, specify at least 30 GB to account for the worker boot image and local logs, as sketched below.
Lowering the disk size reduces the available shuffle I/O. For shuffle-bound jobs that don't use Dataflow Shuffle or Streaming Engine, this can increase runtime and job cost.
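For example, here is a minimal sketch of setting disk_size_gb from the Python SDK; the 50 GB value and the project, region, and bucket names are illustrative assumptions.

    # A minimal sketch, assuming placeholder values: requesting 50 GB
    # Persistent Disks for Dataflow workers from the Python SDK.
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",                 # assumed project ID
        region="us-central1",
        temp_location="gs://my-bucket/temp",  # assumed staging bucket
        disk_size_gb=50,                      # keep this at 30 GB or more
    )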
Batch jobs
For batch jobs that use Dataflow Shuffle, this option sets the size of a worker VM boot disk. For batch jobs that don't use Dataflow Shuffle, this option sets the size of the disks used to store shuffled data; the boot disk size is not affected.
If a batch job uses Dataflow Shuffle, the default disk size is 25 GB. Otherwise, the default is 250 GB.
Streaming jobs
For streaming jobs that use Streaming Engine, this option sets the size of the boot disks. For streaming jobs that don't use Streaming Engine, this option sets the size of each additional Persistent Disk created by the Dataflow service; the boot disk is not affected.
If a streaming job doesn't use Streaming Engine, you can set the boot disk size with the experiment flag streaming_boot_disk_size_gb. For example, specify --experiments=streaming_boot_disk_size_gb=80 to create boot disks of 80 GB, as sketched below.
If a streaming job uses Streaming Engine, the default disk size is 30 GB. Otherwise, the default is 400 GB.
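Here is a minimal sketch of passing this experiment flag from the Python SDK; the project, region, and bucket values are assumptions.

    # A minimal sketch, assuming placeholder values: requesting an 80 GB boot
    # disk for a streaming job that does not use Streaming Engine.
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",                 # assumed project ID
        region="us-central1",
        temp_location="gs://my-bucket/temp",  # assumed staging bucket
        streaming=True,
        experiments=["streaming_boot_disk_size_gb=80"],
    )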
Use Cloud Storage FUSE to mount your Cloud Storage buckets onto Dataflow VMs
Cloud Storage FUSE lets you mount your Cloud Storage buckets directly onto Dataflow VMs, allowing software to access the files as if they were local. This integration eliminates the need to pre-download data, streamlining data access for your workloads. For more information, see Process ML data using Dataflow and Cloud Storage FUSE.
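As a purely illustrative sketch, once a bucket has been mounted on the worker, pipeline code can read it with ordinary file I/O. The mount path and file name below are hypothetical placeholders, and the sketch doesn't show how the mount itself is configured; see the page linked above for that.

    # Purely illustrative sketch: reading a file from a Cloud Storage bucket
    # that has been mounted on the worker with Cloud Storage FUSE. The mount
    # path "/mnt/my-bucket" is a hypothetical placeholder, not a path defined
    # by Dataflow.
    import apache_beam as beam

    class ReadMountedFile(beam.DoFn):
        def process(self, path):
            # The mounted bucket behaves like a local file system, so ordinary
            # Python file I/O works without downloading the object first.
            with open(path, "rb") as f:
                yield f.read()

    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            | beam.Create(["/mnt/my-bucket/training-data/part-0001.csv"])
            | beam.ParDo(ReadMountedFile())
        )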
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-09-04(UTC)"],[[["\u003cp\u003eDataflow allows configuration of worker VMs by setting specific pipeline options when creating a job, including machine type, disk type, and disk size.\u003c/p\u003e\n"],["\u003cp\u003eThe machine type for worker VMs can be set to either x86 or Arm, and custom machine types can be specified using a defined format of family, vCPU count, and memory size.\u003c/p\u003e\n"],["\u003cp\u003eThe type of Persistent Disk can be specified, using a format that includes the project ID, zone, and disk type (either \u003ccode\u003epd-ssd\u003c/code\u003e or \u003ccode\u003epd-standard\u003c/code\u003e), while certain jobs, such as those using Streaming Engine or the N4 machine type, should not have a disk type specified.\u003c/p\u003e\n"],["\u003cp\u003eThe Persistent Disk size can be configured, with a recommendation to set at least 30 GB to account for the worker boot image and local logs; the default sizes vary depending on whether the job is batch or streaming and if Dataflow Shuffle or Streaming Engine are being utilized.\u003c/p\u003e\n"],["\u003cp\u003eShared core machine types like \u003ccode\u003ef1\u003c/code\u003e and \u003ccode\u003eg1\u003c/code\u003e series are not supported under the Dataflow Service Level Agreement, and Arm machines are also supported with the Tau T2A series.\u003c/p\u003e\n"]]],[],null,["# Configure Dataflow worker VMs\n\nThis document describes how to configure the worker VMs for a Dataflow\njob.\n\nBy default, Dataflow selects the machine type for the worker VMs that\nrun your job, along with the size and type of Persistent Disk. To configure the\nworker VMs, set the following\n[pipeline options](/dataflow/docs/reference/pipeline-options#worker-level_options)\nwhen you create the job.\n\nMachine type\n------------\n\nThe Compute Engine [machine type](/compute/docs/machine-types) that\nDataflow uses when starting worker VMs. You can use x86 or Arm\nmachine types, including custom machine types. \n\n### Java\n\nSet the `workerMachineType` pipeline option.\n\n### Python\n\nSet the `machine_type` pipeline option.\n\n### Go\n\nSet the `worker_machine_type` pipeline option.\n\n- For Arm, the\n [Tau T2A machine series](/compute/docs/general-purpose-machines#t2a_machines)\n is supported. For more information about using Arm VMs, see\n [Use Arm VMs in Dataflow](/dataflow/docs/guides/use-arm-vms).\n\n- Shared core machine types, such as `f1` and `g1` series workers, are not\n supported under the Dataflow\n [Service Level Agreement](/dataflow/sla).\n\n- Billing is independent of the machine type family. For more information, see\n [Dataflow pricing](/dataflow/pricing).\n\n### Custom machine types\n\nTo specify a custom machine type, use the following format:\n\u003cvar translate=\"no\"\u003eFAMILY\u003c/var\u003e`-`\u003cvar translate=\"no\"\u003evCPU\u003c/var\u003e`-`\u003cvar translate=\"no\"\u003eMEMORY\u003c/var\u003e. Replace the\nfollowing:\n\n- \u003cvar translate=\"no\"\u003eFAMILY\u003c/var\u003e. Use one of the following values:\n\n- \u003cvar translate=\"no\"\u003evCPU\u003c/var\u003e. 
The number of vCPUs.\n- \u003cvar translate=\"no\"\u003eMEMORY\u003c/var\u003e. The memory, in MB.\n\nTo enable\n[extended memory](/compute/docs/instances/creating-instance-with-custom-machine-type#extendedmemory),\nappend `-ext` to the machine type. Examples: `n2-custom-6-3072`,\n`n2-custom-2-32768-ext`.\n\nFor more information about valid custom machine types, see\n[Custom machine types](/compute/docs/general-purpose-machines#custom_machine_types)\nin the Compute Engine documentation.\n\nDisk type\n---------\n\nThe type of [Persistent Disk](/compute/docs/disks#pdspecs) to use.\n\nDon't specify a Persistent Disk when using either\n[Streaming Engine](/dataflow/docs/streaming-engine) or the N4\n[machine type](#machine-type). \n\n### Java\n\nSet the `workerDiskType` pipeline option.\n\n### Python\n\nSet the `worker_disk_type` pipeline option.\n\n### Go\n\nSet the `disk_type` pipeline option.\n\nTo specify the disk type, use the following format:\n`compute.googleapis.com/projects/`\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e`/zones/`\u003cvar translate=\"no\"\u003eZONE\u003c/var\u003e`/diskTypes/`\u003cvar translate=\"no\"\u003eDISK_TYPE\u003c/var\u003e.\n\nReplace the following:\n\n- \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: your project ID\n- \u003cvar translate=\"no\"\u003eZONE\u003c/var\u003e: the zone for the Persistent Disk, for example `us-central1-b`\n- \u003cvar translate=\"no\"\u003eDISK_TYPE\u003c/var\u003e: the disk type, either `pd-ssd` or `pd-standard`\n\nFor more information, see the Compute Engine API reference page for\n[diskTypes](/compute/docs/reference/latest/diskTypes).\n\nDisk size\n---------\n\nThe Persistent Disk size. \n\n### Java\n\nSet the `diskSizeGb` pipeline option.\n\n### Python\n\nSet the `disk_size_gb` pipeline option.\n\n### Go\n\nSet the `disk_size_gb` pipeline option.\n\nIf you set this option, specify at least 30 GB to account for the worker\nboot image and local logs.\n\nLowering the disk size reduces available shuffle I/O. Shuffle-bound jobs\nnot using Dataflow Shuffle or Streaming Engine may result in\nincreased runtime and job cost.\n\n### Batch jobs\n\nFor batch jobs using\n[Dataflow Shuffle](/dataflow/docs/shuffle-for-batch), this option\nsets the size of a worker VM boot disk. For batch jobs not using\nDataflow Shuffle, this option sets the size of the disks used to\nstore shuffled data; the boot disk size is not affected.\n\nIf a batch job uses Dataflow Shuffle, then the default disk size\nis 25 GB. Otherwise, the default is 250 GB.\n\n### Streaming jobs\n\nFor streaming jobs using [Streaming Engine](/dataflow/docs/streaming-engine),\nthis option sets size of the boot disks. For streaming jobs not using\nStreaming Engine, this option sets the size of each additional Persistent Disk\ncreated by the Dataflow service; the boot disk is not affected.\n\nIf a streaming job does not use Streaming Engine, you can set the boot disk size\nwith the experiment flag `streaming_boot_disk_size_gb`. For example, specify\n`--experiments=streaming_boot_disk_size_gb=80` to create boot disks of 80 GB.\n\nIf a streaming job uses Streaming Engine, then the default disk size is\n30 GB. Otherwise, the default is 400 GB.\n\nUse Cloud Storage FUSE to mount your Cloud Storage buckets onto Dataflow VMs\n----------------------------------------------------------------------------\n\nCloud Storage FUSE lets you mount your Cloud Storage buckets directly\nwith Dataflow VMs, allowing software to access files as if they\nare local. 
This integration eliminates the need for pre-downloading data,\nstreamlining data access for your workloads. For more information, see [Process\nML data using Dataflow and\nCloud Storage FUSE](/dataflow/docs/machine-learning/ml-process-dataflow-fuse).\n\nWhat's next\n-----------\n\n- [Set Dataflow pipeline options](/dataflow/docs/guides/setting-pipeline-options)\n- [Use Arm VMs on Dataflow](/dataflow/docs/guides/use-arm-vms)"]]