[[["わかりやすい","easyToUnderstand","thumb-up"],["問題の解決に役立った","solvedMyProblem","thumb-up"],["その他","otherUp","thumb-up"]],[["わかりにくい","hardToUnderstand","thumb-down"],["情報またはサンプルコードが不正確","incorrectInformationOrSampleCode","thumb-down"],["必要な情報 / サンプルがない","missingTheInformationSamplesINeed","thumb-down"],["翻訳に関する問題","translationIssue","thumb-down"],["その他","otherDown","thumb-down"]],["最終更新日 2025-09-04 UTC。"],[[["\u003cp\u003eDataflow allows configuration of worker VMs by setting specific pipeline options when creating a job, including machine type, disk type, and disk size.\u003c/p\u003e\n"],["\u003cp\u003eThe machine type for worker VMs can be set to either x86 or Arm, and custom machine types can be specified using a defined format of family, vCPU count, and memory size.\u003c/p\u003e\n"],["\u003cp\u003eThe type of Persistent Disk can be specified, using a format that includes the project ID, zone, and disk type (either \u003ccode\u003epd-ssd\u003c/code\u003e or \u003ccode\u003epd-standard\u003c/code\u003e), while certain jobs, such as those using Streaming Engine or the N4 machine type, should not have a disk type specified.\u003c/p\u003e\n"],["\u003cp\u003eThe Persistent Disk size can be configured, with a recommendation to set at least 30 GB to account for the worker boot image and local logs; the default sizes vary depending on whether the job is batch or streaming and if Dataflow Shuffle or Streaming Engine are being utilized.\u003c/p\u003e\n"],["\u003cp\u003eShared core machine types like \u003ccode\u003ef1\u003c/code\u003e and \u003ccode\u003eg1\u003c/code\u003e series are not supported under the Dataflow Service Level Agreement, and Arm machines are also supported with the Tau T2A series.\u003c/p\u003e\n"]]],[],null,["# Configure Dataflow worker VMs\n\nThis document describes how to configure the worker VMs for a Dataflow\njob.\n\nBy default, Dataflow selects the machine type for the worker VMs that\nrun your job, along with the size and type of Persistent Disk. To configure the\nworker VMs, set the following\n[pipeline options](/dataflow/docs/reference/pipeline-options#worker-level_options)\nwhen you create the job.\n\nMachine type\n------------\n\nThe Compute Engine [machine type](/compute/docs/machine-types) that\nDataflow uses when starting worker VMs. You can use x86 or Arm\nmachine types, including custom machine types. \n\n### Java\n\nSet the `workerMachineType` pipeline option.\n\n### Python\n\nSet the `machine_type` pipeline option.\n\n### Go\n\nSet the `worker_machine_type` pipeline option.\n\n- For Arm, the\n [Tau T2A machine series](/compute/docs/general-purpose-machines#t2a_machines)\n is supported. For more information about using Arm VMs, see\n [Use Arm VMs in Dataflow](/dataflow/docs/guides/use-arm-vms).\n\n- Shared core machine types, such as `f1` and `g1` series workers, are not\n supported under the Dataflow\n [Service Level Agreement](/dataflow/sla).\n\n- Billing is independent of the machine type family. For more information, see\n [Dataflow pricing](/dataflow/pricing).\n\n### Custom machine types\n\nTo specify a custom machine type, use the following format:\n\u003cvar translate=\"no\"\u003eFAMILY\u003c/var\u003e`-`\u003cvar translate=\"no\"\u003evCPU\u003c/var\u003e`-`\u003cvar translate=\"no\"\u003eMEMORY\u003c/var\u003e. Replace the\nfollowing:\n\n- \u003cvar translate=\"no\"\u003eFAMILY\u003c/var\u003e. Use one of the following values:\n\n- \u003cvar translate=\"no\"\u003evCPU\u003c/var\u003e. 
Disk type
---------

The type of [Persistent Disk](/compute/docs/disks#pdspecs) to use.

Don't specify a Persistent Disk type when using either
[Streaming Engine](/dataflow/docs/streaming-engine) or the N4
[machine type](#machine-type).

### Java

Set the `workerDiskType` pipeline option.

### Python

Set the `worker_disk_type` pipeline option.

### Go

Set the `disk_type` pipeline option.

To specify the disk type, use the following format:
`compute.googleapis.com/projects/PROJECT_ID/zones/ZONE/diskTypes/DISK_TYPE`.

Replace the following:

- `PROJECT_ID`: your project ID
- `ZONE`: the zone for the Persistent Disk, for example `us-central1-b`
- `DISK_TYPE`: the disk type, either `pd-ssd` or `pd-standard`

For more information, see the Compute Engine API reference page for
[diskTypes](/compute/docs/reference/latest/diskTypes).

Disk size
---------

The Persistent Disk size.

### Java

Set the `diskSizeGb` pipeline option.

### Python

Set the `disk_size_gb` pipeline option.

### Go

Set the `disk_size_gb` pipeline option.

If you set this option, specify at least 30 GB to account for the worker
boot image and local logs.

Lowering the disk size reduces available shuffle I/O. For shuffle-bound jobs
that don't use Dataflow Shuffle or Streaming Engine, a smaller disk can
increase runtime and job cost.

### Batch jobs

For batch jobs that use
[Dataflow Shuffle](/dataflow/docs/shuffle-for-batch), this option
sets the size of a worker VM boot disk. For batch jobs that don't use
Dataflow Shuffle, this option sets the size of the disks used to
store shuffled data; the boot disk size is not affected.

If a batch job uses Dataflow Shuffle, the default disk size
is 25 GB. Otherwise, the default is 250 GB.

### Streaming jobs

For streaming jobs that use [Streaming Engine](/dataflow/docs/streaming-engine),
this option sets the size of the boot disks. For streaming jobs that don't use
Streaming Engine, this option sets the size of each additional Persistent Disk
created by the Dataflow service; the boot disk is not affected.

If a streaming job doesn't use Streaming Engine, you can set the boot disk size
with the experiment flag `streaming_boot_disk_size_gb`. For example, specify
`--experiments=streaming_boot_disk_size_gb=80` to create boot disks of 80 GB.

If a streaming job uses Streaming Engine, the default disk size is
30 GB. Otherwise, the default is 400 GB.
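The following Python sketch brings the disk options together. The project ID,
zone, and bucket are placeholders; the flag values follow the formats and
minimums described above.

```python
from apache_beam.options.pipeline_options import PipelineOptions

# A minimal sketch; the project ID, zone, and bucket are placeholders.
project = "my-project"
zone = "us-central1-b"

options = PipelineOptions([
    "--runner=DataflowRunner",
    f"--project={project}",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/temp",
    # Use SSD Persistent Disks for the workers.
    f"--worker_disk_type=compute.googleapis.com/projects/{project}"
    f"/zones/{zone}/diskTypes/pd-ssd",
    # 50 GB disks; specify at least 30 GB for the boot image and local logs.
    "--disk_size_gb=50",
    # For a streaming job without Streaming Engine, the boot disk size can be
    # set separately with the experiment flag, for example:
    # "--experiments=streaming_boot_disk_size_gb=80",
])
```

In Java, the equivalent options are `workerDiskType` and `diskSizeGb`, as
listed earlier.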
Use Cloud Storage FUSE to mount your Cloud Storage buckets onto Dataflow VMs
----------------------------------------------------------------------------

Cloud Storage FUSE lets you mount Cloud Storage buckets directly
on Dataflow VMs, so that software can access files as if they
were local. This integration eliminates the need to download data in advance,
streamlining data access for your workloads. For more information, see [Process
ML data using Dataflow and
Cloud Storage FUSE](/dataflow/docs/machine-learning/ml-process-dataflow-fuse).

What's next
-----------

- [Set Dataflow pipeline options](/dataflow/docs/guides/setting-pipeline-options)
- [Use Arm VMs on Dataflow](/dataflow/docs/guides/use-arm-vms)