# Manage compute profiles

Last updated (UTC): 2025-09-04.

A *compute profile* specifies how and where a pipeline is executed. It encapsulates the information required to set up and delete the pipeline's physical execution environment. A compute profile specifies a [provisioner](/data-fusion/docs/concepts/provisioners) name and the configuration settings for that provisioner.

Each compute profile has a scope: *system* or *user*. You can use a system compute profile in any namespace of the instance.
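
To make the two scopes concrete, the following Python sketch models a compute profile as a provisioner name plus configuration properties, and encodes the scoping rules described on this page: a system profile can be used from any namespace, while a user profile can be used only in the namespace it was created in. `ComputeProfile`, `reference()`, and `usable_in()` are hypothetical names for illustration, not part of a Cloud Data Fusion or CDAP API. The `<scope>:<profile-name>` string mirrors the preference value format used later on this page; the provisioner name `gcp-dataproc` and the property values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

VALID_SCOPES = ("system", "user")


@dataclass
class ComputeProfile:
    """Hypothetical model of a compute profile; not a Cloud Data Fusion API."""

    name: str
    scope: str  # "system" or "user"
    provisioner: str  # provisioner name (the value used below is an assumption)
    properties: Dict[str, str] = field(default_factory=dict)
    namespace: Optional[str] = None  # owning namespace; set only for user profiles

    def __post_init__(self) -> None:
        # Reject combinations that the scoping rules don't allow.
        if self.scope not in VALID_SCOPES:
            raise ValueError(f"unknown scope: {self.scope!r}")
        if self.scope == "user" and not self.namespace:
            raise ValueError("a user profile must belong to a namespace")
        if self.scope == "system" and self.namespace is not None:
            raise ValueError("a system profile is not tied to a namespace")

    def reference(self) -> str:
        """Return the '<scope>:<profile-name>' string, e.g. 'system:small'."""
        return f"{self.scope}:{self.name}"

    def usable_in(self, namespace: str) -> bool:
        """System profiles are usable everywhere; user profiles only in their own namespace."""
        return self.scope == "system" or self.namespace == namespace


# Usage (illustrative values only).
small = ComputeProfile("small", "system", "gcp-dataproc", {"numWorkers": "5"})
nightly = ComputeProfile(
    "nightly-etl", "user", "gcp-dataproc", {"numWorkers": "20"}, namespace="sales"
)
print(small.reference())             # system:small
print(small.usable_in("finance"))    # True
print(nightly.usable_in("sales"))    # True
print(nightly.usable_in("finance"))  # False
```
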

User compute profiles exist within a namespace, and only pipelines in that namespace can use them. You can assign compute profiles to batch pipelines. When a compute profile is assigned to a pipeline, Cloud Data Fusion uses the provisioner specified in the profile to create the cluster that the pipeline runs on.

For example, an administrator might create small, medium, and large compute profiles, and configure each one with the Google Cloud credentials required to create and delete Dataproc clusters in the company's Google Cloud account:

- The small profile is configured to create a 5-node cluster.
- The medium profile is configured to create a 20-node cluster.
- The large profile is configured to create a 50-node cluster.

The administrator assigns the small profile to pipelines that are scheduled to run every hour on small amounts of data, and the large profile to pipelines that are scheduled to run every day on large amounts of data.

Default compute profile
-----------------------

By default, Cloud Data Fusion uses Autoscale as the compute profile. Estimating the appropriate number of cluster workers (nodes) for a workload is difficult, and a single cluster size for an entire pipeline is often not ideal. Dataproc autoscaling automates cluster resource management by automatically scaling the number of worker VMs in a cluster. For more information, see [Autoscaling](/dataproc/docs/concepts/configuring-clusters/autoscaling).

On the **Compute config** page, which lists the compute profiles, the **Total cores** column shows the maximum number of vCPUs that each profile can scale up to, such as `Up to 84`.

> **Note:** Autoscaling can increase costs.
> For example, autoscaling isn't recommended for real-time pipelines or replication jobs: their clusters only scale up, so the additional resources can increase costs.

System and user compute profiles
--------------------------------

A compute profile indicates which provisioner to use when creating a cluster, and specifies the provisioner configuration, including the cluster configuration, to use when creating it.

- To create a *system compute profile*, go to the **System admin** page in Cloud Data Fusion Studio. This page lists all system compute profiles and lets you create new ones.
- To create a *user compute profile*, go to the **Namespace administration** page in Cloud Data Fusion Studio, and then select the namespace to create the profile in. The profile you create exists only within that namespace.

Compute profile assignment
--------------------------

You can assign compute profiles to batch pipelines in the following ways:

- Assign a default profile for the Cloud Data Fusion instance.
- Assign a default profile for a specific namespace.
- Assign a profile to a batch pipeline to use for manually started runs.
- Assign a profile to a pipeline schedule.

When a run starts, Cloud Data Fusion selects its compute profile in the following order:

1. If the schedule that triggers the run has a profile, or if you run the pipeline manually and a profile is assigned to the pipeline, Cloud Data Fusion uses that profile.
2. Otherwise, it uses the namespace's default profile.
3. If the namespace has no default profile, it uses the system default profile.
4. If no system default profile is set, it uses the built-in profile.

Assign a default compute profile
--------------------------------

To assign default profiles to a Cloud Data Fusion namespace or instance, go to Cloud Data Fusion Studio and click **System admin** > **Configuration** > **System compute profiles**.
To select the default, click the star next to the profile name.

Optional: Use the Preferences Microservices to set default profiles
-------------------------------------------------------------------

- To set the default profile for the instance, set a preference on the Cloud Data Fusion instance with the key `system.profile.name` and the value `system:<profile-name>`.
- To set the default profile for a namespace, set a preference on that namespace with the key `system.profile.name` and the value `<scope>:<profile-name>`.

Assign a compute profile for manual runs
----------------------------------------

To assign a profile to use for manual pipeline runs, follow these steps:

1. Go to the pipeline details page.
2. Click **Configure** > **Compute config**.
3. Select a profile and click **Save**. The selected profile is used whenever the pipeline is run manually.

Alternatively, you can use the Preferences Microservices to set the profile for manual runs: set a preference on the `DataPipelineWorkflow` entity with the key `system.profile.name` and the value `<scope>:<profile-name>`.

Assign a compute profile to a schedule
--------------------------------------

Whenever you create a schedule for a pipeline, you can assign a profile to it. Every run that the schedule triggers uses that profile. This applies both to time-based schedules and to schedules triggered by other pipelines.

Override a compute profile configuration
----------------------------------------

When a profile is created, each configuration setting can be made immutable by locking it. Settings that aren't locked can be overridden at runtime. To override a profile's configuration, follow these steps:

1. On the Pipeline List page, select the deployed pipeline you want to run.
2. On the Pipeline Details page, click **Configure**.
3. Choose a compute profile and click **Customize**.
4.
   Change any settings and click **Save**.

You can also use runtime arguments and schedule properties to modify the cluster size and other settings:

- To override the profile used, set a runtime argument with the key `system.profile.name` and the value `<scope>:<profile-name>`.
- To override a profile property, set a runtime argument with the key `system.profile.properties.<property-name>` and the desired value for that property.

For example, to override the `numWorkers` setting with a value of `10`, set a preference or runtime argument with the key `system.profile.properties.numWorkers` and the value `10`.

What's next
-----------

- Learn more about [provisioners in Cloud Data Fusion](/data-fusion/docs/concepts/provisioners).
- Learn more about [Dataproc cluster configuration](/data-fusion/docs/concepts/dataproc).