
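Create and run a job that uses GPUs

This document explains how to create and run a job that uses a graphics processing unit (GPU). To learn more about the features and restrictions of GPUs, see About GPUs in the Compute Engine documentation.

When you create a Batch job, you can optionally use GPUs to accelerate specific workloads. Common use cases for jobs that use GPUs include intensive data processing and artificial intelligence (AI) workloads such as machine learning (ML).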

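Before you begin

If you haven't used Batch before, review Get started with Batch and enable Batch by completing the prerequisites for projects and users.
To get the permissions that you need to create a job, ask your administrator to grant you the following IAM roles:
- Batch Job Editor (roles/batch.jobsEditor) on the project
- Service Account User (roles/iam.serviceAccountUser) on the job's service account, which by default is the default Compute Engine service account
For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.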

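Create a job that uses GPUs

To create a job that uses GPUs, do the following:

1. Plan the requirements for a job that uses GPUs.
2. Create a job with the requirements and methods that you identified. For examples of how to create a job using the recommended options, see Create an example job that uses GPUs in this document.

Plan the requirements for a job that uses GPUs

Before you create a job that uses GPUs, plan the job's requirements as explained in the following sections:

1. Choose the GPU machine type and provisioning method
2. Install the GPU drivers
3. Define compatible VM resources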

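Step 1: Choose the GPU machine type and provisioning method

A job's requirements vary based on your preferred GPU machine type and provisioning method, and the options for each can be interdependent. Depending on your requirements and priorities, you can choose either the GPU machine type or the provisioning method first. Generally, the GPU machine type primarily affects performance and base pricing, and the provisioning method primarily affects resource availability and additional costs or discounts.

Choose the GPU machine type

The available GPU machine types (the valid combinations of GPU type, number of GPUs, and machine type [vCPUs and memory]) and their use cases are listed on the GPU machine types page in the Compute Engine documentation.

The fields that a job requires to specify a GPU machine type vary based on the following categories: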

GPU machine types and their job requirements

GPUs for accelerator-optimized VMs: VMs with a machine type from the accelerator-optimized machine family have a specific type and number of these GPUs automatically attached.

To use GPUs for accelerator-optimized VMs, we recommend that you specify the machine type. Each accelerator-optimized machine type supports only a specific type and number of GPUs, so the result is functionally equivalent whether or not you specify those values in addition to the accelerator-optimized machine type.

Batch also supports specifying only the type and number of GPUs for accelerator-optimized VMs, but the resulting vCPU and memory options are often very limited. For that reason, we recommend that you verify that the available vCPU and memory options are compatible with the job's task requirements.

GPUs for N1 VMs: For these GPUs, you specify the type and number to attach to each VM, and they must be attached to VMs with a machine type from the N1 machine series.

To use GPUs for N1 VMs, we recommend that you specify at least the GPU type and the number of GPUs. Make sure that the combination of values matches one of the valid GPU options for N1 machine types. The vCPU and memory options for N1 VMs that use any specific type and number of GPUs are quite flexible. Unless you create the job using the Google Cloud console, you can let Batch automatically select a machine type that meets the job's task requirements.

Note: If a job specifies an N1 machine type but doesn't specify a GPU type or number of GPUs, Batch doesn't use GPUs for that job.
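As a rough illustration of the two field combinations described above, the following Python sketch builds the `policy` object of a job configuration. The helper name `build_gpu_policy`, the `n1-` prefix check, and the example values are assumptions made for illustration; only the field names `machineType`, `accelerators`, `type`, and `count` come from the job configurations shown later in this document.

```python
from typing import Optional


def build_gpu_policy(
    machine_type: Optional[str] = None,
    gpu_type: Optional[str] = None,
    gpu_count: Optional[int] = None,
) -> dict:
    """Return a `policy` dict for allocationPolicy.instances[] (hypothetical helper)."""
    if (gpu_type is None) != (gpu_count is None):
        raise ValueError("Specify the GPU type and the GPU count together.")
    has_gpu_fields = gpu_type is not None
    if machine_type is None and not has_gpu_fields:
        raise ValueError("Specify a machine type, or a GPU type and count.")
    # Assumption: N1 machine type names start with "n1-".
    if machine_type is not None and machine_type.startswith("n1-") and not has_gpu_fields:
        raise ValueError("An N1 machine type without GPU type and count gets no GPUs.")

    policy: dict = {}
    if machine_type is not None:
        policy["machineType"] = machine_type
    if has_gpu_fields:
        policy["accelerators"] = [{"type": gpu_type, "count": gpu_count}]
    return policy


# Accelerator-optimized VM: the machine type alone implies the GPUs.
print(build_gpu_policy(machine_type="g2-standard-4"))
# N1 VM: GPU type and count only, so Batch can pick the N1 machine type.
print(build_gpu_policy(gpu_type="nvidia-tesla-t4", gpu_count=2))
```

The sketch only checks which fields are present. It does not verify that a GPU type and count form a valid combination for a machine type, which still needs to be checked against the Compute Engine documentation.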

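Choose the provisioning method

Batch uses different methods to provision the VM resources for jobs that use GPUs, based on the type of resources that the job requests. The following section explains the available provisioning methods and their requirements, listed by use case from highest to lowest resource availability.

In summary, we recommend that most users do the following:

- If you intend to use A3 GPU machine types without a reservation, use Dynamic Workload Scheduler for Batch (Preview).

  Note: If you want to use Dynamic Workload Scheduler for Batch with other GPU machine types, contact the Google Cloud sales team or your customer support team.
- For all other GPU machine types, use the default provisioning method. The default provisioning method is usually on-demand. The exception is when your project has unused reservations that the job can automatically consume.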

Provisioning methods and their job requirements

Reservations

Use case: We recommend reservations for jobs if you want a very high level of assurance of resource availability, or if you already have existing reservations that might be unused.
Details: A reservation incurs the costs for the specified VMs at the same price as running VMs until you delete the reservation. VMs that consume a reservation don't incur separate costs, but the reservation incurs costs regardless of whether it is consumed.

Batch uses reservations for jobs that can consume unused reservations. For more information about reservations and their requirements, see the Ensure resource availability using VM reservations page.

Dynamic Workload Scheduler for Batch (Preview)

Use case: We recommend Dynamic Workload Scheduler if you want to use GPUs for VMs with a machine type from the A3 machine series without consuming a reservation.
Details: Dynamic Workload Scheduler can make it easier to access many resources at the same time that help accelerate AI and ML workloads. For example, Dynamic Workload Scheduler can help with job scheduling by mitigating delays or issues caused by unavailable resources.

Important: Unlike other jobs, Batch jobs that use GPUs through Dynamic Workload Scheduler use resize requests for Compute Engine managed instance groups (MIGs), which behave slightly differently. Specifically, jobs that use GPUs through Dynamic Workload Scheduler might require preemptible allocation quota, which is the recommended option for mitigating GPU quota issues with Dynamic Workload Scheduler. For more information, see GPU VMs and preemptible allocation quota.

Batch uses Dynamic Workload Scheduler for jobs that do all of the following:

- Specify an A3 GPU machine type.
- Block reservations. Specifically, the job must set the reservation field to NO_RESERVATION. For more information, see Create and run a job that can't consume reserved VMs.
- Don't use Spot VMs. Specifically, the job can omit the provisioningModel field or set the provisioningModel field to STANDARD.

Tip: Although you can run the job in any location where A3 VMs are available, we recommend the location us-central1, because that location has capacity dedicated to Dynamic Workload Scheduler.

On-demand

Use case: We recommend on-demand for all other jobs.
Details: On-demand is usually the default way to access Compute Engine VMs. With on-demand, you request resources one VM at a time and access them immediately if they are available.

Batch uses on-demand for all other jobs.

Spot VMs

Use case: We recommend trying Spot VMs to reduce costs for fault-tolerant workloads.

Note: Spot VMs might not always be available. You can improve resource availability by following the Spot VMs best practices. However, if issues persist, you might need to use a different provisioning method.
Details: Spot VMs provide significant discounts, but they might not always be available and can be preempted at any time. For more information, see Spot VMs in the Compute Engine documentation.

Batch uses Spot VMs for jobs that set the provisioningModel field to SPOT.
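The selection rules above can be condensed into a small Python sketch. This is a simplified reading of the rules in this section, not Batch's actual scheduling logic: the function name, the `has_usable_reservation` flag, the `a3-` prefix check, and the order in which the rules are evaluated are all assumptions.

```python
from typing import Optional


def provisioning_method(
    machine_type: str,
    reservation: Optional[str] = None,
    provisioning_model: Optional[str] = None,
    has_usable_reservation: bool = False,
) -> str:
    """Guess which provisioning method applies to a job (illustrative only)."""
    # Jobs that set provisioningModel to SPOT use Spot VMs.
    if provisioning_model == "SPOT":
        return "spot"
    blocks_reservations = reservation == "NO_RESERVATION"
    # Assumption: A3 machine type names start with "a3-".
    if machine_type.startswith("a3-") and blocks_reservations:
        return "dynamic-workload-scheduler"
    # Jobs that can consume an unused reservation use it.
    if has_usable_reservation and not blocks_reservations:
        return "reservation"
    # Everything else falls back to on-demand.
    return "on-demand"


print(provisioning_method("a3-highgpu-8g", reservation="NO_RESERVATION"))
print(provisioning_method("g2-standard-4", provisioning_model="SPOT"))
print(provisioning_method("g2-standard-4", has_usable_reservation=True))
print(provisioning_method("g2-standard-4"))
```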

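Step 2: Install the GPU drivers

For a job to use GPUs, you must install GPU drivers. To install the GPU drivers, choose one of the following methods:

Install GPU drivers automatically (recommended if possible): As shown in the examples, to let Batch fetch the required GPU drivers from a third-party location and install them on your behalf, set the job's installGpuDrivers field to true. We recommend this method if your job doesn't require you to install drivers manually.

Optional: If you need to specify which version of the GPU driver Batch installs, also set the driverVersion field.
Install GPU drivers manually: This method is required if any of the following are true:

Important: Due to a known issue, you might also need to install drivers manually for jobs that specify certain Compute Engine images. For more information, see Jobs with GPUs and VM OS images with outdated kernels fail only when automatically installing drivers.
- The job uses both script and container runnables and doesn't have internet access. For more information about the access that a job has, see Batch networking overview.
- The job uses a custom VM image. To learn more about VM OS images and which VM OS images you can use, see VM OS environment overview.
To install the required GPU drivers manually, we recommend the following method:
1. Create a custom VM image that includes the GPU drivers.
  1. To install the GPU drivers, run an installation script based on the OS that you want to use:
    - GPU drivers for Container-Optimized OS
    - GPU drivers for other operating systems
  2. If your job has any container runnables and doesn't use Container-Optimized OS, you must also install the NVIDIA Container Toolkit.
2. When you create and submit a job that uses GPUs, specify the custom VM image that includes the GPU drivers and set the job's installGpuDrivers field to false (default).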
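The two conditions that require manual installation can be written as a short check. The following Python sketch is only an illustration: the `JobTraits` class and its attribute names are hypothetical, and the known issue with certain Compute Engine images mentioned above is not modeled.

```python
from dataclasses import dataclass


@dataclass
class JobTraits:
    """Hypothetical summary of the job properties that matter for driver installation."""

    uses_script_runnables: bool
    uses_container_runnables: bool
    has_internet_access: bool
    uses_custom_vm_image: bool


def requires_manual_driver_install(job: JobTraits) -> bool:
    """Return True if either condition listed in this step applies."""
    mixed_runnables_offline = (
        job.uses_script_runnables
        and job.uses_container_runnables
        and not job.has_internet_access
    )
    return mixed_runnables_offline or job.uses_custom_vm_image


def install_gpu_drivers_value(job: JobTraits) -> bool:
    """Value for the job's installGpuDrivers field: automatic unless manual is required."""
    return not requires_manual_driver_install(job)


script_only_job = JobTraits(True, False, True, False)
custom_image_job = JobTraits(True, False, True, True)
print(install_gpu_drivers_value(script_only_job))
print(install_gpu_drivers_value(custom_image_job))
```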

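Step 3: Define compatible VM resources

To learn about the requirements and options for defining the VM resources for a job, see Job resources.

In summary, you must do all of the following when you define the VM resources for a job that uses GPUs:

- Make sure that the GPU machine type is available in the location of your job's VMs.

  To learn where GPU machine types are available, see GPU availability by regions and zones in the Compute Engine documentation.
- If you specify the job's machine type, make sure that the machine type has enough vCPUs and memory for the job's task requirements. Specifying the job's machine type is required whenever you create a job using the Google Cloud console, and recommended whenever you create a job that uses GPUs for accelerator-optimized VMs.
- Make sure that you define the VM resources for the job using a valid method:
  - Define VM resources directly using the instances[].policy field (recommended if possible). This method is shown in the examples.
  - Define VM resources through a template using the instances[].instanceTemplate field. This method is required for manually installing GPU drivers through a custom image. For more information, see Define job resources using a VM instance template.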
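To make the vCPU and memory check concrete, the following Python sketch estimates how many tasks fit on one VM. It is a back-of-the-envelope calculation under stated assumptions, not how Batch schedules tasks: the function name is hypothetical, the machine figures (4 vCPUs and 16 GiB) are example values for a machine type like g2-standard-4, and the per-task figures mirror the `cpu_milli` and `memory_mib` values used in the Python sample later in this document. It ignores any resources that the OS or agents consume.

```python
def tasks_per_vm(
    vm_vcpus: int,
    vm_memory_mib: int,
    task_cpu_milli: int,
    task_memory_mib: int,
) -> int:
    """Estimate how many tasks fit on one VM by vCPU and by memory."""
    if task_cpu_milli <= 0 or task_memory_mib <= 0:
        raise ValueError("Per-task requirements must be positive.")
    fit_by_cpu = (vm_vcpus * 1000) // task_cpu_milli  # 1 vCPU = 1000 milli-CPU
    fit_by_memory = vm_memory_mib // task_memory_mib
    return min(fit_by_cpu, fit_by_memory)


# Example values: 4 vCPUs, 16 GiB of memory, tasks that need 2 vCPUs and 16 MiB each.
fit = tasks_per_vm(vm_vcpus=4, vm_memory_mib=16 * 1024, task_cpu_milli=2000, task_memory_mib=16)
print(fit)  # 2: vCPUs are the limiting resource here
if fit == 0:
    print("The machine type is too small for a single task.")
```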

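Create an example job that uses GPUs

The following sections explain how to create an example job for each GPU machine type using the recommended options. Specifically, the example jobs all install GPU drivers automatically, all define VM resources directly, and either specify the provisioning method or use the default provisioning method.

- Use GPUs for A3 VMs through Dynamic Workload Scheduler for Batch (Preview)
- Use GPUs for accelerator-optimized VMs
- Use GPUs for N1 VMs

Use GPUs for A3 VMs through Dynamic Workload Scheduler for Batch (Preview)

You can create a job that uses GPUs for A3 VMs through Dynamic Workload Scheduler using the gcloud CLI or the Batch API.

gcloud

Create a JSON file that installs the GPU drivers, specifies a machine type from the A3 machine series, blocks reservations, and runs in a location that has the GPU machine type.

For example, to create a basic script job that uses GPUs for A3 VMs through Dynamic Workload Scheduler, create a JSON file with the following contents: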

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "machineType": "MACHINE_TYPE",
                    "reservation": "NO_RESERVATION"
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

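Replace the following:

- INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
- MACHINE_TYPE: a machine type from the A3 machine series.
- ALLOWED_LOCATIONS: You can optionally use the allowedLocations[] field to specify a region, or specific zones in a region, where the VMs for your job are allowed to run. For example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure that the job's location offers the GPU machine type.

To create and run the job, use the gcloud batch jobs submit command:
```
gcloud batch jobs submit JOB_NAME \
    --location LOCATION \
    --config JSON_CONFIGURATION_FILE
```
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path to a JSON file with the job's configuration details.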
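If you prefer to generate the configuration file instead of writing it by hand, a script along the following lines could work. This is a minimal sketch: the function name, the file name, the job name, and the example values (`a3-highgpu-8g`, `regions/us-central1`) are assumptions chosen for illustration, and the script only writes the file and prints the command rather than submitting the job.

```python
import json
import tempfile
from pathlib import Path


def dws_job_config(machine_type: str, allowed_locations: list, install_gpu_drivers: bool = True) -> dict:
    """Build the job configuration shown above for an A3 job that blocks reservations."""
    return {
        "taskGroups": [
            {
                "taskSpec": {
                    "runnables": [
                        {"script": {"text": "echo Hello world from task ${BATCH_TASK_INDEX}."}}
                    ]
                },
                "taskCount": 3,
                "parallelism": 1,
            }
        ],
        "allocationPolicy": {
            "instances": [
                {
                    "installGpuDrivers": install_gpu_drivers,
                    "policy": {
                        "machineType": machine_type,
                        "reservation": "NO_RESERVATION",
                    },
                }
            ],
            "location": {"allowedLocations": allowed_locations},
        },
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


config = dws_job_config("a3-highgpu-8g", ["regions/us-central1"])
# Hypothetical file name; written to the system's temporary directory.
config_path = Path(tempfile.gettempdir()) / "dws-gpu-job.json"
config_path.write_text(json.dumps(config, indent=4))

# Print the command to run; "example-dws-job" is a placeholder job name.
print(
    "gcloud batch jobs submit example-dws-job "
    f"--location us-central1 --config {config_path}"
)
```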

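API

Make a POST request to the jobs.create method that installs the GPU drivers, specifies a machine type from the A3 machine series, blocks reservations, and runs in a location that has the GPU machine type.

For example, to create a basic script job that uses GPUs for A3 VMs through Dynamic Workload Scheduler, make the following request: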

POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "machineType": "MACHINE_TYPE",
                    "reservation": "NO_RESERVATION"
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

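Replace the following:

- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job.
- JOB_NAME: the name of the job.
- INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
- MACHINE_TYPE: a machine type from the A3 machine series.
- ALLOWED_LOCATIONS: You can optionally use the allowedLocations[] field to specify a region, or specific zones in a region, where the VMs for your job are allowed to run. For example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure that the job's location offers the GPU machine type.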
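The request above can also be assembled programmatically. The following Python sketch uses only the standard library to build the POST request without sending it. The function name and the example values are hypothetical, and the bearer-token header assumes that you already have an OAuth access token (for example, one printed by `gcloud auth print-access-token`).

```python
import json
import urllib.parse
import urllib.request


def build_create_job_request(
    project_id: str,
    location: str,
    job_name: str,
    job_body: dict,
    access_token: str,
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the jobs.create method."""
    query = urllib.parse.urlencode({"job_id": job_name})
    url = (
        "https://batch.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/jobs?{query}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(job_body).encode("utf-8"),
        method="POST",
        headers={
            # Assumption: authentication with a bearer access token.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


request = build_create_job_request(
    project_id="example-project",
    location="us-central1",
    job_name="example-gpu-job",
    job_body={"logsPolicy": {"destination": "CLOUD_LOGGING"}},  # abbreviated body
    access_token="ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)
```

To actually submit the job, you would pass the complete job body and a real token, then send the request, for example with `urllib.request.urlopen(request)`.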

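Use GPUs for accelerator-optimized VMs

You can create a job that uses GPUs for accelerator-optimized VMs using the Google Cloud console, gcloud CLI, Batch API, Java, Node.js, or Python.

Console

To create a job that uses GPUs using the Google Cloud console, do the following:

In the Google Cloud console, go to the Job list page.

Go to Job list
Click Create. The Create batch job page opens. In the left pane, the Job details page is selected.
Configure the Job details page:
1. Optional: In the Job name field, customize the job name.

  For example, enter example-gpu-job.
2. Configure the Task details section:
  1. In the New runnable window, add at least one script or container for this job to run.

    For example, to create a basic script job, do the following:
    1. Select the Script checkbox. A field appears.
    2. In the field, enter the following script:
      echo Hello world from task ${BATCH_TASK_INDEX}.
    3. Click Done.
  2. In the Task count field, enter the number of tasks for this job.

    For example, enter 3.
  3. Optional: In the Parallelism field, enter the number of tasks to run concurrently.

    For example, enter 1 (default).
Configure the Resource specifications page:
1. In the left pane, click Resource specifications. The Resource specifications page opens.
2. Optional: In the VM provisioning model section, select one of the following provisioning model options for this job's VMs:
  - If your job can withstand preemption and you want discounted VMs, select Spot.
  - Otherwise, select Standard (default).
3. Select the location for this job.
  1. In the Region field, select a region.
  2. In the Zone field, do one of the following:
    - If you want to restrict this job to run in a specific zone only, select a zone.
    - Otherwise, select any (default).
  Important: Make sure that you only specify locations that offer the GPU machine type that you want for this job.
4. Select the GPU machine type for this job's VMs:
  1. In the machine family options, click GPUs.
  2. In the GPU type field, select the type of GPUs. Then, in the Number of GPUs field, select the number of GPUs for each VM.

    If you selected one of the GPU types for accelerator-optimized VMs, the Machine type field allows only one machine type option, based on the type and number of GPUs that you selected.
  3. To automatically install GPU drivers, select GPU driver installation (default).
5. Configure the amount of VM resources required for each task:

  Important: Make sure that the GPU machine type has enough VM resources for the job's task requirements.
  1. In the Cores field, enter the number of vCPUs per task.

    For example, enter 1 (default).
  2. In the Memory field, enter the amount of RAM in GB per task.

    For example, enter 0.5 (default).
6. Click Done.
Optional: Configure the other fields for this job.
Optional: To review the job configuration, in the left pane, click Preview.
Click Create.

The Job details page displays the job that you created.

gcloud

Create a JSON file that installs the GPU drivers, specifies a machine type from the accelerator-optimized machine family, and runs in a location that has the GPU machine type.

For example, to create a basic script job that uses GPUs for accelerator-optimized VMs, create a JSON file with the following contents: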

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "machineType": "MACHINE_TYPE"
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

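Replace the following:

- INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
- MACHINE_TYPE: a machine type from the accelerator-optimized machine family.
- ALLOWED_LOCATIONS: You can optionally use the allowedLocations[] field to specify a region, or specific zones in a region, where the VMs for your job are allowed to run. For example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure that the job's location offers the GPU machine type.

To create and run the job, use the gcloud batch jobs submit command:
```
gcloud batch jobs submit JOB_NAME \
    --location LOCATION \
    --config JSON_CONFIGURATION_FILE
```
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path to a JSON file with the job's configuration details.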

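API

Make a POST request to the jobs.create method that installs the GPU drivers, specifies a machine type from the accelerator-optimized machine family, and runs in a location that has the GPU machine type.

For example, to create a basic script job that uses GPUs for accelerator-optimized VMs, make the following request: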

POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "machineType": "MACHINE_TYPE"
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

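Replace the following:

- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job.
- JOB_NAME: the name of the job.
- INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
- MACHINE_TYPE: a machine type from the accelerator-optimized machine family.
- ALLOWED_LOCATIONS: You can optionally use the allowedLocations[] field to specify a region, or specific zones in a region, where the VMs for your job are allowed to run. For example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure that the job's location offers the GPU machine type.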

Java


import com.google.cloud.batch.v1.AllocationPolicy;
import com.google.cloud.batch.v1.AllocationPolicy.Accelerator;
import com.google.cloud.batch.v1.AllocationPolicy.InstancePolicy;
import com.google.cloud.batch.v1.AllocationPolicy.InstancePolicyOrTemplate;
import com.google.cloud.batch.v1.BatchServiceClient;
import com.google.cloud.batch.v1.CreateJobRequest;
import com.google.cloud.batch.v1.Job;
import com.google.cloud.batch.v1.LogsPolicy;
import com.google.cloud.batch.v1.Runnable;
import com.google.cloud.batch.v1.Runnable.Script;
import com.google.cloud.batch.v1.TaskGroup;
import com.google.cloud.batch.v1.TaskSpec;
import com.google.protobuf.Duration;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateGpuJob {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to use.
    String projectId = "YOUR_PROJECT_ID";
    // Name of the region you want to use to run the job. Regions that are
    // available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
    String region = "europe-central2";
    // The name of the job that will be created.
    // It needs to be unique for each project and region pair.
    String jobName = "JOB_NAME";
    // Optional. When set to true, Batch fetches the drivers required for the GPU type
    // that you specify in the policy field from a third-party location,
    // and Batch installs them on your behalf. If you set this field to false (default),
    // you need to install GPU drivers manually to use any GPUs for this job.
    boolean installGpuDrivers = false;
    // Accelerator-optimized machine types are available to Batch jobs. See the list
    // of available types on: https://cloud.google.com/compute/docs/accelerator-optimized-machines
    String machineType = "g2-standard-4";

    createGpuJob(projectId, region, jobName, installGpuDrivers, machineType);
  }

  // Create a job that uses GPUs
  public static Job createGpuJob(String projectId, String region, String jobName,
                                  boolean installGpuDrivers, String machineType)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (BatchServiceClient batchServiceClient = BatchServiceClient.create()) {
      // Define what will be done as part of the job.
      Runnable runnable =
          Runnable.newBuilder()
              .setScript(
                  Script.newBuilder()
                      .setText(
                          "echo Hello world! This is task ${BATCH_TASK_INDEX}. "
                                  + "This job has a total of ${BATCH_TASK_COUNT} tasks.")
                      // You can also run a script from a file. Just remember, that needs to be a
                      // script that's already on the VM that will be running the job.
                      // Using setText() and setPath() is mutually exclusive.
                      // .setPath("/tmp/test.sh")
                      .build())
              .build();

      TaskSpec task = TaskSpec.newBuilder()
                  // Jobs can be divided into tasks. In this case, we have only one task.
                  .addRunnables(runnable)
                  .setMaxRetryCount(2)
                  .setMaxRunDuration(Duration.newBuilder().setSeconds(3600).build())
                  .build();

      // Tasks are grouped inside a job using TaskGroups.
      // Currently, it's possible to have only one task group.
      TaskGroup taskGroup = TaskGroup.newBuilder()
          .setTaskCount(3)
          .setParallelism(1)
          .setTaskSpec(task)
          .build();

      // Policies are used to define on what kind of virtual machines the tasks will run.
      // Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
      InstancePolicy instancePolicy =
          InstancePolicy.newBuilder().setMachineType(machineType).build();  

      // Policies are used to define on what kind of virtual machines the tasks will run on.
      AllocationPolicy allocationPolicy =
          AllocationPolicy.newBuilder()
              .addInstances(
                  InstancePolicyOrTemplate.newBuilder()
                      .setInstallGpuDrivers(installGpuDrivers)
                      .setPolicy(instancePolicy)
                      .build())
              .build();

      Job job =
          Job.newBuilder()
              .addTaskGroups(taskGroup)
              .setAllocationPolicy(allocationPolicy)
              .putLabels("env", "testing")
              .putLabels("type", "script")
              // We use Cloud Logging as it's an out of the box available option.
              .setLogsPolicy(
                  LogsPolicy.newBuilder().setDestination(LogsPolicy.Destination.CLOUD_LOGGING))
              .build();

      CreateJobRequest createJobRequest =
          CreateJobRequest.newBuilder()
              // The job's parent is the region in which the job will run.
              .setParent(String.format("projects/%s/locations/%s", projectId, region))
              .setJob(job)
              .setJobId(jobName)
              .build();

      Job result =
          batchServiceClient
              .createJobCallable()
              .futureCall(createJobRequest)
              .get(5, TimeUnit.MINUTES);

      System.out.printf("Successfully created the job: %s", result.getName());

      return result;
    }
  }
}

Node.js

// Imports the Batch library
const batchLib = require('@google-cloud/batch');
const batch = batchLib.protos.google.cloud.batch.v1;

// Instantiates a client
const batchClient = new batchLib.v1.BatchServiceClient();

/**
 * TODO(developer): Update these variables before running the sample.
 */
// Project ID or project number of the Google Cloud project you want to use.
const projectId = await batchClient.getProjectId();
// Name of the region you want to use to run the job. Regions that are
// available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
const region = 'europe-central2';
// The name of the job that will be created.
// It needs to be unique for each project and region pair.
const jobName = 'batch-gpu-job';
// The GPU type. You can view a list of the available GPU types
// by using the `gcloud compute accelerator-types list` command.
const gpuType = 'nvidia-l4';
// The number of GPUs of the specified type.
const gpuCount = 1;
// Optional. When set to true, Batch fetches the drivers required for the GPU type
// that you specify in the policy field from a third-party location,
// and Batch installs them on your behalf. If you set this field to false (default),
// you need to install GPU drivers manually to use any GPUs for this job.
const installGpuDrivers = false;
// Accelerator-optimized machine types are available to Batch jobs. See the list
// of available types on: https://cloud.google.com/compute/docs/accelerator-optimized-machines
const machineType = 'g2-standard-4';

// Define what will be done as part of the job.
// A script runnable takes its shell commands from the `text` (or `path`) field.
const runnable = new batch.Runnable({
  script: new batch.Runnable.Script({
    text: 'echo Hello world! This is task ${BATCH_TASK_INDEX}.',
  }),
});

const task = new batch.TaskSpec({
  runnables: [runnable],
  maxRetryCount: 2,
  maxRunDuration: {seconds: 3600},
});

// Tasks are grouped inside a job using TaskGroups.
const group = new batch.TaskGroup({
  taskCount: 3,
  taskSpec: task,
});

// Policies are used to define on what kind of virtual machines the tasks will run on.
// In this case, we tell the system to use "g2-standard-4" machine type.
// Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
const instancePolicy = new batch.AllocationPolicy.InstancePolicy({
  machineType,
  // Accelerator describes Compute Engine accelerators to be attached to the VM
  accelerators: [
    new batch.AllocationPolicy.Accelerator({
      type: gpuType,
      count: gpuCount,
      installGpuDrivers,
    }),
  ],
});

// The allocation policy holds the list of instance policies (or templates) for the job.
const allocationPolicy = new batch.AllocationPolicy({
  instances: [{installGpuDrivers, policy: instancePolicy}],
});

const job = new batch.Job({
  name: jobName,
  taskGroups: [group],
  labels: {env: 'testing', type: 'script'},
  allocationPolicy,
  // We use Cloud Logging as it's an option available out of the box
  logsPolicy: new batch.LogsPolicy({
    destination: batch.LogsPolicy.Destination.CLOUD_LOGGING,
  }),
});
// The job's parent is the project and region in which the job will run
const parent = `projects/${projectId}/locations/${region}`;

async function callCreateBatchGPUJob() {
  // Construct request
  const request = {
    parent,
    jobId: jobName,
    job,
  };

  // Run request
  const [response] = await batchClient.createJob(request);
  console.log(JSON.stringify(response));
}

await callCreateBatchGPUJob();

Python

from google.cloud import batch_v1


def create_gpu_job(project_id: str, region: str, job_name: str) -> batch_v1.Job:
    """
    This method shows how to create a sample Batch Job that will run
    a simple command on Cloud Compute instances on GPU machines.

    Args:
        project_id: project ID or project number of the Cloud project you want to use.
        region: name of the region you want to use to run the job. Regions that are
            available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
        job_name: the name of the job that will be created.
            It needs to be unique for each project and region pair.

    Returns:
        A job object representing the job created.
    """
    client = batch_v1.BatchServiceClient()

    # Define what will be done as part of the job.
    task = batch_v1.TaskSpec()
    runnable = batch_v1.Runnable()
    runnable.script = batch_v1.Runnable.Script()
    runnable.script.text = "echo Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total of ${BATCH_TASK_COUNT} tasks."
    # You can also run a script from a file. Just remember, that needs to be a script that's
    # already on the VM that will be running the job. Using runnable.script.text and runnable.script.path is mutually
    # exclusive.
    # runnable.script.path = '/tmp/test.sh'
    task.runnables = [runnable]

    # We can specify what resources are requested by each task.
    resources = batch_v1.ComputeResource()
    resources.cpu_milli = 2000  # in milliseconds per cpu-second. This means the task requires 2 whole CPUs.
    resources.memory_mib = 16  # in MiB
    task.compute_resource = resources

    task.max_retry_count = 2
    task.max_run_duration = "3600s"

    # Tasks are grouped inside a job using TaskGroups.
    # Currently, it's possible to have only one task group.
    group = batch_v1.TaskGroup()
    group.task_count = 4
    group.task_spec = task

    # Policies are used to define on what kind of virtual machines the tasks will run on.
    # In this case, we tell the system to use "g2-standard-4" machine type.
    # Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
    policy = batch_v1.AllocationPolicy.InstancePolicy()
    policy.machine_type = "g2-standard-4"

    instances = batch_v1.AllocationPolicy.InstancePolicyOrTemplate()
    instances.policy = policy
    instances.install_gpu_drivers = True
    allocation_policy = batch_v1.AllocationPolicy()
    allocation_policy.instances = [instances]

    job = batch_v1.Job()
    job.task_groups = [group]
    job.allocation_policy = allocation_policy
    job.labels = {"env": "testing", "type": "script"}
    # We use Cloud Logging as it's an out of the box available option
    job.logs_policy = batch_v1.LogsPolicy()
    job.logs_policy.destination = batch_v1.LogsPolicy.Destination.CLOUD_LOGGING

    create_request = batch_v1.CreateJobRequest()
    create_request.job = job
    create_request.job_id = job_name
    # The job's parent is the region in which the job will run
    create_request.parent = f"projects/{project_id}/locations/{region}"

    return client.create_job(create_request)

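Use GPUs for N1 VMs

You can create a job that uses GPUs for N1 VMs using the Google Cloud console, gcloud CLI, Batch API, Java, Node.js, or Python.

Console

To create a job that uses GPUs using the Google Cloud console, do the following:

In the Google Cloud console, go to the Job list page.

Go to Job list
Click Create. The Create batch job page opens. In the left pane, the Job details page is selected.
Configure the Job details page:
1. Optional: In the Job name field, customize the job name.

  For example, enter example-gpu-job.
2. Configure the Task details section:
  1. In the New runnable window, add at least one script or container for this job to run.

    For example, to create a basic script job, do the following:
    1. Select the Script checkbox. A field appears.
    2. In the field, enter the following script:
      echo Hello world from task ${BATCH_TASK_INDEX}.
    3. Click Done.
  2. In the Task count field, enter the number of tasks for this job.

    For example, enter 3.
  3. Optional: In the Parallelism field, enter the number of tasks to run concurrently.

    For example, enter 1 (default).
Configure the Resource specifications page:
1. In the left pane, click Resource specifications. The Resource specifications page opens.
2. Optional: In the VM provisioning model section, select one of the following provisioning model options for this job's VMs:
  - If your job can withstand preemption and you want discounted VMs, select Spot.
  - Otherwise, select Standard (default).
3. Select the location for this job.
  1. In the Region field, select a region.
  2. In the Zone field, do one of the following:
    - If you want to restrict this job to run in a specific zone only, select a zone.
    - Otherwise, select any (default).
  Important: Make sure that you only specify locations that offer the GPU machine type that you want for this job.
4. Select the GPU machine type for this job's VMs:
  1. In the machine family options, click GPUs.
  2. In the GPU type field, select the type of GPUs.

    If you selected one of the GPU types for N1 VMs, the Series field is set to N1.
  3. In the Number of GPUs field, select the number of GPUs for each VM.
  4. In the Machine type field, select the machine type.
  5. To automatically install GPU drivers, select GPU driver installation (default).
5. Configure the amount of VM resources required for each task:

  Important: Make sure that the GPU machine type has enough VM resources for the job's task requirements.
  1. In the Cores field, enter the number of vCPUs per task.

    For example, enter 1 (default).
  2. In the Memory field, enter the amount of RAM in GB per task.

    For example, enter 0.5 (default).
6. Click Done.
Optional: Configure the other fields for this job.
Optional: To review the job configuration, in the left pane, click Preview.
Click Create.

The Job details page displays the job that you created.

gcloud

Create a JSON file that installs the GPU drivers, defines the type and count subfields of the accelerators[] field, and runs in a location that has the GPU machine type.

For example, to create a basic script job that uses GPUs for N1 VMs and lets Batch select the exact N1 machine type, create a JSON file with the following contents: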

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "accelerators": [
                        {
                            "type": "GPU_TYPE",
                            "count": GPU_COUNT
                        }
                    ]
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

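Replace the following:

- INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
- GPU_TYPE: the GPU type. You can view a list of the available GPU types by using the gcloud compute accelerator-types list command. Only use this field for GPUs for N1 VMs.
- GPU_COUNT: the number of GPUs of the specified type. For more information about the valid options, see GPU machine types for the N1 machine series. Only use this field for GPUs for N1 VMs.
- ALLOWED_LOCATIONS: You can optionally use the allowedLocations[] field to specify a region, or specific zones in a region, where the VMs for your job are allowed to run. For example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure that the job's location offers the GPU machine type.

To create and run the job, use the gcloud batch jobs submit command:
```
gcloud batch jobs submit JOB_NAME \
    --location LOCATION \
    --config JSON_CONFIGURATION_FILE
```
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path to a JSON file with the job's configuration details.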

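API

Make a POST request to the jobs.create method that installs the GPU drivers, defines the type and count subfields of the accelerators[] field, and uses a location that has the GPU machine type.

For example, to create a basic script job that uses GPUs for N1 VMs and lets Batch select the exact N1 machine type, make the following request: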

POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "accelerators": [
                        {
                            "type": "GPU_TYPE",
                            "count": GPU_COUNT
                        }
                    ]
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

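Replace the following:

- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job.
- JOB_NAME: the name of the job.
- INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
- GPU_TYPE: the GPU type. You can view a list of the available GPU types by using the gcloud compute accelerator-types list command. Only use this field for GPUs for N1 VMs.
- GPU_COUNT: the number of GPUs of the specified type. For more information about the valid options, see GPU machine types for the N1 machine series. Only use this field for GPUs for N1 VMs.
- ALLOWED_LOCATIONS: You can optionally use the allowedLocations[] field to specify a region, or specific zones in a region, where the VMs for your job are allowed to run. For example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure that the job's location offers the GPU machine type.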

Java


import com.google.cloud.batch.v1.AllocationPolicy;
import com.google.cloud.batch.v1.AllocationPolicy.Accelerator;
import com.google.cloud.batch.v1.AllocationPolicy.InstancePolicy;
import com.google.cloud.batch.v1.AllocationPolicy.InstancePolicyOrTemplate;
import com.google.cloud.batch.v1.BatchServiceClient;
import com.google.cloud.batch.v1.CreateJobRequest;
import com.google.cloud.batch.v1.Job;
import com.google.cloud.batch.v1.LogsPolicy;
import com.google.cloud.batch.v1.Runnable;
import com.google.cloud.batch.v1.Runnable.Script;
import com.google.cloud.batch.v1.TaskGroup;
import com.google.cloud.batch.v1.TaskSpec;
import com.google.protobuf.Duration;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateGpuJobN1 {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to use.
    String projectId = "YOUR_PROJECT_ID";
    // Name of the region you want to use to run the job. Regions that are
    // available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
    String region = "europe-central2";
    // The name of the job that will be created.
    // It needs to be unique for each project and region pair.
    String jobName = "JOB_NAME";
    // Optional. When set to true, Batch fetches the drivers required for the GPU type
    // that you specify in the policy field from a third-party location,
    // and Batch installs them on your behalf. If you set this field to false (default),
    // you need to install GPU drivers manually to use any GPUs for this job.
    boolean installGpuDrivers = false;
    // The GPU type. You can view a list of the available GPU types
    // by using the `gcloud compute accelerator-types list` command.
    String gpuType = "nvidia-tesla-t4";
    // The number of GPUs of the specified type.
    int gpuCount = 2;

    createGpuJob(projectId, region, jobName, installGpuDrivers, gpuType, gpuCount);
  }

  // Create a job that uses GPUs
  public static Job createGpuJob(String projectId, String region, String jobName,
                                  boolean installGpuDrivers, String gpuType, int gpuCount)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (BatchServiceClient batchServiceClient = BatchServiceClient.create()) {
      // Define what will be done as part of the job.
      Runnable runnable =
          Runnable.newBuilder()
              .setScript(
                  Script.newBuilder()
                      .setText(
                          "echo Hello world! This is task ${BATCH_TASK_INDEX}. "
                                  + "This job has a total of ${BATCH_TASK_COUNT} tasks.")
                      // You can also run a script from a file. Just remember, that needs to be a
                      // script that's already on the VM that will be running the job.
                      // Using setText() and setPath() is mutually exclusive.
                      // .setPath("/tmp/test.sh")
                      .build())
              .build();

      TaskSpec task = TaskSpec.newBuilder()
          // Jobs can be divided into tasks. In this case, every task runs the same runnable.
          .addRunnables(runnable)
          .setMaxRetryCount(2)
          .setMaxRunDuration(Duration.newBuilder().setSeconds(3600).build())
          .build();

      // Tasks are grouped inside a job using TaskGroups.
      // Currently, it's possible to have only one task group.
      TaskGroup taskGroup = TaskGroup.newBuilder()
          .setTaskCount(3)
          .setParallelism(1)
          .setTaskSpec(task)
          .build();

      // Accelerator describes Compute Engine accelerators to be attached to the VM.
      Accelerator accelerator = Accelerator.newBuilder()
          .setType(gpuType)
          .setCount(gpuCount)
          .build();

      // Policies are used to define on what kind of virtual machines the tasks will run on.
      AllocationPolicy allocationPolicy =
          AllocationPolicy.newBuilder()
              .addInstances(
                  InstancePolicyOrTemplate.newBuilder()
                      .setInstallGpuDrivers(installGpuDrivers)
                      .setPolicy(InstancePolicy.newBuilder().addAccelerators(accelerator))
                      .build())
              .build();

      Job job =
          Job.newBuilder()
              .addTaskGroups(taskGroup)
              .setAllocationPolicy(allocationPolicy)
              .putLabels("env", "testing")
              .putLabels("type", "script")
              // We use Cloud Logging as it's an out of the box available option.
              .setLogsPolicy(
                  LogsPolicy.newBuilder().setDestination(LogsPolicy.Destination.CLOUD_LOGGING))
              .build();

      CreateJobRequest createJobRequest =
          CreateJobRequest.newBuilder()
              // The job's parent is the region in which the job will run.
              .setParent(String.format("projects/%s/locations/%s", projectId, region))
              .setJob(job)
              .setJobId(jobName)
              .build();

      Job result =
          batchServiceClient
              .createJobCallable()
              .futureCall(createJobRequest)
              .get(5, TimeUnit.MINUTES);

      System.out.printf("Successfully created the job: %s", result.getName());

      return result;
    }
  }
}

Node.js

// Imports the Batch library
const batchLib = require('@google-cloud/batch');
const batch = batchLib.protos.google.cloud.batch.v1;

// Instantiates a client
const batchClient = new batchLib.v1.BatchServiceClient();

/**
 * TODO(developer): Update these variables before running the sample.
 */
// Project ID or project number of the Google Cloud project you want to use.
const projectId = await batchClient.getProjectId();
// Name of the region you want to use to run the job. Regions that are
// available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
const region = 'europe-central2';
// The name of the job that will be created.
// It needs to be unique for each project and region pair.
const jobName = 'batch-gpu-job-n1';
// The GPU type. You can view a list of the available GPU types
// by using the `gcloud compute accelerator-types list` command.
const gpuType = 'nvidia-tesla-t4';
// The number of GPUs of the specified type.
const gpuCount = 1;
// Optional. When set to true, Batch fetches the drivers required for the GPU type
// that you specify in the policy field from a third-party location,
// and Batch installs them on your behalf. If you set this field to false (default),
// you need to install GPU drivers manually to use any GPUs for this job.
const installGpuDrivers = false;
// The N1 machine type to use. The GPU type above must be attached to VMs from the
// N1 machine series. Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
const machineType = 'n1-standard-16';

// Define what will be done as part of the job.
const runnable = new batch.Runnable({
  // A script runnable takes either `text` (an inline script) or `path` (a script file on the VM).
  script: new batch.Runnable.Script({
    text: 'echo Hello world! This is task ${BATCH_TASK_INDEX}.',
  }),
});

const task = new batch.TaskSpec({
  runnables: [runnable],
  maxRetryCount: 2,
  maxRunDuration: {seconds: 3600},
});

// Tasks are grouped inside a job using TaskGroups.
const group = new batch.TaskGroup({
  taskCount: 3,
  taskSpec: task,
});

// Policies are used to define on what kind of virtual machines the tasks will run on.
// In this case, we tell the system to use the N1 machine type set in machineType.
// Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
const instancePolicy = new batch.AllocationPolicy.InstancePolicy({
  machineType,
  // Accelerator describes Compute Engine accelerators to be attached to the VM
  accelerators: [
    new batch.AllocationPolicy.Accelerator({
      type: gpuType,
      count: gpuCount,
    }),
  ],
});

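// The allocation policy holds the list of instance policies (or templates).
// Driver installation is configured on each entry of `instances`.
const allocationPolicy = new batch.AllocationPolicy({
  instances: [{installGpuDrivers, policy: instancePolicy}],
});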

const job = new batch.Job({
  name: jobName,
  taskGroups: [group],
  labels: {env: 'testing', type: 'script'},
  allocationPolicy,
  // We use Cloud Logging as it's an option available out of the box
  logsPolicy: new batch.LogsPolicy({
    destination: batch.LogsPolicy.Destination.CLOUD_LOGGING,
  }),
});
// The job's parent is the project and region in which the job will run
const parent = `projects/${projectId}/locations/${region}`;

async function callCreateBatchGPUJobN1() {
  // Construct request
  const request = {
    parent,
    jobId: jobName,
    job,
  };

  // Run request
  const [response] = await batchClient.createJob(request);
  console.log(JSON.stringify(response));
}

await callCreateBatchGPUJobN1();

Python

from google.cloud import batch_v1


def create_gpu_job(
    project_id: str, region: str, zone: str, job_name: str
) -> batch_v1.Job:
    """
    This method shows how to create a sample Batch Job that will run
    a simple command on Compute Engine VMs with GPUs attached.

    Args:
        project_id: project ID or project number of the Cloud project you want to use.
        region: name of the region you want to use to run the job. Regions that are
            available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
        zone: name of the zone you want to use to run the job. Important in regard to GPUs availability.
            GPUs availability can be found here: https://cloud.google.com/compute/docs/gpus/gpu-regions-zones
        job_name: the name of the job that will be created.
            It needs to be unique for each project and region pair.

    Returns:
        A job object representing the job created.
    """
    client = batch_v1.BatchServiceClient()

    # Define what will be done as part of the job.
    task = batch_v1.TaskSpec()
    runnable = batch_v1.Runnable()
    runnable.script = batch_v1.Runnable.Script()
    runnable.script.text = "echo Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total of ${BATCH_TASK_COUNT} tasks."
    # You can also run a script from a file. Just remember, that needs to be a script that's
    # already on the VM that will be running the job. Using runnable.script.text and runnable.script.path is mutually
    # exclusive.
    # runnable.script.path = '/tmp/test.sh'
    task.runnables = [runnable]

    # We can specify what resources are requested by each task.
    resources = batch_v1.ComputeResource()
    resources.cpu_milli = 2000  # in milliseconds per cpu-second. This means the task requires 2 whole CPUs.
    resources.memory_mib = 16  # in MiB
    task.compute_resource = resources

    task.max_retry_count = 2
    task.max_run_duration = "3600s"

    # Tasks are grouped inside a job using TaskGroups.
    # Currently, it's possible to have only one task group.
    group = batch_v1.TaskGroup()
    group.task_count = 4
    group.task_spec = task

    # Policies are used to define on what kind of virtual machines the tasks will run on.
    # Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
    policy = batch_v1.AllocationPolicy.InstancePolicy()
    policy.machine_type = "n1-standard-16"

    accelerator = batch_v1.AllocationPolicy.Accelerator()
    # Note: not every accelerator is compatible with instance type
    # Read more here: https://cloud.google.com/compute/docs/gpus#t4-gpus
    accelerator.type_ = "nvidia-tesla-t4"
    accelerator.count = 1

    policy.accelerators = [accelerator]
    instances = batch_v1.AllocationPolicy.InstancePolicyOrTemplate()
    instances.policy = policy
    instances.install_gpu_drivers = True
    allocation_policy = batch_v1.AllocationPolicy()
    allocation_policy.instances = [instances]

    location = batch_v1.AllocationPolicy.LocationPolicy()
    # Restrict the job's VMs to the zone passed in by the caller.
    location.allowed_locations = [f"zones/{zone}"]
    allocation_policy.location = location

    job = batch_v1.Job()
    job.task_groups = [group]
    job.allocation_policy = allocation_policy
    job.labels = {"env": "testing", "type": "script"}
    # We use Cloud Logging as it's an out of the box available option
    job.logs_policy = batch_v1.LogsPolicy()
    job.logs_policy.destination = batch_v1.LogsPolicy.Destination.CLOUD_LOGGING

    create_request = batch_v1.CreateJobRequest()
    create_request.job = job
    create_request.job_id = job_name
    # The job's parent is the region in which the job will run
    create_request.parent = f"projects/{project_id}/locations/{region}"

    return client.create_job(create_request)
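
The function above receives both a region and a zone. Assuming the usual Compute Engine naming scheme, in which a zone name is its region name followed by a hyphen and a short suffix (for example, us-central1-b in us-central1), a small check like the following could catch a zone that does not belong to the given region before any request is sent. The helper zone_location is hypothetical and not part of the google-cloud-batch library.

```python
def zone_location(region: str, zone: str) -> str:
    """Return an allowed_locations entry for `zone`, checking that it lies in `region`."""
    # Split "us-central1-b" into the region part ("us-central1") and the suffix ("b").
    prefix, _, suffix = zone.rpartition("-")
    if prefix != region or not suffix:
        raise ValueError(f"zone {zone!r} is not in region {region!r}")
    return f"zones/{zone}"


print(zone_location("us-central1", "us-central1-b"))
```

The returned string has the zones/ZONE form that the allowed_locations list expects.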

What's next

If you have issues creating or running a job, see Troubleshooting.
View jobs and tasks.
Learn about more job creation options.