
Prepare video training data for classification

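This page describes how to prepare video training data for use in a Vertex AI dataset to train a video classification model.

The following sections cover the data requirements, the schema file, and the format of the data import files (JSONL and CSV) that the schema defines.

Alternatively, you can import videos that have not been annotated and annotate them later using the Google Cloud console (see Labeling using the Google Cloud console).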

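Data requirements

The following requirements apply to datasets used to train AutoML or custom-trained models.

Vertex AI supports the following video formats for training a model or requesting a prediction (annotating a video).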
- .MOV
- .MPEG4
- .MP4
- .AVI
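To view video content or annotate videos in the web console, the video must be in a format that your browser supports natively. Because not all browsers handle .MOV or .AVI content natively, the recommendation is to use the .MPEG4 or .MP4 video format.
The maximum file size is 50 GB (up to 3 hours in duration). Individual video files with malformed or empty timestamps in the container aren't supported.
The maximum number of labels in each dataset is limited to 1,000.
You can assign "ML_USE" labels to the videos in the import files. At training time, you can choose to use those labels to split the videos and their corresponding annotations into "training" or "test" sets. For video classification, note the following:
- At least two different classes are required for model training. For example, "news" and "MTV", or "game" and "others".
- Consider including a "None_of_the_above" class and video segments that do not match any of your defined classes.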
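
The requirements above that depend only on the import rows (file extension, number of classes, label limit) can be checked before upload. The following is a minimal sketch, not part of Vertex AI: the function name `check_requirements` and its input shape (pairs of URI and label) are assumptions made for illustration. It does not check file size, duration, or container timestamps, which would require reading the videos themselves.

```python
import posixpath

# Values taken from the requirements listed above.
SUPPORTED_EXTENSIONS = {".mov", ".mpeg4", ".mp4", ".avi"}
MAX_LABELS_PER_DATASET = 1000
MIN_CLASSES = 2


def check_requirements(rows):
    """Check (video_uri, label) pairs; return a list of problem descriptions.

    Hypothetical helper: only the checks that need no access to the
    video files are performed here.
    """
    problems = []
    labels = set()
    for uri, label in rows:
        # Compare extensions case-insensitively (.MOV and .mov are treated alike).
        ext = posixpath.splitext(uri)[1].lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"unsupported extension {ext!r}: {uri}")
        labels.add(label)
    if len(labels) < MIN_CLASSES:
        problems.append(f"need at least {MIN_CLASSES} classes, found {len(labels)}")
    if len(labels) > MAX_LABELS_PER_DATASET:
        problems.append(f"more than {MAX_LABELS_PER_DATASET} labels: {len(labels)}")
    return problems
```

An empty list means none of these particular checks failed; it does not guarantee that the import will succeed.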

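Best practices for video data used to train AutoML models

The following practices apply to datasets used to train AutoML models.

The training data should be as close as possible to the data on which predictions are to be made. For example, if your use case involves blurry, low-resolution videos (such as from a security camera), your training data should be composed of blurry, low-resolution videos. In general, you should also consider providing multiple angles, resolutions, and backgrounds for your training videos.
Vertex AI models generally can't predict labels that humans can't assign. If a human can't be trained to assign a label after watching the video for 1-2 seconds, the model likely can't be trained to do it either.
The model works best when there are at most 100 times more videos for the most common label than for the least common label. We recommend removing low-frequency labels. For video classification, the recommended number of training videos per label is about 1,000. The minimum number of training videos per label is 10, or 50 for advanced models. In general, training models with multiple labels per video takes more examples per label, and the resulting scores are harder to interpret.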
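
The balance guidance above reduces to simple counting. The sketch below is illustrative only: `label_balance_report` is a hypothetical helper, and it assumes its input holds one label entry per (video, label) pair. The thresholds are the ones stated above (ratio of at most 100, minimum of 10 videos per label).

```python
from collections import Counter

MAX_IMBALANCE_RATIO = 100   # most common label vs. least common label
MIN_VIDEOS_PER_LABEL = 10   # the page states 50 for advanced models


def label_balance_report(labels, min_per_label=MIN_VIDEOS_PER_LABEL):
    """Summarize label counts for a non-empty list of label strings."""
    counts = Counter(labels)
    most = max(counts.values())
    least = min(counts.values())
    return {
        "counts": dict(counts),
        "ratio": most / least,
        "balanced": most <= MAX_IMBALANCE_RATIO * least,
        # Labels that fall under the stated per-label minimum.
        "below_minimum": sorted(
            label for label, n in counts.items() if n < min_per_label
        ),
    }
```

Labels reported in `below_minimum` are candidates for collecting more videos or for removal, in line with the recommendation to drop low-frequency labels.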

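Schema files

Use the following publicly accessible schema file when creating the JSONL file for importing annotations. This schema file dictates the format of the data input files. The structure of the file follows the OpenAPI schema.

Video classification schema file: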

gs://google-cloud-aiplatform/schema/dataset/ioformat/video_classification_io_format_1.0.0.yaml

Full schema file


title: VideoClassification
description: >
  Import and export format for importing/exporting videos together with
  classification annotations with time segment. Can be used in
  Dataset.import_schema_uri field.
type: object
required:
- videoGcsUri
properties:
  videoGcsUri:
    type: string
    description: >
      A Cloud Storage URI pointing to a video. Up to 50 GB in size and
      up to 3 hours in duration. Supported file mime types: `video/mp4`,
      `video/avi`, `video/quicktime`.
  timeSegmentAnnotations:
    type: array
    description: >
      Multiple classification annotations. Each on a time segment of the video.
    items:
      type: object
      description: Annotation with a time segment on media (e.g., video).
      properties:
        displayName:
          type: string
          description: >
            It will be imported as/exported from AnnotationSpec's display name.
        startTime:
          type: string
          description: >
            The start of the time segment. Expressed as a number of seconds as
            measured from the start of the video, with "s" appended at the end.
            Fractions are allowed, up to a microsecond precision.
          default: 0s
        endTime:
          type: string
          description: >
            The end of the time segment. Expressed as a number of seconds as
            measured from the start of the video, with "s" appended at the end.
            Fractions are allowed, up to a microsecond precision, and "Infinity"
            is allowed, which corresponds to the end of the video.
          default: Infinity
        annotationResourceLabels:
          description: Resource labels on the Annotation.
          type: object
          additionalProperties:
            type: string
  dataItemResourceLabels:
    description: Resource labels on the DataItem.
    type: object
    additionalProperties:
      type: string
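
As a rough illustration of what the schema above requires, the following sketch checks one JSONL line by hand using only the standard library. It is not a full OpenAPI validator. The helper name `validate_line` is hypothetical, and the `gs://` prefix check is an assumption drawn from the phrase "Cloud Storage URI" in the schema description.

```python
import json
import re

# Seconds with an "s" suffix and at most six fractional digits
# (microsecond precision), per the startTime/endTime descriptions.
TIME_RE = re.compile(r"^\d+(\.\d{1,6})?s$")


def validate_line(line):
    """Return a list of problems found in one JSONL line (empty if none)."""
    errors = []
    record = json.loads(line)
    uri = record.get("videoGcsUri")  # the only required property
    if not isinstance(uri, str) or not uri.startswith("gs://"):
        errors.append("videoGcsUri must be a gs:// string")
    for i, ann in enumerate(record.get("timeSegmentAnnotations", [])):
        # Missing values fall back to the schema defaults.
        start = ann.get("startTime", "0s")
        end = ann.get("endTime", "Infinity")
        if not TIME_RE.match(start):
            errors.append(f"annotation {i}: bad startTime {start!r}")
        if end != "Infinity" and not TIME_RE.match(end):
            errors.append(f"annotation {i}: bad endTime {end!r}")
    return errors
```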

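Input files

The format of your training data for video classification is as follows.

To import your data, create either a JSONL or CSV file.

JSONL

For details, see the Classification schema (global) file.
JSON on each line: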


{
	"videoGcsUri": "gs://bucket/filename.ext",
	"timeSegmentAnnotations": [{
		"displayName": "LABEL",
		"startTime": "start_time_of_segment",
		"endTime": "end_time_of_segment"
	}],
	"dataItemResourceLabels": {
		"aiplatform.googleapis.com/ml_use": "training|test"
	}
}

Example JSONL - Video classification:


{"videoGcsUri": "gs://demo/video1.mp4", "timeSegmentAnnotations": [{"displayName": "cartwheel", "startTime": "1.0s", "endTime": "12.0s"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "training"}}
{"videoGcsUri": "gs://demo/video2.mp4", "timeSegmentAnnotations": [{"displayName": "swing", "startTime": "4.0s", "endTime": "9.0s"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "test"}}
...
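
Lines in this shape can be generated rather than written by hand. The sketch below is one possible approach, using only the standard library. The helper names `format_seconds` and `make_record` are hypothetical, and the `ml_use` values are assumed to be the ones that appear in the example lines above ("training" and "test").

```python
import json

ML_USE_KEY = "aiplatform.googleapis.com/ml_use"


def format_seconds(value):
    """Render seconds as a schema-style string, e.g. 12.5 -> "12.5s"."""
    if value == float("inf"):
        return "Infinity"  # the schema's spelling for "end of the video"
    # Fixed-point with microsecond precision, trailing zeros trimmed.
    return f"{value:.6f}".rstrip("0").rstrip(".") + "s"


def make_record(video_uri, segments, ml_use=None):
    """Build one JSONL line from (label, start_seconds, end_seconds) tuples."""
    record = {
        "videoGcsUri": video_uri,
        "timeSegmentAnnotations": [
            {
                "displayName": label,
                "startTime": format_seconds(start),
                "endTime": format_seconds(end),
            }
            for label, start, end in segments
        ],
    }
    if ml_use is not None:
        record["dataItemResourceLabels"] = {ML_USE_KEY: ml_use}
    return json.dumps(record)
```

Calling `make_record` once per video and joining the results with newlines yields a JSONL file. Whole numbers are written without a fractional part (`"1s"` rather than `"1.0s"`), which still matches the schema's "number of seconds with s appended" description.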

CSV

Format of a row in the CSV:

[ML_USE,]VIDEO_URI,LABEL,START,END

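List of columns

ML_USE (Optional). Used to split the data when training a model. Use TRAINING or TEST.
VIDEO_URI. This field contains the Cloud Storage URI for the video. Cloud Storage URIs are case-sensitive.
LABEL. Labels must start with a letter and contain only letters, numbers, and underscores. You can specify multiple labels for a video by adding multiple rows to the CSV file that each identify the same video segment, with a different label on each row.
START,END. These two columns, START and END, identify the start time and end time, in seconds, of the video segment to analyze. The start time must be less than the end time. Both values must be non-negative and within the time range of the video. For example, 0.09845,1.36005. To use the entire content of the video, specify a start time of 0 and an end time of the full length of the video or "inf". For example, 0,inf.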
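
The column rules above can be checked with a small parser. This is a sketch under stated assumptions, not an official validator: `parse_row` and `parse_csv_text` are hypothetical names, "letters" is assumed to mean ASCII letters, and the requirement that times fall within the video's duration is not checked because it needs the video itself.

```python
import csv
import io
import re

# Assumption: "letters" means ASCII letters; the page does not say.
LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
ML_USE_VALUES = {"TRAINING", "TEST"}


def parse_row(fields):
    """Parse one row shaped [ML_USE,]VIDEO_URI,LABEL,START,END.

    Raises ValueError when a rule from the column list is violated.
    """
    fields = list(fields)
    ml_use = None
    if len(fields) == 5:  # optional leading ML_USE column is present
        ml_use = fields.pop(0)
        if ml_use not in ML_USE_VALUES:
            raise ValueError(f"ML_USE must be TRAINING or TEST, got {ml_use!r}")
    if len(fields) != 4:
        raise ValueError("expected 4 columns, or 5 with ML_USE")
    uri, label, start, end = fields
    if label == start == end == "":
        # Unlabeled video: URI followed by three commas.
        return {"ml_use": ml_use, "uri": uri, "label": None, "start": None, "end": None}
    if not LABEL_RE.fullmatch(label):
        raise ValueError(f"invalid label {label!r}")
    start_s, end_s = float(start), float(end)  # float() accepts "inf"
    if start_s < 0 or not start_s < end_s:
        raise ValueError(f"need 0 <= START < END, got {start!r}, {end!r}")
    return {"ml_use": ml_use, "uri": uri, "label": label, "start": start_s, "end": end_s}


def parse_csv_text(text):
    """Parse every non-empty row of a CSV document given as a string."""
    return [parse_row(row) for row in csv.reader(io.StringIO(text)) if row]
```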

Example CSV - Classification using a single label

Single label on the same video segment:

TRAINING,gs://YOUR_VIDEO_PATH/vehicle.mp4,mustang,0,5.4
...

Example CSV - Multiple labels:

Multiple labels on the same video segment:

gs://YOUR_VIDEO_PATH/vehicle.mp4,fiesta,0,8.285
gs://YOUR_VIDEO_PATH/vehicle.mp4,ranger,0,8.285
gs://YOUR_VIDEO_PATH/vehicle.mp4,explorer,0,8.285
...
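
Rows like these, where one segment carries several labels, can be produced by repeating the segment once per label. The following is a minimal sketch using the standard `csv` module; `multi_label_rows` is a hypothetical helper name.

```python
import csv
import io


def multi_label_rows(video_uri, labels, start, end, ml_use=None):
    """Return CSV text with one row per label for the same video segment."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for label in labels:
        row = [video_uri, label, start, end]
        if ml_use is not None:
            row.insert(0, ml_use)  # optional leading ML_USE column
        writer.writerow(row)
    return buf.getvalue()
```

Calling `multi_label_rows("gs://YOUR_VIDEO_PATH/vehicle.mp4", ["fiesta", "ranger", "explorer"], 0, 8.285)` reproduces the three rows shown above.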

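Example CSV - No labels:

You can also provide videos in the data file without specifying any labels. You must then use the Google Cloud console to apply labels to your data before you train your model. To do so, provide only the Cloud Storage URI for the video followed by three commas, as shown in the following example.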

gs://YOUR_VIDEO_PATH/vehicle.mp4,,,
...

Create a dataset