Configure execution caching

When Vertex AI Pipelines runs a pipeline, it uses the interface (cache key) of each pipeline step to check whether a matching execution already exists in Vertex ML Metadata.

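The interface of a step is defined as the combination of the following:

The pipeline step's inputs. These inputs include the values of the input parameters (if any) and the input artifact IDs (if any).
The pipeline step's output definition. This output definition includes the output parameter definitions (names, if any) and the output artifact definitions (names, if any).
The component's specification. This specification includes the image, commands, arguments, and environment variables being used, as well as the order of the commands and arguments.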

In addition, only pipelines with the same pipeline name share the cache.
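
To make the idea of a cache key concrete, the following is a minimal sketch of how a step's interface could be reduced to a single fingerprint. It is not how Vertex AI Pipelines computes its key internally: the `cache_key` function, the dictionary layout, the sample values, and the choice to fold the pipeline name into the hash are all assumptions made for illustration.

```python
import hashlib
import json


def cache_key(pipeline_name, inputs, output_definitions, component_spec):
    """Return a deterministic fingerprint of a step's interface (illustrative only)."""
    interface = {
        # Assumption: the pipeline name is hashed in so that differently
        # named pipelines never share a key.
        "pipeline_name": pipeline_name,
        "inputs": inputs,
        "output_definitions": output_definitions,
        "component_spec": component_spec,
    }
    # sort_keys makes dictionary ordering irrelevant. List order (for example
    # the order of commands and arguments) is preserved, so it is significant.
    canonical = json.dumps(interface, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical step description.
spec = {
    "image": "python:3.10",
    "command": ["python", "train.py"],
    "args": ["--epochs", "10"],
    "env": {"LOG_LEVEL": "info"},
}
inputs = {"parameters": {"learning_rate": 0.01}, "artifact_ids": ["1234"]}
outputs = {"parameters": ["accuracy"], "artifacts": ["model"]}

key_a = cache_key("training-pipeline", inputs, outputs, spec)
key_b = cache_key("training-pipeline", inputs, outputs, spec)

# Reordering the arguments changes the component specification.
reordered_spec = {**spec, "args": ["10", "--epochs"]}
key_c = cache_key("training-pipeline", inputs, outputs, reordered_spec)

# A different pipeline name never matches.
key_d = cache_key("other-pipeline", inputs, outputs, spec)

print(key_a == key_b, key_a == key_c, key_a == key_d)
```

In this sketch, an identical interface always produces the same key, while a change to any input value, output name, part of the component specification, or the pipeline name produces a different one.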

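If a matching execution exists in Vertex ML Metadata, the outputs of that execution are used and the step is skipped. This reduces costs by skipping computations that were already completed in a previous pipeline run.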
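
The lookup-and-skip behavior can be sketched with an in-memory dictionary standing in for Vertex ML Metadata. This is a simplified illustration, not the service's implementation: `metadata_store`, `run_step`, and `train` are hypothetical names, and the real service stores far more than a key-to-outputs mapping.

```python
# In-memory stand-in for Vertex ML Metadata (hypothetical, not the real API).
metadata_store = {}


def run_step(key, compute):
    """Return (outputs, was_cached) for a step identified by its cache key."""
    if key in metadata_store:
        # A matching execution exists: reuse its outputs and skip the step.
        return metadata_store[key], True
    # No match: run the step and record its outputs for later runs.
    outputs = compute()
    metadata_store[key] = outputs
    return outputs, False


calls = []


def train():
    """Pretend to be an expensive step and record that it actually ran."""
    calls.append("train")
    return {"accuracy": 0.9}


first, first_cached = run_step("step-key", train)
second, second_cached = run_step("step-key", train)

print(first_cached, second_cached, len(calls))
```

The second call returns the stored outputs without invoking `train` again, which is the saving that execution caching provides.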

You can turn off execution caching at the task level by setting the following:

eval_task.set_caching_options(False)

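You can also turn off execution caching for an entire pipeline job. When you run a pipeline using PipelineJob(), you can use the enable_caching argument to specify that this pipeline run does not use caching. In that case, none of the steps in the pipeline job use caching. Learn more about creating pipeline runs.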

Use the following sample to turn off caching:

from google.cloud.aiplatform import PipelineJob

pl = PipelineJob(
    display_name="My first pipeline",

    # Whether or not to enable caching
    # True = enable the current run to use caching results from previous runs
    # False = disable the current run's use of caching results from previous runs
    # None = defer to cache option for each pipeline component in the pipeline definition
    enable_caching=False,

    # Local or Cloud Storage path to a compiled pipeline definition
    template_path="pipeline.yaml",

    # Dictionary containing input parameters for your pipeline
    parameter_values=parameter_values,

    # Cloud Storage path to act as the pipeline root
    pipeline_root=pipeline_root,
)
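
The three possible values of enable_caching described in the comments above interact with the task-level setting. The following sketch shows that resolution as a small helper. `effective_caching` is a hypothetical function written for illustration and is not part of the Vertex AI SDK; it only mirrors the rule that a job-level value, when set, applies to every step, and that None defers to each task's own setting.

```python
from typing import Optional


def effective_caching(job_enable_caching: Optional[bool],
                      task_enable_caching: bool) -> bool:
    """Resolve whether a task uses the cache (illustrative helper only)."""
    if job_enable_caching is None:
        # None: defer to the option set on the task in the pipeline definition.
        return task_enable_caching
    # True or False: the job-level value applies to every step in the run.
    return job_enable_caching


# Job-level False disables caching even for a task that enabled it.
job_off = effective_caching(False, True)

# Job-level None keeps the task-level choice.
deferred_on = effective_caching(None, True)
deferred_off = effective_caching(None, False)

# Job-level True enables caching even for a task that disabled it.
job_on = effective_caching(True, False)

print(job_off, deferred_on, deferred_off, job_on)
```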

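This feature has the following limitation:

A cached result has no time-to-live (TTL) and can be reused as long as the entry is not deleted from Vertex ML Metadata. If the entry is deleted from Vertex ML Metadata, the task reruns to regenerate the result.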