To have a pipeline task rerun when it fails, configure retries for that task. You can set the number of retry attempts and the delay between subsequent retries.
Use the following code sample to configure the failure policy of a pipeline task named train_op by using the set_retry method in the Kubeflow Pipelines SDK:
from kfp import dsl

@dsl.pipeline(name='custom-container-pipeline')
def pipeline():
    generate = generate_op()
    train = (
        train_op(
            training_data=generate.outputs['training_data'],
            test_data=generate.outputs['test_data'],
            config_file=generate.outputs['config_file'])
        # Retry the training task on failure, backing off between attempts.
        .set_retry(
            num_retries=NUMBER_OF_RETRIES,
            backoff_duration='BACKOFF_DURATION',
            backoff_factor=BACKOFF_FACTOR,
            backoff_max_duration='BACKOFF_MAX_DURATION'
        )
    )
Replace the following:
NUMBER_OF_RETRIES: The number of times to retry the task upon failure.
BACKOFF_DURATION: Optional. The duration to wait after the task fails before retrying. If you don't set this parameter, the duration is set to 0s by default.
BACKOFF_FACTOR: Optional. The factor by which the backoff duration is multiplied for each subsequent retry. If you don't set this parameter, the backoff factor is set to 2.0 by default.
BACKOFF_MAX_DURATION: Optional. The maximum backoff duration between subsequent retries. If you don't set this parameter, the maximum duration is set to 3600s by default.
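To see how these parameters interact, the following minimal sketch estimates the wait before each retry. It is not part of the Kubeflow Pipelines SDK: the helper name retry_delays is hypothetical, durations are expressed in plain seconds instead of strings such as '60s', and the schedule assumes that the wait starts at the backoff duration, is multiplied by the backoff factor after each retry, and is capped at the maximum duration. The pipeline backend determines the actual timing, which may differ from this approximation.

```python
def retry_delays(num_retries, backoff_duration_s=0.0,
                 backoff_factor=2.0, backoff_max_duration_s=3600.0):
    """Return the assumed wait, in seconds, before each retry attempt.

    Hypothetical helper for illustration only. The defaults mirror the
    documented defaults of set_retry (0s, 2.0, 3600s).
    """
    delays = []
    for attempt in range(num_retries):
        # Assumption: the wait grows geometrically with each retry.
        delay = float(backoff_duration_s) * (backoff_factor ** attempt)
        # Assumption: the wait never exceeds the configured maximum.
        delays.append(min(delay, float(backoff_max_duration_s)))
    return delays


if __name__ == "__main__":
    # Five retries, starting at 60 seconds, doubling, capped at 300 seconds.
    print(retry_delays(5, backoff_duration_s=60, backoff_max_duration_s=300))
```

Under these assumptions, five retries with a 60-second backoff duration, the default factor of 2.0, and a 300-second maximum would wait 60, 120, 240, 300, and 300 seconds. With the default backoff duration of 0s, every retry would start immediately, so set BACKOFF_DURATION explicitly if the task needs time to recover between attempts.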