Specify machine types for a pipeline step

Kubeflow pipeline components are factory functions that create pipeline steps. Each component describes its inputs, outputs, and implementation. For example, in the code sample below, train_op is a component.

A training component could, for instance, take a CSV file as an input and use it to train a model. By setting the machine type parameters on a pipeline step, you can manage the resource requirements of each step in your pipeline. If you have two training steps, where the first trains on a very large data file and the second trains on a small one, you can allocate more memory and CPU to the first step and fewer resources to the second.

By default, the component runs as a Vertex AI CustomJob on an e2-standard-4 machine, which has 4 vCPUs and 16 GB of memory.

The following sample shows you how to set CPU, memory, and GPU configuration settings for a step:

from kfp import dsl

# generate_op and train_op are components defined elsewhere.
@dsl.pipeline(name='custom-container-pipeline')
def pipeline():
  generate = generate_op()
  # Chain the resource settings onto the training step.
  train = (
      train_op(
          training_data=generate.outputs['training_data'],
          test_data=generate.outputs['test_data'],
          config_file=generate.outputs['config_file'])
      .set_cpu_limit('CPU_LIMIT')
      .set_memory_limit('MEMORY_LIMIT')
      .add_node_selector_constraint(SELECTOR_CONSTRAINT)
      .set_gpu_limit(GPU_LIMIT))

Replace the following:

  • CPU_LIMIT: The maximum number of CPUs for this operator. This string value can be a number (an integer count of CPUs), or a number followed by "m", which means 1/1000 of a CPU. You can specify at most 96 CPUs.

  • MEMORY_LIMIT: The maximum amount of memory for this operator. This string value can be a number, or a number followed by "K" (kilobyte), "M" (megabyte), or "G" (gigabyte). You can specify at most 624 GB.

  • SELECTOR_CONSTRAINT: Each constraint is a key-value pair label. For the container to be eligible to run on a node, the node must have every constraint as a label. For example:

    • 'cloud.google.com/gke-accelerator', 'NVIDIA_TESLA_K80'
      • Available values:
        • NVIDIA_TESLA_K80
        • NVIDIA_TESLA_P4
        • NVIDIA_TESLA_P100
        • NVIDIA_TESLA_T4
        • NVIDIA_TESLA_V100
  • GPU_LIMIT: The GPU limit (positive number) for the operator.

    For more information about GPU resources, see Configuring compute resources for custom training.
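
The CPU_LIMIT and MEMORY_LIMIT string formats described above can be checked locally before you submit a pipeline. The following is a minimal sketch, not part of the Kubeflow Pipelines SDK: the helper names parse_cpu_limit and parse_memory_limit are hypothetical, and the sketch assumes decimal units (1 G = 1000 M) and that a bare memory number means bytes, neither of which is stated above.

```python
import re

MAX_CPUS = 96        # upper bound stated in the CPU_LIMIT description
MAX_MEMORY_GB = 624  # upper bound stated in the MEMORY_LIMIT description

# Assumption: decimal units, and a bare number (no suffix) means bytes.
_UNITS_PER_GB = {"": 1_000_000_000, "K": 1_000_000, "M": 1_000, "G": 1}


def parse_cpu_limit(value: str) -> float:
    """Return the number of CPUs described by a CPU_LIMIT string."""
    match = re.fullmatch(r"(\d+)(m?)", value)
    if match is None:
        raise ValueError(f"invalid CPU limit: {value!r}")
    number, suffix = match.groups()
    # The "m" suffix means thousandths of a CPU.
    cpus = int(number) / 1000 if suffix == "m" else float(number)
    if not 0 < cpus <= MAX_CPUS:
        raise ValueError(f"CPU limit out of range: {value!r}")
    return cpus


def parse_memory_limit(value: str) -> float:
    """Return the memory, in gigabytes, described by a MEMORY_LIMIT string."""
    match = re.fullmatch(r"(\d+)([KMG]?)", value)
    if match is None:
        raise ValueError(f"invalid memory limit: {value!r}")
    number, suffix = match.groups()
    gigabytes = int(number) / _UNITS_PER_GB[suffix]
    if not 0 < gigabytes <= MAX_MEMORY_GB:
        raise ValueError(f"memory limit out of range: {value!r}")
    return gigabytes
```

With these helpers, parse_cpu_limit('500m') returns 0.5 and parse_memory_limit('16G') returns 16.0, while values such as '97' CPUs or '625G' of memory raise ValueError.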

Vertex AI Pipelines will automatically find the best matching machine type to run the component.
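
To build intuition for what "best matching" could mean, the sketch below picks the smallest machine that satisfies both a CPU and a memory request. This is only an illustration: the selection rule is an assumption, not the documented behavior of Vertex AI Pipelines, the names MachineType, CATALOG, and smallest_fit are hypothetical, and the catalog is a small illustrative subset of E2 machine types.

```python
from typing import NamedTuple, Optional, Sequence


class MachineType(NamedTuple):
    name: str
    cpus: int
    memory_gb: int


# Illustrative subset only; not the list the service actually considers.
CATALOG = [
    MachineType("e2-standard-4", 4, 16),
    MachineType("e2-standard-8", 8, 32),
    MachineType("e2-standard-16", 16, 64),
    MachineType("e2-highmem-16", 16, 128),
]


def smallest_fit(
    cpus: float,
    memory_gb: float,
    catalog: Sequence[MachineType] = CATALOG,
) -> Optional[MachineType]:
    """Return the smallest machine meeting both requests, or None.

    Assumed rule: prefer fewer CPUs, then less memory.
    """
    candidates = [
        m for m in catalog if m.cpus >= cpus and m.memory_gb >= memory_gb
    ]
    return min(candidates, key=lambda m: (m.cpus, m.memory_gb), default=None)
```

Under this assumed rule, a step that requests 2 CPUs and 8 GB of memory fits on e2-standard-4, while a step that requests 8 CPUs and 100 GB of memory needs e2-highmem-16.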