If you run multiple SDK processes on a shared Dataflow GPU, you can improve GPU efficiency and utilization by enabling the NVIDIA Multi-Process Service (MPS). MPS supports concurrent processing on a GPU by enabling processes to share CUDA contexts and scheduling resources. MPS can reduce context-switching costs, increase parallelism, and reduce storage requirements.
Target workflows are Python pipelines that run on workers with more than one vCPU.
MPS is an NVIDIA technology that implements the CUDA API. CUDA is an NVIDIA platform that supports general-purpose GPU computing. For more information, see the NVIDIA Multi-Process Service user guide.
Benefits
- Improves parallel processing and overall throughput for GPU pipelines, especially for workloads with low GPU resource usage.
- Improves GPU utilization, which might reduce your costs.
Support and limitations
- MPS is supported only on Dataflow workers that use a single GPU.
- The pipeline can't use pipeline options that restrict parallelism.
- Avoid exceeding the available GPU memory, especially for use cases that involve loading large machine learning models. Balance the number of vCPUs and SDK processes with the available GPU memory that these processes need.
- MPS doesn't affect the concurrency of non-GPU operations.
- Dataflow Prime doesn't support MPS.
Enable MPS
When you run a pipeline with GPUs, enable MPS by doing the following:
- In the pipeline option `--dataflow_service_options`, append `use_nvidia_mps` to the `worker_accelerator` parameter.
- Set the `count` to 1.
- Don't use the pipeline option `--experiments=no_use_multiple_sdk_containers`.
The pipeline option `--dataflow_service_options` looks like the following:
```
--dataflow_service_options="worker_accelerator=type:GPU_TYPE;count:1;install-nvidia-driver;use_nvidia_mps"
```
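You can also supply the same service options programmatically when you construct a Beam Python pipeline. The following is a minimal sketch, not a complete launch script; the project, region, bucket, and `nvidia-tesla-t4` GPU type are placeholder assumptions:

```python
# Minimal sketch: launching a Beam Python pipeline with MPS enabled.
# The project, region, bucket, and GPU type are placeholder assumptions.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                # placeholder
    region="us-central1",                # placeholder
    temp_location="gs://my-bucket/tmp",  # placeholder
    dataflow_service_options=[
        "worker_accelerator=type:nvidia-tesla-t4;"
        "count:1;install-nvidia-driver;use_nvidia_mps"
    ],
)

with beam.Pipeline(options=options) as pipeline:
    _ = (
        pipeline
        | beam.Create([1, 2, 3])
        | beam.Map(lambda x: x * x)
    )
```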
If you use TensorFlow and enable MPS, do the following:
- Enable dynamic memory allocation on the GPU. Use either of the following TensorFlow options (see the memory sketch after this list):
  - Turn on memory growth by calling `tf.config.experimental.set_memory_growth(gpu, True)`.
  - Set the environment variable `TF_FORCE_GPU_ALLOW_GROWTH` to true.
- Use logical devices with appropriate memory limits.
- For optimal performance, enforce the use of the GPU when possible by using soft device placement or manual placement (see the placement sketch after this list).
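The following memory sketch illustrates the memory settings above. Choose one approach per process, and treat the 2048 MB limit as an illustrative assumption, not a recommendation:

```python
# Minimal sketch of the TensorFlow memory options described above.
# Choose one approach per process; the 2048 MB limit is an
# illustrative assumption.
import tensorflow as tf

# Approach 1: turn on memory growth for each visible GPU, so TensorFlow
# allocates GPU memory as needed instead of reserving it all up front.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

# Approach 2: set the environment variable instead, before TensorFlow
# initializes the GPU (for example, in the container entrypoint):
#   os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# Approach 3: create a logical device with an explicit memory limit
# instead of enabling memory growth:
#   gpus = tf.config.list_physical_devices("GPU")
#   tf.config.set_logical_device_configuration(
#       gpus[0],
#       [tf.config.LogicalDeviceConfiguration(memory_limit=2048)],
#   )
```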
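The following placement sketch shows soft device placement combined with a manually placed operation; the tensor shape is an arbitrary illustrative value:

```python
# Minimal sketch: prefer the GPU, but let TensorFlow fall back to the
# CPU for any operation without a GPU kernel (soft device placement).
import tensorflow as tf

tf.config.set_soft_device_placement(True)

# Manual placement: pin these operations to the first GPU.
with tf.device("/GPU:0"):
    x = tf.random.uniform((1024, 1024))  # arbitrary illustrative shape
    y = tf.matmul(x, x)
```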
What's next
- To review more best practices, see GPUs and worker parallelism.