Mejora el rendimiento en una GPU compartida mediante NVIDIA MPS
Organiza tus páginas con colecciones
Guarda y categoriza el contenido según tus preferencias.
Si ejecutas varios procesos del SDK en una GPU de Dataflow compartida, puedes mejorar la eficiencia y el uso de la GPU mediante la habilitación del servicio de procesos múltiples (MPS) de NVIDIA. MPS admite el procesamiento simultáneo en una GPU, ya que permite que los procesos compartan contextos de CUDA y programen recursos. MPS puede reducir los costos de cambio de contexto, aumentar el paralelismo y reducir los requisitos de almacenamiento.
Los flujos de trabajo de destino son canalizaciones de Python que se ejecutan en trabajadores con más de una CPU virtual.
Mejora el procesamiento paralelo y la capacidad de procesamiento general de las canalizaciones de GPU, en especial para las cargas de trabajo con un uso bajo de recursos de GPU.
Mejora el uso de GPU, lo que puede reducir los costos.
Asistencia y limitaciones
MPS solo es compatible con los trabajadores de Dataflow que usan una sola GPU.
La canalización no puede usar opciones de canalización que restringen el paralelismo.
Evita exceder la memoria de GPU disponible, en especial para los casos de uso que implican cargar modelos de aprendizaje automático grandes. Equilibra la cantidad de procesos de CPU virtuales y SDK con la memoria de GPU disponible que necesitan estos procesos.
MPS no afecta la simultaneidad de las operaciones que no son de GPU.
[[["Fácil de comprender","easyToUnderstand","thumb-up"],["Resolvió mi problema","solvedMyProblem","thumb-up"],["Otro","otherUp","thumb-up"]],[["Difícil de entender","hardToUnderstand","thumb-down"],["Información o código de muestra incorrectos","incorrectInformationOrSampleCode","thumb-down"],["Faltan la información o los ejemplos que necesito","missingTheInformationSamplesINeed","thumb-down"],["Problema de traducción","translationIssue","thumb-down"],["Otro","otherDown","thumb-down"]],["Última actualización: 2025-09-04 (UTC)"],[[["\u003cp\u003eNVIDIA Multi-Process Service (MPS) improves GPU efficiency and utilization when running multiple SDK processes on a shared Dataflow GPU by enabling concurrent processing and resource sharing.\u003c/p\u003e\n"],["\u003cp\u003eEnabling MPS enhances parallel processing and throughput for GPU pipelines, particularly for workloads with low GPU resource usage, potentially reducing overall costs.\u003c/p\u003e\n"],["\u003cp\u003eMPS is supported on Dataflow workers with a single GPU and requires specific pipeline configurations, including appending \u003ccode\u003euse_nvidia_mps\u003c/code\u003e to the \u003ccode\u003eworker_accelerator\u003c/code\u003e parameter with a count of 1 and avoiding the \u003ccode\u003e--experiments=no_use_multiple_sdk_containers\u003c/code\u003e option.\u003c/p\u003e\n"],["\u003cp\u003eWhen using TensorFlow with MPS, you must enable dynamic memory allocation on the GPU and use logical devices with memory limits to optimize performance.\u003c/p\u003e\n"],["\u003cp\u003eMPS is not compatible with Dataflow Prime.\u003c/p\u003e\n"]]],[],null,["# Improve performance on a shared GPU by using NVIDIA MPS\n\nIf you run multiple SDK processes on a shared Dataflow GPU, you\ncan improve GPU efficiency and utilization by enabling the NVIDIA Multi-Process\nService (MPS). MPS supports concurrent processing on a GPU by enabling processes\nto share CUDA contexts and scheduling resources. MPS can reduce\ncontext-switching costs, increase parallelism, and reduce storage requirements.\n\nTarget workflows are Python pipelines that run on workers with more than one\nvCPU.\n\nMPS is an NVIDIA technology that implements the CUDA API, an NVIDIA platform\nthat supports general-purpose GPU computing. For more information, see the\n[NVIDIA Multi-Process Service user guide](https://docs.nvidia.com/deploy/mps/index.html).\n\nBenefits\n--------\n\n- Improves parallel processing and overall throughput for GPU pipelines, especially for workloads with low GPU resource usage.\n- Improves GPU utilization, which might reduce your costs.\n\nSupport and limitations\n-----------------------\n\n- MPS is supported only on Dataflow workers that use a single GPU.\n- The pipeline can't use pipeline options that restrict parallelism.\n- Avoid exceeding the available GPU memory, especially for use cases that involve loading large machine learning models. Balance the number of vCPUs and SDK processes with the available GPU memory that these processes need.\n- MPS doesn't affect the concurrency of non-GPU operations.\n- Dataflow Prime doesn't support MPS.\n\nEnable MPS\n----------\n\nWhen you [run a pipeline with GPUs](/dataflow/docs/gpu/use-gpus), enable MPS by\ndoing the following:\n\n- In the pipeline option `--dataflow_service_options`, append `use_nvidia_mps` to the `worker_accelerator` parameter.\n- Set the `count` to 1.\n- Don't use the pipeline option `--experiments=no_use_multiple_sdk_containers`.\n\nThe pipeline option `--dataflow_service_options` looks like the following: \n\n --dataflow_service_options=\"worker_accelerator=type:\u003cvar translate=\"no\"\u003eGPU_TYPE\u003c/var\u003e;count:1;install-nvidia-driver;use_nvidia_mps\"\n\nIf you use TensorFlow and enable MPS, do the following:\n\n1. [Enable dynamic memory allocation](https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth) on the GPU. Use either of the following TensorFlow options:\n - Turn on memory growth by calling `tf.config.experimental.set_memory_growth(gpu, True)`.\n - Set the environmental variable `TF_FORCE_GPU_ALLOW_GROWTH` to true.\n2. Use logical devices with appropriate memory limits.\n3. For optimal performance, enforce the use of the GPU when possible by using [soft device placement](https://www.tensorflow.org/api_docs/python/tf/config/set_soft_device_placement) or [manual placement](https://www.tensorflow.org/guide/gpu#manual_device_placement).\n\nWhat's next\n-----------\n\n- To review more best practices, see [GPUs and worker parallelism](/dataflow/docs/gpu/develop-with-gpus#parallelism)."]]