Improve performance on a shared GPU by using NVIDIA MPS
If you run multiple SDK processes on a shared Dataflow GPU, you can improve GPU efficiency and utilization by enabling the NVIDIA Multi-Process Service (MPS). MPS supports concurrent processing on a GPU by letting processes share CUDA contexts and scheduling resources. MPS can reduce context-switching costs, increase parallelism, and reduce storage requirements.
Target workflows are Python pipelines that run on workers with more than one vCPU.
MPS is an NVIDIA technology that implements the CUDA API, an NVIDIA platform that supports general-purpose GPU computing. For more information, see the NVIDIA Multi-Process Service user guide (https://docs.nvidia.com/deploy/mps/index.html).
Benefits
Improves parallel processing and overall throughput for GPU pipelines, especially for workloads with low GPU resource usage.
Improves GPU utilization, which might reduce your costs.
Support and limitations
MPS is supported only on Dataflow workers that use a single GPU.
The pipeline can't use pipeline options that restrict parallelism, for example --experiments=no_use_multiple_sdk_containers.
Avoid exceeding the available GPU memory, especially for use cases that involve loading large machine learning models. Balance the number of vCPUs and SDK processes against the GPU memory that those processes need.
MPS doesn't affect the concurrency of non-GPU operations.
Dataflow Prime doesn't support MPS.
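The memory balance described in the limitations can be estimated with simple arithmetic. The following is a minimal sketch, not part of Dataflow or MPS: it assumes one SDK process per vCPU and an even split of GPU memory across processes, neither of which MPS enforces. The function names and the reserve_mib parameter are hypothetical.

```python
def per_process_gpu_budget_mib(gpu_memory_mib: int,
                               sdk_processes: int,
                               reserve_mib: int = 0) -> int:
    """Return the GPU memory (MiB) each SDK process can use under an even split.

    reserve_mib is a hypothetical safety margin for memory that is not
    available to the processes themselves.
    """
    if sdk_processes < 1:
        raise ValueError("sdk_processes must be at least 1")
    usable = gpu_memory_mib - reserve_mib
    if usable <= 0:
        raise ValueError("reserve_mib leaves no usable GPU memory")
    return usable // sdk_processes


def fits_on_gpu(model_mib: int,
                gpu_memory_mib: int,
                sdk_processes: int,
                reserve_mib: int = 0) -> bool:
    """Check whether one model copy per SDK process fits in the shared GPU."""
    budget = per_process_gpu_budget_mib(gpu_memory_mib, sdk_processes, reserve_mib)
    return model_mib <= budget


# Illustrative numbers: a 16 GiB GPU shared by 4 SDK processes (assumed
# 4 vCPUs), with 1 GiB held back as a margin.
budget = per_process_gpu_budget_mib(16 * 1024, sdk_processes=4, reserve_mib=1024)
print(budget)                                  # 3840
print(fits_on_gpu(3000, 16 * 1024, 4, 1024))   # True
print(fits_on_gpu(5000, 16 * 1024, 4, 1024))   # False
```

If the model doesn't fit, the options this page implies are a worker with fewer vCPUs or a smaller per-process footprint.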
Enable MPS
When you run a pipeline with GPUs (/dataflow/docs/gpu/use-gpus), enable MPS by doing the following:
In the pipeline option --dataflow_service_options, append use_nvidia_mps to the worker_accelerator parameter.
Set the count to 1.
Don't use the pipeline option --experiments=no_use_multiple_sdk_containers.
The pipeline option --dataflow_service_options looks like the following:
    --dataflow_service_options="worker_accelerator=type:GPU_TYPE;count:1;install-nvidia-driver;use_nvidia_mps"
If you use TensorFlow and enable MPS, do the following:
1. Enable dynamic memory allocation (https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth) on the GPU. Use either of the following TensorFlow options:
   Turn on memory growth by calling tf.config.experimental.set_memory_growth(gpu, True).
   Set the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true.
2. Use logical devices with appropriate memory limits.
3. For optimal performance, enforce the use of the GPU when possible by using soft device placement (https://www.tensorflow.org/api_docs/python/tf/config/set_soft_device_placement) or manual placement (https://www.tensorflow.org/guide/gpu#manual_device_placement).
What's next
To review more best practices, see GPUs and worker parallelism (/dataflow/docs/gpu/develop-with-gpus#parallelism).
Last updated 2025-09-04 UTC.
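The Enable MPS steps and the TensorFlow memory-growth step described on this page can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not Dataflow's own tooling: the helper names are hypothetical, and "nvidia-tesla-t4" and "some_experiment" are example values only. The TensorFlow calls used are tf.config.list_physical_devices and tf.config.experimental.set_memory_growth. TensorFlow is imported lazily so the option helpers work without it.

```python
FORBIDDEN_EXPERIMENT = "no_use_multiple_sdk_containers"


def mps_service_option(gpu_type: str) -> str:
    """Build the worker_accelerator value with count 1 and MPS enabled."""
    return (
        f"worker_accelerator=type:{gpu_type};count:1;"
        "install-nvidia-driver;use_nvidia_mps"
    )


def build_pipeline_args(gpu_type: str, experiments=()) -> list:
    """Return command-line pipeline options, rejecting the experiment
    that this page says must not be combined with MPS."""
    if FORBIDDEN_EXPERIMENT in experiments:
        raise ValueError(
            f"--experiments={FORBIDDEN_EXPERIMENT} can't be used with MPS"
        )
    args = [f"--dataflow_service_options={mps_service_option(gpu_type)}"]
    args += [f"--experiments={name}" for name in experiments]
    return args


def enable_tf_memory_growth() -> None:
    """Turn on memory growth for every visible GPU.

    Call this before TensorFlow initializes the GPUs; TensorFlow raises
    RuntimeError if memory growth is changed after initialization.
    """
    import tensorflow as tf  # lazy import: only needed on GPU workers

    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)


# Example usage with an illustrative GPU type.
print(build_pipeline_args("nvidia-tesla-t4"))
```

One plausible place to call enable_tf_memory_growth() is the worker-side setup code that runs before the model is loaded, for example a DoFn's setup method. Treat that placement as an assumption to verify for your pipeline.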