Last updated: 2025-09-04 (UTC)

# Run batch inference using GPUs on Cloud Run jobs

| **Preview: GPU support for Cloud Run jobs**
|
| This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section
| of the [Service Specific Terms](/terms/service-terms#1).
|
| Pre-GA features are available "as is" and might have limited support.
|
| For more information, see the
| [launch stage descriptions](/products#product-launch-stages).

You can run batch inference with [Meta's Llama 3.2 1B LLM](https://huggingface.co/meta-llama/Llama-3.2-1B) and [vLLM](https://github.com/vllm-project/vllm) on a Cloud Run job, then write the results directly to Cloud Storage using Cloud Run volume mounts.

For a step-by-step walkthrough, see the codelab [How to run batch inference on Cloud Run jobs](https://codelabs.developers.google.com/codelabs/cloud-run/how-to-batch-inference-cloud-run-jobs).
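
As a rough illustration of this workflow, the following is a minimal sketch, not the codelab's actual code. It assumes the job's container has vLLM installed and a GPU attached, and that a Cloud Storage bucket is mounted as a volume at a path exposed through an `OUTPUT_DIR` environment variable. The variable name, the `/mnt/results` default, the prompts, the sampling settings, and the function names are all hypothetical choices made for this example.

```python
import json
import os
from pathlib import Path


def generate_completions(prompts, model="meta-llama/Llama-3.2-1B"):
    """Run offline batch generation with vLLM (needs a GPU and the vllm package)."""
    # Imported lazily so the file-writing helper works without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model)
    # Sampling settings are illustrative assumptions, not recommended values.
    params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]


def write_results(prompts, completions, out_dir):
    """Write one JSON object per line into out_dir and return the file path."""
    out_path = Path(out_dir) / "results.jsonl"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        for prompt, completion in zip(prompts, completions):
            f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")
    return out_path


def main():
    # Hypothetical mount point: with a Cloud Storage volume mounted here,
    # files written to this directory are stored in the bucket.
    out_dir = os.environ.get("OUTPUT_DIR", "/mnt/results")
    prompts = ["The capital of France is", "Cloud Run jobs are"]
    write_results(prompts, generate_completions(prompts), out_dir)
```

In this sketch, `main()` would be called from the container's entrypoint. Because the volume mount presents the bucket as a directory, the job only uses ordinary file operations and needs no Cloud Storage client library.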