# Run LLM inference on Cloud Run GPUs with vLLM

Last updated: 2025-09-04 (UTC)

The following codelab shows how to run a backend service that uses [vLLM](https://github.com/vllm-project/vllm), an inference engine for production systems, to serve Google's [Gemma 2](https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma/), a 2-billion-parameter instruction-tuned model.

See the entire codelab at [Run LLM inference on Cloud Run GPUs with vLLM](https://codelabs.developers.google.com/codelabs/how-to-run-inference-cloud-run-gpu-vllm#0).
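Once the service is deployed, clients can talk to vLLM's OpenAI-compatible completions API. The sketch below shows the shape of such a request body; the endpoint URL is a hypothetical placeholder (a real deployment gets its own `*.run.app` URL), and sending the request is left out since it depends on your Cloud Run authentication setup.

```python
import json

# Hypothetical Cloud Run URL for the deployed vLLM service (assumption --
# substitute the URL printed by your own deployment).
ENDPOINT = "https://vllm-gemma-example.a.run.app/v1/completions"

# Request body in the format vLLM's OpenAI-compatible server accepts.
payload = {
    "model": "google/gemma-2-2b-it",  # Gemma 2 2B instruction-tuned model
    "prompt": "Explain Cloud Run GPUs in one sentence.",
    "max_tokens": 64,
    "temperature": 0.7,
}

# Serialize to JSON, as it would be sent in the POST body.
body = json.dumps(payload)
print(body)
```

In a real call you would POST `body` to `ENDPOINT` with a `Content-Type: application/json` header and an identity token if the service requires authentication.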