Serve open source models using TPUs on GKE with Optimum TPU

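This tutorial shows you how to serve open source large language models (LLMs) using Tensor Processing Units (TPUs) on Google Kubernetes Engine (GKE) with the Optimum TPU serving framework from Hugging Face. In this tutorial, you download open source models from Hugging Face and deploy them on a GKE Standard cluster using a container that runs Optimum TPU.

This guide is a good starting point if you need the granular control, scalability, resilience, portability, and cost-effectiveness of managed Kubernetes when you deploy and serve AI/ML workloads.

This tutorial is intended for generative AI customers in the Hugging Face ecosystem, new or existing GKE users, ML engineers, MLOps (DevOps) engineers, and platform administrators who are interested in using Kubernetes container orchestration capabilities to serve LLMs.

As a reminder, you have several options for LLM inference on Google Cloud products such as Vertex AI, GKE, and Google Compute Engine, where you can integrate serving libraries like JetStream, vLLM, and other partner offerings. For example, you can use JetStream to get the latest optimizations from the project. If you prefer Hugging Face options, you can use Optimum TPU.

Optimum TPU supports the following features:

Continuous batching
Token streaming
Greedy search and multinomial sampling using transformers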
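
To make the difference between the two decoding strategies in the feature list concrete, the following is a minimal, standard-library sketch. It is not Optimum TPU code: the vocabulary, the scores, and the helper names (softmax, greedy_pick, multinomial_pick) are hypothetical and exist only for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy search: always take the index of the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


def multinomial_pick(logits, rng, temperature=1.0):
    """Multinomial sampling: draw an index in proportion to its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical next-token scores over a four-token vocabulary (assumption).
vocab = ["cat", "dog", "fish", "bird"]
logits = [2.0, 1.0, 0.5, -1.0]

rng = random.Random(0)  # seeded so repeated runs draw the same samples
greedy_token = vocab[greedy_pick(logits)]
sampled_tokens = [vocab[multinomial_pick(logits, rng)] for _ in range(5)]

print(greedy_token)    # greedy search is deterministic for fixed scores
print(sampled_tokens)  # sampling can return lower-scoring tokens as well
```

Greedy search returns the same token every time for the same scores, while multinomial sampling trades that determinism for variety in the generated text.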

Objectives

Prepare a GKE Standard cluster with the recommended TPU topology based on the model characteristics.
Deploy Optimum TPU on GKE.
Use Optimum TPU to serve the supported models through curl.
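
As a rough preview of the last objective, the sketch below builds the kind of HTTP POST request that a curl command would send to the model server. Everything specific here is an assumption rather than something stated in this section: the base URL http://127.0.0.1:8080 (a locally forwarded port), the /generate path, and the inputs and parameters.max_new_tokens fields follow the Text Generation Inference request style, and build_generate_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=40,
                           base_url="http://127.0.0.1:8080"):
    """Build (but do not send) a JSON POST request for a text-generation server.

    The endpoint path and payload fields are assumptions, not confirmed here.
    """
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request("What is Deep Learning?")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending is left commented out because it needs a reachable model server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The request is only constructed, not sent, so the sketch runs without a cluster. Once a model server is reachable, uncommenting the last lines would perform the call.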

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the required API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the API

Make sure that you have the following role or roles on the project: roles/container.admin, roles/iam.serviceAccountAdmin, roles/artifactregistry.admin
Check for the roles
1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
Grant the roles
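1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. Click Grant access.
4. In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
5. In the Select a role list, select a role.
6. To grant additional roles, click Add another role and add each additional role.
7. Click Save.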
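
The role check above can also be scripted. The following is a minimal sketch, assuming you have exported the project's IAM policy as JSON (for example with gcloud projects get-iam-policy PROJECT_ID --format=json). The sample policy, the member user:alex@example.com, and the helper name missing_roles are hypothetical. The sketch only looks at roles bound directly to the given member, so it does not account for roles inherited through groups or for broader roles that include the same permissions.

```python
import json

# Roles this tutorial asks for on the project.
REQUIRED_ROLES = {
    "roles/container.admin",
    "roles/iam.serviceAccountAdmin",
    "roles/artifactregistry.admin",
}


def missing_roles(policy, member, required=REQUIRED_ROLES):
    """Return the required roles that no binding grants directly to `member`."""
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    return sorted(set(required) - granted)


# Hypothetical policy in the shape of an exported IAM policy (assumption).
sample_policy = json.loads("""
{
  "bindings": [
    {"role": "roles/container.admin",
     "members": ["user:alex@example.com"]},
    {"role": "roles/artifactregistry.admin",
     "members": ["user:alex@example.com", "group:ml-team@example.com"]}
  ]
}
""")

missing = missing_roles(sample_policy, "user:alex@example.com")
print(missing)  # roles that still need to be granted to this member
```

Any role reported as missing can then be granted through the Grant the roles steps above.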
