Use Gemma open models with Dataflow

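Gemma is a family of lightweight, state-of-the-art open models built from the research and technology used to create the Gemini models. You can use Gemma models in your Apache Beam inference pipelines. The term open weight means that a model's pretrained parameters, or weights, are released publicly. Details such as the original dataset, model architecture, and training code are not provided.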

For a list of available models and details about each one, see the Gemma models overview.
To learn how to download and use models, see Get started with Gemma using KerasNLP.
To download a model, see Gemma models.

Use cases

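You can use Gemma models with Dataflow for sentiment analysis. With Dataflow and the Gemma models, you can process events, such as customer reviews, as they arrive. Each review is run through the model for analysis, and the result is then used to generate recommendations. By combining Gemma with Apache Beam, you can complete this workflow seamlessly.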
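As a rough illustration of the first stage of this workflow, the sketch below turns raw review text into prompts that a pipeline could send to the model. The helper name `build_sentiment_prompt` and the sample reviews are hypothetical; the prompt wording follows the example prompt in the pipeline code on this page.

```python
def build_sentiment_prompt(review: str) -> str:
    """Wrap one customer review in a sentiment prompt (hypothetical helper)."""
    # Collapse runs of whitespace so a multi-line review fits on one prompt line.
    cleaned = " ".join(review.split())
    return f"Tell me the sentiment of the phrase '{cleaned}': "


# Sample reviews are made up for illustration.
reviews = ["I like pizza", "The delivery was\nlate and cold"]
prompts = [build_sentiment_prompt(r) for r in reviews]
for prompt in prompts:
    print(prompt)
```

In a pipeline, a function like this could be applied with `beam.Map` before the inference step.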

Support and limitations

Gemma open models are supported with Apache Beam and Dataflow, subject to the following requirements:

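Available for batch and streaming pipelines that use Apache Beam Python SDK version 2.46.0 or later.
Dataflow jobs must use Runner v2.
Dataflow jobs must use GPUs. For a list of GPU types supported with Dataflow, see Availability. The T4 and L4 GPU types are recommended.
The model must be downloaded and saved in the .keras file format.
The TensorFlow model handler is recommended but not required.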
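The Runner v2 and GPU requirements above usually translate into pipeline options at launch time. The following is a minimal sketch, not an official template: the helper name `dataflow_gpu_flags` and all placeholder values are hypothetical, and the flag spellings reflect my understanding of the Dataflow options, so verify them against the current Dataflow GPU documentation.

```python
def dataflow_gpu_flags(project, region, temp_location, container_image,
                       gpu_type="nvidia-tesla-t4", gpu_count=1):
    """Build Dataflow flags for a GPU job on Runner v2 (hypothetical helper)."""
    # Request GPUs on the workers and ask Dataflow to install the NVIDIA driver.
    accelerator = (
        f"worker_accelerator=type:{gpu_type};count:{gpu_count};install-nvidia-driver"
    )
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Custom container image, as described under Prerequisites.
        f"--sdk_container_image={container_image}",
        "--experiments=use_runner_v2",
        f"--dataflow_service_options={accelerator}",
    ]


# Placeholder values; replace them with your own project settings.
flags = dataflow_gpu_flags(
    project="PROJECT_ID",
    region="REGION",
    temp_location="gs://BUCKET/tmp",
    container_image="IMAGE_URI",
    gpu_type="nvidia-l4",
)
print("\n".join(flags))
```

A list like this can be passed to `apache_beam.options.pipeline_options.PipelineOptions(flags)` when constructing the pipeline.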

Prerequisites

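Access Gemma models through Kaggle.
Complete the consent form and accept the terms and conditions.
Download the Gemma model. Save it in the .keras file format in a location that your Dataflow job can access, such as a Cloud Storage bucket. Use the path to this storage location when you set the value of the model path variable.
To run your job on Dataflow, create a custom container image. This step makes it possible to run the pipeline with GPUs on the Dataflow service.
- To see a complete workflow that includes creating a Docker image, see RunInference on Dataflow streaming with Gemma on GitHub.
- For more information about building the Docker image, see Build a custom container image in "Run a pipeline with GPUs."
- To push the container to Artifact Registry by using Docker, see the Build and push the image section in "Build custom container images for Dataflow."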

Use Gemma in your pipeline

To use a Gemma model in your Apache Beam pipeline, follow these steps.

In your Apache Beam code, after you import your pipeline dependencies, include the path to your saved model:
```
model_path = "MODEL_PATH"
```
Replace MODEL_PATH with the path where you saved the downloaded model. For example, if you save your model to a Cloud Storage bucket, the path has the format gs://STORAGE_PATH/FILENAME.keras.
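Because the model must be saved in the .keras format, it can help to check the path before launching a job. The sketch below is one possible guard; `check_model_path` is a hypothetical helper, not part of Apache Beam, and it only inspects the file extension.

```python
def check_model_path(model_path: str) -> str:
    """Return model_path unchanged if it names a .keras file (hypothetical helper)."""
    if not model_path.endswith(".keras"):
        raise ValueError(f"Expected a .keras file, got: {model_path}")
    return model_path


# Placeholder path in the format described above.
model_path = check_model_path("gs://STORAGE_PATH/FILENAME.keras")
print(model_path)
```

The check does not confirm that the file exists or that the Dataflow job can read it.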

The Keras implementation of the Gemma models has a generate() method that generates text based on a prompt. To pass elements to the generate() method, use a custom inference function.

```
import numpy as np
from apache_beam.ml.inference import utils

def gemma_inference_function(model, batch, inference_args, model_id):
  # Stack the batch of prompts into a single NumPy array.
  vectorized_batch = np.stack(batch, axis=0)
  # The only inference_arg expected here is max_length, which caps the
  # length of the generated sequence.
  predictions = model.generate(vectorized_batch, **inference_args)
  # Pair each input with its prediction.
  return utils._convert_to_result(batch, predictions, model_id)
```

Run your pipeline, specifying the path to the trained model. This example uses a TensorFlow model handler.

```
import apache_beam as beam
import numpy as np
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorflow_inference import TFModelHandlerNumpy

class FormatOutput(beam.DoFn):
  def process(self, element, *args, **kwargs):
    yield "Input: {input}, Output: {output}".format(input=element.example, output=element.inference)

# Instantiate a NumPy array of string prompts for the model.
examples = np.array(["Tell me the sentiment of the phrase 'I like pizza': "])
# Specify the model handler, providing a path and the custom inference function.
# model_path and gemma_inference_function are defined in the previous steps.
model_handler = TFModelHandlerNumpy(model_path, inference_fn=gemma_inference_function)
with beam.Pipeline() as p:
  _ = (p | beam.Create(examples) # Create a PCollection of the prompts.
         | RunInference(model_handler, inference_args={'max_length': 32}) # Send the prompts to the model and get responses.
         | beam.ParDo(FormatOutput()) # Format the output.
         | beam.Map(print) # Print the formatted output.
  )
```
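Before running on GPUs, it can be useful to check the inference and formatting logic locally. The sketch below does that without Apache Beam or a real model. `FakeGemma`, `FakeResult`, `local_inference`, and `format_output` are all hypothetical stand-ins: the stub's canned output and character-based truncation only show that `max_length` is forwarded and say nothing about how Gemma behaves.

```python
from typing import Any, List, NamedTuple, Sequence


class FakeResult(NamedTuple):
    """Stand-in using the same field names as Beam's PredictionResult."""
    example: Any
    inference: Any
    model_id: Any = None


class FakeGemma:
    """Hypothetical stub that mimics only the generate() call used above."""

    def generate(self, prompts: Sequence[str], max_length: int = 64) -> List[str]:
        # Append a canned answer and truncate to max_length characters.
        return [(p + "positive")[:max_length] for p in prompts]


def local_inference(model, batch, inference_args, model_id=None) -> List[FakeResult]:
    # Same shape as the custom inference function: forward inference_args
    # to generate() and pair each input with its prediction.
    predictions = model.generate(list(batch), **inference_args)
    return [FakeResult(x, y, model_id) for x, y in zip(batch, predictions)]


def format_output(element) -> str:
    # Same formatting as FormatOutput.process, written as a plain function.
    return "Input: {input}, Output: {output}".format(
        input=element.example, output=element.inference)


results = local_inference(FakeGemma(), ["I like pizza: "], {"max_length": 32})
lines = [format_output(r) for r in results]
print(lines[0])
```

Once the logic behaves as expected, the same formatting can run in the pipeline through the `FormatOutput` DoFn.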