이 페이지는 Cloud Translation API를 통해 번역되었습니다.

텍스트 분류 모델에서 예측 가져오기

이 페이지에서는 Google Cloud 콘솔 또는 Vertex AI API를 사용하여 텍스트 분류 모델에서 온라인(실시간) 예측과 일괄 예측을 얻는 방법을 보여줍니다.

온라인 예측과 일괄 예측 간의 차이점

온라인 예측은 모델 엔드포인트에 수행되는 동기식 요청입니다. 애플리케이션 입력에 대한 응답으로 요청하거나 적시의 추론이 필요한 상황에서 요청하는 경우에는 온라인 예측을 사용합니다.

일괄 예측은 비동기식 요청입니다. 모델을 엔드포인트에 배포할 필요 없이 모델 리소스에서 직접 일괄 예측을 요청합니다. 텍스트 데이터의 경우 즉각적인 응답이 필요하지 않고 단일 요청을 사용하여 누적된 데이터를 처리하고 싶은 경우 일괄 예측을 사용하세요.

온라인 예측 수행

엔드포인트에 모델 배포

모델을 온라인 예측을 제공하는 데 사용하려면 먼저 모델을 엔드포인트에 배포해야 합니다. 모델을 배포하면 물리적 리소스가 모델과 연결되므로 짧은 지연 시간으로 온라인 예측을 제공할 수 있습니다.

엔드포인트 1개에 모델을 2개 이상 배포할 수 있고 2개 이상의 엔드포인트에 모델 1개를 배포할 수 있습니다. 모델 배포 옵션 및 사용 사례에 대한 자세한 내용은 모델 배포 정보를 참조하세요.

다음 방법 중 하나를 사용하여 모델을 배포합니다.

Google Cloud 콘솔

Google Cloud 콘솔의 Vertex AI 섹션에서 모델 페이지로 이동합니다.

모델 페이지로 이동
배포하려는 모델의 이름을 클릭하여 세부정보 페이지를 엽니다.
배포 및 테스트 탭을 선택합니다.

이미 엔드포인트에 배포된 모델은 모델 배포 섹션에 나열됩니다.
엔드포인트에 배포를 클릭합니다.
모델을 새 엔드포인트에 배포하려면 새 엔드포인트 만들기를 선택하고 새 엔드포인트의 이름을 지정합니다. 기존 엔드포인트에 모델을 배포하려면 기존 엔드포인트에 추가를 선택하고 드롭다운 목록에서 엔드포인트를 선택합니다.

엔드포인트 1개에 모델을 2개 이상 추가할 수 있고 2개 이상의 엔드포인트에 모델 1개를 추가할 수 있습니다. 자세히 알아보기
모델 하나 이상이 배포된 기존 엔드포인트에 모델을 배포하는 경우 비율을 모두 합하면 100%가 되도록 배포 중인 모델과 이미 배포된 모델의 트래픽 분할 비율을 업데이트해야 합니다.
AutoML Text를 선택하고 다음과 같이 구성합니다.
1. 모델을 새 엔드포인트에 배포하는 경우 트래픽 분할 값으로 100을 허용합니다. 아니면 모두 합하여 100이 되도록 엔드포인트의 모든 모델에 대한 트래픽 분할 값을 조정합니다.
2. 모델에서 완료를 클릭하고 모든 트래픽 분할 비율이 올바르면 계속을 클릭합니다.
  모델이 배포되는 리전이 표시됩니다. 이 리전은 모델을 만든 리전이어야 합니다.
3. 배포를 클릭하여 모델을 엔드포인트에 배포합니다.

API

Vertex AI API를 사용하여 모델을 배포하는 경우 다음 단계를 완료합니다.

필요한 경우 엔드포인트를 만듭니다.
엔드포인트 ID를 가져옵니다.
모델을 엔드포인트에 배포합니다.

엔드포인트 만들기

기존 엔드포인트에 모델을 배포하는 경우 이 단계를 건너뛸 수 있습니다.

gcloud

다음 예시에서는 gcloud ai endpoints create 명령어를 사용합니다.

gcloud ai endpoints create \
  --region=LOCATION \
  --display-name=ENDPOINT_NAME

다음을 바꿉니다.

LOCATION_ID: Vertex AI를 사용하는 리전
ENDPOINT_NAME: 엔드포인트의 표시 이름

Google Cloud CLI 도구가 엔드포인트를 만드는 데 몇 초 정도 걸릴 수 있습니다.

REST

요청 데이터를 사용하기 전에 다음을 바꿉니다.

LOCATION_ID: 리전
PROJECT_ID: 프로젝트 ID
ENDPOINT_NAME: 엔드포인트의 표시 이름

HTTP 메서드 및 URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints

JSON 요청 본문:

{
  "display_name": "ENDPOINT_NAME"
}

요청을 보내려면 다음 옵션 중 하나를 펼칩니다.

cURL(Linux, macOS, Cloud Shell)

참고: 다음 명령어는 gcloud init 또는 gcloud auth login을 실행하거나 gcloud CLI에 자동으로 로그인하는 Cloud Shell을 사용하여 사용자 계정으로 gcloud CLI에 로그인했다고 가정합니다. gcloud auth list를 실행하면 현재 활성 계정을 확인할 수 있습니다.

요청 본문을 request.json 파일에 저장하고 다음 명령어를 실행합니다.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints"

PowerShell(Windows)

참고: 다음 명령어는 gcloud init 또는 gcloud auth login을 실행하여 사용자 계정으로 gcloud CLI에 로그인했다고 가정합니다. gcloud auth list를 실행하면 현재 활성 계정을 확인할 수 있습니다.

요청 본문을 request.json 파일에 저장하고 다음 명령어를 실행합니다.

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints" | Select-Object -Expand Content

다음과 비슷한 JSON 응답이 표시됩니다.

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}

응답에 "done": true가 포함될 때까지 작업 상태를 폴링할 수 있습니다.

Java

이 샘플을 사용해 보기 전에 Vertex AI 빠른 시작: 클라이언트 라이브러리 사용의 Java 설정 안내를 따르세요. 자세한 내용은 Vertex AI Java API 참고 문서를 참조하세요.

Vertex AI에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.


import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateEndpointOperationMetadata;
import com.google.cloud.aiplatform.v1.Endpoint;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateEndpointSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String endpointDisplayName = "YOUR_ENDPOINT_DISPLAY_NAME";
    createEndpointSample(project, endpointDisplayName);
  }

  static void createEndpointSample(String project, String endpointDisplayName)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      LocationName locationName = LocationName.of(project, location);
      Endpoint endpoint = Endpoint.newBuilder().setDisplayName(endpointDisplayName).build();

      OperationFuture<Endpoint, CreateEndpointOperationMetadata> endpointFuture =
          endpointServiceClient.createEndpointAsync(locationName, endpoint);
      System.out.format("Operation name: %s\n", endpointFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      Endpoint endpointResponse = endpointFuture.get(300, TimeUnit.SECONDS);

      System.out.println("Create Endpoint Response");
      System.out.format("Name: %s\n", endpointResponse.getName());
      System.out.format("Display Name: %s\n", endpointResponse.getDisplayName());
      System.out.format("Description: %s\n", endpointResponse.getDescription());
      System.out.format("Labels: %s\n", endpointResponse.getLabelsMap());
      System.out.format("Create Time: %s\n", endpointResponse.getCreateTime());
      System.out.format("Update Time: %s\n", endpointResponse.getUpdateTime());
    }
  }
}

Node.js

이 샘플을 사용해 보기 전에 Vertex AI 빠른 시작: 클라이언트 라이브러리 사용의 Node.js 설정 안내를 따르세요. 자세한 내용은 Vertex AI Node.js API 참고 문서를 참조하세요.

Vertex AI에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.

/**
 * TODO(developer): Uncomment these variables before running the sample.\
 * (Not necessary if passing values as arguments)
 */

// const endpointDisplayName = 'YOUR_ENDPOINT_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function createEndpoint() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const endpoint = {
    displayName: endpointDisplayName,
  };
  const request = {
    parent,
    endpoint,
  };

  // Get and print out a list of all the endpoints for this resource
  const [response] = await endpointServiceClient.createEndpoint(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create endpoint response');
  console.log(`\tName : ${result.name}`);
  console.log(`\tDisplay name : ${result.displayName}`);
  console.log(`\tDescription : ${result.description}`);
  console.log(`\tLabels : ${JSON.stringify(result.labels)}`);
  console.log(`\tCreate time : ${JSON.stringify(result.createTime)}`);
  console.log(`\tUpdate time : ${JSON.stringify(result.updateTime)}`);
}
createEndpoint();

Python용 Vertex AI SDK

Python용 Vertex AI SDK를 설치하거나 업데이트하는 방법은 Python용 Vertex AI SDK 설치를 참조하세요. 자세한 내용은 Python용 Vertex AI SDK API 참조 문서를 확인하세요.

def create_endpoint_sample(
    project: str,
    display_name: str,
    location: str,
):
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint.create(
        display_name=display_name,
        project=project,
        location=location,
    )

    print(endpoint.display_name)
    print(endpoint.resource_name)
    return endpoint

엔드포인트 ID 검색

모델을 배포하려면 엔드포인트 ID가 필요합니다.

gcloud

다음 예시에서는 gcloud ai endpoints list 명령어를 사용합니다.

gcloud ai endpoints list \
  --region=LOCATION \
  --filter=display_name=ENDPOINT_NAME

다음을 바꿉니다.

LOCATION_ID: Vertex AI를 사용하는 리전
ENDPOINT_NAME: 엔드포인트의 표시 이름

ENDPOINT_ID 열에 표시되는 번호를 확인합니다. 다음 단계에서 이 ID를 사용합니다.

REST

요청 데이터를 사용하기 전에 다음을 바꿉니다.

LOCATION_ID: Vertex AI를 사용하는 리전
PROJECT_ID: 프로젝트 ID
ENDPOINT_NAME: 엔드포인트의 표시 이름

HTTP 메서드 및 URL:

GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME

요청을 보내려면 다음 옵션 중 하나를 펼칩니다.

cURL(Linux, macOS, Cloud Shell)

다음 명령어를 실행합니다.

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME"

PowerShell(Windows)

다음 명령어를 실행합니다.

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME" | Select-Object -Expand Content

다음과 비슷한 JSON 응답이 표시됩니다.

{
  "endpoints": [
    {
      "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID",
      "displayName": "ENDPOINT_NAME",
      "etag": "AMEw9yPz5pf4PwBHbRWOGh0PcAxUdjbdX2Jm3QO_amguy3DbZGP5Oi_YUKRywIE-BtLx",
      "createTime": "2020-04-17T18:31:11.585169Z",
      "updateTime": "2020-04-17T18:35:08.568959Z"
    }
  ]
}

ENDPOINT_ID를 확인합니다.

모델 배포

아래에서 언어 또는 환경에 대한 탭을 선택하세요.

gcloud

다음 예시에서는 gcloud ai endpoints deploy-model 명령어를 사용합니다.

다음 예시에서는 여러 DeployedModel 리소스 간에 트래픽을 분할하지 않고 Model을 Endpoint에 배포합니다.

아래의 명령어 데이터를 사용하기 전에 다음을 바꿉니다.

ENDPOINT_ID: 엔드포인트의 ID
LOCATION_ID: Vertex AI를 사용하는 리전
MODEL_ID: 배포할 모델의 ID
DEPLOYED_MODEL_NAME: DeployedModel의 이름. DeployedModel의 Model 표시 이름도 사용할 수 있습니다.
MIN_REPLICA_COUNT: 이 배포의 최소 노드 수. 예측 로드 시 필요에 따라 최대 노드 수까지 노드 수를 늘리거나 줄일 수 있으며 이 노드 수 미만으로는 줄일 수 없습니다.
MAX_REPLICA_COUNT: 이 배포의 최대 노드 수. 예측 로드 시 필요에 따라 이 노드 수까지 노드 수를 늘리거나 줄일 수 있으며 최소 노드 수 미만으로는 줄일 수 없습니다. --max-replica-count 플래그를 생략하면 최대 노드 수가 --min-replica-count 값으로 설정됩니다.

gcloud ai endpoints deploy-model 명령어를 실행합니다.

Linux, macOS 또는 Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID\
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --traffic-split=0=100

Windows(PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID`
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --traffic-split=0=100

Windows(cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --traffic-split=0=100

트래픽 분할

앞의 예시에서 --traffic-split=0=100 플래그는 Endpoint가 수신하는 예측 트래픽의 100%를 새 DeployedModel로 전송하며 임시 ID는 0으로 표현됩니다. Endpoint에 이미 다른 DeployedModel 리소스가 있으면 새 DeployedModel 및 이전 모델 간에 트래픽을 분할할 수 있습니다. 예를 들어 트래픽의 20%를 새 DeployedModel로, 80%를 이전 모델로 전송하려면 다음 명령어를 실행합니다.

아래의 명령어 데이터를 사용하기 전에 다음을 바꿉니다.

OLD_DEPLOYED_MODEL_ID: 기존 DeployedModel의 ID

gcloud ai endpoints deploy-model 명령어를 실행합니다.

Linux, macOS 또는 Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID\
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \ 
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows(PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID`
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME \ 
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows(cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME \ 
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

REST

모델 배포

요청 데이터를 사용하기 전에 다음을 바꿉니다.

LOCATION_ID: Vertex AI를 사용하는 리전
PROJECT_ID: 프로젝트 ID
ENDPOINT_ID: 엔드포인트의 ID
MODEL_ID: 배포할 모델의 ID
DEPLOYED_MODEL_NAME: DeployedModel의 이름. DeployedModel의 Model 표시 이름도 사용할 수 있습니다.
TRAFFIC_SPLIT_THIS_MODEL: 이 작업과 함께 배포되는 모델로 라우팅될 이 엔드포인트에 대한 예측 트래픽 비율입니다. 기본값은 100입니다. 모든 트래픽 비율의 합은 100이 되어야 합니다. 트래픽 분할에 대해 자세히 알아보기
DEPLOYED_MODEL_ID_N: (선택사항) 다른 모델이 이 엔드포인트에 배포된 경우 모든 비율의 합이 100이 되도록 트래픽 분할 비율을 업데이트해야 합니다.
TRAFFIC_SPLIT_MODEL_N: 배포된 모델 ID 키의 트래픽 분할 비율 값
PROJECT_NUMBER: 프로젝트의 자동으로 생성된 프로젝트 번호

HTTP 메서드 및 URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel

JSON 요청 본문:

{
  "deployedModel": {
    "model": "projects/PROJECT_ID/locations/us-central1/models/MODEL_ID",
    "displayName": "DEPLOYED_MODEL_NAME",
    "automaticResources": {
     }
  },
  "trafficSplit": {
    "0": TRAFFIC_SPLIT_THIS_MODEL,
    "DEPLOYED_MODEL_ID_1": TRAFFIC_SPLIT_MODEL_1,
    "DEPLOYED_MODEL_ID_2": TRAFFIC_SPLIT_MODEL_2
  },
}

요청을 보내려면 다음 옵션 중 하나를 펼칩니다.

cURL(Linux, macOS, Cloud Shell)

요청 본문을 request.json 파일에 저장하고 다음 명령어를 실행합니다.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"

PowerShell(Windows)

요청 본문을 request.json 파일에 저장하고 다음 명령어를 실행합니다.

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand Content

다음과 비슷한 JSON 응답이 표시됩니다.

{
  "name": "projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployModelOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    }
  }
}

Java

Vertex AI에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.


import com.google.api.gax.longrunning.OperationFuture;
import com.google.api.gax.longrunning.OperationTimedPollAlgorithm;
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.aiplatform.v1.AutomaticResources;
import com.google.cloud.aiplatform.v1.DedicatedResources;
import com.google.cloud.aiplatform.v1.DeployModelOperationMetadata;
import com.google.cloud.aiplatform.v1.DeployModelResponse;
import com.google.cloud.aiplatform.v1.DeployedModel;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.ModelName;
import com.google.cloud.aiplatform.v1.stub.EndpointServiceStubSettings;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.threeten.bp.Duration;

public class DeployModelSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String deployedModelDisplayName = "YOUR_DEPLOYED_MODEL_DISPLAY_NAME";
    String endpointId = "YOUR_ENDPOINT_NAME";
    String modelId = "YOUR_MODEL_ID";
    int timeout = 900;
    deployModelSample(project, deployedModelDisplayName, endpointId, modelId, timeout);
  }

  static void deployModelSample(
      String project,
      String deployedModelDisplayName,
      String endpointId,
      String modelId,
      int timeout)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {

    // Set long-running operations (LROs) timeout
    final OperationTimedPollAlgorithm operationTimedPollAlgorithm =
        OperationTimedPollAlgorithm.create(
            RetrySettings.newBuilder()
                .setInitialRetryDelay(Duration.ofMillis(5000L))
                .setRetryDelayMultiplier(1.5)
                .setMaxRetryDelay(Duration.ofMillis(45000L))
                .setInitialRpcTimeout(Duration.ZERO)
                .setRpcTimeoutMultiplier(1.0)
                .setMaxRpcTimeout(Duration.ZERO)
                .setTotalTimeout(Duration.ofSeconds(timeout))
                .build());

    EndpointServiceStubSettings.Builder endpointServiceStubSettingsBuilder =
        EndpointServiceStubSettings.newBuilder();
    endpointServiceStubSettingsBuilder
        .deployModelOperationSettings()
        .setPollingAlgorithm(operationTimedPollAlgorithm);
    EndpointServiceStubSettings endpointStubSettings = endpointServiceStubSettingsBuilder.build();
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.create(endpointStubSettings);
    endpointServiceSettings =
        endpointServiceSettings.toBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);
      // key '0' assigns traffic for the newly deployed model
      // Traffic percentage values must add up to 100
      // Leave dictionary empty if endpoint should not accept any traffic
      Map<String, Integer> trafficSplit = new HashMap<>();
      trafficSplit.put("0", 100);
      ModelName modelName = ModelName.of(project, location, modelId);
      AutomaticResources automaticResourcesInput =
          AutomaticResources.newBuilder().setMinReplicaCount(1).setMaxReplicaCount(1).build();
      DeployedModel deployedModelInput =
          DeployedModel.newBuilder()
              .setModel(modelName.toString())
              .setDisplayName(deployedModelDisplayName)
              .setAutomaticResources(automaticResourcesInput)
              .build();

      OperationFuture<DeployModelResponse, DeployModelOperationMetadata> deployModelResponseFuture =
          endpointServiceClient.deployModelAsync(endpointName, deployedModelInput, trafficSplit);
      System.out.format(
          "Operation name: %s\n", deployModelResponseFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      DeployModelResponse deployModelResponse = deployModelResponseFuture.get(20, TimeUnit.MINUTES);

      System.out.println("Deploy Model Response");
      DeployedModel deployedModel = deployModelResponse.getDeployedModel();
      System.out.println("\tDeployed Model");
      System.out.format("\t\tid: %s\n", deployedModel.getId());
      System.out.format("\t\tmodel: %s\n", deployedModel.getModel());
      System.out.format("\t\tDisplay Name: %s\n", deployedModel.getDisplayName());
      System.out.format("\t\tCreate Time: %s\n", deployedModel.getCreateTime());

      DedicatedResources dedicatedResources = deployedModel.getDedicatedResources();
      System.out.println("\t\tDedicated Resources");
      System.out.format("\t\t\tMin Replica Count: %s\n", dedicatedResources.getMinReplicaCount());

      MachineSpec machineSpec = dedicatedResources.getMachineSpec();
      System.out.println("\t\t\tMachine Spec");
      System.out.format("\t\t\t\tMachine Type: %s\n", machineSpec.getMachineType());
      System.out.format("\t\t\t\tAccelerator Type: %s\n", machineSpec.getAcceleratorType());
      System.out.format("\t\t\t\tAccelerator Count: %s\n", machineSpec.getAcceleratorCount());

      AutomaticResources automaticResources = deployedModel.getAutomaticResources();
      System.out.println("\t\tAutomatic Resources");
      System.out.format("\t\t\tMin Replica Count: %s\n", automaticResources.getMinReplicaCount());
      System.out.format("\t\t\tMax Replica Count: %s\n", automaticResources.getMaxReplicaCount());
    }
  }
}

Node.js

Vertex AI에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.

/**
 * TODO(developer): Uncomment these variables before running the sample.\
 * (Not necessary if passing values as arguments)
 */

// const modelId = "YOUR_MODEL_ID";
// const endpointId = 'YOUR_ENDPOINT_ID';
// const deployedModelDisplayName = 'YOUR_DEPLOYED_MODEL_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

const modelName = `projects/${project}/locations/${location}/models/${modelId}`;
const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;
// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint:
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function deployModel() {
  // Configure the parent resource
  // key '0' assigns traffic for the newly deployed model
  // Traffic percentage values must add up to 100
  // Leave dictionary empty if endpoint should not accept any traffic
  const trafficSplit = {0: 100};
  const deployedModel = {
    // format: 'projects/{project}/locations/{location}/models/{model}'
    model: modelName,
    displayName: deployedModelDisplayName,
    automaticResources: {minReplicaCount: 1, maxReplicaCount: 1},
  };
  const request = {
    endpoint,
    deployedModel,
    trafficSplit,
  };

  // Get and print out a list of all the endpoints for this resource
  const [response] = await endpointServiceClient.deployModel(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Deploy model response');
  const modelDeployed = result.deployedModel;
  console.log('\tDeployed model');
  if (!modelDeployed) {
    console.log('\t\tId : {}');
    console.log('\t\tModel : {}');
    console.log('\t\tDisplay name : {}');
    console.log('\t\tCreate time : {}');

    console.log('\t\tDedicated resources');
    console.log('\t\t\tMin replica count : {}');
    console.log('\t\t\tMachine spec {}');
    console.log('\t\t\t\tMachine type : {}');
    console.log('\t\t\t\tAccelerator type : {}');
    console.log('\t\t\t\tAccelerator count : {}');

    console.log('\t\tAutomatic resources');
    console.log('\t\t\tMin replica count : {}');
    console.log('\t\t\tMax replica count : {}');
  } else {
    console.log(`\t\tId : ${modelDeployed.id}`);
    console.log(`\t\tModel : ${modelDeployed.model}`);
    console.log(`\t\tDisplay name : ${modelDeployed.displayName}`);
    console.log(`\t\tCreate time : ${modelDeployed.createTime}`);

    const dedicatedResources = modelDeployed.dedicatedResources;
    console.log('\t\tDedicated resources');
    if (!dedicatedResources) {
      console.log('\t\t\tMin replica count : {}');
      console.log('\t\t\tMachine spec {}');
      console.log('\t\t\t\tMachine type : {}');
      console.log('\t\t\t\tAccelerator type : {}');
      console.log('\t\t\t\tAccelerator count : {}');
    } else {
      console.log(
        `\t\t\tMin replica count : \
          ${dedicatedResources.minReplicaCount}`
      );
      const machineSpec = dedicatedResources.machineSpec;
      console.log('\t\t\tMachine spec');
      console.log(`\t\t\t\tMachine type : ${machineSpec.machineType}`);
      console.log(
        `\t\t\t\tAccelerator type : ${machineSpec.acceleratorType}`
      );
      console.log(
        `\t\t\t\tAccelerator count : ${machineSpec.acceleratorCount}`
      );
    }

    const automaticResources = modelDeployed.automaticResources;
    console.log('\t\tAutomatic resources');
    if (!automaticResources) {
      console.log('\t\t\tMin replica count : {}');
      console.log('\t\t\tMax replica count : {}');
    } else {
      console.log(
        `\t\t\tMin replica count : \
          ${automaticResources.minReplicaCount}`
      );
      console.log(
        `\t\t\tMax replica count : \
          ${automaticResources.maxReplicaCount}`
      );
    }
  }
}
deployModel();

Python용 Vertex AI SDK

def deploy_model_with_automatic_resources_sample(
    project,
    location,
    model_name: str,
    endpoint: Optional[aiplatform.Endpoint] = None,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: Optional[int] = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync: bool = True,
):
    """
    model_name: A fully-qualified model resource name or model ID.
          Example: "projects/123/locations/us-central1/models/456" or
          "456" when project and location are initialized or passed.
    """

    aiplatform.init(project=project, location=location)

    model = aiplatform.Model(model_name=model_name)

    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        traffic_percentage=traffic_percentage,
        traffic_split=traffic_split,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        metadata=metadata,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model

작업 상태 가져오기

일부 요청은 완료하는 데 시간이 걸리는 장기 실행 작업을 시작합니다. 이러한 요청은 작업 상태를 보거나 작업을 취소하는 데 사용할 수 있는 작업 이름을 반환합니다. Vertex AI는 장기 실행 작업을 호출하는 도우미 메서드를 제공합니다. 자세한 내용은 장기 실행 작업 다루기를 참조하세요.

배포된 모델을 사용하여 온라인 예측 수행

온라인 예측을 수행하려면 분석을 위해 하나 이상의 테스트 항목을 모델에 제출하면 모델이 모델의 목표에 따른 결과를 반환합니다. 예측 결과에 대한 자세한 내용은 결과 해석 페이지를 참조하세요.

콘솔

Google Cloud 콘솔을 사용하여 온라인 예측을 요청합니다. 모델을 엔드포인트에 배포해야 합니다.

Google Cloud 콘솔의 Vertex AI 섹션에서 모델 페이지로 이동합니다.

모델 페이지로 이동
모델 목록에서 예측을 요청할 모델의 이름을 클릭합니다.
배포 및 테스트 탭을 선택합니다.
모델 테스트 섹션에서 테스트 항목을 추가하여 예측을 요청합니다.

텍스트 목표에 대한 AutoML 모델을 사용하려면 텍스트 필드에 콘텐츠를 입력하고 예측을 클릭해야 합니다.

로컬 특성 중요도에 대한 자세한 내용은 설명 보기를 참조하세요.

예측이 완료되면 Vertex AI가 콘솔에 결과를 반환합니다.

API

Vertex AI API를 사용하여 온라인 예측을 요청합니다. 모델을 엔드포인트에 배포해야 합니다.

gcloud

다음 콘텐츠로 request.json라는 파일을 만듭니다.
```
{
  "instances": [{
    "mimeType": "text/plain",
    "content": "CONTENT"
  }]
}
```
다음을 바꿉니다.
- CONTENT: 예측에 사용되는 텍스트 스니펫입니다.
다음 명령어를 실행합니다.
```
gcloud ai endpoints predict ENDPOINT_ID \
  --region=LOCATION_ID \
  --json-request=request.json
```
다음을 바꿉니다.
- ENDPOINT_ID: 엔드포인트의 ID
- LOCATION_ID: Vertex AI를 사용하는 리전

REST

요청 데이터를 사용하기 전에 다음을 바꿉니다.

LOCATION_ID: 엔드포인트가 있는 리전. 예를 들면 us-central1입니다.
PROJECT_ID: 프로젝트 ID
ENDPOINT_ID: 엔드포인트의 ID입니다.
CONTENT: 예측에 사용되는 텍스트 스니펫입니다.
DEPLOYED_MODEL_ID: 예측을 위해 사용된 배포된 모델의 ID입니다.

HTTP 메서드 및 URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict

JSON 요청 본문:

{
  "instances": [{
    "mimeType": "text/plain",
    "content": "CONTENT"
  }]
}

요청을 보내려면 다음 옵션 중 하나를 선택합니다.

curl

요청 본문을 request.json 파일에 저장하고 다음 명령어를 실행합니다.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict"

PowerShell

요청 본문을 request.json 파일에 저장하고 다음 명령어를 실행합니다.

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict" | Select-Object -Expand Content

다음과 비슷한 JSON 응답이 표시됩니다.

{
  "predictions": [
    {
      "ids": [
        "1234567890123456789",
        "2234567890123456789",
        "3234567890123456789"
      ],
      "displayNames": [
        "GreatService",
        "Suggestion",
        "InfoRequest"
      ],
      "confidences": [
        0.8986392080783844,
        0.81984345316886902,
        0.7722353458404541
      ]
    }
  ],
  "deployedModelId": "0123456789012345678"
}

Java

Vertex AI에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.

import com.google.cloud.aiplatform.util.ValueConverter;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.cloud.aiplatform.v1.schema.predict.instance.TextClassificationPredictionInstance;
import com.google.cloud.aiplatform.v1.schema.predict.prediction.ClassificationPredictionResult;
import com.google.protobuf.Value;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PredictTextClassificationSingleLabelSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String content = "YOUR_TEXT_CONTENT";
    String endpointId = "YOUR_ENDPOINT_ID";

    predictTextClassificationSingleLabel(project, content, endpointId);
  }

  static void predictTextClassificationSingleLabel(
      String project, String content, String endpointId) throws IOException {
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);

      TextClassificationPredictionInstance predictionInstance =
          TextClassificationPredictionInstance.newBuilder().setContent(content).build();

      List<Value> instances = new ArrayList<>();
      instances.add(ValueConverter.toValue(predictionInstance));

      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instances, ValueConverter.EMPTY_VALUE);
      System.out.println("Predict Text Classification Response");
      System.out.format("\tDeployed Model Id: %s\n", predictResponse.getDeployedModelId());

      System.out.println("Predictions:\n\n");
      for (Value prediction : predictResponse.getPredictionsList()) {

        ClassificationPredictionResult.Builder resultBuilder =
            ClassificationPredictionResult.newBuilder();

        // Display names and confidences values correspond to
        // IDs in the ID list.
        ClassificationPredictionResult result =
            (ClassificationPredictionResult) ValueConverter.fromValue(resultBuilder, prediction);
        int counter = 0;
        for (Long id : result.getIdsList()) {
          System.out.printf("Label ID: %d\n", id);
          System.out.printf("Label: %s\n", result.getDisplayNames(counter));
          System.out.printf("Confidence: %.4f\n", result.getConfidences(counter));
          counter++;
        }
      }
    }
  }
}

Node.js

Vertex AI에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.

/**
 * TODO(developer): Uncomment these variables before running the sample.\
 * (Not necessary if passing values as arguments)
 */

// const text = 'YOUR_PREDICTION_TEXT';
// const endpointId = 'YOUR_ENDPOINT_ID';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';
const aiplatform = require('@google-cloud/aiplatform');
const {instance, prediction} =
  aiplatform.protos.google.cloud.aiplatform.v1.schema.predict;

// Imports the Google Cloud Model Service Client library
const {PredictionServiceClient} = aiplatform.v1;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function predictTextClassification() {
  // Configure the resources
  const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;

  const predictionInstance =
    new instance.TextClassificationPredictionInstance({
      content: text,
    });
  const instanceValue = predictionInstance.toValue();

  const instances = [instanceValue];
  const request = {
    endpoint,
    instances,
  };

  const [response] = await predictionServiceClient.predict(request);
  console.log('Predict text classification response');
  console.log(`\tDeployed model id : ${response.deployedModelId}\n\n`);

  console.log('Prediction results:');

  for (const predictionResultValue of response.predictions) {
    const predictionResult =
      prediction.ClassificationPredictionResult.fromValue(
        predictionResultValue
      );

    for (const [i, label] of predictionResult.displayNames.entries()) {
      console.log(`\tDisplay name: ${label}`);
      console.log(`\tConfidences: ${predictionResult.confidences[i]}`);
      console.log(`\tIDs: ${predictionResult.ids[i]}\n\n`);
    }
  }
}
predictTextClassification();

Python용 Vertex AI SDK

def predict_text_classification_single_label_sample(
    project, location, endpoint, content
):
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint(endpoint)

    response = endpoint.predict(instances=[{"content": content}], parameters={})

    for prediction_ in response.predictions:
        print(prediction_)

일괄 예측 가져오기

일괄 예측 요청을 수행하려면 Vertex AI가 예측 결과를 저장하는 입력 소스와 출력 형식을 지정합니다.

입력 데이터 요구사항

일괄 요청의 입력은 예측을 위해 모델에 보낼 항목을 지정합니다. 텍스트 분류 모델의 경우 JSON Lines 파일을 사용하여 예측을 수행할 문서 목록을 지정한 다음 JSON Lines 파일을 Cloud Storage 버킷에 저장할 수 있습니다. 다음 샘플에서는 입력 JSON Lines 파일의 단일 줄을 보여줍니다.

{"content": "gs://sourcebucket/datasets/texts/source_text.txt", "mimeType": "text/plain"}

일괄 예측 요청

일괄 예측 요청의 경우 Google Cloud 콘솔 또는 Vertex AI API를 사용할 수 있습니다. 제출한 입력 항목 수에 따라 일괄 예측 태스크를 완료하는 데 다소 시간이 걸릴 수 있습니다.

Google Cloud 콘솔

Google Cloud 콘솔을 사용하여 일괄 예측을 요청합니다.

Google Cloud 콘솔의 Vertex AI 섹션에서 일괄 예측 페이지로 이동합니다.

일괄 예측 페이지로 이동
만들기를 클릭하여 새 일괄 예측 창을 열고 다음 단계를 완료합니다.
1. 일괄 예측의 이름을 입력합니다.
2. 모델 이름에서 이 일괄 예측에 사용할 모델의 이름을 선택합니다.
3. 소스 경로에서 JSON Lines 입력 파일이 있는 Cloud Storage 위치를 지정합니다.
4. 대상 경로에서 일괄 예측 결과가 저장되는 Cloud Storage 위치를 지정합니다. 출력 형식은 모델의 목표에 따라 결정됩니다. 텍스트 목표의 AutoML 모델은 JSON Lines 파일을 출력합니다.

API

Vertex AI API를 사용하여 일괄 예측 요청을 전송합니다.

REST

요청 데이터를 사용하기 전에 다음을 바꿉니다.

LOCATION_IS: 모델이 저장되고 일괄 예측 작업이 실행되는 리전입니다. 예를 들면 us-central1입니다.
PROJECT_ID: 프로젝트 ID입니다.
BATCH_JOB_NAME: 일괄 작업의 표시 이름입니다.
MODEL_ID: 예측을 수행하는 데 사용할 모델의 ID
URI: 입력 JSON Lines 파일이 있는 Cloud Storage URI
BUCKET: Cloud Storage 버킷
PROJECT_NUMBER: 프로젝트의 자동으로 생성된 프로젝트 번호

HTTP 메서드 및 URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs

JSON 요청 본문:

{
    "displayName": "BATCH_JOB_NAME",
    "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
    "inputConfig": {
        "instancesFormat": "jsonl",
        "gcsSource": {
            "uris": ["URI"]
        }
    },
    "outputConfig": {
        "predictionsFormat": "jsonl",
        "gcsDestination": {
            "outputUriPrefix": "OUTPUT_BUCKET"
        }
    }
}

요청을 보내려면 다음 옵션 중 하나를 선택합니다.

curl

요청 본문을 request.json 파일에 저장하고 다음 명령어를 실행합니다.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs"

PowerShell

요청 본문을 request.json 파일에 저장하고 다음 명령어를 실행합니다.

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs" | Select-Object -Expand Content

다음과 비슷한 JSON 응답이 표시됩니다.

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "BATCH_JOB_NAME",
  "model": "projects/PROJECT_NUMBER/locations/LOCATION/models/MODEL_ID",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris": [
        "CONTENT"
      ]
    }
  },
  "outputConfig": {
    "predictionsFormat": "jsonl",
    "gcsDestination": {
      "outputUriPrefix": "BUCKET"
    }
  },
  "state": "JOB_STATE_PENDING",
  "completionStats": {
    "incompleteCount": "-1"
  },
  "createTime": "2022-12-19T20:33:48.906074Z",
  "updateTime": "2022-12-19T20:33:48.906074Z",
  "modelVersionId": "1"
}

작업 state가 JOB_STATE_SUCCEEDED가 될 때까지 BATCH_JOB_ID를 사용하여 일괄 작업의 상태를 폴링할 수 있습니다.

Java

Vertex AI에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.

import com.google.api.gax.rpc.ApiException;
import com.google.cloud.aiplatform.v1.BatchPredictionJob;
import com.google.cloud.aiplatform.v1.GcsDestination;
import com.google.cloud.aiplatform.v1.GcsSource;
import com.google.cloud.aiplatform.v1.JobServiceClient;
import com.google.cloud.aiplatform.v1.JobServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import com.google.cloud.aiplatform.v1.ModelName;
import java.io.IOException;

public class CreateBatchPredictionJobTextClassificationSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String location = "us-central1";
    String displayName = "DISPLAY_NAME";
    String modelId = "MODEL_ID";
    String gcsSourceUri = "GCS_SOURCE_URI";
    String gcsDestinationOutputUriPrefix = "GCS_DESTINATION_OUTPUT_URI_PREFIX";
    createBatchPredictionJobTextClassificationSample(
        project, location, displayName, modelId, gcsSourceUri, gcsDestinationOutputUriPrefix);
  }

  static void createBatchPredictionJobTextClassificationSample(
      String project,
      String location,
      String displayName,
      String modelId,
      String gcsSourceUri,
      String gcsDestinationOutputUriPrefix)
      throws IOException {
    // The AI Platform services require regional API endpoints.
    JobServiceSettings settings =
        JobServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (JobServiceClient client = JobServiceClient.create(settings)) {
      try {
        String modelName = ModelName.of(project, location, modelId).toString();
        GcsSource gcsSource = GcsSource.newBuilder().addUris(gcsSourceUri).build();
        BatchPredictionJob.InputConfig inputConfig =
            BatchPredictionJob.InputConfig.newBuilder()
                .setInstancesFormat("jsonl")
                .setGcsSource(gcsSource)
                .build();
        GcsDestination gcsDestination =
            GcsDestination.newBuilder().setOutputUriPrefix(gcsDestinationOutputUriPrefix).build();
        BatchPredictionJob.OutputConfig outputConfig =
            BatchPredictionJob.OutputConfig.newBuilder()
                .setPredictionsFormat("jsonl")
                .setGcsDestination(gcsDestination)
                .build();
        BatchPredictionJob batchPredictionJob =
            BatchPredictionJob.newBuilder()
                .setDisplayName(displayName)
                .setModel(modelName)
                .setInputConfig(inputConfig)
                .setOutputConfig(outputConfig)
                .build();
        LocationName parent = LocationName.of(project, location);
        BatchPredictionJob response = client.createBatchPredictionJob(parent, batchPredictionJob);
        System.out.format("response: %s\n", response);
      } catch (ApiException ex) {
        System.out.format("Exception: %s\n", ex.getLocalizedMessage());
      }
    }
  }
}

Node.js

Vertex AI에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.

/**
 * TODO(developer): Uncomment these variables before running the sample.\
 * (Not necessary if passing values as arguments)
 */

// const batchPredictionDisplayName = 'YOUR_BATCH_PREDICTION_DISPLAY_NAME';
// const modelId = 'YOUR_MODEL_ID';
// const gcsSourceUri = 'YOUR_GCS_SOURCE_URI';
// const gcsDestinationOutputUriPrefix = 'YOUR_GCS_DEST_OUTPUT_URI_PREFIX';
//    eg. "gs://<your-gcs-bucket>/destination_path"
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Job Service Client library
const {JobServiceClient} = require('@google-cloud/aiplatform').v1;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const jobServiceClient = new JobServiceClient(clientOptions);

async function createBatchPredictionJobTextClassification() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const modelName = `projects/${project}/locations/${location}/models/${modelId}`;

  const inputConfig = {
    instancesFormat: 'jsonl',
    gcsSource: {uris: [gcsSourceUri]},
  };
  const outputConfig = {
    predictionsFormat: 'jsonl',
    gcsDestination: {outputUriPrefix: gcsDestinationOutputUriPrefix},
  };
  const batchPredictionJob = {
    displayName: batchPredictionDisplayName,
    model: modelName,
    inputConfig,
    outputConfig,
  };
  const request = {
    parent,
    batchPredictionJob,
  };

  // Create batch prediction job request
  const [response] = await jobServiceClient.createBatchPredictionJob(request);

  console.log('Create batch prediction job text classification response');
  console.log(`Name : ${response.name}`);
  console.log('Raw response:');
  console.log(JSON.stringify(response, null, 2));
}
createBatchPredictionJobTextClassification();

Python용 Vertex AI SDK

def create_batch_prediction_job_sample(
    project: str,
    location: str,
    model_resource_name: str,
    job_display_name: str,
    gcs_source: Union[str, Sequence[str]],
    gcs_destination: str,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    my_model = aiplatform.Model(model_resource_name)

    batch_prediction_job = my_model.batch_predict(
        job_display_name=job_display_name,
        gcs_source=gcs_source,
        gcs_destination_prefix=gcs_destination,
        sync=sync,
    )

    batch_prediction_job.wait()

    print(batch_prediction_job.display_name)
    print(batch_prediction_job.resource_name)
    print(batch_prediction_job.state)
    return batch_prediction_job

일괄 예측 결과 검색

일괄 예측 작업이 완료되면 요청에 지정한 Cloud Storage 버킷에 예측 결과가 저장됩니다.

일괄 예측 결과 예시

다음은 텍스트 분류 모델의 일괄 예측 결과 예시입니다.

{
  "instance": {"content": "gs://bucket/text.txt", "mimeType": "text/plain"},
  "predictions": [
    {
      "ids": [
        "1234567890123456789",
        "2234567890123456789",
        "3234567890123456789"
      ],
      "displayNames": [
        "GreatService",
        "Suggestion",
        "InfoRequest"
      ],
      "confidences": [
        0.8986392080783844,
        0.81984345316886902,
        0.7722353458404541
      ]
    }
  ]
}

모델 평가

결과 해석