Get online inferences and explanations

This page shows you how to get online (real-time) inferences and explanations from your tabular classification or regression models by using the Google Cloud console or the Vertex AI API.

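Online inference requests are synchronous, unlike batch inference requests, which are asynchronous. Use online inference when you make requests in response to application input or in other situations that require timely inference.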

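Before you can use a model to serve online inferences, you must deploy it to an endpoint. Deploying a model associates physical resources with the model so that it can serve online inferences with low latency.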

The following topics are covered:

Deploy a model to an endpoint
Get an online inference using your deployed model
Get an online explanation using your deployed model

Before you begin

Before you can get online inferences, you must first train a classification or regression model and evaluate its accuracy.

Deploy a model to an endpoint

You can deploy more than one model to an endpoint, and you can deploy a model to more than one endpoint. For more information about model deployment options and use cases, see About model deployment.

Deploy a model by using one of the following methods.

Google Cloud console

In the Google Cloud console, in the Vertex AI section, go to the Models page.

Go to the Models page
Click the name of the model that you want to deploy to open its details page.
Select the Deploy & Test tab.

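If the model is already deployed to any endpoints, they are listed in the Deploy your model section.
Click Deploy to endpoint.
On the Define your endpoint page, configure the following:
1. You can deploy the model to a new endpoint or to an existing endpoint.
  - To deploy the model to a new endpoint, select Create new endpoint and provide a name for the new endpoint.
  - To deploy the model to an existing endpoint, select Add to existing endpoint and select the endpoint from the drop-down list.
  - You can add more than one model to an endpoint, and you can add a model to more than one endpoint.
2. Click Continue.
On the Model settings page, configure the following:
1. If you are deploying the model to a new endpoint, accept 100 for the traffic split. If you are deploying the model to an existing endpoint that already has one or more models deployed, you must update the traffic split percentages for the model you are deploying and for the already deployed models so that all percentages add up to 100%.
2. Enter the minimum number of compute nodes to provide for the model.
  
  This is the number of nodes that are available to this model at all times. You are charged for these nodes, whether they are handling inference load or standing by as minimum nodes, even when there is no inference traffic. For details, see the pricing page.
3. Select the machine type.
  
  Larger machine resources increase inference performance and also increase costs.
4. Learn how to change the default settings for inference logging.
5. Click Continue.
On the Model monitoring page, click Continue.
On the Monitoring objectives page, configure the following:
1. Enter the location of your training data.
2. Enter the name of the target column.
Click Deploy to deploy the model to the endpoint.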

API

To deploy a model by using the Vertex AI API, complete the following steps:

Create an endpoint if needed.
Get the endpoint ID.
Deploy the model to the endpoint.

Create an endpoint

If you are deploying a model to an existing endpoint, you can skip this step.

gcloud

The following example uses the gcloud ai endpoints create command:

  gcloud ai endpoints create \
    --region=LOCATION_ID \
    --display-name=ENDPOINT_NAME

Replace the following:

LOCATION_ID: the region where you are using Vertex AI.
ENDPOINT_NAME: the display name for the endpoint.

The Google Cloud CLI might take a few seconds to create the endpoint.

REST

Before using any of the request data, make the following replacements:

LOCATION_ID: your region.
PROJECT_ID: your project ID.
ENDPOINT_NAME: the display name for the endpoint.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints

Request JSON body:

{
  "display_name": "ENDPOINT_NAME"
}

To send your request, use one of the following options:

cURL(Linux, macOS, Cloud Shell)

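Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: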

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints"

PowerShell(Windows)

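Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: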

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}

You can poll the status of the operation until the response includes "done": true.
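If you script this step, a polling loop might look like the following minimal Python sketch. It assumes a hypothetical `get_operation` callable that sends the authenticated GET request for the operation `name` shown in the response above and returns the parsed JSON; only the `done`, `error`, and `response` fields are inspected. A still-running operation typically omits `done` from its JSON, so a missing field is treated as false.

```python
import time
from typing import Any, Callable, Dict


# Hypothetical helper, not part of any Google Cloud library.
def wait_for_operation(
    get_operation: Callable[[str], Dict[str, Any]],
    operation_name: str,
    poll_interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> Dict[str, Any]:
    """Polls an operation until its JSON contains "done": true.

    get_operation is assumed to send the authenticated GET request for
    operation_name and return the parsed JSON body.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        op = get_operation(operation_name)
        if op.get("done"):
            # A finished operation carries either "error" or "response".
            if "error" in op:
                raise RuntimeError(f"Operation failed: {op['error']}")
            return op
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{operation_name} not done after {timeout_s}s")
        time.sleep(poll_interval_s)
```

Injecting the request function keeps the loop independent of any particular HTTP or authentication library.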

Java

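Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.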


import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateEndpointOperationMetadata;
import com.google.cloud.aiplatform.v1.Endpoint;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateEndpointSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String endpointDisplayName = "YOUR_ENDPOINT_DISPLAY_NAME";
    createEndpointSample(project, endpointDisplayName);
  }

  static void createEndpointSample(String project, String endpointDisplayName)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      LocationName locationName = LocationName.of(project, location);
      Endpoint endpoint = Endpoint.newBuilder().setDisplayName(endpointDisplayName).build();

      OperationFuture<Endpoint, CreateEndpointOperationMetadata> endpointFuture =
          endpointServiceClient.createEndpointAsync(locationName, endpoint);
      System.out.format("Operation name: %s\n", endpointFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      Endpoint endpointResponse = endpointFuture.get(300, TimeUnit.SECONDS);

      System.out.println("Create Endpoint Response");
      System.out.format("Name: %s\n", endpointResponse.getName());
      System.out.format("Display Name: %s\n", endpointResponse.getDisplayName());
      System.out.format("Description: %s\n", endpointResponse.getDescription());
      System.out.format("Labels: %s\n", endpointResponse.getLabelsMap());
      System.out.format("Create Time: %s\n", endpointResponse.getCreateTime());
      System.out.format("Update Time: %s\n", endpointResponse.getUpdateTime());
    }
  }
}

Node.js

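Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.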

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const endpointDisplayName = 'YOUR_ENDPOINT_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function createEndpoint() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const endpoint = {
    displayName: endpointDisplayName,
  };
  const request = {
    parent,
    endpoint,
  };

  // Create the endpoint. This starts a long-running operation.
  const [response] = await endpointServiceClient.createEndpoint(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create endpoint response');
  console.log(`\tName : ${result.name}`);
  console.log(`\tDisplay name : ${result.displayName}`);
  console.log(`\tDescription : ${result.description}`);
  console.log(`\tLabels : ${JSON.stringify(result.labels)}`);
  console.log(`\tCreate time : ${JSON.stringify(result.createTime)}`);
  console.log(`\tUpdate time : ${JSON.stringify(result.updateTime)}`);
}
createEndpoint();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from google.cloud import aiplatform


def create_endpoint_sample(
    project: str,
    display_name: str,
    location: str,
):
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint.create(
        display_name=display_name,
        project=project,
        location=location,
    )

    print(endpoint.display_name)
    print(endpoint.resource_name)
    return endpoint

Get the endpoint ID

You need the endpoint ID to deploy the model.

gcloud

The following example uses the gcloud ai endpoints list command:

  gcloud ai endpoints list \
    --region=LOCATION_ID \
    --filter=display_name=ENDPOINT_NAME

Replace the following:

LOCATION_ID: the region where you are using Vertex AI.
ENDPOINT_NAME: the display name for the endpoint.

Note the number that appears in the ENDPOINT_ID column. You use this ID in the next step.

REST

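Before using any of the request data, make the following replacements:

LOCATION_ID: the region where you are using Vertex AI.
PROJECT_ID: your project ID.
ENDPOINT_NAME: the display name for the endpoint.

HTTP method and URL: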

GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME
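The `filter` value itself contains an `=` sign. If you build this URL in code, percent-encoding the query string avoids ambiguity. The following Python sketch uses only `urllib.parse.urlencode` from the standard library; the function name `list_endpoints_url` is illustrative and not part of any Vertex AI library.

```python
from urllib.parse import urlencode


# Hypothetical helper that builds the endpoints.list URL shown above.
def list_endpoints_url(project_id: str, location_id: str, display_name: str) -> str:
    """Returns the regional endpoints.list URL filtered by display name."""
    base = (
        f"https://{location_id}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location_id}/endpoints"
    )
    # urlencode percent-encodes reserved characters such as "=" in the value.
    query = urlencode({"filter": f"display_name={display_name}"})
    return f"{base}?{query}"
```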

To send your request, use one of the following options:

cURL(Linux, macOS, Cloud Shell)

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME"

PowerShell(Windows)

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "endpoints": [
    {
      "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID",
      "displayName": "ENDPOINT_NAME",
      "etag": "AMEw9yPz5pf4PwBHbRWOGh0PcAxUdjbdX2Jm3QO_amguy3DbZGP5Oi_YUKRywIE-BtLx",
      "createTime": "2020-04-17T18:31:11.585169Z",
      "updateTime": "2020-04-17T18:35:08.568959Z"
    }
  ]
}

Note the ENDPOINT_ID.
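The endpoint ID is the last path segment of each endpoint's `name`. As a sketch, the following hypothetical Python helper extracts it from a JSON response like the one above (for example, curl output saved to a file). When no endpoint matches, the response may be an empty object, which the helper maps to an empty list.

```python
import json
from typing import List


# Hypothetical helper for reading an endpoints.list JSON response.
def endpoint_ids_from_list_response(body: str) -> List[str]:
    """Returns the ENDPOINT_ID of every endpoint in the response."""
    data = json.loads(body)
    # Each name looks like projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID.
    return [ep["name"].rsplit("/", 1)[-1] for ep in data.get("endpoints", [])]
```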

Deploy the model

Select the tab for your language or environment:

gcloud

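The following example uses the gcloud ai endpoints deploy-model command.

The following example deploys a Model to an Endpoint without using GPUs to accelerate prediction serving and without splitting traffic between multiple DeployedModel resources.

Before using any of the command data below, make the following replacements:

ENDPOINT_ID: the ID of the endpoint.
LOCATION_ID: the region where you are using Vertex AI.
MODEL_ID: the ID of the model to deploy.
DEPLOYED_MODEL_NAME: a name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
MACHINE_TYPE: Optional. The machine resources used for each node of this deployment. The default setting is n1-standard-2. Learn more about machine types.
MIN_REPLICA_COUNT: the minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes. This value must be greater than or equal to 1. If the --min-replica-count flag is omitted, the value defaults to 1.
MAX_REPLICA_COUNT: the maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes. If you omit the --max-replica-count flag, the maximum number of nodes is set to the value of --min-replica-count.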

Execute the gcloud ai endpoints deploy-model command:

Linux, macOS, or Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --machine-type=MACHINE_TYPE \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=100

Windows(PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --machine-type=MACHINE_TYPE `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=100

Windows(cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --machine-type=MACHINE_TYPE ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=100
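Because line-continuation characters differ between Bash, PowerShell, and cmd.exe, it can be simpler to assemble the command as an argument list when scripting. The Python sketch below is illustrative: `deploy_model_args` is a hypothetical helper that applies the replica-count rules described above (the minimum must be at least 1, and the maximum defaults to the minimum). The check that the maximum is not below the minimum is an added sanity check, not documented gcloud behavior.

```python
from typing import List, Optional


# Hypothetical helper; builds the same command as the examples above.
def deploy_model_args(
    endpoint_id: str,
    location_id: str,
    model_id: str,
    display_name: str,
    machine_type: str = "n1-standard-2",
    min_replica_count: int = 1,
    max_replica_count: Optional[int] = None,
) -> List[str]:
    """Returns the gcloud ai endpoints deploy-model command as an argv list."""
    if min_replica_count < 1:
        raise ValueError("min_replica_count must be at least 1")
    # Mirrors the documented default: the maximum falls back to the minimum.
    if max_replica_count is None:
        max_replica_count = min_replica_count
    if max_replica_count < min_replica_count:
        raise ValueError("max_replica_count must be >= min_replica_count")
    return [
        "gcloud", "ai", "endpoints", "deploy-model", endpoint_id,
        f"--region={location_id}",
        f"--model={model_id}",
        f"--display-name={display_name}",
        f"--machine-type={machine_type}",
        f"--min-replica-count={min_replica_count}",
        f"--max-replica-count={max_replica_count}",
        "--traffic-split=0=100",
    ]
```

You could pass the returned list to `subprocess.run(args, check=True)`. No shell is involved, so no quoting or continuation characters are needed.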

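Splitting traffic

In the preceding examples, the --traffic-split=0=100 flag sends 100% of the prediction traffic that the Endpoint receives to the new DeployedModel, which is represented by the temporary ID 0. If your Endpoint already has other DeployedModel resources, you can split traffic between the new DeployedModel and the old ones. For example, to send 20% of traffic to the new DeployedModel and 80% to an older one, run the following command.

Before using any of the command data below, make the following replacement:

OLD_DEPLOYED_MODEL_ID: the ID of the existing DeployedModel.

Execute the gcloud ai endpoints deploy-model command: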

Linux, macOS, or Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --machine-type=MACHINE_TYPE \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows(PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --machine-type=MACHINE_TYPE `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows(cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --machine-type=MACHINE_TYPE ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
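The percentages in `--traffic-split` must add up to 100, and key `0` denotes the model being deployed. The following Python sketch (the helper name `traffic_split_flag` is hypothetical) formats the flag and rejects splits that don't sum to 100 before you run the command:

```python
from typing import Dict


# Hypothetical helper; keys are deployed model IDs, "0" is the new model.
def traffic_split_flag(split: Dict[str, int]) -> str:
    """Formats a value such as --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80."""
    if any(pct < 0 for pct in split.values()):
        raise ValueError("traffic percentages must be non-negative")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic percentages must add up to 100, got {total}")
    return "--traffic-split=" + ",".join(f"{k}={v}" for k, v in split.items())
```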

REST

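Deploy the model by using the endpoints.deployModel method.

Before using any of the request data, make the following replacements:

LOCATION_ID: the region where you are using Vertex AI.
PROJECT_ID: your project ID.
ENDPOINT_ID: the ID of the endpoint.
MODEL_ID: the ID of the model to deploy.
DEPLOYED_MODEL_NAME: a name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
MACHINE_TYPE: Optional. The machine resources used for each node of this deployment. The default setting is n1-standard-2. Learn more about machine types.
ACCELERATOR_TYPE: the type of accelerator to attach to the machine. Optional if ACCELERATOR_COUNT is not specified or is zero. Not recommended for AutoML models or for custom-trained models that use non-GPU images. Learn more.
ACCELERATOR_COUNT: Optional. The number of accelerators for each replica to use. Should be zero or unspecified for AutoML models or for custom-trained models that use non-GPU images.
MIN_REPLICA_COUNT: the minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes. This value must be greater than or equal to 1.
MAX_REPLICA_COUNT: the maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes.
REQUIRED_REPLICA_COUNT: Optional. The number of nodes required for this deployment to be marked as successful. Must be greater than or equal to 1 and less than or equal to the minimum number of nodes. If not specified, the default is the minimum number of nodes.
TRAFFIC_SPLIT_THIS_MODEL: the percentage of the prediction traffic to this endpoint that is routed to the model being deployed with this operation. Defaults to 100. All traffic percentages must add up to 100. Learn more about traffic splits.
DEPLOYED_MODEL_ID_N: Optional. If other models are deployed to this endpoint, you must update their traffic split percentages so that all percentages add up to 100.
TRAFFIC_SPLIT_MODEL_N: the traffic split percentage value for the deployed model ID key.
PROJECT_NUMBER: your project's automatically generated project number.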

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel

Request JSON body:

{
  "deployedModel": {
    "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
    "displayName": "DEPLOYED_MODEL_NAME",
    "dedicatedResources": {
      "machineSpec": {
        "machineType": "MACHINE_TYPE",
        "acceleratorType": "ACCELERATOR_TYPE",
        "acceleratorCount": ACCELERATOR_COUNT
      },
      "minReplicaCount": MIN_REPLICA_COUNT,
      "maxReplicaCount": MAX_REPLICA_COUNT,
      "requiredReplicaCount": REQUIRED_REPLICA_COUNT
    }
  },
  "trafficSplit": {
    "0": TRAFFIC_SPLIT_THIS_MODEL,
    "DEPLOYED_MODEL_ID_1": TRAFFIC_SPLIT_MODEL_1,
    "DEPLOYED_MODEL_ID_2": TRAFFIC_SPLIT_MODEL_2
  }
}
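Hand-editing this body is error-prone, because a stray trailing comma or mismatched brace makes the JSON invalid. The following Python sketch, a hypothetical helper named `deploy_model_body`, generates `request.json` for a deployment without accelerators, as recommended above for AutoML models. It enforces the documented rules that traffic percentages add up to 100 and that `requiredReplicaCount` lies between 1 and `minReplicaCount`, and it omits `requiredReplicaCount` when you don't set it so that the service default applies.

```python
import json
from typing import Dict, Optional


# Hypothetical helper; not part of the Vertex AI SDK.
def deploy_model_body(
    project_id: str,
    location_id: str,
    model_id: str,
    display_name: str,
    machine_type: str = "n1-standard-2",
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    required_replica_count: Optional[int] = None,
    traffic_split: Optional[Dict[str, int]] = None,
) -> str:
    """Returns a request.json body for endpoints.deployModel (no accelerators)."""
    if traffic_split is None:
        traffic_split = {"0": 100}  # "0" is the model being deployed
    if sum(traffic_split.values()) != 100:
        raise ValueError("trafficSplit percentages must add up to 100")
    resources = {
        "machineSpec": {"machineType": machine_type},
        "minReplicaCount": min_replica_count,
        "maxReplicaCount": max_replica_count,
    }
    if required_replica_count is not None:
        # Documented range: 1 <= requiredReplicaCount <= minReplicaCount.
        if not 1 <= required_replica_count <= min_replica_count:
            raise ValueError("requiredReplicaCount must be between 1 and minReplicaCount")
        resources["requiredReplicaCount"] = required_replica_count
    body = {
        "deployedModel": {
            "model": f"projects/{project_id}/locations/{location_id}/models/{model_id}",
            "displayName": display_name,
            "dedicatedResources": resources,
        },
        "trafficSplit": traffic_split,
    }
    return json.dumps(body, indent=2)
```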

To send your request, use one of the following options:

cURL(Linux, macOS, Cloud Shell)

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"

PowerShell(Windows)

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployModelOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    }
  }
}

Java

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.DedicatedResources;
import com.google.cloud.aiplatform.v1.DeployModelOperationMetadata;
import com.google.cloud.aiplatform.v1.DeployModelResponse;
import com.google.cloud.aiplatform.v1.DeployedModel;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.ModelName;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;

public class DeployModelCustomTrainedModelSample {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String endpointId = "ENDPOINT_ID";
    String modelName = "MODEL_NAME";
    String deployedModelDisplayName = "DEPLOYED_MODEL_DISPLAY_NAME";
    deployModelCustomTrainedModelSample(project, endpointId, modelName, deployedModelDisplayName);
  }

  static void deployModelCustomTrainedModelSample(
      String project, String endpointId, String model, String deployedModelDisplayName)
      throws IOException, ExecutionException, InterruptedException {
    EndpointServiceSettings settings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient client = EndpointServiceClient.create(settings)) {
      MachineSpec machineSpec = MachineSpec.newBuilder().setMachineType("n1-standard-2").build();
      DedicatedResources dedicatedResources =
          DedicatedResources.newBuilder().setMinReplicaCount(1).setMachineSpec(machineSpec).build();

      String modelName = ModelName.of(project, location, model).toString();
      DeployedModel deployedModel =
          DeployedModel.newBuilder()
              .setModel(modelName)
              .setDisplayName(deployedModelDisplayName)
              // `dedicated_resources` must be used for non-AutoML models
              .setDedicatedResources(dedicatedResources)
              .build();
      // key '0' assigns traffic for the newly deployed model
      // Traffic percentage values must add up to 100
      // Leave dictionary empty if endpoint should not accept any traffic
      Map<String, Integer> trafficSplit = new HashMap<>();
      trafficSplit.put("0", 100);
      EndpointName endpoint = EndpointName.of(project, location, endpointId);
      OperationFuture<DeployModelResponse, DeployModelOperationMetadata> response =
          client.deployModelAsync(endpoint, deployedModel, trafficSplit);

      // You can use OperationFuture.getInitialFuture to get a future representing the initial
      // response to the request, which contains information while the operation is in progress.
      System.out.format("Operation name: %s\n", response.getInitialFuture().get().getName());

      // OperationFuture.get() will block until the operation is finished.
      DeployModelResponse deployModelResponse = response.get();
      System.out.format("deployModelResponse: %s\n", deployModelResponse);
    }
  }
}

Python

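To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import Dict, Optional, Sequence, Tuple

from google.cloud import aiplatform
from google.cloud.aiplatform import explain


def deploy_model_with_dedicated_resources_sample(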
    project,
    location,
    model_name: str,
    machine_type: str,
    endpoint: Optional[aiplatform.Endpoint] = None,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: Optional[int] = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: Optional[int] = None,
    explanation_metadata: Optional[explain.ExplanationMetadata] = None,
    explanation_parameters: Optional[explain.ExplanationParameters] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync: bool = True,
):
    """
    model_name: A fully-qualified model resource name or model ID.
          Example: "projects/123/locations/us-central1/models/456" or
          "456" when project and location are initialized or passed.
    """

    aiplatform.init(project=project, location=location)

    model = aiplatform.Model(model_name=model_name)

    # The explanation_metadata and explanation_parameters should only be
    # provided for a custom trained model and not an AutoML model.
    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        traffic_percentage=traffic_percentage,
        traffic_split=traffic_split,
        machine_type=machine_type,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        accelerator_type=accelerator_type,
        accelerator_count=accelerator_count,
        explanation_metadata=explanation_metadata,
        explanation_parameters=explanation_parameters,
        metadata=metadata,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model

Node.js

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';
// const endpointId = 'YOUR_ENDPOINT_ID';
// const modelId = 'YOUR_MODEL_ID';
// const deployedModelDisplayName = 'YOUR_DEPLOYED_MODEL_DISPLAY_NAME';

// Imports the Vertex AI Endpoint Service client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function deployModel() {
  const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;
  const deployedModel = {
    model: `projects/${project}/locations/${location}/models/${modelId}`,
    displayName: deployedModelDisplayName,
    // Tabular AutoML models are served on dedicated resources.
    dedicatedResources: {
      minReplicaCount: 1,
      maxReplicaCount: 1,
      machineSpec: {machineType: 'n1-standard-2'},
    },
  };
  // Key '0' refers to the model deployed by this request.
  // Traffic percentages must add up to 100.
  const trafficSplit = {0: 100};

  // deployModel starts a long-running operation
  const [operation] = await endpointServiceClient.deployModel({
    endpoint,
    deployedModel,
    trafficSplit,
  });
  console.log(`Long running operation : ${operation.name}`);

  // Wait for the deployment to complete
  const [response] = await operation.promise();
  console.log(`Deployed model id : ${response.deployedModel.id}`);
}
deployModel();

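Learn how to change the default settings for inference logging.

Get operation status

Some requests start long-running operations that take time to complete. These requests return an operation name, which you can use to view the operation's status or to cancel the operation. Vertex AI provides helper methods for making calls against long-running operations. For more information, see Working with long-running operations.

Get an online inference using your deployed model

To get an online inference, submit one or more test items to the model for analysis; the model returns results based on its objective. Request online inferences by using the Google Cloud console or the Vertex AI API.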
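As an illustration, the operation name returned by these requests (for example, the `name` field in the deployModel response above) can be turned into REST URLs. The Python sketch below assumes the regional host pattern used throughout this page and the standard long-running-operation paths: `GET https://LOCATION_ID-aiplatform.googleapis.com/v1/OPERATION_NAME` to check status and `POST .../v1/OPERATION_NAME:cancel` to cancel. Send them with the same `Authorization` header as the curl examples. The helper name `operation_urls` is hypothetical.

```python
from typing import Dict


# Hypothetical helper; the URL patterns are assumptions based on the
# regional host and v1 paths used elsewhere on this page.
def operation_urls(operation_name: str) -> Dict[str, str]:
    """Derives status and cancel URLs from a long-running operation name."""
    parts = operation_name.split("/")
    # Expected: projects/PROJECT/locations/LOCATION_ID/.../operations/OPERATION_ID
    if (
        len(parts) < 6
        or parts[0] != "projects"
        or parts[2] != "locations"
        or parts[-2] != "operations"
    ):
        raise ValueError(f"Unexpected operation name: {operation_name}")
    location_id = parts[3]
    get_url = f"https://{location_id}-aiplatform.googleapis.com/v1/{operation_name}"
    return {
        "get": get_url,  # GET: returns the operation, including "done"
        "cancel": get_url + ":cancel",  # POST: requests cancellation
        "operation_id": parts[-1],
    }
```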

Google Cloud console

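In the Google Cloud console, in the Vertex AI section, go to the Models page.

Go to the Models page
In the list of models, click the name of the model to request inferences from.
Select the Deploy & Test tab.
In the Test your model section, add test items to request an inference. The baseline inference data is filled in for you; alternatively, enter your own inference data and click Predict.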

After the inference is complete, Vertex AI returns the results in the console.

API: Classification

gcloud

Create a file named request.json with the following contents:
```
{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ]
}
```
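Replace the following:
- PREDICTION_DATA_ROW: a JSON object whose keys are the feature names and whose values are the corresponding feature values. For example, for a dataset with a number, an array of strings, and a category, the data row might look like the following example request: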
```
"length":3.6,
"material":"cotton",
"tag_array": ["abc","def"]
```
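  You must provide a value for every feature that was included in training. The format of the data used for prediction must match the format used for training. For details, see Data format for predictions.
Run the following command: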
```
gcloud ai endpoints predict ENDPOINT_ID \
  --region=LOCATION_ID \
  --json-request=request.json
```
Replace the following:
- ENDPOINT_ID: the ID of the endpoint.
- LOCATION_ID: the region where you are using Vertex AI.
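Instead of editing request.json by hand, you can generate it. The following Python sketch (the helper `build_predict_request` and its `training_features` argument are hypothetical) wraps one or more feature rows in the `instances` envelope and fails early if a row is missing a feature that was used in training. It does not check value types; matching the training data format remains your responsibility.

```python
import json
from typing import Any, Dict, Iterable, List


# Hypothetical helper: training_features is the list of feature columns
# the model was trained on, which you supply yourself.
def build_predict_request(
    rows: List[Dict[str, Any]], training_features: Iterable[str]
) -> str:
    """Returns a request.json body of the form {"instances": [row, ...]}."""
    required = set(training_features)
    for i, row in enumerate(rows):
        missing = required - set(row)
        if missing:
            raise ValueError(f"row {i} is missing features: {sorted(missing)}")
    return json.dumps({"instances": rows})
```

To produce the file, write the returned string to request.json, for example with `open("request.json", "w")`.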

REST

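Use the endpoints.predict method to request an online inference.

Before using any of the request data, make the following replacements:

LOCATION_ID: the region where the endpoint is located, for example us-central1.
PROJECT_ID: your project ID.
ENDPOINT_ID: the ID of the endpoint.
PREDICTION_DATA_ROW: a JSON object whose keys are the feature names and whose values are the corresponding feature values. For example, for a dataset with a number, an array of strings, and a category, the data row might look like the following example request: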
```
"length":3.6,
"material":"cotton",
"tag_array": ["abc","def"]
```
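You must provide a value for every feature that was included in training. The format of the data used for prediction must match the format used for training. For details, see Data format for predictions.
DEPLOYED_MODEL_ID: a value that is output by the predict method and accepted as input by the explain method. It is the ID of the model used to generate the inference. If you need to request explanations for a previously requested inference and you have more than one model deployed, you can use this ID to make sure the explanations come from the same model that served the previous inference.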

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict

Request JSON body:

{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "predictions": [
    {
      "scores": [
        0.96771615743637085,
        0.032283786684274673
      ],
      "classes": [
        "0",
        "1"
      ]
    }
  ],
  "deployedModelId": "2429510197"
}
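For classification, `classes` and `scores` are parallel lists, so the predicted label is the class with the highest score. A minimal Python sketch (the helper name `top_class` is hypothetical) that reads a response like the one above:

```python
import json
from typing import Tuple


# Hypothetical helper for reading a tabular classification response.
def top_class(response_body: str, index: int = 0) -> Tuple[str, float]:
    """Returns the (class, score) pair with the highest score for one prediction."""
    prediction = json.loads(response_body)["predictions"][index]
    # "scores" and "classes" are parallel lists: scores[i] belongs to classes[i].
    score, label = max(zip(prediction["scores"], prediction["classes"]))
    return label, score
```

For the sample response above, this returns class "0" with a score of about 0.968.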

Java

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.cloud.aiplatform.util.ValueConverter;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.cloud.aiplatform.v1.schema.predict.prediction.TabularClassificationPredictionResult;
import com.google.protobuf.ListValue;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.List;

public class PredictTabularClassificationSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String instance = "[{ \"feature_column_a\": \"value\", \"feature_column_b\": \"value\"}]";
    String endpointId = "YOUR_ENDPOINT_ID";
    predictTabularClassification(instance, project, endpointId);
  }

  static void predictTabularClassification(String instance, String project, String endpointId)
      throws IOException {
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);

      ListValue.Builder listValue = ListValue.newBuilder();
      JsonFormat.parser().merge(instance, listValue);
      List<Value> instanceList = listValue.getValuesList();

      Value parameters = Value.newBuilder().setListValue(listValue).build();
      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instanceList, parameters);
      System.out.println("Predict Tabular Classification Response");
      System.out.format("\tDeployed Model Id: %s\n", predictResponse.getDeployedModelId());

      System.out.println("Predictions");
      for (Value prediction : predictResponse.getPredictionsList()) {
        TabularClassificationPredictionResult.Builder resultBuilder =
            TabularClassificationPredictionResult.newBuilder();
        TabularClassificationPredictionResult result =
            (TabularClassificationPredictionResult)
                ValueConverter.fromValue(resultBuilder, prediction);

        for (int i = 0; i < result.getClassesCount(); i++) {
          System.out.printf("\tClass: %s\n", result.getClasses(i));
          System.out.printf("\tScore: %f\n", result.getScores(i));
        }
      }
    }
  }
}

Node.js

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const endpointId = 'YOUR_ENDPOINT_ID';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';
const aiplatform = require('@google-cloud/aiplatform');
const {prediction} =
  aiplatform.protos.google.cloud.aiplatform.v1.schema.predict;

// Imports the Google Cloud Prediction service client
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects.
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function predictTablesClassification() {
  // Configure the endpoint resource
  const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;
  const parameters = helpers.toValue({});

  const instance = helpers.toValue({
    petal_length: '1.4',
    petal_width: '1.3',
    sepal_length: '5.1',
    sepal_width: '2.8',
  });

  const instances = [instance];
  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);

  console.log('Predict tabular classification response');
  console.log(`\tDeployed model id : ${response.deployedModelId}\n`);
  const predictions = response.predictions;
  console.log('Predictions :');
  for (const predictionResultVal of predictions) {
    const predictionResultObj =
      prediction.TabularClassificationPredictionResult.fromValue(
        predictionResultVal
      );
    for (const [i, class_] of predictionResultObj.classes.entries()) {
      console.log(`\tClass: ${class_}`);
      console.log(`\tScore: ${predictionResultObj.scores[i]}\n\n`);
    }
  }
}
predictTablesClassification();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import Dict, List

from google.cloud import aiplatform


def predict_tabular_classification_sample(
    project: str,
    location: str,
    endpoint_name: str,
    instances: List[Dict],
):
    """
    Args:
        project: Your project ID or project number.
        location: Region where Endpoint is located. For example, 'us-central1'.
        endpoint_name: A fully qualified endpoint name or endpoint ID. Example: "projects/123/locations/us-central1/endpoints/456" or
               "456" when project and location are initialized or passed.
        instances: A list of one or more instances (examples) to return a prediction for.
    """
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint(endpoint_name)

    response = endpoint.predict(instances=instances)

    for prediction_ in response.predictions:
        print(prediction_)

API: Regression

gcloud

Create a file named `request.json` with the following content:
```
{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ]
}
```
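Replace the following:
- PREDICTION_DATA_ROW: A JSON object whose keys are feature names and whose values are the corresponding feature values. For example, for a dataset with numeric and categorical features, a data row looks similar to the following example request: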
```
"age":3.6,
"sq_ft":5392,
"code": "90331"
```
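  You must provide a value for every feature that was included in training. The data used for prediction must use the same format as the data used for training. For details, see Data format for predictions.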
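If you generate request.json from code rather than by hand, a minimal Python sketch like the following wraps one data row in the `instances` list and fails early when a training feature is missing. The `TRAINING_FEATURES` set and the helper names are hypothetical; replace the feature names with the columns your model was actually trained on. The check only covers whether each feature is present, not whether its value has the right format.

```python
import json
from typing import Any, Dict, Set

# Hypothetical training features, taken from the example row above.
TRAINING_FEATURES: Set[str] = {"age", "sq_ft", "code"}


def build_request(row: Dict[str, Any], training_features: Set[str]) -> Dict[str, Any]:
    """Wrap a single data row in the {"instances": [...]} body used by request.json."""
    missing = training_features - set(row)
    if missing:
        # Every feature that was used in training needs a value in the row.
        raise ValueError(f"missing features: {sorted(missing)}")
    return {"instances": [row]}


def write_request(row: Dict[str, Any], path: str = "request.json") -> None:
    """Write the request body to `path` as JSON, for use with --json-request."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_request(row, TRAINING_FEATURES), f, indent=2)
```

For example, `write_request({"age": 3.6, "sq_ft": 5392, "code": "90331"})` writes a file with the same structure as the request.json template.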
Run the following command:
```
gcloud ai endpoints predict ENDPOINT_ID \
  --region=LOCATION_ID \
  --json-request=request.json
```
Replace the following:
- ENDPOINT_ID: The ID of the endpoint
- LOCATION_ID: The region where you are using Vertex AI

REST

Use the endpoints.predict method to request an online inference.

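Before using any of the request data, make the following replacements:

LOCATION_ID: The region where the endpoint is located. For example, us-central1.
PROJECT_ID: Your Google Cloud project ID.
ENDPOINT_ID: The ID of the endpoint.
PREDICTION_DATA_ROW: A JSON object whose keys are feature names and whose values are the corresponding feature values. For example, for a dataset with numeric and categorical features, a data row looks similar to the following example request: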
```
"age":3.6,
"sq_ft":5392,
"code": "90331"
```
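You must provide a value for every feature that was included in training. The data used for prediction must use the same format as the data used for training. For details, see Data format for predictions.
DEPLOYED_MODEL_ID: A value that is output by the predict method and accepted as input by the explain method. It is the ID of the model used to generate the inference. If you need to request an explanation for a previously requested inference and you have deployed more than one model, you can use this ID to make sure the explanation is returned for the same model that provided the earlier inference.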

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict

JSON request body:

{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict"

PowerShell

Save the request body in a file named request.json, and run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict" | Select-Object -Expand Content
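For reference, the following Python sketch builds the same POST request as the curl and PowerShell commands using the standard library's `urllib.request`, without sending it. It assumes you already have an OAuth access token (for example, the output of `gcloud auth print-access-token`); the function name and the argument values in the example are illustrative.

```python
import json
import urllib.request
from typing import Any, Dict


def build_predict_request(project_id: str, location_id: str, endpoint_id: str,
                          body: Dict[str, Any], access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the endpoint's :predict URL."""
    url = (
        f"https://{location_id}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location_id}/"
        f"endpoints/{endpoint_id}:predict"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

To actually send the request, you could pass the returned object to `urllib.request.urlopen` and decode the JSON body of the response.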

You should receive a JSON response similar to the following:


{
  "predictions": [
    [
      {
        "value": 65.14233,
        "lower_bound": 4.6572,
        "upper_bound": 164.0279
      }
    ]
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}
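In this sample response, each regression result is wrapped in an extra list. The following Python sketch works on the parsed JSON as a plain dict and flattens either shape into a list of `value`, `lower_bound`, and `upper_bound` entries. The helper name is hypothetical, and accepting both shapes is a defensive assumption rather than documented behavior.

```python
from typing import Any, Dict, List


def regression_results(response: Dict[str, Any]) -> List[Dict[str, float]]:
    """Return one {value, lower_bound, upper_bound} dict per regression result."""
    results: List[Dict[str, float]] = []
    for item in response["predictions"]:
        # The sample response nests each result in a list; also accept a bare object.
        candidates = item if isinstance(item, list) else [item]
        for result in candidates:
            results.append(
                {key: result[key] for key in ("value", "lower_bound", "upper_bound")}
            )
    return results
```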

Java

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.cloud.aiplatform.util.ValueConverter;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.cloud.aiplatform.v1.schema.predict.prediction.TabularRegressionPredictionResult;
import com.google.protobuf.ListValue;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.List;

public class PredictTabularRegressionSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String instance = "[{ \"feature_column_a\": \"value\", \"feature_column_b\": \"value\"}]";
    String endpointId = "YOUR_ENDPOINT_ID";
    predictTabularRegression(instance, project, endpointId);
  }

  static void predictTabularRegression(String instance, String project, String endpointId)
      throws IOException {
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);

      ListValue.Builder listValue = ListValue.newBuilder();
      JsonFormat.parser().merge(instance, listValue);
      List<Value> instanceList = listValue.getValuesList();

      Value parameters = Value.newBuilder().setListValue(listValue).build();
      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instanceList, parameters);
      System.out.println("Predict Tabular Regression Response");
      System.out.format("\tDeployed Model Id: %s\n", predictResponse.getDeployedModelId());

      System.out.println("Predictions");
      for (Value prediction : predictResponse.getPredictionsList()) {
        TabularRegressionPredictionResult.Builder resultBuilder =
            TabularRegressionPredictionResult.newBuilder();

        TabularRegressionPredictionResult result =
            (TabularRegressionPredictionResult) ValueConverter.fromValue(resultBuilder, prediction);

        System.out.printf("\tUpper bound: %f\n", result.getUpperBound());
        System.out.printf("\tLower bound: %f\n", result.getLowerBound());
        System.out.printf("\tValue: %f\n", result.getValue());
      }
    }
  }
}

Node.js

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const endpointId = 'YOUR_ENDPOINT_ID';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';
const aiplatform = require('@google-cloud/aiplatform');
const {prediction} =
  aiplatform.protos.google.cloud.aiplatform.v1.schema.predict;

// Imports the Google Cloud Prediction service client
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects.
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function predictTablesRegression() {
  // Configure the endpoint resource
  const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;
  const parameters = helpers.toValue({});

  // Example instance. Replace these keys and values with the feature columns your model was trained on.
  const instance = helpers.toValue({
    BOOLEAN_2unique_NULLABLE: false,
    DATETIME_1unique_NULLABLE: '2019-01-01 00:00:00',
    DATE_1unique_NULLABLE: '2019-01-01',
    FLOAT_5000unique_NULLABLE: 1611,
    FLOAT_5000unique_REPEATED: [2320, 1192],
    INTEGER_5000unique_NULLABLE: '8',
    NUMERIC_5000unique_NULLABLE: 16,
    STRING_5000unique_NULLABLE: 'str-2',
    STRUCT_NULLABLE: {
      BOOLEAN_2unique_NULLABLE: false,
      DATE_1unique_NULLABLE: '2019-01-01',
      DATETIME_1unique_NULLABLE: '2019-01-01 00:00:00',
      FLOAT_5000unique_NULLABLE: 1308,
      FLOAT_5000unique_REPEATED: [2323, 1178],
      FLOAT_5000unique_REQUIRED: 3089,
      INTEGER_5000unique_NULLABLE: '1777',
      NUMERIC_5000unique_NULLABLE: 3323,
      TIME_1unique_NULLABLE: '23:59:59.999999',
      STRING_5000unique_NULLABLE: 'str-49',
      TIMESTAMP_1unique_NULLABLE: '1546387199999999',
    },
    TIMESTAMP_1unique_NULLABLE: '1546387199999999',
    TIME_1unique_NULLABLE: '23:59:59.999999',
  });

  const instances = [instance];
  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);

  console.log('Predict tabular regression response');
  console.log(`\tDeployed model id : ${response.deployedModelId}`);
  const predictions = response.predictions;
  console.log('\tPredictions :');
  for (const predictionResultVal of predictions) {
    const predictionResultObj =
      prediction.TabularRegressionPredictionResult.fromValue(
        predictionResultVal
      );
    console.log(`\tUpper bound: ${predictionResultObj.upper_bound}`);
    console.log(`\tLower bound: ${predictionResultObj.lower_bound}`);
    console.log(`\tValue: ${predictionResultObj.value}`);
  }
}
predictTablesRegression();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import Dict, List

from google.cloud import aiplatform


def predict_tabular_regression_sample(
    project: str,
    location: str,
    endpoint_name: str,
    instances: List[Dict],
):
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint(endpoint_name)

    response = endpoint.predict(instances=instances)

    for prediction_ in response.predictions:
        print(prediction_)

Interpret prediction results

Classification

Classification models return confidence scores.

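The confidence score communicates how strongly your model associates each class or label with a test item. The higher the number, the more confident the model is that the label should be applied to that item. You decide how high the confidence score must be for you to accept the model's results.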
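As an illustration of choosing such a cut-off, the following Python sketch picks the highest-scoring class from the parallel `classes` and `scores` lists of a classification prediction and rejects it when its score is below a threshold. The helper name and the convention of returning `None` for a rejected result are illustrative; the threshold itself is yours to choose based on the cost of a wrong label in your application.

```python
from typing import List, Optional


def accept_top_class(classes: List[str], scores: List[float],
                     threshold: float) -> Optional[str]:
    """Return the highest-scoring class, or None if its score is below threshold."""
    if not classes or len(classes) != len(scores):
        raise ValueError("classes and scores must be non-empty and of equal length")
    # Index of the largest confidence score.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return classes[best] if scores[best] >= threshold else None
```

For example, with scores of 0.928652400970459 for class_a and 0.071347599029541 for class_b, a threshold of 0.9 accepts class_a, while a threshold of 0.95 rejects it.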

Regression

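Regression models return an inference value. For BigQuery destinations, they also return an inference interval. The inference interval provides a range of values that the model is 95% confident contains the actual result.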
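If you later collect the actual outcomes for predictions that came with an interval, one rough sanity check is the fraction of actual values that fall inside their intervals; for well-calibrated 95% intervals and enough rows, that fraction should be close to 0.95. The following Python sketch computes that fraction from hypothetical `(lower_bound, upper_bound, actual)` tuples; the helper name is illustrative.

```python
from typing import Iterable, Tuple


def interval_coverage(rows: Iterable[Tuple[float, float, float]]) -> float:
    """Fraction of (lower_bound, upper_bound, actual) rows where actual is inside the interval."""
    rows = list(rows)
    if not rows:
        raise ValueError("need at least one row")
    hits = sum(1 for lower, upper, actual in rows if lower <= actual <= upper)
    return hits / len(rows)
```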

Get online explanations using a deployed model

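You can request an inference with explanations (also called feature attributions) to see how your model arrived at an inference. Local feature importance values tell you how much each feature contributed to the inference result. Feature attributions are included in Vertex AI inferences through Vertex Explainable AI.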

Console

When you use the Google Cloud console to request an online inference, local feature importance values are returned automatically.

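If you used the prepopulated prediction values, the local feature importance values are all zero. The prepopulated values are the baseline prediction data, so the prediction returned is the baseline prediction value.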

gcloud

Create a file named request.json with the following content:
```
{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ]
}
```
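Replace the following:
- PREDICTION_DATA_ROW: A JSON object whose keys are feature names and whose values are the corresponding feature values. For example, for a dataset with a number, an array of strings, and a category, a data row looks similar to the following example request: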
```
"length":3.6,
"material":"cotton",
"tag_array": ["abc","def"]
```
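  You must provide a value for every feature that was included in training. The data used for prediction must use the same format as the data used for training. For details, see Data format for predictions.
Run the following command: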
```
gcloud ai endpoints explain ENDPOINT_ID \
  --region=LOCATION_ID \
  --json-request=request.json
```
Replace the following:
- ENDPOINT_ID: The ID of the endpoint
- LOCATION_ID: The region where you are using Vertex AI
Optionally, to send the explanation request to a specific DeployedModel in the Endpoint, specify the --deployed-model-id flag:
```
gcloud ai endpoints explain ENDPOINT_ID \
  --region=LOCATION_ID \
  --deployed-model-id=DEPLOYED_MODEL_ID \
  --json-request=request.json
```
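In addition to the placeholders described previously, replace the following:
- DEPLOYED_MODEL_ID: Optional. The ID of the deployed model for which you want explanations. The ID is included in the response of the predict method. If you need to request explanations for a particular model and you have more than one model deployed to the same endpoint, you can use this ID to make sure that the explanations are returned for that particular model.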

REST

The following example shows an online inference request with local feature attributions for a tabular classification model. The request format is the same for regression models.

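Before using any of the request data, make the following replacements:

LOCATION: The region where the endpoint is located. For example, us-central1.
PROJECT: Your Google Cloud project ID.
ENDPOINT_ID: The ID of the endpoint.
PREDICTION_DATA_ROW: A JSON object whose keys are feature names and whose values are the corresponding feature values. For example, for a dataset with a number, an array of strings, and a category, a data row looks similar to the following example request: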
```
"length":3.6,
"material":"cotton",
"tag_array": ["abc","def"]
```
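You must provide a value for every feature that was included in training. The data used for prediction must use the same format as the data used for training. For details, see Data format for predictions.
DEPLOYED_MODEL_ID: Optional. The ID of the deployed model for which you want explanations. The ID is included in the response of the predict method. If you need to request explanations for a particular model and you have more than one model deployed to the same endpoint, you can use this ID to make sure that the explanations are returned for that particular model.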

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain

JSON request body:

{
  "instances": [
    {
      PREDICTION_DATA_ROW
    }
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain"

PowerShell

Save the request body in a file named request.json, and run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain" | Select-Object -Expand Content

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import Dict

from google.cloud import aiplatform


def explain_sample(project: str, location: str, endpoint_id: str, instance_dict: Dict):

    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint(endpoint_id)

    response = endpoint.explain(instances=[instance_dict], parameters={})

    for explanation in response.explanations:
        print(" explanation")
        # Feature attributions.
        attributions = explanation.attributions
        for attribution in attributions:
            print("  attribution")
            print("   baseline_output_value:", attribution.baseline_output_value)
            print("   instance_output_value:", attribution.instance_output_value)
            print("   output_display_name:", attribution.output_display_name)
            print("   approximation_error:", attribution.approximation_error)
            print("   output_name:", attribution.output_name)
            for index in attribution.output_index:
                print("   output_index:", index)

    for prediction in response.predictions:
        print(prediction)

Get explanations for previously returned predictions

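Because explanations increase resource usage, you might want to reserve explanation requests for situations where you specifically need them. Sometimes it can be useful to request explanations for an inference result you already received, for example when the inference is an outlier or doesn't make sense.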

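If all of your inferences come from the same model, you can simply resend the request data, this time requesting explanations. However, if multiple models return inferences, you must make sure to send the explanation request to the correct model. To get the explanation from a particular model, include its deployed model ID, deployedModelId, in your request; this ID is also included in the response to the original inference request. Note that the deployed model ID is different from the model ID.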
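The following Python sketch shows one way to build the explain request body for a previous prediction: it reuses the original instances and copies `deployedModelId` from the parsed predict response, not the model ID. The helper name is hypothetical; the `instances` and `deployedModelId` fields match the REST explain request body shown earlier on this page.

```python
from typing import Any, Dict, List


def explain_body_from_prediction(instances: List[Dict[str, Any]],
                                 predict_response: Dict[str, Any]) -> Dict[str, Any]:
    """Build an :explain request body aimed at the model that served a prediction."""
    body: Dict[str, Any] = {"instances": instances}
    # deployedModelId identifies the deployed model on the endpoint, not the model ID.
    deployed_model_id = predict_response.get("deployedModelId")
    if deployed_model_id:
        body["deployedModelId"] = deployed_model_id
    return body
```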

Interpret explanation results

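To calculate local feature importance, the baseline inference score is calculated first. Baseline values are computed from the training data, using the median value for numeric features and the mode for categorical features. The inference generated from the baseline values is the baseline inference score. Baseline values are calculated once for a model and do not change.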
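To make the baseline rule concrete, the following Python sketch applies it to a small in-memory dataset: the median for numeric features and the mode for everything else. It only illustrates the description above; Vertex AI computes the baseline internally, and this sketch ignores details such as missing values, repeated (array) features, and how ties in the mode are broken. The helper name is hypothetical.

```python
import statistics
from typing import Any, Dict, List


def baseline_instance(training_rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Illustrative baseline: median of numeric features, mode of categorical ones."""
    if not training_rows:
        raise ValueError("need at least one training row")
    baseline: Dict[str, Any] = {}
    for feature in training_rows[0]:
        values = [row[feature] for row in training_rows]
        # Treat ints and floats (but not booleans) as numeric features.
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            baseline[feature] = statistics.median(values)
        else:
            baseline[feature] = statistics.mode(values)
    return baseline
```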

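For a specific inference, the local feature importance of each feature tells you how much that feature added to or subtracted from the result compared with the baseline inference score. The sum of all the feature importance values equals the inference result minus the baseline inference score.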

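For classification models, the score is always between 0.0 and 1.0, inclusive. Therefore, local feature importance values for classification models are always between -1.0 and 1.0, inclusive.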
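The additivity property above can be checked directly on an explanation. The following Python sketch sums the `featureAttributions` values and compares the total with `instanceOutputValue - baselineOutputValue`. Because attributions are approximated (the response also reports an `approximationError`), it compares within a tolerance; the default tolerance here is an arbitrary assumption, not a value that Vertex AI defines, and the helper name is hypothetical.

```python
import math
from typing import Dict


def check_additivity(baseline_output: float, instance_output: float,
                     feature_attributions: Dict[str, float],
                     tolerance: float = 1e-6) -> bool:
    """True if the attributions sum to (instance output - baseline output) within tolerance."""
    total = math.fsum(feature_attributions.values())
    return math.isclose(total, instance_output - baseline_output, abs_tol=tolerance)
```

Applied to the example payloads on this page, the classification attributions sum to 0.12, which equals 0.928652400970459 - 0.808652400970459, and the regression attributions sum to about 6.3823, which equals 1795.1246466281819 - 1788.7423095703125.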

For examples of feature attribution queries and more information, see Feature attributions for classification and regression.

Example inference and explanation output

Classification

The return payload for an online inference with feature importance from a tabular classification model looks similar to the following example.

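The instanceOutputValue of 0.928652400970459 is the confidence score of the highest-scoring class, in this case class_a. The baselineOutputValue field contains the baseline inference score, 0.808652400970459. The feature that contributed most strongly to this result was feature_3.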

{
  "predictions": [
    {
      "scores": [
        0.928652400970459,
        0.071347599029541
      ],
      "classes": [
        "class_a",
        "class_b"
      ]
    }
  ],
  "explanations": [
    {
      "attributions": [
        {
          "baselineOutputValue": 0.808652400970459,
          "instanceOutputValue": 0.928652400970459,
          "approximationError": 0.0058915703929231,
          "featureAttributions": {
            "feature_1": 0.012394922231235,
            "feature_2": 0.050212341234556,
            "feature_3": 0.057392736534209
          },
          "outputIndex": [
            0
          ],
          "outputName": "scores"
        }
      ]
    }
  ],
  "deployedModelId": "234567"
}
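To find the most influential feature programmatically, the following Python sketch takes the parsed response as a dict and returns the feature with the largest absolute attribution in the first explanation. Ranking by absolute value is a design choice, so that a feature that strongly lowered the score would also rank high; the helper name is hypothetical.

```python
from typing import Any, Dict, Tuple


def top_attribution(explain_response: Dict[str, Any]) -> Tuple[str, float]:
    """Return (feature, attribution) with the largest absolute attribution.

    Only the first attribution of the first explanation is inspected, which
    matches a request that contained a single instance.
    """
    attribution = explain_response["explanations"][0]["attributions"][0]
    features: Dict[str, float] = attribution["featureAttributions"]
    name = max(features, key=lambda f: abs(features[f]))
    return name, features[name]
```

For the payload above, it returns feature_3 with an attribution of 0.057392736534209.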

Regression

The return payload for an online inference with feature importance from a tabular regression model looks similar to the following JSON example.

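The instanceOutputValue of 1795.1246466281819 is the predicted value, and the lower_bound and upper_bound fields provide the 95% confidence interval. The baselineOutputValue field contains the baseline inference score, 1788.7423095703125. The feature that contributed most strongly to this result was feature_3.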

{
  "predictions": [
    {
      "value": 1795.1246466281819,
      "lower_bound": 246.32196807861328,
      "upper_bound": 8677.51904296875
    }
  ],
  "explanations": [
    {
      "attributions": [
        {
          "baselineOutputValue": 1788.7423095703125,
          "instanceOutputValue": 1795.1246466281819,
          "approximationError": 0.0038215703911553,
          "featureAttributions": {
            "feature_1": 0.123949222312359,
            "feature_2": 0.802123412345569,
            "feature_3": 5.456264423211472
          },
          "outputIndex": [
            -1
          ]
        }
      ]
    }
  ],
  "deployedModelId": "345678"
}

What's next

Learn how to export your model.