
Crop Hints tutorial

Audience

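This guide aims to help you develop applications that use the Vision API Crop Hints feature. It assumes basic familiarity with programming constructs and techniques, but even novice programmers should be able to follow along, run the code, and use the Cloud Vision API reference documentation to build basic applications.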

This guide steps through a Vision API application and shows how to call the Vision API to use its Crop Hints feature.

Prerequisites

Set up a Vision API project in the Google Cloud console.
Set up your environment to use Application Default Credentials.

Python

Overview

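This guide walks through a basic Vision API application that uses a Crop Hints request. You can provide the image to process either through a Cloud Storage URI (a Cloud Storage bucket location) or embedded in the request. On success, the Crop Hints response returns the coordinates of a bounding box around the dominant object or face in the image.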

Code listing

If any part of the code is unclear, refer to the Cloud Vision API Python reference.

import argparse

from typing import MutableSequence

from google.cloud import vision
from PIL import Image, ImageDraw

def get_crop_hint(path: str) -> MutableSequence[vision.Vertex]:
    """Detect crop hints on a single image and return the first result.

    Args:
        path: path to the image file.

    Returns:
        The vertices for the bounding polygon.
    """
    client = vision.ImageAnnotatorClient()

    with open(path, "rb") as image_file:
        content = image_file.read()

    image = vision.Image(content=content)

    crop_hints_params = vision.CropHintsParams(aspect_ratios=[1.77])
    image_context = vision.ImageContext(crop_hints_params=crop_hints_params)

    response = client.crop_hints(image=image, image_context=image_context)
    hints = response.crop_hints_annotation.crop_hints

    # Get bounds for the first crop hint using an aspect ratio of 1.77.
    vertices = hints[0].bounding_poly.vertices

    return vertices

def draw_hint(image_file: str) -> None:
    """Draw a border around the image using the hints in the vector list.

    Args:
        image_file: path to the image file.
    """
    vects = get_crop_hint(image_file)

    im = Image.open(image_file)
    draw = ImageDraw.Draw(im)
    draw.polygon(
        [
            vects[0].x,
            vects[0].y,
            vects[1].x,
            vects[1].y,
            vects[2].x,
            vects[2].y,
            vects[3].x,
            vects[3].y,
        ],
        None,
        "red",
    )
    im.save("output-hint.jpg", "JPEG")
    print("Saved new image to output-hint.jpg")

def crop_to_hint(image_file: str) -> None:
    """Crop the image using the hints in the vector list.

    Args:
        image_file: path to the image file.
    """
    vects = get_crop_hint(image_file)

    im = Image.open(image_file)
    im2 = im.crop([vects[0].x, vects[0].y, vects[2].x - 1, vects[2].y - 1])
    im2.save("output-crop.jpg", "JPEG")
    print("Saved new image to output-crop.jpg")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("image_file", help="The image you'd like to crop.")
    parser.add_argument("mode", help='Set to "crop" or "draw".')
    args = parser.parse_args()

    if args.mode == "crop":
        crop_to_hint(args.image_file)
    elif args.mode == "draw":
        draw_hint(args.image_file)

A closer look

Importing libraries

import argparse

from typing import MutableSequence

from google.cloud import vision
from PIL import Image, ImageDraw

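Standard library imports:

argparse: lets the application accept the input file name and the mode as arguments
typing: provides MutableSequence, used in the return type annotation of get_crop_hint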

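Other imports:

The ImageAnnotatorClient class in the google.cloud.vision library: accesses the Vision API
The Image, CropHintsParams, and ImageContext classes in the google.cloud.vision library: construct the request
The Image and ImageDraw modules in the Python Imaging Library (PIL): open and crop the input image and draw the bounding box on it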

Running the application

parser = argparse.ArgumentParser()
parser.add_argument("image_file", help="The image you'd like to crop.")
parser.add_argument("mode", help='Set to "crop" or "draw".')
args = parser.parse_args()

if args.mode == "crop":
    crop_to_hint(args.image_file)
elif args.mode == "draw":
    draw_hint(args.image_file)

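This code simply parses the command-line arguments, which give the local image file name and the mode, and passes the file name to a function that either crops the image or draws the hint.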
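
As written, the listing does nothing when the mode is neither "crop" nor "draw". One possible variation, shown here as a minimal sketch rather than as part of the original sample, is to let argparse reject other values through its `choices` parameter. The `build_parser` helper name is an assumption made for this illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts only the two modes the tutorial supports.

    build_parser is a hypothetical helper, not part of the original sample.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("image_file", help="The image you'd like to crop.")
    # `choices` makes argparse exit with a usage error for any other value.
    parser.add_argument(
        "mode", choices=["crop", "draw"], help='Set to "crop" or "draw".'
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["cat.jpg", "draw"])
print(args.image_file, args.mode)
```

With this change, a call such as `python crop_hints.py cat.jpg rotate` would stop with a usage message instead of exiting silently.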

Authenticating to the API

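To communicate with the Vision API service, you must first authenticate using previously acquired credentials. Within an application, the simplest way to obtain credentials is to use Application Default Credentials (ADC). By default, the client library attempts to obtain credentials from the GOOGLE_APPLICATION_CREDENTIALS environment variable, which should be set to point to your service account's JSON key file. For more information, see Setting up a service account.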
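
Before calling the API, it can help to confirm that the environment variable described above is set and points to a real file. The following is a minimal sketch using only the standard library; the `credentials_path_problem` helper is an assumption made for this illustration. ADC can also locate credentials from other sources, so an unset variable is not necessarily an error. The check covers only the key-file case described here.

```python
import os
from typing import Mapping, Optional


def credentials_path_problem(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Describe a problem with the key-file variable, or return None if it looks usable.

    credentials_path_problem is a hypothetical helper, not part of any Google library.
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {path}"
    return None


problem = credentials_path_problem()
if problem:
    # Only a warning: ADC may still find credentials elsewhere.
    print(f"Warning: {problem}")
```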

Getting crop hint annotations for an image

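Now that the Vision client library is authenticated, you can access the service by calling the crop_hints method of the ImageAnnotatorClient instance. The aspect ratio of the output is specified in an ImageContext object. If you pass multiple aspect ratios, multiple crop hints are returned, one for each ratio.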

"""Detect crop hints on a single image and return the first result.

Args:
    path: path to the image file.

Returns:
    The vertices for the bounding polygon.
"""
client = vision.ImageAnnotatorClient()

with open(path, "rb") as image_file:
    content = image_file.read()

image = vision.Image(content=content)

crop_hints_params = vision.CropHintsParams(aspect_ratios=[1.77])
image_context = vision.ImageContext(crop_hints_params=crop_hints_params)

response = client.crop_hints(image=image, image_context=image_context)
hints = response.crop_hints_annotation.crop_hints

# Get bounds for the first crop hint using an aspect ratio of 1.77.
vertices = hints[0].bounding_poly.vertices

The client library encapsulates the details of API requests and responses. For details on the request structure, see the Vision API reference.
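
To make the encapsulated request more concrete, here is a rough sketch of the JSON body that an equivalent REST call to images:annotate would carry. It uses only the standard library and does not call the API. The `build_crop_hints_request` helper and the `gs://my-bucket/cat.jpg` URI are assumptions made for this illustration, and the field names should be checked against the Vision API reference.

```python
import base64
import json
from typing import Optional, Sequence


def build_crop_hints_request(
    aspect_ratios: Sequence[float],
    content: Optional[bytes] = None,
    image_uri: Optional[str] = None,
) -> dict:
    """Build a JSON-serializable Crop Hints request body (hypothetical helper).

    Exactly one of `content` (raw image bytes) or `image_uri` must be given.
    """
    if (content is None) == (image_uri is None):
        raise ValueError("Provide exactly one of content or image_uri")
    if content is not None:
        # Inline image bytes travel as a base64-encoded string.
        image = {"content": base64.b64encode(content).decode("ascii")}
    else:
        # A remote image is referenced by URI, for example a Cloud Storage location.
        image = {"source": {"imageUri": image_uri}}
    return {
        "requests": [
            {
                "image": image,
                "features": [{"type": "CROP_HINTS"}],
                "imageContext": {
                    "cropHintsParams": {"aspectRatios": list(aspect_ratios)}
                },
            }
        ]
    }


# The bucket name below is a placeholder, not a real location.
body = build_crop_hints_request([1.77], image_uri="gs://my-bucket/cat.jpg")
print(json.dumps(body, indent=2))
```

The two branches correspond to the two ways of supplying an image mentioned in the overview: by Cloud Storage URI or embedded in the request.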

Using the response to crop or draw the hint's bounding box

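When the operation completes successfully, the API response contains the bounding box coordinates of one or more cropHints. The draw_hint function draws a line around the crop hint bounding box and writes the image to output-hint.jpg.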

vects = get_crop_hint(image_file)

im = Image.open(image_file)
draw = ImageDraw.Draw(im)
draw.polygon(
    [
        vects[0].x,
        vects[0].y,
        vects[1].x,
        vects[1].y,
        vects[2].x,
        vects[2].y,
        vects[3].x,
        vects[3].y,
    ],
    None,
    "red",
)
im.save("output-hint.jpg", "JPEG")
print("Saved new image to output-hint.jpg")

The crop_to_hint function crops the image using the suggested crop hint.

vects = get_crop_hint(image_file)

im = Image.open(image_file)
im2 = im.crop([vects[0].x, vects[0].y, vects[2].x - 1, vects[2].y - 1])
im2.save("output-crop.jpg", "JPEG")
print("Saved new image to output-crop.jpg")
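
The crop call relies on the first vertex being the top-left corner and the third being the bottom-right corner. As a minimal sketch of an order-independent alternative, the box can be derived from the minimum and maximum coordinates. The `Point` type and the `vertices_to_box` helper are assumptions made for this illustration. `Point` stands in for a Vision vertex so the example runs without the client library. Unlike the listing, this sketch does not subtract 1 from the right and lower edges.

```python
from typing import NamedTuple, Sequence, Tuple


class Point(NamedTuple):
    """Stand-in for a Vision API vertex (hypothetical; only x and y are used)."""

    x: int
    y: int


def vertices_to_box(vertices: Sequence[Point]) -> Tuple[int, int, int, int]:
    """Return (left, upper, right, lower), the 4-tuple order Pillow's crop expects."""
    xs = [v.x for v in vertices]
    ys = [v.y for v in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


corners = [Point(0, 336), Point(1100, 336), Point(1100, 967), Point(0, 967)]
print(vertices_to_box(corners))  # (0, 336, 1100, 967)
```

The resulting tuple could be passed to `im.crop(...)` in place of the hand-built list.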

Running the application

To run the application, download this cat.jpg file (right-click the link), then pass the location where you saved it on your local machine to the tutorial application (crop_hints.py).

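Here is the Python command, followed by console output showing the JSON cropHintsAnnotation response. The response includes the coordinates of the cropHints bounding box. The request asked for a crop area with a width-to-height aspect ratio of 1.77, and the returned crop rectangle has its top-left corner at x,y = 0,336 and its bottom-right corner at x,y = 1100,967.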

python crop_hints.py cat.jpg crop

{
 "responses": [
  {
   "cropHintsAnnotation": {
    "cropHints": [
     {
      "boundingPoly": {
       "vertices": [
        {
         "y": 336
        },
        {
         "x": 1100,
         "y": 336
        },
        {
         "x": 1100,
         "y": 967
        },
        {
         "y": 967
        }
       ]
      },
      "confidence": 0.79999995,
      "importanceFraction": 0.69
     }
    ]
   }
  }
 ]
}
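
Two of the vertices in this response have no "x" field. In the JSON form, fields whose value is 0 are omitted, so a consumer has to supply the default. The following is a minimal sketch of reading the box from such a response with the standard library only. The `hint_vertices` helper is an assumption made for this illustration, and the embedded JSON is a compacted copy of the response above.

```python
import json
from typing import List, Tuple

# Compacted copy of the response shown above.
RESPONSE_JSON = """
{"responses": [{"cropHintsAnnotation": {"cropHints": [{
  "boundingPoly": {"vertices": [
    {"y": 336}, {"x": 1100, "y": 336}, {"x": 1100, "y": 967}, {"y": 967}
  ]},
  "confidence": 0.79999995,
  "importanceFraction": 0.69
}]}}]}
"""


def hint_vertices(response: dict) -> List[Tuple[int, int]]:
    """Return (x, y) pairs for the first crop hint, treating omitted fields as 0.

    hint_vertices is a hypothetical helper, not part of the client library.
    """
    hint = response["responses"][0]["cropHintsAnnotation"]["cropHints"][0]
    return [(v.get("x", 0), v.get("y", 0)) for v in hint["boundingPoly"]["vertices"]]


points = hint_vertices(json.loads(RESPONSE_JSON))
width = points[2][0] - points[0][0]
height = points[2][1] - points[0][1]
print(points)
print(f"aspect ratio: {width / height:.2f}")
```

For this response the box is 1100 wide and 631 high, a ratio of about 1.74, which is close to but not exactly the requested 1.77.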

Here is the cropped image.

Congratulations! You've run the Cloud Vision Crop Hints API and returned the coordinates of an optimized bounding box around the dominant object detected in the image.