
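Detect text in documents: bounds

Returns the bounding boxes that enclose text detected in a document.

For detailed documentation that includes this code sample, see the following:

Dense document text detection tutorial

Code sample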
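Each returned bound is a polygon, not a rectangle. A `bounding_box` holds a list of `vertices`, usually four. The Vision API reference lists them as top-left, top-right, bottom-right, bottom-left in the text's reading orientation, so a rotated word produces a tilted quadrilateral. If you need a plain axis-aligned rectangle (for cropping, say), one option is to take the min and max of the coordinates.

The following is a minimal sketch that assumes each vertex exposes integer `x` and `y` attributes. `Vertex` and `polygon_to_rect` are hypothetical names used only for illustration.

```python
from collections import namedtuple
from typing import Iterable, Tuple

# Hypothetical stand-in for a Vision vertex; only .x and .y are used.
Vertex = namedtuple("Vertex", ["x", "y"])


def polygon_to_rect(vertices: Iterable) -> Tuple[int, int, int, int]:
    """Return the (left, top, right, bottom) rectangle enclosing a polygon.

    Accepts any iterable of objects with .x and .y attributes, for example
    the vertices of a bounding box taken from a text detection response.
    """
    points = list(vertices)  # materialize so generators can be scanned twice
    if not points:
        raise ValueError("polygon has no vertices")
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    return min(xs), min(ys), max(xs), max(ys)


# A slightly rotated word box: the rectangle covers every corner.
box = [Vertex(12, 40), Vertex(98, 35), Vertex(100, 60), Vertex(14, 65)]
print(polygon_to_rect(box))  # (12, 35, 100, 65)
```

With real response objects, you would pass something like `word.bounding_box.vertices` in place of the hand-built list.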

Python

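Before trying this sample, follow the Python setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Python API reference documentation.

To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.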

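from enum import Enum

from google.cloud import vision


class FeatureType(Enum):
    """Levels of the OCR hierarchy whose bounds can be collected."""

    PAGE = 1
    BLOCK = 2
    PARA = 3
    WORD = 4
    SYMBOL = 5


def get_document_bounds(image_file, feature):
    """Finds the document bounds given an image and feature type.

    Args:
        image_file: path to the image file.
        feature: FeatureType member selecting the level to collect.

    Returns:
        List of bounding polygons for the corresponding feature type.
    """
    client = vision.ImageAnnotatorClient()

    bounds = []

    # Read the image bytes; a separate name avoids shadowing the path argument.
    with open(image_file, "rb") as f:
        content = f.read()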

    image = vision.Image(content=content)

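    response = client.document_text_detection(image=image)
    # Surface API-side failures instead of silently returning no bounds.
    if response.error.message:
        raise RuntimeError(response.error.message)
    document = response.full_text_annotation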

    # Collect specified feature bounds by enumerating all document features
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        if feature == FeatureType.SYMBOL:
                            bounds.append(symbol.bounding_box)

                    if feature == FeatureType.WORD:
                        bounds.append(word.bounding_box)

                if feature == FeatureType.PARA:
                    bounds.append(paragraph.bounding_box)

            if feature == FeatureType.BLOCK:
                bounds.append(block.bounding_box)

    # The list `bounds` contains the coordinates of the bounding boxes.
    return bounds
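The nested loop is the only part of `get_document_bounds` that interprets the response. You can therefore factor it out and test it without calling the API.

The following is a minimal sketch under that assumption:

- `collect_bounds` is a hypothetical helper.
- The `FeatureType` values are an assumed definition of the enum the loop compares against.
- The stand-in document is built from `SimpleNamespace` objects labeled with strings in place of real bounding polygons.

As in the sample, `FeatureType.PAGE` collects nothing, because the loop never reads a page-level box.

```python
from enum import Enum
from types import SimpleNamespace as NS


class FeatureType(Enum):
    PAGE = 1
    BLOCK = 2
    PARA = 3
    WORD = 4
    SYMBOL = 5


def collect_bounds(document, feature):
    """Hypothetical helper: the same nested traversal, minus the API call.

    `document` is anything shaped like response.full_text_annotation:
    pages -> blocks -> paragraphs -> words -> symbols, each (below page)
    carrying a .bounding_box. Only the requested level is collected,
    in reading order.
    """
    bounds = []
    for page in document.pages:
        for block in page.blocks:
            if feature == FeatureType.BLOCK:
                bounds.append(block.bounding_box)
            for paragraph in block.paragraphs:
                if feature == FeatureType.PARA:
                    bounds.append(paragraph.bounding_box)
                for word in paragraph.words:
                    if feature == FeatureType.WORD:
                        bounds.append(word.bounding_box)
                    if feature == FeatureType.SYMBOL:
                        bounds.extend(s.bounding_box for s in word.symbols)
    return bounds


# Stand-in document: one block, one paragraph, two words ("Hi" and "!").
doc = NS(pages=[NS(blocks=[NS(
    bounding_box="block0",
    paragraphs=[NS(
        bounding_box="para0",
        words=[
            NS(bounding_box="word0", symbols=[NS(bounding_box="H"), NS(bounding_box="i")]),
            NS(bounding_box="word1", symbols=[NS(bounding_box="!")]),
        ],
    )],
)])])

print(collect_bounds(doc, FeatureType.WORD))    # ['word0', 'word1']
print(collect_bounds(doc, FeatureType.SYMBOL))  # ['H', 'i', '!']
```

With this split, `get_document_bounds` would reduce to the API call followed by `collect_bounds(response.full_text_annotation, feature)`.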

Next steps

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.
