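Dense document text detection tutorial

Audience

This tutorial aims to help you develop applications using Google Cloud Vision API document text detection. It assumes you are familiar with basic programming constructs and techniques, but even if you are a beginning programmer, you should be able to follow along, run the code, and then use the Cloud Vision API reference documentation to build basic applications.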

Prerequisites

Set up a Cloud Vision API project in the Google Cloud console.
Set up your environment to use Application Default Credentials.

Python

Install Python.
Install pip.
Install the Google Cloud client library and the Python Imaging Library.

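Annotating an image using Document Text OCR

This tutorial walks you through a basic Vision API application that makes a DOCUMENT_TEXT_DETECTION request and then processes the fullTextAnnotation response.

Note that standard TEXT_DETECTION and DOCUMENT_TEXT_DETECTION both return a fullTextAnnotation, as described below. However, the paid DOCUMENT_TEXT_DETECTION feature has no input character limit. In addition, if a Cloud Vision request specifies both TEXT_DETECTION and DOCUMENT_TEXT_DETECTION, DOCUMENT_TEXT_DETECTION takes precedence.

A fullTextAnnotation is a structured hierarchical response for the UTF-8 text extracted from the image, organized as Pages→Blocks→Paragraphs→Words→Symbols: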

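Page is a collection of blocks, plus meta-information about the page: sizes and resolutions (the X resolution and Y resolution may differ).
Block represents one "logical" element of the page, for example an area covered by text, or a picture or separator between columns. The text and table blocks contain the main information needed to extract the text.
Paragraph is a structural unit of text representing an ordered sequence of words. By default, words are treated as separated by word breaks.
Word is the smallest unit of text. It is represented as an array of Symbols.
Symbol represents a character or a punctuation mark.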
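
To make the hierarchy concrete, here is a minimal sketch that walks a structure shaped like fullTextAnnotation and rebuilds each word from its symbols. It does not call the API: the sample data is built from SimpleNamespace stand-ins, and words_in_document and make_word are hypothetical helper names, not part of the client library.

```python
from types import SimpleNamespace


def words_in_document(document):
    """Yield each word's text by joining the text of its symbols.

    Works on any object shaped like fullTextAnnotation
    (pages -> blocks -> paragraphs -> words -> symbols).
    """
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    yield "".join(symbol.text for symbol in word.symbols)


def make_word(text):
    """Hypothetical helper: build a stand-in word, one symbol per character."""
    return SimpleNamespace(symbols=[SimpleNamespace(text=ch) for ch in text])


# Stand-in data for illustration only; this is not a real API response.
paragraph = SimpleNamespace(words=[make_word("Total"), make_word("42")])
block = SimpleNamespace(paragraphs=[paragraph])
page = SimpleNamespace(blocks=[block])
sample_document = SimpleNamespace(pages=[page])

print(list(words_in_document(sample_document)))
```

The same nested loop should work on a real response's full_text_annotation, since it only relies on the field names described above.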

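The fullTextAnnotation can also provide URLs to web images that partially or fully match the image in the request.

The earlier textAnnotations OCR output is still supported and is returned in the JSON response as textAnnotations.

Complete code listing

As you read the code, we recommend that you follow along by consulting the Cloud Vision API Python reference.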

import argparse
from enum import Enum

from google.cloud import vision
from PIL import Image, ImageDraw



class FeatureType(Enum):
    PAGE = 1
    BLOCK = 2
    PARA = 3
    WORD = 4
    SYMBOL = 5


def draw_boxes(image, bounds, color):
    """Draws a border around the image using the hints in the vector list.

    Args:
        image: the input image object.
        bounds: list of coordinates for the boxes.
        color: the color of the box.

    Returns:
        An image with colored bounds added.
    """
    draw = ImageDraw.Draw(image)

    for bound in bounds:
        draw.polygon(
            [
                bound.vertices[0].x,
                bound.vertices[0].y,
                bound.vertices[1].x,
                bound.vertices[1].y,
                bound.vertices[2].x,
                bound.vertices[2].y,
                bound.vertices[3].x,
                bound.vertices[3].y,
            ],
            None,
            color,
        )
    return image


def get_document_bounds(image_file, feature):
    """Finds the document bounds given an image and feature type.

    Args:
        image_file: path to the image file.
        feature: feature type to detect.

    Returns:
        List of coordinates for the corresponding feature type.
    """
    client = vision.ImageAnnotatorClient()

    bounds = []

    # Use a separate name for the file handle so the path argument is not shadowed.
    with open(image_file, "rb") as image_fp:
        content = image_fp.read()

    image = vision.Image(content=content)

    response = client.document_text_detection(image=image)
    document = response.full_text_annotation

    # Collect specified feature bounds by enumerating all document features
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        if feature == FeatureType.SYMBOL:
                            bounds.append(symbol.bounding_box)

                    if feature == FeatureType.WORD:
                        bounds.append(word.bounding_box)

                if feature == FeatureType.PARA:
                    bounds.append(paragraph.bounding_box)

            if feature == FeatureType.BLOCK:
                bounds.append(block.bounding_box)

    # The list `bounds` contains the coordinates of the bounding boxes.
    return bounds




def render_doc_text(filein, fileout):
    """Outlines document features (blocks, paragraphs and words) given an image.

    Args:
        filein: path to the input image.
        fileout: path to the output image.
    """
    image = Image.open(filein)
    bounds = get_document_bounds(filein, FeatureType.BLOCK)
    draw_boxes(image, bounds, "blue")
    bounds = get_document_bounds(filein, FeatureType.PARA)
    draw_boxes(image, bounds, "red")
    bounds = get_document_bounds(filein, FeatureType.WORD)
    draw_boxes(image, bounds, "yellow")

    if fileout != 0:
        image.save(fileout)
    else:
        image.show()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("detect_file", help="The image for text detection.")
    parser.add_argument("-out_file", help="Optional output file", default=0)
    args = parser.parse_args()

    render_doc_text(args.detect_file, args.out_file)

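This simple application performs the following tasks:

Imports the libraries necessary to run the application
Takes two arguments and passes them to the render_doc_text() function:
- detect_file: the input image file to annotate
- out_file (optional, passed with the -out_file flag): the output filename, to which the application writes the image with polygon boxes drawn
Creates an ImageAnnotatorClient instance to interact with the service
Sends the request and returns the response
Creates an output image with boxes drawn around the text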
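
The drawing step relies on turning each bounding polygon into the flat list of coordinates that ImageDraw.polygon accepts. The draw_boxes function in the listing indexes the four vertices explicitly; the sketch below does the same flattening for any number of vertices. polygon_coordinates is a hypothetical helper name, and the sample bound is a stand-in object rather than a real API result.

```python
from types import SimpleNamespace


def polygon_coordinates(bound):
    """Flatten a bounding polygon into [x0, y0, x1, y1, ...]."""
    return [coord for vertex in bound.vertices for coord in (vertex.x, vertex.y)]


# Stand-in bounding box with four corners, for illustration only.
sample_bound = SimpleNamespace(
    vertices=[
        SimpleNamespace(x=10, y=20),
        SimpleNamespace(x=110, y=20),
        SimpleNamespace(x=110, y=60),
        SimpleNamespace(x=10, y=60),
    ]
)

print(polygon_coordinates(sample_bound))
```

The resulting list could be passed as the first argument to draw.polygon in place of the hand-written eight-element list.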

A closer look at the code

Importing libraries

import argparse
from enum import Enum

from google.cloud import vision
from PIL import Image, ImageDraw

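We import standard libraries:

argparse, to allow the application to accept input filenames as arguments
enum, for the FeatureType enumeration

File reading uses the built-in open() function, so no io import is needed.

Other imports:

The ImageAnnotatorClient class in the google.cloud.vision library, for accessing the Vision API.
The Image class in the google.cloud.vision library, for constructing requests.
The Image and ImageDraw modules from the PIL library, used to create the output image with boxes drawn on the input image.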

Running the application

parser = argparse.ArgumentParser()
parser.add_argument("detect_file", help="The image for text detection.")
parser.add_argument("-out_file", help="Optional output file", default=0)
args = parser.parse_args()

render_doc_text(args.detect_file, args.out_file)

Here, we simply parse the passed-in arguments and pass them to the render_doc_text() function.
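
To see what the parser produces without running the whole application, the following sketch rebuilds the same parser and feeds it explicit argument lists. The build_parser wrapper is a hypothetical name added for illustration; the two add_argument calls are copied from the listing.

```python
import argparse


def build_parser():
    """Recreate the tutorial's argument parser."""
    parser = argparse.ArgumentParser()
    parser.add_argument("detect_file", help="The image for text detection.")
    parser.add_argument("-out_file", help="Optional output file", default=0)
    return parser


# Passing a list to parse_args() avoids reading from the real command line.
with_output = build_parser().parse_args(["receipt.jpg", "-out_file", "out.jpg"])
without_output = build_parser().parse_args(["receipt.jpg"])

print(with_output.detect_file, with_output.out_file)
print(without_output.out_file)
```

When -out_file is omitted, out_file keeps its default of 0, which is why render_doc_text() compares fileout against 0 to decide between saving and showing the image.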

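Authenticating to the API

Before communicating with the Vision API service, you must authenticate your service using previously acquired credentials. Within an application, the simplest way to obtain credentials is to use Application Default Credentials (ADC). By default, the Cloud client library attempts to obtain credentials from the GOOGLE_APPLICATION_CREDENTIALS environment variable, which should be set to point to your service account's JSON key file (see "Setting up a service account" for details).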
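
Before creating the client, it can help to check what the environment variable actually points to. The sketch below uses only the standard library; describe_credentials is a hypothetical helper, not part of any Google library. ADC can also find credentials from other sources, so an unset variable is not necessarily an error.

```python
import os


def describe_credentials(environ=None):
    """Report whether GOOGLE_APPLICATION_CREDENTIALS points to an existing file."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"key file not found: {path}"
    return f"using key file: {path}"


print(describe_credentials())
```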

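Making the API request and reading text bounds from the response

Now that the Vision API service is ready, we can access it by calling the document_text_detection method of the ImageAnnotatorClient instance.

The client library encapsulates the details of requests to and responses from the API. See the Vision API reference for complete information on the structure of a request.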

def get_document_bounds(image_file, feature):
    """Finds the document bounds given an image and feature type.

    Args:
        image_file: path to the image file.
        feature: feature type to detect.

    Returns:
        List of coordinates for the corresponding feature type.
    """
    client = vision.ImageAnnotatorClient()

    bounds = []

    # Use a separate name for the file handle so the path argument is not shadowed.
    with open(image_file, "rb") as image_fp:
        content = image_fp.read()

    image = vision.Image(content=content)

    response = client.document_text_detection(image=image)
    document = response.full_text_annotation

    # Collect specified feature bounds by enumerating all document features
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        if feature == FeatureType.SYMBOL:
                            bounds.append(symbol.bounding_box)

                    if feature == FeatureType.WORD:
                        bounds.append(word.bounding_box)

                if feature == FeatureType.PARA:
                    bounds.append(paragraph.bounding_box)

            if feature == FeatureType.BLOCK:
                bounds.append(block.bounding_box)

    # The list `bounds` contains the coordinates of the bounding boxes.
    return bounds

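After the client library has handled the request, the response contains an AnnotateImageResponse, which consists of a list of image annotation results, one for each image sent in the request. Because we sent only one image in the request, we walk through the full TextAnnotation and collect the boundaries for the specified document feature.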
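
In the listing, render_doc_text() calls get_document_bounds() three times, and each call sends its own request for the same image. As one possible alternative, the sketch below collects the bounds for every level in a single pass over one document. collect_all_bounds is a hypothetical helper, and the sample document is built from stand-in objects with strings in place of real bounding boxes.

```python
from types import SimpleNamespace


def collect_all_bounds(document):
    """Gather bounding boxes for every level in one traversal."""
    bounds = {"block": [], "paragraph": [], "word": [], "symbol": []}
    for page in document.pages:
        for block in page.blocks:
            bounds["block"].append(block.bounding_box)
            for paragraph in block.paragraphs:
                bounds["paragraph"].append(paragraph.bounding_box)
                for word in paragraph.words:
                    bounds["word"].append(word.bounding_box)
                    for symbol in word.symbols:
                        bounds["symbol"].append(symbol.bounding_box)
    return bounds


# Stand-in data: one block, one paragraph, two words, three symbols in total.
symbols_a = [SimpleNamespace(bounding_box="s1"), SimpleNamespace(bounding_box="s2")]
symbols_b = [SimpleNamespace(bounding_box="s3")]
words = [
    SimpleNamespace(bounding_box="w1", symbols=symbols_a),
    SimpleNamespace(bounding_box="w2", symbols=symbols_b),
]
paragraphs = [SimpleNamespace(bounding_box="p1", words=words)]
blocks = [SimpleNamespace(bounding_box="b1", paragraphs=paragraphs)]
sample_doc = SimpleNamespace(pages=[SimpleNamespace(blocks=blocks)])

all_bounds = collect_all_bounds(sample_doc)
print({level: len(boxes) for level, boxes in all_bounds.items()})
```

With a real response, the returned dictionary could feed draw_boxes() once per level while the API is called only once.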

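Running the application

To run the application, download this receipt.jpg file (you may need to right-click the link), then pass the location where you saved the file on your local machine to the tutorial application (doctext.py).

Here is the Python command, followed by the text annotation output image.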

$ python doctext.py receipt.jpg -out_file out.jpg

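The output image shows words in yellow boxes, paragraphs in red boxes, and blocks in blue boxes.

Congratulations! You've performed text detection using Google Cloud Vision full text annotations!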