Document text detection samples

Document text detection performs optical character recognition (OCR). This feature detects dense document text in an image and returns it as a structured fullTextAnnotation (pages, blocks, paragraphs, words, and symbols).

Detecting document text in a local image

Protocol

For complete details, see the images:annotate API endpoint.

To perform document text detection, make a POST request and provide the appropriate request body:

POST https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY
{
  "requests": [
    {
      "image": {
        "content": "/9j/7QBEUGhvdG9zaG9...base64-encoded-image-content...fXNWzvDEeYxxxzj/Coa6Bax//Z"
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ]
    }
  ]
}
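As an illustration, the following minimal Python sketch (not one of the official samples) sends this request using only the standard library; the file name image.png and the YOUR_API_KEY placeholder are assumptions you would replace with your own values.

import base64
import json
import urllib.request

API_URL = 'https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY'

# Base64-encode the local image and build the request body shown above.
with open('image.png', 'rb') as f:
    content = base64.b64encode(f.read()).decode('utf-8')

body = json.dumps({
    'requests': [{
        'image': {'content': content},
        'features': [{'type': 'DOCUMENT_TEXT_DETECTION'}],
    }]
}).encode('utf-8')

# POST the request and print the recognized document text.
req = urllib.request.Request(
    API_URL, data=body, headers={'Content-Type': 'application/json'})
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)
print(result['responses'][0]['fullTextAnnotation']['text'])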

For more information on configuring the request body, see the AnnotateImageRequest reference documentation.

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "O Google Cloud Platform\n",
          "boundingPoly": {
            "vertices": [
              {
                "x": 14, "y": 11
              },
              {
                "x": 279, "y": 11
              },
              {
                "x": 279, "y": 37
              },
              {
                "x": 14, "y": 37
              }
            ]
          }
        }
      ],
      "fullTextAnnotation": {
        "pages": [
          {
            "property": {
              "detectedLanguages": [
                {
                  "languageCode": "en"
                }
              ]
            },
            "width": 281,
            "height": 44,
            "blocks": [
              {
                "property": {
                  "detectedLanguages": [
                    {
                      "languageCode": "en"
                    }
                  ]
                },
                "boundingBox": {
                  "vertices": [
                    {
                      "x": 14, "y": 11
                    },
                    {
                      "x": 279, "y": 11
                    },
                    {
                      "x": 279, "y": 37
                    },
                    {
                      "x": 14, "y": 37
                    }
                  ]
                },
                "paragraphs": [
                  {
                    "property": {
                      "detectedLanguages": [
                        {
                          "languageCode": "en"
                        }
                      ]
                    },
                    "boundingBox": {
                      "vertices": [
                        {
                          "x": 14, "y": 11
                        },
                        {
                          "x": 279, "y": 11
                        },
                        {
                          "x": 279, "y": 37
                        },
                        {
                          "x": 14, "y": 37
                        }
                      ]
                    },
                    "words": [
                      {
                        "property": {
                          "detectedLanguages": [
                            {
                              "languageCode": "en"
                            }
                          ]
                        },
                        "boundingBox": {
                          "vertices": [
                            {
                              "x": 14, "y": 11
                            },
                            {
                              "x": 23, "y": 11
                            },
                            {
                              "x": 23, "y": 37
                            },
                            {
                              "x": 14, "y": 37
                            }
                          ]
                        },
                        "symbols": [
                          {
                            "property": {
                              "detectedLanguages": [
                                {
                                  "languageCode": "en"
                                }
                              ],
                              "detectedBreak": {
                                "type": "SPACE"
                              }
                            },
                            "boundingBox": {
                              "vertices": [
                                {
                                  "x": 14, "y": 11
                                },
                                {
                                  "x": 23, "y": 11
                                },
                                {
                                  "x": 23, "y": 37
                                },
                                {
                                  "x": 14, "y": 37
                                }
                              ]
                            },
                            "text": "O"
                          }
                        ]
                      }
                    ]
                  }
                ],
                "blockType": "TEXT"
              }
            ]
          }
        ],
        "text": "Google Cloud Platform\n"
      }
    }
  ]
}
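In the fullTextAnnotation, the recognized text is organized hierarchically: pages contain blocks, blocks contain paragraphs, paragraphs contain words, and words contain symbols, with detectedBreak properties marking spaces and line breaks. As a sketch, assuming the JSON response above has been parsed into a Python dict named response_json (a hypothetical variable), the words can be reassembled like this:

# Walk the fullTextAnnotation hierarchy of a parsed JSON response.
annotation = response_json['responses'][0]['fullTextAnnotation']
for page in annotation['pages']:
    for block in page['blocks']:
        for paragraph in block['paragraphs']:
            for word in paragraph['words']:
                # Each word is a list of single-character symbols.
                print(''.join(s['text'] for s in word['symbols']))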

C#

For more on installing and creating a Vision API client, refer to the Vision API Client Libraries.

// Load an image from a local file.
var image = Image.FromFile(filePath);
var client = ImageAnnotatorClient.Create();
var response = client.DetectDocumentText(image);
foreach (var page in response.Pages)
{
    foreach (var block in page.Blocks)
    {
        foreach (var paragraph in block.Paragraphs)
        {
            Console.WriteLine(string.Join("\n", paragraph.Words));
        }
    }
}

Go

For more on installing and creating a Vision API client, refer to the Vision API Client Libraries.

// detectDocumentText gets the full document text from the Vision API for an image at the given file path.
func detectDocumentText(w io.Writer, file string) error {
	ctx := context.Background()

	client, err := vision.NewImageAnnotatorClient(ctx)
	if err != nil {
		return err
	}

	f, err := os.Open(file)
	if err != nil {
		return err
	}
	defer f.Close()

	image, err := vision.NewImageFromReader(f)
	if err != nil {
		return err
	}
	annotation, err := client.DetectDocumentText(ctx, image, nil)
	if err != nil {
		return err
	}

	if annotation == nil {
		fmt.Fprintln(w, "No text found.")
	} else {
		fmt.Fprintln(w, "Document Text:")
		fmt.Fprintf(w, "%q\n", annotation.Text)

		fmt.Fprintln(w, "Pages:")
		for _, page := range annotation.Pages {
			fmt.Fprintf(w, "\tConfidence: %f, Width: %d, Height: %d\n", page.Confidence, page.Width, page.Height)
			fmt.Fprintln(w, "\tBlocks:")
			for _, block := range page.Blocks {
				fmt.Fprintf(w, "\t\tConfidence: %f, Block type: %v\n", block.Confidence, block.BlockType)
				fmt.Fprintln(w, "\t\tParagraphs:")
				for _, paragraph := range block.Paragraphs {
					fmt.Fprintf(w, "\t\t\tConfidence: %f", paragraph.Confidence)
					fmt.Fprintln(w, "\t\t\tWords:")
					for _, word := range paragraph.Words {
						symbols := make([]string, len(word.Symbols))
						for i, s := range word.Symbols {
							symbols[i] = s.Text
						}
						wordText := strings.Join(symbols, "")
						fmt.Fprintf(w, "\t\t\t\tConfidence: %f, Symbols: %s\n", word.Confidence, wordText)
					}
				}
			}
		}
	}

	return nil
}

Java

For more on installing and creating a Vision API client, refer to the Vision API Client Libraries.

public static void detectDocumentText(String filePath, PrintStream out) throws IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  ByteString imgBytes = ByteString.readFrom(new FileInputStream(filePath));

  Image img = Image.newBuilder().setContent(imgBytes).build();
  Feature feat = Feature.newBuilder().setType(Type.DOCUMENT_TEXT_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);

  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();

    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s\n", res.getError().getMessage());
        return;
      }

      // For full list of available annotations, see http://g.co/cloud/vision/docs
      TextAnnotation annotation = res.getFullTextAnnotation();
      for (Page page: annotation.getPagesList()) {
        String pageText = "";
        for (Block block : page.getBlocksList()) {
          String blockText = "";
          for (Paragraph para : block.getParagraphsList()) {
            String paraText = "";
            for (Word word: para.getWordsList()) {
              String wordText = "";
              for (Symbol symbol: word.getSymbolsList()) {
                wordText = wordText + symbol.getText();
                out.format("Symbol text: %s (confidence: %f)\n", symbol.getText(),
                    symbol.getConfidence());
              }
              out.format("Word text: %s (confidence: %f)\n\n", wordText, word.getConfidence());
              paraText = String.format("%s %s", paraText, wordText);
            }
            // Output Example using Paragraph:
            out.println("\nParagraph: \n" + paraText);
            out.format("Paragraph Confidence: %f\n", para.getConfidence());
            blockText = blockText + paraText;
          }
          pageText = pageText + blockText;
        }
      }
      out.println("\nComplete annotation:");
      out.println(annotation.getText());
    }
  }
}

Node.js

For more on installing and creating a Vision API client, refer to the Vision API Client Libraries.

// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');

// Creates a client
const client = new vision.ImageAnnotatorClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const fileName = 'Local image file, e.g. /path/to/image.png';

// Read a local image as a text document
client
  .documentTextDetection(fileName)
  .then(results => {
    const fullTextAnnotation = results[0].fullTextAnnotation;
    console.log(`Full text: ${fullTextAnnotation.text}`);

    fullTextAnnotation.pages.forEach(page => {
      page.blocks.forEach(block => {
        console.log(`Block confidence: ${block.confidence}`);
        block.paragraphs.forEach(paragraph => {
          console.log(`Paragraph confidence: ${paragraph.confidence}`);
          paragraph.words.forEach(word => {
            const wordText = word.symbols.map(s => s.text).join('');
            console.log(`Word text: ${wordText}`);
            console.log(`Word confidence: ${word.confidence}`);
            word.symbols.forEach(symbol => {
              console.log(`Symbol text: ${symbol.text}`);
              console.log(`Symbol confidence: ${symbol.confidence}`);
            });
          });
        });
      });
    });
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

PHP

For more on installing and creating a Vision API client, refer to the Vision API Client Libraries.

namespace Google\Cloud\Samples\Vision;

use Google\Cloud\Vision\V1\ImageAnnotatorClient;

// $path = 'path/to/your/image.jpg'

function detect_document_text($path)
{
    $imageAnnotator = new ImageAnnotatorClient();

    # annotate the image
    $image = file_get_contents($path);
    $response = $imageAnnotator->documentTextDetection($image);
    $annotation = $response->getFullTextAnnotation();

    # print out detailed and structured information about document text
    if ($annotation) {
        foreach ($annotation->getPages() as $page) {
            foreach ($page->getBlocks() as $block) {
                $block_text = '';
                foreach ($block->getParagraphs() as $paragraph) {
                    foreach ($paragraph->getWords() as $word) {
                        foreach ($word->getSymbols() as $symbol) {
                            $block_text .= $symbol->getText();
                        }
                        $block_text .= ' ';
                    }
                    $block_text .= "\n";
                }
                printf('Block content: %s', $block_text);
                printf('Block confidence: %f' . PHP_EOL,
                    $block->getConfidence());

                # get bounds
                $vertices = $block->getBoundingBox()->getVertices();
                $bounds = [];
                foreach ($vertices as $vertex) {
                    $bounds[] = sprintf('(%d,%d)', $vertex->getX(),
                        $vertex->getY());
                }
                print('Bounds: ' . join(', ', $bounds) . PHP_EOL);
                print(PHP_EOL);
            }
        }
    } else {
        print('No text found' . PHP_EOL);
    }
}

Python

For more on installing and creating a Vision API client, refer to the Vision API Client Libraries.

import io

from google.cloud import vision


def detect_document(path):
    """Detects document features in an image."""
    client = vision.ImageAnnotatorClient()

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.types.Image(content=content)

    response = client.document_text_detection(image=image)

    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nBlock confidence: {}\n'.format(block.confidence))

            for paragraph in block.paragraphs:
                print('Paragraph confidence: {}'.format(
                    paragraph.confidence))

                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print('Word text: {} (confidence: {})'.format(
                        word_text, word.confidence))

                    for symbol in word.symbols:
                        print('\tSymbol: {} (confidence: {})'.format(
                            symbol.text, symbol.confidence))

Ruby

For more on installing and creating a Vision API client, refer to the Vision API Client Libraries.

# project_id = "Your Google Cloud project ID"
# image_path = "Path to local image file, eg. './image.png'"

require "google/cloud/vision"

vision = Google::Cloud::Vision.new project: project_id
image  = vision.image image_path

document = image.document

puts document.text

Detecting document text in a remote image

Vision API can perform document text detection directly on an image file located in Google Cloud Storage (a gs:// URI) or on the web (a publicly accessible HTTP or HTTPS URL), so you do not need to send the contents of the image file in the body of your request.

Protocol

For complete details, see the images:annotate API endpoint.

To perform document text detection, make a POST request and provide the appropriate request body:

POST https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY
{
  "requests": [
    {
      "image": {
        "source": {
          "gcsImageUri": "gs://YOUR_BUCKET_NAME/YOUR_FILE_NAME"
        }
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ]
    }
  ]
}
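Note that the source object also accepts a publicly accessible web image through the imageUri field instead of gcsImageUri; for example (the URL below is illustrative):

  "image": {
    "source": {
      "imageUri": "https://example.com/images/document.png"
    }
  }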

For more information on configuring the request body, see the AnnotateImageRequest reference documentation.

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "O Google Cloud Platform\n",
          "boundingPoly": {
            "vertices": [
              {
                "x": 14, "y": 11
              },
              {
                "x": 279, "y": 11
              },
              {
                "x": 279, "y": 37
              },
              {
                "x": 14, "y": 37
              }
            ]
          }
        }
      ],
      "fullTextAnnotation": {
        "pages": [
          {
            "property": {
              "detectedLanguages": [
                {
                  "languageCode": "en"
                }
              ]
            },
            "width": 281,
            "height": 44,
            "blocks": [
              {
                "property": {
                  "detectedLanguages": [
                    {
                      "languageCode": "en"
                    }
                  ]
                },
                "boundingBox": {
                  "vertices": [
                    {
                      "x": 14, "y": 11
                    },
                    {
                      "x": 279, "y": 11
                    },
                    {
                      "x": 279, "y": 37
                    },
                    {
                      "x": 14, "y": 37
                    }
                  ]
                },
                "paragraphs": [
                  {
                    "property": {
                      "detectedLanguages": [
                        {
                          "languageCode": "en"
                        }
                      ]
                    },
                    "boundingBox": {
                      "vertices": [
                        {
                          "x": 14, "y": 11
                        },
                        {
                          "x": 279, "y": 11
                        },
                        {
                          "x": 279, "y": 37
                        },
                        {
                          "x": 14, "y": 37
                        }
                      ]
                    },
                    "words": [
                      {
                        "property": {
                          "detectedLanguages": [
                            {
                              "languageCode": "en"
                            }
                          ]
                        },
                        "boundingBox": {
                          "vertices": [
                            {
                              "x": 14, "y": 11
                            },
                            {
                              "x": 23, "y": 11
                            },
                            {
                              "x": 23, "y": 37
                            },
                            {
                              "x": 14, "y": 37
                            }
                          ]
                        },
                        "symbols": [
                          {
                            "property": {
                              "detectedLanguages": [
                                {
                                  "languageCode": "en"
                                }
                              ],
                              "detectedBreak": {
                                "type": "SPACE"
                              }
                            },
                            "boundingBox": {
                              "vertices": [
                                {
                                  "x": 14, "y": 11
                                },
                                {
                                  "x": 23, "y": 11
                                },
                                {
                                  "x": 23, "y": 37
                                },
                                {
                                  "x": 14, "y": 37
                                }
                              ]
                            },
                            "text": "O"
                          }
                        ]
                      }
                    ]
                  }
                ],
                "blockType": "TEXT"
              }
            ]
          }
        ],
        "text": "Google Cloud Platform\n"
      }
    }
  ]
}

C#

For more on installing and creating a Vision API client, refer to the Vision API Client Libraries.

// Specify a Google Cloud Storage uri for the image
// or a publicly accessible HTTP or HTTPS uri.
var image = Image.FromUri(uri);
var client = ImageAnnotatorClient.Create();
var response = client.DetectDocumentText(image);
foreach (var page in response.Pages)
{
    foreach (var block in page.Blocks)
    {
        foreach (var paragraph in block.Paragraphs)
        {
            Console.WriteLine(string.Join("\n", paragraph.Words));
        }
    }
}

Go

For more on installing and creating a Vision API client, refer to the Vision API Client Libraries.

// detectDocumentTextURI gets the full document text from the Vision API for an image at the given Cloud Storage or web URI.
func detectDocumentTextURI(w io.Writer, file string) error {
	ctx := context.Background()

	client, err := vision.NewImageAnnotatorClient(ctx)
	if err != nil {
		return err
	}

	image := vision.NewImageFromURI(file)
	annotation, err := client.DetectDocumentText(ctx, image, nil)
	if err != nil {
		return err
	}

	if annotation == nil {
		fmt.Fprintln(w, "No text found.")
	} else {
		fmt.Fprintln(w, "Document Text:")
		fmt.Fprintf(w, "%q\n", annotation.Text)

		fmt.Fprintln(w, "Pages:")
		for _, page := range annotation.Pages {
			fmt.Fprintf(w, "\tConfidence: %f, Width: %d, Height: %d\n", page.Confidence, page.Width, page.Height)
			fmt.Fprintln(w, "\tBlocks:")
			for _, block := range page.Blocks {
				fmt.Fprintf(w, "\t\tConfidence: %f, Block type: %v\n", block.Confidence, block.BlockType)
				fmt.Fprintln(w, "\t\tParagraphs:")
				for _, paragraph := range block.Paragraphs {
					fmt.Fprintf(w, "\t\t\tConfidence: %f", paragraph.Confidence)
					fmt.Fprintln(w, "\t\t\tWords:")
					for _, word := range paragraph.Words {
						symbols := make([]string, len(word.Symbols))
						for i, s := range word.Symbols {
							symbols[i] = s.Text
						}
						wordText := strings.Join(symbols, "")
						fmt.Fprintf(w, "\t\t\t\tConfidence: %f, Symbols: %s\n", word.Confidence, wordText)
					}
				}
			}
		}
	}

	return nil
}

Java

For more on installing and creating a Vision API client, refer to the Vision API Client Libraries.

public static void detectDocumentTextGcs(String gcsPath, PrintStream out) throws IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  ImageSource imgSource = ImageSource.newBuilder().setGcsImageUri(gcsPath).build();
  Image img = Image.newBuilder().setSource(imgSource).build();
  Feature feat = Feature.newBuilder().setType(Type.DOCUMENT_TEXT_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);

  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();

    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s\n", res.getError().getMessage());
        return;
      }
      // For full list of available annotations, see http://g.co/cloud/vision/docs
      TextAnnotation annotation = res.getFullTextAnnotation();
      for (Page page: annotation.getPagesList()) {
        String pageText = "";
        for (Block block : page.getBlocksList()) {
          String blockText = "";
          for (Paragraph para : block.getParagraphsList()) {
            String paraText = "";
            for (Word word: para.getWordsList()) {
              String wordText = "";
              for (Symbol symbol: word.getSymbolsList()) {
                wordText = wordText + symbol.getText();
                out.format("Symbol text: %s (confidence: %f)\n", symbol.getText(),
                    symbol.getConfidence());
              }
              out.format("Word text: %s (confidence: %f)\n\n", wordText, word.getConfidence());
              paraText = String.format("%s %s", paraText, wordText);
            }
            // Output Example using Paragraph:
            out.println("\nParagraph: \n" + paraText);
            out.format("Paragraph Confidence: %f\n", para.getConfidence());
            blockText = blockText + paraText;
          }
          pageText = pageText + blockText;
        }
      }
      out.println("\nComplete annotation:");
      out.println(annotation.getText());
    }
  }
}

Node.js

For more on installing and creating a Vision API client, refer to the Vision API Client Libraries.

// Imports the Google Cloud client libraries
const vision = require('@google-cloud/vision');

// Creates a client
const client = new vision.ImageAnnotatorClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const bucketName = 'Bucket where the file resides, e.g. my-bucket';
// const fileName = 'Path to file within bucket, e.g. path/to/image.png';

// Read a remote image as a text document
client
  .documentTextDetection(`gs://${bucketName}/${fileName}`)
  .then(results => {
    const fullTextAnnotation = results[0].fullTextAnnotation;
    console.log(fullTextAnnotation.text);
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

PHP

For more on installing and creating a Vision API client, refer to the Vision API Client Libraries.

namespace Google\Cloud\Samples\Vision;

use Google\Cloud\Vision\V1\ImageAnnotatorClient;

// $path = 'gs://path/to/your/image.jpg'

function detect_document_text_gcs($path)
{
    $imageAnnotator = new ImageAnnotatorClient();

    # annotate the image
    $response = $imageAnnotator->documentTextDetection($path);
    $annotation = $response->getFullTextAnnotation();

    # print out detailed and structured information about document text
    if ($annotation) {
        foreach ($annotation->getPages() as $page) {
            foreach ($page->getBlocks() as $block) {
                $block_text = '';
                foreach ($block->getParagraphs() as $paragraph) {
                    foreach ($paragraph->getWords() as $word) {
                        foreach ($word->getSymbols() as $symbol) {
                            $block_text .= $symbol->getText();
                        }
                        $block_text .= ' ';
                    }
                    $block_text .= "\n";
                }
                printf('Block content: %s', $block_text);
                printf('Block confidence: %f' . PHP_EOL,
                    $block->getConfidence());

                # get bounds
                $vertices = $block->getBoundingBox()->getVertices();
                $bounds = [];
                foreach ($vertices as $vertex) {
                    $bounds[] = sprintf('(%d,%d)', $vertex->getX(),
                        $vertex->getY());
                }
                print('Bounds: ' . join(', ', $bounds) . PHP_EOL);

                print(PHP_EOL);
            }
        }
    } else {
        print('No text found' . PHP_EOL);
    }
}

Python

For more on installing and creating a Vision API client, refer to the Vision API Client Libraries.

from google.cloud import vision


def detect_document_uri(uri):
    """Detects document features in the file located in Google Cloud
    Storage."""
    client = vision.ImageAnnotatorClient()
    image = vision.types.Image()
    image.source.image_uri = uri

    response = client.document_text_detection(image=image)

    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nBlock confidence: {}\n'.format(block.confidence))

            for paragraph in block.paragraphs:
                print('Paragraph confidence: {}'.format(
                    paragraph.confidence))

                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print('Word text: {} (confidence: {})'.format(
                        word_text, word.confidence))

                    for symbol in word.symbols:
                        print('\tSymbol: {} (confidence: {})'.format(
                            symbol.text, symbol.confidence))

Ruby

For more on installing and creating a Vision API client, refer to the Vision API Client Libraries.

# project_id = "Your Google Cloud project ID"
# image_path = "Google Cloud Storage URI, eg. 'gs://my-bucket/image.png'"

require "google/cloud/vision"

vision = Google::Cloud::Vision.new project: project_id
image  = vision.image image_path

document = image.document

puts document.text
